Learning to Create 3D Rendered Synthetic Datasets

Glasses rendered in Blender 3D with an HDRI background from HDRI Haven.

Why 3D Rendered Synthetic Datasets?

Being able to generate training datasets for advanced AI has obvious value. Manual annotation is tedious and expensive. It's also sometimes impossible. Imagine you're training an AI drone to spot cracks on a hydroelectric dam. What are you going to do? Crack the dam in a bunch of places so that you can get thousands of training examples? Doubt it. What about different times of day, weather, seasons, dirt, and degradation?

Now imagine if we could simulate cracks and render them with every combination of lighting, weather, and material, at any resolution, and with any kind of lens. That suddenly feels a lot more doable. 3D rendering takes more up-front work, and I won't argue that it's never tedious, but it's nowhere near as tedious and unfulfilling as manual image annotation. In my experience as a 3D artist, creating 3D renders is far more enjoyable and rewarding. And the customization, combined with automatic, flawless annotations, means incredible potential.

From a personal perspective, I've always loved 3D modeling and rendering. I got my hands on 3ds Max as a teenager in high school (circa 2002), poured hours into it, and loved every minute. I wanted to work for Pixar. When it came time to apply to colleges, I tried to find a good 3D animation program, but eventually ended up pursuing programming instead. Fortunately, over the past 5 years or so, I've been able to pick it back up, and I absolutely love it.

Before I left Microsoft, I learned about a team that was creating synthetic training data for HoloLens AI applications. I was blown away and I really wanted to work on it. Unfortunately, I wasn't able to switch to that team, so a few months later, Kayla and I were ready for a change and moved to Austin, TX for my current job at GM.

When I first started learning AI at GM, I was drawn to computer vision for multiple reasons. First, I'm a visual person; I like to see what I'm working on. Second, and probably more importantly, I saw immediate use cases for computer vision in my job on the manufacturing side at General Motors. We were experimenting with robotics, and it was obvious that having powerful vision AI available would have a big return on investment. The missing piece was always data, and I didn't have the time or desire to manually annotate the thousands of images needed to train an AI. Synthetic data was the obvious answer.

What I've Learned and Done So Far

Between my day job at GM and my nights/weekends working on Immersive Limit, I've actually done a fair bit of prototyping with both 2D synthetic datasets and 3D rendered datasets.

I go into more detail about my 2D synthetic dataset experiments with a cigarette butt detector and a weed detector in other blog posts.

For 3D, I started out with Unity, which at the time didn't have any official tools for creating synthetic data. There was the ML Image-Synthesis repo, which is super helpful if you're on the legacy built-in render pipeline, but it never got much attention after its initial release and hasn't been updated for URP or HDRP. Fortunately, Unity is working on this again in the form of the Perception SDK, which has just been released and is in active, early development. I'm excited to see it progress.

I experimented a tiny bit with photogrammetry for synthetic datasets, but my photogrammetry scans just weren't high enough quality to be worth testing. I'm holding out for better photogrammetry before I go down that path, but I think it has a ton of potential. Imagine being able to 3D scan something at high fidelity and then automatically render it from any angle, in any lighting conditions, in any setting.

I've been working on creating synthetic datasets with Unreal Engine at GM. Unreal is able to achieve more realistic rendering than Unity (in my opinion), but the learning curve is quite a bit steeper. As an example, I'm a bit embarrassed at how long it took me to figure out how AirSim, Carla, and UnrealCV did instance segmentation. It turns out instance segmentation and semantic segmentation in Unreal Engine are not really that hard, but these open source libraries just didn't do a great job documenting how it all fits together. As a hint, you want to use the Custom Stencil pass and Post Process materials to accomplish this. I've been considering making a tutorial for this, but haven't had time. Feel free to reach out to me if you want this tutorial made sooner.
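To give a rough idea of what I mean, here's a minimal sketch using Unreal's editor Python API (assuming the Python Editor Script Plugin is enabled) that tags each static mesh actor in the level with its own Custom Depth-Stencil value. It doesn't cover the other half of the trick: you still need to turn on the Custom Depth-Stencil Pass (with stencil) in the project settings and add a post process material that reads the CustomStencil scene texture and maps each value to a flat color. Treat this as a starting point, not a drop-in tool.

```python
import unreal

# Grab every actor in the currently loaded level (editor-only API)
actors = unreal.EditorLevelLibrary.get_all_level_actors()

stencil_value = 1  # 0 is the default/background value, so start at 1
for actor in actors:
    if not isinstance(actor, unreal.StaticMeshActor):
        continue
    component = actor.get_editor_property('static_mesh_component')
    # Render this mesh into the Custom Depth/Stencil buffer with its own ID
    component.set_editor_property('render_custom_depth', True)
    component.set_editor_property('custom_depth_stencil_value', stencil_value)
    stencil_value = (stencil_value % 255) + 1  # stencil values are limited to 0-255
```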

Most recently, I made a 3D rendered synthetic dataset with a pair of reading glasses in Blender. Blender is an amazing, free piece of software for 3D modeling, animating, rendering, and compositing, and I think it shows a lot of promise for creating hyper-realistic synthetic datasets. The Cycles renderer, combined with HDRI lighting, the Principled BSDF shader, and the incredible OptiX AI-accelerated denoising, produces some very impressive results. I was able to create realistic renders, then composite them over real photo backgrounds using a Python library I wrote for creating 2D image datasets. It worked really well, but it felt hacked together, and I know there's a lot more I can do to improve the synthetic dataset generation pipeline.
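For reference, here's a stripped-down sketch of the kind of Blender Python (bpy) setup I'm talking about. It assumes a scene that already contains the glasses model with a Principled BSDF material, an HDRI file at a made-up path, a 2.9x-era build of Blender, and an OptiX-capable NVIDIA GPU; it's the general shape of the setup, not my exact pipeline.

```python
import bpy

scene = bpy.context.scene

# Use Cycles on the GPU with AI denoising
scene.render.engine = 'CYCLES'
scene.cycles.device = 'GPU'
scene.cycles.samples = 128
scene.cycles.use_denoising = True   # older builds keep this toggle on the view layer instead
scene.cycles.denoiser = 'OPTIX'     # requires an RTX-capable NVIDIA GPU

# Light the scene with an HDRI (hypothetical file path)
world = scene.world
world.use_nodes = True
nodes, links = world.node_tree.nodes, world.node_tree.links
env = nodes.new('ShaderNodeTexEnvironment')
env.image = bpy.data.images.load('/path/to/studio.hdr')
links.new(env.outputs['Color'], nodes['Background'].inputs['Color'])

# Render with a transparent film so the result can be composited over real photos
scene.render.film_transparent = True
scene.render.image_settings.file_format = 'PNG'
scene.render.image_settings.color_mode = 'RGBA'
scene.render.filepath = '/tmp/glasses_0001.png'
bpy.ops.render.render(write_still=True)
```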

What I'm Learning and Doing Now

There's a phenomenon where the more you learn about something, the less you feel like you know. That's kinda how I'm feeling about Blender and 3d rendering right now. As part of my 3d glasses experiment, I learned about some VFX techniques that are used in the film industry, namely render passes and compositing. As I've done further research on that, I've realized that I know pretty much nothing. This is my new area of focus.

I'm going to dig in with VFX techniques but try not to go too far down the rabbit hole. If I get lost in the VFX world, I'll probably end up working on Fast and Furious 12 and Star Wars: Return of the Skywalker Force Ghosts and you'll never hear from me again.

I've already learned that you can save out various render passes and then modify them individually in a compositor. You can also automatically generate masks for each item or material. Not only that, everything in Blender can be done with Python scripting, so if I want to, I can automate most of the process. The holy grail would be to run a single command that would render a dataset, do some extreme augmentation in the compositing step, generate rich annotation data, package it all up nicely, and feed it into a neural network for training. In this fantasy world, I can run this command, go play Xbox for a few hours, and come back to a fully trained neural network!
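As a concrete (if simplified) illustration of the render pass and mask idea, here's roughly what that scripting looks like in Blender. The pass index, output path, and single ID Mask node are just for illustration; a real pipeline would loop over every object of interest and write out a mask per object, but this shows the pieces: pass indices, the Object Index render pass, and an ID Mask node in the compositor.

```python
import bpy

scene = bpy.context.scene
view_layer = bpy.context.view_layer

# Give each mesh object a unique pass index (assumes Cycles, where the
# Object Index pass is available)
for i, obj in enumerate(bpy.data.objects):
    if obj.type == 'MESH':
        obj.pass_index = i + 1

# Enable the Object Index render pass on the active view layer
view_layer.use_pass_object_index = True

# Build a tiny compositor graph: Render Layers -> ID Mask -> File Output
scene.use_nodes = True
tree = scene.node_tree
tree.nodes.clear()

render_layers = tree.nodes.new('CompositorNodeRLayers')
composite = tree.nodes.new('CompositorNodeComposite')
id_mask = tree.nodes.new('CompositorNodeIDMask')
id_mask.index = 1                      # mask for whichever object got pass_index == 1
file_out = tree.nodes.new('CompositorNodeOutputFile')
file_out.base_path = '/tmp/masks'      # hypothetical output directory

tree.links.new(render_layers.outputs['Image'], composite.inputs['Image'])
tree.links.new(render_layers.outputs['IndexOB'], id_mask.inputs['ID value'])
tree.links.new(id_mask.outputs['Alpha'], file_out.inputs['Image'])

bpy.ops.render.render(write_still=True)
```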

My Learning Approach

I generally learn by doing. I pick a project, then learn how to do it along the way via tutorials, documentation, and web searches, but I RARELY ask for help. Even asking questions in forums is a last resort, because I'm so impatient and usually find the answer to my problem about 10 minutes after posting my question. I'm definitely going to do all of those things, but this time I think I might also ask for help. I know some people at Microsoft (Pedro and Jon, whom I mentioned in the 3D glasses video), and I've also been contacted by a couple of other companies doing synthetic data since posting that video. They all know a hell of a lot more than I do about this subject, so I'm going to see what I can learn and then share it with the world.

Only somewhat surprisingly, I'm having trouble finding good learning resources on this topic. There are some good YouTube tutorials on render passes and compositing in Blender (like this one from blenderBinge and this one from Dylan Neill), but they're more targeted at the film industry. I have found a few interesting research papers, and ironically, the Autodesk documentation for Arnold has been really helpful for understanding render layers in Blender.

I'll make sure to keep track of what I find and do my best to curate it for anyone else trying to follow this same path.