Composing Images with Python for Synthetic Datasets

 An image composed of a foreground (with transparency) and a background, alongside its accompanying mask, both generated by Python.

Composing images with Python is fairly straightforward, but for training neural networks, we also want additional annotation information. In this tutorial, I'll teach you how to compose an object on top of a background image and generate a bit mask image for training. I've provided a few sample images to get started, but if you want to build your own synthetic image dataset, you'll need to collect more images.

To learn how to create foreground cutouts of your own images, you can follow my previous tutorial, "Cutting Out Image Foregrounds with GIMP", which shows how to cut out image foregrounds with GIMP for use in synthetic image datasets.

Python Code

I've provided a full working example, with sample images, on GitHub. It's too much code to share here, so you will need to go there for the full details.

I will highlight some of the interesting parts below with commentary.

Transforming the Foreground

The PIL Image library makes simple rotation and scaling very easy. For this example, I'm choosing a random rotation between 0 and 359 degrees and a random scale somewhere between 50% and 100%.

import random
from PIL import Image

# Rotate the foreground (expand=True grows the canvas so the corners are not clipped)
angle_degrees = random.randint(0, 359)
foreground = foreground.rotate(angle_degrees, resample=Image.BICUBIC, expand=True)

# Scale the foreground
scale = random.random() * 0.5 + 0.5  # Pick something between 0.5 and 1
new_size = (int(foreground.size[0] * scale), int(foreground.size[1] * scale))
foreground = foreground.resize(new_size, resample=Image.BICUBIC)
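
Wrapping both steps in one helper makes them easy to reuse for every generated image. The following is a minimal sketch rather than the code from the GitHub example: the function name `transform_foreground`, the `rng` parameter, and the solid red demo image are assumptions introduced for illustration.

```python
import random

from PIL import Image


def transform_foreground(foreground, rng=random):
    """Randomly rotate and scale an RGBA foreground cutout (hypothetical helper)."""
    # expand=True grows the canvas so rotated corners are not clipped;
    # the newly exposed areas are filled with fully transparent pixels.
    angle_degrees = rng.randint(0, 359)
    rotated = foreground.rotate(angle_degrees, resample=Image.BICUBIC, expand=True)

    # Scale factor between 0.5 and 1
    scale = rng.random() * 0.5 + 0.5
    new_size = (
        max(1, int(rotated.size[0] * scale)),
        max(1, int(rotated.size[1] * scale)),
    )
    return rotated.resize(new_size, resample=Image.BICUBIC)


# Stand-in for a real cutout: an opaque red rectangle
demo_foreground = Image.new("RGBA", (100, 60), (255, 0, 0, 255))
transformed = transform_foreground(demo_foreground, random.Random(0))
```

Passing a seeded `random.Random` instance, as in the last line, makes a run reproducible, which can help when you need to regenerate exactly the same dataset.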


Generating the Bit Mask

The alpha channel of the foreground will become our bit mask, but first we have to paste it onto a black background, just as we paste the foreground for the composite image. After that, we use NumPy to find which pixels are above a certain opacity threshold. I chose 200 out of 255, which is about 80% opaque.

import numpy as np

# Extract the alpha channel from the foreground and paste it into a new black image the size of the background
alpha_mask = foreground.getchannel(3)
new_alpha_mask = Image.new('L', background.size, color=0)
new_alpha_mask.paste(alpha_mask, paste_position)

# Keep the pixels whose alpha is above the threshold (255 inside the mask, 0 outside)
alpha_threshold = 200
mask_arr = (np.array(new_alpha_mask) > alpha_threshold).astype(np.uint8)
hard_mask = Image.fromarray(mask_arr * 255)
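
The snippet above relies on a `paste_position` that is chosen when the composite is built. As a rough sketch of how the two steps might fit together, the helper below picks a random position, pastes the foreground using its own alpha channel as the paste mask, and builds the hard mask at the same position. The name `compose`, the random placement rule, and the solid-color demo images are my assumptions here and may differ from the GitHub example.

```python
import random

import numpy as np
from PIL import Image


def compose(foreground, background, alpha_threshold=200, rng=random):
    """Return (composite, hard_mask) for an RGBA foreground (hypothetical helper)."""
    # Assumes the foreground is no larger than the background
    max_x = background.size[0] - foreground.size[0]
    max_y = background.size[1] - foreground.size[1]
    paste_position = (rng.randint(0, max_x), rng.randint(0, max_y))

    # Passing the foreground as the third argument uses its alpha channel
    # as the paste mask, so transparent pixels leave the background visible.
    composite = background.convert("RGB")
    composite.paste(foreground, paste_position, foreground)

    # Paste the alpha channel at the same position on a black canvas
    alpha_canvas = Image.new("L", background.size, color=0)
    alpha_canvas.paste(foreground.getchannel("A"), paste_position)
    mask_arr = (np.array(alpha_canvas) > alpha_threshold).astype(np.uint8) * 255
    return composite, Image.fromarray(mask_arr)


# Stand-ins for real images: a green background and an opaque red cutout
demo_background = Image.new("RGB", (200, 150), (0, 128, 0))
demo_foreground = Image.new("RGBA", (40, 30), (255, 0, 0, 255))
composite, hard_mask = compose(demo_foreground, demo_background, rng=random.Random(0))
```

Because the mask is pasted at the same position as the foreground, every white pixel in `hard_mask` lines up with a foreground pixel in `composite`.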


Calculating the Bounding Box

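We use NumPy again to find the smallest and largest row and column indices of the non-zero pixels in the bit mask. These values define our bounding box. Note that np.nonzero returns row (y) indices first, so the box is ordered [y_min, x_min, y_max, x_max].

# Get the smallest & largest non-zero indices in each dimension and calculate the bounding box
# nz[0] holds row (y) indices and nz[1] holds column (x) indices
nz = np.nonzero(np.array(hard_mask))
bbox = [np.min(nz[0]), np.min(nz[1]), np.max(nz[0]), np.max(nz[1])]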
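
Many annotation formats expect x before y, and an empty mask (for example, a foreground that is entirely below the alpha threshold) would make `np.min` raise an error. Here is a small sketch of a helper that handles both cases. The name `mask_to_bbox` and the x-first ordering are my own choices for illustration, so adapt them to whatever your training code expects.

```python
import numpy as np


def mask_to_bbox(mask_arr):
    """Return (x_min, y_min, x_max, y_max) for a 2D mask, or None if it is empty."""
    # np.nonzero returns indices per axis: axis 0 is rows (y), axis 1 is columns (x)
    rows, cols = np.nonzero(mask_arr)
    if rows.size == 0:
        return None
    return int(cols.min()), int(rows.min()), int(cols.max()), int(rows.max())


# A 20x10 (width x height) mask with a filled block in rows 2-4, columns 7-11
demo_mask = np.zeros((10, 20), dtype=np.uint8)
demo_mask[2:5, 7:12] = 255
demo_bbox = mask_to_bbox(demo_mask)
print(demo_bbox)  # (7, 2, 11, 4)
```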

Output Images

After you've run the code, you should get a series of images and corresponding masks. I've also written code to output a CSV (comma-separated values) file with annotations.
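
To give a feel for what writing such a file can look like, here is a minimal sketch using Python's built-in `csv` module. The column names, file names, and label below are hypothetical and are not taken from the GitHub example, which may lay out its annotations differently.

```python
import csv
import io

# Hypothetical column layout, chosen only for illustration
FIELDNAMES = ["image", "mask", "label", "x_min", "y_min", "x_max", "y_max"]


def write_annotations(rows, out_file):
    """Write a header plus one CSV row per generated image."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Made-up example row; real rows would come from the generation loop
demo_rows = [
    {"image": "0000.png", "mask": "0000_mask.png", "label": "foreground_a",
     "x_min": 7, "y_min": 2, "x_max": 11, "y_max": 4},
]

# An in-memory buffer keeps the demo self-contained; to write a real file,
# pass open("annotations.csv", "w", newline="") instead.
buffer = io.StringIO()
write_annotations(demo_rows, buffer)
csv_text = buffer.getvalue()
```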

Notice that with only 2 foregrounds and 4 backgrounds, we're able to create plenty of variation. None of these images are real, but in my experiments, images like these have worked very well for training neural networks! With some additional creativity on your part, you can add all sorts of other variation to your own synthetic dataset.

Hopefully this was helpful! Please let me know if anything is unclear and I will do my best to improve it.