Training an AI to Recognize Cigarette Butts

[Image: cig_butt_viz.png — cigarette butt detection visualization]

The Project

Depending on whom you ask, artificial intelligence is either going to destroy the world or solve all of its problems. Wait... is that too black and white? I think so. As I've been learning the fundamentals of deep learning, I've been considering what problems it could solve (rather than how I can take over the world). One of the problems we are currently failing to solve (rather spectacularly, IMO) is controlling litter. It's everywhere and I find it disgusting. As an experiment, I decided to see if I could train a neural network to recognize cigarette butts in images.

My reasoning was that if we could spot them reliably, a small robot could be unleashed to pick them up with minimal supervision. I'm not a robotics expert (yet...), but I suspect there are a lot of robotics experts out there who could take this project further if they could reliably target garbage with an onboard camera.

My Approach

Creating a Synthetic Dataset of Cigarette Butt Images

There are plenty of open source neural networks available, so I knew that the most important part was going to be the training dataset. I also knew that there was no way I was going to manually annotate thousands of images of cigarette butts. The good thing is that cigarette butts look mostly the same with a few variations in color. The question was, could I create a fake dataset taking advantage of that? Good news: the answer is yes.

Initially, I planned to create a completely 3D rendered dataset in Blender, but after experimentation, I realized that composing 2D photos yielded better results for less work and in a fraction of the time. I'm sure that some synthetic datasets will require 3D renderings, but not this one.

My Process

First, I used my iPhone 8 to take pictures of cigarette butts on the ground from about waist height. Then I took about 300 pictures of the ground without butts, also from waist height.

Second, I cut out 25 different cigarette butts and saved them as images with transparent backgrounds. I knew that I wasn't going to cover all variations with this, but I had to start small.

I made a tutorial for this: Cutting Out Image Foregrounds with GIMP

An example of a cut out cigarette butt with transparency (this one is way higher resolution than what I was actually using)

Third, I wrote a Python program to paste these images on top of the photos of the ground. I randomly rotated the cigarette butts and cropped the backgrounds to allow for a lot of variation.

I made a tutorial for this too (including source code): Composing Images with Python for Synthetic Datasets
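The core of the compositing step is simple: take a random crop of a background photo, randomly rotate a cut-out, and paste it using its alpha channel as the mask. Here's a minimal sketch of that step using Pillow; the file paths, the foreground scale, and the exact randomization are illustrative, not the exact values from my script:

```python
import random
from PIL import Image

def compose(background_path, foreground_path, out_size=512):
    """Paste one randomly rotated cut-out onto a random background crop."""
    # Random square crop of the background (assumes the photo is larger
    # than out_size), resized down to the output resolution
    bg = Image.open(background_path).convert("RGBA")
    crop = random.randint(out_size, min(bg.size))
    x = random.randint(0, bg.width - crop)
    y = random.randint(0, bg.height - crop)
    bg = bg.crop((x, y, x + crop, y + crop)).resize((out_size, out_size))

    # Shrink the cut-out to a plausible scale, then rotate it
    # (expand=True keeps the whole rotated image in frame)
    fg = Image.open(foreground_path).convert("RGBA")
    fg.thumbnail((out_size // 4, out_size // 4))
    fg = fg.rotate(random.uniform(0, 360), expand=True)

    # Paste at a random position; the alpha channel doubles as the paste mask
    pos = (random.randint(0, out_size - fg.width),
           random.randint(0, out_size - fg.height))
    bg.paste(fg, pos, fg)
    return bg, fg, pos

composite, fg, pos = compose("backgrounds/bg_001.jpg", "cutouts/butt_01.png")
composite.convert("RGB").save("train/comp_00001.jpg")  # JPEG needs RGB
```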

Finally, I automatically generated 2000 training images and 200 validation images at 512x512 resolution. It took a bit less than 20 minutes to generate (not including development time). Below are some samples.

[Image: sample_training_512.png — sample synthetic training images]

I simultaneously generated mask images, which Mask R-CNN uses as ground truth.

[Image: sample_training_masks_512.png — corresponding mask images]
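Because the script knows exactly where each cut-out was pasted, the mask comes for free: the cut-out's alpha channel, placed at the same position on a black canvas, is the ground-truth mask. Continuing the sketch above (the make_mask name and the alpha threshold of 128 are illustrative):

```python
from PIL import Image

def make_mask(fg, pos, out_size=512):
    """Binary ground-truth mask for one pasted cut-out."""
    mask = Image.new("L", (out_size, out_size), 0)  # all-black canvas
    alpha = fg.getchannel("A")                      # cut-out's transparency
    # Pixels that are mostly opaque count as "cigarette butt"
    mask.paste(alpha.point(lambda a: 255 if a > 128 else 0), pos)
    return mask

# fg and pos come from the compose() call in the previous sketch
make_mask(fg, pos).save("train_masks/comp_00001_mask.png")
```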


Training

I used Matterport's implementation of Mask R-CNN (Mask Region-based Convolutional Neural Network), which is based on the Mask R-CNN paper published by Facebook AI Research.

My graphics card isn't too impressive (GeForce GTX 970), but with some tweaking, I was able to get past the initial out-of-memory (OOM) errors and train my neural net to recognize cigarette butts with surprising accuracy.

Training Details

(Skip this part if you aren’t interested in AI programming) 

- I used 2000 synthetic 512x512 px training images (plus 200 synthetic validation images)
- I did ZERO training on real images
- I trained the "heads" for 4 epochs of 500 steps, then the full network for another 4 epochs (see the config sketch after this list)
- ResNet-50 backbone (ResNet-101 caused OOM errors), initialized with COCO pre-trained weights
- Training took about 20 minutes on my computer
- Final losses after the 8th epoch:
  - training: loss: 0.3615 - rpn_class_loss: 0.0023 - rpn_bbox_loss: 0.1823 - mrcnn_class_loss: 0.0232 - mrcnn_bbox_loss: 0.0636 - mrcnn_mask_loss: 0.0901
  - validation: val_loss: 0.3261 - val_rpn_class_loss: 0.0013 - val_rpn_bbox_loss: 0.1875 - val_mrcnn_class_loss: 0.0043 - val_mrcnn_bbox_loss: 0.0535 - val_mrcnn_mask_loss: 0.0794
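For reference, here's roughly what those settings look like with Matterport's API. This is a sketch, not my exact training script: the paths are placeholders, the batch size and fine-tuning learning rate follow the repo's examples, and the datasets are assumed to be built elsewhere.

```python
from mrcnn.config import Config
from mrcnn import model as modellib

class CigButtConfig(Config):
    NAME = "cig_butt"
    BACKBONE = "resnet50"    # resnet101 ran out of memory on my GTX 970
    NUM_CLASSES = 1 + 1      # background + cigarette butt
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1       # small batch to fit in 4 GB of VRAM (assumption)
    STEPS_PER_EPOCH = 500
    IMAGE_MIN_DIM = 512
    IMAGE_MAX_DIM = 512

config = CigButtConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs")

# Start from COCO weights, skipping the layers whose shapes depend on the
# number of classes
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])

# dataset_train / dataset_val are mrcnn.utils.Dataset subclasses that load
# the synthetic images and masks (built elsewhere, not shown here).
# Matterport's epoch count is cumulative, so this trains the heads for
# 4 epochs, then the full network for 4 more.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE, epochs=4, layers="heads")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE / 10, epochs=8, layers="all")
```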

Results

It's far from perfect, and it probably isn't practical to bolt onto a robot right now, but with more training and improvements in GPU hardware, I think this could be a viable solution for picking up cigarette butts on a huge scale. Check out some of the results below. The colored regions are what the network identified as cigarette butts.

You will notice that it missed a few cigarette butts and it was confused by some leaves and other debris. I suspect this could be improved with more training.

Even with these issues, I'm really impressed. To improve further, I want to increase the variation by adding more cigarette butt cut-outs. I might also try more backgrounds.
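If you want to produce this kind of visualization yourself, Matterport's repo includes a display helper. Roughly (the weights filename and test image path are placeholders, and CigButtConfig is the training config sketched earlier):

```python
import skimage.io
from mrcnn import model as modellib, visualize

class InferenceConfig(CigButtConfig):  # reuse the training config above
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1                 # detect one image at a time

model = modellib.MaskRCNN(mode="inference", config=InferenceConfig(),
                          model_dir="logs")
model.load_weights("mask_rcnn_cig_butt.h5", by_name=True)  # trained weights

image = skimage.io.imread("real_test/ground_photo.jpg")
r = model.detect([image], verbose=0)[0]  # rois, masks, class_ids, scores
visualize.display_instances(image, r["rois"], r["masks"], r["class_ids"],
                            ["BG", "cig_butt"], r["scores"])
```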

What's Next

I want to try this method to generate more synthetic datasets. I see no reason why it couldn't be expanded to lots of different types of items (including other types of litter). I'll report more results in the future. I also plan to release tutorials for others who want to try this approach to create their own synthetic image datasets.

Update


Check out the tutorial I've created that will teach you how to do this yourself!

Using Mask R-CNN with a Custom COCO-like Dataset

Thanks for reading! If you liked the post and want to see more like it, please follow Immersive Limit on Facebook and @ImmersiveLimit on Twitter.