Semantic Segmentation with Deep Learning: A guide and code

Semantic Segmentation with Deep Learning: A guide and code

Most people in the deep learning and computer vision communities understand what image classification is: we want our model to tell us what single object or scene is present in the image. Classification is very coarse and high-level.

Many are also familiar with object detection, where we try to locate and classify multiple objectswithin the image, by drawing bounding boxes around them and then classifying what’s in the box. Detection is mid-level, where we have some pretty useful and detailed information, but it’s still a bit rough since we’re only drawing bounding boxes and don’t really get an accurate idea of object shape.

Semantic Segmentation is the most informative of these three, where we wish to classify each and every pixel in the image, just like you see in the gif above! Over the past few years, this has been done entirely with deep learning.

In this guide, you’ll learn about the basic structure and workings of semantic segmentation models and all of the latest and greatest state-of-the-art methods.

If you’d like to try out the models yourself, you can checkout my Semantic Segmentation Suite, complete with TensorFlow training and testing code for many of the models in this guide!

The basic structure of semantic segmentation models that I’m about to show you is present in all state-of-the-art methods! This makes it very easy to implement different ones, since almost all of them have the same underlying backbone, setup, and flow.

The U-Net model has a great illustration of this structure. The left side of the model represents any feature extraction network trained for image classification. This includes networks like VGGNet, ResNets, DenseNets, MobileNets, and NASNets! You can really use anything you want there.

The main thing when selecting your classification network for feature extraction is to keep in mind the tradeoffs. Using a very deep ResNet152 will get you great accuracy but won’t be nearly as fast as a MobileNet. The tradeoffs that appear when applying those networks to classification also appear when using them for segmentation. The important thing to remember is that these backbones will be major drivers when designing / selecting your segmentation network, and I can’t stress that enough.

Once those features are extracted they are then further processed at different scales. The reason for this is two-fold. Firstly, your model will very likely encounter objects of many different sizes; processing the features at different scales will give the network the capacity to handle those different sizes.

Second, when performing segmentation there is a tradeoff. If you want good classification accuracy, then you’ll definitely want to process those high level features from later in the network since they are more discriminative and contain more useful semantic information. On the other hand, if you only process those deep features, you won’t get good localisation because of the low resolution!

The recent state-of-the-art methods have all followed the above structure of feature extraction followed by multi-scale processing. As such, many are quite easy to implement and train end-to-end. Your selection of which one to use will depend on your need for accuracy vs speed/memory, as all have been trying to come up with new methods of addressing this tradeoff, while maintaining efficiency.

In the following walkthrough of the state-of-the-art I’m going to focus on the latest methods, since these will be the most useful to the most readers after understanding the basic structure above. We will walkthrough in rough chronological order, which also roughly corresponds to the advancing of the state-of-the-art.

The FRRN model is a very clear example of the multi-scale processing technique. It accomplishes this using 2 separate streams: the residual stream and the pooling stream.

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

5 Keys to Getting Your Big Data Transformation Back on Track

24 May, 2017

We’re almost a month into the Major League Baseball season, and every year there’s at least one fan base that …

Read more

10 Reasons Sustainability Needs To Be Part Of Your Digital Transformation Strategy

11 Nov, 2020

The pandemic is uncovering gaps and weaknesses in supply chains that require organizations to digitally transform themselves quickly starting with …

Read more

Pushing the limits of automation in business processes

7 Jan, 2018

Last summer saw a grim landmark in the history of artificial intelligence: the first death caused by an autonomous vehicle. …

Read more

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.