RGB to Depth AI

Can computers get depth information from a 2D picture?

As humans, we can make a very good guess at the depth of the pixels in a single photograph. This ability is most likely a combination of context clues (vanishing lines, partial occlusions, etc.) and past experience (a chair is usually a certain size).

To answer this question, I built a machine learning model (a U-Net) that predicts a depth image from an RGB image. For training data, I bought an Intel RealSense D435 (a stereo camera that outputs both depth and RGB) and recorded a bunch of frames around the building.
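The recording step looked roughly like this (a minimal sketch, assuming the pyrealsense2 Python package; the output paths and frame count are placeholders):

```python
# Minimal sketch of recording aligned RGB/depth pairs from a D435,
# assuming pyrealsense2. Paths and frame count are placeholders.
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

# Align depth pixels to the color camera so each RGB pixel has a depth value.
align = rs.align(rs.stream.color)

try:
    for i in range(1000):  # record ~1000 frame pairs
        frames = align.process(pipeline.wait_for_frames())
        depth = np.asanyarray(frames.get_depth_frame().get_data())  # uint16, mm
        color = np.asanyarray(frames.get_color_frame().get_data())  # uint8, BGR
        np.save(f"data/depth_{i:05d}.npy", depth)
        np.save(f"data/color_{i:05d}.npy", color)
finally:
    pipeline.stop()
```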

training
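The training itself was a standard supervised setup: RGB frame in, recorded depth map as the target. Here's a hedged sketch, assuming PyTorch and an L1 loss on the predicted depth; random tensors stand in for the recorded RGB/depth pairs, and a single conv layer stands in for the real U-Net (sketched under architecture below):

```python
# Hedged sketch of the training loop, assuming PyTorch and an L1 loss.
# Random tensors stand in for the recorded data; the conv layer stands
# in for the U-Net.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, kernel_size=3, padding=1)  # stand-in for the U-Net
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

for step in range(100):
    rgb = torch.rand(4, 3, 240, 320)    # batch of RGB frames, scaled to [0, 1]
    depth = torch.rand(4, 1, 240, 320)  # matching depth maps (normalized)
    pred = model(rgb)
    loss = loss_fn(pred, depth)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```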

sample outputs

architecture
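For reference, here's a minimal PyTorch sketch of the U-Net shape: a small encoder, a bottleneck, and a decoder with skip connections. The real model's depth and channel counts may differ, but the structure is the same idea:

```python
# Minimal U-Net sketch: two downsampling stages, a bottleneck, and two
# upsampling stages with skip connections. Channel counts are illustrative.
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = block(3, 32)
        self.enc2 = block(32, 64)
        self.bottleneck = block(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = block(128, 64)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = block(64, 32)
        self.head = nn.Conv2d(32, 1, 1)  # one-channel depth map out

    def forward(self, x):
        e1 = self.enc1(x)                    # full resolution
        e2 = self.enc2(self.pool(e1))        # 1/2 resolution
        b = self.bottleneck(self.pool(e2))   # 1/4 resolution
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)

# Sanity check: a 480x640 RGB frame maps to a 480x640 depth map.
print(UNet()(torch.rand(1, 3, 480, 640)).shape)  # torch.Size([1, 1, 480, 640])
```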

For the demo, I used a Raspberry Pi to control a model car, using the depth predictions from the model to avoid obstacles while driving forward.

Raspberry Pis are very slow at running machine learning models, so I had to lower the input resolution and make the car stop while it processes each frame, but it works!
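The on-car loop looked roughly like this (a hedged sketch: OpenCV for capture, a scripted model file and the motor functions as placeholders, and an illustrative 1 m obstacle threshold):

```python
# Hedged sketch of the on-car loop. `model.pt` is a placeholder path to a
# scripted, trained U-Net; the motor functions are stubs standing in for
# whatever GPIO driver the car uses; thresholds are illustrative.
import cv2
import torch

def drive_forward(): pass  # placeholder motor commands; the real car
def turn_left(): pass      # issues these through GPIO
def turn_right(): pass

model = torch.jit.load("model.pt")  # placeholder: scripted, trained U-Net
model.eval()
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Downscale aggressively: the Pi is too slow for full-resolution frames.
    small = cv2.resize(frame, (160, 120))
    rgb = cv2.cvtColor(small, cv2.COLOR_BGR2RGB)
    x = torch.from_numpy(rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0

    with torch.no_grad():
        depth = model(x)[0, 0]  # predicted depth map, meters (assumed scale)

    # The car sits still while each frame is processed, then acts on the
    # result: if the center of the view is clear, go; otherwise turn toward
    # whichever side looks farther away.
    if depth[:, 60:100].min() > 1.0:
        drive_forward()
    elif depth[:, :80].mean() > depth[:, 80:].mean():
        turn_left()
    else:
        turn_right()
```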

Demo of the depth information from the model

Demo of the autonomous car indoors

Demo of the autonomous car outdoors (outdoor scenes were not in the training data)

code