Skip to main content

RGB to Depth AI

·184 words·1 min·
Author
Mark Ogata
AI and Robotics Researcher at BAIR
Table of Contents

Can computers get depth information from a 2D picture?

As humans, we can make a very good guess on the depth of the pixels in a single photograph. This is most likely a combination of context clues (vanishing lines, partial occlusions, etc…) and past experience (a chair is usually a certain size).

To answer this question I built a machine learning model (a U-Net) to predict the depth image from an rgb image. The training data I got by buying an intel-realsense D435 (a stereo camera than outputs depth and rgb) and recording a bunch of frames around the building.

training

sample outputs

architecture

In the demo, I decided to use a raspberry pi to control a model car to avoid obstacles while going forward using the depth information from the model.

Raspberry pis are very slow at machine learning, so I had to lower the resolution and make the car stop at every input frame, but it works!

Demo of the depth information from the model
#

Demo of the autonomous car indoors
#

Demo of the autonomous car outdoors (outdoors not in training data)
#

code