Object Recognition with 3D Time-of-Flight Cameras and Neural Networks

Mark Patrick
Machine vision - the ability of computers to see and recognise the world around them - is becoming increasingly important across a variety of fields, from IoT and manufacturing through to augmented reality.

One key aspect of machine vision is object recognition. Today, two leading technologies are being used to help computers distinguish objects from each other: 3D time-of-flight (ToF) cameras, which rely on multiple LEDs along with near-infrared (NIR) imaging to capture 3D information about a scene, and machine learning techniques, which are a leading method for performing object recognition with artificial intelligence (AI), even from a 2D image.

3D ToF Cameras

To recognise objects in a scene, it is extremely helpful to have a 3D understanding of what is going on. While human beings have the benefit of binocular vision and a brain that fuses what we see into a 3D picture of our surroundings, computers require other methods.

There are several techniques a computer can use to determine 3D information about a scene, but 3D ToF distinguishes itself by being relatively affordable, yet accurate and fast enough to be used in real time.

A ToF camera, such as the Basler Time-of-Flight camera with its 640x480 pixel resolution and 57°x43° field of view, consists of a lens, an integrated light source with multiple emitters, and a sensor. Such a camera can capture depth and intensity information for every pixel in the image.

The way it works is that pulsed or continuous waves of light, usually in the NIR spectrum, are cast onto the scene and the reflected light is observed. With a pulsed light source, the time between the light being emitted and being reflected back to the camera is measured; since the speed of light is known, the distance can be calculated. With a continuous-wave light source the same procedure is performed, but instead of measuring the time between emission and reception, the phase shift between the emitted and reflected light is determined.
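
As a minimal sketch of the two conversions just described, the Python below computes distance from a pulsed round-trip time and from a continuous-wave phase shift. The 20 MHz modulation frequency is an illustrative assumption, not a parameter of any particular camera.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def distance_from_pulse(round_trip_time_s: float) -> float:
    """Pulsed ToF: the light travels to the object and back,
    so the distance is half the round-trip path."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0

def distance_from_phase(phase_shift_rad: float, mod_freq_hz: float = 20e6) -> float:
    """Continuous-wave ToF: the phase shift between emitted and
    reflected light encodes distance, unambiguous up to c / (2 * f)
    (about 7.5 m at the illustrative 20 MHz used here)."""
    return SPEED_OF_LIGHT * phase_shift_rad / (4.0 * math.pi * mod_freq_hz)

# A 20 ns round trip corresponds to roughly 3 m:
print(f"{distance_from_pulse(20e-9):.2f} m")        # ~3.00 m
# A 90-degree phase shift at 20 MHz corresponds to roughly 1.87 m:
print(f"{distance_from_phase(math.pi / 2):.2f} m")  # ~1.87 m
```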

No matter which principle is employed, as long as the light source can illuminate every point in the scene, all of the depth information can be gathered in a single shot. The result is a point field in which depth is recorded for every pixel in the image.
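
To illustrate what such a point field looks like in code, the sketch below back-projects a per-pixel depth map into 3D points using a standard pinhole camera model. The intrinsic parameters (fx, fy, cx, cy) are placeholder values; in a real system they would come from the camera's calibration.

```python
import numpy as np

def depth_map_to_point_cloud(depth_m, fx, fy, cx, cy):
    """Back-project an (H, W) depth map in metres into an (H*W, 3)
    point cloud using the pinhole model. fx/fy are focal lengths in
    pixels; (cx, cy) is the principal point."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.stack([x, y, depth_m], axis=-1).reshape(-1, 3)

# Example with a synthetic 480x640 depth map, 2 m everywhere:
depth = np.full((480, 640), 2.0)
points = depth_map_to_point_cloud(depth, fx=570.0, fy=570.0, cx=320.0, cy=240.0)
print(points.shape)  # (307200, 3)
```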

The ToF method of capturing depth information offers comparatively high resolution, can be accurate to within 1 cm, uses relatively simple algorithms suitable for processing by embedded systems, and is fast enough to support high frame rates. Consequently, ToF cameras have quickly found use cases in logistics, production, robotics and even gaming, where the extremely valuable depth information they add to an image allows objects to be separated from each other, as well as from their background.
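
As one small example of how depth separates objects from their background, the sketch below keeps only the pixels whose depth falls inside an application-specific distance band; the 0.5 m to 1.5 m limits are illustrative assumptions.

```python
import numpy as np

def segment_foreground(depth_m, near_m, far_m):
    """Return a boolean mask of pixels whose depth lies in the
    [near_m, far_m] band. Pixels with depth 0 (no valid light
    return) are excluded."""
    valid = depth_m > 0
    return valid & (depth_m >= near_m) & (depth_m <= far_m)

# Example: isolate objects between 0.5 m and 1.5 m from the camera.
depth = np.random.uniform(0.3, 3.0, size=(480, 640))
mask = segment_foreground(depth, near_m=0.5, far_m=1.5)
print(f"{mask.mean():.0%} of pixels kept")
```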

Machine Learning

ToF cameras are powerful, accurate, and affordable, but they are not suitable for every application scenario. In some situations, such as outdoors, strong external lighting can overpower the image sensor, so ToF cameras cannot be used. Environments with mirrors or other objects that create multiple reflections are also hard for ToF cameras to interpret.

Neural networks are an AI technology inspired by neuroscience. Drawing on research into the way brains process visual information, French computer scientist Yann LeCun pioneered the convolutional neural network algorithms that underpin much of today's machine vision. Applied to machine vision, this AI technique can help recognise objects in situations like these. Neural networks have to be trained on large data sets, but once this is done they are capable of matching or even outperforming humans at specific recognition tasks. Luckily, machine vision users can take advantage of existing databases of classified images to train their AI quickly and easily.
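
As a sketch of this train-on-existing-databases workflow, the example below uses PyTorch and torchvision (one possible toolchain, not one specified here) to take a network pretrained on the ImageNet image database and retrain only its final layer for a new set of object classes. The class count and the dummy batch are placeholders for a real, labelled data set.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # illustrative; set to the number of object types

# Load a network pretrained on ImageNet and freeze its feature layers,
# so only the new classification head is trained.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One training step on a dummy batch (replace with a real DataLoader):
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))
loss = criterion(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```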

After training, a machine vision AI can be installed on a relatively low-power embedded system, where GPUs or FPGAs process the image data in parallel at high speed. For embedded systems, FPGAs often make the most sense: they consume less power, can be more efficient, have a more direct connection to the image data, and can store high-resolution images for real-time processing. To develop a machine vision system like this, it helps to use a development kit such as the Basler Embedded Vision kits. These include camera, lens, cable and processing board, allowing a full machine vision system to be developed without having to source components from multiple suppliers.
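
One common route from a trained network to such GPU or FPGA hardware is to export it to an interchange format that the vendor's toolchain can then compile for the target. The sketch below, again assuming PyTorch, exports a model to ONNX; the file name, input shape and opset version are illustrative.

```python
import torch
from torchvision import models

# Recreate the network architecture; in practice you would load the
# trained weights from the previous step before exporting.
model = models.resnet18(weights=None)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)  # batch of one 224x224 RGB image
torch.onnx.export(model, dummy_input, "object_recognizer.onnx",
                  input_names=["image"], output_names=["scores"],
                  opset_version=13)
```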

Seeing into the Future 

As embedded systems become more powerful, techniques like 3D ToF and AI-driven object recognition will become increasingly accessible. Whereas today object recognition is used mostly by large enterprise and consumer brands, these technologies will democratise it and allow it to be applied to many more applications in the coming years.
