Stereo Vision System

By: Satyarth Praveen and Abhinav Rai


The Stereo Vision System is inspired by the biological setup of a pair of eyes.

The intention is to estimate the depth of objects in a scene by taking two images of the scene from two different locations in the 3D world. This causes the objects in the scene to occupy different locations in image space. The disparity between the two locations of the same object in the two image frames is inversely proportional to the distance of the object from the camera setup.

The relationship between them is governed by the following equation:

$$z = \frac{f \times B}{d}$$
z = distance of the object from the camera
f = focal length of the camera
B = baseline distance between the two camera units
d = disparity between the two corresponding pixels in the two images
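
As a concrete example, with f = 700 px, B = 0.1 m, and d = 35 px, the formula gives z = (700 × 0.1) / 35 = 2 m. A minimal GPU sketch of this conversion is shown below; the kernel and parameter names are illustrative and not taken from the project's code, and the depth comes out in the same units as the baseline B.

```cuda
#include <cuda_runtime.h>

// Converts a per-pixel disparity map (in pixels) into a depth map using
// z = f * B / d. Pixels with zero or negative disparity are marked invalid
// with a depth of 0. Illustrative sketch, not the project's actual kernel.
__global__ void disparityToDepth(const float* disparity, float* depth,
                                 int width, int height,
                                 float focalLengthPx, float baselineMeters)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y * width + x;
    float d = disparity[idx];
    depth[idx] = (d > 0.0f) ? (focalLengthPx * baselineMeters) / d : 0.0f;
}
```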

The main challenge is to find the corresponding pixels of the same object in both images, which gives us the disparity. Once the disparity is known, we can directly use the above formula to obtain a pixel-wise distance.

Many techniques for disparity estimation can be found in the literature, but running them in real time on low-powered hardware is the real challenge.

Depending on the correspondence algorithm, the disparity estimate is also prone to a lot of noise, so a suitable post-processing step is almost always necessary to filter out the noisy estimates and improve the disparity output.
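
One common post-processing choice (assumed here for illustration; the project's exact filters are not specified above) is a small median filter over the disparity map, which suppresses isolated noisy estimates while largely preserving depth edges:

```cuda
#include <cuda_runtime.h>

// A 3x3 median filter over the disparity map, a common post-processing step
// for suppressing isolated noisy estimates. This is only one possible filter;
// a full pipeline may combine several such steps (e.g. speckle removal,
// left-right consistency checks).
__global__ void medianFilter3x3(const float* in, float* out,
                                int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float window[9];
    int n = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int xx = min(max(x + dx, 0), width - 1);   // clamp at the borders
            int yy = min(max(y + dy, 0), height - 1);
            window[n++] = in[yy * width + xx];
        }
    }
    // Insertion sort of the 9 samples; the median is the middle element.
    for (int i = 1; i < 9; ++i) {
        float v = window[i];
        int j = i - 1;
        while (j >= 0 && window[j] > v) {
            window[j + 1] = window[j];
            --j;
        }
        window[j + 1] = v;
    }
    out[y * width + x] = window[4];
}
```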

Disparity estimation is in itself a heavy algorithm, so to operate in real time the computation is moved to the GPU. Given the limited computational resources of the embedded system, the Stereo Block Matching algorithm was written to run on the GPU using the CUDA parallel-processing APIs. Advanced CUDA programming techniques such as efficient memory storage, coalesced memory accesses, streams, and texture memory were used. A pipeline was designed to efficiently compute the disparity and post-process it, achieving real-time disparity computation.
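
The host-side skeleton below sketches one way such a pipeline can use CUDA streams: while the GPU processes one stereo pair, the next pair is uploaded in a second stream. All buffer handling and kernel names are placeholders, not the project's actual implementation.

```cuda
#include <cuda_runtime.h>

// Illustrative double-buffered pipeline: while the GPU processes stereo pair
// N in one stream, the host uploads pair N+1 in the other. Pinned host memory
// is required for the async copies to overlap with kernel execution.
void runPipeline(int width, int height, int numFrames)
{
    const size_t bytes = (size_t)width * height;    // 8-bit grayscale images

    unsigned char *hLeft, *hRight;                  // pinned host staging buffers
    cudaMallocHost((void**)&hLeft, bytes);          // (a real pipeline would also
    cudaMallocHost((void**)&hRight, bytes);         //  double-buffer these)

    unsigned char *dLeft[2], *dRight[2];
    float *dDisp[2];
    cudaStream_t streams[2];
    for (int i = 0; i < 2; ++i) {
        cudaMalloc((void**)&dLeft[i], bytes);
        cudaMalloc((void**)&dRight[i], bytes);
        cudaMalloc((void**)&dDisp[i], bytes * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    for (int frame = 0; frame < numFrames; ++frame) {
        int s = frame % 2;                          // alternate between the two streams

        // (fill hLeft / hRight from the cameras here)

        cudaMemcpyAsync(dLeft[s],  hLeft,  bytes, cudaMemcpyHostToDevice, streams[s]);
        cudaMemcpyAsync(dRight[s], hRight, bytes, cudaMemcpyHostToDevice, streams[s]);

        // Launch the block-matching and post-processing kernels in streams[s], e.g.
        //   stereoBlockMatchSAD<<<grid, block, 0, streams[s]>>>(...);
        //   medianFilter3x3<<<grid, block, 0, streams[s]>>>(...);
        // then copy the filtered disparity back with another cudaMemcpyAsync
        // in the same stream.
    }
    cudaDeviceSynchronize();

    for (int i = 0; i < 2; ++i) {
        cudaStreamDestroy(streams[i]);
        cudaFree(dLeft[i]); cudaFree(dRight[i]); cudaFree(dDisp[i]);
    }
    cudaFreeHost(hLeft); cudaFreeHost(hRight);
}
```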

Stereo Block Matching is one of the most widely used algorithms for computing stereo disparity. The idea is to take a patch of the image from the left camera and find the most similar patch in the right camera image.
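
A common similarity measure for comparing such patches is the sum of absolute differences (SAD); it is assumed here purely for illustration, since the cost function is not specified above. For a window $W$ centred at pixel $(x, y)$ of the left image $I_L$ and a candidate disparity $d$:

$$C(x, y, d) = \sum_{(i, j) \in W} \left| I_L(x + i,\ y + j) - I_R(x + i - d,\ y + j) \right|$$

$$d^*(x, y) = \arg\min_{0 \le d \le d_{\max}} C(x, y, d)$$

The disparity assigned to each pixel is the candidate d that minimizes this cost.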

For any stereo system to work, the two cameras must have significant overlap of the scene in their fields of view. The two cameras must also be calibrated and the images rectified so as to reduce the search space for finding the matching patch. After rectification, the corresponding patch can be searched for along the same horizontal row (assuming a horizontal stereo rig), as in the sketch below. The difference along the x-axis between the pixel positions of the object in the left and the right image is the disparity for that particular pixel.
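
Putting the pieces together, a minimal (unoptimized) Stereo Block Matching kernel for rectified grayscale images might look like the following. Each thread computes the disparity for one left-image pixel by sliding a window along the same row of the right image and keeping the offset with the lowest SAD cost; the shared-memory and texture-memory optimizations mentioned earlier are deliberately left out to keep the sketch short, and all names are illustrative.

```cuda
#include <cuda_runtime.h>
#include <cfloat>

// Naive Stereo Block Matching over rectified grayscale images: each thread
// handles one left-image pixel, slides a window along the same row of the
// right image, and keeps the disparity with the lowest SAD cost.
__global__ void stereoBlockMatchSAD(const unsigned char* left,
                                    const unsigned char* right,
                                    float* disparity,
                                    int width, int height,
                                    int halfWin, int maxDisparity)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float bestCost = FLT_MAX;
    int bestD = 0;

    for (int d = 0; d <= maxDisparity; ++d) {
        float cost = 0.0f;
        for (int j = -halfWin; j <= halfWin; ++j) {
            for (int i = -halfWin; i <= halfWin; ++i) {
                int xl = min(max(x + i, 0), width - 1);      // left-image sample
                int yl = min(max(y + j, 0), height - 1);
                int xr = min(max(x + i - d, 0), width - 1);  // shifted right-image sample
                cost += fabsf((float)left[yl * width + xl] -
                              (float)right[yl * width + xr]);
            }
        }
        if (cost < bestCost) {
            bestCost = cost;
            bestD = d;
        }
    }
    disparity[y * width + x] = (float)bestD;
}
```

In practice the window size and maximum disparity trade accuracy against runtime, and the redundant global-memory reads of this naive version are exactly what the shared-memory and texture-memory optimizations mentioned above are meant to eliminate.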