Stereo Vision

Location Tracking and Positioning Technology using Stereo Vision

Stereo Vision captures scenes simultaneously with two cameras to analyze disparity and calculate depth information to recognize objects in 3D. This allows for precise estimation of the distance and location of people, vehicles, and assets, supporting stable positioning and real-time location tracking even in field environments.

What is Stereo Vision?

Stereo Vision is a computer vision technology that uses two cameras to estimate depth in 3D space. The two cameras photograph the same object from different positions, and the disparity between the resulting image pair, that is, the shift in the object's position from one image to the other, is used to calculate depth. The principle is similar to how humans use two eyes to perceive depth. Stereo Vision makes it possible to identify object positions and distances in a 3D environment and is used in fields such as robotics, autonomous vehicles, video games, and image processing. It provides more accurate and practical 3D results than conventional computer vision technologies and plays a crucial role in these applications.

Special Features of RTLS using Stereo Vision

A Real-Time Location System (RTLS) using Stereo Vision analyzes visual information collected through two cameras to track and monitor the location of objects or people in real time. Stereo Vision enables highly accurate location inference, giving the system high precision and reliability. Because location information is updated in real time, it provides critical data for logistics, manufacturing, and construction. Stereo Vision-based RTLS also uses no tags, so there is no need to purchase and maintain tags or signal-generating equipment for the targets being tracked. This reduces system configuration and maintenance costs. Stereo Vision-based RTLS therefore achieves cost reduction and performance enhancement at the same time.

How does positioning using Stereo Vision work?

Depth Estimation

To track the location of objects, we use two images taken from different perspectives to infer depth information. This is similar to how humans perceive depth using two eyes. We utilize Deep Learning technology to recognize depth more accurately than traditional computer vision techniques.
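
The link between disparity and depth can be illustrated with the standard pinhole relation for a rectified stereo pair, Z = f * B / d (focal length f in pixels, baseline B in meters, disparity d in pixels). The sketch below shows only this geometry, not the deep learning model mentioned above. The focal length, baseline, and disparity values are assumptions chosen for the example.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Return depth in meters for a rectified stereo pair (Z = f * B / d)."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative is invalid.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical camera parameters, not taken from any real device.
FOCAL_PX = 800.0     # focal length expressed in pixels
BASELINE_M = 0.125   # distance between the two camera centers in meters

near = depth_from_disparity(40.0, FOCAL_PX, BASELINE_M)  # larger disparity, closer object
far = depth_from_disparity(20.0, FOCAL_PX, BASELINE_M)   # smaller disparity, farther object
```

Halving the disparity doubles the estimated depth, which is why small disparity errors matter more for distant objects.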

Coordinate Calculation

By utilizing computer vision technology to perform object recognition, distance estimation, and angle calculation, the location information of an object can be computed. This enables various applications such as real-time tracking or determining positions.
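
As a rough sketch of the coordinate calculation, a detected object's pixel position and estimated distance can be back-projected into camera-centered 3D coordinates with the pinhole camera model, and a horizontal angle can then be derived from those coordinates. The intrinsic parameters (fx, fy, cx, cy), the pixel position, and the depth value below are assumptions for illustration. The source does not specify the actual computation used.

```python
import math


def pixel_to_camera_xyz(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with known depth into camera coordinates (meters)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m


def bearing_and_range(x, z):
    """Return the horizontal angle in degrees (positive to the right) and the distance in the x-z plane."""
    return math.degrees(math.atan2(x, z)), math.hypot(x, z)


# Hypothetical intrinsics for a 1280x720 image.
FX = FY = 800.0
CX, CY = 640.0, 360.0

# Hypothetical detection: object center at pixel (840, 360), estimated depth 4.0 m.
x, y, z = pixel_to_camera_xyz(840.0, 360.0, 4.0, FX, FY, CX, CY)
angle_deg, distance_m = bearing_and_range(x, z)
```

Converting these camera-centered coordinates into site coordinates would additionally require the camera's mounting position and orientation, which are outside this sketch.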

Core Technologies of ORBRO's Stereo Vision-based RTLS

Original Image Streaming
To increase location tracking accuracy in RTLS, advanced Depth Map inference technology is essential. Humans can perceive distance quickly and accurately from rough outlines, relying on experience and intuition. Depth Map inference, by contrast, depends on fine image details, and the precision of that inference carries directly into location tracking. For this reason, we do not use the lossy compressed video streaming typical of general CCTV, and instead use lossless video to minimize the loss of low-level features. In this way, Stereo Vision-based RTLS minimizes the loss of location information.
By utilizing lossless original images, the surfaces of objects are represented more continuously in the Depth Map.
Figure panels: Original, Encoding

Advanced Depth Estimation
Depth inference from stereo camera images is of great interest in computer vision research. In particular, deep learning methodologies applied over the past decade have shown superior results compared to earlier classical algorithms. However, classical algorithms are still used in many fields because they guarantee adequate accuracy and high real-time performance, which typically means processing at frame rates of 15 FPS or higher.

Competitor Depth Map using Computer Vision
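
For context, one well-known classical approach is block matching: for each pixel in the left image, search along the same row of the right image for the best-matching window. The source does not say which classical algorithms were compared, so the sketch below is only a generic illustration on a single scanline, using the sum of absolute differences (SAD) as the matching cost. The pixel values are made up for the example.

```python
def match_disparity(left_row, right_row, x, half_window=1, max_disparity=8):
    """Return the disparity at column x of left_row that minimizes the SAD cost.

    Candidates whose window falls outside either row are skipped.
    Returns 0 if no candidate is valid.
    """
    best_disparity, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        cost = 0
        for k in range(-half_window, half_window + 1):
            xl = x + k   # column in the left row
            xr = xl - d  # matching column in the right row
            if not (0 <= xl < len(left_row) and 0 <= xr < len(right_row)):
                break    # window leaves the image; skip this candidate
            cost += abs(left_row[xl] - right_row[xr])
        else:
            if cost < best_cost:
                best_disparity, best_cost = d, cost
    return best_disparity


# Illustrative scanlines: the bright feature sits 3 pixels further right in the left row.
right_row = [10, 10, 10, 50, 90, 50, 10, 10, 10, 10, 10, 10]
left_row = [10, 10, 10, 10, 10, 10, 50, 90, 50, 10, 10, 10]

disparity = match_disparity(left_row, right_row, x=7)
```

Window-based matching of this kind is fast, but it has little to work with in flat or occluded regions, where many candidate windows look alike.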

Depth inference using classical computer vision algorithms suffered from distortion in fine details and object outlines. For example, parts such as the top of a hand holding a mouse or a person's head partly hidden by a partition were not inferred. The deep learning-based model we developed overcomes these issues. The model is optimized for use in 'RTLS using Stereo Vision' and ensures real-time performance (>15 FPS) through lightweighting and fine-tuning. Processing speed may drop to approximately 4 FPS when inputs from multiple cameras are processed, but this still provides satisfactory location information considering the movement speed of people on screen.

ORBRO Depth Map using Deep Learning
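
To put the frame rates in perspective, the distance a person covers between two consecutive location updates is simply speed divided by frame rate. The short calculation below assumes a walking speed of 1.4 m/s. This is an illustrative figure, not a value from the source.

```python
def displacement_per_update(speed_m_s: float, fps: float) -> float:
    """Return how far a target moves (meters) between two consecutive updates."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return speed_m_s / fps


WALKING_SPEED_M_S = 1.4  # assumed walking speed, for illustration only

step_at_15_fps = displacement_per_update(WALKING_SPEED_M_S, 15.0)  # under 0.1 m
step_at_4_fps = displacement_per_update(WALKING_SPEED_M_S, 4.0)    # about 0.35 m
```

Under this assumption, a walking person moves roughly a third of a meter between updates at 4 FPS.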

Depth Estimation Fine-tuning Technology
Images taken under natural light and under artificial lighting have different characteristics, and such differences can appear even when light hits only one lens of a stereo camera. In that case, the left and right image sensors use different exposure values, so the left and right images have inconsistent characteristics.

Differences in characteristics between left/right stereo images caused by direct lighting exposure
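
One simple, generic way to reduce such a brightness mismatch before matching is to scale one image so that its mean intensity equals that of the other. The source does not describe how the mismatch is actually handled, so the following is only a hedged sketch of that idea on small grayscale images stored as nested lists. The function names and pixel values are hypothetical.

```python
def mean_intensity(image):
    """Return the mean pixel value of a grayscale image given as a list of rows."""
    total = sum(sum(row) for row in image)
    count = sum(len(row) for row in image)
    return total / count


def match_exposure(reference, target, max_value=255):
    """Scale target so that its mean intensity matches the reference image."""
    target_mean = mean_intensity(target)
    if target_mean == 0:
        raise ValueError("target image is completely black; gain is undefined")
    gain = mean_intensity(reference) / target_mean
    # Clip to the valid range so bright pixels do not overflow.
    return [[min(max_value, round(pixel * gain)) for pixel in row] for row in target]


# Illustrative 2x2 images: the right image is exposed at half the brightness.
left = [[100, 120], [140, 160]]
right = [[50, 60], [70, 80]]

right_compensated = match_exposure(left, right)
```

A single global gain cannot repair saturated highlights or local glare, which is one reason a learned model may be fine-tuned on such image pairs instead.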

In addition to these external environmental factors, internal factors that may occur during the stereo camera manufacturing process must also be considered. These include vertical alignment errors or rotation errors that may occur when mounting image sensors on a board, and magnification differences due to changes in the distance between the lens and the image sensor.
Figure, panels (a) to (d): Distortion patterns by wide-angle lenses and individual deviations between Camera A and B (Green: 2.7 m, Purple: 6.3 m, Red: 8.8 m)

To solve these problems, accurately modeling distortion patterns and correcting them to match actual depth is a critical part of depth inference technology. The correction must account for individual camera uncertainties, such as deviations introduced during lens production or assembly. Without a distortion correction model, deviations of 30% or more are typically observed in depth values inferred from wide-angle stereo cameras. Applying a distortion correction model removes these large deviations and reduces the average depth inference error rate by about 50%. Therefore, every stereo camera is calibrated with our internally developed distortion correction model before it is shipped. This ensures superior depth inference technology and provides customers with the highest performance and reliability.
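
The internal distortion correction model is not described in the source. As a stand-in, the sketch below uses the widely known one-coefficient radial distortion model on normalized image coordinates and inverts it by fixed-point iteration. The coefficient k1 plays the role of a per-camera calibration value, and the number used here is hypothetical.

```python
def distort(x, y, k1):
    """Apply one-coefficient radial distortion to normalized image coordinates."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2
    return x * factor, y * factor


def undistort(xd, yd, k1, iterations=20):
    """Invert distort() by fixed-point iteration, starting from the distorted point."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2
        x, y = xd / factor, yd / factor
    return x, y


K1_CAMERA_A = -0.2  # hypothetical barrel-distortion coefficient for one camera

ideal_point = (0.5, 0.3)
observed_point = distort(ideal_point[0], ideal_point[1], K1_CAMERA_A)
recovered_point = undistort(observed_point[0], observed_point[1], K1_CAMERA_A)
```

Because each camera would carry its own coefficient, a per-unit calibration step of this general kind is consistent with correcting individual deviations, though real wide-angle lenses typically need more terms than this sketch uses.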

Key Advantages of Stereo Vision

Accurate Location Estimation

Using stereo cameras allows for the simultaneous acquisition of two images to accurately identify the location of objects in 3D space. This enables precise estimation of object position and movement.

Scalability

Stereo vision technology is not constrained by the size of the space and can be applied to indoor and outdoor environments of various scales and types. In addition, stereo camera-based RTLS can be extended with additional sensors as needed.

Cost-Effectiveness

Stereo vision technology is relatively inexpensive compared to other location estimation technologies. Furthermore, an RTLS can be built using existing stereo cameras, reducing installation costs.

ORBRO Solutions Powered by Stereo Vision

Precisely connect real-time location data of assets and personnel with Stereo Vision-based RTLS, and explore representative solutions for each site.

Manage vehicle flow in parking lots in real-time to reduce congestion
Public Institution Solution


Utilize AI technology optimized for parking lots to identify vehicle locations and space efficiency in real-time and adjust parking flow.

Detect pedestrian movement in real-time to prevent intersection accidents
Public Institution Solution


Identify pedestrians around crosswalks with AI-based detection technology and automatically warn vehicles of dangerous situations.

Automatically count steel asset inventory and analyze flow
Manufacturing Solution


Automatically record hourly inventory of coils and plates with AI and visually analyze asset flow.

Detect dangerous behavior in real time to respond immediately to school safety incidents
School Solution

Detect signs of student violence or falls in real-time through AI video analysis and send immediate notifications to administrators.
