Computer vision

From Virtual Reality and Augmented Reality Wiki
This page is a stub, please expand it if you have more information.
Computer vision is an important aspect of augmented reality. It allows AR devices to analyze and understand real-world environments.

Goals and Challenges for Computer Vision

Object recognition is so trivial for humans that we often take for granted the complexities involved in a proper implementation of computer vision. A computer-based machine has to process and understand a large number of different scenarios under unpredictable conditions. It must also handle dynamic scenes, estimate the camera position, track the position and orientation of objects, cope with different scales of the same object, and, most importantly, comprehend the scene as a whole. Thus, the main goal of computer vision is to replicate the capabilities of the human visual system (HVS) by reconstructing and interpreting a 3D scene from its 2D projection.

Approaches for Implementation of Computer Vision

Main Approach

Reconstructing a 3D scene from 2D images is an example of an ill-posed inverse problem: calculating, from a set of observations, the causal factors that produced them. Currently, the most commonly used approaches operate at the lowest levels of object recognition. This means that software algorithms blindly look for basic features such as corners, edges, contours, and motion cues, without understanding the visual scene as a whole. After the initial capture of the scene, the position and orientation of objects must be established relative to the viewer's own position. The augmented imagery is then carefully and precisely superimposed on the real scene. Commonly used headsets are already able to gather all of the necessary positional and movement data with the ease and speed required for real-time use.
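To make the "blindly look for basic features like corners" idea concrete, here is a minimal, illustrative sketch of Harris-style corner detection using only NumPy. It is not the algorithm any particular AR headset uses; it simply shows a local measure of how strongly intensity changes in two directions at once, with no scene understanding involved.

```python
import numpy as np

def harris_corners(img, k=0.05, threshold=0.1):
    """Low-level corner detection: score each pixel by how strongly
    intensity varies in two directions at once (Harris response), and
    return pixels whose response exceeds `threshold` * max response."""
    # Image gradients via central differences.
    gy, gx = np.gradient(img.astype(float))
    h, w = img.shape
    R = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            # Sums of gradient products over a 3x3 neighbourhood.
            sl = (slice(i - 1, i + 2), slice(j - 1, j + 2))
            sxx = (gx[sl] ** 2).sum()
            syy = (gy[sl] ** 2).sum()
            sxy = (gx[sl] * gy[sl]).sum()
            det = sxx * syy - sxy ** 2
            trace = sxx + syy
            R[i, j] = det - k * trace ** 2
    return np.argwhere(R > threshold * R.max())

# A white square on black: corners respond strongly, while flat areas
# and straight edges do not.
img = np.zeros((12, 12))
img[3:9, 3:9] = 1.0
corners = harris_corners(img)
```

Every detected pixel clusters around the four corners of the square, which is exactly the kind of sparse, local output these low-level stages feed to later tracking and pose-estimation steps.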

Other Approaches

There are four main methods for implementing computer vision and object detection: salient point and blob detection, scale-space methods, template-matching methods, and edge and boundary detection.

All of these methods look primarily at local features of objects. Even with advanced recognition systems, we are still unable to fully consider the scene in its entirety and capture the big picture the way humans can. To overcome these issues, it is crucial to develop higher-level scene classification systems and frameworks, which are likely to be a focal point of future development.

Salient point and blob detection

Salient point and blob detection methods use so-called salient points: small areas of the image with discriminative characteristics that distinguish them from the rest of the image. A blob is a region of an image in which some properties are constant or approximately constant. Detected blobs are commonly used for peak detection, texture analysis, and recognition.
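A simple way to see blob detection in action is a difference-of-Gaussians style sketch: subtract a heavily smoothed copy of the image from a lightly smoothed one, and the extremum of the difference marks a region whose intensity differs from its surroundings. The version below is illustrative only and uses a crude box blur (a stand-in for the Gaussian smoothing real detectors use):

```python
import numpy as np

def box_blur(img, r=1):
    """Mean over a (2r+1)x(2r+1) window; a crude stand-in for the
    Gaussian smoothing usually used in blob detectors."""
    h, w = img.shape
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            out[i, j] = img[max(0, i - r):i + r + 1,
                            max(0, j - r):j + r + 1].mean()
    return out

def detect_blob(img):
    """Difference-of-Gaussians style detection: the extremum of
    (lightly smoothed - heavily smoothed) marks the blob centre."""
    dog = box_blur(img, 1) - box_blur(img, 3)
    idx = np.unravel_index(np.argmax(np.abs(dog)), dog.shape)
    return tuple(int(v) for v in idx)

# A bright disc on a dark background: a region of near-constant
# intensity that differs from its surroundings, i.e. a blob.
img = np.zeros((15, 15))
yy, xx = np.mgrid[:15, :15]
img[(yy - 7) ** 2 + (xx - 7) ** 2 <= 4] = 1.0
blob_center = detect_blob(img)
```

Here the detector recovers the centre of the disc, since that is where the local intensity deviates most from its neighbourhood average.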

Scale-space methods

Scale-space methods examine different scales of the same image instead of looking at it through a fixed-size window. These methods commonly use a Gaussian filter for smoothing across scales because it does not introduce new structures that could interfere with detection mechanisms.

Template matching methods

Template-matching methods are most commonly used for face detection. They rely on preexisting templates, which are superimposed on a real image and analyzed for similarities. Their biggest limitation is the maximum number of templates that can be acquired, stored, and analyzed.
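The "superimposed and analyzed for similarities" step can be sketched as normalized cross-correlation: slide the template over every position in the image and score how well the window matches. This is one common similarity measure, shown here as an illustrative NumPy implementation rather than the method of any particular system:

```python
import numpy as np

def match_template(image, template):
    """Slide `template` over `image`; score each window by normalized
    cross-correlation (1.0 = perfect match up to brightness/contrast).
    Returns the best-matching (row, col) and its score."""
    th, tw = template.shape
    t = template - template.mean()
    tn = np.linalg.norm(t)
    best_score, best_pos = -2.0, (0, 0)
    for i in range(image.shape[0] - th + 1):
        for j in range(image.shape[1] - tw + 1):
            w = image[i:i + th, j:j + tw]
            wc = w - w.mean()
            denom = np.linalg.norm(wc) * tn
            score = (wc * t).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (i, j)
    return best_pos, best_score

# Embed a small bright cross in a noisy background, then find it again.
rng = np.random.default_rng(0)
image = rng.normal(0, 0.1, (20, 20))
template = np.zeros((5, 5))
template[2, :] = 1.0
template[:, 2] = 1.0
image[7:12, 9:14] += template  # insert the pattern at (7, 9)
pos, score = match_template(image, template)
```

The exhaustive sliding search is also why the template count matters: every stored template multiplies the work done per frame.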

Edge and boundary detection

Edge and boundary detection methods have long been a fundamental tool in image processing. By detecting sharp changes in image brightness, we can pick out distinctive features of objects and their surroundings.
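"Sharp changes in image brightness" are measured with image gradients; a classic example is the Sobel operator, sketched below in plain NumPy as an illustration (real pipelines use optimized convolution routines):

```python
import numpy as np

def sobel_edges(img):
    """Gradient-magnitude edge map: convolve with horizontal and
    vertical Sobel kernels, then combine the two responses."""
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()  # horizontal brightness change
            gy[i, j] = (patch * ky).sum()  # vertical brightness change
    return np.hypot(gx, gy)

# A vertical step edge: left half dark, right half bright.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)
```

The response is zero in the flat regions and peaks along the column where the brightness jumps, which is precisely the object boundary such methods are designed to find.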
