Facial tracking

From Virtual Reality, Augmented Reality Wiki


Figure 1. Face detection. (Image: Mathworks.com)
Figure 2. Features detection. (Image: Mathworks.com)
Figure 3. Virtual try-on. (Image: ulsee.com)

Facial tracking is a computer vision technology that obtains data from still images and video sequences by tracking certain facial features in real-time. While the technology has long been used in motion capture with facial markers, it is now widely implemented in markerless form in augmented reality (AR) apps. For example, face tracking can detect head poses and facial expressions and pass that information to an application that applies specific face filters. A camera is used to map and track facial landmarks in real-time. [1] [2]

The face is one of the most important elements in human communication, and it has long been a subject of computer vision research. Viable face tracking requires precise head pose estimation and detection of specific facial features such as the eyebrows, the corners of the eyes, and the lips. Research in this field has produced face tracking software that correctly identifies the points of interest, analyzes their structure and motion, and mitigates problems such as illumination changes or occlusion that can interfere with the tracking process. Tracking accuracy directly affects the ability to recognize subjects in video. [3] [4] [5]

Facial tracking is not the same as face recognition. While face recognition builds on face tracking technology, facial tracking itself does not analyze or archive the identity of the person being tracked; it only detects and tracks facial movements with a camera. [1]

Facial tracking phases

A simple face tracking system can be divided into three stages: first, face detection; second, identification of facial features; and third, tracking of the face. [6]
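As a rough illustration, the three stages above can be wired together as a detect-then-track loop. The stage functions here are hypothetical placeholders (not a real detector, feature locator, or tracker); a working system would plug in implementations such as the ones discussed below.

```python
# Sketch of the three-stage face tracking pipeline: detect a face,
# locate its features, then track them frame to frame.
# All three stage functions are illustrative placeholders.

def detect_face(frame):
    """Stage 1: return a face bounding box (x, y, w, h), or None."""
    return (40, 30, 64, 64)  # placeholder result

def locate_features(frame, face_box):
    """Stage 2: find landmarks (eyes, mouth) inside the face box."""
    x, y, w, h = face_box
    return {"left_eye": (x + w // 4, y + h // 3),
            "right_eye": (x + 3 * w // 4, y + h // 3),
            "mouth": (x + w // 2, y + 3 * h // 4)}

def track(prev_features, frame):
    """Stage 3: update landmark positions for the next frame."""
    return prev_features  # placeholder: assumes no motion

def process(frames):
    face = landmarks = None
    for frame in frames:
        if landmarks is None:          # detect until a face is found
            face = detect_face(frame)
            if face is not None:
                landmarks = locate_features(frame, face)
        else:                          # afterwards, only track
            landmarks = track(landmarks, frame)
    return landmarks
```

The key design point, also used by the KLT example cited in [6], is that detection is expensive and runs only until a face is found; subsequent frames are handled by the cheaper tracking stage.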

The human face is a complex, multidimensional, meaningful visual stimulus, which makes it challenging to create computational models for its detection. According to Al-Allaf (2014), “The process of face detection in images is complex because of variability present across human faces such as: pose; expression; position and orientation; skin color; presence of glasses or facial hair; differences in camera gain; lighting conditions; and image resolution.” [7]

Face detection (Figure 1) uses learning algorithms to locate human faces in input images or video sequences. The algorithm distinguishes face regions from non-face background, and feature extraction (Figure 2) then locates relevant features such as the eyes, mouth, nose and eyebrows within the detected face. Eye pupils are also used in some infrared (IR) eye tracking techniques, which offer a stable way to detect the pupils and the face location. [3] [7]
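The IR pupil idea can be sketched in a few lines, under the assumption (an illustration, not the cited systems' exact method) that with on-axis IR illumination the pupil appears as the brightest compact region of the eye image, so its position is simply the centroid of the brightest pixels:

```python
import numpy as np

# Toy bright-pupil detector for IR eye images: threshold near the image
# maximum and take the centroid of the surviving pixels.

def find_pupil(ir_image, frac=0.9):
    """Return the (row, col) centroid of pixels near the maximum."""
    thresh = ir_image.max() * frac
    rows, cols = np.nonzero(ir_image >= thresh)
    if rows.size == 0:
        return None
    return (rows.mean(), cols.mean())

# Synthetic IR frame: dim background plus a bright pupil blob at (40, 60).
frame = np.random.default_rng(0).uniform(0, 50, size=(80, 120))
frame[38:43, 58:63] = 255.0
print(find_pupil(frame))
```

A real IR tracker would additionally reject corneal glints and validate the blob's size and shape, but the centroid-of-bright-pixels step is the core of why IR pupil detection is considered stable.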

Inside the detected frontal or near frontal face, the program tries to find the eye centers and eyebrow inner endpoints. Tian et al. (2003) developed an algorithm that “Searches for two pairs of dark regions which correspond to the eyes and the brows by using certain geometric constraints such as position inside the face, size and symmetry to the facial symmetry axis.” The algorithm uses iterative thresholding to find the dark regions under different lighting conditions. After detecting the positions of the eyes, it predicts the location of the mouth and then proceeds to find the horizontal borders of the line between the lips. [8]
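The iterative-thresholding idea can be illustrated with a toy version (a simplification, not Tian et al.'s exact procedure): keep raising the threshold until the dark-region mask covers roughly the fraction of the face that the eyes and brows are expected to occupy, which adapts the threshold to the lighting of each image.

```python
import numpy as np

# Illustrative iterative thresholding: raise the threshold step by step
# until the "dark pixel" mask reaches a target coverage fraction.

def iterative_threshold(gray, target_frac=0.05, step=1, max_level=255):
    """Return the lowest threshold whose dark mask reaches target_frac."""
    total = gray.size
    for t in range(0, max_level + 1, step):
        mask = gray < t
        if mask.sum() / total >= target_frac:
            return t, mask
    return max_level, gray < max_level

# Synthetic face patch: bright skin (200) with two dark eye/brow regions (20).
face = np.full((60, 100), 200, dtype=np.uint8)
face[20:26, 20:35] = 20   # left eye/brow region
face[20:26, 65:80] = 20   # right eye/brow region
t, mask = iterative_threshold(face, target_frac=0.02)
```

Because the stopping rule is a coverage fraction rather than a fixed gray level, the same code finds the dark regions whether the overall image is bright or dim, which is the point of iterating.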

Much face tracking software is based on the Viola-Jones object detection framework, which provides an algorithm to separate faces from non-faces. After the initial face detection, the software begins to recognize the components of the face to build a general map. The software uses the contrast of light in the live image to locate the eyes, nose, and mouth, in addition to looking for shapes and differences. All the relevant features found are marked (Figure 2). With the face detected and its features extracted, the software is able to track the face. [9]
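The two core building blocks of the Viola-Jones framework can be sketched directly: the integral image, which makes any rectangle sum a constant-time operation, and Haar-like rectangle features, which measure exactly the light/dark contrasts mentioned above (for example, the eye region being darker than the cheeks). The real detector evaluates thousands of such features in a boosted cascade; this shows only the feature arithmetic.

```python
import numpy as np

# Integral image and a two-rectangle Haar-like feature, the primitives
# underlying the Viola-Jones detector.

def integral_image(img):
    """Cumulative sums so any rectangle sum costs four lookups."""
    ii = img.cumsum(axis=0).cumsum(axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))  # zero row/col for clean indexing

def rect_sum(ii, r, c, h, w):
    """Sum of img[r:r+h, c:c+w] in O(1) via the integral image."""
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def two_rect_feature(ii, r, c, h, w):
    """Top half minus bottom half: responds to dark-over-light edges."""
    return rect_sum(ii, r, c, h // 2, w) - rect_sum(ii, r + h // 2, c, h // 2, w)

# Dark band over a light band: the feature responds strongly.
img = np.vstack([np.zeros((4, 8)), np.full((4, 8), 10.0)])
ii = integral_image(img)
print(two_rect_feature(ii, 0, 0, 8, 8))  # strongly negative: edge detected
```

Because every feature costs only a handful of lookups regardless of its size, the detector can scan an image at many scales quickly, which is why the framework remains practical on mobile hardware.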

The Viola-Jones algorithm was proposed in 2001, and it has become more relevant in recent years because, with the advent of mobile augmented reality, mobile hardware is now powerful enough to run this kind of application in real-time. [9]

According to Cao and Liu, “Robustness to various target appearances and scene conditions are the main problems that need to be considered” when trying to achieve facial tracking. Accurate face tracking is impaired by the changing appearance of targets due to their nonrigid structure, 3D motion, interaction with other objects, and differences in illumination. Several methods have been researched to overcome this problem, such as a probability estimation method based on an intensity-normalized color histogram; the Active Appearance Model, which “Computes face shape and texture models in training, and fits them with the query image to locate the face”; and the Incremental Visual Tracker (IVT), which attempts to “Solve these problems using adaptive target appearance models. They represent the target in a low-dimensional subspace which is updated adaptively using the images tracked in the previous frames.” [5] [10]
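The histogram-based probability estimation idea can be sketched as follows (a simplified illustration, not the cited method: a real tracker would climb the resulting probability map with mean-shift or CamShift rather than scanning coarse windows). The target is modeled by a normalized intensity histogram; each pixel of a new frame is then scored by the probability of its intensity under that model, a step known as backprojection.

```python
import numpy as np

# Intensity-histogram target model and backprojection-based relocation.

def intensity_hist(patch, bins=16):
    """Normalized intensity histogram modeling the target's appearance."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    return hist / hist.sum()

def backproject(frame, hist, bins=16):
    """Per-pixel probability of belonging to the target."""
    idx = (frame.astype(int) * bins) // 256
    return hist[idx]

def best_window(prob, win=8):
    """Top-left corner of the win x win window with the most mass."""
    best, best_rc = -1.0, (0, 0)
    for r in range(0, prob.shape[0] - win + 1, win):
        for c in range(0, prob.shape[1] - win + 1, win):
            m = prob[r:r + win, c:c + win].sum()
            if m > best:
                best, best_rc = m, (r, c)
    return best_rc

frame0 = np.zeros((32, 32), dtype=np.uint8)
frame0[4:12, 4:12] = 180                    # target patch in first frame
hist = intensity_hist(frame0[4:12, 4:12])
frame1 = np.zeros((32, 32), dtype=np.uint8)
frame1[16:24, 16:24] = 180                  # target has moved
print(best_window(backproject(frame1, hist)))
```

The appeal of histogram models is that they are largely invariant to the target's pose and nonrigid deformation, at the cost of confusing the target with similarly colored background, which is exactly the robustness trade-off Cao and Liu describe.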

Non-rigid face tracking refers to locating specific landmarks of interest in an image, such as the nose tip, corners of the eyes, or outline of the lips. Rigid head pose tracking attempts to estimate the location and orientation of the head. There is also a third approach, combined rigid and non-rigid face tracking, which integrates head pose estimation with feature point tracking. [4]
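As an illustration of the rigid part, one standard building block for recovering head rotation and translation from tracked 3D landmarks is rigid alignment against a neutral reference model via the Kabsch (SVD) method. The cited papers use richer models than this, and the landmark coordinates below are invented for the example.

```python
import numpy as np

# Kabsch/SVD rigid alignment: recover the rotation R and translation t
# that best map reference landmarks onto observed ones.

def rigid_pose(model, observed):
    """Return R, t such that observed ≈ model @ R.T + t."""
    mc, oc = model.mean(axis=0), observed.mean(axis=0)
    H = (model - mc).T @ (observed - oc)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = oc - R @ mc
    return R, t

# Made-up reference landmarks (nose tip, eye corners, chin).
model = np.array([[0.0, 0.0, 1.0], [-1.0, 1.0, 0.0],
                  [1.0, 1.0, 0.0], [0.0, -2.0, 0.0]])
yaw = np.deg2rad(20)                           # head turned 20 degrees
R_true = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(yaw), 0.0, np.cos(yaw)]])
observed = model @ R_true.T + np.array([0.1, 0.0, 0.5])
R, t = rigid_pose(model, observed)
```

Given noiseless landmarks, the recovered R and t match the true pose exactly; with noisy tracked landmarks the same formula gives the least-squares optimal rigid pose, which is why it appears (in more elaborate forms) inside combined rigid/non-rigid trackers.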

Marker-based and markerless facial tracking

Facial performance capture still relies on markers placed on the face to enable dense and accurate geometric tracking of facial expressions. This kind of facial tracking is used in the film industry to animate a digitized model of the actor’s face or to transfer the motion to a different one. [11] [12]

Markerless facial tracking does not depend on markers to capture the expressions and movements of a person. It relies on specific facial landmarks, and the more facial points it tracks, the more accurately the facial features are depicted. [1]

Application of facial tracking in the mobile app industry

A number of mobile applications that use facial tracking have emerged and become popular. Snapchat popularized face swapping and face augmentation. Indeed, AR has contributed to the adoption of face tracking technologies in the mobile market, and the technology has the potential to do much more than just fun applications. [9]

The photography app Line Camera uses facial tracking through its motion stickers, and Facerig’s mobile app uses the tracking to power its 3D animated avatars. Virtual try-on is another area where face tracking has contributed to the development of apps that let the user try products before buying them (Figure 3). Such applications try to bridge the sizing-and-fitting gap in the e-commerce customer experience. With facial tracking, the AR technology “uses a camera to precisely map and track facial landmarks in real-time, offering customers the ability and ease to try on accessories and looks via mobile device. The screen–whether it be your computer, tablet, or phone–literally acts like a mirror and allows customers to overlay products and styles directly onto their own face. And because the technology captures motion in real-time, it automatically translates the movements so you can see the product on in a variety of angles. It’s fast, simple, efficient, and provides online shoppers a more personalized experience.” [1] [2]


  1. Angela, H. (2016). What is face tracking and how it boosts entertainment apps. Retrieved from https://ulsee.com/en/blog/9-news/75-what-is-face-tracking-and-how-it-boosts-entertainment-apps
  2. Angela, H. (2016). Is virtual try-on the future of e-commerce? Retrieved from https://ulsee.com/en/blog/9-news/78-is-virtual-try-on-the-future-of-e-commerce
  3. Gu, H., Ji, Q. and Zhu, Z. (2002). Active facial tracking for fatigue detection. Proceedings of the Sixth IEEE Workshop: 137-142
  4. Baltrušaitis, T., Robinson, P. and Morency, L.P. (2012). 3D constrained local model for rigid and non-rigid facial tracking. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  5. Kim, M., Kumar, S., Pavlovic, V. and Rowley, H. (2008). Face tracking and recognition with visual constraints in real-world videos. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  6. MathWorks. Face detection and tracking using the KLT algorithm. Retrieved from https://www.mathworks.com/help/vision/examples/face-detection-and-tracking-using-the-klt-algorithm.html?s_tid=gn_loc_drop
  7. Al-Allaf, O.N. (2014). Review of face detection systems based artificial neural networks algorithms. The International Journal of Multimedia & Its Applications, 6(1)
  8. Tian, Y., Brown, L., Hampapur, A., Pankanti, S., Senior, A. and Bolle, R. Real world real-time automatic recognition of facial expression. IEEE Workshop on Performance Evaluation of Tracking and Surveillance (PETS)
  9. Dachis, A. (2016). How face tracking & augmentation technology works. Retrieved from https://augmented.reality.news/news/face-tracking-augmentation-technology-works-0172574/
  10. Cao, Q. and Liu, R. Real-time face tracking and replacement. Retrieved from http://web.stanford.edu/class/cs231m/projects/final-report-cao-liu.pdf
  11. Weise, T., Li, H., Gool, L.V. and Pauly, M. (2009). Face/Off: Live facial puppetry. Eurographics/ACM SIGGRAPH Symposium on Computer Animation
  12. Failes, I. (2017). From performance capture to creature: How the apes were created in ‘War for the Planet of the Apes.’ Retrieved from http://www.cartoonbrew.com/vfx/performance-capture-creature-apes-created-war-planet-apes-152357.html