VR audio

From Virtual Reality, Augmented Reality Wiki
Revision as of 18:09, 19 September 2017 by Paulo Pacheco (talk | contribs)

This page is a stub; please expand it if you have more information.
See also: Oculus Audio SDK


Introduction

VR audio is a technology that simulates sound in a realistic manner for virtual reality (VR). When properly executed, it increases the user’s immersion and the sense of presence in the virtual environment.

Localization is the process by which the human brain, using input signals from the ears, can precisely pinpoint the position of an object in 3D space based only on auditory cues. This ability is useful in many day-to-day activities, and it can also be exploited to create immersive VR experiences. Indeed, while humans have five senses, only two of these are currently relevant to VR: sight and hearing. Since only these are available for developing an immersive experience, they have to be explored to the fullest by means of high-caliber 3D graphics and truly 3D audio. [1]

Immersion is essential in virtual reality, with the concept of presence being emphasized: the feeling of being physically present in an environment. Vision and sound both contribute to this sensation in VR. Graphically, one way immersion and presence are achieved is through low-latency head tracking, with the VR experience matching the user's movement and field of vision in real time. Head tracking is also a reason virtual reality needs truly 3D audio: people often pinpoint a sound by slightly moving or rotating the head, so positional audio must follow those movements to maintain the illusion of reality. [1] [2]

Maintaining the audio cues that the brain needs to correctly localize sound is still a challenge. The ears pick up audio in three dimensions, and the brain processes multiple cues to spatialize the sound. One of the cues is timing, with the ear closer to the sound source picking up sound waves before the other. Distance is another cue, changing the audio levels. But these cues don't apply to all directions. According to Lalwani (2016), “sounds that emerge from the front or the back are more ambiguous for the brain. In particular, when a sound from the front interacts with the outer ears, head, neck, and shoulders, it gets colored with modifications that help the brain solve the confusion. This interaction creates a response called Head-Related Transfer Function (HRTF), which has now become the linchpin of personalized immersive audio.” A person's HRTFs are unique, since the anatomy of the ears differs from person to person. [2]

Historically, audio has been a vital part of the computer and video gaming experience. It evolved from simple wave generators to FM synthesis, to 8-bit mono samples and 16-bit stereo samples, to today's surround sound systems on modern gaming consoles. However, virtual reality is changing the traditional way sound has been used in computer and gaming experiences. VR brings the experience closer to the user through a head-mounted display (HMD) and headphones, and head tracking changes how audio is implemented, making it interdependent with the user's actions and movements. [3]

With the advent of VR, virtual reality audio has gained more interest. Companies want to implement VR audio solutions that realistically reproduce audio in a virtual environment without being computationally restrictive. The development of PC audio has been more tumultuous than that of graphics, but with the rise of VR, 3D audio is expected to gain traction and prominence. [4] [5]

Importance of VR audio

VR audio is extremely important for increasing the user's sense of presence by making the experience more immersive. VR developers cannot build a virtual experience that engages only the sense of sight and expect to create a truly immersive environment. For the alternate worlds of VR to become real to the human brain, immersive graphics have to be matched by immersive 3D audio that simulates the natural listening experience. When properly implemented, VR audio can solidify a scene, conveying information about where objects are and what type of environment the user is in. Visual and auditory cues amplify each other, and a conflict between the two will break immersion. Indeed, truly 3D audio is vital to the entire VR experience, taking it to a level that could not be achieved by graphics alone. [1] [2] [3] [4] [5] [6]

Evolving VR audio

Les Borsai (VP of Business Development at Dysonics) has made some suggestions to move VR audio technology forward. He focuses mainly on three areas: better VR audio capture, better VR audio editing tools, and better VR audio for games. [6]

Improved VR audio recording means a device that captures true spherical audio for the best reproduction over headphones. This enables the user to hear sounds change relative to head movement, and it is essential for live-captured immersive content, adding a layer of contextual awareness and realism. According to Borsai, “the incorporation of motion restores the natural dynamics of sound, giving your brain a crystal-clear context map that helps you pinpoint and interact with sound sources all around you. These positional audio cues that lock onto the visuals are vital in extending the overall virtual illusion and result in hauntingly lifelike and compelling VR content.” [6]

The second suggestion made by Borsai, better VR audio editing tools, asserts that VR content creators need powerful but easy-to-use tools that encompass all the stages of VR audio production, from raw capture to the finished product. Preferably the solution should be modular and simple, since most content creators do not have the skill or time to focus on audio. Borsai's suggestion of a complete audio stack includes “an 8-channel spherical capture solution for VR, plus post-processing tools that allow content creators to pull apart original audio, placing sounds around a virtual space with customizable 3D spatialization and motion-tracking control.” [6] His final suggestion touches on how significant developments in VR audio will come with the creation of plugins for the major gaming engines, such as Unity or Unreal. Borsai mentions that audio realism is essential to gaming, and that even the most subtle audio cues allow players to interact with the sound sources around them, resulting in greater overall immersion and more natural reaction times. [6]

VR audio and the human auditory system

Humans depend on psychoacoustics and inference in order to locate sound sources within a three-dimensional space, taking into consideration factors like timing, phase, level, and spectral modifications. The main audio cues that humans use to localize sounds are interaural time differences, interaural level differences, and spectral filtering. [3] [7]

Interaural time differences: this cue relates to the difference in a sound wave's time of arrival at the left and right ears. The time difference varies according to the sound's origin relative to the person's head. [7]

Interaural level differences: humans are not able to discern the time of arrival of sound waves at higher frequencies. Instead, the level (volume) difference between the ears is used for frequencies above 1.5 kHz to identify the sound's direction. [7]

Spectral filtering: the outer ears modify a sound's frequency content depending on the direction of the sound. These spectral alterations are used to determine the elevation of a sound source. [7]
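To illustrate the interaural time difference cue, the classic Woodworth spherical-head model estimates the delay between the ears from the source's azimuth. This is a minimal sketch, not code from any cited SDK; the head radius is an assumed average value:

```python
import math

def interaural_time_difference(azimuth_deg, head_radius=0.0875, speed_of_sound=343.0):
    """Approximate ITD in seconds using Woodworth's spherical-head model.

    azimuth_deg: source direction; 0 = straight ahead, 90 = directly to one side.
    head_radius: assumed average human head radius in metres.
    """
    theta = math.radians(azimuth_deg)
    # Path-length difference around a rigid sphere: a * (theta + sin(theta))
    return (head_radius / speed_of_sound) * (theta + math.sin(theta))
```

For a source directly to one side (90 degrees), this model predicts an ITD of roughly 0.65 ms, in line with the commonly quoted maximum of about 0.6 to 0.7 ms for an average head.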

Researchers have been tackling the VR audio problem, trying to measure the individual audio modifications that allow the brain to localize simulated sounds with precision. In VR, the visual setting is predetermined, and the audio is best generated by a rendering engine that attaches sound to objects as they move and interact with the environment. Lalwani (2016) notes that “this object-based audio technique uses software to assign audible cues to things and characters in 3D space.” [2]
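An object-based renderer of this kind must recompute each source's direction relative to the listener's tracked head on every frame. Below is a hypothetical minimal helper (names and conventions are assumptions, not from any cited engine), sketched in 2D with yaw measured clockwise from the forward +y axis:

```python
import math

def head_relative_azimuth(source_xy, listener_xy, listener_yaw_deg):
    """Azimuth of a sound source relative to the listener's facing direction,
    in degrees in [-180, 180); positive values are to the listener's right.

    Assumes a top-down 2D coordinate system; yaw 0 = facing +y, yaw 90 = facing +x.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    world_azimuth = math.degrees(math.atan2(dx, dy))  # 0 = +y, clockwise positive
    azimuth = world_azimuth - listener_yaw_deg
    # Wrap the result into [-180, 180)
    return (azimuth + 180.0) % 360.0 - 180.0
```

Re-evaluating this per frame is what makes a sound appear fixed in the world as the user turns: the head-relative azimuth fed to the spatializer changes in lockstep with head tracking.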

Head-Related Transfer Functions (HRTFs)

The HRTF is the foundation of most current 3D sound spatialization techniques. Spatialization, the ability to reproduce a sound as if positioned at a specific place in a 3D environment, is an essential part of VR audio and a vital aspect of producing a sense of presence. Direction and distance are spatialization's main components. Depending on their direction, sounds are modified differently by the human body and ear geometry, and these effects are the basis of the HRTFs used to localize a sound. [3]

Accurately capturing an HRTF requires an individual with microphones placed in the ears inside an anechoic chamber. Once inside, sounds are played from every direction necessary and recorded by the microphones. Comparing the original sound with the recorded one allows for the computation of the HRTF. To build a usable sample set of HRTFs, a sufficient number of discrete sound directions need to be captured. [3]
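Once an HRTF has been captured, spatializing a sound at playback time amounts to filtering the mono source with the head-related impulse response (HRIR) pair measured for the desired direction. A minimal sketch using direct FIR convolution (production engines typically use FFT-based convolution and interpolate between the discrete measured directions):

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution (a simple stand-in for an FFT convolver)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def spatialize(mono, hrir_left, hrir_right):
    """Render a mono signal binaurally by filtering it with the HRIR pair
    for the desired direction. Returns (left, right) output channels."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)
```

With a toy HRIR pair where the right-ear response is delayed and attenuated relative to the left, the output places the source on the listener's left, which is exactly the ITD and ILD cues described above encoded as filters.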

While custom HRTFs matched to a person's body and ear geometry would be ideal, they are not a practical solution. HRTFs are similar enough from one person to another to allow for a generic reference set that is adequate for most situations, particularly when combined with head tracking. There are several publicly available datasets for HRTF-based spatialization implementations, such as the IRCAM Listen Database, MIT KEMAR, the CIPIC HRTF Database, and the ARI (Acoustics Research Institute) HRTF Database. [3]

While HRTFs help identify a sound’s direction, they do not model the localization of distance. Several factors affect how humans infer the distance to a sound source, which can be simulated with different levels of accuracy and computational cost. These are loudness, initial time delay, direct vs. reverberant sound, motion parallax, and high-frequency attenuation. [3]
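Two of these distance cues, loudness falloff and high-frequency attenuation, are cheap to sketch. The inverse-distance gain model below is a common game-audio convention; the air-absorption constant is a hypothetical illustration value, not measured physics:

```python
import math

def distance_gain(distance, reference=1.0, min_distance=0.1):
    """Inverse-distance loudness falloff: gain halves with each doubling of
    distance beyond the reference distance, clamped so nearby sources
    never exceed unity gain."""
    d = max(distance, min_distance)
    return min(reference / d, 1.0)

def air_absorption_gain(distance, frequency_hz, coeff=1e-5):
    """Very rough high-frequency rolloff with distance; `coeff` is an
    assumed constant chosen for illustration, not a physical measurement."""
    return math.exp(-coeff * frequency_hz * distance)
```

The frequency-dependent term captures why distant sounds seem muffled: at 100 m, a 10 kHz component is attenuated far more than a 1 kHz component, a cue the brain combines with loudness and reverberation to infer distance.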

Microphones

AMBEO VR Mic

Dysonics RondoMic

References

  1. Chase, M. (2016). How VR is resurrecting 3D audio. Retrieved from http://www.pcgamer.com/how-vr-is-resurrecting-3d-audio/
  2. Lalwani, M. (2016). For VR to be truly immersive, it needs convincing sound to match. Retrieved from https://www.engadget.com/2016/01/22/vr-needs-3d-audio/
  3. Oculus. Introduction to virtual reality audio. Retrieved from https://developer.oculus.com/documentation/audiosdk/latest/concepts/book-audio-intro
  4. Lang, B. (2017). Valve launches free Steam Audio SDK beta to give VR apps immersive 3D sound. Retrieved from https://www.roadtovr.com/valve-launches-free-steam-audio-sdk-beta-give-vr-apps-immersive-3d-sound/
  5. Lang, B. (2017). Oculus to talk “Breakthroughs in spatial audio technologies” at Connect Conference. Retrieved from https://www.roadtovr.com/oculus-talk-breakthroughs-spatial-audio-technologies-connect-conference/
  6. Borsai, L. (2016). This is why it’s time for VR audio to shine. Retrieved from https://www.roadtovr.com/this-is-why-its-time-for-vr-audio-to-shine/
  7. Google. Spatial audio. Retrieved from https://developers.google.com/vr/concepts/spatial-audio