Spatial Audio Work
in the Multimedia Computing Group

What is Spatial Audio?

Spatial audio is sound that has been processed to give the listener a sense of the location of a virtual sound source and the characteristics of a virtual listening space. True binaural spatial audio, when presented over headphones, appears to come from a particular point in the space outside of the listener's head. This is different from ordinary recorded stereo, whose sound image is generally confined to a line between the ears when heard over headphones.

What Use is Spatial Audio?

Spatial audio can be useful whenever a listener is presented with multiple auditory streams, requires information about the positions of events outside of the field of vision, or would benefit from increased immersion in an environment. Possible applications of spatial audio processing techniques include:

complex supervisory control systems such as telecommunications and air traffic control systems
civil and military aircraft warning systems
teleconferencing and telepresence applications
virtual environments
computer-user interfaces and auditory displays, especially those intended for use by the visually impaired
arts and entertainment, especially video games and music

How Does it Work?

The principle of spatial audio is simple: if the sound waves arriving at your eardrums are identical to those of a real audio source at a particular position, you will perceive that sound as coming from a source at that particular position. Because people only have two ears, you only need two channels of sound to create this effect, and you can present this sound over ordinary headphones.

It is possible to recreate the effects of the ears and upper body on incoming sound waves by applying digital filters to an audio stream; we use a different pair of filters (left and right ear) for every position in the space around the listener. Most current spatial audio systems are based on digital filters derived from recordings made in the ear canals of live human subjects (Wightman & Kistler, 1989).
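The filter-pair idea can be illustrated with a minimal Python sketch. It is not the implementation of any system cited here: the function names are hypothetical, and the two short impulse responses are made-up toy values standing in for measured ear-canal filters. One mono stream is filtered twice, once per ear, to give a two-channel headphone signal.

```python
def fir_filter(signal, impulse_response):
    """Direct-form FIR convolution of a sample list with a filter's impulse response."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def spatialize(mono, filter_left, filter_right):
    """Filter one mono stream with a left/right filter pair; returns (left, right)."""
    return fir_filter(mono, filter_left), fir_filter(mono, filter_right)


# Toy filter pair for a source on the listener's left (assumed values, not
# measured data): the right-ear response arrives two samples later and quieter.
FILTER_LEFT = [1.0, 0.0, 0.0]
FILTER_RIGHT = [0.0, 0.0, 0.5]

left, right = spatialize([1.0, 0.5], FILTER_LEFT, FILTER_RIGHT)
```

A real system would hold one such pair for every source position and switch or interpolate between pairs as the source or the listener's head moves.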

What are the Problems to be Overcome? (i.e., Research Areas)

Cost

Until very recently, the biggest barrier to the widespread use of spatial audio was cost. The first spatial audio systems, built during the First World War, were bulky mechanical structures as big as houses. This remained the case until the late 1980s, when small, fast computers allowed engineers to build electronic spatial audio systems that could operate in real time (Wenzel et al., 1988). The first such commercial system, the Convolvotron, had a market price of $25,000 and was the size of a desktop computer. While this was a great improvement over mechanical systems, spatial audio was still a long way from the consumer market.

In the early 1990s, another technological advance affected spatial audio. The demands of the telecommunications industry led to the development of special computer chips optimized for digital filter applications. Many of these chips, called Digital Signal Processors (DSPs), have been found to be sufficiently fast to produce spatial audio in real time (Burgess, 1992). Now, the manufacturer's cost of a spatial audio system could potentially be the same as the cost of a DSP chip, which was as low as $10 in 1993. We are presently seeking funding for the development of a mask-programmed DSP chip for embedded 3-D sound applications. Contact David Burgess, burgess@cc.gatech.edu, for more information.

Now that hardware costs are diminishing and developers are trying to integrate spatial audio into applications, we face a new set of implementation problems.

Environmental Modeling

Environmental cues, such as early echoes and dense reverberation, are important for a realistic listening experience and are known to improve localization and externalization of audio sources. Unfortunately, the cost of exact environmental modeling is extraordinarily high; the generalized technique for computing the echoes in a listening environment is equivalent to raytracing. However, by borrowing statistical techniques from the field of architectural acoustics, researchers are developing cost-effective methods for approximating the effects of listening environments. A typical approach might involve pre-calculation of reverberation patterns in a room, a technique analogous to luminosity calculation for a graphic renderer (Astheimer, 1993). Others have developed very efficient modeling techniques for special-case environmental geometries (Kendall et al., 1989).
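To make the idea of early echoes concrete, here is a minimal Python sketch for one special-case geometry, an empty box-shaped room. It uses the mirror-image construction for first-order wall reflections; this is an illustrative technique chosen for this page, not the method of any cited paper, and the function names and the speed-of-sound value are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second; assumed value for air near 20 degrees C


def first_order_images(source, room):
    """Mirror the source across each of the six walls of a box-shaped room.

    The room spans 0..room[i] metres along axis i. Each mirror image acts as
    a virtual source that accounts for one single-bounce echo.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]
            images.append(tuple(image))
    return images


def early_echo_delays(source, listener, room):
    """Sorted delays (seconds) of the six first-order echoes after the direct sound."""
    direct = math.dist(source, listener)
    return sorted(
        (math.dist(image, listener) - direct) / SPEED_OF_SOUND
        for image in first_order_images(source, room)
    )


delays = early_echo_delays((1.0, 1.0, 1.0), (3.0, 1.0, 1.0), (4.0, 4.0, 4.0))
```

Higher-order echoes multiply rapidly as images are mirrored again, which is one way to see why exact modeling becomes expensive and why precomputed or statistical reverberation is attractive.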

Headphones

Existing spatial audio systems are designed for use via headphones. This requirement may result in certain limitations on their use. For example, spatial audio may be limited to those applications for which a user is already wearing some sort of headgear, or for which the advantages of spatial sound outweigh the inconvenience of a headset.

Headphones are used because they fix the geometric relationship between the physical sound sources (the headphone drivers) and the ears. Headphones also eliminate crosstalk between the binaural signals. With additional signal processing, we can conceivably compensate for the loss of both properties, allowing spatial audio to be presented over free-field speakers. However, to do so, the spatial audio system must know the listener's position and orientation with respect to the speakers; i.e., even without headphones, we need head tracking.

Without head tracking, we cannot produce true 3-dimensional spatial sound in any practical way. However, multi-speaker surround-sound systems are still possible, and may prove useful in many applications.

Effectiveness

Auditory localization is still not fully understood, and thus developers cannot make effective price/performance decisions in the design of spatial audio systems. Furthermore, when systems do not perform effectively, developers are often at a loss to explain why. There are about half a dozen known auditory localization cues (interaural delay, head shadow, shoulder echoes, pinna effects, etc.), but their relative importance is not known. Furthermore, it is entirely possible that other, presently unknown cues may be of great importance.
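As an illustration of one of these cues, interaural delay can be approximated with the classic spherical-head (Woodworth) formula, delay = (a / c)(θ + sin θ), where θ is the source azimuth measured from straight ahead. The Python sketch below is only a rough model under stated assumptions: the default head radius and speed of sound are typical textbook values, not figures from this page, and the function name is hypothetical.

```python
import math


def interaural_delay(azimuth_deg, head_radius=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference in seconds for a distant source.

    Spherical-head approximation, valid for azimuths from 0 (straight ahead)
    to 90 degrees (directly to one side). Radius in metres, speed in m/s.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return head_radius / speed_of_sound * (theta + math.sin(theta))


delay_at_side = interaural_delay(90.0)  # roughly two thirds of a millisecond
```

A model like this covers only one cue; it says nothing about head shadow, shoulder echoes, or pinna effects, which is part of why the relative weighting of cues remains an open question.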

Fortunately, human factors psychologists around the world are constantly improving our understanding of auditory localization. For example, the Georgia Tech GVU center is currently conducting a series of localization experiments in cooperation with the Georgia Tech Psychology Department and researchers from the Georgia Tech Research Institute. We are hopeful that the knowledge gained from this work will result in more effective, less expensive spatial audio systems.

Software Engineering

There is an additional barrier to the use of spatial sound in large software systems, such as user interfaces and virtual environments: most spatial audio systems provide little in the way of environmental modeling, synchronization, or network support. In fact, most existing spatial audio systems offer no more features than typical audio device drivers, which at best support the simple playback of canned audio.

A control system for spatial audio should at least have the features one would find useful in a general-purpose audio server: network control interfaces, support for multiple clients, support for simultaneous playback of multiple sounds, prioritization of requests for limited resources, and mechanisms for synchronizing client applications with an audio stream (Arons, 1992). A server for spatial audio must also include mechanisms for the choreography of moving sound sources, and, ideally, an acoustic renderer for modeling the listening environment automatically (Burgess & Verlinden, 1993).
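One of these features, prioritization of requests for limited resources, can be sketched in a few lines of Python. This is a hypothetical illustration, not the design of the server described in the cited papers: the class name, method names, and the preemption policy (a new request evicts the lowest-priority playing sound only if it outranks it) are all assumptions.

```python
import heapq
import itertools


class VoiceAllocator:
    """Shares a fixed number of playback channels among competing client requests."""

    def __init__(self, channels):
        self.channels = channels
        self._playing = []  # min-heap of (priority, sequence, client, sound)
        self._sequence = itertools.count()  # tie-breaker: oldest request is evicted first

    def request(self, client, sound, priority):
        """Try to start a sound.

        Returns (granted, preempted), where preempted is the evicted
        (client, sound) pair, or None if nothing was evicted.
        """
        entry = (priority, next(self._sequence), client, sound)
        if len(self._playing) < self.channels:
            heapq.heappush(self._playing, entry)
            return True, None
        if self._playing and priority > self._playing[0][0]:
            _, _, old_client, old_sound = heapq.heapreplace(self._playing, entry)
            return True, (old_client, old_sound)
        return False, None

    def playing(self):
        """Sorted list of (client, sound) pairs currently holding a channel."""
        return sorted((client, sound) for _, _, client, sound in self._playing)
```

With two channels, for example, a high-priority warning sound would displace a low-priority chime rather than be refused, while a second low-priority request would simply be rejected. A real server would also need to release channels when sounds finish and to notify clients of preemption.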

(Click here for more information on audio server work in the Multimedia Computing Group.)

References

Arons, B. (1992) Tools for building a synchronous service to support speech and audio applications, ACM Fifth Annual Symposium on User Interface Software and Technology (UIST '92), Monterey, November 1992.

Astheimer, P. (1993) What you see is what you hear: acoustics applied in virtual worlds, IEEE Symposium on Virtual Reality, San Jose, October, 1993.

Burgess, D.A. (1992) Techniques for low-cost spatial audio, ACM Fifth Annual Symposium on User Interface Software and Technology (UIST '92), Monterey, November 1992.

Burgess, D.A. & Verlinden, J.C. (1993) An architecture for spatial audio servers, VR Systems Fall '93 Conference, New York, November 1993.

Kendall, G.S., Martens, W.L. & Decker, S.L. (1989) Spatial reverberation: discussion and demonstration, Current Directions in Computer Music Research, MIT Press: Cambridge, MA.

Wenzel, E.M., Wightman, F.L. & Foster, S.H. (1988) A virtual display system for conveying three-dimensional acoustic information, Proceedings of the Human Factors Society - 32nd Annual Meeting, 86-90.

Wightman, F.L. & Kistler, D.J. (1989) Headphone simulation of free-field listening I: stimulus synthesis, J. Acoust. Soc. Am., 85, 858-867.

Who's Doing this Work at the GVU Center?

David Burgess, College of Computing (contact)
Elizabeth Mynatt, College of Computing
Mark D. Lee, Department of Psychology

Who Else is Doing this Work Around the World?

For discussion of spatial sound implementation issues, the Multimedia Computing Group operates a mailing list. To subscribe to this list, send mail to majordomo@cc.gatech.edu with the message body

subscribe spatial-audio

Mail sent to spatial-audio@cc.gatech.edu will then be echoed to the list.

We are also working on an archive of FTP-able papers on spatial audio.

A set of head-related transfer function (HRTF) filters is now also available for research use, thanks to the MIT Media Lab. The MIT Media Lab page also provides a good deal of general information about techniques for measuring the HRTF.

Of course, the only real commercial supplier of true 3-D audio gear is Crystal River Engineering.