Frank's Thesis Chapter 2


Chapter 2 OVERVIEW

2.0 Objective
2.1 Facilities
2.2 Methods
2.3 Definition of the coordinate system

2.0 Objective

While visual three-dimensional synthesis for virtual environments has seen a great expenditure of time and effort, audio spatialization is in its infancy. Present systems run crude algorithms on several high-powered DSPs to accomplish simple placement of a single sound source. The goal of this project is to develop refinements to the current state of the art that reduce the computation time demanded of processing systems.

A software implementation of current algorithms will be developed to serve as a baseline. Several considerations for increasing realism in auralization systems will be identified and implemented. A module will then be added which incorporates bandwidth analysis to identify and eliminate unnecessary computation. These refinements demonstrate marked savings in computation time for auralization processing.

2.1 Facilities

The primary development environment for this project was a 486dlc/25 personal computer, operating under DOS 5.0 and Windows 3.1 with 5 MB of RAM and 440 MB of disk storage. A Media Vision Pro Audio Studio 16 (PAS 16) soundcard was used to record and play back Microsoft Type 1 Wave (.WAV) format sound files through JVC HA-D500 headphones. Turtle Beach WAVE for Windows was used extensively for viewing the .WAV files, and MATLAB for Windows was used for algorithm development and graphical analysis. Software development was completed with the GNU C++ compiler version 2.5.7, which offers true 32-bit executables (for speed) and a flat memory model (avoiding DOS's 640 KB memory restriction), as well as enhanced portability across platforms. A Sun workstation was also used at various stages, both for file transfer and for fast execution of tested code segments.

Sound samples used in development were recorded directly to the PAS 16 soundcard from an Alesis D4 drum module. Voice samples were recorded from a pre-recorded CD, again directly to the PAS 16. Sine waves and noise samples were synthesized in software.

2.2 Methods

Available facilities rendered the development of a real-time auralization system unreasonable; instead, a "preprocessing" software system was created. Sound sources were recorded (in mono) using a PC soundcard, and were stored in Microsoft Type 1 .WAV format files. An "auralized" stereo .WAV file was generated by invoking one of the programs developed in Chapters 4 and 5, along with a desired source position. This .WAV file was then ready for playback through the soundcard and headphones.

The provided source code was written with portability in mind. Every effort was made to avoid non-standard C++ functions and conventions. As a result, executables may be generated on a wide variety of machines. The code has been tested on 486-based systems and on a Sun workstation under UNIX. Once an executable has been compiled from the source code, the command for processing a file from the system prompt (independent of computing platform) is

auralize input.wav output.wav [θ] [φ] [r]

where

input.wav
is the name of the mono source file

output.wav
is the name of the stereo output file

[θ]
is the desired azimuth in degrees [default 0]

[φ]
is the desired elevation in degrees [default 0]

[r]
is the desired distance in meters [default 1]
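
As an illustration of the command form described above, the following is a minimal sketch of how two required file names and three optional position values with defaults might be read. It is not taken from the thesis source code: the names AuralizeArgs and parseArgs are hypothetical, and the sketch assumes a C++11 compiler.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical container for the command-line settings. The names are
// illustrative only and do not come from the thesis source code.
struct AuralizeArgs {
    std::string inputPath;        // mono source file
    std::string outputPath;       // stereo output file
    double azimuthDeg   = 0.0;    // degrees, default 0
    double elevationDeg = 0.0;    // degrees, default 0
    double distanceM    = 1.0;    // meters, default 1
};

// Fills `args` from argv. Returns false when the two required file names
// are missing. Optional values that are absent keep their defaults.
bool parseArgs(int argc, const char* const argv[], AuralizeArgs& args) {
    if (argc < 3) return false;
    args.inputPath  = argv[1];
    args.outputPath = argv[2];
    if (argc > 3) args.azimuthDeg   = std::atof(argv[3]);
    if (argc > 4) args.elevationDeg = std::atof(argv[4]);
    if (argc > 5) args.distanceM    = std::atof(argv[5]);
    return true;
}
```

With this sketch, "auralize input.wav output.wav -90" would request a source directly to the left at the default elevation (0 degrees) and distance (1 meter).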

2.3 Definition of the coordinate system

When working with auralization concepts, it is much more intuitive to work in spherical rather than Cartesian coordinates. For clarity, the specific definition of this coordinate system is described here.

Figure 2.1: Diagram of spherical coordinate system
(from Makous & Middlebrooks [24])

All locations (source and ear positions) are referenced to the center of the head, and are given as a triplet of azimuth (θ), elevation (φ), and distance (r).

Azimuth: defined as the deflection from front center (0°) in the horizontal plane, with positive angles to the right. Therefore, 90° is directly to the right and -90° is directly to the left. Positions directly behind the head may be described as either 180° or -180°; the two are functionally equivalent.
Elevation: defined as the deflection from horizontal (0°), with positive values above and negative values below. Therefore, 90° is directly overhead and -90° is directly below. Elevations beyond ±90° are redundant and are not used.
Distance: defined in meters from the center of the head. The reference distance for all level calculations is one meter.
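
To make these conventions concrete, the sketch below converts an (azimuth, elevation, distance) triplet to Cartesian coordinates. The axis assignment (x toward the front, y toward the right, z upward) and the names Cartesian and toCartesian are assumptions made for illustration; the chapter itself does not specify a Cartesian frame.

```cpp
#include <cmath>

// Cartesian position in meters. Assumed axes (not specified in the text):
// x points to the front, y to the right, z upward, origin at head center.
struct Cartesian {
    double x;
    double y;
    double z;
};

const double kPi = 3.14159265358979323846;

// Converts azimuth and elevation (degrees) plus distance (meters) into
// the assumed Cartesian frame.
Cartesian toCartesian(double azimuthDeg, double elevationDeg, double distanceM) {
    const double az = azimuthDeg * kPi / 180.0;
    const double el = elevationDeg * kPi / 180.0;
    Cartesian p;
    p.x = distanceM * std::cos(el) * std::cos(az);  // front component
    p.y = distanceM * std::cos(el) * std::sin(az);  // right component
    p.z = distanceM * std::sin(el);                 // up component
    return p;
}
```

Under these assumptions, an azimuth of 90 degrees at zero elevation lands on the positive y axis (directly right), and azimuths of 180 and -180 degrees map to the same point behind the head, up to floating-point rounding.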

For many of the manipulations involved in auralization, it is necessary to compute the distance between two arbitrary points in space (here, the ear and the source). While the distance formula for three-dimensional Cartesian coordinates is well known, one must be derived for our spherical system. This derivation has the following result:

Given two points in spherical coordinates:
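
One form consistent with the coordinate definitions in this section can be obtained by converting both points to Cartesian coordinates and applying the usual distance formula. The subscripted symbols and the Cartesian axis assignment (x front, y right, z up) are assumptions introduced here for illustration, so the notation may differ from the author's own:

```latex
% Two points: P_1 = (\theta_1, \phi_1, r_1), P_2 = (\theta_2, \phi_2, r_2),
% with azimuth \theta, elevation \phi, distance r.
% Assumed conversion: x = r\cos\phi\cos\theta,\; y = r\cos\phi\sin\theta,\; z = r\sin\phi
\[
d(P_1, P_2) = \sqrt{\, r_1^2 + r_2^2
  - 2 r_1 r_2 \left[ \cos\phi_1 \cos\phi_2 \cos(\theta_1 - \theta_2)
  + \sin\phi_1 \sin\phi_2 \right] }
\]
```

The bracketed term is the cosine of the angle between the two position vectors, so the expression is the law of cosines applied to the triangle formed by the head center and the two points.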

This distance computation is generalized and implemented as a function for flexibility. Within the context of the auralization programs developed here, it is used for measuring the distance from each ear to the source; this distance is then used to compute the ILD and ITD for that ear. For non-static sources, the distance is recomputed each time the source changes location.
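
A generalized distance function of this kind might look like the sketch below. It is not the thesis code: the names SphericalPoint, sphericalDistance and pathDelaySeconds are hypothetical, the formula is the law-of-cosines form that follows from the coordinate definitions in this section, and the 343 m/s speed of sound is an assumed value.

```cpp
#include <algorithm>
#include <cmath>

// A location referenced to the center of the head.
struct SphericalPoint {
    double azimuthDeg;    // positive to the right
    double elevationDeg;  // positive upward
    double distanceM;     // meters from head center
};

// Straight-line distance in meters between two spherical points:
// d^2 = r1^2 + r2^2 - 2*r1*r2*(cos(el1)*cos(el2)*cos(az1 - az2) + sin(el1)*sin(el2))
double sphericalDistance(const SphericalPoint& a, const SphericalPoint& b) {
    const double degToRad = 3.14159265358979323846 / 180.0;
    const double dAz = (a.azimuthDeg - b.azimuthDeg) * degToRad;
    const double elA = a.elevationDeg * degToRad;
    const double elB = b.elevationDeg * degToRad;
    // Cosine of the angle between the two position vectors.
    const double cosAngle = std::cos(elA) * std::cos(elB) * std::cos(dAz)
                          + std::sin(elA) * std::sin(elB);
    const double dSq = a.distanceM * a.distanceM + b.distanceM * b.distanceM
                     - 2.0 * a.distanceM * b.distanceM * cosAngle;
    // Clamp tiny negative values caused by rounding before the square root.
    return std::sqrt(std::max(0.0, dSq));
}

// Propagation delay in seconds over a path of the given length.
// The default speed of sound (343 m/s) is an assumption.
double pathDelaySeconds(double pathM, double speedOfSoundMps = 343.0) {
    return pathM / speedOfSoundMps;
}
```

As a usage example under further assumptions, place the ears at azimuths of 90 and -90 degrees, zero elevation, and a hypothetical radius of 0.09 m. For a source at (90°, 0°, 1 m), the function gives path lengths of 0.91 m to the right ear and 1.09 m to the left ear; the difference of the two delays is roughly 0.5 ms.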

The distance calculation is also used to remove intrinsic ILDs and ITDs from HRTFs recorded using a centro-cranial origin (see Chapter 3).


Frank Filipanits Jr. - franko@alumni.caltech.edu