Sndlib: Audio File, Format Conversion, and I/O Utilities

Roger B. Dannenberg

History

Abstract

This document describes a set of portable C utilities for digital audio input and output to and from files and audio interfaces. The purpose is to read and write sound files in a variety of formats and to play and record audio. This code is intended for use in interactive and general purpose audio systems, and should be portable to virtually any computer system that supports C and has a file system.

Overview

The central data type is snd_type, a pointer to a descriptor for an audio stream that is being read from or written to a file, an audio interface, or memory. The descriptor contains a structure that describes the sample format, sample rate, number of channels, etc.

Routines exist to initialize sound transfer (snd_open()), perform transfers (snd_read(), snd_write()) and to finalize a transfer (snd_close()). Other routines allow you to transfer data to/from buffers and to convert formats. Sample rate conversion is not currently supported, but would be a welcome addition.

typedef struct {
    long channels;	/* number of channels */
    long mode;		/* ADPCM, PCM, ULAW, ALAW, FLOAT, UPCM */
    long bits;		/* bits per sample */
    double srate;	/* sample rate */
} format_node;

#define snd_string_max 258


/* the snd_type structure for applications to use: */
typedef struct snd_struct {
    short device; 	/* file, audio, or memory */
    short write_flag;	/* SND_READ, SND_WRITE, SND_OVERWRITE */
    format_node format;	/* sample format: channels, mode, bits, srate */
    snd_fns_type dictionary;    /* for internal use only */
    union {
      struct {
	char filename[snd_string_max];	/* file name */
	char filetype[snd_string_max];  /* file type if known */ 
	int file;             /* OS file number */
	long header;          /* None, AIFF, IRCAM, NEXT, WAVE */
	long byte_offset;     /* file offset of first sample */
	long end_offset;      /* byte_offset of last byte + 1 */
        long current_offset;  /* current (computed) file offset */
	int swap;         /* flag to swap bytes on input/output */
	/* fields from AIFF sample files: */
	int loop_info;    /* Boolean: is this loop info valid? */
	double native_hz; /* original pitch in hz */
        float gain;       /* gain: scale factor */
        double low_hz;
        double high_hz;
        char low_velocity;
        char high_velocity;
	loop_node sustain_loop;
	loop_node release_loop;
      } file;
      struct {
        char interfacename[snd_string_max]; /* (optional) to specify interface */
	char devicename[snd_string_max]; /* (optional) to specify device */
	void *descriptor;
	long protocol;	/* SND_REALTIME or SND_COMPUTEAHEAD */
	double latency;	/* app + os worst case latency (seconds) */
	double granularity; /* expected period of app computation (s) */
	/* note: pass 0.0 for default latency and granularity */
      } audio;
      struct {
	long buffer_max;    /* size of buffer memory */
	char *buffer;       /* memory buffer */
	long buffer_len;    /* length of data in buffer */
	long buffer_pos;    /* current location in buffer */
      } mem;
    } u;
} snd_node, *snd_type;

The meanings of fields are as follows:

These are additional fields for when device is SND_DEVICE_FILE:

These are additional fields for when device is SND_DEVICE_AUDIO:

The following fields are for when device is SND_DEVICE_MEM (in-memory data):

Routine descriptions

Terminology

A sample is one amplitude measurement from one channel. Ordinarily, a sound file can contain multiple channels. All channels have the same sample rate and length, and samples are treated as occurring simultaneously across all channels. The collection of samples occurring simultaneously is called a frame. Samples are stored in frame order. Within a frame, samples are stored in order of increasing channel number.
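The frame ordering described above can be made concrete with a little index arithmetic. The following is a minimal sketch, not code from the library: the names sample_index() and extract_channel() are hypothetical, and 16-bit samples are assumed only to keep the example short.

```c
#include <stddef.h>

/* Index of one sample inside an interleaved buffer.
 * Frames are stored one after another; within a frame, samples are
 * ordered by increasing channel number. */
size_t sample_index(size_t frame, size_t channel, size_t channels)
{
    return frame * channels + channel;
}

/* Copy one channel out of an interleaved buffer of 16-bit samples.
 * Writes one sample per frame to out and returns the count written. */
size_t extract_channel(const short *interleaved, size_t frames,
                       size_t channels, size_t channel, short *out)
{
    size_t f;
    for (f = 0; f < frames; f++) {
        out[f] = interleaved[sample_index(f, channel, channels)];
    }
    return frames;
}
```

For a stereo buffer, channel 1 of frame 2 is therefore found at index 5.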

All routines described here measure data in units of frames (not samples, not bytes). One exception is snd_seek(), which measures file position in seconds of time, expressed as a double.

int snd_open(snd_type snd, long *flags);

To open a file, fill in fields of a snd_type and call snd_open(). If there is header information in the file or device characteristics for the audio interface, fields of snd are filled in. The flags parameter tells which fields were specified by the snd_open() process. E.g. if you open a raw file, there is no header info, so the format will be as specified in snd. On the other hand, if you open an AIFF file, the file will specify the sample rate, channels, bits, etc., so all these values will be written into snd, and bits will be set in flags to indicate the information was picked up from the file.

Returns SND_SUCCESS iff successful. If an attempt to open a file fails, the (system dependent) return code from open() is placed into the u.file.file field.

Before calling snd_open(), all general fields and fields corresponding to the device (e.g. u.file for SND_DEVICE_FILE) should be set, with the following exceptions: u.file.header (for SND_WRITE), byte_offset, end_offset, descriptor. The field filetype is set but not read by snd_open().

NOTE: For SND_DEVICE_MEM, fill in the u.mem fields directly and call snd_open(), which merely sets the dictionary field with function pointers. The application is responsible for maintaining u.mem: u.mem.buffer_len is the write pointer (snd_write() data goes here), and u.mem.buffer_pos is the read pointer (snd_read() data comes from here).

NOTE 2: for SND_DEVICE_MEM, you can set write_flag to SND_WRITE, write data into the buffer, then set write_flag to SND_READ and read the buffer. Use snd_reset() before reading the buffer again to read from the beginning of the buffer, or simply reset read and write pointers directly to read/write different parts of the buffer.
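The roles of the two pointers can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not the library's implementation: mem_sketch mirrors the u.mem fields, the three functions are hypothetical, and lengths are counted in bytes here purely for simplicity (the library routines count frames).

```c
#include <string.h>

/* Hypothetical stand-in for the u.mem fields of snd_node. */
typedef struct {
    long buffer_max;   /* size of buffer memory */
    char *buffer;      /* memory buffer */
    long buffer_len;   /* length of data in buffer (write pointer) */
    long buffer_pos;   /* current location in buffer (read pointer) */
} mem_sketch;

/* Append up to n bytes at the write pointer; returns bytes stored. */
long mem_sketch_write(mem_sketch *m, const char *src, long n)
{
    long room = m->buffer_max - m->buffer_len;
    if (n > room) n = room;
    if (n <= 0) return 0;
    memcpy(m->buffer + m->buffer_len, src, (size_t) n);
    m->buffer_len += n;
    return n;
}

/* Copy up to n bytes from the read pointer; returns bytes delivered. */
long mem_sketch_read(mem_sketch *m, char *dst, long n)
{
    long avail = m->buffer_len - m->buffer_pos;
    if (n > avail) n = avail;
    if (n <= 0) return 0;
    memcpy(dst, m->buffer + m->buffer_pos, (size_t) n);
    m->buffer_pos += n;
    return n;
}

/* Rewind the read pointer so the same data can be read again. */
void mem_sketch_rewind(mem_sketch *m)
{
    m->buffer_pos = 0;
}
```

Writing never moves buffer_pos and reading never moves buffer_len, which is why the same buffer can be written once and then read repeatedly after a rewind.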

int snd_close(snd_type snd);

Closes a file or audio device. There is no need to call snd_close() for SND_DEVICE_MEM, but doing so is not an error.

Returns SND_SUCCESS iff successful.

int snd_seek(snd_type snd, double when);

After opening a file for reading or overwriting, you can seek ahead to a specific time point by calling snd_seek(). The when parameter is in seconds and indicates seconds of time relative to the beginning of the sound.
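For a file, a time in seconds has to be turned into a frame number and then into a byte position. The sketch below shows one plausible way to do that arithmetic. Both function names are hypothetical, and rounding to the nearest frame is an assumption; this document does not say whether snd_seek() rounds or truncates.

```c
/* Convert a time in seconds to a frame index, rounding to the
 * nearest frame (assumption: the library may truncate instead). */
long seconds_to_frame(double when, double srate)
{
    return (long) (when * srate + 0.5);
}

/* File offset of a frame, given the offset of the first sample
 * (the role played by u.file.byte_offset). */
long frame_to_offset(long frame, long bytes_per_frame, long byte_offset)
{
    return byte_offset + frame * bytes_per_frame;
}
```

With 16-bit stereo data at 44100 Hz and a 44-byte header, half a second corresponds to frame 22050 at file offset 88244.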

Returns SND_SUCCESS iff successful.

int snd_reset(snd_type snd);

Resets non-file buffers. If snd has SND_DEVICE_AUDIO, then the sample buffers are flushed. This might be a good idea before reading samples after a long pause that would cause buffers to overflow and contain old data, or before writing samples if you want the samples to play immediately, overriding anything already in the buffers.

If snd has SND_DEVICE_MEM and SND_READ, then the buffer read pointer (buffer_pos) is reset to zero. If SND_WRITE is set, then the buffer read pointer (buffer_pos) and write pointer (buffer_len) are reset to zero.

If snd has SND_DEVICE_FILE, nothing happens.

Returns SND_SUCCESS iff successful.

long snd_read(snd_type snd, void *buffer, long length);

Read up to length frames into buffer.

Returns the number of frames actually read.

int snd_write(snd_type snd, void *buffer, long length);

Writes length frames from buffer to file or device.

Returns number of frames actually written.

long snd_convert(snd_type snd1, void *buffer1,
                 snd_type snd2, void *buffer2, long length);

To read from a source and write to a sink, you may have to convert formats. This routine provides simple format conversions according to what is specified in snd1 and snd2. The number of frames to convert is given by length, and the number of frames is returned.
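As an illustration of what a simple format conversion involves, here is a sketch that converts interleaved 16-bit PCM to floats and back. It is not the library's conversion code: the function names are hypothetical, and the scale factor of 32768 with clipping on output is a common convention assumed here, not one confirmed by this document.

```c
/* Convert interleaved 16-bit PCM to floats in [-1, 1).
 * length is in frames; returns the number of frames converted. */
long pcm16_to_float_frames(const short *src, float *dst,
                           long frames, long channels)
{
    long i, n = frames * channels;
    for (i = 0; i < n; i++) {
        dst[i] = (float) src[i] / 32768.0f;
    }
    return frames;
}

/* Convert interleaved floats to 16-bit PCM, rounding to the nearest
 * integer and clipping values that fall outside the 16-bit range. */
long float_to_pcm16_frames(const float *src, short *dst,
                           long frames, long channels)
{
    long i, n = frames * channels;
    for (i = 0; i < n; i++) {
        float scaled = src[i] * 32768.0f;
        if (scaled > 32767.0f) {
            dst[i] = 32767;
        } else if (scaled < -32768.0f) {
            dst[i] = -32768;
        } else {
            dst[i] = (short) (scaled < 0 ? scaled - 0.5f : scaled + 0.5f);
        }
    }
    return frames;
}
```

Both functions take the length in frames and loop over frames times channels samples, matching the convention that routines measure data in frames.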

long snd_poll(snd_type snd);

The standard way to play files is to put something in the event loop that refills an output buffer managed by the device driver. This routine allows you to ask whether there is space to output more samples. If SND_REALTIME is selected, the number returned by snd_poll() will grow fairly smoothly at the sample rate, i.e. if the sample rate is 8 kHz, then the result of snd_poll() will increase by 8 per millisecond. On the other hand, if SND_COMPUTEAHEAD is selected, then snd_poll() will return zero until a sample buffer becomes available, at which time the value returned will be the entire buffer size.

In some implementations, with SND_REALTIME, snd_poll() can be used to furnish a time reference that is synchronized to the sample clock. In other words, the number of frames written plus the value returned by snd_poll() increases steadily in steps no larger than granularity.
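The difference between the two protocols can be seen in an idealized model. The functions below are hypothetical and do not call the library. They assume the output buffer was full at time zero, that nothing has been written since, and that the device consumes frames at exactly the sample rate.

```c
/* Idealized SND_REALTIME model: space opens up smoothly,
 * one frame per sample period. */
long poll_realtime_model(double srate, double elapsed_ms)
{
    return (long) (srate * elapsed_ms / 1000.0);
}

/* Idealized SND_COMPUTEAHEAD model: report nothing until a whole
 * buffer has been consumed, then report the entire buffer size. */
long poll_computeahead_model(double srate, double elapsed_ms,
                             long buffer_frames)
{
    long freed = poll_realtime_model(srate, elapsed_ms);
    return freed >= buffer_frames ? buffer_frames : 0;
}
```

At 8 kHz the first model reports 8 frames after one millisecond, while the second reports zero until a 256-frame buffer has drained at 32 milliseconds and then reports 256.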

Note: some low-level functions are implemented for conversion from buffers of floats to various representations and from these representations back to floats. See snd.h for their declarations.

int snd_flush(snd_type snd);

When the device is SND_DEVICE_AUDIO, writes are buffered. After the last write, call snd_flush() to transfer samples from the buffer to the output device. snd_flush() returns immediately, but it only returns SND_SUCCESS after the data has been output to the audio device. Since calling snd_close() will terminate output, the proper way to finish audio output is to call snd_flush() repeatedly until it returns SND_SUCCESS. Then call snd_close() to close the audio device and free buffers.

If snd_flush is called on any open snd_type other than a SND_DEVICE_AUDIO opened for output, it returns SND_SUCCESS. Results are undefined if snd_flush() is called on a non-open snd_type.
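The shutdown sequence for audio output can be sketched with a simulated device. Everything here is a hypothetical stand-in: audio_sketch models a device that drains a fixed number of frames between calls, SKETCH_SUCCESS and SKETCH_FAILURE are placeholder values (the real constants are defined in snd.h), and the retry limit is an addition that keeps the loop from spinning forever.

```c
#define SKETCH_SUCCESS 0      /* placeholder, not the value from snd.h */
#define SKETCH_FAILURE (-1)   /* placeholder, not the value from snd.h */

/* Simulated output device. */
typedef struct {
    long pending_frames;   /* frames still waiting to be played */
    long drain_per_call;   /* frames the device plays between calls */
    int is_open;
} audio_sketch;

/* Stand-in for snd_flush(): returns at once, and reports success
 * only when no buffered frames remain. */
int flush_sketch(audio_sketch *a)
{
    a->pending_frames -= a->drain_per_call;
    if (a->pending_frames < 0) a->pending_frames = 0;
    return a->pending_frames == 0 ? SKETCH_SUCCESS : SKETCH_FAILURE;
}

/* Flush until done, then close. Returns the number of retries
 * needed, or -1 if max_retries was reached (device left open). */
int finish_output_sketch(audio_sketch *a, int max_retries)
{
    int retries = 0;
    while (flush_sketch(a) != SKETCH_SUCCESS) {
        if (++retries >= max_retries) return -1;
        /* a real application would sleep or run its event loop here */
    }
    a->is_open = 0;   /* stands in for snd_close() */
    return retries;
}
```

The important point is the ordering: the close happens only after a flush call has reported success, so no buffered audio is cut off.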

long snd_bytes_per_frame(snd_type snd);

Calculates the number of bytes in a frame (a frame has one sample per channel; sound files are stored as a sequence of frames).
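A plausible version of this calculation is sketched below. It is an assumption, not the library source: format_sketch copies the fields of format_node, and each sample is assumed to occupy a whole number of bytes, which would not hold for packed encodings such as ADPCM.

```c
/* Hypothetical copy of the format_node fields. */
typedef struct {
    long channels;   /* number of channels */
    long mode;       /* encoding, unused in this sketch */
    long bits;       /* bits per sample */
    double srate;    /* sample rate */
} format_sketch;

/* Bytes in one frame, assuming byte-aligned samples. */
long bytes_per_frame_sketch(const format_sketch *f)
{
    long bytes_per_sample = (f->bits + 7) / 8;   /* round up */
    return bytes_per_sample * f->channels;
}
```

Under that assumption, 16-bit stereo gives 4 bytes per frame and 24-bit six-channel audio gives 18.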

char *snd_mode_to_string(long mode);

Returns a string describing the mode (SND_MODE_PCM, etc.).

int snd_device(int n, char *interf, char *device);

Sets strings describing the n-th audio device. interf is set to the interface name and device is set to the device name. Both should be allocated to be at least snd_string_max bytes in length. Returns NULL if n is greater than or equal to the number of audio devices. Available devices are numbered, starting with the default device at n=0. Before opening an audio device, an application can use this to enumerate all possible devices, select one (e.g. by presenting a list to the user), and then copy the strings into the devicename and interfacename fields of the snd_type structure. If the devicename field is the empty string, device 0 will be opened.

Unknown File Types

Normally, the caller does not need to know anything about a file other than its name in order to open and read it. The library figures out the file encoding from header information. Only in the case of headerless files does the caller need to supply format parameters. If the file format is not recognized, snd_open() opens the file according to the supplied format parameters. If the file format is recognized but not supported, snd_open() will return SND_FAILURE, and the flags parameter will indicate what format information was detected. For example, if the number of channels is 4, format.channels is set to 4 and the SND_HEAD_CHANNELS bit is set in flags. If the name of the format is known, e.g. "MP3," but there is no decoder for the format, then the filetype field is set to that string name, and the SND_HEAD_FILETYPE bit is set in flags.

To detect and report an unknown file type, test the result of snd_open() for SND_SUCCESS. If not successful, test flags to see what information is valid. You may wish to report back to the user, especially if the SND_HEAD_FILETYPE bit is set.
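The reporting step might look like the following sketch. It does not call snd_open(): the caller is assumed to pass in the flags, channel count, and filetype string left behind by a failed open. The bit values SKETCH_HEAD_CHANNELS and SKETCH_HEAD_FILETYPE are placeholders for the real constants in snd.h, and report_unsupported() is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Placeholder bit values; the real ones are defined in snd.h. */
#define SKETCH_HEAD_CHANNELS 0x01L
#define SKETCH_HEAD_FILETYPE 0x02L

/* Build a message from whatever header information flags marks as
 * valid. Returns the number of fields included in the message. */
int report_unsupported(long flags, long channels, const char *filetype,
                       char *msg, size_t size)
{
    int fields = 0;
    size_t used;
    if (size == 0) return 0;
    snprintf(msg, size, "unsupported sound file");
    if (flags & SKETCH_HEAD_FILETYPE) {
        used = strlen(msg);
        snprintf(msg + used, size - used, ", type %s", filetype);
        fields++;
    }
    if (flags & SKETCH_HEAD_CHANNELS) {
        used = strlen(msg);
        snprintf(msg + used, size - used, ", %ld channels", channels);
        fields++;
    }
    return fields;
}
```

Only fields whose bit is set in flags are used, since the others may hold whatever the caller put there before the open.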

Examples

See convert.c for examples of:

Compiling the source code

To compile convert.c under Visual C++, add all the .c files to a console application project and add these libraries to the Object/library modules list under the Link tab in the Project Settings dialog box: winmm.lib wsock32.lib. If wsock32.lib does not work, try ws2_32.lib. The purpose of these libraries is to get functions that perform byte swapping in a system-independent way.

Inner architecture description

To modify or extend the Sndlib code, it is important to understand the architecture and design. The main issues are the structure used to obtain portability, and the support for multiple device interfaces within a given system.

Portability

The include file snd.h declares most of the library structures and routines. snd.h includes sndconfig.h, which handles system dependencies.

System-dependent code is selected using conditional compilation. The following compile-time symbols are defined:

sndconfig.h is responsible for defining a number of routines and/or macros, including the macro that selects the system. E.g. under Visual C++, the macro _WIN32 is defined, so sndconfig.h defines WIN32. The other routines and macros to be defined are described in sndconfig.h itself. To avoid too many conditional compilation statements that make code hard to read, sndconfig.h works by conditionally including another .h file. The files are named sndwin32.h, sndlinux.h, sndirix.h, and sndmac.h, and other systems should implement include files in the same manner.

In addition to the Unix, Windows, Macintosh distinction, this library also supports WX, a graphical user interface library. WX provides its own routines for file IO, and WX runs under Windows, Unix, and the Macintosh. This adds some confusion because WX functions cut across the Windows, Unix, Macintosh spectrum for things like file IO, but WX functions do not implement audio, so we still need to distinguish systems. To handle WX, a set of IO functions has been created, e.g. snd_open(), snd_read(), and snd_write(), and these are defined in sndwin32.h etc. only if WX is not defined. If WX is defined, then another include file sndwx.h is included into sndconfig.h to define the IO routines regardless of which system is being compiled for.

Multiple interface support

In the original design, it was assumed that each operating system would provide one and only one audio interface, and this library would provide an abstract layer above that. It turned out that many operating systems offer multiple interfaces, e.g. Windows has both the multimedia interface and DirectSound, and Linux has several competing interfaces. Windows machines also have ASIO and other interface possibilities.

To support multiple interfaces, the library has the call snd_devicename() which returns the name of the nth audio device. Where do these names come from? System-specific parts of the library call a non-system specific function as follows:

/* these types are for internal use: */
typedef int (*snd_reset_fn)(snd_type snd);
typedef long (*snd_poll_fn)(snd_type snd);
typedef long (*snd_read_fn)(snd_type snd, void *buffer, long length);
typedef long (*snd_write_fn)(snd_type snd, void *buffer, long length);
typedef int (*snd_open_fn)(snd_type snd, long *flags);
typedef int (*snd_close_fn)(snd_type snd);
typedef int (*snd_flush_fn)(snd_type snd);

typedef struct {
    snd_reset_fn reset;
    snd_poll_fn poll;
    snd_read_fn read;
    snd_write_fn write;
    snd_open_fn open;
    snd_close_fn close;
    snd_flush_fn flush;
} snd_fns_node, *snd_fns_type;

void snd_add_device(char *devicename, snd_fns_type dictionary);

This is called for each different device or interface. In the general case, there might be several physical devices, each supporting several logical devices (front stereo, rear stereo, and quad), and there might be several different system APIs that access these (MM and DirectSound). The system-specific code provides a string name for each of these and a dictionary of function pointers for each.

When does the system-specific code call snd_add_device()? When either snd_open() or snd_devicename() is called for the first time, a call is made to snd_init(), which is defined in system-specific code. snd_init() is responsible for calling detection and initialization code for each supported device.
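The registration and lazy initialization described above can be sketched as follows. This is an illustrative model, not the library source: every reg_ name is hypothetical, the device names are invented, and the dictionary is held as a plain void pointer to keep the example self-contained.

```c
#include <string.h>

#define REG_MAX_DEVICES 16

typedef struct {
    const char *name;    /* string name supplied by system code */
    void *dictionary;    /* stands in for a snd_fns_type */
} reg_entry;

static reg_entry reg_devices[REG_MAX_DEVICES];
static int reg_device_count = 0;
static int reg_initialized = 0;

/* Stand-in for snd_add_device(): append one entry to the table. */
void reg_add_device(const char *name, void *dictionary)
{
    if (reg_device_count < REG_MAX_DEVICES) {
        reg_devices[reg_device_count].name = name;
        reg_devices[reg_device_count].dictionary = dictionary;
        reg_device_count++;
    }
}

/* Stand-in for the system-specific snd_init(): register every
 * device that detection code finds (two invented ones here). */
static void reg_init(void)
{
    static int mm_fns, ds_fns;   /* placeholders for real dictionaries */
    reg_add_device("MM: default output", &mm_fns);
    reg_add_device("DirectSound: default output", &ds_fns);
}

/* Name of the n-th device, or NULL if n is out of range.
 * The first call triggers initialization. */
const char *reg_devicename(int n)
{
    if (!reg_initialized) {
        reg_initialized = 1;
        reg_init();
    }
    if (n < 0 || n >= reg_device_count) return NULL;
    return reg_devices[n].name;
}
```

Because initialization is deferred to the first lookup, an application that never touches audio devices never pays for device detection.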

The snd_fns_node structure contains function pointers that implement the library functions. A pointer to this structure is found in the snd_type structure which is passed to nearly every library function. These library functions are implemented by making indirect calls through these function pointers.

Note that most of these functions take byte-counts as length parameters rather than frames. This is because standard system calls such as read() and write() use bytes.
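The indirect call and the frame-to-byte conversion can be shown together in a small model. The disp_ types below are hypothetical, cut-down versions of snd_node and snd_fns_node with only a write entry, and disp_null_write() is an invented back end that merely counts bytes.

```c
typedef struct disp_snd *disp_snd_type;

/* Internal write function: length is a byte count. */
typedef long (*disp_write_fn)(disp_snd_type snd, void *buffer, long length);

/* Cut-down dictionary with a single entry. */
typedef struct {
    disp_write_fn write;
} disp_fns_node, *disp_fns_type;

/* Cut-down descriptor. */
struct disp_snd {
    disp_fns_type dictionary;   /* function pointers for this device */
    long bytes_per_frame;
    long bytes_written;         /* bookkeeping for the fake back end */
};

/* Invented back end: accepts every byte and counts it. */
long disp_null_write(disp_snd_type snd, void *buffer, long length)
{
    (void) buffer;
    snd->bytes_written += length;
    return length;
}

disp_fns_node disp_null_fns = { disp_null_write };

/* Public-style write: takes frames, converts to bytes, calls
 * through the dictionary, and reports the result in frames. */
long disp_write(disp_snd_type snd, void *buffer, long frames)
{
    long bytes = frames * snd->bytes_per_frame;
    long done = snd->dictionary->write(snd, buffer, bytes);
    return done / snd->bytes_per_frame;
}
```

Supporting another interface in this model means supplying a different disp_fns_node; disp_write() itself does not change.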