Audio File, Format Conversion, and I/O Utilities Roger Dannenberg 18 June 97 Revised 6 July 97 Revised 7 May 00 with multiple interface support and inner architecture documentation This document describes a set of portable C utilities for digital audio input and output to and from files and audio interfaces. The goals are to be able to read and write sound files in a variety of formats and to play and record audio. This code is intended for use in Nyquist, Aura, Amulet, and other systems, and should be portable to virtually any computer system that supports C and has a file system. Overview: There is basically one interesting data type: snd_type is a pointer to a descriptor for an audio stream, which is either being read from or written to a file or audio interface. The snd_type contains a structure that describes the sample format, sample rate, number of channels, etc. Routines exist to initialize sound transfer (snd_open()), perform transfers (snd_read(), snd_write()) and to finalize a transfer (snd_close()). Other routines allow you to transfer data to/from buffers and to convert formats. Sample rate conversion is not currently supported, but would be a welcome addition. typedef struct { long channels; /* number of channels */ long mode; /* ADPCM, PCM, ULAW, ALAW, FLOAT, UPCM */ long bits; /* bits per sample */ double srate; /* sample rate */ } format_node; typedef struct { short device; /* file, audio, or memory */ short write_flag; /* SND_READ, SND_WRITE, SND_OVERWRITE */ union { struct { char filename[258]; /* file name */ int file; /* OS file number */ long header; /* None, AIFF, IRCAM, NEXT, WAVE */ long byte_offset; /* file offset of first sample */ long end_offset; /* byte_offset of last byte + 1 */ } file; struct { long buffer_max; /* size of buffer memory */ char *buffer; /* memory buffer */ long buffer_len; /* length of data in buffer */ long buffer_pos; /* current location in buffer */ } mem; struct { char devicename[258]; /* (optional) to specify device */ void *descriptor; long protocol; /* SND_REALTIME or SND_COMPUTEAHEAD */ double latency; /* app + os worst case latency (seconds) */ double granularity; /* expected period of app computation (s) */ /* note: pass 0.0 for default latency and granularity */ } audio; } u; format_node format; /* sample format: channels, mode, bits, srate */ } snd_node, *snd_type; The meanings of fields are as follows: device: one of SND_DEVICE_FILE (data to/from file), SND_DEVICE_AUDIO (data to/from audio I/O device), SND_DEVICE_MEM (data to/from in-memory buffer), or SND_DEVICE_NONE (records that snd_open failed). write_flag: one of SND_WRITE (create a file and write to it), SND_READ (read from a file), SND_OVERWRITE (overwrite some samples within a file, leaving the header and other samples untouched). format: contains number of channels, mode, number of bits per sample, and sample rate. mode is SND_MODE_ADPCM (adaptive delta modulation), SND_MODE_PCM (pulse code modulation, i.e. simple linear encoding), SND_MODE_ULAW (Mu-Law), SND_MODE_ALAW (A-Law), SND_MODE_FLOAT (float), or SND_MODE_UPCM (unsigned PCM) These are fields for SND_DEVICE_FILE: filename: string name for file. file: the file number or handle returned by the operating system. header: the type and format of header, one of SND_HEAD_NONE (no header), SND_HEAD_AIFF, SND_HEAD_IRCAM, SND_HEAD_NEXT (Sun and NeXT format), or SND_HEAD_WAVE. byte_offset: the byte offset in the file. After opening the file, this is the offset of the first sample. This value is updated after each read or write. end_offset: offset of the byte just beyond the last byte of the file. These are fields for SND_DEVICE_AUDIO devicename: string name for device (to select among multiple devices). This may be set to the empty string (devicename[0] = 0) to indicate the default audio device, or it may be set to a name obtained from snd_devicename(). descriptor: a field to store system-dependent data protocol: SND_REALTIME (use this if you are trying to compute ahead by a constant amount, especially for low-latency output) or SND_COMPUTEAHEAD (use this if you want to keep output buffers as full as possible, which will cause greater compute-ahead). latency: (minimum) amount to be kept in buffer(s). This should be at least as great as the longest computational delay of the application PLUS the worst case latency for scheduling the application to run. granularity: expected period of the periodic computation that generates samples. Also, granularity indicates the largest number of samples that will be written with snd_write or read with snd_read by the application. The following fields are for SND_DEVICE_MEM (in-memory data): buffer_max: the size of the buffer memory (in bytes) buffer: the memory buffer address buffer_len: the length of data in the buffer buffer_pos: the current location of the input/output in memory. Routine descriptions: int snd_open(snd_type snd, long *flags); To open a file, fill in fields of a snd_type and call snd_open. If there is header information in the file or device characteristics for the audio interface, fields of snd are filled in. The flags parameter tells which fields were specified by the snd_open process. E.g. if you open a raw file, there is no header info, so the format will be as specified in snd. On the other hand, if you open an AIFF file, the file will specify the sample rate, channels, bits, etc., so all these values will be written into snd, and bits will be set in flags to indicate the information was picked up from the file. Returns SND_SUCCESS iff successful. If not successful, attempts to open a file will place the return code from open() into the u.file.file field. Before calling snd_open, all general fields and fields corresponding to the device (e.g. u.file for SND_DEVICE_FILE) should be set, with the following exceptions: u.file.header (for SND_WRITE), byte_offset, end_offset, descriptor. NOTE: do not call snd_open for SND_DEVICE_MEM, just fill in the fields. u.mem.buffer_len is the write pointer (snd_write() data goes here), and u.mem.buffer_pos is the read pointer (snd_read() data comes from here). NOTE 2: for SND_DEVICE_MEM, you can set write_flag to SND_WRITE, write data into the buffer, then set write_flag to SND_READ and read the buffer. Use snd_reset() before reading the buffer again. int snd_close(snd_type snd); Closes a file or audio device. There is no need to call snd_close for SND_DEVICE_MEMORY, but this is not an error. Returns SND_SUCCESS iff successful. int snd_seek(snd_type snd, double skip); After opening a file for reading or overwriting, you can seek ahead to a specific time point by calling snd_seek. The skip parameter is in seconds. Returns SND_SUCCESS iff successful. int snd_reset(snd_type snd); Resets non-file buffers. If snd has SND_DEVICE_AUDIO, then the sample buffers are flushed. This might be a good idea before reading samples after a long pause that would cause buffers to overflow and contain old data, or before writing samples if you want the samples to play immediately, overriding anything already in the buffers. If snd has SND_DEVICE_MEM and SND_READ, then the buffer read pointer (buffer_pos) is reset to zero. If SND_WRITE is set, then the buffer read pointer (buffer_pos) and write pointer (buffer_len) are reset to zero. If snd has SND_DEVICE_FILE, nothing happens. Returns SND_SUCCESS iff successful. long snd_read(snd_type snd, void *buffer, long length); Read up to length bytes into buffer. Returns the number of bytes actually read. int snd_write(snd_type snd, void *buffer, long length); Writes length bytes from buffer to file or device. Returns number of bytes actually written. long snd_convert(snd_type snd1, void *buffer1, long length1, snd_type snd2, void *buffer2, long length2); To read from a source and write to a sink, you may have to convert formats. This routine provides simple format conversions according to what is specified in snd1 and snd2. The maximum length of buffer2 is passed in length2 and the actual number of bytes written to buffer2 is returned from the call. It is an error if buffer2 is too short for the full conversion of buffer1. long snd_poll(snd_type snd); The standard way to play files is to put something in the event loop that refills an output buffer managed by the device driver. This routine allows you to ask whether there is space to output more samples. If SND_REALTIME is selected, the number returned by snd_poll will grow fairly smoothly at the data rate, i.e. if the data rate is 8KB/s, then the result of snd_poll will increase by 8 bytes per millisecond. On the other hand, if SND_COMPUTEAHEAD is selected, then snd_poll will return zero until a sample buffer becomes available, at which time the value returned will be the entire buffer size. Note: some low-level functions are implemented for conversion from buffers of floats to various representations and from these representations back to floats. See snd.h for their declarations. int snd_flush(snd_type snd); When the device is SND_DEVICE_AUDIO, writes are buffered. After the last write, call snd_flush() to transfer samples from the buffer to the output device. snd_flush() returns immediately, but it only returns SND_SUCCESS after the data has been output to the audio device. Since calling snd_close() will terminate output, the proper way to finish audio output is to call snd_flush() repeatedly until it returns SND_SUCCESS. Then call snd_close() to close the audio device and free buffers. If snd_flush is called on any open snd_type other than a SND_DEVICE_AUDIO opened for output, it returns SUCCESS. Results are undefined if snd_flush is called on a non-open snd_type. long snd_bytes_per_frame(snd_type snd); Calculates the number of bytes in a frame (a frame has one sample per channel; sound files are stored as a sequence of frames). char *snd_mode_to_string(long mode); Returns a string describing the mode (SND_MODE_PCM, etc.). char *snd_devicename(int n); Returns a string describing the n-th audio device. Returns NULL if n is greater or equal to the number of audio devices. Available devices are numbered, starting with the default device at n=0. Before opening an audio device, an application can use this to enumerate all possible devices, select one (e.g. by presenting a list to the user), and then copy the string into the devicename field of the snd_type structure. If the devicename field is the empty string, device 0 will be opened. It is easy to construct a higher-level function to play a file, e.g. aio_node my_player; aio_play_init(&player, "mysound.wav"); playing = TRUE; Then, in the polling loop: if (playing) { if (!aio_play_poll(&player)) playing = FALSE; } Examples: see convert.c for examples of: Printing information about a sound file Converting sound file formats Playing audio from a file Reading audio from audio input To compile convert.c under NT: add all the .c files to a console application project and add these libraries to the Object/library modules list under the Link tab in the Project Settings dialog box: winmm.lib ws2_32.lib Inner architecture description Audio buffer and time management