Wave input/output in Windows

by W.A. Steer PhD

For the programmer, Windows' Multimedia API functions make it straightforward to record or replay sound samples using your PC's soundcard, and hence open up many opportunities in digital audio analysis and processing. This page shows you how to get started.

Introduction

My Audio Tools and Toys page shows some examples of what is possible once Windows' wave sound I/O has been mastered, and includes an audio-frequency oscilloscope, frequency counter, spectrum analyser, tone generator, etc. In response to emails asking how I did it, I'm writing this page!

Basics

Sound is recorded digitally by sampling the audio signal (waveform) many times per second; commonly 8000, 11025, 22050, 44100, or 48000 times per second. This is described as a sampling rate of 8, 11.025, 22.05, 44.1, or 48 kilohertz (kHz). The higher the sampling rate, the more accurately the sound is defined, and the better the quality. Quality is also affected by the resolution of the digitisation process, usually described as 8-bit or 16-bit. An 8-bit number can have one of 256 values (2 to the power of 8), while a 16-bit number has one of 65536 values (2^16). Clearly more bits give more accuracy, and inherently a greater dynamic range - that is, the difference between the loudest and quietest sounds which can be coded. The dynamic range of the coding doubles (or is said to increase by 6 decibels (dB)) for each bit. In practice, particularly with cheap soundcards, there is commonly some kind of background noise which masks the quieter sounds and reduces the useful dynamic range.
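The 6 dB-per-bit rule comes from expressing the number of available levels as a ratio in decibels: 20 x log10(2^bits). A minimal sketch of that arithmetic follows; the function name dynamicRangeDb is made up for illustration and is not part of any Windows API.

```cpp
#include <cassert>
#include <cmath>

// Theoretical dynamic range, in dB, of linear PCM at a given bit depth.
// Each extra bit doubles the number of levels, which adds
// 20*log10(2), i.e. roughly 6.02 dB.
double dynamicRangeDb(int bits)
{
    assert(bits > 0);
    return 20.0 * std::log10(2.0) * bits;
}
```

Calling dynamicRangeDb(8) gives roughly 48 dB and dynamicRangeDb(16) roughly 96 dB, which are the rounded figures usually quoted for 8-bit and 16-bit audio.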

Examples:

Application               Sampling rate  Resolution  Dynamic range  Channels
Telephone quality speech  8kHz           8-bit       48dB           Mono
CD quality music          44.1kHz        16-bit      96dB           Stereo

On the PC, 8-bit sound samples are always stored as 'unsigned integers', from 0 to 255 (and centred on 128), while 16-bit samples are 'signed integers' from -32768 to +32767 (centred on 0).
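Because the two formats have different zero points, converting between them means shifting the offset as well as rescaling. The following is only a sketch, assuming 128 is taken as the silence level of the unsigned 8-bit range; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Convert one unsigned 8-bit sample (0..255, silence at 128)
// to a signed 16-bit sample (silence at 0).
std::int16_t sample8To16(std::uint8_t s)
{
    return static_cast<std::int16_t>((static_cast<int>(s) - 128) * 256);
}

// Convert a signed 16-bit sample back to unsigned 8-bit,
// discarding the low 8 bits of resolution.
std::uint8_t sample16To8(std::int16_t s)
{
    const int v = (static_cast<int>(s) + 32768) / 256;
    assert(v >= 0 && v <= 255);
    return static_cast<std::uint8_t>(v);
}
```

With this scaling an 8-bit value of 0 maps to -32768, 128 maps to 0, and 255 maps to 32512, so the top of the 16-bit range is not quite reached; that is usually acceptable for simple processing.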

A sound clip is stored as a block of memory consisting of one sample after the next in time-sequence, unless the clip is stereo, when samples are given in the order LR,LR,LR,LR,... In a standard (uncompressed) .WAV file, the sound sample itself is preceded by a header which identifies parameters such as the number of channels (i.e. mono or stereo), the sampling rate, and the bit resolution.
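This page does not go into the header layout, so the following is only a sketch. It assumes the commonly documented "canonical" 44-byte RIFF/WAVE header for uncompressed PCM, with all numbers stored little-endian; the helper names (putLE, putTag, makeWavHeader) are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append 'numBytes' bytes of 'value' in little-endian order.
static void putLE(std::vector<std::uint8_t>& out, std::uint32_t value, int numBytes)
{
    for (int i = 0; i < numBytes; ++i)
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
}

// Append a four-character chunk tag such as "RIFF".
static void putTag(std::vector<std::uint8_t>& out, const char* tag)
{
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>(tag[i]));
}

// Build a 44-byte header for an uncompressed PCM .WAV file.
// 'dataBytes' is the size of the sample data that will follow it.
std::vector<std::uint8_t> makeWavHeader(int channels, int sampleRate,
                                        int bitsPerSample, std::uint32_t dataBytes)
{
    assert(channels > 0 && sampleRate > 0);
    assert(bitsPerSample > 0 && bitsPerSample % 8 == 0);
    const std::uint32_t blockAlign =
        static_cast<std::uint32_t>(channels * bitsPerSample / 8);
    const std::uint32_t rate = static_cast<std::uint32_t>(sampleRate);

    std::vector<std::uint8_t> h;
    putTag(h, "RIFF"); putLE(h, 36 + dataBytes, 4);  // bytes remaining after this field
    putTag(h, "WAVE");
    putTag(h, "fmt "); putLE(h, 16, 4);              // size of the format chunk
    putLE(h, 1, 2);                                  // 1 = PCM (uncompressed)
    putLE(h, static_cast<std::uint32_t>(channels), 2);
    putLE(h, rate, 4);                               // samples per second
    putLE(h, rate * blockAlign, 4);                  // average bytes per second
    putLE(h, blockAlign, 2);                         // bytes per sample frame
    putLE(h, static_cast<std::uint32_t>(bitsPerSample), 2);
    putTag(h, "data"); putLE(h, dataBytes, 4);       // sample data follows
    return h;
}
```

Writing the returned bytes to a file, followed by the raw sample block (interleaved LR,LR,... for stereo), should give a file that most players accept. Real-world files may carry extra chunks that this sketch ignores.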

Programming Windows Wave I/O API functions

My programs were written using Borland's C++ Builder; the examples should work essentially unchanged with any Windows C++ compiler, such as MS Visual C++. If you program using some other language (e.g. Delphi or Visual Basic) then the syntax of the commands and memory allocation will be different (please don't ask me for advice!), though the gist of the procedure will be similar.

Commands you will need include:

waveOutOpen
waveOutPrepareHeader
waveOutWrite
waveOutReset

waveInOpen
waveInPrepareHeader
waveInAddBuffer
waveInStart
waveInStop
waveInReset

I'll introduce the main calls and parameters below, but for a full understanding you really should look up the definitions in the Windows Multimedia Help file supplied with your programming language.

The following example is a bare-bones routine to capture a short sound-sample from the currently-selected input (or "Recording") mix and store it in a memory array.

In this very simple program, the execution of the program will pause (and so will become non-interactive) during the recording.

 #include <windows.h>    // must come before mmsystem.h
 #include <mmsystem.h>   // link with winmm.lib

 const int NUMPTS = 44100 * 10;   // 10 seconds
 int sampleRate = 44100;
 short int waveIn[NUMPTS];   // 'short int' is a 16-bit type; I request 16-bit samples below
                             // for 8-bit capture, you'd use 'unsigned char' or 'BYTE' 8-bit types

 HWAVEIN      hWaveIn;
 WAVEHDR      WaveInHdr;
 MMRESULT result;

 // Specify recording parameters
 WAVEFORMATEX pFormat;
 pFormat.wFormatTag=WAVE_FORMAT_PCM;     // simple, uncompressed format
 pFormat.nChannels=1;                    //  1=mono, 2=stereo
 pFormat.nSamplesPerSec=sampleRate;      // 44100
 pFormat.nAvgBytesPerSec=sampleRate*2;   // = nSamplesPerSec * nChannels * wBitsPerSample/8
 pFormat.nBlockAlign=2;                  // = nChannels * wBitsPerSample/8
 pFormat.wBitsPerSample=16;              //  16 for high quality, 8 for telephone-grade
 pFormat.cbSize=0;

 result = waveInOpen(&hWaveIn, WAVE_MAPPER,&pFormat,
            0L, 0L, WAVE_FORMAT_DIRECT);
 if (result)
 {
  char fault[256];
  waveInGetErrorText(result, fault, 256);
  Application->MessageBox(fault, "Failed to open waveform input device.",
              MB_OK | MB_ICONEXCLAMATION);
  return;
 }

 // Set up and prepare header for input
 WaveInHdr.lpData = (LPSTR)waveIn;
 WaveInHdr.dwBufferLength = NUMPTS*2;
 WaveInHdr.dwBytesRecorded=0;
 WaveInHdr.dwUser = 0L;
 WaveInHdr.dwFlags = 0L;
 WaveInHdr.dwLoops = 0L;
 waveInPrepareHeader(hWaveIn, &WaveInHdr, sizeof(WAVEHDR));

 // Insert a wave input buffer
 result = waveInAddBuffer(hWaveIn, &WaveInHdr, sizeof(WAVEHDR));
 if (result)
 {
  MessageBox(Application->Handle, "Failed to read block from device",
                   NULL, MB_OK | MB_ICONEXCLAMATION);
  return;
 }


 // Commence sampling input
 result = waveInStart(hWaveIn);
 if (result)
 {
  MessageBox(Application->Handle, "Failed to start recording",
                   NULL, MB_OK | MB_ICONEXCLAMATION);
  return;
 }


 // Wait until finished recording; sleep briefly between polls rather than spinning the CPU
 while (waveInUnprepareHeader(hWaveIn, &WaveInHdr, sizeof(WAVEHDR))==WAVERR_STILLPLAYING)
  Sleep(50);

 waveInClose(hWaveIn);
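Once the recording has finished, the array simply holds signed 16-bit numbers which can be processed like any other data. As a small illustration (not part of the original program, and with made-up function names), this sketch finds the peak level of a captured block:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>

// Largest absolute sample value in a block of 16-bit samples (0..32768).
int peakAmplitude(const short* samples, std::size_t count)
{
    assert(samples != nullptr || count == 0);
    int peak = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        const int v = std::abs(static_cast<int>(samples[i]));
        if (v > peak) peak = v;
    }
    return peak;
}

// Express a non-zero peak in dB relative to full scale (32768 = 0 dB).
double peakDbfs(int peak)
{
    assert(peak > 0);
    return 20.0 * std::log10(peak / 32768.0);
}
```

With the recording code above, it could be called as peakAmplitude(waveIn, WaveInHdr.dwBytesRecorded / sizeof(short)), using the byte count that the driver reports in the header.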

I get the impression that Microsoft doesn't recommend capturing too long a recording in one go using this method, though it will work for tens of seconds, maybe minutes.
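One practical limit of the single-buffer method is that the whole recording has to fit in one array. The size needed is easy to estimate; the helper below is a hypothetical illustration of the arithmetic rather than anything from the Windows API.

```cpp
#include <cassert>

// Bytes needed to hold 'seconds' of uncompressed PCM audio.
unsigned long long bufferBytes(int seconds, int sampleRate,
                               int channels, int bitsPerSample)
{
    assert(seconds >= 0 && sampleRate > 0 && channels > 0);
    assert(bitsPerSample > 0 && bitsPerSample % 8 == 0);
    return static_cast<unsigned long long>(seconds) * sampleRate
           * channels * (bitsPerSample / 8);
}
```

For example, bufferBytes(10, 44100, 1, 16) is 882000 bytes, the size of the ten-second mono buffer used above, while one minute of CD-quality stereo, bufferBytes(60, 44100, 2, 16), needs 10584000 bytes.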

The official way of collecting long sound clips is to set up several buffers (at least 2) and use the waveInAddBuffer(...) call to insert new buffers into the queue as old ones become full and are released. This begins to get more complicated and requires you to set up a callback function to handle the buffer-filled message. With this approach you can collect an arbitrarily-long sound clip, or perform some continuous processing such as that performed by my Musical Tuner and Spectrum Analyser applets. Furthermore your application doesn't then appear to 'hang' during the process. I may give some examples of this later.
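As a rough, untested sketch of the multi-buffer idea: this version swaps the callback function for an event handle (CALLBACK_EVENT), which lets the buffers be re-queued from ordinary code instead of from inside a callback. ClipCollector, recordBlocks and the buffer sizes are all invented for the example, and most error checking is omitted. The Windows part is wrapped in #ifdef _WIN32 so that the portable collector can be compiled on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Portable part: joins each filled block onto one growing clip.
struct ClipCollector
{
    std::vector<short> samples;
    void append(const short* block, std::size_t bytesRecorded)
    {
        assert(bytesRecorded % sizeof(short) == 0);
        samples.insert(samples.end(), block, block + bytesRecorded / sizeof(short));
    }
};

#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>   // link with winmm.lib

// Record 'blocksWanted' blocks of 16-bit mono audio using two rotating buffers.
bool recordBlocks(int sampleRate, int blocksWanted, ClipCollector& clip)
{
    const int kNumBuffers = 2;
    const int kBlockSamples = 4410;                // 0.1 s at 44100 Hz
    static short buffers[kNumBuffers][kBlockSamples];

    WAVEFORMATEX fmt = {};
    fmt.wFormatTag = WAVE_FORMAT_PCM;
    fmt.nChannels = 1;
    fmt.nSamplesPerSec = sampleRate;
    fmt.wBitsPerSample = 16;
    fmt.nBlockAlign = fmt.nChannels * fmt.wBitsPerSample / 8;
    fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;

    HANDLE hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);   // auto-reset event
    if (hEvent == NULL) return false;
    HWAVEIN hWaveIn;
    if (waveInOpen(&hWaveIn, WAVE_MAPPER, &fmt, (DWORD_PTR)hEvent, 0,
                   CALLBACK_EVENT) != MMSYSERR_NOERROR)
    {
        CloseHandle(hEvent);
        return false;
    }

    // Prepare every buffer and put it in the queue before starting
    WAVEHDR hdr[kNumBuffers] = {};
    for (int i = 0; i < kNumBuffers; ++i)
    {
        hdr[i].lpData = (LPSTR)buffers[i];
        hdr[i].dwBufferLength = sizeof(buffers[i]);
        waveInPrepareHeader(hWaveIn, &hdr[i], sizeof(WAVEHDR));
        waveInAddBuffer(hWaveIn, &hdr[i], sizeof(WAVEHDR));
    }
    waveInStart(hWaveIn);

    int next = 0, collected = 0;                   // buffers fill in queue order
    while (collected < blocksWanted)
    {
        WaitForSingleObject(hEvent, INFINITE);     // signalled when a buffer is returned
        while (collected < blocksWanted && (hdr[next].dwFlags & WHDR_DONE))
        {
            clip.append(buffers[next], hdr[next].dwBytesRecorded);
            ++collected;
            waveInAddBuffer(hWaveIn, &hdr[next], sizeof(WAVEHDR));   // re-queue it
            next = (next + 1) % kNumBuffers;
        }
    }

    waveInReset(hWaveIn);                          // stop and return queued buffers
    for (int i = 0; i < kNumBuffers; ++i)
        waveInUnprepareHeader(hWaveIn, &hdr[i], sizeof(WAVEHDR));
    waveInClose(hWaveIn);
    CloseHandle(hEvent);
    return true;
}
#endif
```

This loop still blocks the calling thread, so in an interactive program it would need to run in a worker thread. The essential point is only that a buffer is handed back to the queue as soon as its contents have been copied, so the driver always has somewhere to write.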

Waveform sound output can be achieved in a very similar way.
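By way of illustration only, here is an untested sketch of what playback might look like, mirroring the recording example with the waveOut calls listed earlier. The function names makeTone and playMono16 are invented here; the tone generation is portable, while the playback part is wrapped in #ifdef _WIN32 since it needs the Windows headers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Portable part: fill a buffer with a 16-bit sine tone.
// 'amplitude' is a fraction of full scale, 0.0 to 1.0.
std::vector<short> makeTone(double frequencyHz, int sampleRate,
                            double seconds, double amplitude = 0.5)
{
    assert(sampleRate > 0 && seconds >= 0.0);
    assert(amplitude >= 0.0 && amplitude <= 1.0);
    const double twoPi = 6.283185307179586;
    const std::size_t n = static_cast<std::size_t>(seconds * sampleRate);
    std::vector<short> tone(n);
    for (std::size_t i = 0; i < n; ++i)
        tone[i] = static_cast<short>(amplitude * 32767.0 *
                      std::sin(twoPi * frequencyHz * i / sampleRate));
    return tone;
}

#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>   // link with winmm.lib

// Play a block of 16-bit mono samples, blocking until playback has finished.
bool playMono16(std::vector<short>& samples, int sampleRate)
{
    WAVEFORMATEX fmt = {};
    fmt.wFormatTag = WAVE_FORMAT_PCM;
    fmt.nChannels = 1;
    fmt.nSamplesPerSec = sampleRate;
    fmt.wBitsPerSample = 16;
    fmt.nBlockAlign = fmt.nChannels * fmt.wBitsPerSample / 8;
    fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;

    HWAVEOUT hWaveOut;
    if (waveOutOpen(&hWaveOut, WAVE_MAPPER, &fmt, 0, 0, CALLBACK_NULL)
            != MMSYSERR_NOERROR)
        return false;

    WAVEHDR hdr = {};
    hdr.lpData = (LPSTR)samples.data();
    hdr.dwBufferLength = (DWORD)(samples.size() * sizeof(short));
    if (waveOutPrepareHeader(hWaveOut, &hdr, sizeof(WAVEHDR)) != MMSYSERR_NOERROR)
    {
        waveOutClose(hWaveOut);
        return false;
    }

    MMRESULT result = waveOutWrite(hWaveOut, &hdr, sizeof(WAVEHDR));

    // Same waiting trick as the recording example: the header cannot be
    // unprepared while the buffer is still queued for playback.
    while (waveOutUnprepareHeader(hWaveOut, &hdr, sizeof(WAVEHDR))
               == WAVERR_STILLPLAYING)
        Sleep(50);

    waveOutClose(hWaveOut);
    return result == MMSYSERR_NOERROR;
}
#endif
```

On Windows, something like "std::vector<short> t = makeTone(440.0, 44100, 2.0); playMono16(t, 44100);" should then play two seconds of a 440 Hz tone through the default output device.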

to be continued...

Created: April 2002
Last modified: 7 September 2006
Source: http://www.techmind.org/wave/
