Introduction
Delay effect design draws on a number of previously explored techniques. Wavetable interpolation will be used to read samples from the delay. Compression and saturation will be used to maintain the gain within the feedback loop. Filtering will be used to remove high or low frequencies from the echoes. A delay starts with a fairly straightforward design, but becomes complex with the integration of these techniques, and gathers its individual character as design decisions are made.
Design
To create a delay, we need a sample buffer to hold audio until the delayed signal is played. Input samples will continually be written to this buffer, and output samples will continually be read from this buffer. To do this, point the input sample to a starting location, write the sample, and move the input pointer one sample over. The output pointer is positioned relative to the input pointer, with the delay time in samples separating them. If the delay time is 1.4 seconds, and the sample rate is 48k, the separation between input and output pointer is 1.4 x 48000 or 67200 samples.

If unlimited storage is available, the input pointer and output pointer could move indefinitely. A large storage space could be used: a 1TB flash drive could run a delay at 48k/32 bit for 30 days before running out of memory. However, much smaller and faster memory is more typically used for the sample buffer. Because of this, it is typical to reuse the memory – after the input pointer reaches the last sample, it resets to the first sample. This is called a circular buffer. A large enough buffer is needed for the longest delay time desired. For a 42 second delay, with a sample rate of 48k, enough memory to hold 42 * 48,000 or 2,016,000 samples is needed. For sample addressing convenience (discussed later), we will use the nearest power of 2. In this case it is 2^21 or 2,097,152 samples.

Implementation
On embedded processors, it is typical to use physical memory for the circular buffer. In that case we could just use an array declaration. For a plugin or app, memory allocation is necessary. The functions malloc (C) or new (C++) would be used to create the sample buffer.
// C example of delay memory allocation float *delayBuffer; long delaySize, delayMask; delaySize = 2097152; delayMask = delaySize - 1; delayBuffer = (float *)malloc(delaySize * sizeof(float));
Pointer arithmetic is used to determine the memory location to write the input sample. After the sample is written, writePointer is moved one sample. In the diagrams above writePointer is incremented. However, it makes our program simpler if writePointer is decremented. This is because readPointer is calculated by adding the delay in samples to writePointer. After decrementing, a bitwise AND is used to maintain the circular buffer (to make sure writePointer is between 0 and delaySize).
// C example of putting sample in delayBuffer, // decrementing and wrapping writePointer *(delayBuffer + writePointer) = inputSample; writePointer--; writePointer &= delayMask;
To complete our echo, the delayed sample should be read out using readPointer. Its position is relative to writePointer: readPointer = writePointer + (delayTime * sampleRate) . As with writePointer, a bitwise AND will keep readPointer between 0 and delaySive.
readPointer = writePointer + (long)(delayTime*sampleRate); readPointer &= delayMask; outputSample = *(delaybuffer+readPointer); *(delaybuffer + writePointer) = inputSample; writePointer--; writePointer &= delayMask;
Note that the delay buffer is read before it is written. The reason for this will become apparent when feedback is added to create multiple echoes.
