@bojanrajkovic
Created February 20, 2018 03:41
A workbook on generating sound

uti: com.xamarin.workbook
platforms: Console

Generating sounds in C#

Sound is all around us—very few of us ever spend time in a totally quiet room. Yet few of us ever consider what sounds are made of. In this workbook, we’ll explore generating some simple tones, including a few different kinds of sound waves: square waves, sine waves, and triangle waves.

Some basics about sound

Sounds are entirely composed of two things: frequency and amplitude.

Frequency determines how high or low a sound is, and is expressed in hertz (Hz), or cycles per second. Frequencies can be arbitrarily low, but the human range of hearing is roughly 20 Hz to 20 kHz (k here being the SI prefix for thousand).

Lower numbers are “lower” sounds—a kick drum produces frequencies from around 20 Hz to 100 Hz. These are your floor-shaking beats.

On the high end, the highest note ever sung by a human belongs to Gloria Brown, who managed what’s known in musical terms as a G10—a G note in the 10th octave of the equal temperament system of tuning. In its most common form, equal temperament divides the octave into twelve equal steps, with each note’s frequency a factor of the twelfth root of two above the previous one—this is known as 12-TET, and we’ll refer to it as such going forward. That equal division means it’s easy to calculate any note’s frequency from any other.

Gloria Brown’s G10 is well off the high end of a standard piano, whose top note is C8—a full two and a half octaves higher. No common frequency table goes that high, so we’re going to have to do our own calculation!

The formula for calculating a 12-TET note’s frequency from a known one is as follows: f_n = f_0 * (2^(1/12))^n. Here, f_0 is the “known” frequency, f_n is the target note’s frequency, and n is the number of half-steps between the two notes.

In this case, we know that there are 12 half steps per octave (in other TET systems, there may be different numbers of steps!). Starting from the standard reference pitch of A4 = 440 Hz, there are 6 whole octaves (72 half steps) up to A10, and then 2 half steps back down to G10, meaning n = 70. We’ll define a function to calculate f_n, and we can even make it somewhat generic to different TET systems.

double CalculateFrequencyOfNote(int numberOfHalfSteps, double baseFrequency = 440.0, int numberOfStepsPerOctave = 12)
{
    // This is the 2^(1/12) base ratio; we raise it to the n-th power below.
    double tetMultiple = Math.Pow(2, 1/(double)numberOfStepsPerOctave);
    // And this is the f_0 multiplication.
    return baseFrequency * Math.Pow(tetMultiple, numberOfHalfSteps);
}

Now, let’s use our defined function to calculate the frequency of that G10.

double g10 = CalculateFrequencyOfNote(numberOfHalfSteps: 70, baseFrequency: 440.0, numberOfStepsPerOctave: 12);

This means that the frequency of the note Brown sang was around 25.1 kHz! This is above the range of human hearing, and in fact, scientists used special equipment to verify that she was able to reach that pitch.
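
As a sanity check, we can test the function against a known value: C8, the top note of the piano, sits 39 half steps above A4, and published frequency tables put it at about 4186 Hz.

double c8 = CalculateFrequencyOfNote(numberOfHalfSteps: 39);
Console.WriteLine($"C8 is approximately {c8:F2} Hz"); // prints ~4186.01 Hz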

Amplitude describes how loud a sound is—it’s usually measured as air pressure in the wave, or as the displacement of air or of a speaker diaphragm. Our common volume measurement, the decibel (dB), is calculated as 10 times the base-10 logarithm of a power ratio—and since power is proportional to amplitude squared, that works out to 20 times the base-10 logarithm of an amplitude ratio. So a volume of 100 dB means there was a 100,000:1 ratio between the largest and smallest absolute amplitudes being compared.
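
To make that relationship concrete, here is a tiny helper—an illustrative addition of my own, not something the rest of this workbook uses—that converts an amplitude ratio to decibels:

double ToDecibels(double amplitudeRatio) => 20 * Math.Log10(amplitudeRatio);

// A 100,000:1 amplitude ratio comes out to 100 dB.
Console.WriteLine(ToDecibels(100000));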

Digital Sound

Now that we’ve covered analog sound, we need to cover digital sound. Digital sound, as a representation of analog sound, is composed of some number of samples per second of the analog signal. This sampling has some interesting mathematical properties—we won’t go over them here due to their complexity, but the most important is the Nyquist-Shannon sampling theorem and its key implication: a signal sampled at R samples per second can faithfully represent frequencies only up to R/2, the so-called Nyquist frequency.

Digital sound is composed of a few basic properties:

  • Sample rate: this is the number of samples we take along our “analog” curve (referred to as a continuous-time signal in the Nyquist-Shannon theorem), expressed as a number per second, or hertz (Hz). For this example, we’ll use 8000. Modern digital music is mostly sampled at 44.1 kHz or 48 kHz, though high-quality digital music can come in at 96 kHz or 192 kHz. Even the lowest of these rates (44.1 kHz) is enough to cover the range of human hearing, since its Nyquist frequency of 22.05 kHz sits just above the 20 kHz ceiling (see the guard sketch after this list).

  • The “bit depth” of the sound—this tells us the dynamic range of the signal (i.e. the total set of amplitudes we can produce). We don’t need a defined constant for it, because it’s implicit in our choice of short as the type of the buffer in which we store samples. A short is 16 bits, giving us 65536 possible amplitude values.

  • The frequency of the generated sound. In this case, we’re going to generate a 1000 Hz tone. This is just slightly lower than a C6, a fairly high note—well into the soprano range. The highest note typically called for in the vocal repertoire is an F6, sung by the Queen of the Night in an aria from Mozart’s The Magic Flute. I highly recommend finding a recording of it—but turn down the volume first.
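
Before generating anything, it’s worth encoding the Nyquist limit as a quick guard. This helper is an illustrative sketch of my own, not part of the original workbook code:

// A tone is only faithfully representable if its frequency is below
// half the sample rate (the Nyquist frequency).
bool IsRepresentable(double frequency, int sampleRate) => frequency < sampleRate / 2.0;

Console.WriteLine(IsRepresentable(1000, 8000));   // true: 1 kHz < 4 kHz
Console.WriteLine(IsRepresentable(25087, 44100)); // false: Brown's G10 would alias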

Now let’s define another function that takes some parameters (our sample rate, frequency, and length) and generates a sine wave from them. Sine waves use the mathematical sine function to generate samples along the frequency curve, and are among the most common ways to generate sound. We’ll revisit different types of waves later in this workbook. The function will return a buffer filled with however many seconds of sound we specify. We don’t want to customize the amplitude right now, so we’ll fix it at a quarter of the maximum. The formula for generating a sample from a sine wave is as follows:

y(t) = A sin(2πft + φ) = A sin(ωt + φ)

where A is the amplitude, f is the frequency, t is the time in seconds (in our case, the sample number divided by the sample rate), and φ is the phase of the wave. We’re going to ignore phase for the time being, so our formula simplifies to the amplitude multiplied by the sine of 2π times the desired frequency times the sample time.

short[] GenerateSound(int sampleRate = 8000, double frequency = 1000, TimeSpan? length = null)
{
    // If no length was given, assume 1 second.
    length = length ?? TimeSpan.FromSeconds(1);

    // Allocate our buffer. Truncate the number of seconds to an integer.
    var buffer = new short[sampleRate * (int)length.Value.TotalSeconds];

    // Constant amplitude
    const double amplitude = 0.25 * short.MaxValue;

    // Now all that's left is to fill the buffer!
    for (int n = 0; n < buffer.Length; n++)
        buffer[n] = (short)(amplitude * Math.Sin((2 * Math.PI * n * frequency) / sampleRate));

    return buffer;
}
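
We’ll revisit other wave shapes in more depth later, but as a preview, here’s a sketch of my own (following the same conventions as GenerateSound) of how the per-sample math changes for square and triangle waves—the square wave keeps only the sign of the sine, and the triangle wave runs the sine through an arcsine to straighten its curves into lines:

// Square wave: +amplitude when the sine is positive, -amplitude when negative.
short SquareSample(int n, double frequency, int sampleRate, double amplitude) =>
    (short)(amplitude * Math.Sign(Math.Sin((2 * Math.PI * n * frequency) / sampleRate)));

// Triangle wave: arcsine of a sine produces a linear ramp up and down;
// the 2/π factor rescales its ±π/2 range back to ±amplitude.
short TriangleSample(int n, double frequency, int sampleRate, double amplitude) =>
    (short)((2 * amplitude / Math.PI) * Math.Asin(Math.Sin((2 * Math.PI * n * frequency) / sampleRate)));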

In order to play our sound, however, we must convert it to some audio container format that our computer can play. WAV is an easy format that all computers will understand, so let’s write some really quick code. I’m not going to explain all the vagaries of the WAV format, but will try to keep this code commented so that if you want to learn the format, you can read and modify this code.

using System.IO;
using System.Linq;
using System.Text;

Stream GenerateWavStream(short[] soundSamples, TimeSpan duration)
{
    var output = new MemoryStream();
    using (var writer = new BinaryWriter(output, Encoding.ASCII, leaveOpen: true)) {
        // We need to define a header size, a format chunk size and a size for the "WAVE" magic.
        // The header is always 8 bytes, and the "WAVE" size is 4 bytes (4 characters).
        // The format chunk size is also a known size, 16 bytes.
        const int headerSize = 8, waveSize = 4, formatChunkSize = 16;

        // We also need to define the format type, and the number of tracks. Format type
        // 1 is PCM, and we have only a single track.
        const short formatType = 1, tracks = 1;

        // We know we have 16 bits per sample, so we can calculate our expected frame
        // size. The frame size is the number of tracks multiplied by the number of
        // bytes per sample. Adding 7 and then dividing by 8 ensures that if we have
        // samples that are, say 17 bits, we'll allocate the appropriate 3 bytes to
        // write them.
        const short bitsPerSample = 16;
        short frameSize = (short)(tracks * ((bitsPerSample + 7) / 8));
        
        // Knowing the frame size and the number of samples, we can calculate the data
        // chunk's size.
        int samplesPerSecond = soundSamples.Length / (int)duration.TotalSeconds;
        int bytesPerSecond = samplesPerSecond * frameSize;
        int dataChunkSize = soundSamples.Length * frameSize;

        // Next we need to compute the file size. The file size is the
        // magic number size, plus the format chunk header size, plus the format chunk
        // size, plus the data chunk header size, plus the data chunk size.
        int fileSize = waveSize + headerSize + formatChunkSize + headerSize + dataChunkSize;

        Console.WriteLine ($"Frame size: {frameSize}");
        Console.WriteLine ($"Samples per second: {samplesPerSecond}");
        Console.WriteLine ($"Bytes per second: {bytesPerSecond}");
        Console.WriteLine ($"Data Chunk size: {dataChunkSize}");
        Console.WriteLine ($"File size: {fileSize}");

        // OK, let's write our file! The WAVE format is a subset of the
        // RIFF specification, so let's start by saying we're in a RIFF file...
        writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        
        // Next, we'll write the computed file size...
        writer.Write(fileSize);

        // Introduce ourselves as a WAVE file.
        writer.Write(Encoding.ASCII.GetBytes("WAVE"));

        // The fmt subchunk is the first one in a WAVE file. It contains information
        // about the audio format, number of tracks, sample rate, byte rate, frame size,
        // and bits per sample. The extra space at the end of "fmt " is important, as
        // all chunks/subchunk names are in the "FourCC" format, which requires that they
        // be 4 characters long, and padded with spaces if they are not.
        writer.Write(Encoding.ASCII.GetBytes("fmt "));
        writer.Write(formatChunkSize);
        writer.Write(formatType);
        writer.Write(tracks);
        writer.Write(samplesPerSecond);
        writer.Write(bytesPerSecond);
        writer.Write(frameSize);
        writer.Write(bitsPerSample);

        // Then, the data subchunk. It contains the actual PCM data that is our sound.
        writer.Write(Encoding.ASCII.GetBytes("data"));
        writer.Write(dataChunkSize);
        for (int i = 0; i < soundSamples.Length; i++)
            writer.Write(soundSamples[i]);
    }

    // Reset the stream, and return it.
    output.Seek (0, SeekOrigin.Begin);
    return output;
}

Now let’s use these functions to get a Stream from which we can play a sound—one minute of a 440 Hz A4, sampled at 44.1 kHz!

var sound = GenerateSound(44100, 440, TimeSpan.FromMinutes(1));
var stream = GenerateWavStream(sound, TimeSpan.FromMinutes(1));

var tempFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), "Sound.wav");
using (var tempStream = File.Open(tempFile, FileMode.Create))
    await stream.CopyToAsync(tempStream);

Console.WriteLine($"Wrote to file {tempFile}.");