sroccaserra/Amiga_samples_conversion.md

## Amiga_samples_conversion.md

Archive:

https://archive.org/details/AmigaSTXX_originals_plus_conversions

These are 8 bit samples often used with the first Amiga trackers of the late
80s and early 90s, like Ultimate Soundtracker, in their original formats and
in updated formats. Following the info found in this other archive, I tried to
accurately convert the original files to modern, self-documenting formats (.wav
or .aiff), so they can easily be used in modern DAWs or modern trackers, like
Renoise. For an old school experience, try disabling interpolation and adding
a 2-pole low-pass filter at around 7 kHz.

### Disclaimer

The original files were not collected by me. They are on Aminet:

http://aminet.net/mods/inst

I am not an Amiga or sound file expert, just an enthusiastic and curious
developer, so I might have made a few mistakes. I would appreciate it if more
knowledgeable people took the time to check my work, and if you happen to find
an error and have a way to fix it, please do so.

To gather some data about the original files, I mostly explored the ST-01 and
ST-02 directories for reference, then computed a few statistics on all the
directories. But I didn't check the 10500+ original files individually.

### Notes on the original files

First, using hexadecimal file viewers like GNU od and xxd, and D3.js to plot
the data, I found that interpreting the files as 8 bit signed integers shows
nice waveform curves for some of them.
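
For reference, here is a minimal Python sketch of that interpretation, assuming the file contains nothing but raw 8 bit signed PCM data. `to_signed_8bit` and `text_waveform` are hypothetical helper names: the first reads each byte as a signed value between -128 and 127 with the standard `array` module, the second draws a crude one-line-per-sample plot in a terminal instead of D3.js.

```python
from array import array


def to_signed_8bit(data: bytes) -> list:
    """Interpret every byte as a signed 8 bit integer (-128..127)."""
    # Typecode 'b' is a signed char; bytes initializers go through frombytes().
    return array('b', data).tolist()


def text_waveform(samples, width=64):
    """Return one text line per sample, with a '*' placed according to its value."""
    lines = []
    for s in samples:
        # Map -128..127 linearly onto columns 0..width-1.
        column = (s + 128) * (width - 1) // 255
        lines.append(' ' * column + '*')
    return lines
```

For example, `print('\n'.join(text_waveform(to_signed_8bit(data)[:200])))` shows the first 200 samples of a file read into `data`.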

#### IFF 8SVX files

I also found that a few of them (only 1 in the ST-01 directory) started with
the 'FORM' word, followed by '8SVX' a few bytes later (at byte 8, right after
the 4-byte FORM size). This indicates an IFF header for sound data.

After some more research (see reference links below), I found out that 8SVX
IFF files contain info about the bit precision and, more importantly, the
sample rate of the original file.
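
SoX did the actual parsing for the conversion, but to show where the sample rate lives, here is a sketch that walks the chunks of an 8SVX file and decodes the VHDR fields, following the layout described on the AmigaOS wiki 8SVX page listed in the references. `read_vhdr` is a hypothetical name, and compressed or multi-octave files are not handled specially. When VHDR is the first chunk, samplesPerSec lands at byte 32 (12 + 8 + 3 × 4).

```python
import struct


def read_vhdr(data: bytes):
    """Return the main VHDR fields of an IFF 8SVX file, or None.

    Walks the chunks after the 'FORM', size, '8SVX' preamble instead of
    assuming that VHDR is always the first chunk.
    """
    if len(data) < 12 or data[0:4] != b'FORM' or data[8:12] != b'8SVX':
        return None
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (chunk_size,) = struct.unpack('>I', data[pos + 4:pos + 8])
        if chunk_id == b'VHDR' and chunk_size >= 20 and pos + 28 <= len(data):
            # oneShotHiSamples, repeatHiSamples, samplesPerHiCycle (ULONG),
            # samplesPerSec (UWORD), ctOctave, sCompression (UBYTE), volume (Fixed)
            one_shot, repeat, _, rate, octaves, compression, _ = \
                struct.unpack('>IIIHBBI', data[pos + 8:pos + 28])
            return {'one_shot': one_shot, 'repeat': repeat,
                    'samples_per_sec': rate, 'octaves': octaves,
                    'compression': compression}
        # Chunk data is padded to an even number of bytes.
        pos += 8 + chunk_size + (chunk_size & 1)
    return None
```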

A few statistics: 4662 of the 10500+ files in the original collection have an
IFF header. So around 44 % of the originals are IFF files, and 56 % are raw
PCM data with no header.

Command to count the original files with an IFF header:
$ find . -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 grep -l '^FORM' | wc -l

Note: 22 files have the FORM keyword, but not at the start of the file (3 of them have several FORM keywords!).
$ find . -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 ggrep -obUa 'FORM' | cut -d':' -f1-2 | grep -v ':0$' | sort
./ST-05/cc1+2:12574
./ST-05/cc3+4:23399
./ST-05/cc3+4:35757
./ST-05/cc4:13200
./ST-05/cc4:842
./ST-06/jamigobass2:145
./ST-06/snare drum easy:255
./ST-06/voice 2:88
./ST-07/cc2:23
./ST-07/iso bdrum2:90
./ST-07/iso bdrum:447
./ST-07/iso explod:298
./ST-07/iso sdrum:372
./ST-07/iso shut2:443
./ST-07/iso shut:154
./ST-08/zent lion:6344
./ST-16/argh2:3185
./ST-18/gng-bass:104
./ST-18/gng-bdrum:284
./ST-18/gng-gui.sample:15214
./ST-18/gng-gui.sample:46
./ST-18/gng-gui.sample:6858
./ST-18/gng-piano-moll:180
./ST-18/tv-go:1065
./ST-18/tv-select:101
./ST-43/blast1:52
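
To reproduce these checks without grep, here is a sketch with hypothetical helper names: `starts_with_form` tests for the FORM magic at byte 0 only, and `find_offsets` lists every offset of a marker like `grep -obUa` does. Note that grep's `^` matches at the start of any line, not only at byte 0, and binary data can contain many "lines", so the slice check is the stricter of the two.

```python
def starts_with_form(data: bytes) -> bool:
    """True when the IFF 'FORM' magic is at byte 0, the only place it belongs."""
    return data[:4] == b'FORM'


def find_offsets(data: bytes, marker: bytes) -> list:
    """Byte offsets of every occurrence of marker, similar to grep -obUa."""
    offsets = []
    pos = data.find(marker)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(marker, pos + 1)
    return offsets
```

Keeping only the non-zero offsets, `[o for o in find_offsets(data, b'FORM') if o != 0]`, mirrors the `grep -v ':0$'` filter above.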

In addition, 17 more files have an 8SVXVHDR keyword, but not at byte 8, its usual position:
ST-07/endblaster8:253
ST-12/desertsnare3:55
ST-20/supertomdrum:112
ST-31/adolf4:30576
ST-31/devestating:1032
ST-32/guitar3:62
ST-32/guitar4:62
ST-32/guitar5:62
ST-32/guitar6:62
ST-33/claps6:12
ST-34/ass:1034
ST-34/claps7:19
ST-34/claps8:59
ST-34/house2:10
ST-41/supercrash:74
ST-A9/WILDGITARB:112
ST-A9/WILDGUITARS:112

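Then 28 more files have a VHDR chunk in the wrong position (the usual position is byte 12), for 30 occurrences in total, since ST-52/lemmings-3egui and ST-56/spysample each appear twice: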
ST-07/iso sdrum3:1
ST-15/a-snarek:116
ST-18/gng-eguitar:152
ST-18/tv-noise2:2914
ST-18/tv-noise:122
ST-18/tv-spectator:31356
ST-24/atom-piano:177
ST-24/bat-guitar:9227
ST-24/bat-sdrum:157
ST-26/jump-tomtom:122
ST-26/th-basscool:165
ST-27/animate-bass:157
ST-27/klax-klopfen2:118
ST-27/klax-klopfen3:117
ST-27/klax-typemachine:120
ST-38/jb-hit:133
ST-45/cadaver-pauke+drum:104
ST-45/puznic-bass:109
ST-45/puznic-bdrum1:117
ST-45/puznic-snare1:143
ST-46/battle-bdrum:126
ST-46/battle-sdrum:121
ST-51/robocop-slapbass:107
ST-51/robocop-tiptip:422
ST-52/lemmings-3egui:1747
ST-52/lemmings-3egui:859
ST-52/lemmings-awebdrum:21
ST-56/spysample:123
ST-56/spysample:13203
ST-56/spysnaredrum:20

Some of them could easily be fixed. I didn't fix them, but you can if you want: have a look at the bytes around the listed positions with a hex viewer.
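
If no hex viewer is at hand, a rough stand-in is easy to sketch in Python. `hexdump_around` is a hypothetical helper that formats the rows around a given offset in a layout similar to xxd's (address, hex bytes, printable characters), for example offset 1 for ST-07/iso sdrum3 in the list above.

```python
def hexdump_around(data: bytes, offset: int, context: int = 32, width: int = 16) -> list:
    """Format the bytes around offset as 'address: hex  ascii' rows."""
    start = max(0, offset - context)
    start -= start % width  # start on a row boundary, like most hex viewers
    end = min(len(data), offset + context)
    rows = []
    for row in range(start, end, width):
        chunk = data[row:row + width]
        hex_part = ' '.join(f'{b:02x}' for b in chunk)
        # Non-printable bytes are shown as dots.
        ascii_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
        rows.append(f'{row:08x}: {hex_part:<{width * 3 - 1}}  {ascii_part}')
    return rows
```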

#### Oktalyzer files

I then found out that some 8SVX files had the Oktalyzer string as an
annotation (in the ANNO 8SVX chunk), most of them in the first bytes, but some
of them later in the file. These are probably samples exported or simply
chopped from Oktalyzer module files; Oktalyzer, as I discovered, is a late 80s
Amiga tracker.
$ find . -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 ggrep -obUa -l 'lyzer' | wc -l

Almost all of them (2317) are well-formed 8SVX files with the Oktalyzer
string at byte 76, in the ANNO 8SVX chunk.

Two of them have the string Oktalyzer in a different place:
$ find . -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 rg -obUa 'Oktalyzer' | cut -d':' -f1-2 | rg -v ':76$'
./ST-08/zent lion:6420
./ST-79/HES.bigodnare2:164

The file ST-08/zent lion is 7696 bytes long, has no header at the start, and
has a well-formed 8SVX header starting at byte 6344 (!). It is probably two
samples pasted together, the first without a header and the second with one.

The file ST-79/HES.bigodnare2 has an almost valid header: it has two BODY
chunks, probably by mistake.
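
Anomalies like these (an 8SVX header in the middle of a file, or two BODY chunks) show up when listing the chunks of a file. Here is a sketch with hypothetical helper names, assuming the FORM header is at byte 0; for a file like ST-08/zent lion you would pass the data starting at the FORM offset, e.g. `data[6344:]`.

```python
import struct


def list_iff_chunks(data: bytes) -> list:
    """Return (chunk_id, offset, size) for each chunk after the 12-byte FORM preamble.

    Stops when fewer than 8 bytes remain. A chunk whose declared size runs past
    the end of the data is still listed, which is itself a hint of damage.
    """
    chunks = []
    pos = 12  # skip 'FORM', the FORM size and the form type ('8SVX')
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4].decode('latin-1')
        (size,) = struct.unpack('>I', data[pos + 4:pos + 8])
        chunks.append((chunk_id, pos, size))
        pos += 8 + size + (size & 1)  # chunk data is padded to an even length
    return chunks


def repeated_chunk_ids(chunks) -> set:
    """Chunk IDs that appear more than once, e.g. two BODY chunks."""
    seen, repeated = set(), set()
    for chunk_id, _, _ in chunks:
        if chunk_id in seen:
            repeated.add(chunk_id)
        seen.add(chunk_id)
    return repeated
```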

Finally, three files only have the string lyzer in the middle of their data,
with no 8SVX header, probably indicating they were badly chopped from an 8SVX
file and pasted together:
$ find ST-29 -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 ggrep -obUa 'lyzer' | cut -d':' -f1-2 | grep -v ':80$'
ST-29/xen-megablast:10444
ST-29/xen-tapsample:16166
ST-29/xen-we...:3100

### Notes on the conversion

#### Converting raw files

For the raw files, I wrote a Python script that can:

- read a raw file,
- generate an AIFF header with values corresponding to the raw data,
- create an AIFF file by pasting the header in front of the raw data.

This means that if you remove the generated header of the AIFF files, you
should find exactly the original raw file.
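
The header that the script writes is always 54 bytes long (12 bytes for FORM, 26 for COMM, 16 for the start of SSND), so `aiff_data[54:]` should give back the raw file. A slightly more general sketch, with the hypothetical name `ssnd_sound_data`, reads the sound data out of the SSND chunk of any AIFF file by walking its chunks:

```python
import struct


def ssnd_sound_data(aiff: bytes) -> bytes:
    """Return the sound data stored in the SSND chunk of an AIFF file."""
    pos = 12  # skip 'FORM', the FORM size and 'AIFF'
    while pos + 8 <= len(aiff):
        chunk_id = aiff[pos:pos + 4]
        (size,) = struct.unpack('>I', aiff[pos + 4:pos + 8])
        if chunk_id == b'SSND':
            (offset,) = struct.unpack('>I', aiff[pos + 8:pos + 12])
            # Sound data follows the offset and blockSize fields (8 bytes),
            # skipping 'offset' extra bytes. Slicing also tolerates a size
            # field that overstates the data length.
            return aiff[pos + 16 + offset:pos + 8 + size]
        pos += 8 + size + (size & 1)  # chunk data is padded to an even length
    raise ValueError('no SSND chunk found')
```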

Here is the command I used, from the root of the archive:
$ find . -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 grep -ZL '^FORM' | xargs -0 -n1 -I {} python3 convert_to_aiff.py {}

The script is at the root of the archive. I would be grateful if someone
reviewed it. It works only for raw 8 bit signed mono data.

Note: the -Z grep option is not supported by the old version of grep that
ships with macOS, so on my Mac I had to use ggrep, which points to GNU grep.
You can install most GNU replacements for macOS with Homebrew. If you're on
Linux, you're fine: your grep is probably a recent version of GNU grep.
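
If GNU grep isn't available, the same selection can be sketched in Python. This hypothetical `split_originals` helper walks the archive, skips the .aiff and .wav files, and sorts the remaining files into IFF and raw ones by checking for the FORM magic at byte 0:

```python
import os


def split_originals(root: str):
    """Split the original files under root into IFF (FORM at byte 0) and raw files.

    Skips already converted .aiff and .wav files, like the find commands above.
    """
    iff_files, raw_files = [], []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(('.aiff', '.wav')):
                continue
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as f:
                magic = f.read(4)
            (iff_files if magic == b'FORM' else raw_files).append(path)
    return sorted(iff_files), sorted(raw_files)
```

The raw paths could then be passed one by one to the conversion script.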

Note: another option is to let SoX do the job; the
convert_raw_to_44100_Hz_16_bit_wave.sh script below does just that.

#### Converting IFF files

Since IFF files have many optional chunks, I wanted to avoid writing a parser
for the 8SVX header to extract the sample rate and other data about the IFF
files. So I used SoX (Sound eXchange) to read and convert all files starting
with a 'FORM' magic number. SoX had no problem parsing them and converting
them to .wav.

IFF files can contain basic loop points, by splitting the data into a first
part (one shot) and a repeating part. All the sample data should be present in
the .wav files, but if the original files contained loop points, I was not
able to preserve them.
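
For reference, the split between the one-shot part and the repeating part comes from the VHDR oneShotHiSamples and repeatHiSamples fields. Here is a sketch, assuming a single-octave file (ctOctave = 1), that splits the BODY data and computes the loop start and end in samples, which you could then set by hand in a DAW. `split_one_shot_and_repeat` is a hypothetical helper, not part of SoX.

```python
def split_one_shot_and_repeat(body: bytes, one_shot: int, repeat: int):
    """Split single-octave 8SVX BODY data into its one-shot and repeat parts.

    one_shot and repeat are the VHDR oneShotHiSamples and repeatHiSamples
    values. Returns (one_shot_part, repeat_part, loop), where loop is a
    (start, end) pair in samples (end exclusive), or None when there is no
    repeat part.
    """
    one_shot_part = body[:one_shot]
    repeat_part = body[one_shot:one_shot + repeat]
    loop = (one_shot, one_shot + repeat) if repeat > 0 else None
    return one_shot_part, repeat_part, loop
```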

Here is the command I used, from the root of the archive:
$ find . -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 grep -Zl '^FORM' | xargs -0 -n1 -I {} sox {} {}.wav

### Notes on the sample rates

The .wav files were converted from 8SVX IFF files using SoX. These files had
information about their sample rate, and should be as well tuned as their
original samples were.

Some stats: of the 4662 IFF files, 1783 are at 16726 Hz and 2401 are at 8363
Hz, which makes around 90 % of the IFF files.

The .aiff files were generated from raw 8 bit signed PCM data, without header
and without info about their original sample rate. Since 90 % of the IFF files
had a sample rate of either 16726 Hz or 8363 Hz, I chose to write 16726 Hz in
the AIFF header of the .aiff files. If a sample plays too fast in your DAW,
try playing it an octave lower (16726 Hz is exactly twice 8363 Hz); it should
sound ok in most cases.
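
The AIFF COMM chunk stores the sample rate as an 80-bit IEEE 754 extended precision float, which is why the conversion script uses a lookup table of pre-encoded rates (8363 Hz is not in it). Here is a sketch of a hypothetical `int_to_extended` helper that encodes any positive integer rate, assuming rates below 2**64:

```python
def int_to_extended(value: int) -> bytes:
    """Encode a positive integer as a big-endian 80-bit IEEE 754 extended float."""
    if value <= 0:
        raise ValueError('sample rate must be a positive integer')
    exponent = value.bit_length() - 1     # position of the highest set bit
    mantissa = value << (63 - exponent)   # explicit integer bit lands in bit 63
    # 15-bit biased exponent (bias 16383, sign bit 0) followed by a 64-bit mantissa.
    return (16383 + exponent).to_bytes(2, 'big') + mantissa.to_bytes(8, 'big')
```

`int_to_extended(8363)` gives `40 0c 82 ac` followed by six zero bytes: the same mantissa as 16726 Hz with an exponent one lower, since 16726 is exactly 2 × 8363.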
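
Command to gather stats on the sample rates of the IFF files (it reads the two bytes at byte 32, where the VHDR samplesPerSec field sits in a standard 8SVX header with VHDR as its first chunk):
$ find . -type f -not -name '*.aiff' -not -name '*.wav' -print0 | xargs -0 ggrep -Zl '^FORM' | xargs -0n1 xxd -p -s32 -l2 | sort | uniq -c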

Note: counts are decimal numbers, sample rates are hexadecimal numbers.
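
To read those stats in Hz, the hexadecimal rates have to be converted. Here is a sketch of a hypothetical `parse_rate_counts` helper that parses the `sort | uniq -c` output; the two most common rates, 16726 Hz and 8363 Hz, show up as 4156 and 20ab (xxd -p prints lowercase hex).

```python
def parse_rate_counts(uniq_output: str) -> dict:
    """Turn 'count hexrate' lines from 'sort | uniq -c' into {rate_in_hz: count}."""
    counts = {}
    for line in uniq_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        count, hex_rate = fields
        # Counts are decimal, sample rates are hexadecimal.
        counts[int(hex_rate, 16)] = int(count)
    return counts
```

For example, feeding it the two lines `1783 4156` and `2401 20ab` returns `{16726: 1783, 8363: 2401}`.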

### Using the original files in Renoise


https://www.renoise.com/

Renoise can load raw PCM data, so you can load the original files and tune them (sort of). This method should also work with other DAWs.
To do that, click on the *.* button to show the files without an extension. Then right-click on the original sample and choose Load file with Options...

#### For the raw PCM files

In the dialog box, choose 8 bit signed and 11025 Hz (Renoise doesn't offer 8363 or 16726 Hz, the sample rates of 90 % of the original IFF files). Then add 7 semitones to the pitch, and the sample should be tuned (the exact interval from 11025 Hz to 16726 Hz is about 7.2 semitones, so this is a close approximation): play C3 or C4 for the base note of the original.

Note: this should give you nearly the same result as double-clicking the converted .aiff files, which need no additional semitones.
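
The number of semitones to add follows from the ratio of the two sample rates: 12 × log2(target / loaded). A quick sketch with a hypothetical `semitone_offset` helper shows that 11025 Hz to 16726 Hz is about 7.2 semitones, and that 8363 Hz is exactly one octave (12 semitones) below 16726 Hz, which is presumably why the base note can be C3 or C4:

```python
import math


def semitone_offset(loaded_rate: float, target_rate: float) -> float:
    """Semitones to add so a sample loaded at loaded_rate plays as if at target_rate."""
    return 12 * math.log2(target_rate / loaded_rate)
```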

#### For the IFF original files

This is the remaining 44 % of the collection. The same should work with an additional step: skip 104 header bytes in the load options. But the number of header bytes can vary: the header ends four bytes after the BODY marker, right after its 4-byte size field. You should check the header length with xxd, od, or hexedit on Linux / macOS, and HexEdit or HxD on Windows.

For those files, the problem is to identify them as IFF files in the first place, as the original files have no extensions. Again, your favourite hex editor is your friend: the first four bytes should spell FORM.

Note: this should give you nearly the same result as double-clicking the corresponding converted .wav file, which needs no additional semitones. Either way, if there is loop start info in the IFF file, you lose it: I couldn't find a way to preserve it.
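
The number of header bytes to skip can also be computed: it is the offset of the BODY marker plus 8 (4 bytes for the marker, 4 for the size field). Here is a sketch, with the hypothetical name `body_data_offset`, that first checks for the FORM magic and then walks the chunks rather than searching for the text BODY, which could also appear inside an annotation:

```python
import struct


def body_data_offset(data: bytes):
    """Number of header bytes before the BODY sample data of an IFF 8SVX file.

    Returns None if the data does not start with 'FORM' or has no BODY chunk.
    """
    if data[:4] != b'FORM':
        return None  # not an IFF file: load it as raw PCM, no header to skip
    pos = 12  # skip 'FORM', the FORM size and '8SVX'
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack('>I', data[pos + 4:pos + 8])
        if chunk_id == b'BODY':
            return pos + 8  # 'BODY' marker (4 bytes) + size field (4 bytes)
        pos += 8 + size + (size & 1)  # chunk data is padded to an even length
    return None
```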

#### Renoise tool

I also wrote a Renoise tool to load the original files directly. For most of them it can guess the file format, and it preserves the loop info of 8SVX files. Downside: it does not let you preview a file; you have to load it to hear it.
You can find it here:

https://github.com/sroccaserra/renoise-8-bit-sample-loader

### References I used

Original files:

https://archive.org/details/AmigaSTXX
http://aminet.net/mods/inst

About ST-01 samples:

http://www.polynominal.com/Commodore-Amiga/soundtracker-st01-original-synthesizer-source.html
https://modarchive.org/forums/index.php?topic=1577.0

About IFF and 8SVX:

https://wiki.amigaos.net/wiki/8SVX_IFF_8-Bit_Sampled_Voice
http://sox.sourceforge.net/AudioFormats-11.html#ss11.3
https://en.wikipedia.org/wiki/8SVX

About AIFF:

http://paulbourke.net/dataformats/audio/
https://github.com/audacity/audacity/blob/fa00dd0/lib-src/libsndfile/src/aiff.c
https://en.wikipedia.org/wiki/Audio_Interchange_File_Format

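I did not explore them much, but Python's standard library can do a few things
with IFF or AIFF data and audio files in general (note that the chunk and
sndhdr modules were deprecated in Python 3.11 and removed in Python 3.13):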

https://docs.python.org/3/library/mm.html
https://docs.python.org/3/library/chunk.html
https://docs.python.org/3/library/sndhdr.html

Interesting series about trackers / Soundtracker:

https://xavier.borderie.net/blog/2021/07/11/soundtracking-sur-amiga-passion-explications-et-exemples/
https://xavier.borderie.net/blog/2021/09/22/soundtracker-origins-part-1-where-in-the-world-is-karsten-obarski/
https://xavier.borderie.net/blog/2023/01/01/soundtracker-origins-part-2-welcome-to-turrican-aah-hahahaha/
https://xavier.borderie.net/blog/2023/10/25/soundtracker-origins-part-3-facing-a-stone-mountain/


## convert_raw_to_44100_Hz_16_bit_wave.sh
#!/bin/bash

# Note: inspired by https://www.youtube.com/watch?v=eDCA1Tn52_E
# Note: since the source is 8 bit data, the "--endian big" option is not necessary.

mkdir -p ../wav

for i in *
do
  sox -r 8363 -c 1 -b 8 -e signed-integer -t raw -v 0.8 "$i" -b 16 -r 44100 -V3 "../wav/$i.wav"
done

## convert_raw_to_aiff.py
"""

Works only for raw 8 bit signed mono data.

This script was written as an easy way to convert original Amiga ST-XX raw
samples to a more documented format that keeps the original data untouched.

See also:
- http://paulbourke.net/dataformats/audio/
- https://github.com/audacity/audacity/blob/fa00dd0/lib-src/libsndfile/src/aiff.c
- https://archive.org/details/AmigaSTXX

"""

import argparse

# Sample rates as big-endian 80-bit IEEE 754 extended precision floats,
# the format of the sampleRate field in the AIFF COMM chunk.
sample_rate_in_extended_precision = {
        11025: b'\x40\x0c\xac\x44\x00\x00\x00\x00\x00\x00',
        16000: b'\x40\x0c\xfa\x00\x00\x00\x00\x00\x00\x00',
        16726: b'\x40\x0d\x82\xac\x00\x00\x00\x00\x00\x00',
        22050: b'\x40\x0d\xac\x44\x00\x00\x00\x00\x00\x00',
        44100: b'\x40\x0e\xac\x44\x00\x00\x00\x00\x00\x00',
        }

DEFAULT_SAMPLE_RATE = 16726

def convert(input_file_name, output_file_name, sample_rate):
    with open(input_file_name, 'rb') as input_file:
        sample_data = input_file.read()

    nb_channels = 1
    sample_size = 8  # bits per sample
    nb_bytes = len(sample_data)
    # With 8 bit mono data, one sample frame is exactly one byte.
    nb_sample_frames = nb_bytes // nb_channels

    form_chunk_size = 4 + 4 + 4               # 'FORM', size, 'AIFF'
    comm_chunk_size = 4 + 4 + 2 + 4 + 2 + 10  # 'COMM', size, channels, frames, bits, rate
    ssnd_chunk_size = 4 + 4 + 4 + 4           # 'SSND', size, offset, block size

    total_size = form_chunk_size + comm_chunk_size + ssnd_chunk_size + nb_bytes

    # The FORM size counts every byte after the 'FORM' ID and the size field.
    form_chunk = b'FORM' + \
            (total_size - 8).to_bytes(4, byteorder='big') + \
            b'AIFF'

    # COMM data is always 18 bytes (0x12): numChannels, numSampleFrames,
    # sampleSize and the 80-bit sampleRate.
    comm_chunk = b'COMM\x00\x00\x00\x12' + \
            nb_channels.to_bytes(2, byteorder='big') + \
            nb_sample_frames.to_bytes(4, byteorder='big') + \
            sample_size.to_bytes(2, byteorder='big') + \
            sample_rate_in_extended_precision[sample_rate]

    # The SSND size covers the offset and block size fields (8 bytes) plus the
    # sound data. Both fields are 0: the data starts right after them.
    ssnd_chunk = b'SSND' + \
            (8 + nb_bytes).to_bytes(4, byteorder='big') + \
            b'\x00\x00\x00\x00' + \
            b'\x00\x00\x00\x00'

    header = form_chunk + comm_chunk + ssnd_chunk

    # Note: IFF chunks should be padded to an even length. No pad byte is
    # written here, so the file still ends with the untouched original data.
    with open(output_file_name, 'wb') as output_file:
        output_file.write(header + sample_data)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('input_file')
    parser.add_argument('-o', '--output-file', help='The name of the output file (default is the name of the input file with ".aiff" extension)')
    parser.add_argument('-r', '--sample-rate', type=int,
            choices=sorted(sample_rate_in_extended_precision),
            help='The sample rate of the source in Hz. The default is {}.'.format(DEFAULT_SAMPLE_RATE),
            default=DEFAULT_SAMPLE_RATE)
    args = parser.parse_args()

    input_file_name = args.input_file
    output_file_name = args.output_file or input_file_name + '.aiff'

    convert(input_file_name, output_file_name, args.sample_rate)