@AkBKukU
Created September 23, 2022 14:58
DECtalk DTC01 - 1984 Speech Synthesizer (Full Script with speech examples)
[:np] Hello, my name is Perfect Paul and I’m here to introduce you to the
DECtalk speech synthesizer. In particular, the DECtalk DTC01 released in 
1984. The DECtalk had revolutionary features for computer speech, but don’t 
just take my word for it.

[:nb] I can agree with that. Hi, my name is Beautiful Betty and my voice 
represents just one of the many ways the DECtalk is a flexible, advanced 
system. Today we’re going to take an in-depth look at the DTC01 device 
itself, and how it raised the bar for speech synthesis in ways that are 
still difficult to match today.

Uh, what they said.

[Intro music]

So, as it said itself, this is the DECtalk DTC01, the first DECtalk model, released by Digital Equipment Corporation in 1984. This was a tremendously advanced device that not only changed lives but still has echoes of cultural relevance today. But before we take a look at the DTC01, I want to briefly cover the history of consumer speech synthesis up to this point.

Perhaps the best-known example of early computer speech was the Texas Instruments Speak & Spell in 1978. This device was the culmination of decades of research into vocoder technology, which relies on a simple principle: a buzzer stands in for the vocal cords, and filters shape that sound. “Say it: iron. Say it: most.” TI created a series of ICs that implemented this by loading patterns from ROM into a primary synthesizer chip. In the Speak & Spell, these patterns could be expanded with additional game cartridges. These patterns were for specific words and phrases, not arbitrary speech, though. These chips were very popular and found their way into many devices, and even into cars for alert systems.
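The TI chips did this with linear predictive coding (LPC): roughly, each stored frame holds a pitch, a gain, and a set of filter coefficients, and the chip drives an all-pole filter with a buzz (or with noise for unvoiced sounds). The Python sketch below shows only the voiced half of that buzzer-plus-filter idea. The coefficients are made up, and it does not reproduce TI's actual frame format or filter structure.

```python
from collections import deque


def lpc_synthesize(coeffs, pitch_period, n_samples, gain=1.0):
    """All-pole (LPC) synthesis: y[n] = gain*e[n] + sum_k coeffs[k-1] * y[n-k].

    e[n] is the "buzz": an impulse train with one pulse every pitch_period
    samples, standing in for the vocal cords. The coefficients are the filter.
    """
    history = deque([0.0] * len(coeffs), maxlen=len(coeffs))  # y[n-1] first
    out = []
    for n in range(n_samples):
        excitation = 1.0 if n % pitch_period == 0 else 0.0
        y = gain * excitation + sum(a * past for a, past in zip(coeffs, history))
        out.append(y)
        history.appendleft(y)  # newest output becomes y[n-1] for the next step
    return out


# Made-up 2nd-order predictor (one stable resonance), 100 Hz buzz at 8 kHz.
samples = lpc_synthesize([1.3, -0.8], pitch_period=80, n_samples=400)
```

Changing `pitch_period` changes the pitch without touching the filter, which is why stored frames of this kind can be replayed at different pitches.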

Votrax was another company working on computer speech, and it released the Type & Talk. This was a serial device compatible with nearly any computer. It would speak any text sent to it and supported phonemes, allowing any word to be pronounced by building it from discrete sounds. Votrax was not tremendously successful with it, though, and the company exited the consumer market.

TI later released a version of the Speak & Spell chip as an add-on for their TI-99 computer line. It was initially a limited-vocabulary device programmed from BASIC, but later the Terminal Emulator II cartridge allowed using sections of the words in ROM as allophones to create arbitrary speech. It also allowed control of pitch for even more advanced speech output, including singing. But being an add-on module, its use was limited to a single line of computers.

Individually, though, none of these devices met the needs of a general-purpose speech synthesis system, either because they were tied to a specific computer or because they were not flexible enough for general use. This is where the DECtalk revolutionized the concept, and it added many more features that the other devices lacked. There was a pretty good reason for this, though. The two most advanced devices were the Speech Module, which cost $150, and the Votrax, for $375. The DECtalk DTC01 was… $4,000. So it had better make these other devices look like child’s play in comparison.

[:np] Well then it’s a good thing we do.

The DTC01 itself is designed to fit in with other products Digital had at the time, especially terminals. Physically, it is very large and shockingly heavy: about the same width as rack-mount equipment and weighing around 16 lbs. This is partly due to the internal PSU and speaker, and partly due to the massive amount of steel shielding inside the unit. You will see that shielding if you happen to own one of these, because you will definitely want to open it up and deal with the three Rifa line filter caps it has in there. While we’re in here, we can see the Motorola 68000 that serves as the brains. This was the same processor you would get in the Macintosh, which was released the same year as this unit, so it was a respectably powerful device. Inside we can also find static RAM and the 16 ROMs that make up its firmware and phonetic dictionary. The example I have here came with the earlier 1.8 firmware revision, something the DECtalk itself tells you every time you turn it on.

DECtalk version 1.8 is running.

Bugs were found and fixed in later firmware revisions, but I’m not too keen on writing 16 ROMs at the moment, so I’m going to leave that be. Back outside, on the rear of the unit we find connections for two serial ports, two phone connections, and two audio outputs, as well as a volume knob for the internal speaker. These connections all have different purposes, so let’s hook things up and take a look at what they do.

Interfacing with the DTC01 is a little confusing, so let’s follow the manual’s recommendation and begin by connecting a serial terminal (I’ll use my TRS-80 DT-1) to the right-hand, or Terminal, serial port. Serial settings are configured in software on the unit itself. When you first power it on, you are presented with a prompt on your terminal. Pressing the BREAK key brings you into a setup menu that is extremely well made, with built-in help for every available option. If you don’t know what the serial settings are, though, you cannot get into this menu, but there is another way to fix that, which we’ll get to in a bit. Sending exit brings you out of the menu and back to the main prompt. This prompt is an experimentation area that lets you see how the speech system works and directly command the DECtalk.

[:np] This is a line of text entered through the terminal.

You can actually give this exact setup a test yourself thanks to an emulator available on archive.org that I’ll link in the description.
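If you would rather script a session like this from a computer than type at a terminal, the host side mostly just writes lines of text. Here's a minimal Python sketch assuming any object with a `write(bytes)` method, such as a pyserial `serial.Serial` opened with whatever settings the unit's setup menu shows. The helper name `send_line` is hypothetical, and the carriage-return terminator is an assumption to check against the manual. Because the DECtalk works on complete, punctuated lines, the helper adds a period when the text has no closing punctuation.

```python
import io


def send_line(port, text, terminator=b"\r"):
    """Send one complete line of text to a DECtalk-style serial device.

    port: anything with a write(bytes) method, e.g. a pyserial Serial
          object, or io.BytesIO for testing without hardware.
    terminator: ASSUMED end-of-line byte; confirm it for your unit.
    """
    text = text.strip()
    # Speech is processed per line, so make sure the line is properly finished.
    if text and text[-1] not in ".!?":
        text += "."
    # Non-ASCII characters (like curly quotes) are replaced with "?".
    data = text.encode("ascii", errors="replace") + terminator
    port.write(data)
    return data


# Usage with an in-memory stand-in for the serial port:
fake_port = io.BytesIO()
send_line(fake_port, "[:np] This is a line of text entered through the terminal")
print(fake_port.getvalue())
```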

Before we try this, let me cover some basic information about how DECtalk text processing works. First off, it is designed to process complete lines, so if you send it just a single word, it will seem like it isn’t working and won’t respond. It does this because it varies the inflection of the speech over the duration of the line to make it flow more naturally. This is one of the most advanced aspects of the DECtalk, and you can even tune it!

[:dv pr 0] This is a line spoken by the dectalk to demonstrate inflection!
[:dv pr 100] This is a line spoken by the dectalk to demonstrate inflection!
[:dv pr 250] This is a line spoken by the dectalk to demonstrate inflection!

So be sure to include punctuation when sending text. I’ve been showing the lines I send to the DECtalk on screen so you can see how it interprets them, and you’ve probably noticed the sections in square brackets. These are DECtalk commands that you can put inline to change parameters while it is speaking. There are a LOT of options you can tweak and tune, such as pitch, rate, voices, pause durations, and much more. If you were seriously developing for a DECtalk system, you could fine-tune the speech sounds extremely precisely to match your needs. There are too many options for me to go through in this video, and this isn’t meant to be a complete tutorial. But these commands follow a basic structure: everything inside square brackets is treated as some kind of command. Configuration options have a colon in front of the parameter name, and you can have multiple parameters in one bracketed command. Anything without a colon is mostly interpreted as a phoneme, but we’ll get back to those later. Putting this into practice, we can easily go through and try out all of the voices on the DECtalk to hear what they sound like:

[:np] I’m Perfect Paul, the standard male voice, and you have already heard 
from me before, I may even sound similar to someone else. But I’m getting 
ahead of myself.

[:nb] My name is Beautiful Betty and I am the standard female voice. I, like 
all the voices, still use the same sound sources as Paul.

[:nh] I am Huge Harry. I am the deep male voice. I am the lowest voice 
available out of the box.

[:nf] Oh hello, they call me Frail Frank. An older male voice is what I 
represent. I’m mostly just a high-pitched version of Paul though.

[:nk] Hi there! I’m Kit the Kid! I’m a child’s voice, a whole 10 years old!
Although if I was 10 when I came out, I’d be 48 years old now!

[:nr] They call me Rough Rita. I’m a deep female voice, but the way the 
samples are slowed down makes me sound like I have a pack-a-day habit. 

[:nu] And I round out the bunch, I’m Uppity Ursula, the light female voice. 
I almost sound like Kit but am just a bit lower in pitch.

[:nv] Actually, I’m really the last voice. No, I’m not Paul, I’m Variable Val.
I’m a user definable voice. You can modify and save a custom defined voice for 
me to reload without needing to set all the parameters again. What can a 
custom voice sound like?

[:np :dv pr 0 g1 60 hs 150 ap 60] This [:dv ap 90] is a [:dv ap 120] test.
[:dv ap 60 pr 250] It can be a very different experience.
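To make the bracket rules concrete, here's a small Python sketch that splits a line like the custom-voice example above into plain text, colon-prefixed commands, and phoneme groups, plus a helper that assembles a `[:dv ...]` string from two-letter codes such as `pr` and `ap`. It's a simplified model of the rules as described in this video, not DECtalk's actual parser: it ignores host escape sequences and the many special cases in the manual, and `tokenize` and `design_voice` are hypothetical names.

```python
import re

BRACKETS = re.compile(r"\[([^\]]*)\]")


def tokenize(line):
    """Split a line into ("text", str), ("command", [(name, args)]) and
    ("phonemes", str) tokens, following the simplified rules described above."""
    tokens, pos = [], 0
    for match in BRACKETS.finditer(line):
        before = line[pos:match.start()].strip()
        if before:
            tokens.append(("text", before))
        body = match.group(1).strip()
        if body.startswith(":"):
            # Several colon-prefixed commands may share one pair of brackets.
            commands = []
            for part in body.split(":"):
                words = part.split()
                if words:
                    commands.append((words[0], words[1:]))
            tokens.append(("command", commands))
        else:
            tokens.append(("phonemes", body))
        pos = match.end()
    rest = line[pos:].strip()
    if rest:
        tokens.append(("text", rest))
    return tokens


def design_voice(**params):
    """Build a [:dv ...] command from parameter codes, in the order given."""
    pairs = " ".join(f"{name} {value}" for name, value in params.items())
    return f"[:dv {pairs}]"


line = "[:np :dv pr 0 g1 60 hs 150 ap 60] This [:dv ap 90] is a [:dv ap 120] test."
for token in tokenize(line):
    print(token)
print(design_voice(pr=0, g1=60, hs=150, ap=60))  # [:dv pr 0 g1 60 hs 150 ap 60]
```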

Those custom voice parameters are extremely adjustable, though Betty brought up a good point that all of the voices use the same basic set of samples that create Paul’s voice. And the origin of Paul’s voice is actually the origin of the DECtalk itself.
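Part of why every voice can share Paul's underlying sound is that the DECtalk is a formant synthesizer descended from Dennis Klatt's published design, so a voice is largely a set of numbers fed to the same machinery. The Python sketch below uses the second-order resonator from Klatt's 1980 cascade/parallel formant synthesizer: a buzz passed through one resonator per formant gives a crude vowel, and changing only the buzz pitch gives a "different voice" over the same vocal tract. The formant values for an "ah"-like sound are rough textbook numbers, not DECtalk data, and the bare impulse train is far simpler than a real glottal source.

```python
import math


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Second-order digital resonator in the form Klatt published:
    y[n] = A*x[n] + B*y[n-1] + C*y[n-2], scaled for unity gain at 0 Hz."""
    t = 1.0 / sample_rate
    c = -math.exp(-2 * math.pi * bandwidth_hz * t)
    b = 2 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0  # previous two outputs
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def buzz(pitch_hz, duration_s, sample_rate):
    """Impulse train standing in for the vocal cords (the voicing source)."""
    period = round(sample_rate / pitch_hz)
    n_samples = int(duration_s * sample_rate)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def vowel(pitch_hz, formants, duration_s=0.5, sample_rate=10000):
    """Cascade one resonator per (frequency, bandwidth) pair over the buzz."""
    signal = buzz(pitch_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


# Illustrative "ah"-like formants in Hz (rough textbook values).
AH = [(700, 130), (1220, 70), (2600, 160)]
low_voice = vowel(pitch_hz=100, formants=AH)
high_voice = vowel(pitch_hz=200, formants=AH)  # same "vocal tract", higher source pitch
```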

The DECtalk owes a great deal of its existence to Dennis Klatt. Without him and his research, it wouldn’t have the voice it does, literally. Klatt had both a master’s degree in electrical engineering and a Ph.D. in communication sciences, which made him perfectly equipped for a problem like this. He gave an interview to Popular Science in 1986 about the technical challenges of speech synthesis and the development of the DECtalk. Unlike the LPC chips in the Speak & Spell, the DECtalk uses formant synthesis based on Klatt’s own synthesizer design: rather than playing back raw sample recordings, it generates speech from a small set of parameters describing the resonances of the vocal tract. Klatt mentions spending an immense amount of time analyzing waveforms to make the DECtalk’s output match his source recordings. Those recordings were of his own voice, which makes the DECtalk, and the Perfect Paul voice specifically, an emulation of Klatt himself. He mentions that this is an imperfect process and that matching waveforms only gets you so far, which is where some of the DECtalk’s real power comes into play. The line intonation I demonstrated earlier is a large part of the work Klatt did to make speech synthesis more than just a recreation of phonetic building blocks. The DECtalk is also aware of sentence context for things like punctuation and even some abbreviations, as mentioned in its initial announcement in Byte Magazine.

[:np] St. Paul rode his horse down 12th st. to Mr. Frank’s store to buy a board 12 ft. long.

The context-aware distinction between “st.” meaning “saint” or “street” is very impressive, and as a bonus it knew to pronounce “12th” as the word “twelfth”, “Mr.” as “mister”, and “ft.” as “feet”! This is what really made the DECtalk stand apart, and it is much more than just making sounds that resemble words.
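A toy version of that kind of text normalization is easy to sketch, and it also shows why the real thing is hard. The Python below handles just the Byte Magazine sentence: its lookup tables are invented for this one example, and the rule that picks “Saint” versus “street” from the capitalization of the next word is only a heuristic (it would misread something like “Main St. Then”). None of this reflects DECtalk's actual rules, and `normalize` is a hypothetical name.

```python
# Tiny illustrative lookups; a real normalizer would spell out any number.
ABBREVIATIONS = {"mr.": "mister", "ft.": "feet"}
NUMBER_WORDS = {"12": "twelve", "12th": "twelfth"}


def normalize(sentence):
    """Rewrite abbreviations and numbers into speakable words (toy rules)."""
    words = sentence.split()
    spoken = []
    for i, word in enumerate(words):
        key = word.lower()
        if key == "st.":
            following = words[i + 1] if i + 1 < len(words) else ""
            # Heuristic: "St." before a capitalized name is "Saint",
            # otherwise treat it as the end of a street name.
            spoken.append("Saint" if following[:1].isupper() else "street")
        elif key in ABBREVIATIONS:
            spoken.append(ABBREVIATIONS[key])
        elif key in NUMBER_WORDS:
            spoken.append(NUMBER_WORDS[key])
        else:
            spoken.append(word)
    return " ".join(spoken)


print(normalize("St. Paul rode his horse down 12th st. to Mr. Frank's store "
                "to buy a board 12 ft. long."))
```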

The article also mentions two other things worth covering. One of them relates to why the DTC01 has phone jacks, but I want to quickly cover the other one first, because many places that mention DECtalk also mention Calltext, which is most notable for other reasons.

Klatt actually worked at MIT as a professor and researcher, and DECtalk was a product built on technology he developed there. It wasn’t the only one. Speech Plus Inc. also licensed Klatt’s work for its Calltext line of computer speech products, which included the 5000 series of ISA cards. Calltext as a product didn’t go on to be as successful as DECtalk, but more people have probably heard a Calltext speak than a DECtalk, because a Calltext 5010 ISA card was what enabled Stephen Hawking to communicate after he lost the ability to speak himself. The voice Hawking used with the Calltext card was extremely similar to the DECtalk’s Perfect Paul voice, but had differently tuned parameters and rules for intonation. Hawking became accustomed to this voice over the years and preferred it over other options, to the extent that it was eventually emulated on a Raspberry Pi so he could keep using it with newer technology.

Now, about the phone jacks. That article goes into detail on uses of the DECtalk; by then it had been available for two years and had started to find its niche. At $4,000 a unit, and with its massive size, it was much too impractical all around to be used for assistive purposes, at least for now. Instead, it found use in information industries. Stock prices, medical records, bank balances, and plenty of other information can easily be provided over the phone, and the DECtalk did just that. And I can even demonstrate it.

In this setup I have two active phone lines that are able to call each other. I’ll connect one to a touch-tone phone and plug the other into the line phone jack of the DTC01. But I also need to switch to the host serial port instead of the terminal port. The host serial port operates differently from the terminal port and is better suited to the larger pieces of information you would want to convey over the phone. Primarily, it supports software flow control, which is critically important because the input buffer is very small, only 12 words according to the manual. The host port also supports additional commands. There are a bunch of these that we don’t have time to get into either, because of how feature-packed this thing is, but by sending a command string with an escape character prefix you can access even more features. With this we can fully control the DECtalk as a programmable phone. Let’s test this by using the command to dial the other line and send it a message.
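Software flow control here means the standard XON/XOFF scheme: the receiver sends XOFF (0x13) when it can't take more data and XON (0x11) when it can again, and the sender pauses in between. The Python sketch below simulates that handshake against a made-up receiver with a 16-byte buffer standing in for the DECtalk's tiny input buffer. It's a model of the protocol, not of the DTC01 firmware; with a real port, the host would read the XON/XOFF bytes from the serial line (pyserial can also handle them itself) rather than get them as return values. `ToyReceiver` and `send_with_flow_control` are hypothetical names.

```python
XON, XOFF = 0x11, 0x13  # standard software flow-control bytes (DC1, DC3)


class ToyReceiver:
    """Hypothetical device with a tiny input buffer. It answers XOFF when the
    buffer fills and XON once speaking has drained it."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = bytearray()
        self.spoken = bytearray()
        self.peak = 0  # largest buffer fill seen, to check nothing overflowed

    def receive(self, byte):
        self.buffer.append(byte)
        self.peak = max(self.peak, len(self.buffer))
        return XOFF if len(self.buffer) >= self.capacity else None

    def speak(self, n):
        # "Speak" (drain) up to n buffered bytes, then report whether there is room.
        self.spoken += self.buffer[:n]
        del self.buffer[:n]
        return XON if len(self.buffer) < self.capacity else None


def send_with_flow_control(data, device, drain=8):
    """Send bytes one at a time, pausing after XOFF until XON arrives."""
    paused = False
    for byte in data:
        while paused:
            # A real host would block here waiting for XON on the serial line.
            if device.speak(drain) == XON:
                paused = False
        if device.receive(byte) == XOFF:
            paused = True
    device.speak(len(device.buffer))  # let the device finish speaking


device = ToyReceiver(capacity=16)
message = b"Your balance is twelve dollars and fifty cents. " * 3
send_with_flow_control(message, device)
print(device.spoken == message, device.peak)  # True 16
```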

[:nk] Hello, I’m trying to reach a Mr. Ron, first name Moe?

“Please stick to the 7 digit numbers you’re used to”

Making phone calls isn’t even the impressive part, though; receiving them is where this becomes really cool. When the DECtalk receives a call, it sends a response to the computer to let it know a call is coming in. The computer can then send a message to the caller through the DECtalk. The DECtalk will even pass touch tones entered by the caller on to the computer, allowing you to program a menu. Now, I couldn’t easily find software to demonstrate this process, and it would be really time-consuming to set up my own for this video. But the DECtalk has a cool built-in feature we can use to experience this instead. After it is first turned on, and before it receives any commands or calls, you can call it and it will bring up its own system menu, which is a perfect demo of how this works. It also has a test message we can play to hear more about this unit.

[Dial phone.]

“Hello, this is DECtalk. The firmware is version 1.8. Press any key for audible echo. Press star to return to factory settings. Press sharp to run self tests.”

[Presses sharp]

“You pressed sharp. Enter test number. Terminate by sharp. Enter star to quit.”

[Enters 5, sharp]

“Test is speak a canned message. Enter pass count. Terminate by sharp. Enter star to quit.”

[Enters 5, sharp]

“Hello, this is DECtalk. The firmware is version 1.8. The code memories were generated on December 5th, 1983. The dictionary memories were generated on October 11th, 1983. There are 17,506 bytes free. If you can hear this, there is a good chance that your DECtalk is working. Passed”

[Hangs up]
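On the host side, a phone menu like this boils down to reading the touch-tone keys the DECtalk reports and applying the same conventions the built-in menu above uses: digits terminated by sharp (#), and star (*) to back out. Here's a hypothetical Python sketch of that logic with the keypresses given as a plain string; how the DTC01 actually reports each key over the host port isn't covered in this video, so that part is left out, and the function names are made up.

```python
def read_number(keys):
    """Collect digits until '#'. Returns (number or None, keys left over).

    '*' abandons the entry, and a '#' with no digits also yields None.
    """
    digits = ""
    for i, key in enumerate(keys):
        if key == "*":
            return None, keys[i + 1:]
        if key == "#":
            return (int(digits) if digits else None), keys[i + 1:]
        if key.isdigit():
            digits += key
    return None, ""  # caller hung up before pressing '#'


def self_test_menu(keys):
    """Walk the menu from the call above: '#' for tests, test number, pass count."""
    log = []
    if not keys.startswith("#"):
        return log
    test, keys = read_number(keys[1:])
    if test is None:
        return log
    log.append(f"test {test}")
    passes, keys = read_number(keys)
    if passes is not None:
        log.append(f"passes {passes}")
    return log


print(self_test_menu("#5#5#"))  # ['test 5', 'passes 5']
```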

This thing is from 1984 and fits on my desk; it is so cool that it has this kind of functionality built in. Now, this unit is not the kind of thing that would be put into mass service for this kind of use. They actually made rack-mount versions that could hold many modules to handle many concurrent calls for larger customers. Now another part about this…

[:np] Hey!

Uh, yes?

[:np] When are we going to get to the good part?

I thought this all has been pretty good, so I’m not sure wh…

[:np] You know what I mean, I want to [:dv ap 200] sing!

Oh, alright, we can do that.

If you know DECtalk from something more recent, it is probably from its ability to sing. The real intent behind this feature was to let you manually pronounce words that may be difficult for it to say. Like, uh, maybe someone’s old channel name.

[:np] AkBKukU

That was actually better than I would have guessed, but we can do better by specifying the phonemes we want directly.

[:np][ae<100>][k<50>][aa<100,100>][b<50>][aa<100>][k<50>][uw<100,110>][k<50>][uw<100,110>]

That is much better, and we can even slow it down so you can hear all the different parts as it says them:

[:np][ae<400>][k<400>][aa<400,100>][b<200>][aa<400>][k<200>][uw<400,110>][k<200>][uw<400,110>]

The phoneme commands have several components, most of them optional. First is the phoneme itself. There is a list of all the possible phonemes in the manual for the device that you can reference. The old channel name sounds like this with just those:

[:np][ae][k][aa][b][aa][k][uw][k][uw]

The second part is the duration in milliseconds, which goes between angle brackets. This is simply how long the sound is held, and it lets you pace out the enunciation of the word more naturally. Here is the channel name with durations added:

[:np][ae<100>][k<50>][aa<100>][b<50>][aa<100>][k<50>][uw<100>][k<50>][uw<100>]

That’s really close now, but we can do better with one more control: pitch. This goes after the duration, separated by a comma, and sets the average pitch the sound is spoken at, either as a frequency in Hz or as a musical note number. Put it all together and we get:

[:np][ae<100>][k<50>][aa<100,100>][b<50>][aa<100>][k<50>][uw<100,110>][k<50>][uw<100,110>]

Which sounds pretty much dead on.
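If you're generating these strings from a program rather than typing them, a small formatter keeps the brackets straight. This Python sketch (the helper names `phoneme` and `spell` are hypothetical) follows only the three-part structure described above (name, optional duration, optional pitch after a comma) and passes the numbers through without checking them against the ranges in the manual.

```python
def phoneme(name, duration_ms=None, pitch=None):
    """Format one [name<duration,pitch>] group; duration and pitch are optional."""
    if duration_ms is None:
        if pitch is not None:
            raise ValueError("pitch goes after a duration in this syntax")
        return f"[{name}]"
    inside = f"{duration_ms}" if pitch is None else f"{duration_ms},{pitch}"
    return f"[{name}<{inside}>]"


def spell(parts, voice="np"):
    """Join (name, duration, pitch) tuples into one line with a voice prefix."""
    return f"[:{voice}]" + "".join(phoneme(*part) for part in parts)


NAME = [("ae", 100), ("k", 50), ("aa", 100, 100), ("b", 50), ("aa", 100),
        ("k", 50), ("uw", 100, 110), ("k", 50), ("uw", 100, 110)]
print(spell(NAME))  # matches the full channel-name line above
```

Keeping each phoneme as a tuple also makes it easy to tweak one value, such as a single pitch, without retyping the whole line.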

There are even a few other things you can do like pitch slides:

[:np][_<1000,60>aa<2000,440>].

And all of these combined allow you to make the DECtalk sing, if you are willing to put in the work to program the voice. Now, there are lots of videos out there already that demonstrate what this sounds like, but I’ll go ahead and do my own short cover here as a small example. It didn’t turn out quite as well as I would have liked, but getting it to match something in particular takes a lot of trial and error.

[:nv][:dv gn 73][WEH<150,18>NAY<150>FAY<150>N<100>D<50>MAY<300>SEH<150,20>
L<100>F<50>IH<200,14>N<100>TAY<150,18>M<100>Z<50>AH<200>V<100>TRAH<300,23>
BEL<300,25>MAH<300,27>DHRR<300>MEH<300>RIY<300,25>KAH<150>M<100>Z<50>
TUW<300,23>MIY<600>

SPIY<300,27>KIH<200>NX<100>WRR<200,28>D Z<100>AH<200,27>
V<100>WIH<200>Z<100>DAH<200,25>M<100>_<300>LEH<150,27>DXIH<100,25>T<50>
BIY<600>IY<1200,23>]

Here’s another example, a more well-known song I found online. Again, this is not perfect, but you can still get a feel for it:

[:nv][:dv gn 73][WEH<150,18>NAY<150>FAY<150>N<100>D<50>MAY<300>SEH<150,20>
L<100>F<50>IH<200,14>N<100>TAY<150,18>M<100>Z<50>AH<200>V<100>TRAH<300,23>
BEL<300,25>MAH<300,27>DHRR<300>MEH<300>RIY<300,25>KAH<150>M<100>Z<50>
TUW<300,23>MIY<600>

SPIY<300,27>KIH<200>NX<100>WRR<200,28>D Z<100>AH<200,27>
V<100>WIH<200>Z<100>DAH<200,25>M<100>_<300>LEH<150,27>DXIH<100,25>T<50>
BIY<600>IY<1200,23>]

Oh, that didn’t sound quite right and there was more than just bad singing going on there. I think I have a solution, but just give me one moment.

[:nv][:dv gn 73][WEH<150,18>NAY<150>FAY<150>N<100>D<50>MAY<300>SEH<150,20>
L<100>F<50>IH<200,14>N<100>TAY<150,18>M<100>Z<50>AH<200>V<100>TRAH<300,23>
BEL<300,25>MAH<300,27>DHRR<300>MEH<300>RIY<300,25>KAH<150>M<100>Z<50>
TUW<300,23>MIY<600>

SPIY<300,27>KIH<200>NX<100>WRR<200,28>D Z<100>AH<200,27>
V<100>WIH<200>Z<100>DAH<200,25>M<100>_<300>LEH<150,27>DXIH<100,25>T<50>
BIY<600>IY<1200,23>]

That’s better! But it wasn’t running on the DTC01, and I don’t actually know what causes those problems. Before I got the DTC01, though, I got a DECtalk Express, a much later portable model from 1994 that was sold specifically as an assistive device. But, uh, mine’s just a bare PCB. I think these might have been getting shucked and put into other equipment to add voice synthesis easily, and at $1,200 it was a better deal. We don’t really have time to get into all of the differences, but the DECtalk interface protocol continued to advance over time, adding new phoneme structures and more features, like phone dialing as a native text command. The Express has a lot of these improvements and a whopping 1MB buffer for incoming text, which eliminates almost all flow control concerns. It is a much more capable unit overall, and I have it set up as a TTS reward when I stream on Twitch, which has been stupid amounts of fun.

Well, that covers a lot of the history of the DECtalk and gives what I hope is a revealing, in-depth look at the original DTC01 model, which hasn’t had a lot of coverage. I was initially interested in these devices for the novelty of the singing and the connection to Stephen Hawking. But as I have experienced them first-hand, I have gained a new appreciation for just how powerful they really are. They weren’t cheap, they weren’t first, but they were the best, and they still hold up well today, in an age where many of us interact with devices that do both speech synthesis and recognition. I hope you enjoyed this look at the DECtalk, and if you did, you may want to subscribe. If you want to help support the channel, you can find me on Patreon or pick up one of my shirt designs. But that’s it for now, and I’ll see you next time.
