Two of the instruments covered earlier in this series on the history of early synthesis came directly from the experiences of engineers Lev Termen and Maurice Martenot, who became fascinated during WWI with electronics and their musical possibilities. This article, however, focuses on the work of a man whose inventions revolutionized long-distance communication and cryptography: Homer Dudley. As far as we can tell, his inventions were never intended for musical or artistic purposes; they were technologies that aided radio and telephone communication, as well as speech synthesis. Artists and musicians later took his technologies and recontextualized them for their own ends.
Spoken communication is one of the things that many people contend makes us human: it separates us from animals, which lack varied and articulated auditory communication. While mating songs and territorial howls do transmit information, the subtlety and complexity of thought expressed by the voice is, to some, the pinnacle of intelligence. This fundamental aspect of interaction is so intrinsic to us that one of the easiest ways to make something seem self-aware is to give it the capability of spoken communication.
Some of the early examples of so-called constructed speech were more or less parlor tricks, elaborate disguises more akin to puppetry than technology. Like the Mechanical Turk, a false chess-playing automaton that concealed an operator inside the machine, these speech machines were elaborate ways to disguise and manipulate the voice of a hidden speaker. However, advancements such as Wolfgang von Kempelen's "Acoustic-Mechanical Speech Machine," which used hand-pumped bellows, and Joseph Faber's organ-like "Euphonia," which added further complexity and control, paved the way for future developments. Today, synthetic speech is commonplace, heard everywhere from Apple's Siri to AI reconstructions of the voices of famous people, many pieced together from recordings without the blessing or consent of the person or their loved ones. The jump from puppetry to air forced through bellows to fully synthesized speech came from Homer Dudley.
Dudley & the Vocoder

Homer Walter Dudley, born in 1896, started his career as a teacher, but changed paths after growing frustrated with student behavior, and after his experiences during WWI. After completing his bachelor's degree in Electrical Engineering at Pennsylvania State College, he landed a job at Bell Labs, which at the time functioned as the research division of AT&T. Over his 40-year career with Bell Labs, he was responsible for 37 patents, making him a standout engineer at one of the most prestigious engineering institutions in American history.
By 1926, Dudley was enmeshed in research into the field of speech, hoping to recreate the sound of a human voice electronically. His research led him to what is called the source-filter model of speech: sound produced by the vibrating vocal folds (the source) is shaped by the resonances of the throat, mouth, and nose (the filter). In synthesis terms, this is a form of subtractive synthesis.
While it may sound strange to describe human speech as subtractive synthesis, the mechanics are remarkably similar. Speech is built from voiced and unvoiced sounds: vowels are voiced, while many consonants are unvoiced. In synthesis terms, these two building blocks of speech are analogous to a sawtooth oscillator and a noise source. For voiced sounds, such as an “a” or “o”, the vocal folds of the throat vibrate, or put another way, oscillate, creating the base frequency of a speech sound. Viewed on an oscilloscope, this vibration looks much like a sawtooth waveform. Unvoiced sounds, such as the sound of a “p” (known as a plosive) or an “s” (known as a fricative), are created by sharply forcing air through the mouth, producing a noisier sound than the vibration of the vocal folds.
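To make the analogy concrete, here is a minimal sketch of the two source signals described above, written in plain Python. The sample rate, pitch, and function names are illustrative choices, not anything from Dudley's work: a naive sawtooth stands in for vocal-fold vibration, and uniform random noise stands in for unvoiced airflow.

```python
import random

SAMPLE_RATE = 8000  # Hz; a deliberately low rate to keep the example small

def sawtooth(freq, n_samples, sr=SAMPLE_RATE):
    # Naive sawtooth ramp in [-1, 1): a stand-in for the vocal folds' vibration.
    return [2.0 * ((i * freq / sr) % 1.0) - 1.0 for i in range(n_samples)]

def white_noise(n_samples, seed=0):
    # Uniform noise in [-1, 1]: a stand-in for unvoiced (plosive/fricative) airflow.
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

voiced = sawtooth(110.0, 800)   # roughly a low speaking pitch
unvoiced = white_noise(800)
```

Run these two signals through the same filter and only the voiced one will come out sounding pitched, which is exactly the distinction the vocal tract exploits.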

The different parts of the mouth work in tandem to turn these raw sounds into intelligible speech through filtering. Each speech sound is identifiable by its formants, the resonant peaks within its spectrum. The human tongue and jaw articulate in order to shape sounds into speech, moving the cutoff frequency and resonance of this natural filter to turn the raw movement of air and muscle into the most human method of communication.
By 1928, Dudley had begun work on the vocoder, whose name is a portmanteau of VOice and enCODER. Its first applications were in transmitting voice over telephone lines, allowing speech to be deconstructed and reconstructed both remotely and electronically, a breakthrough for telephone technology.
For more information on vocoders, see this article, but we’ll quickly summarize. A vocoder is made up of two matched banks of bandpass filters, one for analysis and the other for synthesis. It takes two input signals, a carrier and a modulator, and maps the spectral characteristics of the modulator onto the carrier. Typically, the modulator is a human voice, and the carrier is either a sawtooth oscillator or a noise source. A sawtooth is chosen over the other common periodic waveforms, such as triangle or square, because of its harmonic content.
The harmonic content of a waveform is how our ears distinguish one sound from another, often called a sound’s timbre. A sine wave contains only a fundamental frequency, with no overtones, which is why it sounds the most “pure,” and why it is typically undesirable as a source for subtractive synthesis. Triangle and square waves both contain only odd harmonics, but the difference in the amplitudes of those harmonics lets us tell them apart: a square wave’s harmonics roll off more gently, leaving more high-frequency energy and making it sound “buzzier,” while the steeper roll-off of a triangle wave’s harmonics sounds less harsh. A sawtooth waveform contains both even and odd harmonics, making it the best candidate among the common waveforms for carrying the spectral characteristics of a human voice.

Noise contains energy at all frequencies within the audible spectrum, but lacks a fundamental for those frequencies to relate to, making it sound unpitched. The most common forms of noise are white noise and pink noise, the latter created by low-pass filtering white noise at a slope of -3dB/oct, removing energy from the higher frequencies. Pink noise is often used in noise machines, even those labeled as white noise, because the filtering makes it sound less harsh to our ears. Different types of noise contain different amounts of energy throughout the frequency spectrum, which are detailed in this article.
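The differences between these waveforms can be stated numerically. The following sketch tabulates the standard Fourier-series harmonic amplitudes for the three waveforms discussed above (the function name and normalization are our own, for illustration):

```python
def harmonic_amplitudes(wave, n_harmonics=8):
    # Relative Fourier-series magnitudes, normalized so the fundamental is 1.0.
    amps = []
    for n in range(1, n_harmonics + 1):
        if wave == "sawtooth":
            amps.append(1.0 / n)                       # every harmonic, falling as 1/n
        elif wave == "square":
            amps.append(1.0 / n if n % 2 else 0.0)     # odd harmonics only, 1/n
        elif wave == "triangle":
            amps.append(1.0 / n**2 if n % 2 else 0.0)  # odd harmonics only, 1/n^2
        else:
            raise ValueError(f"unknown waveform: {wave}")
    return amps
```

Comparing the fifth harmonic of a square wave (1/5) against a triangle wave (1/25) shows why the square sounds buzzier: its upper harmonics carry far more energy. Only the sawtooth row has nonzero entries at the even harmonics, which is why it is the richest raw material for a filter.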
The analysis bank of bandpass filters splits the incoming audio into a number of frequency bands, typically between eight and 20. The vocoder then turns the amplitude, or loudness, of each frequency band into a control signal using an envelope follower. The most emphasized parts of speech, i.e. the formants, impart more energy into their frequency bands, producing higher-amplitude envelopes.
The sawtooth or noise carrier signal is routed into the synthesis filter bank, and the envelopes from the analysis bank control the amplitudes of the correspondingly-tuned filters in the synthesis bank. This takes the overall harmonic character of one sound and transfers it to another. Breaking a sound down into a discrete number of bands means that precision and information are lost in reproduction, which is most noticeable to humans in the case of speech. However, by deconstructing the voice into separate, reproducible bands of frequencies, it became much easier to transmit over a single telephone line.
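The analysis/synthesis process above can be sketched in a few dozen lines of Python. This is a deliberately tiny channel vocoder, not Dudley's circuit: the band centers, filter Q, and envelope smoothing are illustrative choices, and random noise stands in for a recorded voice as the modulator.

```python
import math
import random

SR = 8000  # Hz

def bandpass(x, center, q=5.0, sr=SR):
    # Biquad band-pass filter (RBJ "constant 0 dB peak gain" form).
    w0 = 2.0 * math.pi * center / sr
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = (b0 * xn + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def envelope(x, smooth=0.99):
    # One-pole envelope follower on the rectified signal.
    env, e = [], 0.0
    for xn in x:
        e = smooth * e + (1.0 - smooth) * abs(xn)
        env.append(e)
    return env

def vocode(modulator, carrier, centers):
    # Per band: measure the modulator's energy, impose it on the carrier's band.
    out = [0.0] * len(carrier)
    for fc in centers:
        env = envelope(bandpass(modulator, fc))
        band = bandpass(carrier, fc)
        for i, (b, e) in enumerate(zip(band, env)):
            out[i] += b * e
    return out

# Noise stands in for a voice recording; a 110 Hz sawtooth is the carrier.
rng = random.Random(1)
modulator = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
carrier = [2.0 * ((i * 110.0 / SR) % 1.0) - 1.0 for i in range(2000)]
bands = [300, 600, 1200, 2400]  # a tiny 4-band bank; real vocoders use 8-20
output = vocode(modulator, carrier, bands)
```

With only four bands the result is crude, which mirrors the real trade-off: fewer bands mean less bandwidth to transmit, but also less intelligible speech on reconstruction.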
The VODER
Dudley wanted not only to reproduce the sound of a voice, but also to synthesize it from scratch. This led to the development of the VODER, or Voice Operation DEmonstratoR. To an outside observer, the final machine operated much like a piano or organ, with a series of keys and a foot pedal. However, these keys did not change the pitch of the VODER. It included a noise source and a sawtooth oscillator whose base pitch could be switched for higher or lower speech sounds, and manipulation of the foot pedal allowed for subtle changes in inflection and expression.
A wrist bar selected between the oscillator and the noise source, and the keys routed either sound source to a set of 10 bandpass filters. Ten white keys routed the sound sources to the filters, while three specialized black keys produced plosives, or “stop consonants,” such as “D”, “P”, and “K”. The cutoff frequencies of these filters were carefully selected to sit in the range of the formants, and different combinations of them produced basic speech sounds. The virtuoso of the VODER was a woman named Helen Harper, who mastered it after a year of dedicated and careful practice. She trained the other operators, all of whom were women; of the over 300 applicants, fewer than 30 were able to operate it intelligibly. After years of research and development, Dudley brought the machine to the 1939 New York World's Fair. Helen Harper said during the demonstration:
“In producing the word ‘concentration’ on the VODER, I have to form thirteen different sounds in succession and make five up and down movements of the wrist bar and vary the position of the foot pedal from three to five times according to what expression I want the VODER to give the word. And of course, all this must be done with exactly the correct timing.”

[Above: user controls of the VODER]
During WWII, the United States government contracted Bell Labs to create a new way to encrypt communications. The only practical way to relay information over long distances with any reliability was radio, but broadcasting information over the airwaves is an incredibly insecure method of communication, even with the rudimentary forms of encryption available at the time. Early scrambling methods involved inverting portions of the audio spectrum in order to render the signal unintelligible to listeners. However, these methods were crude, and could be reverse-engineered if intercepted by those with the right technology. Dudley's work in telecommunications proved invaluable, and the vocoder became part of a larger project known interchangeably as Project X, Green Hornet, and SIGSALY.
The Vocoder as a Musical Instrument
Although Bell Labs' primary concern was not music, there was some recognition of the vocoder's musical possibilities. One of the first recorded musical applications of the vocoder was a rendition of the old parlor song “Love's Old Sweet Song”, with orchestral accompaniment. There is little indication that this was seen as much more than an oddity, and its potential would not be fully explored until decades later.
As the military moved on to other forms of encryption, the vocoder began to emerge more fully into the artistic realm. Homer Dudley visited Bonn University in 1948, where he met Werner Meyer-Eppler, head of the Phonetics department. Meyer-Eppler recognized the possibilities of speech synthesis not only for communication and accessibility, but also for musical purposes. He, along with Robert Beyer and Herbert Eimert, proposed an electronic music studio to the Nordwestdeutscher Rundfunk (NWDR). His most influential student, both at the university and at the electronic studio, was Karlheinz Stockhausen.
Wendy Carlos' frequent collaboration with Bob Moog led to the development of the Moog vocoder. The first iteration was not hardwired, but manually patched by Carlos using a pair of modified fixed filter banks. It is heard most clearly on the score for Stanley Kubrick's A Clockwork Orange. Carlos would comment years later on her Secrets of Synthesis album:
“I asked Bob Moog to put together something like a Vocoder using his standard modules. We originally called it a Spectrum Encoder Decoder, what a terrible name. Of course, Vocoders may be well known now, but they were hardly on everyone’s lips in those days.”
And on a section of her site entitled Vocoder Questions, an excerpt from an interview with Kurt B. Reighley:
Question #1 -- What attracted you initially to the vocoder? What what [sic] it about the timbre/application/etc. of this device that made you want to utilize it in your work?
“Reading about it in tech journals. It seemed like magic! Then I got a chance to try one at the NY World's Fair of 1964-65, at the Bell Labs pavilion. I was hooked! When I figured how to make it not just speak, but sing, it earned an assured place in a forthcoming new album. You know, it seemed an exciting idea to share! The first reactions were unanimous: everyone hated it! A playing synth was bad enough, but a "singing" synth? Too much, turn it off! Thus Timesteps was born, to "ease into" the first experience most folks would have with a "singing machine"... All to [sic] easy to forget this history now.”
From there, its use spread to bands such as ELO, Kraftwerk, and Afrika Bambaataa. It’s an iconic part of electronic music, and led to the eventual development of the ubiquitous Auto-Tune.
Today
Harald Bode's 1978 Bode 7702 Vocoder was licensed to Moog, and is what is most commonly recognized as the Moog vocoder. EMS produced a number of vocoders, including the 2000, 3000, and 5000, but as with many of EMS's lesser-known products, documentation remains scant to this day. In 2020, Moog reissued their classic 16-band, Bode-style vocoder, referenced directly from the original schematics. Moog's recent announcement of the Spectravox continues the legacy of developing and proliferating the vocoder for musical purposes. Other modern vocoders include the GRP V22, AnalogFX VXC-2220, Arturia MicroFreak, and, of course, the Korg microKORG.
While Dudley's contributions to music in the form of the vocoder cannot be overstated, the lesser-known VODER is in many ways more influential in the modern age. Dudley's early work informed future researchers in speech synthesis. Although Dudley left Bell Labs in the early 1960s, his colleagues carried on his work. In 1961, John Larry Kelly Jr. and Louis Gerstman demonstrated one of the earliest examples of computer-based speech synthesis on an IBM 704. Kelly and Carol Lochbaum programmed it to sing the now-iconic song "Daisy Bell," instantly recognizable to the Kubrick fans reading this. Arthur C. Clarke, the writer of 2001: A Space Odyssey, was at this demonstration, and it made its mark on the film: as Dave finally confronts the HAL 9000 computer head-on and manually disassembles it, it sings a slowly degrading version of "Daisy Bell." (If this is a spoiler, finish this article and go watch 2001. How have you not seen it? It’s 2024.)
The source-filter model of speech synthesis influenced later techniques, including the Linear Predictive Coding (LPC) used in Texas Instruments' LPC speech chips, heard most famously in the Speak & Spell. Eurorack modules such as Mutable Instruments' Braids and Plaits, as well as Synthesis Technology's Circuit Bent VCO (all since discontinued), use a modified version of this algorithm. A manual, rudimentary form of speech synthesis comes in the form of formant filters, such as 2HP's Vowel, Doepfer's A-104, and Rare Waves' Grendel Formant Filter.
The artificial construction of speech from its basic acoustic building blocks laid the groundwork for modern accessibility, from giving voice to those rendered unable to speak, such as Stephen Hawking, to the screen readers that blind and visually impaired users rely on to navigate the Internet. Vocoders and speech synthesis shaped the world we live in today, all thanks to a former high school teacher who purportedly got fed up with unruly students and changed history.