The Art of Demonic Voice Design: More Than Just Pitching Down
The most famous demon voice in cinema history was not created with a computer. In 1973, actress Mercedes McCambridge recorded the voice of the demon Pazuzu for The Exorcist using methods that sound almost as disturbing as the film itself. She chain-smoked cigarettes to destroy her vocal clarity, swallowed raw eggs with pulpy apple chunks to produce the sounds of violent expectoration, had a scarf pulled tight around her neck to the point of near-strangulation to create groaning, strained textures, and was bound to a chair so her physical struggling would translate into audible strain. Her chronic bronchitis provided the wheezing quality that made the possession sequences so unsettling. Director William Friedkin then recorded her lines multiple times and layered them on top of each other, dubbing in separate laughs, cries, and animal noises to create a voice that sounded like multiple entities inhabiting the same throat simultaneously. The wailing sound just before Pazuzu is driven out was based on a keening sound McCambridge had once heard at an Irish wake. She asked her priest to be on standby during the recording sessions for counseling. That performance set the standard that every demon voice since has been measured against.
What makes McCambridge's work instructive for understanding demon voice generation is that it was already, even in 1973, a multi-layered process. A single vocal performance was never enough. The final demon voice in The Exorcist is the product of at least three to five simultaneous techniques working in concert, and that principle has not changed in fifty years. Modern demon voice design still depends on stacking multiple processing layers, each addressing a different psychological dimension of what makes a voice sound inhuman and threatening.
The first layer is always pitch shifting, but not the crude kind. A naive pitch shift just moves everything down one or two octaves, which makes a voice sound slow and dopey rather than demonic. Professional demon voice processing uses formant-preserved pitch shifting, which drops the fundamental frequency while keeping the resonant characteristics of the vocal tract intact. This is the difference between a voice that sounds like a slowed-down recording and a voice that sounds like a very large, very wrong creature that is genuinely speaking. The formants, the resonance peaks that our brains use to identify vowels and recognize speech as human, stay roughly in their original positions even as the pitch drops. The result is deeply unsettling because your brain receives contradictory signals: the pitch says this throat is impossibly large, but the formants say it is shaped like a human throat.
The second layer is voice doubling and layering. The same phrase gets generated or recorded multiple times with slight variations in pitch, timing, and tonal quality, then stacked together. This is exactly what Friedkin did with McCambridge's recordings, and it creates the sense of multiple voices speaking in unison that is central to the possession archetype. When the offset between layers is very small, under 30 milliseconds, the effect is a chorus-like thickening that makes the voice sound larger than any single source could produce. When the offset is larger, you get the classic two-voices-at-once effect where you can almost but not quite pick out the individual speakers.
The third layer is reverse reverb, one of the most powerful techniques in horror audio design. Normally, reverb follows a sound: you hear the voice first, then the decay trail as it bounces around a space. Reverse reverb inverts this. The reverb tail precedes the actual voice, creating a swell of ambient energy that resolves into speech. The psychological effect is profound because it sounds like the voice is arriving from somewhere else, like it is being projected into the room rather than originating within it. This technique was used in The Exorcist and has appeared in virtually every horror film since. The 2024 film Smile 2 used it extensively for its demon vocal processing, as did A24's 2026 audio-focused horror film Undertone, which builds its entire premise around manipulated sound including backmasked audio that reveals hidden messages when reversed.
The fourth layer is granular synthesis, a technique that breaks the voice signal into tiny fragments called grains, typically between 1 and 100 milliseconds long, and rearranges, stretches, or scatters them. This can produce textures that range from subtle grit, like sand embedded in the vocal cords, to complete disintegration where the voice dissolves into a swarm of particles and reassembles. Granular processing is what gives many modern demon voices their organic-yet-alien texture. Tools like Krotos Dehumaniser use up to eleven simultaneous layers of manipulation including granular synthesis, convolution with animal recordings, and throat modeling to build demon voices for AAA game titles like Far Cry 4.
The fifth layer is sub-bass enhancement, adding frequency content below 80 Hz that you feel in your chest more than you hear with your ears. This exploits the phenomenon of infrasound, frequencies below the normal threshold of human hearing that have been documented to produce feelings of unease, dread, and even the sensation of a presence in the room. Many haunted house attractions use sub-bass speakers specifically for this effect, and professional demon voice design incorporates it to make the voice feel physically threatening.
Different demon archetypes call for different combinations of these techniques. The classic possession voice, inspired by The Exorcist, emphasizes heavy layering with contradictory vocal textures, formant-preserved low pitch, and reverse reverb to suggest an entity controlling a human body. The ancient evil archetype, voices like Sauron or the Balrog, focuses on massive scale through extreme sub-bass, slow granular stretching, and cavernous reverb spaces that make the voice sound geologically old. The seductive demon or succubus voice takes a different approach entirely, using subtle breathiness, gentle pitch modulation, and just enough reverse reverb to create allure with an undercurrent of wrongness. Cosmic horror voices, the Lovecraftian eldritch entities, lean hard on granular disintegration and spectral processing that makes the voice sound like it exists in more dimensions than three. And the comedic demon, think Beetlejuice or Hazbin Hotel's Alastor, plays with exaggerated versions of these effects in ways that are clearly performative, using dramatic pitch drops and theatrical distortion that signal entertainment rather than genuine threat.
TwoShot's demon voice generator gives you access to this full spectrum. You are not applying a single filter to a text-to-speech output. The AI understands the layered architecture behind convincing demonic voices and applies the appropriate combination of techniques based on the archetype and intensity you describe.