The Static is Watching
Speech, Glitch and Ghosts on the ESP32
I am starting to build what I hope will become a haunted audio device based on the ESP32-S3. I want it to produce glitchy sonic output, like a radio pushing out static, interspersed with lo-fi speech synthesis. It won’t just beep. It will whisper strange things it pulls from the air, such as the actual Wi-Fi SSIDs of nearby networks, cross-pollinated with artificial sonic fragments.
So far I have set this up with:
ESP32-S3 DevKit board
MAX98357A I2S DAC amplifier
A Mozzi fork (or native I2S AudioOutput)
SAM: Software Automatic Mouth ported to ESP32
Small speaker
Breadboard, wires
What It Does Currently
Scans for Wi-Fi networks
Extracts the SSIDs
Randomly converts some into speech using SAM
Runs a glitch oscillator underneath (which I am trying to make sound something like an Atari Punk Console with background radio interference)
Occasionally inserts its own weird messages (that is the intention, at least)
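For the glitch oscillator, one possible starting point is to model the Atari Punk Console idea directly in samples: a steady clock fires a one-shot pulse, and the one-shot ignores new triggers until it has finished, which is where the stepped, divided-down pitches come from. This is only a rough sketch. GlitchOsc, its fields and the amplitude values are hypothetical names and numbers chosen for illustration, and the hiss is plain pseudo-random noise rather than real radio interference.

```cpp
#include <cstdint>

// Hypothetical sample-domain sketch of an Atari Punk Console style voice:
// an astable clock triggers a one-shot pulse, with pseudo-random hiss mixed in.
struct GlitchOsc {
    uint32_t clockPeriod;              // astable period, in samples
    uint32_t pulseLength;              // one-shot (monostable) length, in samples
    uint32_t clockPos = 0;             // position inside the current clock period
    uint32_t pulseLeft = 0;            // samples remaining in the current pulse
    uint32_t noiseState = 0x1234567u;  // any non-zero seed works for xorshift

    GlitchOsc(uint32_t period, uint32_t pulse)
        : clockPeriod(period ? period : 1), pulseLength(pulse) {}

    int16_t next() {
        // Clock tick: fire the one-shot only if the previous pulse has ended.
        // Pulses longer than the clock period therefore skip ticks and
        // divide the pitch down in steps.
        if (clockPos == 0 && pulseLeft == 0) pulseLeft = pulseLength;
        clockPos = (clockPos + 1) % clockPeriod;

        const int32_t tone = (pulseLeft > 0) ? 12000 : -12000;
        if (pulseLeft > 0) --pulseLeft;

        // xorshift32 noise, kept much quieter than the tone.
        noiseState ^= noiseState << 13;
        noiseState ^= noiseState >> 17;
        noiseState ^= noiseState << 5;
        const int32_t hiss = static_cast<int32_t>(noiseState & 0x0FFF) - 2048;

        return static_cast<int16_t>(tone + hiss);  // stays within the int16_t range
    }
};
```

Calling next() once per output sample gives a signed 16-bit stream. Sweeping clockPeriod and pulseLength (from pots, for example) is what should produce the characteristic jumps in pitch.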
The result, I hope, will soon be a drifting, haunted-sounding sonic loop that mutters fragments like:
“VIRGIN666 underscore 2G”
“GHOST OPEN NETWORK”
“STATIC IS WATCHING”
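To get from a raw SSID to something speakable like the fragments above, the text probably needs a clean-up pass before it reaches the synthesiser. The sketch below is one way that could look. ssidToSpeech is a hypothetical helper, not part of SAM or any Wi-Fi library, and the choice to upper-case everything and to spell out only "_" and "-" is an assumption.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: turns a raw SSID into upper-case words with
// punctuation spelled out, so "_" is spoken as "UNDERSCORE".
std::string ssidToSpeech(const std::string& ssid) {
    std::string out;
    // Adds one separating space unless at the start or already after a space.
    auto pad = [&out]() {
        if (!out.empty() && out.back() != ' ') out += ' ';
    };
    for (unsigned char c : ssid) {
        if (std::isalnum(c)) {
            out += static_cast<char>(std::toupper(c));
        } else if (c == '_') {
            pad();
            out += "UNDERSCORE ";
        } else if (c == '-') {
            pad();
            out += "DASH ";
        } else {
            pad();  // spaces and any other symbol become a word break
        }
    }
    // Drop any trailing space left by the last word break.
    while (!out.empty() && out.back() == ' ') out.pop_back();
    return out;
}
```

With this, ssidToSpeech("Virgin666_2G") returns "VIRGIN666 UNDERSCORE 2G", and the result's c_str() can be handed to whatever speech call the sketch uses.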
The Voice of the Machine
SAM’s output is currently garbled and difficult to decipher. I want to make the voice clearer, but I still want it to spill out wrongly clipped consonants, spectral vowels and barely decipherable fragments of speech. It will hopefully sound like a broken numbers station. I have routed its output to a MAX98357A, onto which I soldered the header pins and speaker terminals, and I am currently using I2S rather than PWM for cleaner audio.
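One place where both the garbling and the deliberate roughness can be controlled is the conversion from SAM's samples to what the I2S amplifier expects. The sketch below assumes the SAM port hands back unsigned 8-bit PCM (worth checking against the port in use) and that the I2S side is configured for signed 16-bit samples. pcm8ToI2s16 and its keepBits parameter are hypothetical names introduced here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: converts unsigned 8-bit PCM (assumed SAM output format)
// into signed 16-bit samples for an I2S DAC. keepBits below 8 throws away
// low-order bits on purpose, for a cruder, more broken voice.
void pcm8ToI2s16(const uint8_t* in, int16_t* out, size_t count, unsigned keepBits = 8) {
    if (keepBits < 1) keepBits = 1;
    if (keepBits > 8) keepBits = 8;
    const uint8_t mask = static_cast<uint8_t>(0xFF << (8 - keepBits));
    for (size_t i = 0; i < count; ++i) {
        const uint8_t crushed = in[i] & mask;
        // 128 is silence in unsigned 8-bit audio: re-centre on zero, then scale up.
        out[i] = static_cast<int16_t>((static_cast<int>(crushed) - 128) * 256);
    }
}
```

If the voice comes out as harsh noise rather than merely lo-fi, a signed/unsigned mismatch at this step is one thing to rule out, since feeding unsigned bytes straight into a signed output distorts heavily.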
Here’s some of the core logic:
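// Calling random() inside the condition re-rolls the gap on every pass of loop(),
// which pulls the wait towards 5 s, so the next gap is picked once and stored.
static unsigned long nextSpeakDelay = random(5000, 15000);
if (millis() - lastSpeakTime > nextSpeakDelay) {
  const char* phrase = getHauntedSSIDorMessage(); // chooses either a scanned SSID or a custom glitch phrase
  SAM_Say(phrase);
  lastSpeakTime = millis();
  nextSpeakDelay = random(5000, 15000); // pick the gap before the next utterance
}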
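The phrase chooser itself is not shown above, so the following is only a guess at its shape, not the real getHauntedSSIDorMessage(). pickPhrase, its parameters and the 25 percent default are all hypothetical. The random number is passed in by the caller (on the ESP32 that could come from random()), which keeps the function deterministic and testable off the board.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the phrase chooser. `roll` is a random number
// supplied by the caller. ownMessagePercent is how often the device speaks
// one of its own messages instead of a scanned SSID.
std::string pickPhrase(const std::vector<std::string>& ssids,
                       const std::vector<std::string>& messages,
                       unsigned long roll,
                       unsigned ownMessagePercent = 25) {
    const bool wantOwn = (roll % 100) < ownMessagePercent;
    // Prefer the chosen pool, but fall back to the other one if it is empty.
    const auto& first = wantOwn ? messages : ssids;
    const auto& second = wantOwn ? ssids : messages;
    const auto& pool = first.empty() ? second : first;
    if (pool.empty()) return std::string();  // nothing to say yet
    // Use the remaining digits of the roll to pick an entry from the pool.
    return pool[(roll / 100) % pool.size()];
}
```

An empty return value means there is nothing to speak, which the caller would need to check before passing the result's c_str() on to the speech call.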
Glitch + speech = unease.
What next?
Integrate pots and capacitive touch to trigger different phrases
Add battery power and a portable project box to house this for field playback
Send audio data to TouchDesigner via OSC over a LAN for reactive visuals
Add a low-speed FM scan for real audio bleed from the ether
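For the TouchDesigner idea, OSC is normally used for control values rather than raw audio, so the sketch below assumes a single level value (for example, the current loudness) is what gets sent. It builds one OSC message by hand following the OSC 1.0 layout: a zero-padded address, a zero-padded type tag string, then a big-endian float. oscFloatMessage and the "/haunt/level" address in the note are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Appends an OSC string: the characters, a terminating zero,
// then zero padding up to a multiple of four bytes.
static void appendOscString(std::vector<uint8_t>& buf, const std::string& s) {
    buf.insert(buf.end(), s.begin(), s.end());
    buf.push_back(0);
    while (buf.size() % 4 != 0) buf.push_back(0);
}

// Hypothetical helper: builds one OSC message carrying a single float32
// argument (type tag ",f"), ready to be sent as a UDP payload.
std::vector<uint8_t> oscFloatMessage(const std::string& address, float value) {
    std::vector<uint8_t> buf;
    appendOscString(buf, address);
    appendOscString(buf, ",f");

    uint32_t bits;
    static_assert(sizeof(bits) == sizeof(value), "float must be 32 bits");
    std::memcpy(&bits, &value, sizeof(bits));
    // OSC numbers are big-endian, so the most significant byte goes first.
    for (int shift = 24; shift >= 0; shift -= 8) {
        buf.push_back(static_cast<uint8_t>(bits >> shift));
    }
    return buf;
}
```

On the board, the bytes from something like oscFloatMessage("/haunt/level", level) could be written into a UDP packet addressed to the machine running TouchDesigner. An existing OSC library would do the same job with less code, so this is mainly useful for seeing what actually goes over the wire.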
I am really interested to see what interference can be sniffed across various protocols and spilled out as haunted audio fragments.
What else do you think it needs to do?



