Epoch — Unlocking S2S
The next generation of voice AI will be built on speech. Real speech. Millions of hours of it.
Today we're announcing Epoch, the largest conversational audio collection effort ever undertaken.
The architecture problem
Every frontier lab is converging on the same conclusion: speech-to-speech is the future. Full-duplex, low-latency, emotionally coherent voice models that talk and listen simultaneously, without routing through text as an intermediary.
The problem is that existing datasets were designed for a different world. They were built for ASR and TTS, for a pipeline where speech gets transcribed, processed as text, and synthesized back out. That pipeline is dying.
Speech-to-speech models need naturalistic conversation with clean speaker separation. They need the full acoustic signal: prosody, timing, overlap patterns, turn-taking dynamics. They need scale that doesn't exist yet.
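To make the contrast concrete, here is a toy sketch of the two architectures. All function names are hypothetical stand-ins, not a real API; the point is where acoustic information survives and where it is discarded.

```python
def asr(audio: bytes) -> str:
    """Stub transcriber. Prosody, timing, and overlap are lost here:
    only the words survive the conversion to text."""
    return "hello"

def llm(text: str) -> str:
    """Stub text model that reasons over the transcript alone."""
    return text.upper()

def tts(text: str) -> bytes:
    """Stub synthesizer. Prosody is re-invented from text,
    not carried over from the original speech."""
    return text.encode()

def cascaded_turn(audio_in: bytes) -> bytes:
    # Legacy pipeline: speech -> text -> speech.
    # The acoustic signal is destroyed at the first step.
    return tts(llm(asr(audio_in)))

def s2s_turn(audio_in: bytes, model) -> bytes:
    # Direct speech-to-speech: audio in, audio out.
    # The full acoustic signal stays available to the model end to end.
    return model(audio_in)

print(cascaded_turn(b"raw-audio"))  # b'HELLO'
```

The sketch is deliberately trivial, but it shows why ASR/TTS-era datasets fall short: a corpus of transcripts can only ever train the middle box of `cascaded_turn`, never `s2s_turn`.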
So we're building it.
What we're building
Millions of hours of conversational English audio. The largest collection effort of its kind, by a wide margin.
Training infrastructure for the next class of voice foundation models.
Exclusivity
Frontier labs have reached out about exclusive rights to the corpus. We're open to both exclusive and non-exclusive licensing.
If you're working on speech-to-speech and want access, reach out.
