Epoch — Unlocking S2S

The next generation of voice AI will be built on speech. Real speech. Millions of hours of it.

Today we're announcing Epoch, the largest conversational audio collection effort ever undertaken.


The architecture problem

Every frontier lab is converging on the same conclusion: speech-to-speech is the future. Full-duplex, low-latency, emotionally coherent voice models that talk and listen simultaneously, without routing through text as an intermediary.

The problem is that existing datasets were designed for a different world. They were built for ASR and TTS, for a pipeline where speech gets transcribed, processed as text, and synthesized back out. That pipeline is dying.

Speech-to-speech models need naturalistic conversation with clean speaker separation. They need the full acoustic signal: prosody, timing, overlap patterns, turn-taking dynamics. They need scale that doesn't exist yet.

So we're building it.


What we're building

Millions of hours of conversational English audio. The largest collection effort of its kind, by a wide margin.

Training infrastructure for the next class of voice foundation models.




Exclusivity

Frontier labs have reached out about exclusive rights to the corpus. We're open to both exclusive and non-exclusive licensing.

If you're working on speech-to-speech and want access, reach out.


Every Clip, Research-Ready

Our datasets are delivered with version control and complete documentation. They’re ready to train on from day one.




© Extrian. All rights reserved.