Qwen3-TTS vs ElevenLabs: I Tested Both and Here's What Actually Happened

Dishant Sharma • Jan 24th, 2026 • 5 min read

ElevenLabs charged me $22 last month for voice generation. For a side project that gets maybe 200 users.

Most of that cost came from testing. i kept tweaking prompts, trying different voices, regenerating the same sentences. The API worked great. But every test run cost money. And when you're building something new, you test a lot.

Then Qwen dropped their TTS models three days ago. Completely open source. Free to run on your own hardware.

Why everyone's talking about it

The Reddit thread hit 684 upvotes in 24 hours. Developers started sharing benchmark scripts immediately. One guy had it running on his AMD GPU within twenty minutes of release.

Here's what got people excited. Qwen3-TTS clones a voice from 3 seconds of audio. ElevenLabs needs 1 to 3 minutes of audio. That's not a small difference when you're trying to clone 50 celebrity voices for a meme app or testing different narrator styles.

The latency numbers are wild too. Qwen reports 97ms for first packet. ElevenLabs sits around 200ms. OpenAI TTS is about 150ms.
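If you want to check first-packet numbers like these yourself, the measurement is simple: start a timer, ask for audio, stop when the first chunk shows up. Here's a minimal sketch, assuming your TTS client hands back an iterable of audio chunks. `fake_tts_stream` is a hypothetical stand-in and not any vendor's real API, so swap in whatever streaming call your client actually exposes.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(stream: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (seconds until the first chunk arrived, that chunk)."""
    start = time.perf_counter()
    iterator: Iterator[bytes] = iter(stream)
    first = next(iterator)  # blocks until the engine yields its first audio
    return time.perf_counter() - start, first


def fake_tts_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS client (not a real API)."""
    time.sleep(delay_s)   # simulated model and network delay
    yield b"\x00" * 320   # first audio packet
    yield b"\x00" * 320   # rest of the audio


latency, chunk = time_to_first_chunk(fake_tts_stream())
print(f"first packet after {latency * 1000:.0f} ms ({len(chunk)} bytes)")
```

Run it a few dozen times and look at the median, since a single request tells you very little. Numbers measured against a hosted API also include your network round trip, which a locally hosted model doesn't pay.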

i tested the voice cloning yesterday with a 4-second clip of my friend's voice. The output sounded closer to him than i expected. Not perfect. But way better than what open source offered six months ago.
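Cutting a reference clip down to a few seconds doesn't need any audio tooling. A rough sketch using only Python's standard `wave` module, assuming the source is an uncompressed PCM WAV file. The file names and the synthetic silent clip are made up for the demo, and this only prepares the clip; it doesn't call Qwen3-TTS at all.

```python
import os
import tempfile
import wave


def trim_wav(src_path: str, dst_path: str, seconds: float) -> float:
    """Copy the first `seconds` of a PCM WAV file; return the duration written."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_wanted = min(src.getnframes(), int(seconds * src.getframerate()))
        frames = src.readframes(frames_wanted)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)    # the frame count in the header is fixed on close
        dst.writeframes(frames)
    return frames_wanted / params.framerate


# Demo on a synthetic 5-second silent clip (16 kHz, mono, 16-bit).
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "reference.wav")
clipped = os.path.join(workdir, "reference_3s.wav")
with wave.open(source, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000 * 5)

duration = trim_wav(source, clipped, seconds=3.0)
print(f"wrote {duration:.1f} s to {clipped}")
```

In practice you'd pick the cleanest stretch of speech rather than blindly taking the first few seconds.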

The cost thing

Here's where it gets interesting for people actually shipping products.

ElevenLabs costs around $0.30 per 1,000 characters for their standard tier. OpenAI TTS is $0.015 per 1,000 characters. Smallest.ai undercuts everyone at $0.00059 per 10,000 tokens, which works out to roughly 0.06 cents per 10,000 tokens.

But Qwen3-TTS? Free if you run it yourself.

One developer calculated saving $900 per month on a podcast app generating 5 million characters. That's real money for a bootstrapped SaaS.
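The per-character math is easy to run for your own volume. A quick sketch using only the two per-1,000-character prices quoted in this post. Treat them as the post's numbers, not current list prices. Smallest.ai is left out because its price is quoted per token, not per character.

```python
# Per-1,000-character prices as quoted in this post (USD), not official price lists.
PRICE_PER_1K_CHARS = {
    "ElevenLabs": 0.30,
    "OpenAI TTS": 0.015,
}


def monthly_cost(chars_per_month: int, price_per_1k: float) -> float:
    """Monthly bill in USD for a given character volume."""
    return chars_per_month / 1000 * price_per_1k


for chars in (100_000, 5_000_000):
    for name, price in PRICE_PER_1K_CHARS.items():
        print(f"{chars:>9,} chars on {name}: ${monthly_cost(chars, price):,.2f}")
```

At the quoted $0.30 rate, 5 million characters comes to $1,500 a month, so the $900 savings figure presumably reflects a cheaper plan or a different baseline than the standard tier.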

The catch is infrastructure. You need a GPU. The model is a 4.5GB download on first run. And you're responsible for scaling, monitoring, and uptime.

Most people don't want to deal with that.

What the actual quality is like

i spent two hours testing Qwen against ElevenLabs and OpenAI.

For English, ElevenLabs still sounds the most natural. The prosody is better. Emphasis lands where it should. Qwen sometimes puts stress on weird syllables.

But for multilingual stuff? Qwen actually won in my tests. i had it generate the same sentence in Hindi, French, and Japanese. According to Qwen's own benchmarks, its word error rates are lower than ElevenLabs Multilingual v2. And i could hear it. The Hindi pronunciation was noticeably cleaner.
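For context, word error rate for TTS is usually measured by feeding the generated audio to a speech recognizer and comparing the transcript against the text you asked for. The score is the word-level edit distance divided by the number of reference words. Here's a minimal sketch of that last step in plain Python, assuming you already have the transcript. It is not the scoring script behind any published benchmark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Splitting on whitespace works for Hindi and French but not for Japanese, which isn't written with spaces between words. For that you'd tokenize first or score characters instead of words.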

OpenAI TTS felt middle-ground. Decent quality but nothing special. The latency bothered me more than i thought it would. That extra 50-100ms makes voice chat feel less natural.

The real surprise was how fast Qwen processes long text. No chunking artifacts. No weird pauses at sentence boundaries. ElevenLabs sometimes adds unnatural breaks when synthesizing paragraphs.

Smallest.ai trying to compete

Smallest.ai launched their Lightning V2 model last May with 98ms latency. Almost identical to Qwen's speed. They made this whole marketing campaign showing a chicken getting electrocuted by lightning to symbolize beating their competitor Sarvam's Bulbul model.

The pricing is aggressive. Way cheaper than ElevenLabs. And they support seven Indian languages plus global ones.

But here's the thing. Smallest.ai is still a paid API. You're still sending your audio to someone else's servers. You're still accumulating charges as you scale.

Qwen gives you the model weights. You own the infrastructure. No per-token anxiety.

My neighbor's self-hosting obsession

My neighbor runs everything on his home server. Plex, Nextcloud, his own email. He showed me his electricity bill once. i couldn't believe how much power his server rack pulls.

Last week he told me he's setting up Qwen3-TTS on his gaming PC. "Why?" i asked. He generates maybe 10 audio clips a month for his D&D sessions.

"Because i can," he said.

Sometimes the cost savings don't matter. People just like owning their stuff.

When you shouldn't bother

If you're generating under 100,000 characters a month, just use OpenAI TTS. The $1.50 monthly cost isn't worth the setup hassle.

If you need the absolute best English voice quality for a customer-facing product, ElevenLabs is still ahead. Their neural models sound more polished. Your users will notice.

If you don't have a GPU or don't want to manage infrastructure, any of the API options make more sense. Qwen offers their own API too. But then you're back to paying per token.

Open source doesn't mean free if your time costs money.
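You can put a rough number on that. A back-of-the-envelope sketch: divide the one-time setup effort by what you save each month. The $22 and $1.50 bills come from this post. The power cost, setup hours, and hourly rate are hypothetical placeholders, so plug in your own.

```python
from typing import Optional


def months_to_break_even(api_cost: float, self_host_cost: float,
                         setup_hours: float, hourly_rate: float) -> Optional[float]:
    """Months until the one-time setup effort is paid back, or None if never."""
    monthly_saving = api_cost - self_host_cost
    if monthly_saving <= 0:
        return None  # self-hosting costs as much or more every month
    return setup_hours * hourly_rate / monthly_saving


# Hypothetical placeholders, not measurements.
ASSUMED_POWER_COST = 5.0    # USD per month to keep a GPU box running
ASSUMED_SETUP_HOURS = 6.0   # hours spent on install and debugging
ASSUMED_HOURLY_RATE = 50.0  # what an hour of your time is worth, in USD

for label, api_cost in (("ElevenLabs bill", 22.0), ("OpenAI TTS bill", 1.50)):
    months = months_to_break_even(api_cost, ASSUMED_POWER_COST,
                                  ASSUMED_SETUP_HOURS, ASSUMED_HOURLY_RATE)
    verdict = "never pays off" if months is None else f"pays off after {months:.1f} months"
    print(f"{label} of ${api_cost:.2f}/month: {verdict}")
```

The sketch ignores ongoing maintenance time, which is often the bigger cost.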

The developer reaction

The LocalLLaMA subreddit went crazy. One person wrote "this is arguably the most disruptive release in open-source TTS yet". Another said they already have VibeVoice and Dia, so Qwen isn't groundbreaking, but the multilingual support matters.

Hacker News was more measured. People shared installation scripts within hours. Someone got it working with ROCm on AMD hardware. The community moved fast.

Twitter had the usual hype cycle. Qwen's official account called it their most disruptive TTS release. Developers shared voice samples. Some were genuinely impressive. Others sounded robotic.

The difference between hype and reality is about three weeks. That's when the bugs surface and limitations become obvious.

My actual plan

i'm switching my side project to Qwen3-TTS this weekend. The $22/month ElevenLabs bill isn't breaking me. But it bugs me. And i like tinkering.

i'll probably regret it when something breaks at 2am. Or when i need to scale and realize i don't know what i'm doing. But saving around $260 a year sounds good right now.

Maybe in two months i'll be back on ElevenLabs. Maybe Qwen becomes my default.

Either way, it's good to have options that don't charge per character.
