StyleTTS 2

Free

Human-level synthesis via style diffusion.

Open-Source | High-Quality | Diffusion

Overview

StyleTTS 2 combines style diffusion with adversarial training, using large pre-trained speech language models (SLMs) as discriminators, to reach human-level naturalness. It models speaking style as a latent random variable sampled by a diffusion model, so the same text can yield diverse, expressive renditions without reference audio. It is frequently cited as one of the best-sounding open TTS models.

What makes StyleTTS 2 special?

Focus: Style-diffusion synthesis.
Availability: Free & Open-Source (MIT).
Use cases: Audiobooks, narration, and high-fidelity voices.

Related free models

Chatterbox

A lightweight, fast TTS model built on LLaMA.

Dia

A 1.6B parameter TTS model from Nari Labs.

Kokoro

An 82M parameter TTS model by Hexgrad.
