
@netlinux-ai
netlinux-ai / 08-f5-vs-styletts2-tradeoff.md
Created May 9, 2026 07:13
F5-TTS vs StyleTTS2: real Pareto trade-off in fine-tune behaviour (accent strength vs phonetic stability)

F5-TTS vs StyleTTS2: a real Pareto trade-off in fine-tune behaviour

Running two TTS architectures on the same small fine-tune corpus surfaces a real trade-off: F5-TTS commits hard to accent character at the cost of phonetic stability; StyleTTS2 stays phonetically stable at the cost of accent commitment. Neither dominates: both degrade at late epochs, but each fails in a different way.

This is a write-up of the comparison, with the concrete failure modes that made the trade-off visible.

netlinux-ai / 07-tts-listening-loop.md
Created May 9, 2026 07:11
How human feedback actually steers TTS fine-tuning — the listening loop, with worked examples

How human feedback actually steers TTS fine-tuning

Notes on the iteration loop we ran while fine-tuning F5-TTS and StyleTTS2 on a small Northern English corpus. The headline finding is that the listening test isn't optional polish at the end — it's the only measurement that catches the failure modes that matter, and each round of listening produces specific phonetic observations that map to specific engineering decisions.

This is a write-up of the methodology, with the concrete examples that forced each decision.

netlinux-ai / 06-non-avx2-cpu-tts-compat.md
Created May 9, 2026 07:10
Non-AVX2 CPU TTS compatibility: F5-TTS / StyleTTS2 / kokoro / whisper.cpp on a Phenom II X6

Running modern Python TTS toolchains on non-AVX2 CPUs

Notes from getting F5-TTS, StyleTTS2, kokoro/Misaki, and whisper.cpp to work on an AMD Phenom II X6 1090T (2010 K10/Family-10h architecture).

The CPU has SSE/SSE2/SSE3/SSE4a, plus CX16/POPCNT/LAHF — but no SSE4.1, no SSE4.2, no AVX, no AVX2, no FMA, no F16C. That puts it below the modern x86-64-v2 baseline. A growing share of binary Python wheels in the AI ecosystem assume v2 or v3, so they crash with SIGILL or SIGFPE at import time. This is a ground-truth list of what we hit and what worked.
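The flag check above can be sketched programmatically. A minimal sketch, assuming the required-flag sets below correctly track the x86-64 psABI microarchitecture levels (flag names as spelled in Linux `/proc/cpuinfo`; verify against the psABI spec before relying on them):

```python
# Classify a CPU flag set against the x86-64-v2/v3 baselines.
# Flag sets follow the x86-64 psABI level definitions (assumption; verify
# against the spec). Names match Linux /proc/cpuinfo spellings.

V2_REQUIRED = {"cx16", "lahf_lm", "popcnt", "sse3", "sse4_1", "sse4_2", "ssse3"}
V3_REQUIRED = V2_REQUIRED | {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "movbe", "abm"}

def microarch_level(flags: set) -> str:
    """Return the highest x86-64 microarch level this flag set satisfies."""
    if V3_REQUIRED <= flags:
        return "x86-64-v3"
    if V2_REQUIRED <= flags:
        return "x86-64-v2"
    return "x86-64-v1"

# The Phenom II flag set described above: SSE4a, but no SSSE3/SSE4.1/SSE4.2.
phenom_ii = {"sse", "sse2", "sse3", "sse4a", "cx16", "popcnt", "lahf_lm"}
print(microarch_level(phenom_ii))  # -> x86-64-v1 (below the v2 wheel baseline)
```

On Linux the actual flag string comes from `grep -m1 '^flags' /proc/cpuinfo`; split it on whitespace to build the set.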

netlinux-ai / 05-README.md
Created May 9, 2026 07:09
Minimal F5-TTS fine-tune trainer: bypasses HuggingFace datasets/accelerate, single file, ~250 LOC

Minimal F5-TTS fine-tune trainer (no datasets / pyarrow / accelerate)

A ~250-line trainer for F5-TTS that bypasses the HuggingFace datasets and accelerate dependency stack. Useful when:

  • the upstream f5-tts_finetune-cli won't install/run because of pyarrow / pandas / datasets issues (binary-wheel CPU baseline mismatches, missing dependencies, etc.)
  • you want a single-file trainer you can read end-to-end and modify
  • you want to use precomputed mel-spectrograms loaded from disk rather than computed on the fly during training
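The precomputed-mel approach can be sketched as a small dataset class. This is a hedged illustration, not the gist's actual trainer code: the file layout (one `.npy` mel file per utterance, with a matching `.txt` transcript beside it) is an assumption, and a real trainer would typically subclass `torch.utils.data.Dataset`.

```python
# Hypothetical sketch: yield precomputed mel-spectrograms from .npy files
# instead of recomputing them from audio each step. File layout (utt.npy
# beside utt.txt) is assumed for illustration.
from pathlib import Path
import numpy as np

class PrecomputedMelDataset:
    def __init__(self, mel_dir):
        # One .npy file per utterance, sorted for deterministic ordering.
        self.mel_paths = sorted(Path(mel_dir).glob("*.npy"))

    def __len__(self):
        return len(self.mel_paths)

    def __getitem__(self, idx):
        path = self.mel_paths[idx]
        mel = np.load(path)  # shape (n_mels, frames)
        text = path.with_suffix(".txt").read_text().strip()
        return {"mel": mel, "text": text}
```

Loading from disk this way keeps `librosa`/torchaudio (and their binary-wheel baggage) out of the training loop entirely, which matters on the non-AVX2 machines described above.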