One-pager: production pipeline (mufu), proxy bake-off on real audio, and known gaps. Updated 2026-04-04.
MP3/M4A (5 min segments)
Fireworks whisper-v3, language=yue
Repeated n-gram hallucination detection → retry with alternate preprocessing
Stitched [MM:SS] markdown
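The detection and stitching steps above can be sketched roughly as follows. This is a minimal illustration, not the mufu implementation: the function names, the n-gram size, the repeat threshold, and the choice of one timestamp per 5-minute segment are all assumptions.

```python
from collections import Counter


def has_repeated_ngram(text: str, n: int = 4, max_repeats: int = 3) -> bool:
    """Return True when any n-gram occurs more than max_repeats times.

    n and max_repeats are illustrative defaults, not tuned values.
    """
    stripped = text.strip()
    # Cantonese output usually has no spaces, so fall back to characters as tokens.
    tokens = stripped.split() if " " in stripped else list(stripped)
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return any(count > max_repeats for count in ngrams.values())


def stitch(chunks: list[str], segment_seconds: int = 300) -> str:
    """Join per-segment transcripts, prefixing each with its [MM:SS] start offset."""
    lines = []
    for index, text in enumerate(chunks):
        minutes, seconds = divmod(index * segment_seconds, 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {text.strip()}")
    return "\n\n".join(lines)
```

A chunk for which `has_repeated_ngram` returns True would be the one sent back for a retry before stitching.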
If no Fireworks key is set: AssemblyAI runs the first pass, then flagged chunks are retried with local Whisper (compat). Env chain: workspace .env or mufu root.
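A rough sketch of how the key lookup and fallback choice might work. The variable name `FIREWORKS_API_KEY`, the helper names, the backend labels, and the rule that the process environment wins over .env files are assumptions for illustration only.

```python
import os
from pathlib import Path
from typing import Optional


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_key(name: str, env_files: list[Path]) -> Optional[str]:
    """Check the process environment first, then each .env file in order."""
    if os.environ.get(name):
        return os.environ[name]
    for path in env_files:
        value = read_env_file(path).get(name)
        if value:
            return value
    return None


def pick_backend(env_files: list[Path]) -> str:
    # FIREWORKS_API_KEY is an assumed variable name, not confirmed by these notes.
    if resolve_key("FIREWORKS_API_KEY", env_files):
        return "fireworks"
    return "assemblyai+local-whisper"
```

Passing the workspace .env first and the mufu root .env second gives the lookup order the env chain describes.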
Transcript → manual meeting note under context/meetings/. Command reference: .cursor/commands/transcribe.md. WhatsApp voice files may be mislabeled .jpeg; rename them with the real media extension before transcribing.
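One way to catch a mislabeled voice file is to look at its leading bytes rather than its extension. The sketch below is an assumption about how that check could look, with hypothetical helper names; it covers only a few common signatures, and an `ftyp` box can also mean MP4 video, so the `.m4a` guess should be checked by ear.

```python
from pathlib import Path
from typing import Optional


def sniff_extension(header: bytes) -> Optional[str]:
    """Guess an extension from the first bytes of a file, or None if unknown."""
    if header.startswith(b"OggS"):
        return ".ogg"  # Ogg container, commonly used for Opus voice notes
    if len(header) >= 8 and header[4:8] == b"ftyp":
        return ".m4a"  # ISO base media file; may also be MP4 video
    if header.startswith(b"ID3"):
        return ".mp3"  # MP3 with an ID3v2 tag
    # Bare MPEG audio frame: 11 sync bits set and a non-zero layer field.
    if len(header) >= 2 and header[0] == 0xFF \
            and (header[1] & 0xE0) == 0xE0 and (header[1] & 0x06) != 0:
        return ".mp3"
    if header.startswith(b"\xff\xd8\xff"):
        return ".jpg"  # a genuine JPEG
    return None


def fix_extension(path: Path) -> Path:
    """Rename path to match its sniffed type; return the (possibly new) path."""
    with path.open("rb") as handle:
        guessed = sniff_extension(handle.read(16))
    if guessed is None or guessed == path.suffix.lower():
        return path
    return path.rename(path.with_suffix(guessed))
```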
Proxy: segmentation stability, silence-trim sensitivity, manual plausibility read.
zh) on dense speech
Source: Amap meeting MP3 (Mar 2026). Clips: 60s dense speech, 30s halves, 60s silence-heavy + trimmed variant. See reports/proxy-eval.md.
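Segmentation stability could be scored by comparing the transcript of the 60s clip against the joined transcripts of its two 30s halves. The notes do not say how the bake-off actually measured it, so the character-level similarity ratio and the function name below are assumptions.

```python
from difflib import SequenceMatcher


def segmentation_stability(full_text: str, half_texts: list[str]) -> float:
    """Similarity in [0, 1] between a full-clip transcript and its joined halves."""
    joined = "".join(part.strip() for part in half_texts)
    # autojunk=False keeps frequent characters from being ignored on long inputs.
    matcher = SequenceMatcher(None, full_text.strip(), joined, autojunk=False)
    return matcher.ratio()
```

A score near 1.0 suggests the model produces the same text regardless of where the clip is cut; a low score would flag the clip for the manual plausibility read.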
Related: workdir/cantonese-asr-study.md, workdir/asr-validation/, output/asr-validation-review.html