FFASR benchmarks speech‑recognition models on audio representative of far‑field use (noisy rooms, reverberation, adverse SNRs), not only studio‑dry speech. Every model is evaluated on the same held‑out set with the same text normalization, so the reported numbers are directly comparable. The simulated room impulse responses are similar to those in the Treble10 dataset.
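As a rough illustration of why shared text normalization matters for comparability, the sketch below computes a word error rate after normalizing both the reference and the hypothesis. The `normalize` function is a hypothetical stand-in (lowercasing, punctuation removal, whitespace collapsing); the normalizer FFASR actually uses is not described here, and `wer` is a plain word-level edit-distance implementation, not the benchmark's scoring code.

```python
import re


def normalize(text: str) -> str:
    """Hypothetical normalizer: lowercase, drop punctuation, collapse spaces.

    This is an assumption for illustration, not FFASR's actual normalizer.
    """
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # keep letters, digits, apostrophes
    return " ".join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference is empty after normalization")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


score = wer("Turn the lights off.", "turn the light off")
```

Without normalization, the capital "T" and the trailing period would count as extra errors; with it, only the "lights"/"light" substitution remains, so one of four reference words is wrong.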

Each submission reports word error rate (WER) for nine scenarios (Near Field Speech, Lab Measured, Lab Simulated, High SNR, Mid SNR, Low SNR, and three Moving SNR splits) plus RTFx and parameter count. The leaderboard ranks models by Average WER over the scenario columns you have checked (lower is better). The Analysis tab plots the Pareto front of Average WER vs RTFx to show WER/speed trade-offs.
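The ranking and Pareto-front logic can be sketched as follows. This is a minimal illustration, not the leaderboard's code: the `Entry` class, the function names, and all numbers are made up, and it assumes that RTFx is an inverse real-time factor where higher means faster, so a model is on the front when no other model has both lower-or-equal Average WER and higher-or-equal RTFx.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical leaderboard row; field names are illustrative."""
    model: str
    wer: dict[str, float]  # scenario name -> WER (%)
    rtfx: float            # assumed: higher means faster


def average_wer(entry: Entry, scenarios: list[str]) -> float:
    """Mean WER over the selected scenario columns."""
    if not scenarios:
        raise ValueError("select at least one scenario")
    return sum(entry.wer[s] for s in scenarios) / len(scenarios)


def rank(entries: list[Entry], scenarios: list[str]) -> list[Entry]:
    """Order entries by Average WER, lowest (best) first."""
    return sorted(entries, key=lambda e: average_wer(e, scenarios))


def pareto_front(entries: list[Entry], scenarios: list[str]) -> list[Entry]:
    """Entries not dominated on (lower Average WER, higher RTFx)."""
    # Sweep from best WER to worst; keep an entry only if it is faster
    # than everything already seen with equal or better WER.
    ordered = sorted(
        entries, key=lambda e: (average_wer(e, scenarios), -e.rtfx)
    )
    front, best_rtfx = [], float("-inf")
    for entry in ordered:
        if entry.rtfx > best_rtfx:
            front.append(entry)
            best_rtfx = entry.rtfx
    return front


# Made-up numbers, for illustration only.
entries = [
    Entry("model-a", {"High SNR": 10.0, "Low SNR": 30.0}, rtfx=50.0),
    Entry("model-b", {"High SNR": 8.0, "Low SNR": 20.0}, rtfx=20.0),
    Entry("model-c", {"High SNR": 12.0, "Low SNR": 32.0}, rtfx=40.0),
]
selected = ["High SNR", "Low SNR"]
ranking = [e.model for e in rank(entries, selected)]
front = [e.model for e in pareto_front(entries, selected)]
```

In this toy data `model-c` is both less accurate and slower than `model-a`, so it drops off the front, while `model-b` (most accurate) and `model-a` (fastest) both stay on it. Changing `selected` changes the averages and can therefore change both the ranking and the front.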

Paste a Hugging Face model id in the Submit tab; scoring runs server‑side and evaluation audio is not exposed to submitters. The Analysis tab provides WER rankings, heatmaps, speed views, and scenario‑wise comparisons.

Moving-source evaluations are a beta feature.
Leaderboard data loads from storage after the page opens · Evaluation runs on Hub Jobs against a held-out set · Models loaded from the Hugging Face Hub