AWS Machine Learning Blog·Tutorial·3d ago·by Gleb Geinke·~1 min read

Cost-effective multilingual audio transcription at scale with Parakeet-TDT and AWS Batch

Artificial Intelligence Cost-effective multilingual audio transcription at scale with Parakeet-TDT and AWS Batch Many organizations are archiving large media libraries, analyzing contact center recordings, preparing training data for AI, or processing on-demand video for subtitles. When data volumes grow significantly, managed automatic speech recognition (ASR) service costs can quickly become the primary constraint on scalability. To address this cost-scalability challenge, we use the NVIDIA Parakeet-TDT-0.6B-v3 model, deployed through AWS Batch on GPU-accelerated instances. Parakeet-TDT’s Token-and-Duration Transducer architecture simultaneously predicts text tokens and their duration to intelligently skip silence and redundant processing. This helps achieve inference speeds orders of magnitude faster than real-time. By paying only for brief bursts of compute rather than the full length of your audio, you can transcribe at scale for fractions of a cent per hour of audio based on the benchmarks described in this post.…

#rag#inference#multimodal

read full article on AWS Machine Learning Blog →

0login to vote