Sun. Feb 23rd, 2020

Google open-sources the Live Transcribe Speech Engine

2 min read

Yesterday, Google announced on its blog that it is open-sourcing the Live Transcribe Speech Engine, the Android speech recognition and transcription engine that turns speech or conversation into text in real time and can help people who are deaf or hard of hearing. Live Transcribe is an Android app that Google launched in February of this year. Its speech recognition is provided by Google’s most advanced Cloud Speech API. Relying on the cloud introduces some complexity, however: constantly changing network connectivity, data costs, and latency all pose challenges. Google has therefore open-sourced the engine in the hope that developers will build on the existing work.

The automatic speech recognition (ASR) module has the following features:

  • Infinite streaming
  • Support for 70+ languages
  • Robust to brief network loss (which occurs often when traveling and switching between mobile network and Wi-Fi). Text is not lost, only delayed.
  • Robust to extended network loss. It will reconnect even if the network has been out for hours. Of course, no speech recognition can be delivered without a connection.
  • Robust to server errors
  • Opus, AMR-WB, FLAC encoding can be easily enabled and configured.
  • Contains a text formatting library for visualizing ASR confidence, speaker ID, and more
  • Extensible to offline models
  • Built-in support for speech detectors, which can be used to stop ASR during extended silences to save money and data (Note that speech detector implementation is not provided)
  • Built-in support for speaker identification, which can be used to label or color text according to speaker number (Note that speaker identification implementation is not provided)
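
The "text is not lost, only delayed" behavior and the silence-based pausing described above can be illustrated with a minimal sketch. This is not Google's implementation: the names `BufferedStreamer`, `FlakyLink`, and `max_silent_chunks` are hypothetical, the transport is simulated, and the speech/silence decision is passed in by the caller because, as noted in the list, no speech detector is provided. The assumed idea is simply to queue audio chunks while the connection is down and flush them in order once it returns.

```python
from collections import deque
from typing import Callable, Deque, List


class BufferedStreamer:
    """Hypothetical sketch: queue audio chunks while offline, flush on reconnect."""

    def __init__(self, send: Callable[[bytes], None], max_silent_chunks: int = 3):
        # `send` is an assumed transport callback that raises ConnectionError when offline.
        self._send = send
        self._pending: Deque[bytes] = deque()  # chunks not yet delivered
        self._max_silent_chunks = max_silent_chunks
        self._silent_run = 0  # consecutive non-speech chunks seen so far

    def push(self, chunk: bytes, is_speech: bool) -> None:
        """Accept one audio chunk; `is_speech` comes from a caller-supplied detector."""
        if is_speech:
            self._silent_run = 0
        else:
            self._silent_run += 1
            if self._silent_run > self._max_silent_chunks:
                return  # extended silence: skip sending to save data
        self._pending.append(chunk)
        self.flush()

    def flush(self) -> int:
        """Try to deliver queued chunks in order; return how many were sent."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # still offline: keep the chunk and retry on a later flush
            self._pending.popleft()
            sent += 1
        return sent


class FlakyLink:
    """Simulated connection that can be toggled offline (for illustration only)."""

    def __init__(self) -> None:
        self.online = True
        self.received: List[bytes] = []

    def send(self, chunk: bytes) -> None:
        if not self.online:
            raise ConnectionError("offline")
        self.received.append(chunk)


def demo() -> List[bytes]:
    link = FlakyLink()
    streamer = BufferedStreamer(link.send, max_silent_chunks=1)
    streamer.push(b"a", is_speech=True)
    link.online = False                  # brief network loss
    streamer.push(b"b", is_speech=True)  # queued, not lost
    streamer.push(b"c", is_speech=True)
    link.online = True
    streamer.flush()                     # delayed chunks are delivered in order
    streamer.push(b"s1", is_speech=False)
    streamer.push(b"s2", is_speech=False)  # second silent chunk in a row is dropped
    streamer.push(b"d", is_speech=True)
    return link.received
```

In this sketch the queue is unbounded, which would not be practical for an outage lasting hours; a real engine would presumably cap or discard old audio and open a fresh streaming session on reconnect.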

The Live Transcribe app is available here.