Meta announces open source large-scale multilingual model: identify more than 4,000 spoken languages

Meta has recently unveiled its Massively Multilingual Speech AI, a vast multilingual model capable of recognizing over 4,000 spoken languages, and has made it available open-source, thereby propelling the linguistic research community to further the preservation of numerous existing languages.

This breakthrough far exceeds last year’s premiere voice-to-voice translation technology, which enabled the direct translation of the Minnan dialect into English. Now, Meta’s expansive multilingual model can accommodate more than 4,000 spoken languages, a number forty times larger than existing technologies. This advancement can be applied to augmented and virtual reality scenarios, allowing people to communicate in their preferred languages.

In presenting this model, Meta has affirmed that this technology can preserve the majority of orally transmitted languages, thereby safeguarding a wealth of cultural heritage.

This technique originates from the conventional fields of text-to-speech and speech-to-text technology. Initially supporting only 100 languages, it can now convert over 1,100 languages and even recognize more than 4,000 spoken languages, facilitating smooth communication among users of various languages.

Behind this technology lies religious scripture that has been translated into multiple languages, widely read, and studied. For instance, the Bible, widely read and translated globally, is one of the text sources used for training this technology.

By utilizing audio readings of the New Testament Bible translated into 1,100 languages, with each language’s audio data averaging 32 hours in length, and subsequently adding unannotated Christian audio readings, the training data corresponds to over 4,000 languages.

Although the collected data predominantly feature male voices, the trained model is still capable of accurately identifying content expressed by both male and female voices. Furthermore, despite the predominance of religious content in the training data, the resulting model does not automatically generate additional religious content.

Meta has expressed its intent to persist in broadening the scope of the multilingual model, to support identification and translation among an even wider array of languages, and to overcome dialect-related content that has proven challenging with current technology.