There are about 7,000 languages spoken around the world today. However, over 40% of them are endangered and may soon be lost. Meta has proposed Massively Multilingual Speech (MMS), a series of AI models that could help preserve language diversity and bring robust NLP tools to speakers of underrepresented languages.
The vast majority of language models are developed in English, and data scarcity is the main obstacle to building AI models for languages with few native speakers: the most extensive existing speech datasets cover only about 100 languages, and many languages have no readily available labeled speech data (audio recordings paired with correct transcriptions). To get around this, Meta trained MMS on religious texts that have already been translated into many languages and read aloud in publicly available recordings, chiefly the Bible. From readings of the New Testament, Meta built a dataset covering over 1,100 languages, then scaled to roughly 4,000 languages using other Christian religious recordings. Although the data consists solely of religious readings by predominantly male speakers, Meta reports that its analysis found no resulting bias in MMS's output.

Meta combined this dataset with wav2vec 2.0, a machine-learning technique that learns numerical representations directly from raw audio waveforms. Wav2vec 2.0 serves as the basis for MMS and uses self-supervised learning, meaning it learns from unlabeled audio without human-provided transcriptions. As a result of these methods, MMS can identify over 4,000 spoken languages and perform speech-to-text and text-to-speech in about 1,100 languages; such technologies were previously available for only about 100 languages.
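To make the self-supervised idea behind wav2vec 2.0 concrete, here is a toy numpy sketch of its core training signal: mask timesteps in a latent sequence, then ask a context network to pick each masked step's true latent out of a set of distractors via a contrastive loss. This is a deliberately simplified illustration under assumed shapes and parameters, not Meta's implementation, which uses a CNN feature encoder, a Transformer context network, and quantized targets.

```python
# Toy contrastive objective in the spirit of wav2vec 2.0 (simplified sketch,
# not Meta's code): for each masked timestep, the context vector must match
# the true latent better than randomly sampled distractor latents.
import numpy as np

rng = np.random.default_rng(0)

def contrastive_loss(context, targets, masked_idx, num_negatives=5, temp=0.1):
    """Average softmax cross-entropy over masked steps, scoring the true
    latent against `num_negatives` latents drawn from other timesteps
    using temperature-scaled cosine similarity."""
    losses = []
    num_steps = targets.shape[0]
    for t in masked_idx:
        neg_idx = rng.choice(
            [i for i in range(num_steps) if i != t],
            size=num_negatives, replace=False)
        # candidate set: true latent (index 0) plus distractors
        candidates = np.vstack([targets[t:t + 1], targets[neg_idx]])
        sims = candidates @ context[t] / (
            np.linalg.norm(candidates, axis=1) * np.linalg.norm(context[t]))
        logits = sims / temp            # temperature sharpens the softmax
        logits -= logits.max()          # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        losses.append(-np.log(probs[0]))  # cross-entropy, true latent first
    return float(np.mean(losses))

# Fake latent sequence: 20 timesteps of 64-dim features.
latents = rng.normal(size=(20, 64))
masked = [3, 7, 12]

# A context network that perfectly reconstructs the masked latents drives
# the loss toward zero; a random network scores far worse.
loss_perfect = contrastive_loss(latents, latents, masked)
loss_random = contrastive_loss(rng.normal(size=(20, 64)), masked_idx=masked,
                               targets=latents)
```

Because the objective only requires distinguishing real latents from distractors, it needs no transcriptions at all, which is what lets MMS scale to thousands of languages that lack labeled data.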
MMS benefits global language diversity. It provides a way to document, store, and access endangered languages through speech technology, preserving their cultural heritage even after the last native speakers are gone. It can also enable cross-lingual communication for languages that previously had no translation tools. Beyond preservation, MMS could put NLP tools in the hands of speakers of indigenous languages in developing countries, making it easier for them to search for information online.
One huge benefit of MMS for researchers is that it is completely open-source: anyone can inspect the models and source code, which lets the community innovate on, improve, and verify the quality of the technology.