Mozilla's gigantic language and voice database

With the free publication of a comprehensive speech and voice database, Mozilla wants to promote the development of alternative speech recognition systems.

Mozilla, the maker of the Firefox browser, has released the largest dataset of human voices recorded entirely by volunteers. The "Common Voice" project aims to create the world's most diverse voice dataset, optimized for the development of voice technologies.

The San Francisco-based company wants to enable smaller manufacturers and crowdfunding projects in particular to develop their own speech recognition systems without paying license fees. So far, large Internet companies such as Google, Microsoft, IBM, Amazon and Apple have dominated the speech recognition market. Another important player is Nuance, whose technology powers the speech recognition in Apple's Siri.

According to the company, Mozilla's dataset covers 18 languages, including English, French, German and Mandarin (traditional), as well as Welsh and Kabyle, an Algerian Berber language. In total, it contains almost 1,400 hours of recorded voice data from more than 42,000 contributors.

The data collected by Mozilla is available under the "CC0" license, the most permissive variant of the Creative Commons licenses ("No rights reserved"). Project participants could also voluntarily provide metadata such as age, gender and accent. "This means that further information is saved together with your recordings, with which speech engines can be trained even better," says the Mozilla blog entry. Mozilla says it wants to "contribute to a diverse and innovative ecosystem of language technologies". The aim is to bring its own voice-controlled products to market, but also to support researchers and smaller players.

Mozilla, dpa
