Sundanese Automatic Speech Recognition

Sundanese Speech Recognition and its Dataset

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text.

By Wilson Wongso, Steven Limcorn and AI-Research.id team

June 1, 2021

Models

Name	Description	Author	Link
Wav2Vec2-Large-XLSR-Sundanese	Fine-tuned facebook/wav2vec2-large-xlsr-53 on the OpenSLR High quality TTS data for Sundanese. When using this model, make sure that your speech input is sampled at 16kHz.	Cahya Wirawan	HuggingFace

Datasets

Name	Description	Author	Link
OpenSLR	This data set contains transcribed audio data for Sundanese (~220K utterances). The data set consists of wave files, and a TSV file. The file utt_spk_text.tsv contains a FileID, UserID and the transcription of audio in the file.	Oddur Kjartansson and Supheakmungkol Sarin and Knot Pipatsrisawat and Martin Jansche and Linne Ha	HuggingFace
VolLingua107	VoxLingua107 is a speech dataset for training spoken language identification models. The dataset consists of speech segments extracted from YouTube videos & post-processed. The Sundanese dataset has 64 hours (6.2G)	Jörgen Valk, Tanel Alumäe	bark.phon.ioc.ee