Indonesian Automatic Speech Recognition
Speech Recognition for Indonesian, Javanese and Sundanese.
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text.
By Wilson Wongso, Steven Limcorn and AI-Research.id team
June 1, 2021
Models
Name | Description | Author | Link |
---|---|---|---|
Wav2Vec2-Large-XLSR-Indonesian | Fine-tuned facebook/wav2vec2-large-xlsr-53 on the Indonesian Artificial Common Voice dataset. When using this model, make sure that your speech input is sampled at 16kHz. | Cahya Wirawan | HuggingFace |
Wav2Vec2-Large-XLSR-Indonesian | Fine-tuned facebook/wav2vec2-large-xlsr-53 on the Indonesian Common Voice dataset and synthetic voices generated using Artificial Common Voicer, which is in turn based on Google Text-to-Speech. When using this model, make sure that your speech input is sampled at 16kHz. | Cahya Wirawan | HuggingFace |
Wav2Vec2-Large-XLSR-Indonesian | Fine-tuned facebook/wav2vec2-large-xlsr-53 on the Indonesian Common Voice dataset. When using this model, make sure that your speech input is sampled at 16kHz. | Cahya Wirawan | HuggingFace |
Wav2Vec2-Large-XLSR-Indonesian | This is the model for Wav2Vec2-Large-XLSR-Indonesian, a fine-tuned facebook/wav2vec2-large-xlsr-53 model on the Indonesian Common Voice dataset. When using this model, make sure that your speech input is sampled at 16kHz. | Galuh | HuggingFace |
Wav2Vec2-Large-XLSR-Indonesian | This is the model for Wav2Vec2-Large-XLSR-Indonesian, a fine-tuned facebook/wav2vec2-large-xlsr-53 model on the Indonesian Common Voice dataset. When using this model, make sure that your speech input is sampled at 16kHz. | Indonesian NLP | HuggingFace |
Wav2Vec2-Large-XLSR-Indonesian | This is the baseline for Wav2Vec2-Large-XLSR-Indonesian, a fine-tuned facebook/wav2vec2-large-xlsr-53 model on the Indonesian Common Voice dataset. It was trained using the default hyperparameters for 2x30 epochs. When using this model, make sure that your speech input is sampled at 16kHz. | Indonesian NLP | HuggingFace |
Wav2Vec2-Large-XLSR-53-Indonesia | Fine-tuned facebook/wav2vec2-large-xlsr-53 on Indonesian using the Common Voice dataset. When using this model, make sure that your speech input is sampled at 16kHz. | Muhammad Agung Hambali | HuggingFace |
Wav2Vec2-Large-XLSR-53-Indonesia | Fine-tuned facebook/wav2vec2-large-xlsr-53 on Indonesian using the Common Voice dataset. When using this model, make sure that your speech input is sampled at 16kHz. | Muhammad Agung Hambali | HuggingFace |
XLSR-Indonesia | Wav2Vec2 fine-tuned on Common Voice ID Test. | Samsul Rahmadani | HuggingFace |
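Most of the models above expect speech input sampled at 16kHz, so audio recorded at another rate has to be resampled first. The following is a minimal sketch of that step using only the Python standard library. The function name `resample_linear`, the constant `TARGET_RATE`, and the 48kHz test tone are illustrative assumptions, not part of any model card. The function uses plain linear interpolation with no anti-aliasing filter, which is enough to show the idea but is not what a production pipeline would rely on.

```python
import math

TARGET_RATE = 16_000  # sampling rate the model descriptions above ask for


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample a mono signal by linear interpolation (no anti-aliasing filter).

    Hypothetical helper for illustration only.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)

    n_out = int(round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)   # nearest source sample at or before pos
        hi = min(lo + 1, last)     # next source sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# One second of a 440 Hz tone at an assumed source rate of 48 kHz.
src_rate = 48_000
tone = [math.sin(2 * math.pi * 440 * t / src_rate) for t in range(src_rate)]

tone_16k = resample_linear(tone, src_rate)
print(len(tone), "->", len(tone_16k))  # prints "48000 -> 16000"
```

In practice, a dedicated audio library with a proper low-pass filter is the safer choice for resampling, since naive interpolation can alias high-frequency content when downsampling.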
Datasets
Name | Description | Author | Link |
---|---|---|---|
Common Voice | Each entry in the Common Voice dataset consists of a unique MP3 and a corresponding text file. Many of the 9,283 recorded hours in the dataset also include demographic metadata like age, sex, and accent that can help improve the accuracy of speech recognition engines. | Ardila, R. and Branson, M. and Davis, K. and Henretty, M. and Kohler, M. and Meyer, J. and Morais, R. and Saunders, L. and Tyers, F. M. and Weber, G. | HuggingFace |
VoxLingua107 | VoxLingua107 is a speech dataset for training spoken language identification models. The dataset consists of speech segments extracted from YouTube videos and post-processed. The Indonesian subset has 40 hours (3.8G). | Jörgen Valk, Tanel Alumäe | bark.phon.ioc.ee |