Local Language Portal

Lexical Resources

Path Nirvana Sinhala TTS Dataset

High-quality Sinhala dataset for text-to-speech (TTS) training, designed specifically for deep learning algorithms.

A new dataset for building Sinhala TTS voices with deep learning algorithms is now available at:
https://github.com/pathnirvana/sinhala-tts-dataset

LANGUAGE TECHNOLOGY RESEARCH LABORATORY – UCSC

10-million-word contemporary Sinhala text corpus for language research

100K-word English-Sinhala parallel corpus

500K tagged Sinhala corpus

1,300-word Sinhala WordNet for language technology improvement

UCSC Sinhala POS tagset

List of proper names for language research

Named-entity tagged corpus

List of Sinhala function words

Ingiya English-Sinhala dictionary database

400K distinct word list

Speech corpora for Sinhala speech processing