Lexical Resources

LANGUAGE TECHNOLOGY RESEARCH LABORATORY – UCSC

10-million-word contemporary Sinhala text corpus for language research

100K-word English-Sinhala parallel corpus

500K Sinhala tagged corpus

1,300-word Sinhala WordNet for language technology improvement

UCSC Sinhala POS tagset

List of proper names for language research

Named-entity tagged corpus

List of Sinhala function words

Ingiya English-Sinhala dictionary database

List of 400K distinct words

Speech corpora for Sinhala speech processing