What is Unicode?
a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English, no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.
These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially servers) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data always runs the risk of corruption.
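The conflict described above can be seen in a minimal sketch, using codecs that ship with Python. The byte value 0xE9 and the euro sign are illustrative choices only; they do not come from the original text.

```python
# One number, several characters: the same byte decodes differently
# depending on which legacy encoding is assumed.
raw = bytes([0xE9])
print(raw.decode("latin-1"))      # é  (Western European)
print(raw.decode("iso-8859-5"))   # щ  (Cyrillic)
print(raw.decode("iso-8859-7"))   # ι  (Greek)

# One character, several numbers: the euro sign is stored as a
# different byte in two Western European encodings.
euro = "\u20ac"  # EURO SIGN
print(euro.encode("cp1252"))      # b'\x80'
print(euro.encode("iso-8859-15")) # b'\xa4'
```

If the receiving system guesses the wrong encoding, the text is silently corrupted rather than rejected.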
Unicode is changing all that! Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others.
Incorporating Unicode into client-server or multi-tiered applications and websites offers significant cost savings over the use of legacy character sets. Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption.
The importance of Unicode
Unicode provides a single mechanism that covers the characters of regionally popular encoding systems, such as the ISO-8859 variants in Europe, Shift-JIS in Japan, or Big5 for Traditional Chinese.
From a translation/localization point of view, Unicode is an important step towards standardization, at least from a tools and file format standpoint.
- Unicode enables a single software product or a single website to be designed for multiple platforms, languages and countries without re-engineering, which can significantly reduce cost compared with the use of legacy character sets.
- Unicode data can be used through many different systems without data corruption.
- Unicode represents a single encoding scheme for all languages and characters.
- Unicode is a common point in the conversion between other character encoding schemes. Since it is a superset of all of the other common character encoding systems, you can convert from one encoding scheme to Unicode, and then from Unicode to the other encoding scheme.
- Unicode is the preferred encoding scheme used by XML-based tools and applications.
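The conversion point in the list above can be sketched as follows. The helper name `transcode` and the choice of Shift-JIS and EUC-JP as the two legacy encodings are assumptions made for illustration.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between two legacy encodings, using Unicode as the pivot."""
    text = data.decode(source)    # legacy bytes -> Unicode string
    return text.encode(target)    # Unicode string -> other legacy bytes


# The same Japanese word stored in Shift-JIS, then converted to EUC-JP.
sjis_bytes = "日本語".encode("shift_jis")
eucjp_bytes = transcode(sjis_bytes, "shift_jis", "euc_jp")

print(sjis_bytes.hex(), "->", eucjp_bytes.hex())
print(eucjp_bytes.decode("euc_jp"))
```

With Unicode as the common point, each legacy encoding needs only one converter to and from Unicode, rather than one converter for every pair of encodings.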
Bottom line: Unicode is a worldwide character-encoding standard, published by the Unicode Consortium. Computers store numbers that represent a character; Unicode provides a unique number for every character.
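As a small illustration of "a unique number for every character", the sketch below prints the code point and official name of a few characters using Python's standard library. The helper `code_point` and the sample characters are assumptions chosen for the example.

```python
import unicodedata


def code_point(ch: str) -> str:
    """Format a character's Unicode number in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"


for ch in ("A", "\u00e9", "\u0d9a", "\u20ac"):
    # ord() gives the number, unicodedata.name() gives the standard's name.
    print(code_point(ch), unicodedata.name(ch), ch)

# chr() goes the other way: from the number back to the character.
print(chr(0x0D9A))
```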
Issues of “Yansaya”, “Rakaranshaya” and “Repaya” and how to prevent them
The following tips were compiled from the most common errors made when writing Sinhala in Unicode. Please avoid these errors as far as possible by following the tips below.
- kombuwa (ෙ) should come after the consonant.
e.g.: gedara (ගෙදර) is written as: ga + kombuwa + da + ra but not as kombuwa + ga + da + ra.
- kombuwa haa aela pilla (ො) is a single composite modifier.
e.g.: kolamba (කොළඹ) is written as: ka + kombuwa ha aela pilla + la + mba but not as kombuwa + ka + aela pilla + la + mba.
- along the same lines, each of the modifiers ේ, ෛ, ො, ෝ, ෞ is a single composite modifier and should be entered as one character.
- rakaranshaya (්‍ර) is written as: hal kereema + zero width joiner (zwj) + ra
e.g.: prauda (ප්‍රෞඪ) is written as: pa + hal kereema + zwj + ra + kombuwa haa gayanukitta + ddha
- yansaya (්‍ය) is written as: hal kereema + zwj + ya
e.g.: udyoga (උද්‍යෝග) is written as: u + da + hal kereema + zwj + ya + kombuwa haa diga aela pilla + ga
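The tips above can be checked in code. The following is a minimal sketch that spells the four example words as explicit code points and flags vowel signs that do not directly follow a consonant. The function name `misplaced_vowel_signs` and the simple range check are assumptions for illustration; this is a naive check, not a full Sinhala validator.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER
HAL = "\u0dca"  # hal kereema (SINHALA SIGN AL-LAKUNA)

# The example words from the tips, spelled in the recommended order.
gedara = "\u0d9c\u0dd9\u0daf\u0dbb"                    # ga + kombuwa + da + ra
kolamba = "\u0d9a\u0ddc\u0dc5\u0db9"                   # ka + kombuwa haa aela pilla + la + mba
prauda = "\u0db4" + HAL + ZWJ + "\u0dbb\u0dde\u0daa"   # pa + hal + zwj + ra + kombuwa haa gayanukitta + ddha
udyoga = "\u0d8b\u0daf" + HAL + ZWJ + "\u0dba\u0ddd\u0d9c"  # u + da + hal + zwj + ya + kombuwa haa diga aela pilla + ga


def is_consonant(ch: str) -> bool:
    """Sinhala consonants occupy U+0D9A to U+0DC6."""
    return "\u0d9a" <= ch <= "\u0dc6"


def is_vowel_sign(ch: str) -> bool:
    """Sinhala dependent vowel signs: U+0DCF to U+0DDF, plus U+0DF2 and U+0DF3."""
    return "\u0dcf" <= ch <= "\u0ddf" or ch in "\u0df2\u0df3"


def misplaced_vowel_signs(text: str) -> list[int]:
    """Return the indexes of vowel signs not directly preceded by a consonant."""
    return [
        i
        for i, ch in enumerate(text)
        if is_vowel_sign(ch) and (i == 0 or not is_consonant(text[i - 1]))
    ]


for word in (gedara, kolamba, prauda, udyoga):
    print(word, [unicodedata.name(c) for c in word])

# The mistake from the first tip: kombuwa typed before the consonant.
wrong_gedara = "\u0dd9\u0d9c\u0daf\u0dbb"
print(misplaced_vowel_signs(wrong_gedara))  # [0]
```

Because the check requires a consonant immediately before each vowel sign, it also flags a composite modifier that has been typed as two separate parts, which matches the advice to enter each composite as a single character.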
Font rendering issues in Apple and Mac OS
Unicode issues related to Adobe collection
'Iskolapotha' and 'Nirmala'
Difference between 'Serif' and 'Sans-serif'
A typeface that has small strokes (serifs) at the tips of its letters is called a serif typeface (or serifed typeface). A typeface without serifs is called sans-serif or sans serif, from the French sans, meaning “without”.