KhmerConverterPHP

These scripts transcode strings from legacy Khmer fonts to Unicode and vice versa. You can see them in action at http://www.selapa.net/khmerfonts/

How does it work?

Legacy → Unicode

  1. Search and replace from the database
  2. Recompose characters
  3. Transcode other characters
    • Ligatures get separated into characters
    • Ornaments get enclosed between 0x91 and 0x92
    • Khmer characters missing in Unicode get enclosed between 0x86 and 0x87
    • Characters missing in the legacy font get enclosed between 0x96 and 0x97
  4. Reorder characters according to Unicode order
    This code was translated to PHP from the KhmerOS khmerconverter Python software.
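
As an illustration of step 3, here is a minimal sketch of how characters without a mapping could be wrapped in the 0x96 / 0x97 markers. It is not the project's actual code: the function name `transcodeWithMarkers`, the two constants, and the two-entry `$map` are hypothetical, and the sketch assumes the markers are the code points U+0096 and U+0097.

```php
<?php
// Assumption: the 0x96 / 0x97 markers are the code points U+0096 and U+0097.
const MISSING_OPEN  = "\u{0096}";
const MISSING_CLOSE = "\u{0097}";

/**
 * Transcode $text one character at a time using $map (a hypothetical table).
 * Characters without a mapping are kept as-is and wrapped in the markers.
 */
function transcodeWithMarkers(string $text, array $map): string
{
    // With the /u flag, the empty pattern splits a UTF-8 string per code point.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $out = '';
    foreach ($chars as $ch) {
        $out .= array_key_exists($ch, $map)
            ? $map[$ch]
            : MISSING_OPEN . $ch . MISSING_CLOSE;
    }
    return $out;
}

// Toy table for illustration only; not taken from the project's database.
$map = ['a' => "\u{1780}", 'b' => "\u{1781}"];
$result = transcodeWithMarkers('ab?', $map);
// '?' has no entry in $map, so it ends up between U+0096 and U+0097.
```

The same wrapping idea would apply to the other marker pairs (0x91/0x92 for ornaments, 0x86/0x87 for Khmer characters missing in Unicode), with a different test deciding which pair to use.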

Unicode → Legacy

  1. Reorder characters according to visual order
    This code was translated to PHP from the KhmerOS khmerconverter Python software.
  2. Search and replace from the database
  3. Transcode characters
  4. Decompose composite characters if necessary
    • Missing characters get enclosed between 0x96 and 0x97
  5. Apply ligatures if present in the font
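
To give an idea of what step 1 involves, here is a deliberately simplified sketch of visual reordering. In Unicode the vowels U+17C1 to U+17C3 are stored after the consonant cluster, while legacy fonts expect them before it because they are drawn on the left. The function name `toVisualOrder` is hypothetical, and the sketch ignores register shifters, subscript RO, and the other cases that the real khmerconverter logic handles.

```php
<?php
/**
 * Simplified sketch: move the pre-base vowels U+17C1..U+17C3 in front of
 * the consonant cluster they follow in Unicode order.
 * Assumption: a cluster is a base consonant (U+1780..U+17A2) followed by
 * zero or more subscripts written as COENG (U+17D2) + consonant.
 */
function toVisualOrder(string $text): string
{
    $consonant = '[\x{1780}-\x{17A2}]';
    $cluster   = $consonant . '(?:\x{17D2}' . $consonant . ')*';
    $preVowel  = '[\x{17C1}-\x{17C3}]';

    // Group 1 is the cluster, group 2 the vowel; swap them.
    $pattern = '/(' . $cluster . ')(' . $preVowel . ')/u';
    $result  = preg_replace($pattern, '${2}${1}', $text);

    // preg_replace returns null on error (e.g. invalid UTF-8); fall back to the input.
    return $result ?? $text;
}
```

For example, `toVisualOrder("\u{1780}\u{17C1}")` moves the vowel U+17C1 in front of the consonant U+1780. The Legacy → Unicode direction needs the inverse operation.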

TODO

  • Refine the database (some font mappings aren't yet correct)
  • Word-breaking
  • Transcode documents with multiple fonts