GNUtools/emacs/etc/CHARACTERS - annotate

Annotation of GNUtools/emacs/etc/CHARACTERS, revision 1.1

1.1     ! root        1:    Ideas for extending GNU Emacs to deal with arbitrary character sets.
        !             2: 
        !             3: I would like GNU Emacs to be extended to handle all the world's alphabets
        !             4: and word signs.  I don't expect to have time to do such a thing in the next
        !             5: few years, so here are my ideas on the best way to do it.
        !             6: 
        !             7: * Each graphic is represented by a sequence of ordinary 8-bit characters.
        !             8: 
        !             9: * All the characters that make up such a sequence have codes >= 0200.
        !            10: 
        !            11: * The first character of such a sequence is between 0200 and 0237.
        !            12: 
        !            13: * The remaining characters of such a sequence are all 0240 or higher.
        !            14: 
        !            15: * The first character of the sequence determines the number of characters
        !            16: in the sequence.  Thus, 0200...0207 could start two-character sequences,
        !            17: 0210...0227 could start three-character sequences, and 0230 could start
        !            18: four-character sequences.  (Codes 0231...0237 would be reserved.)
        !            19: 
        !            20: * Several common alphabets, and some mathematical symbols, would get
        !            21: two-character sequences.  (Probably Greek, Russian, Hebrew(?), Arabic(?),
        !            22: Korean, and Japanese kana).  The remaining alphabets, and some versions of
        !            23: Chinese, would get three-character sequences.  Other sets of Chinese
        !            24: characters would get four-character sequences.
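
The encoding rules listed so far can be sketched in a few lines.  This is
only an illustration, written in Python rather than in Emacs's own C; the
names sequence_length and split_graphics are hypothetical, and the lead-byte
ranges are the ones the note says "could" be used, not a settled assignment.

```python
def sequence_length(lead):
    """Number of bytes in the sequence that starts with byte `lead`.

    Uses the ranges suggested in the note; they are a proposal, not a spec.
    """
    if lead < 0o200:
        return 1                      # ordinary one-byte character
    if lead <= 0o207:
        return 2                      # 0200...0207
    if lead <= 0o227:
        return 3                      # 0210...0227
    if lead == 0o230:
        return 4                      # 0230
    if lead <= 0o237:
        raise ValueError("reserved first byte: %o" % lead)
    raise ValueError("byte %o can only appear after a first byte" % lead)


def split_graphics(data):
    """Split a byte string into graphics, checking the rules as it goes."""
    graphics = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        seq = data[i:i + n]
        # Every byte after the first must be 0240 or higher.
        if len(seq) != n or any(b < 0o240 for b in seq[1:]):
            raise ValueError("malformed sequence at offset %d" % i)
        graphics.append(seq)
        i += n
    return graphics
```

For example, split_graphics(b"a\x80\xa1b") yields the three graphics b"a",
b"\x80\xa1" and b"b", since 0200 (hex 80) starts a two-character sequence.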
        !            25: 
        !            26: Each country that uses Chinese characters has its own standard character
        !            27: set, and it is not easy to correlate them to avoid overlap.  So there may
        !            28: need to be several sets of Chinese characters.  That is why they need so
        !            29: much code space.
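
To make "so much code space" concrete, the arithmetic can be worked out
from the ranges above.  The sketch below assumes that trailing characters may
take any value from 0240 through 0377; the note only says "0240 or higher",
so the upper bound is simply the largest 8-bit code.  The function name
capacity is hypothetical.

```python
# Trailing bytes run from 0240 to 0377 inclusive: 96 possible values.
TRAILING_VALUES = 0o377 - 0o240 + 1


def capacity(first_lead, last_lead, length):
    """How many graphics fit in sequences of `length` bytes whose
    first byte lies in the range first_lead...last_lead."""
    leads = last_lead - first_lead + 1
    return leads * TRAILING_VALUES ** (length - 1)


two_byte = capacity(0o200, 0o207, 2)     # 8 first bytes
three_byte = capacity(0o210, 0o227, 3)   # 16 first bytes
four_byte = capacity(0o230, 0o230, 4)    # 1 first byte
```

Under that assumption there is room for 768 two-character graphics, 147456
three-character graphics, and 884736 four-character graphics.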
        !            30: 
        !            31: True support for Hebrew and Arabic requires dealing with the problem of
        !            32: writing direction for mixed text; I don't know what to do for that.
        !            33: 
        !            34: * The functions that use the syntax table would determine the
        !            35: syntax of a sequence from its first character.
        !            36: 
        !            37: * Functions in indent.c for computing widths and columns would
        !            38: determine the width of a sequence from its first character.
        !            39: So would display routines.
        !            40: 
        !            41: * Only a few other editing routines would need any change.  In
        !            42: particular, searching and regexp matching might not need any change.
        !            43: 
        !            44: * Most of the work required would be in redisplay.  The only case that
        !            45: needs to be supported is with X windows, since ordinary terminals
        !            46: can't display all these characters anyway.
        !            47: 
        !            48: * There might need to be code to translate files from this format
        !            49: to whatever format is typically stored on disk.
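
The points about columns and searching can be illustrated with a short
sketch.  It is not taken from indent.c: the width table is invented for the
example (one column for two-character sequences, two columns for longer
ones), tabs and control characters are ignored, and the names graphic_width,
column_after and is_boundary are hypothetical.  The last function shows one
plausible reason byte-wise searching might need no change: first bytes and
trailing bytes come from disjoint ranges, so a match for a well-formed
pattern cannot begin in the middle of a sequence.

```python
def sequence_length(lead):
    """Bytes in the sequence started by `lead` (ranges suggested in the note)."""
    if lead < 0o200:
        return 1
    if lead <= 0o207:
        return 2
    if lead <= 0o227:
        return 3
    if lead == 0o230:
        return 4
    raise ValueError("not a valid first byte: %o" % lead)


def graphic_width(lead):
    """Display width in columns, decided from the first byte alone.

    The widths here are assumptions made for this example only.
    """
    if lead < 0o200:
        return 1
    return 1 if lead <= 0o207 else 2


def column_after(data):
    """Column reached after displaying `data`, starting from column 0."""
    col = 0
    i = 0
    while i < len(data):
        col += graphic_width(data[i])
        i += sequence_length(data[i])
    return col


def is_boundary(data, pos):
    """True if `pos` is the start of a graphic or the end of `data`.

    Trailing bytes are all 0240 or higher; every other byte starts a graphic.
    """
    return pos == len(data) or data[pos] < 0o240


# Two ordinary characters, a two-byte graphic, a three-byte graphic, "c".
text = b"ab" + bytes([0o200, 0o241]) + bytes([0o210, 0o241, 0o242]) + b"c"
pattern = bytes([0o210, 0o241, 0o242])
hit = text.find(pattern)   # plain byte search, unaware of the encoding
```

Here the plain byte search lands on the start of the three-byte graphic, and
both ends of the match satisfy is_boundary, even though the pattern's trailing
byte 0241 also occurs inside the preceding two-byte graphic.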
        !            50: 
        !            51: 
        !            52: I would be very unhappy with half-measures, such as support for
        !            53: Japanese only.
        !            54: