Annotation of GNUtools/emacs/etc/CHARACTERS, revision 1.1.1.1

1.1       root        1:    Ideas for extending GNU Emacs to deal with arbitrary character sets.
                      2: 
                      3: I would like GNU Emacs to be extended to handle all the world's alphabets
                      4: and word signs.  I don't expect to have time to do such a thing in the next
                      5: few years, so here are my ideas on the best way to do it.
                      6: 
                      7: * Each graphic is represented by a sequence of ordinary 8-bit characters.
                      8: 
                      9: * All the characters that make up such a sequence have codes >= 0200.
                     10: 
                     11: * The first character of such a sequence is between 0200 and 0237.
                     12: 
                     13: * The remaining characters of such a sequence are all 0240 or higher.
                     14: 
                     15: * The first character of the sequence determines the number of characters
                     16: in the sequence.  Thus, 0200...0207 could start two-character sequences,
                     17: 0210...0227 could start three-character sequences, and 0230 could start
                     18: four-character sequences.  (Codes 0231...0237 would be reserved.)
                     19: 
                     20: * Several common alphabets, and some mathematical symbols, would get
                     21: two-character sequences.  (Probably Greek, Russian, Hebrew(?), Arabic(?),
                     22: Korean, and Japanese kana).  The remaining alphabets, and some versions of
                     23: Chinese, would get three-character sequences.  Other sets of Chinese
                     24: characters would get four-character sequences.
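
The encoding rules listed above can be sketched in a few lines of Python.
This is only an illustration of the proposal, not code from Emacs.  The
names sequence_length and split_graphics are hypothetical, and treating
every byte below 0200 as an ordinary one-character graphic is an
assumption the note implies but does not state.

```python
def sequence_length(lead):
    """Return how many bytes the sequence starting with `lead` occupies."""
    if not 0 <= lead <= 0o377:
        raise ValueError("not an 8-bit character code")
    if lead < 0o200:
        return 1            # assumption: ordinary character, stands alone
    if lead <= 0o207:
        return 2            # 0200...0207 start two-character sequences
    if lead <= 0o227:
        return 3            # 0210...0227 start three-character sequences
    if lead == 0o230:
        return 4            # 0230 starts four-character sequences
    if lead <= 0o237:
        raise ValueError("lead byte 0231...0237 is reserved")
    raise ValueError("bytes 0240 and up cannot start a sequence")


def split_graphics(data):
    """Split a byte string into one bytes object per graphic."""
    graphics = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        seq = data[i:i + n]
        # Every byte after the first must be 0240 or higher.
        if len(seq) != n or any(b < 0o240 for b in seq[1:]):
            raise ValueError("malformed sequence at offset %d" % i)
        graphics.append(seq)
        i += n
    return graphics
```

For example, split_graphics(bytes([0o101, 0o200, 0o241])) yields the
ordinary character "A" followed by one two-character graphic.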
                     25: 
                     26: Each country that uses Chinese characters has its own standard character
                     27: set, and it is not easy to correlate them to avoid overlap.  So there may
                     28: need to be several sets of Chinese characters.  That is why they need so
                     29: much code space.
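
To make the code-space remark concrete, here is a small calculation of
how many graphics each sequence length could hold under the ranges
proposed above.  It assumes every trailing byte may take any of the 96
values 0240...0377; the names TRAILING_VALUES, LEAD_RANGES and capacity
are hypothetical.

```python
# Trailing bytes run from 0240 through 0377 inclusive: 96 values.
TRAILING_VALUES = 0o377 - 0o240 + 1

# Lead-byte ranges from the proposal, keyed by total sequence length.
LEAD_RANGES = {
    2: (0o200, 0o207),
    3: (0o210, 0o227),
    4: (0o230, 0o230),
}


def capacity(length):
    """Number of distinct graphics encodable with sequences of `length` bytes."""
    first, last = LEAD_RANGES[length]
    lead_count = last - first + 1
    return lead_count * TRAILING_VALUES ** (length - 1)


for length in sorted(LEAD_RANGES):
    print(length, capacity(length))
```

Under these assumptions the two-character sequences hold 768 graphics,
the three-character sequences 147456, and the four-character sequences
884736.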
                     30: 
                     31: True support for Hebrew and Arabic requires dealing with the problem of
                     32: writing direction for mixed text; I don't know what to do for that.
                     33: 
                     34: * The functions that use the syntax table would determine the
                     35: syntax of a sequence from its first character.
                     36: 
                     37: * Functions in indent.c for computing widths and columns would
                     38: determine the width of a sequence from its first character.
                     39: So would display routines.
                     40: 
                     41: * Only a few other editing routines would need any change.  In
                     42: particular, searching and regexp matching might not need any change.
                     43: 
                     44: * Most of the work required would be in redisplay.  The only case that
                     45: needs to be supported is with X windows, since ordinary terminals
                     46: can't display all these characters anyway.
                     47: 
                     48: * There might need to be code to translate files from this format
                     49: to whatever format is typically stored on disk.
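
Two of the points above can be illustrated with a short Python sketch.
First bytes (below 0240) and trailing bytes (0240 and up) never overlap,
so a byte-by-byte search for a whole graphic can only match at the start
of a sequence, which is one plausible reason searching might need no
change.  The same property lets column computation key on the first
byte alone.  The names sequence_start, display_width and column_at are
hypothetical, and the width rule is invented purely for illustration;
the note does not say which graphics would be wide.

```python
def sequence_start(data, pos):
    """Back up from byte offset `pos` to the first byte of its sequence."""
    while pos > 0 and data[pos] >= 0o240:
        pos -= 1
    return pos


def display_width(lead):
    """Width in columns, decided from the first byte only (assumed rule)."""
    if lead < 0o200:
        return 1    # ordinary character (tabs and control codes ignored here)
    if lead <= 0o207:
        return 1    # assumption: two-character alphabets are single width
    return 2        # assumption: all other graphics are double width


def column_at(data, pos):
    """Column of the graphic containing offset `pos`, counting from 0."""
    end = sequence_start(data, pos)
    col = 0
    i = 0
    while i < end:
        col += display_width(data[i])
        i += 1
        # Skip the trailing bytes of this sequence.
        while i < end and data[i] >= 0o240:
            i += 1
    return col


line = bytes([0o101, 0o200, 0o241, 0o210, 0o240, 0o377, 0o102])
graphic = bytes([0o210, 0o240, 0o377])
hit = line.find(graphic)            # plain byte search, no decoding
assert sequence_start(line, hit) == hit
```

The final assertion checks that the unmodified byte search landed on a
sequence boundary rather than in the middle of a graphic.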
                     50: 
                     51: 
                     52: I would be very unhappy with half-measures, such as support for
                     53: Japanese only.
                     54: 
