Ideas for extending GNU Emacs to deal with arbitrary character sets.

I would like GNU Emacs to be extended to handle all the world's alphabets
and word signs.  I don't expect to have time to do such a thing in the next
few years, so here are my ideas on the best way to do it.

* Each graphic is represented by a sequence of ordinary 8-bit characters.

* All the characters that make up such a sequence have codes >= 0200.

* The first character of such a sequence is between 0200 and 0237.

* The remaining characters of such a sequence are all 0240 or higher.

* The first character of the sequence determines the number of characters
in the sequence.  Thus, 0200...0207 could start two-character sequences,
0210...0227 could start three-character sequences, and 0230 could start
four-character sequences.  (Codes 0231...0237 would be reserved.)

* Several common alphabets, and some mathematical symbols, would get
two-character sequences.  (Probably Greek, Russian, Hebrew(?), Arabic(?),
Korean, and Japanese kana).  The remaining alphabets, and some versions of
Chinese, would get three-character sequences.  Other sets of Chinese
characters would get four-character sequences.

Each country that uses Chinese characters has its own standard character
set, and it is not easy to correlate them to avoid overlap.  So there may
need to be several sets of Chinese characters.  That is why they need so
much code space.

True support for Hebrew and Arabic requires dealing with the problem of
writing direction for mixed text; I don't know what to do for that.

* The functions that use syntax table would determine the
syntax of a sequence from its first character.

* Functions in indent.c for computing widths and columns would
determine the width of a sequence from its first character.
So would display routines.

* Only a few other editing routines would need any change.  In
particular, searching and regexp matching might not need any change.

* Most of the work required would be in redisplay.  The only case that
needs to be supported is with X windows, since ordinary terminals
can't display all these characters anyway.

* There might need to be code to translate files from this format
to whatever format is typically stored on disk.


I would be very unhappy with half-measures, such as support for
Japanese only.
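
As an illustration of the byte layout described in the list above, here is
a minimal C sketch.  It is not code from Emacs; the names seq_length and
count_graphics are hypothetical, and the octal ranges are the example
allocation suggested above (0200...0207 for two characters, 0210...0227
for three, 0230 for four), which the text offers only as a possibility.
Treating a sequence that is cut short, or that has a trailing character
below 0240, as an error is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length of the sequence that starts with lead
   byte C under the example allocation.  Returns 1 for an ordinary
   character (below 0200) and 0 for a byte that cannot start a
   sequence: the reserved codes 0231...0237 and the trailing-byte
   range 0240 and above.  */
int
seq_length (unsigned char c)
{
  if (c < 0200)
    return 1;
  if (c <= 0207)
    return 2;
  if (c <= 0227)
    return 3;
  if (c == 0230)
    return 4;
  return 0;
}

/* Hypothetical helper: count the graphics in BUF, which holds LEN
   bytes.  Returns -1 if the text is malformed (an assumption of this
   sketch; the note above does not say how errors should be handled).  */
long
count_graphics (const unsigned char *buf, size_t len)
{
  long count = 0;
  size_t i = 0;

  while (i < len)
    {
      int n = seq_length (buf[i]);
      size_t k;

      /* Reject bytes that cannot lead, and sequences cut off by the
         end of the buffer.  */
      if (n == 0 || (size_t) n > len - i)
        return -1;

      /* Every character after the first must be 0240 or higher.  */
      for (k = 1; k < (size_t) n; k++)
        if (buf[i + k] < 0240)
          return -1;

      i += (size_t) n;
      count++;
    }
  return count;
}
```

Because the length depends only on the first character, a routine that
computes columns or syntax could step through the text one graphic at a
time in the same way, looking at the lead byte and skipping the rest.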