Ideas for extending GNU Emacs to deal with arbitrary character sets.

I would like GNU Emacs to be extended to handle all the world's alphabets
and word signs. I don't expect to have time to do such a thing in the next
few years, so here are my ideas on the best way to do it.

* Each graphic is represented by a sequence of ordinary 8-bit characters.

* All the characters that make up such a sequence have codes >= 0200.

* The first character of such a sequence is between 0200 and 0237.

* The remaining characters of such a sequence are all 0240 or higher.

* The first character of the sequence determines the number of characters
in the sequence. Thus, 0200...0207 could start two-character sequences,
0210...0227 could start three-character sequences, and 0230 could start
four-character sequences. (Codes 0231...0237 would be reserved.)

* Several common alphabets, and some mathematical symbols, would get
two-character sequences. (Probably Greek, Russian, Hebrew(?), Arabic(?),
Korean, and Japanese kana.) The remaining alphabets, and some versions of
Chinese, would get three-character sequences. Other sets of Chinese
characters would get four-character sequences.

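The byte ranges in the bullets above can be sketched as a small decoder.
This is only an illustration, not code from Emacs: the function names are
hypothetical, the codes are read as octal, and it assumes that bytes below
0200 remain ordinary one-byte characters, which the proposal implies but
does not state.

```python
# Hypothetical sketch of the proposed multi-byte layout (codes are octal).

def sequence_length(first):
    """Return the number of bytes in the sequence that starts with `first`."""
    if first < 0o200:
        return 1                      # assumption: plain 7-bit character
    if 0o200 <= first <= 0o207:
        return 2
    if 0o210 <= first <= 0o227:
        return 3
    if first == 0o230:
        return 4
    if 0o231 <= first <= 0o237:
        raise ValueError(f"reserved leading code {first:#o}")
    raise ValueError(f"{first:#o} cannot start a sequence")


def split_graphics(data):
    """Split `data` into one bytes object per graphic, checking each one."""
    graphics = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        seq = data[i:i + n]
        if len(seq) < n:
            raise ValueError("truncated sequence at end of data")
        if any(b < 0o240 for b in seq[1:]):
            raise ValueError(f"bad trailing byte in sequence at offset {i}")
        graphics.append(seq)
        i += n
    return graphics
```

For example, split_graphics(b"a\x80\xa1b") separates the two-byte sequence
0200 0241 from the ordinary characters on either side of it.
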
Each country that uses Chinese characters has its own standard character
set, and it is not easy to correlate them to avoid overlap. So there may
need to be several sets of Chinese characters. That is why they need so
much code space.

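To make "code space" concrete, here is a rough count of how many distinct
sequences each length allows. It assumes every trailing value from 0240
through 0377 is usable and takes the leading-code ranges from the example
allocation above; the helper name is hypothetical.

```python
# Rough code-space arithmetic for the example allocation (octal codes).

TRAILING = 0o377 - 0o240 + 1          # 96 possible values per trailing byte


def code_space(first_lo, first_hi, length):
    """Count sequences of `length` bytes whose first byte is in the range."""
    leads = first_hi - first_lo + 1
    return leads * TRAILING ** (length - 1)


two_char = code_space(0o200, 0o207, 2)      # 8 leading codes * 96
three_char = code_space(0o210, 0o227, 3)    # 16 leading codes * 96**2
four_char = code_space(0o230, 0o230, 4)     # 1 leading code * 96**3
```

Under these assumptions each two-character leading code covers 96 graphics,
each three-character leading code covers 9216, and the single
four-character leading code covers 884736.
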
True support for Hebrew and Arabic requires dealing with the problem of
writing direction for mixed text; I don't know what to do for that.

* The functions that use the syntax table would determine the
syntax of a sequence from its first character.

* Functions in indent.c for computing widths and columns would
determine the width of a sequence from its first character.
So would display routines.

* Only a few other editing routines would need any change. In
particular, searching and regexp matching might not need any change.

* Most of the work required would be in redisplay. The only case that
needs to be supported is with X windows, since ordinary terminals
can't display all these characters anyway.

* There might need to be code to translate files from this format
to whatever format is typically stored on disk.

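The idea of taking a sequence's width from its first character can be
sketched as a table lookup during column computation. This is not the code
in indent.c: the width table and its values are invented for illustration,
ordinary characters are all counted as one column, and tabs and control
characters are ignored.

```python
# Hypothetical sketch: display width looked up from the leading byte only.

def sequence_length(first):
    """Bytes in the sequence starting with `first` (example allocation)."""
    if first < 0o200:
        return 1
    if first <= 0o207:
        return 2
    if first <= 0o227:
        return 3
    if first == 0o230:
        return 4
    raise ValueError(f"reserved leading code {first:#o}")


def column_after(data, width_by_lead, default_width=1):
    """Return the column reached after displaying all of `data`."""
    col = 0
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0o200:
            col += 1                              # assumption: width 1
        else:
            col += width_by_lead.get(first, default_width)
        i += sequence_length(first)               # skip trailing bytes
    return col


# Invented widths: one narrow alphabet, one double-width character set.
example_widths = {0o200: 1, 0o210: 2}
example_text = b"ab" + bytes([0o200, 0o241]) + bytes([0o210, 0o241, 0o242])
example_column = column_after(example_text, example_widths)
```

Because the trailing bytes never need to be inspected, the loop can step
from one leading byte to the next without decoding the graphic itself.
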
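One possible reason searching might not need any change, offered as a
reading of the layout rather than something the text states: leading bytes
(0200...0237), trailing bytes (0240 and up) and ordinary characters occupy
disjoint ranges, so a plain byte-wise search for whole sequences can only
match at a sequence boundary. The helper names below are hypothetical.

```python
# Hypothetical check that byte-wise matches fall on sequence boundaries.

def sequence_length(first):
    """Bytes in the sequence starting with `first` (example allocation)."""
    if first < 0o200:
        return 1
    if first <= 0o207:
        return 2
    if first <= 0o227:
        return 3
    if first == 0o230:
        return 4
    raise ValueError(f"reserved leading code {first:#o}")


def boundaries(data):
    """Return the set of offsets at which a graphic starts."""
    offsets = set()
    i = 0
    while i < len(data):
        offsets.add(i)
        i += sequence_length(data[i])
    return offsets


def find_all(text, pattern):
    """Return every offset where `pattern` occurs, using plain bytes.find."""
    hits = []
    i = text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


two = bytes([0o200, 0o241])            # a two-character sequence
three = bytes([0o210, 0o241, 0o300])   # a three-character sequence
text = three + two + b"x" + two
hits = find_all(text, two)
aligned = all(h in boundaries(text) for h in hits)
```

Here the trailing byte 0241 appears inside the three-character sequence as
well, yet the search for the two-character sequence still matches only
where that sequence really starts.
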
I would be very unhappy with half-measures, such as support for
Japanese only.