.TH TCS 1
.SH NAME
tcs \- translate character sets
.SH SYNOPSIS
.B tcs
[
.B -slcv
]
[
.B -f
.I ics
]
[
.B -t
.I ocs
]
[
.I file ...
]
.SH DESCRIPTION
.I Tcs
interprets the named
.I file(s)
(standard input default) as a stream of characters from the
.I ics
character set or format, converts them to runes,
and then converts them into a stream of characters from the
.I ocs
character set or format on the standard output.
The default value for
.I ics
and
.I ocs
is
.BR utf ,
the
.SM UTF
encoding described in
.IR utf (6).
The
.B -l
option lists the character sets known to
.IR tcs .
Processing continues in the face of conversion errors (the
.B -s
option prevents reporting of these errors).
The
.B -c
option forces the output to contain only correctly converted characters;
otherwise,
.B 0x80
characters will be substituted for
.SM UTF
encoding errors and
.B 0xFFFD
characters will be substituted for unknown characters.
.PP
The
.B -v
option generates various diagnostic and summary information on standard error,
or makes the
.B -l
output more verbose.
.PP
.I Tcs
recognizes an ever-changing list of character sets.
In particular, it supports a variety of Russian and Japanese encodings.
Some of the supported encodings are
.TF jis-kanji
.TP
.B utf
The Plan 9
.SM UTF
encoding, known by ISO as UTF-8
.TP
.B utf1
The deprecated original
.SM UTF
encoding from ISO 10646
.TP
.B ascii
7-bit ASCII
.TP
.B 8859-1
Latin-1 (Western European)
.TP
.B 8859-2
Latin-2 (Czech .. Slovak)
.TP
.B 8859-3
Latin-3 (Dutch .. Turkish)
.TP
.B 8859-4
Latin-4 (Scandinavian)
.TP
.B 8859-5
Part 5 (Cyrillic)
.TP
.B 8859-6
Part 6 (Arabic)
.TP
.B 8859-7
Part 7 (Greek)
.TP
.B 8859-8
Part 8 (Hebrew)
.TP
.B 8859-9
Latin-5 (Finnish .. Portuguese)
.TP
.B koi8
KOI-8 (GOST 19769-74)
.TP
.B jis-kanji
ISO 2022-JP
.TP
.B ujis
EUC-JX: JIS 0208
.TP
.B ms-kanji
Microsoft, or Shift-JIS
.TP
.B jis
(from only) guesses between ISO 2022-JP, EUC or Shift-JIS
.TP
.B gb
Chinese national standard (GB2312-80)
.TP
.B big5
Big 5 (HKU version)
.TP
.B unicode
Unicode Standard 1.0
.TP
.B tis
Thai character set plus ASCII (TIS 620-1986)
.TP
.B msdos
IBM PC: CP 437
.TP
.B atari
Atari-ST character set
.SH EXAMPLES
.TP
.B tcs -f 8859-1
Convert 8859-1 (Latin-1) characters into
.SM UTF
format.
.TP
.B tcs -s -f jis
Convert characters encoded in one of several JIS encodings into
.SM UTF
format.
Unknown Kanji will be converted into
.B 0xFFFD
characters.
.TP
.B tcs -lv
Print an up-to-date list of the supported character sets.
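.PP
As a sketch of how the
.B -f
and
.B -t
options combine, the commands below convert a KOI-8 file to
.SM UTF
(the default
.IR ocs )
and then back to KOI-8 (reading the default
.IR ics ).
The file names
.BR in.koi8 ,
.B out.utf
and
.B back.koi8
are hypothetical, chosen only for illustration.
.EX
tcs -f koi8 in.koi8 > out.utf
tcs -t koi8 out.utf > back.koi8
.EE
If every character in the input has a KOI-8 equivalent, the round trip should reproduce the original bytes; characters that cannot be converted are subject to the substitution rules given in the description.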
.SH SOURCE
.B /sys/src/cmd/tcs
.SH SEE ALSO
.IR ascii (1),
.IR rune (2),
.IR utf (6).