Annotation of researchv10dc/man/adm/man1/ocr.1, revision 1.1.1.1

1.1       root        1: .TH OCR 1 coma,pipe,crab
                      2: .CT 1 graphics
                      3: .SH NAME
                      4: ocr \- optical character recognition
                      5: .SH SYNOPSIS
                      6: .B ocr
                      7: [
                      8: .I option ...
                      9: ]
                     10: [
                     11: .I file
                     12: ]
                     13: .SH DESCRIPTION
                     14: .I Ocr
                     15: reads a black-and-white image of a page from
                     16: .IR file ,
                     17: and writes ASCII to the standard output.
                     18: If no
                     19: .I file
                     20: is specified, it reads from the standard input.
                     21: .PP
                     22: The input is a
                     23: .IR picfile (5)
                     24: image of one column of machine-printed text.
                     25: Fonts, sizes, and line-spacings may vary within the column,
                     26: but each line should have a constant size and baseline.
                     27: Lines should be parallel and roughly horizontal.
                     28: .PP
                     29: In the output, white space approximates the original page layout.
                     30: Words are checked and corrected by reference to the
                     31: .IR spell (1)
                     32: dictionary, and hyphenations across lines are recombined.
                     33: .PP
                     34: The options are:
                     35: .nr xx \w'\fL-pn,m\ \ '
                     36: .TP \n(xxu
                     37: .BI -a s
                     38: The alphabet is the union of symbol sets selected by characters in string
                     39: .IR s ,
                     40: from among:
                     41: .RS
                     42: .PD
                     43: .nr yy \w'\fLA\ \ '
                     44: .TP \n(yyu
                     45: .B A
                     46: ABCDEFGHIJKLMNOPQRSTUVWXYZ
                     47: .PD0
                     48: .TP
                     49: .B a
                     50: abcdefghijklmnopqrstuvwxyz
                     51: .PD0
                     52: .TP
                     53: .B 0
                     54: 0123456789
                     55: .PD0
                     56: .TP
                     57: .B .
                     58: \&.\^,\|-\^:\^;\|*\^'\|\^"\|?\^!\|/\|&\|$\^(\^)\^[\|\^]\|#\|@\|% \0\0\0\0\0\0\0 \kz(basic punctuation)
                     59: .ig
                     60: should include ` /(em + ???
                     61: shouldn't include []#@% ???
                     62: ..
                     63: .PD0
                     64: .TP
                     65: .B ^
                     66: ^\|\f(CW~\fR\^`\|\^\\\||\|\^{\|}\|_ \h'|\nzu'(extended punctuation)
                     67: .ig
                     68: should include []#@% ???
                     69: shouldn't include ` ???
                     70: ..
                     71: .PD0
                     72: .TP
                     73: .B +
                     74: +\^\-\^*\|/\|<\^>\^=\^.\^E\|e\|[\|] \h'|\nzu'(for numerical tables)
                     75: .PD0
                     76: .TP
                     77: .B s
                     78: .ie t \(sc\^\(dg\^\(dd\^\(ct\|\(bu\|\(rg\|\(co\|\(de\^\(fm\^\(en\|\^\(mi\|\(em \h'|\nzu'(selected non-ASCII)
                     79: .el \\(sc\\(dg\\(dd\\(ct\\(bu\\(rg\\(co\\(de\\(fm\\(en\\(mi\\(em (selected non-ASCII)
                     80: .PD0
                     81: .TP
                     82: .B l
                     83: .ie t \(fi\|\(fl\|f\h'-.1m'f\|f\h'-.1m'\(fi\|f\h'-.1m'\(fl\|\N'114'\|\N'115'\|\N'105'\|\N'106' \h'|\nzu'(ligatures and digraphs)
                     84: .el fi fl ff ffi ffl ae AE oe OE \h'|\nzu'(ligatures and digraphs)
                     85: .PD
                     86: .PP
                     87: The default is
                     88: .BR -aAa0.+^ ,
                     89: the full printable-ASCII set, which may be abbreviated as
                     90: .BR -ap .
                     91: Thus,
                     92: .B -apsl
                     93: selects all of the above.
                     94: .RE
                     95: .PD
                     96: .TP \n(xxu
                     97: .BI -m l[,r]
                     98: Trim the left and right margins of the image by
                     99: .I l
                    100: and
                    101: .I r
                    102: inches, respectively, before looking for columns.
                    103: If
                    104: .I r
                    105: is omitted, it is assumed to equal
                    106: .IR l .
                    107: .TP
                    108: .BI -n n
                    109: Find the
                    110: .I n
                    111: largest columns.
                    112: Each column should be compactly printed
                    113: and separated from the others by at least 5 ems of horizontal white space.
                    114: .TP
                    115: .BI -p n,m
                    116: Point sizes lie in the range [
                    117: .I n, m
                    118: ]; other sizes are discarded.
                    119: The default is
                    120: .BR -p6,24 .
                    121: .TP
                    122: .B -t
                    123: Write
                    124: .IR troff (1)
                    125: format.
                    126: Each column is shown on a separate page, left- and top-justified.
                    127: Lines are placed at their original height in the column,
                    128: and each word starts at its original horizontal location in the line.
                    129: Characters are printed at approximately their original size in Times roman.
                    130: Hyphenated words are not recombined.
                    131: .TP
                    132: .B -u
                    133: Unspellable words are prefixed with `?' or, if
                    134: .B -t
                    135: is specified, printed boldface.
                    136: .TP
                    137: .BI -w w
                    138: Find the largest column of width
                    139: .I w
                    140: inches.
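
The -a option describes its alphabet as a union of symbol sets, one per selector character. The sketch below illustrates that union for the ASCII selectors only. It is not the program's implementation: the names SYMBOL_SETS and expand_alphabet are hypothetical, the character lists are transcribed from the option table above, and the non-ASCII (s) and ligature (l) sets are left out.

```python
import string

# Symbol sets keyed by selector character, transcribed from the -a table.
# Assumption: only the ASCII selectors are modelled; 's' and 'l' are omitted.
SYMBOL_SETS = {
    "A": string.ascii_uppercase,
    "a": string.ascii_lowercase,
    "0": string.digits,
    ".": ".,-:;*'\"?!/&$()[]#@%",   # basic punctuation
    "^": "^~`\\|{}_",               # extended punctuation
    "+": "+-*/<>=.Ee[]",            # for numerical tables
}

def expand_alphabet(selectors):
    """Return the union of the symbol sets named by each selector character."""
    # 'p' abbreviates the default selection Aa0.+^ (full printable ASCII).
    expanded = selectors.replace("p", "Aa0.+^")
    alphabet = set()
    for ch in expanded:
        if ch not in SYMBOL_SETS:
            raise ValueError(f"unknown symbol-set selector: {ch!r}")
        alphabet.update(SYMBOL_SETS[ch])
    return alphabet

if __name__ == "__main__":
    # Restricting the alphabet to digits plus table symbols, as for -a0+
    print("".join(sorted(expand_alphabet("0+"))))
```

The sets overlap (for example, `.` and `+` both contain the period and the brackets), which is why a set union rather than concatenation is used here.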
                    141: .SS Fonts
                    142: Times, Helvetica, Palatino, Constant Width, Printout, Baskerville, Memphis,
                    143: Caslon Old, Zapf, Optima, Futura, Euro, Spartan, Garamond, Breughel, Textype,
                    144: Bembo, Souvenir and similar fonts are recognized in roman,
                    145: italic, bold, condensed, and expanded styles.
                    146: Also Tibetan, on request.
                    147: .SH SEE ALSO
                    148: .IR bcp (1),
                    149: .IR cscan (1),
                    150: .IR font (6),
                    151: .IR picfile (5),
                    152: .IR spell (1),
                    153: .IR troff (1)
                    154: .SH BUGS
                    155: For best results, use images of high-contrast, cleanly printed original
                    156: documents digitized at a resolution of 400 pixels/inch or higher.
                    157: It sometimes helps to restrict the alphabet and sizes to what's there.
                    158: Multiple-column finding is chancy; if it goes wrong, runtimes may be excessive.
                    159: .ig
                    160: 8.7 CPU minutes on pipe to read this page, September 1989.
                    161: ..
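
As a rough guide to what the recommended resolution means for the -p size range, the following sketch converts point sizes to pixel heights. It assumes 72 points per inch and uses a hypothetical helper name, point_size_in_pixels; neither is taken from the program itself.

```python
POINTS_PER_INCH = 72  # assumption: 72 points to the inch

def point_size_in_pixels(points, pixels_per_inch=400):
    """Convert a type size in points to a height in pixels at a scan resolution."""
    return points * pixels_per_inch / POINTS_PER_INCH

if __name__ == "__main__":
    # The default range -p6,24 at the recommended 400 pixels/inch
    for size in (6, 24):
        print(f"{size} pt is about {point_size_in_pixels(size):.1f} pixels")
```

Under these assumptions the default range of 6 to 24 points spans roughly 33 to 133 pixels at 400 pixels/inch, so lower scan resolutions leave correspondingly fewer pixels per character at the small end of the range.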
