.\" 1.1 root
.TH OCR 1 coma,pipe,crab
.CT 1 graphics
.SH NAME
ocr \- optical character recognition
.SH SYNOPSIS
.B ocr
[
.I option ...
]
[
.I file
]
.SH DESCRIPTION
.I Ocr
reads a black-and-white image of a page from
.IR file ,
and writes ASCII to the standard output.
If no
.I file
is specified, it reads from the standard input.
.PP
The input is a
.IR picfile (5)
image of one column of machine-printed text.
Fonts, sizes, and line-spacings may vary within the column,
but each line should have a constant size and baseline.
Lines should be parallel and roughly horizontal.
.PP
In the output, white space approximates the original page layout.
Words are checked and corrected by reference to the
.IR spell (1)
dictionary, and hyphenations across lines are recombined.
.PP
The options are:
.nr xx \w'\fL-pn,m\ \ '
.TP \n(xxu
.BI -a s
The alphabet is the union of symbol sets selected by characters in string
.IR s ,
from among:
.RS
.PD
.nr yy \w'\fLA\ \ '
.TP \n(yyu
.B A
ABCDEFGHIJKLMNOPQRSTUVWXYZ
.PD0
.TP
.B a
abcdefghijklmnopqrstuvwxyz
.PD0
.TP
.B 0
0123456789
.PD0
.TP
.B .
\&.\^,\|-\^:\^;\|*\^'\|\^"\|?\^!\|/\|&\|$\^(\^)\^[\|\^]\|#\|@\|% \0\0\0\0\0\0\0 \kz(basic punctuation)
.ig
should include ` /(em + ???
shouldn't include []#@% ???
..
.PD0
.TP
.B ^
^\|\f(CW~\fR\^`\|\^\\\||\|\^{\|}\|_ \h'|\nzu'(extended punctuation)
.ig
should include []#@% ???
shouldn't include ` ???
..
.PD0
.TP
.B +
+\^\-\^*\|/\|<\^>\^=\^.\^E\|e\|[\|] \h'|\nzu'(for numerical tables)
.PD0
.TP
.B s
.ie t \(sc\^\(dg\^\(dd\^\(ct\|\(bu\|\(rg\|\(co\|\(de\^\(fm\^\(en\|\^\(mi\|\(em \h'|\nzu'(selected non-ASCII)
.el \\(sc\\(dg\\(dd\\(ct\\(bu\\(rg\\(co\\(de\\(fm\\(en\\(mi\\(em (selected non-ASCII)
.PD0
.TP
.B l
.ie t \(fi\|\(fl\|f\h'-.1m'f\|f\h'-.1m'\(fi\|f\h'-.1m'\(fl\|\N'114'\|\N'115'\|\N'105'\|\N'106' \h'|\nzu'(ligatures and digraphs)
.el fi fl ff ffi ffl ae AE oe OE \h'|\nzu'(ligatures and digraphs)
.PD
.PP
The default is
.BR -aAa0.+^ ,
the full printable-ASCII set, which may be abbreviated as
.BR -ap .
Thus,
.B -apsl
selects all of the above.
.RE
.PD
.TP \n(xxu
.BI -m l[,r]
Trim the left and right margins of the image by
.I l
and
.I r
inches, respectively, before looking for columns.
If
.I r
is omitted, it is assumed to equal
.IR l .
.TP
.BI -n n
Find the
.I n
largest columns.
Each column should be compactly printed
and separated from the others by at least 5 ems of horizontal white space.
.TP
.BI -p n,m
Point sizes lie in the range
.RI [ n , m ];
other sizes are discarded.
The default is
.BR -p6,24 .
.TP
.B -t
Write
.IR troff (1)
format.
Each column is shown on a separate page, left- and top-justified.
Lines are placed at their original height in the column,
and each word starts at its original horizontal location in the line.
Characters are printed at approximately their original size in Times roman.
Hyphenated words are not recombined.
.TP
.B -u
Unspellable words are prefixed with `?' or, if
.B -t
is specified, printed in boldface.
.TP
.BI -w w
Find the largest column of width
.I w
inches.
.SS Fonts
Times, Helvetica, Palatino, Constant Width, Printout, Baskerville, Memphis,
Caslon Old, Zapf, Optima, Futura, Euro, Spartan, Garamond, Breughel, Textype,
Bembo, Souvenir, and similar fonts are recognized in roman,
italic, bold, condensed, and expanded styles.
Tibetan is also recognized, on request.
.SH SEE ALSO
.IR bcp (1),
.IR cscan (1),
.IR font (6),
.IR picfile (5),
.IR spell (1),
.IR troff (1)
.SH BUGS
For best results, use images of high-contrast, cleanly printed original
documents digitized at a resolution of 400 pixels/inch or higher.
It sometimes helps to restrict the alphabet and point sizes to those actually present.
Multiple-column finding is chancy; if it goes wrong, runtimes may be excessive.
.ig
8.7 CPU minutes on pipe to read this page, September 1989.
..