Annotation of researchv10dc/man/adm/man1/ocr.1, revision 1.1

1.1     ! root        1: .TH OCR 1 coma,pipe,crab
        !             2: .CT 1 graphics
        !             3: .SH NAME
        !             4: ocr \- optical character recognition
        !             5: .SH SYNOPSIS
        !             6: .B ocr
        !             7: [
        !             8: .I option ...
        !             9: ]
        !            10: [
        !            11: .I file
        !            12: ]
        !            13: .SH DESCRIPTION
        !            14: .I Ocr
        !            15: reads a black-and-white image of a page from
        !            16: .I file
        !            17: and writes ASCII text to the standard output.
        !            18: If no
        !            19: .I file
        !            20: is specified, it reads from the standard input.
        !            21: .PP
        !            22: The input is a
        !            23: .IR picfile (5)
        !            24: image of one column of machine-printed text.
        !            25: Fonts, sizes, and line-spacings may vary within the column,
        !            26: but each line should have a constant size and baseline.
        !            27: Lines should be parallel and roughly horizontal.
        !            28: .PP
        !            29: In the output, white space approximates the original page layout.
        !            30: Words are checked and corrected by reference to the
        !            31: .IR spell (1)
        !            32: dictionary, and hyphenations across lines are recombined.
        !            33: .PP
        !            34: The options are:
        !            35: .nr xx \w'\fL-pn,m\ \ '
        !            36: .TP \n(xxu
        !            37: .BI -a s
        !            38: The alphabet is the union of symbol sets selected by characters in string
        !            39: .IR s ,
        !            40: from among:
        !            41: .RS
        !            42: .PD
        !            43: .nr yy \w'\fLA\ \ '
        !            44: .TP \n(yyu
        !            45: .B A
        !            46: ABCDEFGHIJKLMNOPQRSTUVWXYZ
        !            47: .PD0
        !            48: .TP
        !            49: .B a
        !            50: abcdefghijklmnopqrstuvwxyz
        !            51: .PD0
        !            52: .TP
        !            53: .B 0
        !            54: 0123456789
        !            55: .PD0
        !            56: .TP
        !            57: .B .
        !            58: \&.\^,\|-\^:\^;\|*\^'\|\^"\|?\^!\|/\|&\|$\^(\^)\^[\|\^]\|#\|@\|% \0\0\0\0\0\0\0 \kz(basic punctuation)
        !            59: .ig
        !            60: should include ` /(em + ???
        !            61: shouldn't include []#@% ???
        !            62: ..
        !            63: .PD0
        !            64: .TP
        !            65: .B ^
        !            66: ^\|\f(CW~\fR\^`\|\^\\\||\|\^{\|}\|_ \h'|\nzu'(extended punctuation)
        !            67: .ig
        !            68: should include []#@% ???
        !            69: shouldn't include ` ???
        !            70: ..
        !            71: .PD0
        !            72: .TP
        !            73: .B +
        !            74: +\^\-\^*\|/\|<\^>\^=\^.\^E\|e\|[\|] \h'|\nzu'(for numerical tables)
        !            75: .PD0
        !            76: .TP
        !            77: .B s
        !            78: .ie t \(sc\^\(dg\^\(dd\^\(ct\|\(bu\|\(rg\|\(co\|\(de\^\(fm\^\(en\|\^\(mi\|\(em \h'|\nzu'(selected non-ASCII)
        !            79: .el \\(sc\\(dg\\(dd\\(ct\\(bu\\(rg\\(co\\(de\\(fm\\(en\\(mi\\(em (selected non-ASCII)
        !            80: .PD0
        !            81: .TP
        !            82: .B l
        !            83: .ie t \(fi\|\(fl\|f\h'-.1m'f\|f\h'-.1m'\(fi\|f\h'-.1m'\(fl\|\N'114'\|\N'115'\|\N'105'\|\N'106' \h'|\nzu'(ligatures and digraphs)
        !            84: .el fi fl ff ffi ffl ae AE oe OE \h'|\nzu'(ligatures and digraphs)
        !            85: .PD
        !            86: .PP
        !            87: The default is
        !            88: .BR -aAa0.+^ ,
        !            89: the full printable-ASCII set, which may be abbreviated as
        !            90: .BR -ap .
        !            91: Thus,
        !            92: .B -apsl
        !            93: selects all of the above.
        !            94: .RE
        !            95: .PD
        !            96: .TP \n(xxu
        !            97: .BI -m l[,r]
        !            98: Trim the left and right margins of the image by
        !            99: .I l
        !           100: and
        !           101: .I r
        !           102: inches, respectively, before looking for columns.
        !           103: If
        !           104: .I r
        !           105: is omitted, it is assumed to equal
        !           106: .IR l .
        !           107: .TP
        !           108: .BI -n n
        !           109: Find the
        !           110: .I n
        !           111: largest columns.
        !           112: Each column should be compactly printed
        !           113: and separated from the others by at least 5 ems of horizontal white space.
        !           114: .TP
        !           115: .BI -p n,m
        !           116: Point sizes must lie in the range
        !           117: .RI [ n , m ];
        !           118: text in other sizes is discarded.
        !           119: The default is
        !           120: .BR -p6,24 .
        !           121: .TP
        !           122: .B -t
        !           123: Write
        !           124: .IR troff (1)
        !           125: format.
        !           126: Each column is shown on a separate page, left- and top-justified.
        !           127: Lines are placed at their original height in the column,
        !           128: and each word starts at its original horizontal location in the line.
        !           129: Characters are printed at approximately their original size in Times Roman.
        !           130: Hyphenated words are not recombined.
        !           131: .TP
        !           132: .B -u
        !           133: Unspellable words are prefixed with `?' or, if
        !           134: .B -t
        !           135: is specified, printed in boldface.
        !           136: .TP
        !           137: .BI -w w
        !           138: Find the largest column of width
        !           139: .I w
        !           140: inches.
        !           141: .SS Fonts
        !           142: Times, Helvetica, Palatino, Constant Width, Printout, Baskerville, Memphis,
        !           143: Caslon Old, Zapf, Optima, Futura, Euro, Spartan, Garamond, Breughel, Textype,
        !           144: Bembo, Souvenir and similar fonts are recognized in roman,
        !           145: italic, bold, condensed, and expanded styles.
        !           146: Also Tibetan, on request.
        !           147: .SH SEE ALSO
        !           148: .IR bcp (1),
        !           149: .IR cscan (1),
        !           150: .IR font (6),
        !           151: .IR picfile (5),
        !           152: .IR spell (1),
        !           153: .IR troff (1)
        !           154: .SH BUGS
        !           155: For best results, use images of high-contrast, cleanly printed original
        !           156: documents digitized at a resolution of 400 pixels/inch or higher.
        !           157: It sometimes helps to restrict the alphabet and point sizes to those actually present.
        !           158: Multiple-column finding is chancy; if it goes wrong, runtimes may be excessive.
        !           159: .ig
        !           160: 8.7 CPU minutes on pipe to read this page, September 1989.
        !           161: ..
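The `-a` option treats its string as a list of selector characters and takes the union of the corresponding symbol sets. The Python sketch below does three things:

* It transcribes the selector table from the listing above.
* It treats `p` as an abbreviation for `Aa0.+^`.
* It lets the manual's claim be checked: the default alphabet is the full printable-ASCII set, 94 characters excluding space.

The names `SYMBOL_SETS`, `ABBREVIATIONS`, and `alphabet` are hypothetical. So are the Unicode code points chosen for the `s` glyphs and the error raised for an unknown selector. This illustrates the documented semantics, not ocr's implementation.

```python
from __future__ import annotations

# Symbol sets keyed by -a selector character, transcribed from the manual.
# ASSUMPTION: the "s" glyphs are mapped to the Unicode code points they most
# plausibly denote; the "l" ligatures and digraphs are spelled as
# multi-character strings, as in the nroff rendering of the table.
SYMBOL_SETS: dict[str, frozenset[str]] = {
    "A": frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ"),
    "a": frozenset("abcdefghijklmnopqrstuvwxyz"),
    "0": frozenset("0123456789"),
    ".": frozenset(".,-:;*'\"?!/&$()[]#@%"),   # basic punctuation
    "^": frozenset("^~`\\|{}_"),               # extended punctuation
    "+": frozenset("+-*/<>=.Ee[]"),            # for numerical tables
    "s": frozenset("\u00a7\u2020\u2021\u00a2\u2022\u00ae\u00a9\u00b0"
                   "\u2032\u2013\u2212\u2014"),  # selected non-ASCII
    "l": frozenset({"fi", "fl", "ff", "ffi", "ffl", "ae", "AE", "oe", "OE"}),
}

# "p" abbreviates the default selection, the printable-ASCII set.
ABBREVIATIONS: dict[str, str] = {"p": "Aa0.+^"}


def alphabet(selectors: str = "Aa0.+^") -> set[str]:
    """Return the union of the symbol sets named by the characters of `selectors`."""
    result: set[str] = set()
    for ch in selectors:
        # Expand an abbreviation into its selectors; other characters stand alone.
        for key in ABBREVIATIONS.get(ch, ch):
            if key not in SYMBOL_SETS:
                # ASSUMPTION: the manual does not say how unknown selectors
                # are handled; this sketch rejects them.
                raise ValueError(f"unknown alphabet selector {key!r}")
            result |= SYMBOL_SETS[key]
    return result
```

The expression `alphabet("psl")` corresponds to `-apsl`. The `l` ligatures are stored as strings, so that result mixes single characters with multi-character symbols. A set of strings keeps the union operation uniform across both kinds.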

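The `-m` and `-p` arguments follow small, well-defined rules:

* For `-m l[,r]`, `r` defaults to `l`.
* For `-p n,m`, text in point sizes outside the closed range `[n,m]` is discarded.

The Python sketch below illustrates these rules. `parse_margins`, `trimmed_columns`, `parse_sizes`, and `keep_size` are hypothetical helpers, not ocr's code. The conversion from inches to pixels assumes the scan resolution in pixels/inch is known.

```python
from __future__ import annotations


def parse_margins(arg: str) -> tuple[float, float]:
    """Parse the argument of -m l[,r] into (left, right) margins in inches.

    If r is omitted it equals l, as the manual states.
    """
    parts = arg.split(",")
    if len(parts) == 1:
        left = right = float(parts[0])
    elif len(parts) == 2:
        left, right = float(parts[0]), float(parts[1])
    else:
        raise ValueError(f"bad -m argument: {arg!r}")
    return left, right


def trimmed_columns(width_px: int, dpi: int, left: float, right: float) -> tuple[int, int]:
    """Return the half-open pixel-column range [start, end) kept after trimming.

    ASSUMPTION: margins are converted to pixels by rounding inches * dpi.
    """
    start = round(left * dpi)
    end = width_px - round(right * dpi)
    if start >= end:
        raise ValueError("margins leave no image to scan")
    return start, end


def parse_sizes(arg: str = "6,24") -> tuple[float, float]:
    """Parse the argument of -p n,m; the default mirrors -p6,24."""
    # Unpacking raises ValueError unless exactly two numbers are given.
    low, high = (float(x) for x in arg.split(","))
    return low, high


def keep_size(points: float, size_range: tuple[float, float]) -> bool:
    """Return True if a point size lies in the closed range [n, m]."""
    low, high = size_range
    return low <= points <= high
```

As a worked example, an 8.5-inch-wide page scanned at 400 pixels/inch is 3400 pixels wide. With `-m1`, `trimmed_columns(3400, 400, 1.0, 1.0)` keeps columns 400 through 2999.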
unix.superglobalmegacorp.com
