|
|
1.1 root 1: This is minimal documentation for the lexer driver produced by ml-lex.
2:
3: Main data structures:
4:
5: The transition table is stored in tab. Tab is an array of records, indexed
6: by state number. The first field of the record, fin, is a list of final leaves
7: assocated with it. The second field of the record, trans, is a transition
8: table for the state indexed by character number. It gives the next state
9: for a given input character.
10:
11: The usual initial start state is state #1. State 0 is a dead state, which
12: has transitions only to itself.
13:
14: The field yyfin has type yyfinstate list. yyfinstate consists of the
15: following three constructors:
16:
17: * N of int - indicates normal end leaf.
18: * D of int - dummy end leaf - for indicating when an end state for
19: a trailing context regular expression has been reached. These are
20: stored and propagated backwards when action is executed.
21: * T of int - indicates an actual end leaf for a trailing context reg.
22: expression, which should be executed only if D i was encountered
23: after this end leaf while scanning forward. The dummy end leaf is
24: removed from the backward propagating list after this node is
25: encountered.
26:
27:
28: The function scan inside the function lex operates as a transition
29: function, scanning the input until it is no longer possible to take any
30: more transitions. It accumulates a list of the accepting leaf list
31: associated with each accepting state passed through.
32:
33: Scan operates as follows:
34:
35: Input: * s - current state
36: * AcceptingLeaves - list of accepting leave lists. Each state
37: has a list of accepting leaves associated with it. This list
38: may be nil if the state is not a final state.
39: * l - position of the next character in the buffer b to read
40: * i0 - starting position in the buffer.
41:
42: Output: If no match is found, it raises the exception LexError.
43: Otherwise, it returns a value of type lexresult.
44:
45: It operates as a transtion function:
46: It (1) adds the list of accepting leaves for the current state to
47: the list of accepting leave lists
48: (2) tries to make a transition on the current input character
49: to the next state. If it can't make a transition, it
50: executes the action function.
51: (a) - if it is past the end of the buffer, it
52: (1) checks if it as at end eof. If it is then:
53: It checks to see if it has made any
54: transitions since it was first called -
55: (l>i0 when this is true.) If it hasn't
56: this implies that scan was called at
57: the end of file. It thus executes
58: eof function declared by the user.
59: Otherwise it must execute action w/
60: the current accepting state list.
61: (2) otherwise it reads a block of up to 1024
62: characters, and appends this block to the
63: useful suffix of characters left in the
64: buffer (those character which have been
65: scanned in this call to lex()). The buffer
66: operation should be altered if one intends
67: to process reg. expressions whose lexemes'
68: length will be >> 1024. For most normal
69: applications, the buffer update operation
70: will be fine.
71:
72: This buffer update operation requires
73: O(n^2/1024) char. copies for lexemes > 1024
74: characters in length, and O(n) char. copies
75: for lexemes <= 1024 characters in length.
76: It can be made O(n) using linked list
77: buffers & a Byte.array of size n (not the
78: ^operator!) for concatenating the buffers
79: to return a value for yytext when a lexeme
80: is longer than the typical buffer length.
81:
82: (3) If the transition is to a dead state (0 is used
83: for the dead state), action is executed instead.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.