Annotation of researchv10dc/cmd/sml/lib/lexgen/mlex_int.doc, revision 1.1.1.1

1.1       root        1: This is minimal documentation for the lexer driver produced by ml-lex.
                      2: 
                      3: Main data structures:
                      4: 
                      5:    The transition table is stored in tab.  Tab is an array of records, indexed
                      6: by state number.  The first field of the record, fin, is a list of final leaves
                      7: assocated with it.  The second field of the record, trans, is a transition
                      8: table for the state indexed by character number.  It gives the next state
                      9: for a given input character.
                     10: 
                     11:    The usual initial start state is state #1.  State 0 is a dead state, which
                     12: has transitions only to itself.
                     13: 
                     14:    The field yyfin has type yyfinstate list.  yyfinstate consists of the
                     15: following three constructors:
                     16: 
                     17:        * N of int - indicates normal end leaf.
                     18:        * D of int - dummy end leaf - for indicating when an end state for
                     19:          a trailing context regular expression has been reached.  These are
                     20:          stored and propagated backwards when action is executed.
                     21:        * T of int - indicates an actual end leaf for a trailing context reg.
                     22:          expression, which should be executed only if D i was encountered
                     23:          after this end leaf while scanning forward.  The dummy end leaf is
                     24:          removed from the backward propagating list after this node is
                     25:          encountered.
                     26: 
                     27:     
                     28:      The function scan inside the function lex operates as a transition
                     29: function, scanning the input until it is no longer possible to take any
                     30: more transitions.  It accumulates a list of the accepting leaf list 
                     31: associated with each accepting state passed through.
                     32: 
                     33:        Scan operates as follows:
                     34: 
                     35:         Input: * s - current state
                     36:                * AcceptingLeaves - list of accepting leave lists.  Each state
                     37:                  has a list of accepting leaves associated with it.  This list
                     38:                  may be nil if the state is not a final state.
                     39:                * l - position of the next character in the buffer b to read
                     40:                * i0 - starting position in the buffer.
                     41: 
                     42:        Output: If no match is found, it raises the exception LexError. 
                     43:                Otherwise, it returns a value of type lexresult.
                     44: 
                     45:        It operates as a transtion function:
                     46:             It (1) adds the list of accepting leaves for the current state to
                     47:                    the list of accepting leave lists
                     48:                (2) tries to make a transition on the current input character
                     49:                    to the next state.  If it can't make a transition, it 
                     50:                    executes the action function.
                     51:                        (a) - if it is past the end of the buffer, it 
                     52:                                (1) checks if it as at end eof.  If it is then:
                     53:                                        It checks to see if it has made any
                     54:                                        transitions since it was first called -
                     55:                                        (l>i0 when this is true.)  If it hasn't
                     56:                                        this implies that scan was called at
                     57:                                        the end of file.  It thus executes
                     58:                                        eof function declared by the user.
                     59:                                        Otherwise it must execute action w/
                     60:                                        the current accepting state list.
                     61:                                (2) otherwise it reads a block of up to 1024
                     62:                                    characters, and appends this block to the
                     63:                                    useful suffix of characters left in the
                     64:                                    buffer (those character which have been
                     65:                                    scanned in this call to lex()).  The buffer
                     66:                                    operation should be altered if one intends
                     67:                                    to process reg. expressions whose lexemes'
                     68:                                    length will be >> 1024.  For most normal
                     69:                                    applications, the buffer update operation
                     70:                                    will be fine.
                     71: 
                     72:                                    This buffer update operation requires
                     73:                                    O(n^2/1024) char. copies for lexemes > 1024
                     74:                                    characters in length, and O(n) char. copies 
                     75:                                    for lexemes <= 1024 characters in length.
                     76:                                    It can be made O(n) using linked list
                     77:                                    buffers & a Byte.array of size n (not the
                     78:                                     ^operator!) for concatenating the buffers
                     79:                                    to return a value for yytext when a lexeme
                     80:                                    is longer than the typical buffer length.
                     81: 
                     82:                        (3) If the transition is to a dead state (0 is used
                     83:                            for the dead state), action is executed instead.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.