Annotation of researchv10no/cmd/sml/lib/lexgen/mlex_int.doc, revision 1.1

1.1     ! root        1: This is minimal documentation for the lexer driver produced by ml-lex.
        !             2: 
        !             3: Main data structures:
        !             4: 
        !             5:    The transition table is stored in tab.  Tab is an array of records, indexed
        !             6: by state number.  The first field of the record, fin, is a list of final leaves
        !             7: assocated with it.  The second field of the record, trans, is a transition
        !             8: table for the state indexed by character number.  It gives the next state
        !             9: for a given input character.
        !            10: 
        !            11:    The usual initial start state is state #1.  State 0 is a dead state, which
        !            12: has transitions only to itself.
        !            13: 
        !            14:    The field yyfin has type yyfinstate list.  yyfinstate consists of the
        !            15: following three constructors:
        !            16: 
        !            17:        * N of int - indicates normal end leaf.
        !            18:        * D of int - dummy end leaf - for indicating when an end state for
        !            19:          a trailing context regular expression has been reached.  These are
        !            20:          stored and propagated backwards when action is executed.
        !            21:        * T of int - indicates an actual end leaf for a trailing context reg.
        !            22:          expression, which should be executed only if D i was encountered
        !            23:          after this end leaf while scanning forward.  The dummy end leaf is
        !            24:          removed from the backward propagating list after this node is
        !            25:          encountered.
        !            26: 
        !            27:     
        !            28:      The function scan inside the function lex operates as a transition
        !            29: function, scanning the input until it is no longer possible to take any
        !            30: more transitions.  It accumulates a list of the accepting leaf list 
        !            31: associated with each accepting state passed through.
        !            32: 
        !            33:        Scan operates as follows:
        !            34: 
        !            35:         Input: * s - current state
        !            36:                * AcceptingLeaves - list of accepting leave lists.  Each state
        !            37:                  has a list of accepting leaves associated with it.  This list
        !            38:                  may be nil if the state is not a final state.
        !            39:                * l - position of the next character in the buffer b to read
        !            40:                * i0 - starting position in the buffer.
        !            41: 
        !            42:        Output: If no match is found, it raises the exception LexError. 
        !            43:                Otherwise, it returns a value of type lexresult.
        !            44: 
        !            45:        It operates as a transtion function:
        !            46:             It (1) adds the list of accepting leaves for the current state to
        !            47:                    the list of accepting leave lists
        !            48:                (2) tries to make a transition on the current input character
        !            49:                    to the next state.  If it can't make a transition, it 
        !            50:                    executes the action function.
        !            51:                        (a) - if it is past the end of the buffer, it 
        !            52:                                (1) checks if it as at end eof.  If it is then:
        !            53:                                        It checks to see if it has made any
        !            54:                                        transitions since it was first called -
        !            55:                                        (l>i0 when this is true.)  If it hasn't
        !            56:                                        this implies that scan was called at
        !            57:                                        the end of file.  It thus executes
        !            58:                                        eof function declared by the user.
        !            59:                                        Otherwise it must execute action w/
        !            60:                                        the current accepting state list.
        !            61:                                (2) otherwise it reads a block of up to 1024
        !            62:                                    characters, and appends this block to the
        !            63:                                    useful suffix of characters left in the
        !            64:                                    buffer (those character which have been
        !            65:                                    scanned in this call to lex()).  The buffer
        !            66:                                    operation should be altered if one intends
        !            67:                                    to process reg. expressions whose lexemes'
        !            68:                                    length will be >> 1024.  For most normal
        !            69:                                    applications, the buffer update operation
        !            70:                                    will be fine.
        !            71: 
        !            72:                                    This buffer update operation requires
        !            73:                                    O(n^2/1024) char. copies for lexemes > 1024
        !            74:                                    characters in length, and O(n) char. copies 
        !            75:                                    for lexemes <= 1024 characters in length.
        !            76:                                    It can be made O(n) using linked list
        !            77:                                    buffers & a Byte.array of size n (not the
        !            78:                                     ^operator!) for concatenating the buffers
        !            79:                                    to return a value for yytext when a lexeme
        !            80:                                    is longer than the typical buffer length.
        !            81: 
        !            82:                        (3) If the transition is to a dead state (0 is used
        !            83:                            for the dead state), action is executed instead.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.