Annotation of researchv10no/cmd/sml/doc/papers/modimp/paper.ms, revision 1.1.1.1

1.1       root        1: .RP
                      2: .ND March 11, 1988
                      3: .TL
                      4: An Implementation of Standard ML Modules
                      5: .AU 
                      6: David MacQueen
                      7: .AI
                      8: AT&T Bell Laboratories
                      9: Murray Hill, NJ 07974
                     10: .AB
                     11: Standard ML includes a set of module constructs that support
                     12: programming in the large.  These constructs extend ML's basic
                     13: polymorphic type system by introducing the dependent types of Martin
                     14: L\o'o"'f's Intuitionistic Type Theory.  This paper discusses the
                     15: problems involved in implementing Standard ML's modules and describes
                     16: a practical, efficient solution to these problems.  The
                     17: representations and algorithms of this implementation were inspired by
                     18: a detailed formal semantics of Standard ML developed by Milner, Tofte,
                     19: and Harper.  The implementation is part of a new Standard ML compiler
                     20: that is written in Standard ML using the module system.
                     21: .AE
                     22: .EQ
                     23: delim %%
                     24: .EN
                     25: .de M1         \" start of a program display
                     26: .DS
                     27: .ft CW
                     28: ..
                     29: .de M2
                     30: .ft R
                     31: .DE
                     32: ..
                     33: .de M3
                     34: .if "\\$2"" \\&\f(CW\\$1\f1
                     35: .if !"\\$2"" \\&\f(CW\\$1\f1\\&\\$2
                     36: ..
                     37: .fp 5 CW
                     38: .NH
                     39: Introduction
                     40: .PP
                     41: An important part of the revision of ML that led to
                     42: the Standard ML language was the inclusion of module
                     43: facilities for the support of ``programming in the large.''
                     44: The design of these facilities went through several
                     45: versions [8] and was supported by concurrent investigations
                     46: of the type theory of ML and related systems [9,11].
                     47: The central idea behind the design was to support modularity
                     48: by introducing a stratified system of dependent types as suggested
                     49: by Martin L\o'o"'f's Intuitionistic Type Theory [10].
                     50: In late 1985 Bob Harper added a prototype implementation of
                     51: most of the module facilities to the Edinburgh ML compiler,
                     52: which was serving as a test-bed for the evolving Standard ML design.
                     53: .PP
                     54: Starting in the spring of 1986, Andrew Appel, Trevor Jim, and I have
                     55: implemented a new Standard ML compiler, written in Standard ML,
                     56: and initially bootstrapped from the Edinburgh compiler.  An overview of
                     57: this new compiler, known as Standard ML of New Jersey, is given in [1].
                     58: The implementation of modules in this new compiler went through
                     59: two generations.
                     60: A first version was done in the fall of 1986, but it was completely
                     61: rewritten in the summer of 1987 following discussions of the
                     62: operational static semantics of modules with Robin Milner, Mads Tofte,
                     63: and Bob Harper [6,7,12].
                     64: Like Harper's prototype implementation, the new modules implementation
                     65: was inspired by the static semantics, but it uses a \fIstructure sharing\fP
                     66: strategy [3,13]
                     67: to avoid serious performance problems associated with a naive
                     68: implementation of the static semantics.
                     69: Although precise comparative measurements are not yet available, our
                     70: experience shows that the symbol table size for a large ML program
                     71: such as the ML compiler is several times smaller with the
                     72: new compiler than with the old compiler.
                     73: .PP
                     74: The objective of this paper is to describe our implementation of the
                     75: Standard ML module facilities, with particular emphasis on the
                     76: techniques used to minimize the space consumed by static representations
                     77: of modules (\fIi.e.\fP symbol table structures).
                     78: We begin by reviewing the elements of the module language.
                     79: .NH
                     80: Summary of the module constructs
                     81: .PP
                     82: Before describing the basic issues concerning implementation
                     83: of Standard ML modules, we need to review the main elements
                     84: of the module language.
                     85: There are three principal notions:
                     86: .in .3i
                     87: .IP (1)
                     88: \f2signature\f1 \- interface specification or ``type'' of modules.
                     89: .IP (2)
                     90: \f2structure\f1 \- an environment; a collection of type,
                     91: structure, value, and exception bindings; corresponds to
                     92: the conventional idea of a module.
                     93: .IP (3)
                     94: \f2functor\f1 \- function mapping structures to structures;
                     95: a form of parametric module.
                     96: .in  -0.3i
                     97: .LP
                     98: Figure 1 below contains simple examples of each of these constructs.
                     99: .M1
                    100: signature ORD =
                    101:   sig
                    102:     type t
                    103:     val le : t*t -> bool
                    104:   end
                    105: 
                    106: structure S =
                    107:   struct
                    108:     datatype t = A | B of t
                    109:     val x = A
                    110:     fun le(A,_) = true
                    111:       | le(_,A) = false
                    112:       | le(B x, B y) = le(x,y)
                    113:   end
                    114: 
                    115: signature LEXORD =
                    116:   sig
                    117:     structure A : ORD
                    118:     val lexord : A.t list * A.t list -> bool
                    119:   end
                    120: 
                    121: functor LexOrd(O: ORD) : LEXORD =
                    122:   struct
                    123:     structure A = O
                    124:     fun lexord([],_) = true
                    125:       | lexord(_,[]) = false
                    126:       | lexord(x::l,y::m) = ... O.le(x,y) ...
                    127:   end
                    128: 
                    129: structure LS = LexOrd(S)
                    130: .M2
                    131: .tl ''\f2Figure 1\f1''
                    132: .PP
                    133: This example contains declarations of two signatures,
                    134: .M3 ORD
                    135: and
                    136: .M3 LEXORD ,
                    137: two structures,
                    138: .M3 S
                    139: and
                    140: .M3 LS ,
                    141: and one functor,
                    142: \&\f5LexOrd\fP, mapping a structure of signature \&\f5ORD\fP
                    143: to a new structure of signature \&\f5LEXORD\fP.
                    144: The structure
                    145: .M3 LS
                    146: is defined as the result of applying
                    147: .M3 LexOrd
                    148: to
                    149: .M3 S .
                    150: We refer to components of a structure using qualified names or
                    151: paths formed with the usual ``dot'' notation: \fIe.g.\fP \&\f5S.t\fP,
                    152: \&\f5S.x\fP, \&\f5LS.A.le\fP. 
                    153: .PP
                    154: A signature can be regarded as a form of ``type''
                    155: for structures, or as a schematic representation
                    156: of a class of structures, and a
                    157: structure \f2matches\f1 a signature if it satisfies the specifications
                    158: given in the signature.  A structure does not have to agree exactly with
                    159: a signature in order for it to match the signature;
                    160: in this example the structure
                    161: .M3 S
                    162: matches the signature
                    163: .M3 ORD ,
                    164: even though
                    165: .M3 S
                    166: has an additional value component
                    167: .M3 x
                    168: not specified in
                    169: .M3 ORD .
                    170: In such cases signature matching has a coercive
                    171: effect, producing a ``thinned'' structure that exactly
                    172: agrees with the signature in terms of number of
                    173: components and their types.
                    174: .PP
                    175: Signature matching is performed in two contexts:
                    176: (1) when a signature constraint is given in a structure
                    177: declaration, as in:
                    178: .M1
                    179: structure R : ORD = S
                    180: .M2
                    181: and (2) when a functor is applied to an argument structure,
                    182: which must match the signature specified for the formal
                    183: parameter, as in
                    184: .M1
                    185: structure LS = LexOrd(S)
                    186: .M2
                    187: where
                    188: .M3 S
                    189: must match
                    190: .M3 ORD .
                    191: Actually, these two contexts are closely related under Landin's
                    192: principle of correspondence.  In the first case, \&\f5R\fP is bound
                    193: to a thinned version of \&\f5S\fP that does not contain an \&\f5x\fP
                    194: component, and in the second case, the formal parameter \&\f5O\fP, and hence
                    195: the substructure \&\f5LS.A\fP, is also bound to a thinned version of \&\f5S\fP.
                    196: .PP
                    197: Another kind of specification that may appear in signatures
                    198: is the \f2sharing constraint\f1, the purpose of which is to
                    199: insure a kind of type coherence among functor parameters.
                    200: The program sketch in Figure 2 illustrates the use of sharing constraints.
                    201: .M1
                    202: signature SYMBOL = sig type symbol ... end
                    203: 
                    204: signature LEX =
                    205:   sig
                    206:     structure Symbol : SYMBOL
                    207:     val next : unit -> Symbol.symbol
                    208:     ...
                    209:   end
                    210: 
                    211: signature SYMBOLTABLE =
                    212:   sig
                    213:     structure Symbol : SYMBOL
                    214:     type var
                    215:     val bind : Symbol.symbol * var -> unit
                    216:     ...
                    217:   end
                    218: 
                    219: signature PARSE_ARGS =
                    220:   sig
                    221:     structure Lex : LEX
                    222:     structure SymTab : SYMBOLTABLE
                    223:     sharing Lex.Symbol = SymTab.Symbol
                    224:   end
                    225: 
                    226: functor Parse(A: PARSE_ARGS) =
                    227:   struct ... A.SymTab.bind(A.Lex.next(), v) ... end
                    228: .M2
                    229: .tl ''\f2Figure 2\f1''
                    230: .PP
                    231: The functor
                    232: .M3 Parse
                    233: essentially takes two structure arguments,
                    234: .M3 Lex
                    235: (implementing a lexical analyzer) and
                    236: .M3 SymTab
                    237: (implementing a symbol table), which are bundled as components
                    238: of a single parameter structure.
                    239: The sharing specification in
                    240: .M3 PARSE_ARGS
                    241: requires that the same
                    242: .M3 Symbol
                    243: structure be used in both
                    244: .M3 Lex
                    245: and
                    246: .M3 SymTab .
                    247: This insures that
                    248: .M3 Lex
                    249: and
                    250: .M3 SymTab
                    251: can consistently interact, as in the expression
                    252: .M3 "A.Symtab.bind(A.Lex.next(),v)" ,
                    253: which is well-typed only if
                    254: .M3 A.Lex.Symbol.symbol
                    255: and
                    256: .M3 A.Symtab.Symbol.symbol
                    257: are the same type.
                    258: .PP
                    259: An important point about datatype and structure declarations
                    260: is that they are \f2generative\f1, meaning
                    261: that each time they are elaborated (\fIe.g.\fP in a functor body as a result
                    262: of functor applications) a new, distinct structure or type is created.
                    263: For example, in
                    264: .M1
                    265: functor F () =
                    266:   struct
                    267:     datatype t = A | B of t
                    268:   end
                    269: 
                    270: structure S1 = F()
                    271: structure S2 = F()
                    272: .M2
                    273: .M3 S1
                    274: and
                    275: .M3 S2
                    276: are distinct structures and 
                    277: .M3 S1.t
                    278: and
                    279: .M3 S2.t
                    280: are distinct types, so 
                    281: .M3 S1.B(S2.A)
                    282: is an ill-typed expression.
                    283: .PP
                    284: On the other hand, simple type definitions (whether occurring inside or
                    285: outside of structures) are \f2transparent\f1 rather than generative.
                    286: For instance, in 
                    287: .M1
                    288: structure IntOrd =
                    289:   struct
                    290:     type t = int
                    291:     fun le(x,y) = x <= y
                    292:   end
                    293: .M2
                    294: the type
                    295: .M3 S.t
                    296: is identical to \&\f5int\fP.
                    297: In other words, there is no information hiding or abstraction
                    298: inherent in the formation of structures.
                    299: This applies even to the results of functor applications; type
                    300: information is propagated through functor applications, so that after
                    301: the declaration
                    302: .M1
                    303: structure IntLexOrd = LexOrd (IntOrd)
                    304: .M2
                    305: .M3 IntLexOrd.le
                    306: has type
                    307: .M3 "int list * int list -> bool" .
                    308: This reflects the dependent product nature of functor signatures, and
                    309: the fact that structures represent a form of \fIstrong\fP dependent sum
                    310: (see [9,11] for discussion of the relation between ML modules and dependent
                    311: types).
                    312: .NH
                    313: Implementation of modules
                    314: .PP
                    315: The principal tasks that an implementation must deal with are as follows:
                    316: .IP (1)
                    317: representation of signatures, structures, and functors.
                    318: .IP (2)
                    319: signature matching, including instantiation of the signature
                    320: template and possible thinning of the matched structure.
                    321: .IP (3)
                    322: functor application, including 
                    323: .RS
                    324: .IP (a)
                    325: matching formal signature to actual parameter, with possible thinning
                    326: of the parameter.
                    327: .IP (b)
                    328: creation of the result structure, including propagation of type information from 
                    329: parameter to result and generation of new instances of datatypes and
                    330: structures.
                    331: .RE
                    332: .IP (4)
                    333: representation and checking of sharing constraints.
                    334: .PP
                    335: Most of these tasks have two parts, the \f2static\f1 or compile-time
                    336: task and the \f2dynamic\f1 or run-time task.
                    337: The run-time problems are straightforward and are discussed in the
                    338: next subsection.
                    339: Our main focus will be on the static aspects of the module language,
                    340: for which our principal implementation goals are:
                    341: .IP (1)
                    342: compact representation of structures having a given signature
                    343: .IP (2)
                    344: efficient signature matching and functor application, with minimal 
                    345: duplication of static (\fIi.e.\fP symbol table) information
                    346: .IP (3)
                    347: efficient representation and checking of sharing constraints.
                    348: .NH 2
                    349: Dynamic representations and processes.
                    350: .PP
                    351: The run-time representations of modules are remarkably simple [1].
                    352: Signatures and types have no run-time representation \(em they exist
                    353: only at the static level.  A structure is represented as a record
                    354: whose components represent the dynamic
                    355: structure components (\fIi.e.\fP substructures, values, and exceptions) in a
                    356: canonical order.  A functor is represented as an ordinary function
                    357: closure, and functor application corresponds to the normal application
                    358: of this function to a record representing the argument structure.  The
                    359: thinning coercions associated with signature matching give rise to
                    360: in-line code to construct the thinned record.
                    361: .PP
                    362: In the middle-end of the compiler, all module constructs are reduced
                    363: to the same simple lambda-calculus based intermediate language that is
                    364: used for the core ML constructs of value declarations and expressions.
                    365: In effect, the back-end of the compiler is unaware of the existence of
                    366: the module constructs \(em they have been reduced to common notions of
                    367: records and functions.
                    368: .NH 2
                    369: Static Representations
                    370: .PP
                    371: A naive representation of signatures and structures can be modeled
                    372: more or less directly on the semantic constructs used in the operational static
                    373: semantics [5,6].
                    374: There a structure is modeled by an environment %E%
                    375: that maps component identifiers to the appropriate sort of
                    376: static binding (type, structure, variable, etc.), and a 
                    377: \fIstamp\fP,* %n%, that uniquely identifies the structure:
                    378: %str^=^(n,E)%.
                    379: .FS
                    380: *We prefer the term ``stamp'' for this purpose, rather than the term
                    381: ``name'' used in [6,7],
                    382: since ``name'' could also refer to the identifier to which a structure
                    383: is bound or an identifier bound within a structure.
                    384: .FE
                    385: We can view a structure as a tree or dag with nodes labeled by stamps
                    386: and edges labeled by component names.
                    387: A signature is then a structure together with a designated set of
                    388: \fIbound\fP or \fIschematic\fP stamps occurring within the structure:
                    389: %sig^=^(N)(n,E)%.
                    390: .PP
                    391: We illustrate this with the definitions in Figure 3
                    392: and the corresponding graphs in Figure 4 (adapted from [6]), in which
                    393: \fI(a)\fP represents the structure \&\f5C\fP and \fI(b)\fP represents
                    394: the signature \&\f5SIGC\fP.
                    395: Our convention for distinguishing between constant and bound stamps
                    396: is that metavariables %k sub i%
                    397: range over constant stamps, while metavariables %x sub i% range over 
                    398: bound stamps in a signature.  This is a more concise alternative to the
                    399: separate specification of the graph and the set of bound stamps
                    400: We emphasize the distinction by using solid
                    401: circles for nodes with constant stamps and open circles for nodes with 
                    402: bound stamps.
                    403: A structure will always contain only constant
                    404: stamps, while a signature will typically contain only bound stamps.  The
                    405: graphs are simplified by showing only structure components, but type
                    406: components are dealt with similarly.
                    407: 
                    408: .DS
                    409: .ft 5
                    410: .ta 3i
                    411: structure A =  signature SIGA =
                    412:   struct         sig
                    413:     type t = int           type t
                    414:     fun f n = 2 * n        val f : t -> t
                    415:   end    end
                    416: 
                    417: structure B =  signature SIGB  =
                    418:   struct         sig
                    419:     structure BA = A       structure BA: SIGA
                    420:     fun g x = BA.f(x) + 1          val g: BA.t -> BA.t
                    421:   end    end
                    422: 
                    423: structure C =  signature SIGC =
                    424:   struct         sig
                    425:     structure CA = A       structure CA : SIGA
                    426:     structure CB = B       structure CB : SIGB
                    427:     fun h x = CB.g(CA.f x)         val h: CA.t  -> CA.t
                    428:   end    end
                    429: .ft R
                    430: .DE
                    431: .tl ''\f2Figure 3\f1''
                    432: .PS
                    433: circlerad = 0.05i
                    434: define str '
                    435: {C1: circle
                    436:  move to C1
                    437:  move down right
                    438:  C2: circle
                    439:  move to C2
                    440:  move down left
                    441:  C3: circle
                    442:  A1: arrow from C1.c to C2.c chop
                    443:  A2: arrow from C2.c to C3.c chop
                    444:  A3: arrow from C1.c to C3.c chop
                    445:  "%k sub 1% " at C1.w rjust
                    446:  " %k sub 3%" at C2.e ljust
                    447:  "%k sub 2% " at C3.w rjust
                    448:  " CB" at A1.center ljust
                    449:  " BA" at A2.center ljust
                    450:  "CA " at A3.center rjust
                    451:  move to C1
                    452:  move down 1.3i
                    453:  "\f5structure C\fR" center
                    454:  }
                    455: '
                    456: define sig '
                    457: {C1: circle
                    458:  move to C1
                    459:  move down left
                    460:  C2: circle
                    461:  move to C1
                    462:  move down right
                    463:  C3: circle
                    464:  move to C3
                    465:  move down
                    466:  C4: circle
                    467:  A1: arrow from C1.c to C2.c chop
                    468:  A2: arrow from C1.c to C3.c chop
                    469:  A3: arrow from C3.c to C4.c chop
                    470:  " %x sub 1%" at C1.e ljust
                    471:  "%x sub 2% " at C2.w rjust
                    472:  " %x sub 3%" at C3.e ljust
                    473:  " %x sub 4%" at C4.e ljust
                    474:  "CA " at A1.center rjust
                    475:  " CB" at A2.center ljust
                    476:  " BA" at A3.center ljust
                    477:  move to C1
                    478:  move down 1.3i
                    479:  "\f5signature SIGC\fR" center
                    480: }
                    481: '
                    482: H: str; move to H; move right 2.25i; sig
                    483: .PE
                    484: .tl ''\f2Figure 4\f1''
                    485: 
                    486: .PP
                    487: The purpose of a signature matching \&\f5S: SIG\fP is to produce a
                    488: structure \&\f5S'\fP that has exactly the form specified by
                    489: \&\f5SIG\fP and yet shares the identity (\fIi.e.\fP the stamps) of
                    490: \&\f5S\fP.  In some cases, \&\f5S\fP and \&\f5S'\fP are identical, as
                    491: when \&\f5S\fP had already matched the signature \&\f5SIG\fP.  In
                    492: other cases \&\f5S'\fP is a thinned version of \&\f5S\fP having fewer
                    493: components or components whose types are generic instances of their
                    494: types in \&\f5S\fP.  Another product of matching is the realization
                    495: map, whose use in functor applications is explained in Section 3.5.
                    496: .PP
                    497: We can think of \&\f5SIG\fP as a scheme analogous to a generic type scheme or
                    498: polytype in the core ML type system [4], with the bound stamps
                    499: playing the role of generic type variables in a type scheme.  The
                    500: product of matching is then an instance of this scheme under the
                    501: substitution represented by the realization map.  The details of this
                    502: analogy have been worked out by Mads Tofte, including a version of the
                    503: principal typing theorem of [4].
                    504: .PP
                    505: A naive implementation of matching would make a copy of the signature
                    506: \&\f5SIG\fP, in the process replacing each bound stamp by the corresponding
                    507: constant stamp from \&\f5S\fP.  This would involve copying most of the
                    508: environment part of the signature, since we have to instantiate the
                    509: type specifications of values and exceptions as well as instantiating
                    510: the types and substructures themselves.  However, the environment or
                    511: symbol table part of the signature can be regarded as a template
                    512: relative to its type and substructure components, which are the only
                    513: parts that need to change during signature matching.  For instance, a
                    514: type specification like \&\f5f: t->t\fP can remain fixed if it is interpreted
                    515: relative to the type component \&\f5t\fP.  We can abstract out the type and
                    516: structure components carrying the bound stamps and use the rest of the
                    517: information in a signature as an unchanging template that can be
                    518: shared by all instances of the signature.  This is the familiar
                    519: \fIstructure sharing\fP idea first proposed by Boyer and Moore in the
                    520: context of resolution theorem proving [3] and later exploited in the
                    521: implementation of Prolog [13].  The use of structure sharing in
                    522: the basic ML type system has been considered, but in that context it
                    523: does not appear to have a clear advantage over the simpler approach of
                    524: instantiation by copying.  In the case of signature matching, however,
                    525: the shared information in the template is typically of considerable
                    526: volume, so structure sharing is quite effective in saving space
                    527: relative to copying.
                    528: .PP
                    529: The definitions of the basic datatypes used to represent type
                    530: constructors, structures and signatures are given in Figure 5.  The
                    531: representations of structures and signatures both use the \&\f5Structure\fP
                    532: datatype, and differ only in the value of the \&\f5kind\fP field.  The \&\f5stamp\fP
                    533: field contains the identifying stamp, the \&\f5table\fP is the environment
                    534: component represented as a hash table mapping symbols to the various
                    535: sorts of bindings, and the \&\f5env\fP field contains a pair of instance
                    536: vectors for type and structure components.  The \&\f5sign\fP field in
                    537: a signature to identify the signature (the \&\f5stamp\fP field
                    538: will have a formal value representing a bound stamp); in
                    539: a structure it identifies the signature the structure is an instantiation of,
                    540: if any.  Bound and constant stamps are both represented as integers;
                    541: stamps greater than some base value are constant stamps, while stamps
                    542: less than that value are bound.  Within a given signature, bound
                    543: stamps are canonically numbered starting from 0.
                    544: .M1
                    545: datatype tycon
                    546:   = TYCON of {stamp : stamp, ...}
                    547:   | INDtyc of int list
                    548: 
                    549: datatype Structure
                    550:   = STRstr of
                    551:       {stamp : stamp,
                    552:        sign  : stamp option,
                    553:        table : symtable,
                    554:        env   : strenv,
                    555:        kind  : strkind}
                    556:   | INDstr of int
                    557: 
                    558: and strkind
                    559:   = STRkind
                    560:   | SIGkind of
                    561:       {share : sharespec,
                    562:        bindings : binding list,
                    563:        stampcounts : {s : int, t : int}}
                    564: .M2
                    565: .tl ''\f2Figure 5\f1''
                    566: 
                    567: .PP
                    568: The \&\f5INDtyc\fP and \&\f5INDstr\fP forms of type constructors and structures are
                    569: used within the symbol table to refer indirectly to components stored
                    570: in the instance vectors.  The term \&\f5INDtyc[%i%]\fR refers to the ith
                    571: element of the type instance vector, while \&\f5INDtyc[%i%,%j%]\fR refers to the
                    572: jth element of the type vector of the ith element of the structure
                    573: vector.  The type specifications
                    574: .M1
                    575: f: t -> t
                    576: h: CA.t -> CA.t
                    577: .M2
                    578: from Figure 3 are represented internally as
                    579: .M1
                    580: f: INDtyc[0] -> INDtyc[0]
                    581: h: INDtyc[0,0] -> INDtyc[0,0]
                    582: .M2
                    583: The representation of the entire signature SIGC from Figure 3 can be
                    584: summarized as follows:
                    585: .M1
                    586: SIGC:
                    587:   stamp:  0
                    588:   table:  CA => INDstr 0
                    589:           CB => INDstr 1
                    590:           h  => VAR: INDtyc[0,0] -> INDtyc[0,0]
                    591:   strenv: structures = <SIGA',SIGB'>
                    592:           types      = <>
                    593: 
                    594: SIGA':
                    595:   stamp:  1
                    596:   table:  t  => INDtyc[0]
                    597:           f  => VAR: INDtyc[0] -> INDtyc[0]
                    598:   strenv: structures = <>
                    599:           types      = <DUMMY 0>
                    600: 
                    601: .M2
                    602: .M1
                    603: SIGB':
                    604:   stamp:  2
                    605:   table:  BA => INDstr 0
                    606:           g  => VAR: BA.t -> BA.t
                    607:   strenv: structures = <SIGA''>
                    608:           types      = <>
                    609: 
                    610: SIGA'':
                    611:   stamp:  3
                    612:   table:  t  => INDtyc[0]
                    613:           f  => VAR: INDtyc[0] -> INDtyc[0]
                    614:   strenv: structures = <>
                    615:           types      = <DUMMY 1>
                    616: .M2
                    617: .LP
                    618: Note that there are two copies of the signature \&\f5SIGA\fP, identified
                    619: as \&\f5SIGA'\fP and \&\f5SIGA''\fP, each with its own stamp.  This 
                    620: duplication is required to get the canonical numbering of stamps for
                    621: each component of \&\f5SIGC\fP, but each of these copies shares the
                    622: original symbol table component from \&\f5SIGA\fP.  \&\f5DUMMY 0\fP
                    623: and \&\f5DUMMY 1\fP are dummy type constructor components, which have
                    624: their own separate numbering within the context of \&\f5SIGC\fP.
                    625: .NH 2
                    626: Signature Matching
                    627: .PP
                    628: We now describe the process of signature matching in terms of the
                    629: representation described above.  Given a signature \&\f5sig\fP and
                    630: structure \&\f5str\fP represented as
                    631: .M1
                    632: sig = STRstr{stamp = x, sign = n, table = sigtab, env = sigenv,
                    633:              kind = SIGkind{bindings,stampcounts,sharing}}
                    634: 
                    635: str = STRstr{stamp = k, sign = s, table = strtab, env = strenv,
                    636:              kind=STRkind}
                    637: .M2
                    638: we first check whether \&\f5s=n\fP , and if so
                    639: return \&\f5str\fP, because \&\f5str\fP is already an instance of
                    640: \&\f5sig\fP.  Otherwise we attempt to construct a new instance of
                    641: \&\f5sig\fP.  We start by allocating a new pair of instance vectors,
                    642: \&\f5newenv={s=sNew,t=tNew}\fP, based on the size information in the
                    643: \&\f5stampcounts\fP field.  Then we iterate through the list of all
                    644: of \&\f5sig\fP's bindings (\fIi.e.\fP specifications), which is
                    645: available in the field \&\f5bindings\fP.  For each structure binding
                    646: \&\f5(id,INDstr i)\fP in \&\fIsig\fP, we look up a structure named
                    647: \&\fIid\fP in \&\f5strtab\fP.  If it does not exist, matching fails.
                    648: If it does exist, we recursively match it against the substructure
                    649: signature bound to \&\f5id\fP in \&\f5sig\fP (obtained as the ith
                    650: element of the structure vector in \&\f5sigenv\fP), and if successful
                    651: use the result to define the ith element of \&\f5sNew\fP.  Similarly
                    652: for type bindings, where we check that the type constructor bound in
                    653: \&\f5str\fP agrees with the specification in sig (\fIe.g.\fP they must have
                    654: the same arity).  For value specifications like \&\f5x: ty\fP, we
                    655: interpret indirect type constructors in \&\f5ty\fP with respect to
                    656: \&\f5newenv\fP and check that we have a generic instance of the type
                    657: of the corresponding component of \&\f5str\fP.  Checking value
                    658: components has no effect on the instance vectors, but if necessary
                    659: (\fIi.e.\fP if \&\f5sig\fP has fewer value components than
                    660: \&\f5str\fP) we calculate the translation between the old and new
                    661: runtime positions of the components and collect this information in a
                    662: thinning specification to be applied at runtime.
                    663: .PP
                    664: When we have successfully matched all the bindings in \&\f5sig\fP we build the
                    665: result structure
                    666: .M3 str\'
                    667: using
                    668: .M3 sig \'s
                    669: table and
                    670: .M3 sign ,
                    671: .M3 str \'s
                    672: stamp ,
                    673: and
                    674: .M3 newenv
                    675: .M1
                    676: str' = STRstr {stamp=n, sign=SOME k, table=sigtab,
                    677:                env=newenv, kind=STRkind}
                    678: .M2
                    679: Finally any sharing constraints (from sharing) are checked as
                    680: described in Section 3.5, and we return \&\f5str'\fP and a thinning
                    681: specification as the result of the match.  Note that
                    682: the bulk of the information in the signature is in
                    683: .M3 sigtab ,
                    684: and this is directly shared with the instantiation
                    685: .M3 str\' .
                    686: .PP
                    687: As a shortcut, when elaborating a declaration like
                    688: .M1
                    689: structure S: SIG = struct \fIdeclarations\fP end
                    690: .M2
                    691: we do not build the structure on the right-hand side before doing
                    692: the signature match.  Instead we elaborate the body declarations in
                    693: the top-level environment and then do the signature matching using
                    694: the top-level environment in the place of the target structure.
                    695: .NH 2
                    696: Functors and functor application
                    697: .PP
                    698: In the static semantics a functor %F% is modeled by a pair of
                    699: structures representing the parameter and body of the functor, and
                    700: two sets of bound stamps.
                    701: .EQ I
                    702: F ~ = ~ ( N ) ( S sub p , ^ ( N ' ) S sub b )
                    703: .EN
                    704: %N% is the set of bound stamps in the parameter structure, which may
                    705: also occur in the body %S sub b%, while %N '%
                    706: is the set of stamps associated with generative elements of the
                    707: functor body.  To apply %F% to an argument structure %A% we perform
                    708: the following steps
                    709: .EQ I
                    710: r ~ = ~ match ( ( N ) S sub p , ^ A )
                    711: .EN
                    712: .EQ I
                    713: g ~ = ~ generate ( N ' )
                    714: .EN
                    715: .EQ I
                    716: F ( A ) ~ = ~ g ( r ( S sub b ) )
                    717: .EN
                    718: That is, we match the parameter signature and the argument to produce
                    719: a realization map %r%, then we generate a realization map %g% that maps
                    720: each bound stamp in %N '% to a unique, new constant stamp, and finally
                    721: we produce the result structure by using %r% and %g% to instantiate
                    722: the body structure.
                    723: .PP
                    724: The implementation of functors follows this scheme closely.  The
                    725: datatype used to represent functors is defined by
                    726: .M1
                    727: datatype Functor = FUNCTOR of
                    728:                      {param : Structure,
                    729:                       body  : Structure,
                    730:                       tycCount : int}
                    731: .M2
                    732: The bound stamps in the \&\f5param\fP structure are numbered from 0 to
                    733: %n% and these may also occur in the \&\f5body\fP structure.
                    734: Generative stamps in the body are numbered from %n+1% to %n+m%, which
                    735: is the value of \&\f5tycCount\fP.
                    736: .PP
                    737: If the functor declaration provides an explicit result signature, as in
                    738: .M1
                    739: functor F(X : SIGP) : SIGR = struct ... end
                    740: .M2
                    741: the body will naturally be schematic (i.e. the parts with bound stamps will
                    742: be isolated in instance vectors) as a result of the signature
                    743: matching between the body and the result signature.  However, if there
                    744: is no result signature, we explicitly abstract these ``volatile'' parts of
                    745: the body structure to get an instantiable scheme so that the body's
                    746: symbol table may be shared by all structures produced by the functor.
                    747: .PP
                    748: To apply the functor, signature matching is performed between the
                    749: parameter signature and the argument to build a realization map for
                    750: the bound stamps in the parameter.  Then the body is instantiated
                    751: using this realization map and introducing new constant stamps to
                    752: replace generative bound stamps as required.  The actual algorithm
                    753: is more complicated than this because functor application can occur
                    754: within the body of functor declaration, as in
                    755: .M1
                    756: functor F(X : SIGP) =
                    757:   struct
                    758:     structure A = G(X)
                    759:     structure B = H(A)
                    760:     ...
                    761:   end
                    762: .M2
                    763: In cases like the applications of \&\f5G\fP and \&\f5H\fP in this
                    764: example, the actual parameter may contain parameter bound and even
                    765: generatively bound stamps, and the realization of the generative
                    766: stamps in the body of \&\f5G\fP and \&\f5H\fP will themselves be
                    767: generatively bound stamps.
                    768: .NH 2
                    769: Sharing
                    770: .PP
                    771: The purpose of sharing constraints is to insure a kind of compatibility
                    772: between several parameter structures of a functor, as illustrated in
                    773: Figure 2.  The sharing constraints are expressed as sets of equations
                    774: between paths designating structures or types (there are two kinds of
                    775: sharing specifications: structure sharing and type sharing) and they
                    776: determine an equivalence relation amongst the components of the
                    777: signature.  The strategy for incorporating sharing constraints in the
                    778: representation of a signature is to force all components of an
                    779: equivalence class to have the same stamp.
                    780: .PP
                    781: Two components may be required to share either because they are
                    782: directly equated in a sharing specification, or because they are
                    783: corresponding components of structures that are required to share.
                    784: Thus if 
                    785: .M1
                    786: X : sig
                    787:       structure C1 : SIGC
                    788:       structure C2 : SIGC
                    789:       sharing C1.CB = C2.CB
                    790:     end
                    791: .M2
                    792: then \&\f5X.C1.CB = X.C2.CB\fP is directly specified, and \&\f5X.C1.CB.BA =
                    793: X.C2.CB.BA\fP is an inferred consequence.  This simply says that the
                    794: complete sharing relation must be a congruence with respect to the
                    795: operation of selecting a named substructure or type.
                    796: .PP
                    797: Under what circumstances may two structures be constrained to share?
                    798: They must be \fIconsistent\fP, in the sense that it is possible to
                    799: find a structure that could simultaneously match both of them.  In
                    800: particular, the structures that are forced to share do not necessarily
                    801: have to share the same signature, and the fact that they share does
                    802: not have any effect on their signature.  The idea is that various
                    803: thinned versions of a given structure may have different signatures,
                    804: but they can still share because they are actually restricted views
                    805: of their common ancestor structure.  This approach is supported by the
                    806: fact that in signature matching, the stamps of the matched structure
                    807: are inherited by the resultant structure.  Note that this is not the
                    808: approach described in [6], where signatures of sharing
                    809: structures are forced to agree by formation of a kind of union
                    810: signature.  We do not actually verify that signatures that are
                    811: specified to share are consistent, but if they are not, the signature
                    812: containing the sharing specification can never be successfully matched.
                    813: .PP
                    814: The processing of the sharing constraints is performed in two stages.
                    815: First, a union-find algorithm is used to determine all sharing
                    816: relations, direct and inferred, and to construct the equivalence
                    817: classes for the sharing relation.  At this stage it is also possible
                    818: to detect certain pathological sharing specifications, such as trying
                    819: to identify a structure with one of its substructures.  Second, the
                    820: signature is copied and each element of a given equivalence class is
                    821: given the same representative stamp.
                    822: .PP
                    823: There are two ways in which sharing information is used.  (1) When a
                    824: signature with sharing constraints is used as a functor parameter, the
                    825: identification of stamps in the signature will automatically insure
                    826: that the sharing has the desired effect during type checking, \fIi.e.\fP
                    827: types that are specified to share will be seen to be identical by the
                    828: type checker.  (2) During signature matching, any sharing relations
                    829: specified in the signature must also hold in the matched structure.
                    830: One way to check this would be to make sure that the realization map
                    831: was well defined, because a failure of sharing in the target structure
                    832: would cause a single bound stamp to be mapped to more than one target
                    833: stamp.  However, the realization map is only explicitly constructed in
                    834: the matching of functor parameters, where it is needed to help
                    835: instantiate the functor body.  Hence it is more convenient to simply
                    836: save the original sharing constraints as equations in the signature
                    837: and check them explicitly in the target structure as part of signature
                    838: matching.
                    839: .NH 2
                    840: Relation with type checking
                    841: .PP
                    842: What is the relationship between the structures and signatures and the
                    843: underlying ML type checking mechanism?  Obviously signatures and structures
                    844: are carriers of type information \(em that is one of their principle purposes.
                    845: When we look up a value component of a structure we get the same sort of bindings
                    846: as in the top-level environment, except that in some circumstances the
                    847: type has been relativized to the structure's instance vectors and it
                    848: contains \&\f5INDtyc\fP type constructors.  The basic variable lookup
                    849: functions have been defined to eliminate these indirections by replacing
                    850: them with the referenced type constructors from the instance vectors,
                    851: at the expense of partially copying the type.  This would appear to undo
                    852: some of the savings achieved by the structure sharing representation of
                    853: structures, but these copies tend to be ephemeral, and they are
                    854: quickly and efficiently garbage-collected.  The type information in
                    855: structures and signatures is, on the other hand, long-lived, so it is
                    856: more critical to minimize their space requirements.
                    857: .NH
                    858: Conclusions
                    859: .PP
                    860: The challenge of implementing the Standard ML module features is to
                    861: perform the compile-time matchings and instantiations necessary to
                    862: propagate and check type information with a minimum of duplication of
                    863: that information.  Experience has shown that a naive approach leads to
                    864: an explosion in the size of the static representations.
                    865: .PP
                    866: The implementation strategy described in this paper uses a structure
                    867: sharing instantiation technique instead of instantiation by copying,
                    868: and has proved to be reasonably modest in its space requirements.  It
                    869: also has the advantage that it remains quite close in spirit to the
                    870: formal static semantics.
                    871: .PP
                    872: Work on the Standard ML module facilities continues, and current
                    873: topics of interest include explicit functor signatures and the relation
                    874: of the module constructs to separate compilation.
                    875: .SH
                    876: Acknowledgements
                    877: .PP
                    878: This implementation is inspired directly by the work done on the
                    879: operational semantics of Standard ML by Robin Milner, Mads Tofte, and
                    880: Bob Harper, and I have benefited from many discussions with them.
                    881: Bob Harper passed on useful ideas from his prototype implementation
                    882: of modules.  Nick Rothwell helped me explore the use of structure sharing
                    883: in type checking.  Andrew Appel provided valuable advice throughout and
                    884: had a direct hand in the implementation of sharing specifications, and
                    885: he has been an equal partner in the creation of Standard ML of New Jersey.
                    886: .SH
                    887: References
                    888: .IP 1.
                    889: A. Appel and D. MacQueen, \fIA Standard ML compiler\fP, Proceedings of the
                    890: Conference on Functional Programming and Computer Architecture, Portland,
                    891: September 1987, G. Kahn, ed., LNCS Vol. 274, Springer-Verlag, 1987.
                    892: .IP 2.
                    893: H.-J. Boehm and A. Demers, \fIImplementing Russell\fP, 
                    894: Proceedings of SIGPLAN 86 Symposium on Compiler Construction, Palo Alto,
                    895: 1986, 186-195.
                    896: .IP 3.
                    897: R. S. Boyer and J Moore, \fIThe sharing of structure in theorem-proving
                    898: programs\fP, Machine Intelligence 7, B. Meltzer and D. Michie, eds., Edinburgh
                    899: University Press, 1972, 101-116.
                    900: .IP 4.
                    901: L. Damas and R. Milner, \fIPrincipal type schemes for functional programs\fP,
                    902: Proceedings of 9th ACM Symposium on Principles of Programming Languages,
                    903: Albuquerque, 1982, 207-212.
                    904: .IP 5.
                    905: R. Harper, D. MacQueen, and R. Milner, \fIStandard ML\fP, Laboratory for
                    906: Foundations of Computer Science, Dept. of Computer Science, University of
                    907: Edinburgh, ECS-LFCS-86-2, 1986. (Also Polymorphism II, 2, October 1985.)
                    908: .IP 6.
                    909: R. Harper, R. Milner, and M. Tofte, \fIA type discipline for program modules\fP,
                    910: Proceedings TAPSOFT 87, LNCS Vol. 250, Springer-Verlag, New York, 1987, 308-319.
                    911: .IP 7.
                    912: R. Harper, R. Milner, and M. Tofte, \fIThe semantics of Standard ML,
                    913: Version I\fP, Laboratory for Foundations of Computer Science, Dept. of
                    914: Computer Science, University of Edinburgh, ECS-LFCS-87-36, 1986.
                    915: .IP 8.
                    916: D. MacQueen, \fIModules for Standard ML\fP, in [5].  (An earlier version
                    917: appeared in Proceedings ACM Symposium on Lisp and Functional Programming,
                    918: Austin, 1984.)
                    919: .IP 9.
                    920: D. MacQueen, \fIUsing dependent types to express modular structure\fP,
                    921: Proceedings 13th ACM Symposium on Principles of Programming Languages, St.
                    922: Petersburg Beach, 1986, 277-286.
                    923: .IP 10.
                    924: P. Martin-L\o'o"'f, \fIConstructive mathematics and computer programming\fP,
                    925: Sixth International Congress for Logic, Methodology, and Philosophy of Science,
                    926: North Holland, Amsterdam, 1982, 153-175.
                    927: .IP 11.
                    928: J. C. Mitchell and R. Harper, \fIThe essence of ML\fP, Proceedings 15th
                    929: ACM Symposium on Principles of Programming Languages, San Diego, 1988,
                    930: 28-46.
                    931: .IP 12.
                    932: M. Tofte, Operational Semantics and Polymorphic Type Inference, Ph.D.
                    933: Dissertation, Dept. of Computer Science, University of Edinburgh, 1987.
                    934: .IP 13.
                    935: D. H. D. Warren, \fIImplementing PROLOG - Compiling Predicate Logic Programs,
                    936: Vol. I\fP, Dept. of Artificial Intelligence Report No. 39, University of
                    937: Edinburgh, 1977.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.