|
|
1.1 ! root 1: .RP ! 2: .ND March 11, 1988 ! 3: .TL ! 4: An Implementation of Standard ML Modules ! 5: .AU ! 6: David MacQueen ! 7: .AI ! 8: AT&T Bell Laboratories ! 9: Murray Hill, NJ 07974 ! 10: .AB ! 11: Standard ML includes a set of module constructs that support ! 12: programming in the large. These constructs extend ML's basic ! 13: polymorphic type system by introducing the dependent types of Martin ! 14: L\o'o"'f's Intuitionistic Type Theory. This paper discusses the ! 15: problems involved in implementing Standard ML's modules and describes ! 16: a practical, efficient solution to these problems. The ! 17: representations and algorithms of this implementation were inspired by ! 18: a detailed formal semantics of Standard ML developed by Milner, Tofte, ! 19: and Harper. The implementation is part of a new Standard ML compiler ! 20: that is written in Standard ML using the module system. ! 21: .AE ! 22: .EQ ! 23: delim %% ! 24: .EN ! 25: .de M1 \" start of a program display ! 26: .DS ! 27: .ft CW ! 28: .. ! 29: .de M2 ! 30: .ft R ! 31: .DE ! 32: .. ! 33: .de M3 ! 34: .if "\\$2"" \\&\f(CW\\$1\f1 ! 35: .if !"\\$2"" \\&\f(CW\\$1\f1\\&\\$2 ! 36: .. ! 37: .fp 5 CW ! 38: .NH ! 39: Introduction ! 40: .PP ! 41: An important part of the revision of ML that led to ! 42: the Standard ML language was the inclusion of module ! 43: facilities for the support of ``programming in the large.'' ! 44: The design of these facilities went through several ! 45: versions [8] and was supported by concurrent investigations ! 46: of the type theory of ML and related systems [9,11]. ! 47: The central idea behind the design was to support modularity ! 48: by introducing a stratified system of dependent types as suggested ! 49: by Martin L\o'o"'f's Intuitionistic Type Theory [10]. ! 50: In late 1985 Bob Harper added a prototype implementation of ! 51: most of the module facilities to the Edinburgh ML compiler, ! 52: which was serving as a test-bed for the evolving Standard ML design. ! 53: .PP ! 54: Starting in the spring of 1986, Andrew Appel, Trevor Jim, and I have ! 55: implemented a new Standard ML compiler, written in Standard ML, ! 56: and initially bootstrapped from the Edinburgh compiler. An overview of ! 57: this new compiler, known as Standard ML of New Jersey, is given in [1]. ! 58: The implementation of modules in this new compiler went through ! 59: two generations. ! 60: A first version was done in the fall of 1986, but it was completely ! 61: rewritten in the summer of 1987 following discussions of the ! 62: operational static semantics of modules with Robin Milner, Mads Tofte, ! 63: and Bob Harper [6,7,12]. ! 64: Like Harper's prototype implementation, the new modules implementation ! 65: was inspired by the static semantics, but it uses a \fIstructure sharing\fP ! 66: strategy [3,13] ! 67: to avoid serious performance problems associated with a naive ! 68: implementation of the static semantics. ! 69: Although precise comparative measurements are not yet available, our ! 70: experience shows that the symbol table size for a large ML program ! 71: such as the ML compiler is several times smaller with the ! 72: new compiler than with the old compiler. ! 73: .PP ! 74: The objective of this paper is to describe our implementation of the ! 75: Standard ML module facilities, with particular emphasis on the ! 76: techniques used to minimize the space consumed by static representations ! 77: of modules (\fIi.e.\fP symbol table structures). ! 78: We begin by reviewing the elements of the module language. ! 79: .NH ! 80: Summary of the module constructs ! 81: .PP ! 82: Before describing the basic issues concerning implementation ! 83: of Standard ML modules, we need to review the main elements ! 84: of the module language. ! 85: There are three principal notions: ! 86: .in .3i ! 87: .IP (1) ! 88: \f2signature\f1 \- interface specification or ``type'' of modules. ! 89: .IP (2) ! 90: \f2structure\f1 \- an environment; a collection of type, ! 91: structure, value, and exception bindings; corresponds to ! 92: the conventional idea of a module. ! 93: .IP (3) ! 94: \f2functor\f1 \- function mapping structures to structures; ! 95: a form of parametric module. ! 96: .in -0.3i ! 97: .LP ! 98: Figure 1 below contains simple examples of each of these constructs. ! 99: .M1 ! 100: signature ORD = ! 101: sig ! 102: type t ! 103: val le : t*t -> bool ! 104: end ! 105: ! 106: structure S = ! 107: struct ! 108: datatype t = A | B of t ! 109: val x = A ! 110: fun le(A,_) = true ! 111: | le(_,A) = false ! 112: | le(B x, B y) = le(x,y) ! 113: end ! 114: ! 115: signature LEXORD = ! 116: sig ! 117: structure A : ORD ! 118: val lexord : A.t list * A.t list -> bool ! 119: end ! 120: ! 121: functor LexOrd(O: ORD) : LEXORD = ! 122: struct ! 123: structure A = O ! 124: fun lexord([],_) = true ! 125: | lexord(_,[]) = false ! 126: | lexord(x::l,y::m) = ... O.le(x,y) ... ! 127: end ! 128: ! 129: structure LS = LexOrd(S) ! 130: .M2 ! 131: .tl ''\f2Figure 1\f1'' ! 132: .PP ! 133: This example contains declarations of two signatures, ! 134: .M3 ORD ! 135: and ! 136: .M3 LEXORD , ! 137: two structures, ! 138: .M3 S ! 139: and ! 140: .M3 LS , ! 141: and one functor, ! 142: \&\f5LexOrd\fP, mapping a structure of signature \&\f5ORD\fP ! 143: to a new structure of signature \&\f5LEXORD\fP. ! 144: The structure ! 145: .M3 LS ! 146: is defined as the result of applying ! 147: .M3 LexOrd ! 148: to ! 149: .M3 S . ! 150: We refer to components of a structure using qualified names or ! 151: paths formed with the usual ``dot'' notation: \fIe.g.\fP \&\f5S.t\fP, ! 152: \&\f5S.x\fP, \&\f5LS.A.le\fP. ! 153: .PP ! 154: A signature can be regarded as a form of ``type'' ! 155: for structures, or as a schematic representation ! 156: of a class of structures, and a ! 157: structure \f2matches\f1 a signature if it satisfies the specifications ! 158: given in the signature. A structure does not have to agree exactly with ! 159: a signature in order for it to match the signature; ! 160: in this example the structure ! 161: .M3 S ! 162: matches the signature ! 163: .M3 ORD , ! 164: even though ! 165: .M3 S ! 166: has an additional value component ! 167: .M3 x ! 168: not specified in ! 169: .M3 ORD . ! 170: In such cases signature matching has a coercive ! 171: effect, producing a ``thinned'' structure that exactly ! 172: agrees with the signature in terms of number of ! 173: components and their types. ! 174: .PP ! 175: Signature matching is performed in two contexts: ! 176: (1) when a signature constraint is given in a structure ! 177: declaration, as in: ! 178: .M1 ! 179: structure R : ORD = S ! 180: .M2 ! 181: and (2) when a functor is applied to an argument structure, ! 182: which must match the signature specified for the formal ! 183: parameter, as in ! 184: .M1 ! 185: structure LS = LexOrd(S) ! 186: .M2 ! 187: where ! 188: .M3 S ! 189: must match ! 190: .M3 ORD . ! 191: Actually, these two contexts are closely related under Landin's ! 192: principle of correspondence. In the first case, \&\f5R\fP is bound ! 193: to a thinned version of \&\f5S\fP that does not contain an \&\f5x\fP ! 194: component, and in the second case, the formal parameter \&\f5O\fP, and hence ! 195: the substructure \&\f5LS.A\fP, is also bound to a thinned version of \&\f5S\fP. ! 196: .PP ! 197: Another kind of specification that may appear in signatures ! 198: is the \f2sharing constraint\f1, the purpose of which is to ! 199: insure a kind of type coherence among functor parameters. ! 200: The program sketch in Figure 2 illustrates the use of sharing constraints. ! 201: .M1 ! 202: signature SYMBOL = sig type symbol ... end ! 203: ! 204: signature LEX = ! 205: sig ! 206: structure Symbol : SYMBOL ! 207: val next : unit -> Symbol.symbol ! 208: ... ! 209: end ! 210: ! 211: signature SYMBOLTABLE = ! 212: sig ! 213: structure Symbol : SYMBOL ! 214: type var ! 215: val bind : Symbol.symbol * var -> unit ! 216: ... ! 217: end ! 218: ! 219: signature PARSE_ARGS = ! 220: sig ! 221: structure Lex : LEX ! 222: structure SymTab : SYMBOLTABLE ! 223: sharing Lex.Symbol = SymTab.Symbol ! 224: end ! 225: ! 226: functor Parse(A: PARSE_ARGS) = ! 227: struct ... A.SymTab.bind(A.Lex.next(), v) ... end ! 228: .M2 ! 229: .tl ''\f2Figure 2\f1'' ! 230: .PP ! 231: The functor ! 232: .M3 Parse ! 233: essentially takes two structure arguments, ! 234: .M3 Lex ! 235: (implementing a lexical analyzer) and ! 236: .M3 SymTab ! 237: (implementing a symbol table), which are bundled as components ! 238: of a single parameter structure. ! 239: The sharing specification in ! 240: .M3 PARSE_ARGS ! 241: requires that the same ! 242: .M3 Symbol ! 243: structure be used in both ! 244: .M3 Lex ! 245: and ! 246: .M3 SymTab . ! 247: This insures that ! 248: .M3 Lex ! 249: and ! 250: .M3 SymTab ! 251: can consistently interact, as in the expression ! 252: .M3 "A.Symtab.bind(A.Lex.next(),v)" , ! 253: which is well-typed only if ! 254: .M3 A.Lex.Symbol.symbol ! 255: and ! 256: .M3 A.Symtab.Symbol.symbol ! 257: are the same type. ! 258: .PP ! 259: An important point about datatype and structure declarations ! 260: is that they are \f2generative\f1, meaning ! 261: that each time they are elaborated (\fIe.g.\fP in a functor body as a result ! 262: of functor applications) a new, distinct structure or type is created. ! 263: For example, in ! 264: .M1 ! 265: functor F () = ! 266: struct ! 267: datatype t = A | B of t ! 268: end ! 269: ! 270: structure S1 = F() ! 271: structure S2 = F() ! 272: .M2 ! 273: .M3 S1 ! 274: and ! 275: .M3 S2 ! 276: are distinct structures and ! 277: .M3 S1.t ! 278: and ! 279: .M3 S2.t ! 280: are distinct types, so ! 281: .M3 S1.B(S2.A) ! 282: is an ill-typed expression. ! 283: .PP ! 284: On the other hand, simple type definitions (whether occurring inside or ! 285: outside of structures) are \f2transparent\f1 rather than generative. ! 286: For instance, in ! 287: .M1 ! 288: structure IntOrd = ! 289: struct ! 290: type t = int ! 291: fun le(x,y) = x <= y ! 292: end ! 293: .M2 ! 294: the type ! 295: .M3 S.t ! 296: is identical to \&\f5int\fP. ! 297: In other words, there is no information hiding or abstraction ! 298: inherent in the formation of structures. ! 299: This applies even to the results of functor applications; type ! 300: information is propagated through functor applications, so that after ! 301: the declaration ! 302: .M1 ! 303: structure IntLexOrd = LexOrd (IntOrd) ! 304: .M2 ! 305: .M3 IntLexOrd.le ! 306: has type ! 307: .M3 "int list * int list -> bool" . ! 308: This reflects the dependent product nature of functor signatures, and ! 309: the fact that structures represent a form of \fIstrong\fP dependent sum ! 310: (see [9,11] for discussion of the relation between ML modules and dependent ! 311: types). ! 312: .NH ! 313: Implementation of modules ! 314: .PP ! 315: The principal tasks that an implementation must deal with are as follows: ! 316: .IP (1) ! 317: representation of signatures, structures, and functors. ! 318: .IP (2) ! 319: signature matching, including instantiation of the signature ! 320: template and possible thinning of the matched structure. ! 321: .IP (3) ! 322: functor application, including ! 323: .RS ! 324: .IP (a) ! 325: matching formal signature to actual parameter, with possible thinning ! 326: of the parameter. ! 327: .IP (b) ! 328: creation of the result structure, including propagation of type information from ! 329: parameter to result and generation of new instances of datatypes and ! 330: structures. ! 331: .RE ! 332: .IP (4) ! 333: representation and checking of sharing constraints. ! 334: .PP ! 335: Most of these tasks have two parts, the \f2static\f1 or compile-time ! 336: task and the \f2dynamic\f1 or run-time task. ! 337: The run-time problems are straightforward and are discussed in the ! 338: next subsection. ! 339: Our main focus will be on the static aspects of the module language, ! 340: for which our principal implementation goals are: ! 341: .IP (1) ! 342: compact representation of structures having a given signature ! 343: .IP (2) ! 344: efficient signature matching and functor application, with minimal ! 345: duplication of static (\fIi.e.\fP symbol table) information ! 346: .IP (3) ! 347: efficient representation and checking of sharing constraints. ! 348: .NH 2 ! 349: Dynamic representations and processes. ! 350: .PP ! 351: The run-time representations of modules are remarkably simple [1]. ! 352: Signatures and types have no run-time representation \(em they exist ! 353: only at the static level. A structure is represented as a record ! 354: whose components represent the dynamic ! 355: structure components (\fIi.e.\fP substructures, values, and exceptions) in a ! 356: canonical order. A functor is represented as an ordinary function ! 357: closure, and functor application corresponds to the normal application ! 358: of this function to a record representing the argument structure. The ! 359: thinning coercions associated with signature matching give rise to ! 360: in-line code to construct the thinned record. ! 361: .PP ! 362: In the middle-end of the compiler, all module constructs are reduced ! 363: to the same simple lambda-calculus based intermediate language that is ! 364: used for the core ML constructs of value declarations and expressions. ! 365: In effect, the back-end of the compiler is unaware of the existence of ! 366: the module constructs \(em they have been reduced to common notions of ! 367: records and functions. ! 368: .NH 2 ! 369: Static Representations ! 370: .PP ! 371: A naive representation of signatures and structures can be modeled ! 372: more or less directly on the semantic constructs used in the operational static ! 373: semantics [5,6]. ! 374: There a structure is modeled by an environment %E% ! 375: that maps component identifiers to the appropriate sort of ! 376: static binding (type, structure, variable, etc.), and a ! 377: \fIstamp\fP,* %n%, that uniquely identifies the structure: ! 378: %str^=^(n,E)%. ! 379: .FS ! 380: *We prefer the term ``stamp'' for this purpose, rather than the term ! 381: ``name'' used in [6,7], ! 382: since ``name'' could also refer to the identifier to which a structure ! 383: is bound or an identifier bound within a structure. ! 384: .FE ! 385: We can view a structure as a tree or dag with nodes labeled by stamps ! 386: and edges labeled by component names. ! 387: A signature is then a structure together with a designated set of ! 388: \fIbound\fP or \fIschematic\fP stamps occurring within the structure: ! 389: %sig^=^(N)(n,E)%. ! 390: .PP ! 391: We illustrate this with the definitions in Figure 3 ! 392: and the corresponding graphs in Figure 4 (adapted from [6]), in which ! 393: \fI(a)\fP represents the structure \&\f5C\fP and \fI(b)\fP represents ! 394: the signature \&\f5SIGC\fP. ! 395: Our convention for distinguishing between constant and bound stamps ! 396: is that metavariables %k sub i% ! 397: range over constant stamps, while metavariables %x sub i% range over ! 398: bound stamps in a signature. This is a more concise alternative to the ! 399: separate specification of the graph and the set of bound stamps ! 400: We emphasize the distinction by using solid ! 401: circles for nodes with constant stamps and open circles for nodes with ! 402: bound stamps. ! 403: A structure will always contain only constant ! 404: stamps, while a signature will typically contain only bound stamps. The ! 405: graphs are simplified by showing only structure components, but type ! 406: components are dealt with similarly. ! 407: ! 408: .DS ! 409: .ft 5 ! 410: .ta 3i ! 411: structure A = signature SIGA = ! 412: struct sig ! 413: type t = int type t ! 414: fun f n = 2 * n val f : t -> t ! 415: end end ! 416: ! 417: structure B = signature SIGB = ! 418: struct sig ! 419: structure BA = A structure BA: SIGA ! 420: fun g x = BA.f(x) + 1 val g: BA.t -> BA.t ! 421: end end ! 422: ! 423: structure C = signature SIGC = ! 424: struct sig ! 425: structure CA = A structure CA : SIGA ! 426: structure CB = B structure CB : SIGB ! 427: fun h x = CB.g(CA.f x) val h: CA.t -> CA.t ! 428: end end ! 429: .ft R ! 430: .DE ! 431: .tl ''\f2Figure 3\f1'' ! 432: .PS ! 433: circlerad = 0.05i ! 434: define str ' ! 435: {C1: circle ! 436: move to C1 ! 437: move down right ! 438: C2: circle ! 439: move to C2 ! 440: move down left ! 441: C3: circle ! 442: A1: arrow from C1.c to C2.c chop ! 443: A2: arrow from C2.c to C3.c chop ! 444: A3: arrow from C1.c to C3.c chop ! 445: "%k sub 1% " at C1.w rjust ! 446: " %k sub 3%" at C2.e ljust ! 447: "%k sub 2% " at C3.w rjust ! 448: " CB" at A1.center ljust ! 449: " BA" at A2.center ljust ! 450: "CA " at A3.center rjust ! 451: move to C1 ! 452: move down 1.3i ! 453: "\f5structure C\fR" center ! 454: } ! 455: ' ! 456: define sig ' ! 457: {C1: circle ! 458: move to C1 ! 459: move down left ! 460: C2: circle ! 461: move to C1 ! 462: move down right ! 463: C3: circle ! 464: move to C3 ! 465: move down ! 466: C4: circle ! 467: A1: arrow from C1.c to C2.c chop ! 468: A2: arrow from C1.c to C3.c chop ! 469: A3: arrow from C3.c to C4.c chop ! 470: " %x sub 1%" at C1.e ljust ! 471: "%x sub 2% " at C2.w rjust ! 472: " %x sub 3%" at C3.e ljust ! 473: " %x sub 4%" at C4.e ljust ! 474: "CA " at A1.center rjust ! 475: " CB" at A2.center ljust ! 476: " BA" at A3.center ljust ! 477: move to C1 ! 478: move down 1.3i ! 479: "\f5signature SIGC\fR" center ! 480: } ! 481: ' ! 482: H: str; move to H; move right 2.25i; sig ! 483: .PE ! 484: .tl ''\f2Figure 4\f1'' ! 485: ! 486: .PP ! 487: The purpose of a signature matching \&\f5S: SIG\fP is to produce a ! 488: structure \&\f5S'\fP that has exactly the form specified by ! 489: \&\f5SIG\fP and yet shares the identity (\fIi.e.\fP the stamps) of ! 490: \&\f5S\fP. In some cases, \&\f5S\fP and \&\f5S'\fP are identical, as ! 491: when \&\f5S\fP had already matched the signature \&\f5SIG\fP. In ! 492: other cases \&\f5S'\fP is a thinned version of \&\f5S\fP having fewer ! 493: components or components whose types are generic instances of their ! 494: types in \&\f5S\fP. Another product of matching is the realization ! 495: map, whose use in functor applications is explained in Section 3.5. ! 496: .PP ! 497: We can think of \&\f5SIG\fP as a scheme analogous to a generic type scheme or ! 498: polytype in the core ML type system [4], with the bound stamps ! 499: playing the role of generic type variables in a type scheme. The ! 500: product of matching is then an instance of this scheme under the ! 501: substitution represented by the realization map. The details of this ! 502: analogy have been worked out by Mads Tofte, including a version of the ! 503: principal typing theorem of [4]. ! 504: .PP ! 505: A naive implementation of matching would make a copy of the signature ! 506: \&\f5SIG\fP, in the process replacing each bound stamp by the corresponding ! 507: constant stamp from \&\f5S\fP. This would involve copying most of the ! 508: environment part of the signature, since we have to instantiate the ! 509: type specifications of values and exceptions as well as instantiating ! 510: the types and substructures themselves. However, the environment or ! 511: symbol table part of the signature can be regarded as a template ! 512: relative to its type and substructure components, which are the only ! 513: parts that need to change during signature matching. For instance, a ! 514: type specification like \&\f5f: t->t\fP can remain fixed if it is interpreted ! 515: relative to the type component \&\f5t\fP. We can abstract out the type and ! 516: structure components carrying the bound stamps and use the rest of the ! 517: information in a signature as an unchanging template that can be ! 518: shared by all instances of the signature. This is the familiar ! 519: \fIstructure sharing\fP idea first proposed by Boyer and Moore in the ! 520: context of resolution theorem proving [3] and later exploited in the ! 521: implementation of Prolog [13]. The use of structure sharing in ! 522: the basic ML type system has been considered, but in that context it ! 523: does not appear to have a clear advantage over the simpler approach of ! 524: instantiation by copying. In the case of signature matching, however, ! 525: the shared information in the template is typically of considerable ! 526: volume, so structure sharing is quite effective in saving space ! 527: relative to copying. ! 528: .PP ! 529: The definitions of the basic datatypes used to represent type ! 530: constructors, structures and signatures are given in Figure 5. The ! 531: representations of structures and signatures both use the \&\f5Structure\fP ! 532: datatype, and differ only in the value of the \&\f5kind\fP field. The \&\f5stamp\fP ! 533: field contains the identifying stamp, the \&\f5table\fP is the environment ! 534: component represented as a hash table mapping symbols to the various ! 535: sorts of bindings, and the \&\f5env\fP field contains a pair of instance ! 536: vectors for type and structure components. The \&\f5sign\fP field in ! 537: a signature to identify the signature (the \&\f5stamp\fP field ! 538: will have a formal value representing a bound stamp); in ! 539: a structure it identifies the signature the structure is an instantiation of, ! 540: if any. Bound and constant stamps are both represented as integers; ! 541: stamps greater than some base value are constant stamps, while stamps ! 542: less than that value are bound. Within a given signature, bound ! 543: stamps are canonically numbered starting from 0. ! 544: .M1 ! 545: datatype tycon ! 546: = TYCON of {stamp : stamp, ...} ! 547: | INDtyc of int list ! 548: ! 549: datatype Structure ! 550: = STRstr of ! 551: {stamp : stamp, ! 552: sign : stamp option, ! 553: table : symtable, ! 554: env : strenv, ! 555: kind : strkind} ! 556: | INDstr of int ! 557: ! 558: and strkind ! 559: = STRkind ! 560: | SIGkind of ! 561: {share : sharespec, ! 562: bindings : binding list, ! 563: stampcounts : {s : int, t : int}} ! 564: .M2 ! 565: .tl ''\f2Figure 5\f1'' ! 566: ! 567: .PP ! 568: The \&\f5INDtyc\fP and \&\f5INDstr\fP forms of type constructors and structures are ! 569: used within the symbol table to refer indirectly to components stored ! 570: in the instance vectors. The term \&\f5INDtyc[%i%]\fR refers to the ith ! 571: element of the type instance vector, while \&\f5INDtyc[%i%,%j%]\fR refers to the ! 572: jth element of the type vector of the ith element of the structure ! 573: vector. The type specifications ! 574: .M1 ! 575: f: t -> t ! 576: h: CA.t -> CA.t ! 577: .M2 ! 578: from Figure 3 are represented internally as ! 579: .M1 ! 580: f: INDtyc[0] -> INDtyc[0] ! 581: h: INDtyc[0,0] -> INDtyc[0,0] ! 582: .M2 ! 583: The representation of the entire signature SIGC from Figure 3 can be ! 584: summarized as follows: ! 585: .M1 ! 586: SIGC: ! 587: stamp: 0 ! 588: table: CA => INDstr 0 ! 589: CB => INDstr 1 ! 590: h => VAR: INDtyc[0,0] -> INDtyc[0,0] ! 591: strenv: structures = <SIGA',SIGB'> ! 592: types = <> ! 593: ! 594: SIGA': ! 595: stamp: 1 ! 596: table: t => INDtyc[0] ! 597: f => VAR: INDtyc[0] -> INDtyc[0] ! 598: strenv: structures = <> ! 599: types = <DUMMY 0> ! 600: ! 601: .M2 ! 602: .M1 ! 603: SIGB': ! 604: stamp: 2 ! 605: table: BA => INDstr 0 ! 606: g => VAR: BA.t -> BA.t ! 607: strenv: structures = <SIGA''> ! 608: types = <> ! 609: ! 610: SIGA'': ! 611: stamp: 3 ! 612: table: t => INDtyc[0] ! 613: f => VAR: INDtyc[0] -> INDtyc[0] ! 614: strenv: structures = <> ! 615: types = <DUMMY 1> ! 616: .M2 ! 617: .LP ! 618: Note that there are two copies of the signature \&\f5SIGA\fP, identified ! 619: as \&\f5SIGA'\fP and \&\f5SIGA''\fP, each with its own stamp. This ! 620: duplication is required to get the canonical numbering of stamps for ! 621: each component of \&\f5SIGC\fP, but each of these copies shares the ! 622: original symbol table component from \&\f5SIGA\fP. \&\f5DUMMY 0\fP ! 623: and \&\f5DUMMY 1\fP are dummy type constructor components, which have ! 624: their own separate numbering within the context of \&\f5SIGC\fP. ! 625: .NH 2 ! 626: Signature Matching ! 627: .PP ! 628: We now describe the process of signature matching in terms of the ! 629: representation described above. Given a signature \&\f5sig\fP and ! 630: structure \&\f5str\fP represented as ! 631: .M1 ! 632: sig = STRstr{stamp = x, sign = n, table = sigtab, env = sigenv, ! 633: kind = SIGkind{bindings,stampcounts,sharing}} ! 634: ! 635: str = STRstr{stamp = k, sign = s, table = strtab, env = strenv, ! 636: kind=STRkind} ! 637: .M2 ! 638: we first check whether \&\f5s=n\fP , and if so ! 639: return \&\f5str\fP, because \&\f5str\fP is already an instance of ! 640: \&\f5sig\fP. Otherwise we attempt to construct a new instance of ! 641: \&\f5sig\fP. We start by allocating a new pair of instance vectors, ! 642: \&\f5newenv={s=sNew,t=tNew}\fP, based on the size information in the ! 643: \&\f5stampcounts\fP field. Then we iterate through the list of all ! 644: of \&\f5sig\fP's bindings (\fIi.e.\fP specifications), which is ! 645: available in the field \&\f5bindings\fP. For each structure binding ! 646: \&\f5(id,INDstr i)\fP in \&\fIsig\fP, we look up a structure named ! 647: \&\fIid\fP in \&\f5strtab\fP. If it does not exist, matching fails. ! 648: If it does exist, we recursively match it against the substructure ! 649: signature bound to \&\f5id\fP in \&\f5sig\fP (obtained as the ith ! 650: element of the structure vector in \&\f5sigenv\fP), and if successful ! 651: use the result to define the ith element of \&\f5sNew\fP. Similarly ! 652: for type bindings, where we check that the type constructor bound in ! 653: \&\f5str\fP agrees with the specification in sig (\fIe.g.\fP they must have ! 654: the same arity). For value specifications like \&\f5x: ty\fP, we ! 655: interpret indirect type constructors in \&\f5ty\fP with respect to ! 656: \&\f5newenv\fP and check that we have a generic instance of the type ! 657: of the corresponding component of \&\f5str\fP. Checking value ! 658: components has no effect on the instance vectors, but if necessary ! 659: (\fIi.e.\fP if \&\f5sig\fP has fewer value components than ! 660: \&\f5str\fP) we calculate the translation between the old and new ! 661: runtime positions of the components and collect this information in a ! 662: thinning specification to be applied at runtime. ! 663: .PP ! 664: When we have successfully matched all the bindings in \&\f5sig\fP we build the ! 665: result structure ! 666: .M3 str\' ! 667: using ! 668: .M3 sig \'s ! 669: table and ! 670: .M3 sign , ! 671: .M3 str \'s ! 672: stamp , ! 673: and ! 674: .M3 newenv ! 675: .M1 ! 676: str' = STRstr {stamp=n, sign=SOME k, table=sigtab, ! 677: env=newenv, kind=STRkind} ! 678: .M2 ! 679: Finally any sharing constraints (from sharing) are checked as ! 680: described in Section 3.5, and we return \&\f5str'\fP and a thinning ! 681: specification as the result of the match. Note that ! 682: the bulk of the information in the signature is in ! 683: .M3 sigtab , ! 684: and this is directly shared with the instantiation ! 685: .M3 str\' . ! 686: .PP ! 687: As a shortcut, when elaborating a declaration like ! 688: .M1 ! 689: structure S: SIG = struct \fIdeclarations\fP end ! 690: .M2 ! 691: we do not build the structure on the right-hand side before doing ! 692: the signature match. Instead we elaborate the body declarations in ! 693: the top-level environment and then do the signature matching using ! 694: the top-level environment in the place of the target structure. ! 695: .NH 2 ! 696: Functors and functor application ! 697: .PP ! 698: In the static semantics a functor %F% is modeled by a pair of ! 699: structures representing the parameter and body of the functor, and ! 700: two sets of bound stamps. ! 701: .EQ I ! 702: F ~ = ~ ( N ) ( S sub p , ^ ( N ' ) S sub b ) ! 703: .EN ! 704: %N% is the set of bound stamps in the parameter structure, which may ! 705: also occur in the body %S sub b%, while %N '% ! 706: is the set of stamps associated with generative elements of the ! 707: functor body. To apply %F% to an argument structure %A% we perform ! 708: the following steps ! 709: .EQ I ! 710: r ~ = ~ match ( ( N ) S sub p , ^ A ) ! 711: .EN ! 712: .EQ I ! 713: g ~ = ~ generate ( N ' ) ! 714: .EN ! 715: .EQ I ! 716: F ( A ) ~ = ~ g ( r ( S sub b ) ) ! 717: .EN ! 718: That is, we match the parameter signature and the argument to produce ! 719: a realization map %r%, then we generate a realization map %g% that maps ! 720: each bound stamp in %N '% to a unique, new constant stamp, and finally ! 721: we produce the result structure by using %r% and %g% to instantiate ! 722: the body structure. ! 723: .PP ! 724: The implementation of functors follows this scheme closely. The ! 725: datatype used to represent functors is defined by ! 726: .M1 ! 727: datatype Functor = FUNCTOR of ! 728: {param : Structure, ! 729: body : Structure, ! 730: tycCount : int} ! 731: .M2 ! 732: The bound stamps in the \&\f5param\fP structure are numbered from 0 to ! 733: %n% and these may also occur in the \&\f5body\fP structure. ! 734: Generative stamps in the body are numbered from %n+1% to %n+m%, which ! 735: is the value of \&\f5tycCount\fP. ! 736: .PP ! 737: If the functor declaration provides an explicit result signature, as in ! 738: .M1 ! 739: functor F(X : SIGP) : SIGR = struct ... end ! 740: .M2 ! 741: the body will naturally be schematic (i.e. the parts with bound stamps will ! 742: be isolated in instance vectors) as a result of the signature ! 743: matching between the body and the result signature. However, if there ! 744: is no result signature, we explicitly abstract these ``volatile'' parts of ! 745: the body structure to get an instantiable scheme so that the body's ! 746: symbol table may be shared by all structures produced by the functor. ! 747: .PP ! 748: To apply the functor, signature matching is performed between the ! 749: parameter signature and the argument to build a realization map for ! 750: the bound stamps in the parameter. Then the body is instantiated ! 751: using this realization map and introducing new constant stamps to ! 752: replace generative bound stamps as required. The actual algorithm ! 753: is more complicated than this because functor application can occur ! 754: within the body of functor declaration, as in ! 755: .M1 ! 756: functor F(X : SIGP) = ! 757: struct ! 758: structure A = G(X) ! 759: structure B = H(A) ! 760: ... ! 761: end ! 762: .M2 ! 763: In cases like the applications of \&\f5G\fP and \&\f5H\fP in this ! 764: example, the actual parameter may contain parameter bound and even ! 765: generatively bound stamps, and the realization of the generative ! 766: stamps in the body of \&\f5G\fP and \&\f5H\fP will themselves be ! 767: generatively bound stamps. ! 768: .NH 2 ! 769: Sharing ! 770: .PP ! 771: The purpose of sharing constraints is to insure a kind of compatibility ! 772: between several parameter structures of a functor, as illustrated in ! 773: Figure 2. The sharing constraints are expressed as sets of equations ! 774: between paths designating structures or types (there are two kinds of ! 775: sharing specifications: structure sharing and type sharing) and they ! 776: determine an equivalence relation amongst the components of the ! 777: signature. The strategy for incorporating sharing constraints in the ! 778: representation of a signature is to force all components of an ! 779: equivalence class to have the same stamp. ! 780: .PP ! 781: Two components may be required to share either because they are ! 782: directly equated in a sharing specification, or because they are ! 783: corresponding components of structures that are required to share. ! 784: Thus if ! 785: .M1 ! 786: X : sig ! 787: structure C1 : SIGC ! 788: structure C2 : SIGC ! 789: sharing C1.CB = C2.CB ! 790: end ! 791: .M2 ! 792: then \&\f5X.C1.CB = X.C2.CB\fP is directly specified, and \&\f5X.C1.CB.BA = ! 793: X.C2.CB.BA\fP is an inferred consequence. This simply says that the ! 794: complete sharing relation must be a congruence with respect to the ! 795: operation of selecting a named substructure or type. ! 796: .PP ! 797: Under what circumstances may two structures be constrained to share? ! 798: They must be \fIconsistent\fP, in the sense that it is possible to ! 799: find a structure that could simultaneously match both of them. In ! 800: particular, the structures that are forced to share do not necessarily ! 801: have to share the same signature, and the fact that they share does ! 802: not have any effect on their signature. The idea is that various ! 803: thinned versions of a given structure may have different signatures, ! 804: but they can still share because they are actually restricted views ! 805: of their common ancestor structure. This approach is supported by the ! 806: fact that in signature matching, the stamps of the matched structure ! 807: are inherited by the resultant structure. Note that this is not the ! 808: approach described in [6], where signatures of sharing ! 809: structures are forced to agree by formation of a kind of union ! 810: signature. We do not actually verify that signatures that are ! 811: specified to share are consistent, but if they are not, the signature ! 812: containing the sharing specification can never be successfully matched. ! 813: .PP ! 814: The processing of the sharing constraints is performed in two stages. ! 815: First, a union-find algorithm is used to determine all sharing ! 816: relations, direct and inferred, and to construct the equivalence ! 817: classes for the sharing relation. At this stage it is also possible ! 818: to detect certain pathological sharing specifications, such as trying ! 819: to identify a structure with one of its substructures. Second, the ! 820: signature is copied and each element of a given equivalence class is ! 821: given the same representative stamp. ! 822: .PP ! 823: There are two ways in which sharing information is used. (1) When a ! 824: signature with sharing constraints is used as a functor parameter, the ! 825: identification of stamps in the signature will automatically insure ! 826: that the sharing has the desired effect during type checking, \fIi.e.\fP ! 827: types that are specified to share will be seen to be identical by the ! 828: type checker. (2) During signature matching, any sharing relations ! 829: specified in the signature must also hold in the matched structure. ! 830: One way to check this would be to make sure that the realization map ! 831: was well defined, because a failure of sharing in the target structure ! 832: would cause a single bound stamp to be mapped to more than one target ! 833: stamp. However, the realization map is only explicitly constructed in ! 834: the matching of functor parameters, where it is needed to help ! 835: instantiate the functor body. Hence it is more convenient to simply ! 836: save the original sharing constraints as equations in the signature ! 837: and check them explicitly in the target structure as part of signature ! 838: matching. ! 839: .NH 2 ! 840: Relation with type checking ! 841: .PP ! 842: What is the relationship between the structures and signatures and the ! 843: underlying ML type checking mechanism? Obviously signatures and structures ! 844: are carriers of type information \(em that is one of their principle purposes. ! 845: When we look up a value component of a structure we get the same sort of bindings ! 846: as in the top-level environment, except that in some circumstances the ! 847: type has been relativized to the structure's instance vectors and it ! 848: contains \&\f5INDtyc\fP type constructors. The basic variable lookup ! 849: functions have been defined to eliminate these indirections by replacing ! 850: them with the referenced type constructors from the instance vectors, ! 851: at the expense of partially copying the type. This would appear to undo ! 852: some of the savings achieved by the structure sharing representation of ! 853: structures, but these copies tend to be ephemeral, and they are ! 854: quickly and efficiently garbage-collected. The type information in ! 855: structures and signatures is, on the other hand, long-lived, so it is ! 856: more critical to minimize their space requirements. ! 857: .NH ! 858: Conclusions ! 859: .PP ! 860: The challenge of implementing the Standard ML module features is to ! 861: perform the compile-time matchings and instantiations necessary to ! 862: propagate and check type information with a minimum of duplication of ! 863: that information. Experience has shown that a naive approach leads to ! 864: an explosion in the size of the static representations. ! 865: .PP ! 866: The implementation strategy described in this paper uses a structure ! 867: sharing instantiation technique instead of instantiation by copying, ! 868: and has proved to be reasonably modest in its space requirements. It ! 869: also has the advantage that it remains quite close in spirit to the ! 870: formal static semantics. ! 871: .PP ! 872: Work on the Standard ML module facilities continues, and current ! 873: topics of interest include explicit functor signatures and the relation ! 874: of the module constructs to separate compilation. ! 875: .SH ! 876: Acknowledgements ! 877: .PP ! 878: This implementation is inspired directly by the work done on the ! 879: operational semantics of Standard ML by Robin Milner, Mads Tofte, and ! 880: Bob Harper, and I have benefited from many discussions with them. ! 881: Bob Harper passed on useful ideas from his prototype implementation ! 882: of modules. Nick Rothwell helped me explore the use of structure sharing ! 883: in type checking. Andrew Appel provided valuable advice throughout and ! 884: had a direct hand in the implementation of sharing specifications, and ! 885: he has been an equal partner in the creation of Standard ML of New Jersey. ! 886: .SH ! 887: References ! 888: .IP 1. ! 889: A. Appel and D. MacQueen, \fIA Standard ML compiler\fP, Proceedings of the ! 890: Conference on Functional Programming and Computer Architecture, Portland, ! 891: September 1987, G. Kahn, ed., LNCS Vol. 274, Springer-Verlag, 1987. ! 892: .IP 2. ! 893: H.-J. Boehm and A. Demers, \fIImplementing Russell\fP, ! 894: Proceedings of SIGPLAN 86 Symposium on Compiler Construction, Palo Alto, ! 895: 1986, 186-195. ! 896: .IP 3. ! 897: R. S. Boyer and J Moore, \fIThe sharing of structure in theorem-proving ! 898: programs\fP, Machine Intelligence 7, B. Meltzer and D. Michie, eds., Edinburgh ! 899: University Press, 1972, 101-116. ! 900: .IP 4. ! 901: L. Damas and R. Milner, \fIPrincipal type schemes for functional programs\fP, ! 902: Proceedings of 9th ACM Symposium on Principles of Programming Languages, ! 903: Albuquerque, 1982, 207-212. ! 904: .IP 5. ! 905: R. Harper, D. MacQueen, and R. Milner, \fIStandard ML\fP, Laboratory for ! 906: Foundations of Computer Science, Dept. of Computer Science, University of ! 907: Edinburgh, ECS-LFCS-86-2, 1986. (Also Polymorphism II, 2, October 1985.) ! 908: .IP 6. ! 909: R. Harper, R. Milner, and M. Tofte, \fIA type discipline for program modules\fP, ! 910: Proceedings TAPSOFT 87, LNCS Vol. 250, Springer-Verlag, New York, 1987, 308-319. ! 911: .IP 7. ! 912: R. Harper, R. Milner, and M. Tofte, \fIThe semantics of Standard ML, ! 913: Version I\fP, Laboratory for Foundations of Computer Science, Dept. of ! 914: Computer Science, University of Edinburgh, ECS-LFCS-87-36, 1986. ! 915: .IP 8. ! 916: D. MacQueen, \fIModules for Standard ML\fP, in [5]. (An earlier version ! 917: appeared in Proceedings ACM Symposium on Lisp and Functional Programming, ! 918: Austin, 1984.) ! 919: .IP 9. ! 920: D. MacQueen, \fIUsing dependent types to express modular structure\fP, ! 921: Proceedings 13th ACM Symposium on Principles of Programming Languages, St. ! 922: Petersburg Beach, 1986, 277-286. ! 923: .IP 10. ! 924: P. Martin-L\o'o"'f, \fIConstructive mathematics and computer programming\fP, ! 925: Sixth International Congress for Logic, Methodology, and Philosophy of Science, ! 926: North Holland, Amsterdam, 1982, 153-175. ! 927: .IP 11. ! 928: J. C. Mitchell and R. Harper, \fIThe essence of ML\fP, Proceedings 15th ! 929: ACM Symposium on Principles of Programming Languages, San Diego, 1988, ! 930: 28-46. ! 931: .IP 12. ! 932: M. Tofte, Operational Semantics and Polymorphic Type Inference, Ph.D. ! 933: Dissertation, Dept. of Computer Science, University of Edinburgh, 1987. ! 934: .IP 13. ! 935: D. H. D. Warren, \fIImplementing PROLOG - Compiling Predicate Logic Programs, ! 936: Vol. I\fP, Dept. of Artificial Intelligence Report No. 39, University of ! 937: Edinburgh, 1977.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.