Annotation of researchv10no/cmd/sml/src/mips/opcodes.nw, revision 1.1.1.1

1.1       root        1: \chapter{Handling the MIPS opcodes}
                      2: \section{Introduction}
                      3: 
                      4: This file generates the code necessary to handle MIPS instructions
                      5: in a natural, mnemonic way from within ML.
                      6: All MIPS instructions occupy 32 bits, and since ML has no simple
                      7: 32~bit data type, we use pairs of integers to represent MIPS instructions.
                      8: A pair [[(hi,lo)]] of 16-bit integers holds the most and least significant
                      9: halfwords of the MIPS word.
                     10: ML integers are 31 bits, so this is more than adequate.
                     11: 
                     12: The biggest hassle in converting between these integer pairs and more
                     13: mnemonic representations is that it is too easy to make mistakes
                     14: (especially typographical errors) in writing the code.
                     15: For that reason, I have added an extra level of indirection to the
                     16: whole business by putting all of the instruction descriptions in
                     17: tables.
                     18: These tables are read by an awk script, which writes two ML files:
                     19: {\tt opcodes.sml} and {\tt mipsdecode.sml}.
                     20: The {\tt opcodes.sml} file contains the code needed to convert from
                     21: a mnemonic like [[add(3,4,9)]] (add the contents of register~4 to
                     22: the contents of register~9, placing the result in register~3) to 
                     23: the integer pair representation of the actual bits in that add instruction
                     24: (in this case [[(137,6176)]]).
                     25: The {\tt mipsdecode.sml} file contains a [[decode]] function that converts
                     26: from the integer pair representation of instructions to a string
                     27: representation.
                     28: The string representation is a little hokey at the moment (that is,
                     29: it's different from the one used in the MIPS book), but it represents
                     30: a nice compromise between being readable and easy to generate.
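
To make the pair representation concrete, here is a minimal Python sketch
(Python is used only for illustration; it is not part of the generated code).
The name [[split_word]] is hypothetical, and the shift amounts are the field
positions listed in the fields table later in this file.

```python
def split_word(word):
    """Split a 32-bit word into (hi, lo) 16-bit halfwords."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF

# add(3,4,9) with operand order rd, rs, rt: rd=3, rs=4, rt=9.
# op' is special (0) and funct is add (octal 40, i.e. 32).
word = (0 << 26) | (4 << 21) | (9 << 16) | (3 << 11) | 32
pair = split_word(word)
```

Under these assumptions [[pair]] comes out as [[(137, 6176)]], which agrees
with the pair quoted above.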
                     31: 
                     32: I have contemplated generating a third file to test the whole
                     33: business.
                     34: The idea would be to have a function that would write out (to files)
                     35: two
                     36: parallel representations of the same instruction stream (presumably
                     37: one copy of each known instruction).
                     38: One representation would be the binary one understood by the MIPS.
                     39: The other representation would be a string representation.
                     40: We could then use a tool like {\tt gdb} or {\tt adb} to print out
                     41: the binary as an instruction sequence (i.e. convert back to
                     42: a second string representation) and compare the string representations
                     43: to see if they make sense.
                     44: 
                     45: \paragraph{Possible bugs}
                     46: This code should be gone over with care to make sure that negative
                     47: operands (e.g. in [[offset]]) won't break the code.
                     48: 
                     49: 
                     50: @
                     51: We need a special line in the Makefile to handle this file, since
                     52: it writes both an awk program and that program's input.  The input
                     53: is in module {\tt @<<opcodes table@>>} so the line is
                     54: $$\hbox{[[     $(NOTANGLE) '-Ropcodes table' opcodes.ow > opcodes]]}$$
                     55: The input is nothing but a sequence of tables, each labelled, and
                     56: processed one after another according to the label.
                     57: The label is always a single word on a line by itself.
                     58: Tables end with blank lines.
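
The table layout just described can be sketched in Python as follows.
This is only an illustration of the input format, not the awk code itself;
the name [[split_tables]] is hypothetical, and unlike the awk program (which
looks for specific label names) this sketch accepts any single word that
appears outside a table as a label.

```python
def split_tables(text):
    """Group input lines into tables keyed by their one-word label."""
    tables = {}
    label = None
    for line in text.splitlines():
        words = line.split()
        if not words:
            label = None                  # a blank line ends the current table
        elif label is None and len(words) == 1:
            label = words[0]              # a lone word outside a table is a label
            tables.setdefault(label, [])
        elif label is not None:
            tables[label].append(words)   # any other line is a row of the table
    return tables

sample = "   opcode\na b\nc d\n\n   instructions\nbreak\nadd rd rs rt\n\n"
tables = split_tables(sample)
```

Note that a one-word row such as [[break]] is treated as data, because it
occurs inside a table that has already been opened.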
                     59: @ The opcode-to-pair code is written to the standard output, in 
                     60: [[structure Opcodes]].
                     61: The pair-to-string code is written to [["mipsdecode.sml"]], in
                     62: [[structure MipsDecode]].
                     63: 
                     64: We begin by defining [[andb]] and shift functions.
                     65: We make pessimistic assumptions about shifting, trying always to
                     66: keep the arguments between 0 and 31 inclusive.
                     67: <<BEGIN>>=
                     68: print "structure Opcodes = struct"
                     69: print "val andb = Bits.andb"
                     70: print "fun lshift(op1,amt) = "
                     71: print "    if amt<0 then Bits.rshift(op1,0-amt)"
                     72: print "    else Bits.lshift(op1,amt)"
                     73: print "nonfix sub"     # bug fixes; want [[sub]] to be a MIPS opcode
                     74: print "nonfix div"     # bug fixes; want [[div]] to be a MIPS opcode
                     75: 
                     76: decode = "mipsdecode.sml";
                     77: print "structure MipsDecode = struct" > decode
                     78: print "val andb = Bits.andb" > decode
                     79: print "fun rshift(op1,amt) = " > decode
                     80: print "    if amt<0 then Bits.lshift(op1,0-amt)" > decode
                     81: print "    else Bits.rshift(op1,amt)" > decode
                     82: <<END>>=
                     83: <<write out the definitions of the decoding functions>>
                     84: print "end (* Opcodes *)"
                     85: print "end (* Decode *)" > decode
                     86: @ The sections BEGIN and END are drawn from 
                     87:  our universal model of an awk program:
                     88: <<*>>=
                     89: BEGIN {
                     90:   <<BEGIN>>
                     91: }
                     92: <<functions>>
                     93: <<statements>>
                     94: END {
                     95:   <<END>>
                     96: }
                     97: @ \section{The opcode tables}
                     98: The numeric codes for all the MIPS opcodes are described in three
                     99: tables in the MIPS book on page~A-87.
                    100: Normal opcodes are six bits, and appear in the [[opcode]] field of the
                    101: instruction.
                    102: Two opcodes [[special]] and [[bcond]] stand for several instructions.
                    103: These instructions are decoded by checking the bit-pattern in the
                    104: [[funct]] and [[cond]] fields of the instructions, respectively.
                    105: 
                    106: The tables show which opcodes correspond to which bit-patterns.
                    107: For example, the [[slti]] mnemonic corresponds to an [[opcode]] value of octal~12.
                    108: The table headed [[opcode]] gives the mnemonics for all six-bit patterns
                    109: in the [[opcode]] field.
                    110: The [[special]] table shows patterns for the [[funct]] field, used with
                    111: the [[special]] opcode.
                    112: The [[bcond]] table shows five-bit patterns for the [[cond]] field,
                    113: used with the [[bcond]] opcode.
                    114: In all tables, stars ([[*]]) stand for unused fields.
                    115: 
                    116: Each table is terminated with a blank line.
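
The way a position in a table determines a bit pattern can be sketched in
Python: the row supplies the major octal digit and the column the minor one.
The names [[OPCODE_ROWS]] and [[code_of]] are hypothetical, and only the
first two rows of the [[opcode]] table are reproduced.

```python
OPCODE_ROWS = """\
special bcond j jal beq bne blez bgtz
addi addiu slti sltiu andi ori xori lui
""".splitlines()

def code_of(mnemonic, rows):
    """Return the bit pattern implied by a mnemonic's row and column."""
    for major, row in enumerate(rows):              # row gives the major octal digit
        for minor, name in enumerate(row.split()):  # column gives the minor octal digit
            if name == mnemonic:
                return 8 * major + minor
    raise KeyError(mnemonic)

slti_code = code_of("slti", OPCODE_ROWS)
```

Here [[slti]] sits in the second row and third column, so [[slti_code]] is
octal~12, as stated above.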
                    117: <<opcodes table>>=
                    118:                            opcode
                    119: special        bcond   j       jal     beq     bne     blez    bgtz
                    120: addi   addiu   slti    sltiu   andi    ori     xori    lui
                    121: cop0   cop1    cop2    cop3    *       *       *       *
                    122: *      *       *       *       *       *       *       *
                    123: lb     lh      lwl     lw      lbu     lhu     lwr     *
                    124: sb     sh      swl     sw      *       *       swr     *
                    125: lwc0   lwc1    lwc2    lwc3    *       *       *       *
                    126: swc0   swc1    swc2    swc3    *       *       *       *
                    127: 
                    128:                            special
                    129: sll    *       srl     sra     sllv    *       srlv    srav
                    130: jr     jalr    *       *       syscall break   *       *
                    131: mfhi   mthi    mflo    mtlo    *       *       *       *
                    132: mult   multu   div     divu    *       *       *       *
                    133: add    addu    sub     subu    and'    or      xor     nor
                    134: *      *       slt     sltu    *       *       *       *
                    135: *      *       *       *       *       *       *       *
                    136: *      *       *       *       *       *       *       *
                    137: 
                    138:                            bcond
                    139: bltz   bgez    *       *       *       *       *       *
                    140: *      *       *       *       *       *       *       *
                    141: bltzal bgezal  *       *       *       *       *       *
                    142: *      *       *       *       *       *       *       *
                    143: 
                    144: 
                    145: @ The instruction codes for Coprocessor 1 (floating point)
                    146: are taken from page B-28 of the MIPS book.
                    147: <<opcodes table>>=
                    148:                            cop1
                    149: add_fmt        sub_fmt mul_fmt div_fmt *       abs_fmt mov_fmt neg_fmt
                    150: *      *       *       *       *       *       *       *
                    151: *      *       *       *       *       *       *       *
                    152: *      *       *       *       *       *       *       *
                    153: cvt_s  cvt_d   *       *       cvt_w   *       *       *
                    154: *      *       *       *       *       *       *       *
                    155: c_f    c_un    c_eq    c_ueq   c_olt   c_ult   c_ole   c_ule
                    156: c_sf   c_ngle  c_seq   c_ngl   c_lt    c_nge   c_le    c_ngt
                    157: 
                    158: @
                    159: Now we have to deal with reading these tables, and extracting the
                    160: information stored therein.
                    161: First of all, for each mnemonic [[$i]] we store the corresponding bit
                    162: pattern (as an integer, [[code]]) in the array [[numberof[$i] ]].
                    163: Then, we store the type of the mnemonic (ordinary [[OPCODE]], 
                    164: [[SPECIAL]], [[BCOND]], or [[COP1]]) in the array [[typeof[$i] ]].
                    165: Finally, we store the inverse (a map from type and bit pattern to mnemonic)
                    166: in the [[opcode]] array.
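
As a rough Python analogue of the three awk arrays, one might write the
following. The function name [[store_table]] is hypothetical, the dictionaries
stand in for the awk arrays of the same names, and the rows are the first
three rows of the [[bcond]] table.

```python
BCOND = 3   # same constant the awk program uses for this table type

def store_table(kind, rows, numberof, typeof, opcode):
    """Record forward and inverse maps for one table of mnemonics."""
    for major, row in enumerate(rows):
        for minor, name in enumerate(row):
            code = 8 * major + minor
            if name != "*":
                numberof[name] = code
                typeof[name] = kind
                opcode[kind, code] = name
            else:
                opcode[kind, code] = "reserved"   # unused slot in the table

numberof, typeof, opcode = {}, {}, {}
rows = [
    "bltz bgez * * * * * *".split(),
    "* * * * * * * *".split(),
    "bltzal bgezal * * * * * *".split(),
]
store_table(BCOND, rows, numberof, typeof, opcode)
```

The inverse map is total over the table, so decoding an unused bit pattern
yields [["reserved"]] rather than failing.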
                    167: <<store opcode information>>=
                    168: if ($i != "*") {
                    169:        numberof[$i] = code
                    170:        typeof[$i] = type
                    171:        opcode[type,code] = $i
                    172: } else {
                    173:        opcode[type,code] = "reserved"
                    174: }
                    175: @ The types are just constants set at the beginning.
                    176: <<BEGIN>>=
                    177: OPCODE = 1 ; SPECIAL = 2 ; BCOND = 3 ; COP1 = 4
                    178: @ We determine the type by scanning the header word that precedes
                    179: each table.
                    180: Once we see the appropriate table header, we set one of [[opcodes]],
                    181: [[specials]], [[bconds]], and [[cop1s]], so that determining the type is easy:
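
The trick relies on at most one of the flags being 1 at any time, so the sum
picks out exactly one constant. A small Python sketch of the same arithmetic,
with the hypothetical name [[table_type]]:

```python
OPCODE, SPECIAL, BCOND, COP1 = 1, 2, 3, 4

def table_type(opcodes, specials, bconds, cop1s):
    """Select a type constant from 0/1 flags, at most one of which is set."""
    return OPCODE * opcodes + SPECIAL * specials + BCOND * bconds + COP1 * cop1s

current = table_type(0, 0, 1, 0)   # we are inside the bcond table
```

If no flag is set the result is 0, which is not one of the type constants.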
                    182: <<set [[type]]>>=
                    183: type = OPCODE * opcodes + SPECIAL * specials + BCOND * bconds + COP1 * cop1s
                    184: @ Seeing the right table header causes us to set the right variable.
                    185: We also remember the line number, because we use the positions of later
                    186: lines to help extract the bit patterns from the table.
                    187: <<statements>>=
                    188: NF == 1 && $1 == "opcode" {
                    189:        startline = NR
                    190:        opcodes = 1
                    191:        next
                    192: }
                    193: NF == 1 && $1 == "special" {
                    194:        startline = NR
                    195:        specials = 1
                    196:        next
                    197: }
                    198: NF == 1 && $1 == "bcond" {
                    199:        startline = NR
                    200:        bconds = 1
                    201:        next
                    202: }
                    203: NF == 1 && $1 == "cop1" {
                    204:        startline = NR
                    205:        cop1s = 1
                    206:        next
                    207: }
                    208: @ Any time we see a blank line, that ends the appropriate table.
                    209: <<statements>>=
                    210: NF == 0 {opcodes = 0; specials = 0; bconds = 0; cop1s = 0
                    211:        <<blank line resets>>
                    212: }
                    213: @ Here is the code that actually extracts the bit patterns from
                    214: the opcode tables.
                    215: The code is the same for each of the four tables.
                    216: 
                    217: The call [[insist_fields(8)]] issues an error message and returns false (0)
                    218: unless there are exactly 8 fields on the input line.
                    219: <<statements>>=
                    220: opcodes || specials || bconds || cop1s {
                    221:        if (!insist_fields(8)) next
                    222:        <<set [[type]]>>
                    223:        major = NR - startline - 1              # major octal digit from row
                    224:        for (i=1; i<= NF; i++) {
                    225:                minor = i-1                     # minor octal digit from column
                    226:                code = minor + 8 * major
                    227:                <<store opcode information>>
                    228:        }
                    229: }
                    230: @ \section{The instruction fields}
                    231: Now that we've dealt with the opcodes, we'll handle other fields of
                    232: the instruction.
                    233: This table tells us the position of each field within the word,
                    234: so that if we know a bit-pattern for each field, we can assemble
                    235: all the fields into an instruction.
                    236: 
                    237: Not all fields are used in all instructions.
                    238: Later we'll have a table that indicates exactly which fields are used in
                    239: which instructions.
                    240: For now, we just list the fields and their positions with the
                    241: understanding that some fields will overlap.
                    242: 
                    243: The table is taken from the MIPS book, page A-3.
                    244: The numbers are the numbers of the starting and ending bit positions,
                    245: where 0 is the least and 31 the most significant bit.
                    246: The names are exactly those used in the book except [[op']] has been
                    247: substituted for [[op]] since [[op]] is a reserved word in ML.
                    248: 
                    249: If a field is signed, we put a [[+]]~sign as the first character
                    250: of its name.
                    251: The sign information is used only in decoding (I think).
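
Reading one line of the fields table might look like this in Python.
The name [[parse_field]] is hypothetical; the line format (name, low bit,
high bit, optional leading [[+]]) is the one described above.

```python
def parse_field(line):
    """Parse 'name low high'; a leading '+' marks the field as signed."""
    name, low, high = line.split()
    signed = name.startswith("+")
    if signed:
        name = name[1:]               # drop the sign marker from the name
    return name, int(low), int(high), signed

immed = parse_field("+immed 0 15")
op = parse_field("op' 26 31")
```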
                    252: <<opcodes table>>=
                    253:                        fields
                    254: op' 26 31
                    255: rs 21 25
                    256: rt 16 20
                    257: +immed 0 15
                    258: +offset 0 15
                    259: base 21 25
                    260: target 0 25
                    261: rd 11 15
                    262: shamt 6 10
                    263: funct 0 5
                    264: cond 16 20
                    265: <<floating point load/store fields>>
                    266: <<floating point computation fields>>
                    267: 
                    268: @ From page B-5.  Most fields are the same as the CPU instruction formats.
                    269: <<floating point load/store fields>>=
                    270: ft 16 20
                    271: @ From page B-6.  Many fields are reused from earlier specifications.
                    272: The computational instructions all have a one bit in position 25.
                    273: Instead of trying to insert special code to handle that, we cheat on
                    274: it by making that bit part of the format, and cheating on the format.
                    275: Thus:
                    276: <<floating point computation fields>>=
                    277: fmt 21 25
                    278: fs 11 15
                    279: fd 6 10
                    280: <<write format info>>=
                    281: print "val S_fmt = 16+0"
                    282: print "val D_fmt = 16+1"
                    283: print "val W_fmt = 16+4"
                    284: 
                    285: @ The setup for the fields is similar to that used for the opcodes.
                    286: <<statements>>=
                    287: NF == 1 && $1 == "fields" {
                    288:        startline = NR
                    289:        fields = 1
                    290:        <<write format info>>
                    291:        next
                    292: }
                    293: <<blank line resets>>=
                    294: fields = 0
                    295: <<statements>>=
                    296: fields {
                    297:        if (!insist_fields(3)) next
                    298:        fieldname = $1;  low = $2; high = $3
                    299:        <<look for sign in [[fieldname]] and set [[signed]]>>
                    300:        fieldnames[fieldname]= 1        # remember all the field names
                    301: 
                    302:        <<write to standard output a function to convert bit-pattern to pair>>
                    303:        <<write to [[decode]] a function to extract field from pair>>
                    304: 
                    305: }
                    306: <<look for sign in [[fieldname]] and set [[signed]]>>=
                    307: if (substr(fieldname,1,1)=="+") {
                    308:        signed = 1
                    309:        fieldname = substr(fieldname,2)
                    310: } else {
                    311:        signed = 0
                    312: }
                    313: @
                    314: The idea is that for each of these fields, we want to write a function
                    315: that will take an integer argument and shift it by the right amount.
                    316: Since we have to represent the 32-bit quantities as pairs of integers,
                    317: we actually use two functions, one for the high half and one for the low.
                    318: So, for example, for the [[rd]] field we will produce two function definitions,
                    319: [[rdHI]] and [[rdLO]].
                    320: 
                    321: The awk function [[function_definition]] is used to compute ML function
                    322: definitions.
                    323: It takes as arguments the name of the function and the number of arguments
                    324: to that function.
                    325: The arguments are numbered [[A1]], [[A2]], et cetera.
                    326: 
                    327: The functions themselves are all tedious combinations of ands and shifts.
                    328: At one time I had convinced myself that this worked.
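
As a cross-check, here is a Python sketch of what I take the generated
[[HI]] and [[LO]] functions to compute. The names [[field_lo]] and
[[field_hi]] are hypothetical; the real generated functions are named after
the field (for example [[rdLO]]) and have the bit positions built in.

```python
def lshift(value, amount):
    """Shift left, or right when the amount is negative, like the ML lshift."""
    if amount < 0:
        return value >> -amount
    return value << amount

def field_lo(value, low):
    """Contribution of a field value to the low halfword."""
    if low >= 16:
        return 0                          # field lies entirely in the high half
    return lshift(value, low) & 65535     # keep only the low 16 bits

def field_hi(value, low, high):
    """Contribution of a field value to the high halfword."""
    if high < 16:
        return 0                          # field lies entirely in the low half
    return lshift(value, low - 16)        # negative amount becomes a right shift

rd = (field_hi(3, 11, 15), field_lo(3, 11))            # rd occupies bits 11..15
rs = (field_hi(4, 21, 25), field_lo(4, 21))            # rs occupies bits 21..25
target = (field_hi(0x3FFFFFF, 0, 25), field_lo(0x3FFFFFF, 0))
```

The [[target]] case is the interesting one, since that field straddles both
halfwords.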
                    329: <<write to standard output a function to convert bit-pattern to pair>>=
                    330: if (low >= 16) {
                    331:        printf "%s", function_definition(fieldname "LO",1); print "0"
                    332: } else {
                    333:        printf "%s", function_definition(fieldname "LO",1)
                    334:         printf "andb(lshift(A1,%d),65535)\n", low
                    335: }
                    336: if (high < 16) {
                    337:        printf "%s", function_definition(fieldname "HI",1); print "0"
                    338: } else {
                    339:        printf "%s", function_definition(fieldname "HI",1)
                    340:         printf "lshift(A1,%s)\n", mlnumber(low - 16)
                    341: }
                    342: @ The inverse operation is
                    343: to extract a bit pattern from a pair.
                    344: We'll want that if we ever care to decode instructions.
                    345: This time, the function to extract e.g.\ field [[rd]] from a pair
                    346: is the function [[THErd]] applied to that pair.
                    347: 
                    348: The functions work first by extracting from the low part, then
                    349: from the high part, and adding everything together.
                    350: If the field is signed, we make the value negative if it is too high.
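
The extraction can be sketched in Python as follows. The name [[extract]]
is hypothetical (the generated functions are called [[THErd]] and so on, and
take only the pair); the masks and shift amounts mirror the ones the awk code
prints.

```python
def rshift(value, amount):
    """Shift right, or left when the amount is negative, like the ML rshift."""
    if amount < 0:
        return value << -amount
    return value >> amount

def extract(hi, lo, low, high, signed=False):
    """Recover the field occupying bits low..high from the pair (hi, lo)."""
    if low >= 16:
        low_part = 0
    else:
        low_part = rshift(lo, low) & (2 ** (min(15, high) - low + 1) - 1)
    if high < 16:
        high_part = 0
    else:
        high_part = rshift(hi & (2 ** (high - 16 + 1) - 1), low - 16)
    n = low_part + high_part
    if signed and n >= 2 ** (high - low):
        n -= 2 ** (high - low + 1)        # too high, so it is really negative
    return n

# Fields of the pair (137, 6176), the add instruction from the introduction.
rs = extract(137, 6176, 21, 25)
rt = extract(137, 6176, 16, 20)
rd = extract(137, 6176, 11, 15)
funct = extract(137, 6176, 0, 5)
```

For a signed 16-bit field such as [[offset]], a low halfword of 65535 should
decode to -1.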
                    351: <<write to [[decode]] a function to extract field from pair>>=
                    352: printf "%s", function_definition("THE" fieldname,2) > decode
                    353: if (signed) printf "let val n = " > decode
                    354: <<print expression for unsigned value>>
                    355: if (signed) {
                    356:        printf "in if n < %d then n else n - %d\nend\n",
                    357:                2**(high-low), 2**(high-low+1) > decode
                    358: }
                    359: 
                    360: <<print expression for unsigned value>>=
                    361: if (low >= 16) {
                    362:        printf "0" > decode
                    363: } else {
                    364:         printf "andb(rshift(A2,%d),%d)", low,
                    365:                        (2**(min(15,high)-low+1)-1) > decode
                    366: }
                    367: printf " + " > decode
                    368: if (high < 16) {
                    369:        printf "0\n" > decode
                    370: } else {
                    371:         printf "rshift(andb(A1,%d),%s)\n", (2**(high-16+1)-1),
                    372:                        mlnumber(low - 16) > decode
                    373: }
                    374: @ ML uses a strange minus sign ([[~]] instead of [[-]]), 
                    375: so we print numbers that might be negative like this:
                    376: <<functions>>=
                    377: function mlnumber(n, s) {
                    378:        if (n<0) s = sprintf("~%d", -n)
                    379:        else s = sprintf("%d", n)
                    380:        return s
                    381: }
                    382: @ For reasons best known to its designers, awk has no [[min]] function.
                    383: <<functions>>=
                    384: function min(x,y){
                    385:        if (x<y) return x
                    386:        else return y
                    387: }
                    388: @ \section{The list of instructions and their formats}
                    389: This is the section that tells which fields are used in what instructions,
                    390: and in what order the fields appear.
                    391: The information is from Appendix A
                    392: of the MIPS book and should be proofread.
                    393: 
                    394: To cut down on the number of ML functions generated, we can comment out
                    395: instructions with a [[#]] in the first column.
                    396: This means that no code will be generated for the instruction, and
                    397: it won't appear in the [[structure Opcodes]].
                    398: <<opcodes table>>=
                    399:                        instructions
                    400: add rd rs rt
                    401: addi rt rs immed
                    402: addiu rt rs immed
                    403: addu rd rs rt
                    404: and' rd rs rt
                    405: andi rt rs immed
                    406: beq rs rt offset
                    407: bgez rs offset
                    408: bgezal rs offset
                    409: bgtz rs offset
                    410: blez rs offset
                    411: bltz rs offset
                    412: bltzal rs offset
                    413: bne rs rt offset
                    414: break
                    415: div rs rt
                    416: divu rs rt
                    417: j target
                    418: jal target
                    419: jalr rs rd
                    420: jr rs
                    421: lb rt offset base
                    422: lbu rt offset base
                    423: lh rt offset base
                    424: lb rt offset base
                    425: lhu rt offset base
                    426: lui rt immed
                    427: lw rt offset base
                    428: lwl rt offset base
                    429: lwr rt offset base
                    430: mfhi rd
                    431: mflo rd
                    432: mthi rs
                    433: mtlo rs
                    434: mult rs rt
                    435: multu rs rt
                    436: nor rd rs rt
                    437: or rd rs rt
                    438: ori rt rs immed
                    439: sb rt offset base
                    440: sh rt offset base
                    441: sll rd rt shamt
                    442: sllv rd rt rs
                    443: slt rd rs rt
                    444: slti rt rs immed
                    445: sltiu rt rs immed
                    446: sltu rd rs rt
                    447: sra rd rt shamt
                    448: srav rd rt rs
                    449: srl rd rt shamt
                    450: srlv rd rt rs
                    451: sub rd rs rt
                    452: subu rd rs rt
                    453: sw rt offset base
                    454: swl rt offset base
                    455: swr rt offset base
                    456: syscall
                    457: xor rd rs rt
                    458: xori rt rs immed
                    459: <<floating point instructions>>
                    460: 
                    461: 
                    462: @ We define only those floating point instructions we seem likely to need.
                    463: To distinguish them as floating point we append an f to their names.
                    464: <<floating point instructions>>=
                    465: add_fmt fmt fd fs ft
                    466: div_fmt fmt fd fs ft
                    467: lwc1 ft offset base
                    468: mul_fmt fmt fd fs ft
                    469: neg_fmt fmt fd fs
                    470: sub_fmt fmt fd fs ft
                    471: swc1 ft offset base
                    472: c_seq fmt fs ft
                    473: c_lt fmt fs ft
                    474: @
                    475:  Here is a terrible hack to enable us to construct branch on coprocessor~1
                    476: true or false.
                    477: We will use [[fun bc1f offset = cop1(0,offset)]] and
                    478:        [[fun bc1t offset = cop1(1,offset)]].
                    479: <<floating point instructions>>=
                    480: cop1 rs rt offset
                    481: @
                    482: 
                    483: 
                    484: @ For each instruction, we define an ML function with the appropriate
                    485: number of arguments.
                    486: When that function is given an integer in each argument,
                    487: it converts the whole thing to one MIPS instruction, represented as an
                    488: integer pair.
                    489: 
                    490: The implementation is a bit of a grubby mess.
                    491: Doing the fields is straightforward enough, but
                    492: for each mnemonic we have to do something different based
                    493: on its type, because each type of opcode goes in a different
                    494: field.
                    495: Moreover, for mnemonics of type [[SPECIAL]], [[BCOND]], and [[COP1]] we
                    496: have to generate [[special]], [[bcond]], and [[cop1]] in the [[op']] field.
                    497: Finally, we have to do it all twice; once for the high order
                    498: halfword and once for the low order halfword.
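
To show the shape of the text that gets generated, here is a Python sketch
that builds just the pair expression for one line of the instructions table.
The names [[half_expression]], [[pair_expression]], and [[SUBFIELD]] are
hypothetical, and the function header produced by [[function_definition]] is
left out.

```python
OPCODE, SPECIAL, BCOND, COP1 = 1, 2, 3, 4

# For each non-ordinary type: the mnemonic placed in op', and the field
# that carries the instruction's own code.
SUBFIELD = {SPECIAL: ("special", "funct"),
            BCOND: ("bcond", "cond"),
            COP1: ("cop1", "funct")}

def half_expression(half, opname, operands, typeof, numberof):
    """Build the sum of field terms for one halfword ('HI' or 'LO')."""
    terms = [f"{field}{half}(A{i})" for i, field in enumerate(operands, start=1)]
    kind = typeof[opname]
    if kind == OPCODE:
        terms.append(f"op'{half}({numberof[opname]})")
    else:
        escape, subfield = SUBFIELD[kind]
        terms.append(f"op'{half}({numberof[escape]})")
        terms.append(f"{subfield}{half}({numberof[opname]})")
    return "+".join(terms)

def pair_expression(line, typeof, numberof):
    """Build the ML pair expression for a line such as 'add rd rs rt'."""
    opname, *operands = line.split()
    hi = half_expression("HI", opname, operands, typeof, numberof)
    lo = half_expression("LO", opname, operands, typeof, numberof)
    return f"({hi}, {lo})"

typeof = {"add": SPECIAL, "special": OPCODE}
numberof = {"add": 32, "special": 0}
text = pair_expression("add rd rs rt", typeof, numberof)
```

The duplication between the two halves, which the awk code spells out twice,
is folded into one helper here purely for brevity.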
                    499: <<compute function that generates this instruction>>=
                    500:        printf "%s", function_definition(opname, NF-1)
                    501:        printf "("      # open parenthesis for pair
                    502:        for (i=2; i<= NF; i++) {
                    503:                if (!($i in fieldnames)) <<bad field name>>
                    504:                printf "%sHI(A%d)+", $i, i-1
                    505:        }
                    506:        if (typeof[opname]==OPCODE) {
                    507:                printf "op'HI(%d)", numberof[opname]
                    508:        } else if (typeof[opname]==SPECIAL) {
                    509:                printf "op'HI(%d)+", numberof["special"]
                    510:                printf "functHI(%d)", numberof[opname]
                    511:        } else if (typeof[opname]==BCOND) {
                    512:                printf "op'HI(%d)+", numberof["bcond"]
                    513:                printf "condHI(%d)", numberof[opname]
                    514:        } else if (typeof[opname]==COP1) {
                    515:                printf "op'HI(%d)+", numberof["cop1"]
                    516:                printf "functHI(%d)", numberof[opname]
                    517:        } else <<bad operator name>>
                    518:        printf ", "
                    519:        for (i=2; i<= NF; i++) {
                    520:                if (!($i in fieldnames)) <<bad field name>>
                    521:                printf "%sLO(A%d)+", $i, i-1
                    522:        }
                    523:        if (typeof[opname]==OPCODE) {
                    524:                printf "op'LO(%d)", numberof[opname]
                    525:        } else if (typeof[opname]==SPECIAL) {
                    526:                printf "op'LO(%d)+", numberof["special"]
                    527:                printf "functLO(%d)", numberof[opname]
                    528:        } else if (typeof[opname]==BCOND) {
                    529:                printf "op'LO(%d)+", numberof["bcond"]
                    530:                printf "condLO(%d)", numberof[opname]
                    531:        } else if (typeof[opname]==COP1) {
                    532:                printf "op'LO(%d)+", numberof["cop1"]
                    533:                printf "functLO(%d)", numberof[opname]
                    534:        } else <<bad operator name>>
                    535:        printf ")\n"
                    536: @
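To see what the generated pair computes, here is a minimal Python sketch of the same arithmetic.
It assumes the standard MIPS R-type bit positions, and it assumes that each [[HI]] or [[LO]] helper returns the part of a shifted field that lands in the upper or lower halfword.
The names [[SHIFTS]], [[halves]], and [[encode]] are invented for illustration and do not appear in this file.
The check value is the pair the introduction gives for [[add(3,4,9)]]; the argument order (rd, rs, rt) is inferred from that pair.

```python
# Bit offset of each field within the 32-bit word.
# Assumed: the standard MIPS R-type layout, not read from this file's tables.
SHIFTS = {"op'": 26, "rs": 21, "rt": 16, "rd": 11, "shamt": 6, "funct": 0}


def halves(name, value):
    """Return the (hi, lo) halfword contributions of one field."""
    word = value << SHIFTS[name]
    return (word >> 16) & 0xFFFF, word & 0xFFFF


def encode(fields):
    """Sum the per-field contributions, as the generated ML expression does."""
    hi = sum(halves(name, value)[0] for name, value in fields.items())
    lo = sum(halves(name, value)[1] for name, value in fields.items())
    return hi, lo


# add is a "special": op' = 0 and funct = 32.
add_pair = encode({"op'": 0, "funct": 32, "rd": 3, "rs": 4, "rt": 9})
print(add_pair)  # (137, 6176)
```

Plain addition is enough here because no two fields share a bit, so the sums never carry.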
                    537: As with the earlier tables, a line holding only the word [[instructions]] starts the table, and a blank line ends it.
                    538: <<statements>>=
                    539: NF == 1 && $1 == "instructions" {
                    540:        startline = NR
                    541:        instructions = 1
                    542:        next
                    543: }
                    544: <<blank line resets>>=
                    545: instructions= 0
                    546: <<statements>>=
                    547: instructions && $0 !~ /^#/ {
                    548:        opname = $1
                    549: 
                    550:        <<compute string displayed when this instruction is decoded>>
                    551: ########       gsub("[^a-z']+"," ")   ### ill-advised
                    552: 
                    553:        <<compute function that generates this instruction>>
                    554: }
                    555: 
                    556: @ \paragraph{Decoding instructions}
                    557: When we've decoded an instruction, we have to display some sort of
                    558: string representation that tells us what the instruction is.
                    559: Ideally we should display either just what the assembler expects,
                    560: or perhaps just what dbx displays when asked about actual instructions
                    561: in memory images.
                    562: 
                    563: For now, we just give the mnemonic for the instruction, followed
                    564: by a description of each field (followed by a newline).
                    565: The fields are described as name-value pairs.
                    566: 
                    567: We rely on the fact that, for a field such as [[rd]], the string
                    568: representation of that field's value is held in [[Srd]].
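As an illustration, the following Python sketch builds the same kind of ML expression for a hypothetical table line [[add rd rs rt]].
The function name [[display_expression]] and the field order are assumptions, and the spacing is tidier than what the awk code emits.

```python
def display_expression(opname, fields):
    """Return the ML string expression shown when `opname` is decoded."""
    parts = ['"%s "' % opname]            # the mnemonic, then a space
    for i, field in enumerate(fields):
        parts.append('"%s = "' % field)   # field name
        parts.append("S" + field)         # ML variable holding its value as a string
        if i < len(fields) - 1:
            parts.append('","')           # comma between name-value pairs
    parts.append('"\\n"')                 # ML string containing a newline escape
    return " ^ ".join(parts)


print(display_expression("add", ["rd", "rs", "rt"]))
# "add " ^ "rd = " ^ Srd ^ "," ^ "rs = " ^ Srs ^ "," ^ "rt = " ^ Srt ^ "\n"
```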
                    569: <<compute string displayed when this instruction is decoded>>=
                    570: temp = "\"" opname " \""
                    571: for (i=2; i<=NF; i++) {
                    572:        temp = sprintf( "%s ^ \"%s = \" ^ S%s", temp, $i, $i)
                    573:        if (i<NF) temp = sprintf("%s ^ \",\" ", temp)
                    574: }
                    575: displayof[opname]=temp " ^ \"\\n\""
                    576: 
                    577: @ The implementation of the decoding function is split into several parts.
                    578: First, we have to be able to extract any field from an instruction.
                    579: Then, we have to be able to decode four kinds of opcodes:
                    580: [[OPCODE]]s, [[BCOND]]s,  [[SPECIAL]]s, and [[COP1]]s.
                    581: The main function is the one that does ordinary opcodes.
                    582: The others are auxiliary.
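Field extraction can be sketched in Python as follows.
The layout table repeats the standard MIPS R-type positions and widths as an assumption, and [[the_field]] is a hypothetical stand-in for the generated [[THE]] functions, whose real definitions appear elsewhere in this file.

```python
# (offset, width) of each field in the 32-bit word.
# Assumed: standard MIPS R-type layout, not taken from this file's tables.
LAYOUT = {"op'": (26, 6), "rs": (21, 5), "rt": (16, 5),
          "rd": (11, 5), "shamt": (6, 5), "funct": (0, 6)}


def the_field(name, hi, lo):
    """Extract one field from an instruction held as a (hi, lo) pair."""
    offset, width = LAYOUT[name]
    word = (hi << 16) | lo          # Python integers do not overflow
    return (word >> offset) & ((1 << width) - 1)


# Decode the pair that the introduction gives for add(3,4,9).
hi, lo = 137, 6176
print([the_field(f, hi, lo) for f in ("op'", "rs", "rt", "rd", "funct")])
# [0, 4, 9, 3, 32]
```

The generated ML has to work on the two halves separately, since a 31-bit ML integer cannot hold the combined word.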
                    583: <<write out the definitions of the decoding functions>>=
                    584: printf "%s", function_definition("decode",2) > decode
                    585: print "let" > decode
                    586:   <<write definitions of integer and string representations of each field>>
                    587:   <<write expression that decodes the [[funct]] field for [[special]]s>>
                    588:   <<write expression that decodes the [[cond]] field for [[bcond]]s>>
                    589:   <<write expression that decodes the [[funct]] field for [[cop1]]s>>
                    590: print "in" > decode
                    591:   <<write [[case]] expression that decodes the [[op']] field for each instruction>>
                    592: print "end" > decode
                    593: @ We give each field its own name for an integer version, and its name
                    594: preceded by [[S]] for its string version.
                    595: These values are all computed just once, from the arguments to the
                    596: enclosing function ([[decode]]).
                    597: <<write definitions of integer and string representations of each field>>=
                    598: for (f in fieldnames) {
                    599:        printf "val %s = THE%s(A1,A2)\n", f, f  > decode
                    600:        printf "val S%s = Integer.makestring %s\n", f, f  > decode
                    601: }
                    602: @ The next three chunks are very much of a piece.
                    603: Each writes an enormous [[case]] expression that matches up integers
                    604: (bit patterns) to strings.
                    605: The fundamental operation is printing out a decimal value and a string
                    606: for each opcode:
                    607: <<if [[name]] is known, display a case for it>>=
                    608: if (name != ""  && name != "reserved") {
                    609:        <<print space or bar ([[|]])>>
                    610:        disp = displayof[name]
                    611:        if (disp=="") disp="\"" name "(??? unknown format???)\\n\""
                    612:        printf "%d => %s\n", code, disp > decode
                    613: }
                    614: @ Cases must be separated by vertical bars.
                    615: We do the separation by putting a vertical bar before each case except
                    616: the first.
                    617: We use a hack to discover the first; we assume that code~0 is always
                    618: defined, and so it will always be the first.
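The separator rule, and the assumption it depends on, can be sketched in Python.
The function [[case_lines]] and the sample table are hypothetical, and the display strings are shortened placeholders.

```python
def case_lines(names_by_code):
    """Emit one case per known code, using the same `code != 0` test."""
    lines = []
    for code in range(256):
        name = names_by_code.get(code, "")
        if name != "" and name != "reserved":
            separator = " | " if code != 0 else "   "
            lines.append('%s%d => "%s\\n"' % (separator, code, name))
    return lines


# Hypothetical fragment of a table; code 0 is defined, so the rule works.
for line in case_lines({0: "sll", 1: "reserved", 2: "srl"}):
    print(line)
```

If code 0 were missing from a table, the first case written would start with a bar, which is exactly what the hack is meant to avoid.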
                    619: <<print space or bar ([[|]])>>=
                    620: if (code!=0) printf " | "  > decode # hack but it works
                    621: else printf "   " > decode
                    622: <<write expression that decodes the [[funct]] field for [[special]]s>>=
                    623: print "val do_special ="  > decode
                    624: print "(case funct of" > decode
                    625: for (code=0; code<256; code++) {
                    626:        name = opcode[SPECIAL,code]
                    627:        <<if [[name]] is known, display a case for it>>
                    628: }
                    629: printf " | _ => \"unknown special\\n\"\n" > decode
                    630: print "   ) " > decode
                    631: <<write expression that decodes the [[cond]] field for [[bcond]]s>>=
                    632: print "val do_bcond =" > decode
                    633: print "(case cond of" > decode
                    634: for (code=0; code<256; code++) {
                    635:        name = opcode[BCOND,code]
                    636:        <<if [[name]] is known, display a case for it>>
                    637: }
                    638: printf " | _ => \"unknown bcond\\n\"\n" > decode
                    639: print "   ) " > decode
                    640: <<write expression that decodes the [[funct]] field for [[cop1]]s>>=
                    641: print "val do_cop1 =" > decode
                    642: print "(case funct of" > decode
                    643: for (code=0; code<256; code++) {
                    644:        name = opcode[COP1,code]
                    645:        <<if [[name]] is known, display a case for it>>
                    646: }
                    647: printf " | _ => \"unknown cop1\\n\"\n" > decode
                    648: print "   ) " > decode
                    649: @ The main expression is a little more complicated, because it has to
                    650: check for [[special]], [[bcond]], and [[cop1]] and hand each one to its auxiliary value.
                    651: <<write [[case]] expression that decodes the [[op']] field for each instruction>>=
                    652: print "(case op' of" > decode
                    653: for (code=0; code<256; code++) {
                    654:        name = opcode[OPCODE,code]
                    655:        if (name=="special") {
                    656:                <<print space or bar ([[|]])>>
                    657:                printf "%d => %s\n", code, "do_special" > decode
                    658:        } else if (name=="bcond") {
                    659:                <<print space or bar ([[|]])>>
                    660:                printf "%d => %s\n", code, "do_bcond" > decode
                    661:        } else if (name=="cop1") {
                    662:                <<print space or bar ([[|]])>>
                    663:                printf "%d => %s\n", code, "do_cop1" > decode
                    664:        } else <<if [[name]] is known, display a case for it>>
                    665: }
                    666: printf " | _ => \"unknown opcode\\n\"\n" > decode
                    667: print "   ) " > decode
                    668: @ \section{Testing}
                    669: One day someone will have to modify the instruction handler so that
                    670: it generates a test invocation of each instruction.
                    671: Then the results can be handed to something like adb or dbx and we can
                    672: see whether the system agrees with us about what we're generating.
                    673: 
                    674: @ \section{Defining ML functions}
                    675: The awk function [[function_definition]] is used to
                    676: come up with ML function definitions.
                    677: It takes as arguments the name of the function and the number of arguments
                    678: to that function, and returns a string containing the initial part of
                    679: the function definition.
                    680: Writing an expression following that string will result in a complete
                    681: ML function.
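For readers who want to try the behaviour without awk, here is a Python transliteration of [[function_definition]].
It is a sketch for experimentation only and is not part of the generator.

```python
def function_definition(name, argc):
    """Return the head of an ML definition with arguments A1..A<argc>."""
    if argc == 0:
        return "val %s = " % name
    args = ",".join("A%d" % i for i in range(1, argc + 1))
    return "fun %s(%s) = " % (name, args)


# Appending an expression to the head gives a complete ML definition.
head = function_definition("decode", 2)
print(head + "A1 + A2")  # fun decode(A1,A2) = A1 + A2
```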
                    682: 
                    683: If we ever wanted to define these things as C preprocessor macros instead,
                    684: we could do it by substituting [[macro_definition]].
                    685: I'm not sure it would ever make sense to do so, but I'm leaving the
                    686: code here anyway.
                    687: <<functions>>=
                    688: function function_definition(name, argc,  i, temp) {
                    689:        if (argc==0) {
                    690:                temp = sprintf("val %s = ", name)
                    691:        } else {
                    692:                temp = sprintf( "fun %s(", name)
                    693:                for (i=1; i< argc; i++) temp = sprintf("%sA%d,", temp,i)
                    694:                temp = sprintf( "%sA%d) = ", temp, argc)
                    695:        }
                    696:        return temp
                    697: }
                    698: <<useless functions>>=
                    699: function macro_definition(name, argc,  i, temp) {
                    700:        if (argc==0) {
                    701:                temp = sprintf("#define %s ", name)
                    702:        } else {
                    703:                temp = sprintf( "#define %s(", name)
                    704:                for (i=1; i< argc; i++) temp = sprintf("%sA%d,", temp,i)
                    705:                temp = sprintf( "%sA%d) ", temp, argc)
                    706:        }
                    707:        return temp
                    708: }
                    709: @ \section{Handling error conditions}
                    710: Here are a bunch of uninteresting functions and modules
                    711: that handle error conditions.
                    712: <<bad operator name>>=
                    713: {
                    714:        print "unknown opcode", opname, "on line", NR > stderr
                    715:        next
                    716: }
                    717: <<bad field name>>=
                    718: {
                    719:        print "unknown field", $i, "on line", NR > stderr
                    720:        next
                    721: }
                    722: <<BEGIN>>=
                    723: stderr="/dev/tty"
                    724: <<functions>>=
                    725: function insist_fields(n) {
                    726:        if (NF != n) {
                    727:                print "Must have", n, "fields on line",NR ":", $0 > stderr
                    728:                return 0
                    729:        } else {
                    730:                return 1
                    731:        }
                    732: }
                    733: @ \section{Leftover junk}
                    734: Like a pack rat, I never throw out anything that might be useful again later.
                    735: <<junk>>=
                    736: function thetype(n) {
                    737:        if (n==OPCODE) return "OPCODE"
                    738:        else if (n==SPECIAL) return "SPECIAL"
                    739:        else if (n==BCOND) return "BCOND"
                    740:        else if (n==COP1) return "COP1"
                    741:        else return "BADTYPE"
                    742: }
                    743: <<decoding junk>>=
                    744: for (f in fieldnames) {
                    745:        printf "^ \"\\n%s = \" ^ Integer.makestring %s\n",f,f > decode
                    746: }
                    747: printf "^\"\\n\"\n" > decode
