31V4 Assignment 2
PDP subset assembler
Date: 26 April 1989
Author: J.G.Harston, (C)HCE

1. Requirements Specification

A program to assemble a text file into PDP11 machine code in two passes, recognising a subset of the instructions and a subset of the addressing modes.

2. Design Details

The method I chose was simply to go through the text file twice, assembling it into instructions each time. There are only two differences between the passes. On the first pass, any error resulting from an undefined label is ignored and a dummy value is used instead. The machine code listing is produced only on the second pass. This simplifies sizing the instructions, because exactly the same routines can be used on both passes. The dummy value I chose was the program counter value at the time the instruction is assembled.

Another design decision I took was to modify the specification for a label. The brief stated that a label was to be up to 6 characters from the set {0..9, A..Z, a..z}. I modified this so that a label must start with a letter, which makes numerical constants easy to recognise: a number begins with a digit, while a label begins with a letter.

All spaces and tabs are ignored except where they act as separators, so spaces at the beginning of a line are also ignored. The design brief stipulated that some things start at the beginning of the line. The program recognises these, but it is slightly more forgiving about extra spaces.

The whole program is split into 6 separate modules:

Master : The master module that controls everything.
Pass_1 : Calls Parser for the whole file for pass one.
Pass_2 : Calls Parser for pass two, then displays a symbol table.
Parser : Reads in and parses the current line, then calls the appropriate routine to deal with it. Provides all the general text line processing routines.
Assem  : Does all the PDP11 machine code related work.
         An advantage of this is that only Assem needs to be changed if the program is adapted to assemble for a different processor.
Symbols: Holds the list of labels used by the assembler.

MODULE Master

This module just ties all the other modules together. It calls Pass_1 and Pass_2 to do the two passes.

MODULE Pass_1

PROCEDURE pass1 (Exported)
  in: nothing
  out: nothing
  effect: Calls the Parser to do the first pass of the source file.

MODULE Pass_2

PROCEDURE pass2 (Exported)
  in: nothing
  out: nothing
  effect: Resets the input file pointer to the beginning of the file, then calls the Parser to do the second pass of the assembly. Finally, it calls DoListing to give a symbol table listing.

PROCEDURE DoListing
  in: nothing
  out: nothing
  effect: Displays the symbol table list.

PROCEDURE disp
  in: lbl:Label
  out: nothing
  effect: Displays the supplied label, padded with spaces to a field of eight characters.

MODULE Parser

This module exports almost all of its variables and types so that the Assem module can use them. The procedures marked (Exported) are also exported.

Types:
  txtln:ARRAY[0..11]OF CHAR
    The type of the text being looked at.
  Symbol_Type:(.....)
    The various symbols recognised.

Variables:
  words:INTEGER
    The number of words placed into the cache by the Assem module.
  cache:ARRAY[0..7]OF INTEGER
    The cache where the machine code for the decoded opcode is stored.
  error:BOOLEAN
    Set by Assem if a syntax error occurs.
  PC:INTEGER
    The location of the current instruction.
  text:txtln
    Holds the opcode text to be decoded.
  err_txt:ARRAY[0..39]OF CHAR
    If an error occurs, the text of the message is placed here for display.
  ln_ptr:INTEGER
    Points to the current character in txt_buf.
  txt_buf:ARRAY[0..79]OF CHAR
    The line buffer for the current line.
  mesg_txt:ARRAY[0..39]OF CHAR
    Amended to indicate in which field of a two-parameter instruction an error occurred.
  no_label:BOOLEAN
    Signals that a reference was made to an undefined label.

PROCEDURE get_line
  in: nothing
  out: nothing
  local: ln_ptr:INTEGER, last_ch:CHAR, txt_buf:ARRAY[0..79]OF CHAR
  effect: Reads up to 80 characters from the input stream into the module's global variable txt_buf, and sets last_ch to the last character read. If a full line was read, last_ch is character 13 (CR). If only the first 80 characters of a longer line were read, last_ch is some other character, which indicates that more characters must be read to reach the end of the line. The Disp_Line routine uses this.

PROCEDURE get_symbol (function) (Exported)
  in: nothing
  out: VAR txt_ln:txtln, Symbol_Type
  local: txt_buf:ARRAY[0..79]OF CHAR, ln_ptr:INTEGER
  effect: Reads a symbol from the line buffer txt_buf, terminated by a space, a tab, a semicolon, a colon, an equals sign or a comma. Up to 10 characters are read, and a type is returned by cursorily decoding the characters. If the first character is a space, a tab, end-of-file or a semicolon, a comment is returned. A full stop indicates a pseudo command. If the terminating character is a colon, it is a label definition. An equals sign indicates an equate. A comma indicates a source field. If the terminator is a space or a tab, the next character after any further spaces or tabs is checked. If that is a comma, the symbol is a source field; otherwise it is signalled as an opcode. It may not always be an opcode, but it is only decoded as one when the assembler is expecting an opcode.
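The classification rules of get_symbol can be sketched in Python as follows. This is not the original Modula-2 code. Several details are assumptions: the string kind names, the (text, kind, ptr) return tuple, stepping past ':' and '=' while leaving the pointer on a comma (so that a check_comma-style routine can still find it), and treating a symbol that ends at a semicolon or at end of line as an opcode.

```python
# Hypothetical Python port of get_symbol's classification rules.
TERMINATORS = " \t;:=,"
MAX_LEN = 10  # get_symbol reads up to 10 characters


def get_symbol(buf: str, ptr: int) -> tuple[str, str, int]:
    """Return (text, kind, new_ptr) for the symbol starting at buf[ptr]."""
    n = len(buf)
    # Space, tab, semicolon or end of line at the start: a comment.
    if ptr >= n or buf[ptr] in " \t;":
        return "", "comment", ptr
    start = ptr
    while ptr < n and buf[ptr] not in TERMINATORS and ptr - start < MAX_LEN:
        ptr += 1
    text = buf[start:ptr]
    term = buf[ptr] if ptr < n else ""

    if text.startswith("."):
        kind = "pseudo"                  # .start, .end, ...
    elif term == ":":
        kind, ptr = "label", ptr + 1     # label definition; step past ':'
    elif term == "=":
        kind, ptr = "equate", ptr + 1    # equate; step past '='
    elif term == ",":
        kind = "source"                  # leave ptr on the comma
    elif term in (" ", "\t"):
        # Look past further blanks: a comma means this was a source field.
        while ptr < n and buf[ptr] in " \t":
            ptr += 1
        kind = "source" if ptr < n and buf[ptr] == "," else "opcode"
    else:
        kind = "opcode"                  # ';', end of line, or length limit
    return text, kind, ptr
```

For the line `mov r0,r1`, successive calls classify `mov` as an opcode and `r0` as a source field, leaving the pointer on the comma.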
PROCEDURE skipspace
  in: nothing
  out: nothing
  local: txt_buf:ARRAY[0..79]OF CHAR, ln_ptr:INTEGER
  effect: Increments ln_ptr to skip any spaces and tabs in txt_buf. If the current character is not a space or a tab, ln_ptr is not incremented.

PROCEDURE Disp_Line
  in: nothing
  out: nothing
  local: txt_buf, ln_ptr, last_ch
  effect: Displays the line held in txt_buf. If last_ch is not 13, only part of the line was read into txt_buf, so more characters are read in and displayed up to the next character 13.

PROCEDURE upper (function)
  in: ch:CHAR
  out: CHAR
  effect: Converts the input character to upper case if it is a lower-case letter.

PROCEDURE Do_This_Line (Exported)
  in: Pass:INTEGER, line_num:INTEGER
  out: BOOLEAN
  local: txt_buf, ln_ptr
  effect: Reads in, scans and decodes the current line. On pass 2, after calling the appropriate routines, it displays the line with the line number supplied in line_num and any information placed into the cache by Assem. (This could be improved by having a call with line_num set to one, ie the first line of the assembly, tidily reset the pointers that record the state of the assembly.)

PROCEDURE do_pseudo
  in: nothing (but looks at text)
  out: nothing
  local: PC, endflg, error, err_txt, fatal
  effect: Decodes the contents of text as a pseudo-instruction. The instructions recognised are .start and .end. If neither is recognised, an error is returned by setting error and placing a message in err_txt.

PROCEDURE do_equate
  in: nothing (but looks at text)
  out: nothing
  local: text, cache[0]
  effect: Equates the label held in text to the number fetched as the next symbol on the line. The value is stored in cache[0] for Do_This_Line to display.

PROCEDURE do_label
  in: nothing (but looks at text)
  out: nothing
  local: text, PC
  effect: Sets a label equal to the current value of the program counter, held in PC.
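The 80-character buffering shared by get_line and Disp_Line can be sketched in Python as below. The sketch assumes that lines end in CR (character 13) and that the CR itself is not stored in the buffer. The function names mirror the procedures above but are hypothetical. disp_line returns the text rather than printing it so that the result can be checked.

```python
import io

CR = "\r"        # character 13 marks the end of a line
BUF_SIZE = 80    # size of txt_buf


def get_line(stream: io.TextIOBase) -> tuple[str, str]:
    """Read up to BUF_SIZE characters into a buffer; return (buffer, last_ch)."""
    chars: list[str] = []
    last_ch = ""
    while len(chars) < BUF_SIZE:
        ch = stream.read(1)
        if ch == "":             # end of file
            break
        last_ch = ch
        if ch == CR:             # whole line read; CR is not stored
            break
        chars.append(ch)
    return "".join(chars), last_ch


def disp_line(buf: str, last_ch: str, stream: io.TextIOBase) -> str:
    """Return the full line for display, reading on past the buffer if needed."""
    parts = [buf]
    while last_ch != CR:         # buffer held only part of the line
        ch = stream.read(1)
        if ch == "":             # end of file before CR
            break
        last_ch = ch
        if ch != CR:
            parts.append(ch)
    return "".join(parts)
```

Because disp_line consumes the rest of an over-long line, the next get_line call starts cleanly at the following line.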
PROCEDURE SetSymbol
  in: text:txtln, endchr:CHAR, value:INTEGER
  out: nothing
  effect: Sets a symbol to the value supplied in value. The symbol's name is in text, terminated by the character supplied in endchr (':' for labels, '=' for equates). Before calling the Symbols module to enter the value, it checks that the name starts with a letter. If it doesn't, an error is signalled.

PROCEDURE FindOct (Exported)
  in: string:ARRAY OF CHAR, ptr:INTEGER
  out: ptr:INTEGER, INTEGER
  effect: Gets an octal value from the text supplied in string, starting at the character pointed to by ptr. If the number does not start with a valid octal digit, an error is signalled. Otherwise, ptr is returned pointing to the first non-octal character in the string.

PROCEDURE Assign (Exported)
  in: intxt:ARRAY OF CHAR
  out: outtxt:ARRAY OF CHAR
  effect: Copies the string in intxt to outtxt. In effect, it assigns the string in intxt to the string variable outtxt.

MODULE Assem

This module assembles the instructions sent to it in text, fetching any extra information it needs from the line buffer by calling get_symbol in Parser. The object code is placed in the cache, and words is set to the number of words returned. If any errors occur, error is set and a message is placed in err_txt.

PROCEDURE do_op_code
  in: nothing (supplied in text)
  out: nothing (returned in cache)
  effect: Decodes the opcode supplied in text and calls the appropriate procedure for it. If the opcode is unrecognised, an error is signalled. The base value of the opcode is passed to the subprocedures, which add on the required fields for source, destination, etc.

PROCEDURE do_2_oper
  in: base:INTEGER
  out: nothing (returned in cache)
  effect: Decodes the rest of the line for the two-parameter opcodes.

PROCEDURE do_1_oper
  in: base:INTEGER
  out: nothing (returned in cache)
  effect: Decodes the rest of the line for the one-parameter opcodes.
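The following Python sketch illustrates how do_op_code passes a base value to the subprocedures, which then OR in their fields. It uses the standard PDP-11 base opcodes (in octal) for a handful of instructions. The table, the field helper and the kind names are hypothetical. The real Assem decodes the operand text itself rather than receiving ready-made fields.

```python
# Hypothetical opcode table: mnemonic -> (operand kind, octal base opcode).
OPCODES = {
    "mov": ("two", 0o010000),
    "add": ("two", 0o060000),
    "clr": ("one", 0o005000),
    "inc": ("one", 0o005200),
    "jsr": ("jsr", 0o004000),
    "rts": ("reg", 0o000200),
}


def field(mode: int, reg: int) -> int:
    """6-bit source/destination field: mode in bits 5-3, register in bits 2-0."""
    return mode * 8 + reg


def do_op_code(mnemonic: str, *operands: int) -> int:
    try:
        kind, base = OPCODES[mnemonic.lower()]
    except KeyError:
        raise ValueError("unrecognised opcode: " + mnemonic) from None
    if kind == "two":                  # like do_2_oper: base | SS << 6 | DD
        src, dst = operands
        return base | (src << 6) | dst
    if kind == "one":                  # like do_1_oper: base | DD
        (dst,) = operands
        return base | dst
    if kind == "jsr":                  # like do_jsr: base | R << 6 | DD
        reg, dst = operands
        return base | (reg << 6) | dst
    (reg,) = operands                  # like do_1_reg (rts): base | R
    return base | reg
```

For example, `MOV (R0)+,-(R1)` has source field 020 and destination field 041, giving 012041 octal.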
PROCEDURE do_branch
  in: base:INTEGER
  out: nothing (returned in cache)
  effect: Decodes the rest of the line for branch opcodes.

PROCEDURE find_register
  in: text:txtln
  out: INTEGER
  effect: Decodes the text to see if it is a register. If it isn't, an error is flagged. Otherwise, the register number is returned.

PROCEDURE check_comma
  in: nothing
  out: BOOLEAN
  effect: Checks whether the current character in the line buffer is a comma. If it is, it is stepped past and TRUE is returned. Otherwise, an error is signalled. This procedure is used to check that two-parameter opcodes have their second parameter.

PROCEDURE do_jsr
  in: base:INTEGER
  out: nothing (returned in cache)
  effect: Decodes the rest of the line for the jsr opcode.

PROCEDURE do_1_reg
  in: base:INTEGER
  out: nothing (returned in cache)
  effect: Decodes the rest of the line for the opcodes that take just a register as their parameter. This is used for the rts opcode.

PROCEDURE get_src_dst
  in: text:txtln
  out: INTEGER
  effect: Decodes the supplied text to work out its addressing mode and which register it refers to. The number returned is the value used in the source or destination field of the machine code word, ie the register plus eight times the mode. Some modes need an extra word of data after the opcode: modes 6 and 7, and the PC implied modes. For these, the extra data is placed into the cache in front of the opcode, and words is incremented by one to indicate this. If the text is not recognised, a message indicating the most likely error is placed in err_txt, and an error is signalled by setting error to TRUE.

PROCEDURE find_d_reg
  in: ptr:INTEGER
  out: INTEGER
  effect: Decodes the text starting at the character pointed to by ptr to find a deferred register reference. If text does not hold a valid deferred register reference, an error is signalled. Otherwise, the register number is returned.
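The mode and register encoding performed by get_src_dst (field = register + 8 × mode) can be sketched in Python as below. The sketch covers the eight register modes plus the PC forms #n and @#n. It returns any extra word in a list instead of writing it into the cache. Index values and immediates are accepted only as octal constants, and label references and PC-relative addressing are left out. get_reg follows the description above, returning -1 for a non-register. Everything else is an assumption.

```python
import re

# Register names: r0..r7, with sp = r6 and pc = r7.
REGS = {f"r{i}": i for i in range(8)}
REGS.update({"sp": 6, "pc": 7})


def get_reg(text: str) -> int:
    """Register number for text, or -1 if it is not a register name."""
    return REGS.get(text.lower(), -1)


def get_src_dst(text: str) -> tuple[int, list[int]]:
    """Return (field, extra_words), where field = 8 * mode + register."""
    t = text.strip()
    deferred = t.startswith("@")
    if deferred:
        t = t[1:]
    d = 1 if deferred else 0            # '@' selects the odd (deferred) mode

    if t.startswith("#"):               # #n -> mode 2 on PC, @#n -> mode 3 on PC
        return (2 + d) * 8 + 7, [int(t[1:], 8)]
    r = get_reg(t)
    if r >= 0:                          # Rn -> mode 0, @Rn -> mode 1
        return (0 + d) * 8 + r, []
    m = re.fullmatch(r"\((\w+)\)\+", t)
    if m and get_reg(m.group(1)) >= 0:  # (Rn)+ -> mode 2, @(Rn)+ -> mode 3
        return (2 + d) * 8 + get_reg(m.group(1)), []
    m = re.fullmatch(r"\((\w+)\)", t)
    if m and not deferred and get_reg(m.group(1)) >= 0:  # (Rn) -> mode 1
        return 1 * 8 + get_reg(m.group(1)), []
    m = re.fullmatch(r"-\((\w+)\)", t)
    if m and get_reg(m.group(1)) >= 0:  # -(Rn) -> mode 4, @-(Rn) -> mode 5
        return (4 + d) * 8 + get_reg(m.group(1)), []
    m = re.fullmatch(r"([0-7]+)\((\w+)\)", t)
    if m and get_reg(m.group(2)) >= 0:  # X(Rn) -> mode 6, @X(Rn) -> mode 7
        return (6 + d) * 8 + get_reg(m.group(2)), [int(m.group(1), 8)]
    raise ValueError("unrecognised addressing mode: " + text)
```

An operand such as `10(r5)` yields field 065 octal plus one extra word (8), matching the description of modes 6 and 7 needing more data after the opcode.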
PROCEDURE get_reg
  in: text:txtln
  out: INTEGER
  effect: Decodes the text to see if it is a register reference. If it is, the register number is returned. Otherwise, -1 is returned.

PROCEDURE Getvalue
  in: text:txtln, pntr:INTEGER
  out: pntr:INTEGER, INTEGER
  effect: Decodes the supplied text as a value, starting at the character pointed to by pntr. The value is either an octal constant or a label reference. For a label reference, the label is looked up by calling the Symbols module. If the label has not been defined, the no_label flag is set and a dummy value of PC+4 is returned. This allows all the instructions to be sized correctly on the first pass.

PROCEDURE match
  in: input:ARRAY OF CHAR, data:ARRAY OF CHAR
  out: INTEGER
  effect: Scans through the text supplied in data to try to match it against the text supplied in input. The text in data has the form
      xxxxx,nnn,xxxxx,nnn,...,+
  where each xxxxx is an alternative that input can match, and the following nnn is the octal number to return if the match succeeds. The data string is terminated by a + entry. It is assumed to be error-free, as match is only ever called by the program with valid values. If no match is made, -1 is returned. (This procedure might have been better placed in the Parser module.)

MODULE Symbols

This module holds the symbol table and provides all the procedures for accessing it.

TYPE Label:ARRAY[0..6]OF CHAR (Exported)
  The type for the labels.

PROCEDURE enter_label (Exported)
  in: l:Label, v:INTEGER
  out: nothing
  effect: Adds the label l to the symbol table with the value v. If the label has already been defined, the existing definition is simply altered to the new value.

PROCEDURE find_label (Exported)
  in: l:Label
  out: v:INTEGER, BOOLEAN
  effect: Looks in the symbol table for the label l. If it is found, the value is returned in v and the procedure returns TRUE. Otherwise, FALSE is returned.
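The Symbols module's list can be sketched in Python. The sketch uses a singly linked list with an unused head node and keeps the list in order with a looklbl-style search, on top of which enter_label and find_label are built. This is an illustrative port, not the original code. Python's string comparison (by character code) stands in for compar. Returning a (found, value) pair instead of using a VAR parameter, and the labels() generator standing in for init_list/get_label, are adaptations.

```python
from typing import Iterator, Optional


class Node:
    """One symbol-table entry: label text, value and link to the next entry."""

    def __init__(self, name: str = "", value: int = 0,
                 next_node: Optional["Node"] = None):
        self.name = name
        self.value = value
        self.next = next_node


def compar(a: str, b: str) -> int:
    """0 if equal, -1 if a sorts before b, otherwise +1."""
    return (a > b) - (a < b)


class SymbolTable:
    def __init__(self) -> None:
        self.head = Node()      # unused head node; the first label is head.next

    def looklbl(self, name: str) -> tuple[int, Node]:
        """Return (equal, prev): equal is 0 if found; prev is the node before
        the label, or before the place where it should be inserted."""
        prev = self.head
        while prev.next is not None:
            c = compar(prev.next.name, name)
            if c >= 0:
                return c, prev
            prev = prev.next
        return 1, prev

    def enter_label(self, name: str, value: int) -> None:
        equal, prev = self.looklbl(name)
        if equal == 0:
            prev.next.value = value            # redefinition: update the value
        else:
            prev.next = Node(name, value, prev.next)

    def find_label(self, name: str) -> tuple[bool, Optional[int]]:
        equal, prev = self.looklbl(name)
        return (True, prev.next.value) if equal == 0 else (False, None)

    def labels(self) -> Iterator[tuple[str, int]]:
        """Walk the list in order, like repeated get_label calls."""
        node = self.head.next
        while node is not None:
            yield node.name, node.value
            node = node.next
```

Returning the node before the match lets enter_label insert a new node with a single pointer update, which is why the dummy head node is useful: inserting at the front needs no special case.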
PROCEDURE init_list (Exported)
  in: nothing
  out: n:INTEGER
  effect: Initialises an internal pointer to the list so that repeated calls to get_label return the text and values of successive labels in the symbol table. Returns the total number of labels in the symbol table in n.

PROCEDURE get_label (Exported)
  in: nothing
  out: l:Label, v:INTEGER, BOOLEAN
  effect: Fetches the next label and its value from the symbol table, then advances the internal pointer to the next one. If the end of the list is reached, FALSE is returned. Otherwise, TRUE is returned.

PROCEDURE looklbl
  in: lbl:Label
  out: equal:INTEGER, list_p
  effect: Searches the symbol table for the label lbl. If it is found, equal is set to zero and a pointer to the node before it is returned. Otherwise, equal is set to nonzero and a pointer to the node before the position where the entry should go is returned.

PROCEDURE compar
  in: label1,label2:Label
  out: INTEGER
  effect: Compares the two supplied labels. If they are equal, zero is returned. If label1 is alphanumerically smaller than label2, -1 is returned. Otherwise, +1 is returned.

Data Structures Used

Module Symbols: The symbol table is held in a singly linked list. Each entry holds the text of the label name, the value of the label and the link to the next entry. The node at the start of the list holds no label. It is used only as a pointer to the first used node (ie, the second node).

Module Parser: The current line is read into a buffer called txt_buf. This buffer is scanned and decoded into the opcodes. At the end of each line, txt_buf is displayed on the screen. The words of information provided by Assem are stored in a cache, defined as cache:ARRAY[0..7]OF INTEGER. The variable words is set to indicate how many words have been put into the cache.
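The two-pass scheme from the Design Details can be sketched in Python using a deliberately tiny, hypothetical syntax. In this syntax, `name:` defines a label, and `.word X` emits one word whose value X is an octal constant or a label. As in the design, the same routine runs on both passes. On pass 1, an undefined label silently takes the current PC as a dummy value, and the listing and errors are kept only on pass 2. A plain dict stands in for the Symbols module, and all function names are illustrative.

```python
def value_of(token: str, pc: int, symbols: dict, pass_no: int) -> tuple[int, str]:
    """Value of an octal constant or a label; returns (value, error_text)."""
    if token[0].isdigit():              # numbers start with a digit
        return int(token, 8), ""
    if token in symbols:                # labels start with a letter
        return symbols[token], ""
    if pass_no == 1:
        return pc, ""                   # pass 1: dummy value, error ignored
    return 0, "undefined label " + token


def run_pass(lines: list[str], symbols: dict, pass_no: int):
    """One pass over the source; the same code is used for both passes."""
    pc = 0
    listing, errors = [], []
    for line in lines:
        line = line.strip()
        if line.endswith(":"):          # label definition: value is the PC
            symbols[line[:-1]] = pc
            continue
        op, arg = line.split()
        if op != ".word":
            raise ValueError("only .word is handled in this sketch")
        value, err = value_of(arg, pc, symbols, pass_no)
        if pass_no == 2:                # listing and errors only on pass 2
            listing.append((pc, value & 0o177777))
            if err:
                errors.append(err)
        pc += 2                         # every .word is one 16-bit word
    return listing, errors


def assemble(lines: list[str]):
    symbols: dict = {}
    run_pass(lines, symbols, 1)         # pass 1: size everything, fill symbols
    listing, errors = run_pass(lines, symbols, 2)
    return listing, errors, symbols
```

For the source `.word fwd`, `.word 17`, `fwd:`, `.word fwd`, pass 1 gives the first `.word fwd` the dummy value 0. Because that word still occupies two bytes, fwd lands at address 4 on both passes, so pass 2 can fill in the real value.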
Test Documentation

To test the full program, a PDP assembler source file was written containing both lines with errors and correctly formed lines. This source is in the file TEST3.MAC. It was also used while writing Assem to check that the correct lines assembled correctly. The assembler listing is included, with comments on what was being tested. Also included are the assembler listings of the two programs provided in [IIUSCV].