Thursday, September 25, 2008

ASSEMBLER ALGORITHM AND DATA STRUCTURES

Our simple assembler uses two major internal data structure; the Operation Code table (OPTAB) and the Symbol Table (SYMTAB). OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents. SYMTAB is used to store values(addresses)assigned to labels.

We also need a Location counter LOCCTR. This is a variable that is used to help in the assignment of addresses. LOCCTR is initialized to the beginning address specified in the START statement . After each source statement is processed, the length of the assembled instructions or date area to be generated is added to LOCCTR. Thus whenever we reach a label in the source program , the current value of LOCCTR gives the address to be associated with that label.

The Operation code Table must contain the mnemonic operation code and its machine language equivalent. In more complex assembles , this table also contains information about instruction format and length.During Pass 1 , OPTAB is used to look up and validate opearation coded in the source program. In pass 2 , it is used to translate the operation codes to machine language. Actually , in our simple SIC assembler , both of these processes could be done together in either PASS 1 or PASS 2. However , for a machine that has instructions of different lengths, we must search OPTAB in the first pass to find the instructions length for incrementing LOCCTR. Likewise , we must have the information from OPTAB is PASS 2 to tell us which instruction format to use in assembling the instruction ,and my peculiarities of the object code instruction. We have chosen to retain this structure in the current discussion because it is typical of most real assembles.

OPTAB is usually organized as a hash table , with mnemonic operation code as the key. The hash table organization is particularly appropriate , since it provides fast retrieval with a minimum of searching . In most cases , OPTAB is a static table – that is , entries are not normally added to or deleted from it. In such cases it is possible to design a special hashing function or other data structure to give optimum performance for the particular set of keys being stored. Most of the time , however , a general –purpose hashing method is used.

The Symbol table (SYMTAB) includes the name and value (address) for each label in the source program , together with flags to indicate error conditions ( e.g,. a symbol defined in two different places ). This table may also contain other information about the data area or instruction labeled – for example , its type or length. During Pass 1 of the assembler , labels are entered intoSYMTAB as they are encounterd in the source program , along with their assigned addresses ( form LOCCTR). During Pass 2 , symbols used as operands are looked up in SYMTAB to obtain the addresses to be inserted in the assembled intructions.

SYMTAB is usually organized as a hash table for efficiency or insertion and retrieval. Since entries are rarely deleted from this table , efficiency of deletion is not an important consideration. Because SYMTAB is used havily throughout the assemble , care should be taken in the selection of a hashing function. Programmers often select many labels that have similar characteristics – for example , labels that start or end with the same characters (like LOOP1 , LOOP2 ) or are of the same length . It is important that the hashing fuction used perform well with such no-random keys. Division of the entire key by a prime table length often gives good result.

It is possible to both passes of the assembler to read the original source program as input. However , there is certain information (such as location counter values and error flags for statements) that can or should be communicated between the two passes. For this reason , Pass usually writes an INTERMEDIATE FILE that contains each source statement together with its assigned address , error indicators , etc ,. This file is used as the input to Pass 2.This working copy of the source program can also be used to retain the results of certain operations that may be performed during Pass 1 , so these need not be performed again during Pass2. Similarly , pointers into OPTAB and SYMTAB may retained for each operation code and symbol used . This avoids the need to repeat many of the table-searching operations.We assume for simplicity that the source lines are written in a fixed format with fields LABEL , OPCODE , and OPERAND . If one of these fields contains a characted string that represents a number , we denote its numeric value with the preficx # ( for example , # [OPERAND]).

FUNCTIONS :

1)Convert mnemonic operations codes to their machine language equivalents.
2)Convert symbolic operands to their equivalent machine addresses
3)Build the machine instructions in the proper format
4)Convert the data constants specified in the source program into their internal machine representations.
5)Write the object program and the assembly listing.


All of these functions except number 2 can easily be accomplished by sequential processing of the source program, one line at a time.The translation of addresses ,however, presents a problem . Consider the statement

10 1000 FIRST STL RETADR 141033

This instruction contains a forward reference – that is , a reference to a label(RETADR) that is defined later in the program. If we attempt to translate the program line by line , we will be unable to process this statement because we do not know the address that will be assigned to RETADR. Because of this , most assemblers make two passes over the source program. The first pass does little more than scan the source program for lebel definitions and assign addresses. The second pass performs most of the actual translation previously described.

In addition to translating the instructions of the source program , the assembler must process statement called assemble directives . These statements are not translated into machine instructions (although they may have an effect on the object program). Instead , they provide instructions to the assembler itself. Examples of assembler directives are statements like BYTE and WORD , which direct the assembler to generate constants as part of the object program , and RESB and RESW , which instruct the assembler to reserve memory locations without generating date values. The other assembler directives in our sample program are START , which specifies the starting memory address for the object program , and END , which marks the end of the program.

Finally , the assembler must write the generated object code onto some output device. This object program will later be loaded into memory for execution. The simple object program format we use contains three types of records : Header , Text and End. The header record contains the program name , starting address, and length. Text records contain the translatd (i.e.machine code ) instructions and data of the program , together with an indication of the addresses where these are to be loaded. The End record marks the end of the object program and specifies the address in the program where execution is to begin.(This is taken from the operand of the program’s ENDstatement. If no operand is specified , the address of the first executable instruction is used)

The formats we use for these records are as follows .The details of the formats(column numbers etc ,..) are arbitrary , however the information contained in these records must be present(in some form) in the object program.

Header Record :

Col.1 H
Col.2-7 Pogram Name
Col.8-13 Starting address of object program (hexadecimal)
Col.14-19 Length of object program in bytes (hexadecimal)
Text Record :
Col.1 T
Col.2-7 Starting address for object code in this record
Col.8-9 Length of object code in this record in bytes
Col.10-69 Object cde , represented in hexadecimal
End Record :

Col.1 E
Col.2-7 Address of first executable indtruction in object program


To avoid confusion , we have used the term column rather than byte to refer to positions within object program records.This is not meant to imply the use of any particular medium for the object program. We can now give a general description of the functions of the two passes od our simpler assembler.

PASS 1 ( Define Symbols ) :
1)Assign addresses to all statement in the program
2)Save the values (addresses) assigned to all labels for use un Pass 2
3)Perform some processing of assembler directives.


PASS 2(assembler instructions and generate object program) :
1)Assemble instructions ( translating operations codes and looking up addresses) 2)Generate data values defined by BYTE ,WORD etc ,.
3)Perform processing of assembler directives not done during Pass 1.
4)Write the object program and the assembly listing.

Directives:

START - Specify name and starting address for the program.
END - Indicate the end of the source program and specify the first
executable instruction in the program.
BYTE - Generate character or hexadecimal constant , occupying as many
bytes as needed to represent the constant.
WORD - Generate one-word integer constant.
RESB - Reserve the indicated number of bytes for a data area.
RESW – Reserve the indicated number of words for a data area.
The program contains a main routine that reads records from an input device (identified with device code F1) and copies them to an output device . This main routine calls subroutine RDREC to read a record into a buffer and subroutine WRREC to write the record from the buffer to the output device. Each subroutine must transfer the record one character at a time because the only I/O instructions available are RD and WD. The buffer is necessary because the I/O rated for the two devices, such as a disk and a slow printing terminal , may be very different. The end of each record is marked with a null character (hexadecimal 00). If a record s longer than the length of the buffer (4096 bytes),only the first 4096 bytes are copied. The end of the file to be copied is indicated by a zero-length record. When the end of file is detected, the program writes EOF on the output device and terminates by executing an RSUB instructions. We assume that this program was called by the Operating system using a JSUB instruction ; thus , the RSUB will return control to the operating system.

Assemblers :Indroduction

The design and implementation of assemblers. There are certain fundamental functions that any assembler must perform, such as translating mnemonic operation codes to their machine language equivalents and assigning machine addresses to symbolic labels used by the programmer. If we consider only these fundamental functions , most assemblers are very much like.
Beyond this most basic level, however , the features and design of an assembler depend heavily upon the source language it translates and the machine language it produces. One aspect of this dependence is , of course, the existence of different machine instruction formats and codes to accomplish(for example) an ADD operation. As we shall see, there are also many subtler ways that assemblers depend upon machine architecture. On the other hand , there are some features of an assembler language ( and the corresponding , assembler) that have no direct relation to machine architecture – they are , in a sense, arbitrary decisions made by the designers of the language.
We begin by considering the design of a basic assembler for the standard version of our Simplified Instructional Computer (SIC). Its a most fundamental operations performed by a typical assembler , and describes common ways of accomplishing these functions. The algorithms and data structures that we describe are shared by almost all assemblers. Thus this level of presentation gives us a starting point from which to approach the study of more advanced assembler features. We can also use this basic structures as a framework from
Which to begin the design of an assembler for a completely new or unfamiliar machine.
We examine some typical extensions to the basic assembler structure that might be dictated by hardware considerations. We do this by discussing an assembler fo the SIC/XE machine ( XE means Xtra Equipment). Although this SIC/XE assembler certainly does not include all possible hardware-dependent features , it does certain some of the ones most commonly found in real machines. The principles and techniques should be easily applicable to other computer.
We briefly consider some examples of actual assemblers for real machines. We focus on the most intresting features that are introduced by hardware or software design decisions.

Assemblers translate mnemonic code into machine language. Using some functions , assembler directives , etc ,…