1946 Stop Order Tags

Lecture 39

12th August 1946

Code and Control -- IV. Examples of a Three-Address Code and the Use of 'Stop Order Tags'

C.N. Mooers (Naval Ordnance Laboratory)

This is an interesting paper proposing a novel method of organising control, and providing detailed pieces of coding to illustrate it. Calvin Mooers was from the Naval Ordnance Laboratory, who were considering building their own machine (see Lecture 44). In fact it was never built. He says that the idea derives from a time in November 1945, when he was unhappy with some of the features of the "only coding schemes then extant, devised by John von Neumann", and decided to try designing his own. There is an implication from the lecture that he had been working out detailed coding sequences for various problems for some time, using postulated instruction sets.

The coding examples show that he was aware of the concept of a full subroutine, with a set of parameters and a return address. He was considering the problem of a subsequence cycling through sets of data in full generality, with a particular list of elements being characterised by its start address, finishing address, and the gap between successive elements. At a time before the invention of the index register, such a subsequence (or the sequence calling it) would have to alter each of its instructions which contained a general store access. Remember that unless a store access was to an address known before the program was run, the instruction containing it would have to have the appropriate address field altered before it was obeyed, to the current value of the appropriate dynamic address. However where the access was in a cycle working through store in fixed intervals the alteration could be more simply achieved by adding the appropriate interval at the correct position of the address field in the instruction (e.g. if the address field was at the bottom of the instruction and the interval was s, just adding s to the instruction!) The process was correspondingly more complicated if the instruction code was 3-address rather than 1-address.

However Mooers had a particular interest in very general two dimensional data structures, for example integrating over an area with irregular boundaries (at its worst with every horizontal line and every vertical line having its own start and finish point). This would demand complicated house-keeping to traverse the area.

The computer they were considering had a 50-bit word, providing both floating and fixed point, with a 8192 word store requiring a 13-bit address. Mooers' idea was that the top four bits of every word and instruction in store would be reserved for a "stop-order tag". In the machine architecture an extra component would be added on the path between the control component and the memory, the "sentinel". Every time that an instruction requested a word to be fetched from store, the sentinel would apply a check for consistency between the instruction's stop-order tag and the fetched word's, and if it found a match a jump would be made to some specified address. Clearly this provides a neat method of breaking out of a cycle iterating through store, by tagging terminal elements. Tags on numbers would be ignored during calculations, and storing of calculations would leave the existing tag unaltered. So a stop-order tag once set at a particular address (by a special order) would remain there until explicitly reset, i.e. the tag would be a property of the program's data organisation and not the data values themselves.

A simple form of the idea would be to associate with the sentinel a 15-word table of addresses (any element of which could be set at any time by the program). The consistency rule would be that a match occurred if, for each position where there was a 1 in the instruction's stop-order tag, there was a 1 in the corresponding position of the fetched word's. (An all-zero tag on an instruction would presumably suppress any attempt to match.) On a match a jump would be made to the address in the sentinel's table corresponding to the 4-bit value of the instruction's tag. This would for example allow a particular address to carry out up to 4 independent roles, e.g. the last element in a matrix could be tagged as a row terminator, a column terminator, an array terminator, and even the last in a set of arrays! In practice there is a common problem where the instruction accessing the next element in a cycle, and hence causing the match which terminates the cycle, is followed by one or more orders which complete the standard processing required for each element. These instructions then have to be duplicated at the address jumped to, to complete the standard processing for the terminal element.

The actual proposal is in fact more complicated, there being a set of control words associated with the sentinel. Each control word is a coded request that a jump should be made to a given address if a specified subfield of the store word's tag has a given value and (independently) if a specified subfield of the instruction's tag has a given value. The text of the talk implies that there could be more than one of these control words active at the same time, and so presumably the already sophisticated check for a match associated with a control word would have to be repeated for each active control word.

The lecture goes on to give two examples of subsequences to illustrate the idea:

Product-Sum subsequence, summing the products of two vectors
Matrix Multiplication subsequence, which calls the product-sum subsequence

A simple 16-order subset of a hypothetical code for a 3-address computer is defined, which includes an order providing both conditional and unconditional jumps in the conventional form (i.e. as well as using stop-order tags to specify control jumps).

The product-sum subsequence uses 7 parameters to specify full generality: the address of the start of each vector, the gap between successive elements for each, where to send the answer, the return address, and a specification of the stop-order criteria for one of the vectors (i.e. tag field and tag value). The subsequence uses 10 instructions plus 3 local numbers, but it excludes any instructions that might need to be added to set up initial values of addresses in the code. If the subsequence was used in a completely 'static' program, then these addresses could be substituted when preparing the program, but if e.g. the subsequence was being used on two different pairs of vectors in the same program, then the substitution of values into address fields of the subsequence would have to be done either directly by the calling sequence, or by a prelude of the subsequence which used extra words as required to hold the values of the variable parameters being passed from calling sequence to subsequence.

The matrix multiplication sequence, for say Z = X * Y, assumes that each matrix is held in store in a contiguous set of rows, with the elements of a row in successive locations. It uses 9 parameters: the starts of matrices X, Y and Z, the number of rows and columns of X and Y (using 4 words, but of course two should be equal), the entry point for the product-sum subsequence and the return address (from the matrix multiplication subsequence).

The matrix multiplication uses 77 words, 26 of which are parameters and working space. This excludes the product-sum subsequence. It assumes (in principle) that all 9 parameters have been placed in its local data space by the calling sequence. It includes the instructions required to alter all its own store access instructions, and indeed those of the product-sum subsequence.

Mooers admits that matrix multiplication, despite the large instruction sequence, is not a convincing demonstration of his idea. The boundary structure of the data is still relatively simple, and in fact the subsequence, unlike the product-sum subsequence, does not assume that there are tags already set in sensible places to control obvious traversals of the matrices. In fact the subsequence does not even place tags to control the inner and outer loops of the matrix multiplication. It only uses one temporarily to control a loop setting the tags in the last column of X to terminate the loop each time the product-sum subsequence is called.

The number of instructions is around double what one might expect for a simple 3-address code with indexing, and that is partly due to having to set or increment those fields in instructions which corresponded to addresses which could vary during the program run. However there are special instructions in the order code to help set fields in instructions. Mooers does not give any analysis of how the sequence might compare with e.g. using simple counters and conditional tests to control the loops in a similar machine without stop-order tags. However there may also be a considerable element of clumsy programming, and there are certainly errors. Verzuh recorded (when writing up his day's shorthand notes in the evening) that he was not bothering to transcribe the detailed sequence, as it had emerged during the day that the program could be written in much less space using the "type A" code given in lecture 10. (This was one of the codes proposed for the EDVAC -- very similar, but without the stop-order codes.)

Mooers goes on to make some observations on the heavy dominance of 'housekeeping' operations relative to the actual arithmetic instructions carrying out the actual calculations required in the algorithm. He points out that in terms of the number of instructions obeyed the balance is much better (depending on the size of the arrays). This is because the inner loops have a much better ratio, especially since setting of addresses can be done outside of loops, and incrementing addresses in instructions is no worse than incrementing indexes -- better when you can increment up to three addresses with one instruction in a 3-address machine!

For example the inner loop of the product-sum subsequence has only 4 instructions, and it is very much a matter of definition of 'housekeeping' whether any of them should be classified as such -- Mooers in fact classifies the last two as housekeeping. As called by the matrix multiplication subsequence, the inner loop of the product-sum subsequence is as follows, where S1, S2 and S3 are working values used by the product-sum subsequence:

Multiply XR_k and YC_k and place the result in location S1, where XR is the current row of X and YC the current column of Y, so this forms the product of the k^th element. If the tags match jump to some given address (and exit the loop, remembering to duplicate (2) for the last element!)
Add S1 to location S2 which holds the product-sum so far
Add S3 to instruction (1), where S3 has been preset to have just a 1 in the address field position of XR_k and length-of-row-in-Y in the address field position of YC_k.
Jump to (1)

Of course for a machine without stop-order tags, instruction (4) would have to be replaced by two instructions, say decrementing a counter, and then testing it for 0.

Example of Coding

Using a pseudo code local to this description, and slightly rearranging the subsequence as given in the lecture, the full product-sum subsequence is as follows.

The parameters are:

A	location of start of 1st vector
B	location of start of 2nd vector
C	interval between addresses in 1st vector
D	interval between addresses in 2nd vector
E	specification of the stop-order tag on the last position of either the 1st or 2nd vector
F	address of the answer
G	address to which control is to be returned

Labelling the subsequence using symbolic store addresses S1 to S13, the subsequence code sits in S4 to S13, with entry point S4, and with S1 to S3 the local variables.

The instructions shown are each held in one store word, and it is assumed that each symbolic parametric value A to G has either been filled in directly into the appropriate field of the instruction during program preparation (if known before run time) or has been filled in by the sequence calling the subsequence.

S1		This holds the value of the current product
S2		This holds the accumulated sum of products
S3		A dummy instruction with C in the same address field as that of A in instruction S6 and D in the same address field as that of B in instruction S6. If added to S6 this word will therefore cause the next execution of S6 to refer to the next pair of elements in the respective vectors
S4		Entry point: Clear S2
S5		Set sentinel register 8 with the appropriate code to jump to instruction S10 if any instruction generates a match with the instruction having value 1 in position 1000 (specified by 8-bit group (1000,1000)) and any of its data transfers from store having the match specified by E
S6		(with stop order tag 1000) S1 = A * B
S7		S2 = S2 + S1
S8		S6 = S6 + S3 i.e. increment the addresses for both vectors in the multiply instruction S6
S9		Jump to S6 and repeat for the next product
S10		S2 = S2 + S1 (add in the final product to the product sum)
S11		Clear the sentinel register 8
S12		F = S2
S13		Jump back to the calling sequence at G

The matrix multiply sequence, which is fully general, will have to set all the values for A to G explicitly in the code.

There are two instructions which facilitate this, as follow, using af1 and af2 to refer to the locations specified by the respective address fields 1 and 2 in an instruction, and AF3 to represent a literal constant in address field 3.

Set the contents of af2 to the contents of af1 shifted up AF3 bits
Set the bit positions of af1 which are specified in AF3 into the corresponding positions of af2, without altering any other bit positions of af2. AF3 contains a pair of 6-bit values specifying the starting bit and the finishing bit of the required bit positions.

So if the required value to be substituted into an address field of an instruction is sitting in store somewhere: unless it is going into the least significant bits of the instruction, (a) can be used to shift it into position; then (b) can be used to copy it into the appropriate field of the instruction.

The matrix multiplication sequence can be broken up into various sections as summarised below, using square brackets to indicate the number of words/instructions used.

[8] Words holding the formal parameters, except for the return address. These, and the return address in the last instruction, must be set by the sequence calling the matrix multiplication subsequence.
[4] Miscellaneous constants/workspace, in particular the two control tags used.
[14] A set of working words to hold pieces of instruction for setting various fields in both subsequences, i.e. after using type (a) instructions and in preparation for using type (b) instructions.
[21] This is the start of the subsequence instructions. This section sets up some of the unchanging fields of the product-sum subsequence and the Matrix Multiplication sequence itself. Then there is a cycle to plant the tags to be used by the product-sum subsequence, into the last column of X, i.e. into the terminal elements of the set of rows to be multiplied by columns of Y.
To control the little tag-setting cycle an independent tag is placed in the last element of the last column (i.e. the last element in the matrix) to trigger the end of the little cycle. The tag is then removed, as it is not needed again. This is because, although the element could potentially trigger the end of each inner cycle of the matrix multiplication, it is only accessed in the product-sum subsequence. So a conventional test is used to detect the ends of the inner and outer cycles.
[19] A further set of instructions works out some key addresses of the matrices to be used later, and completes the setting of the parametric fields A to G in the product-sum sequence, ready for the first call. Finally it calls the product-sum sequence for its first call, having set the return address as the following instruction ('IOC').
[5] The first instruction, "IOC", is the starting point of both the inner and outer cycle of the matrix multiplication. This section adjusts the parametric fields for the next product-sum call in the inner cycle, i.e. to the start of the next row of X and the same column of Y. It then sets up a test to see if in fact it has already done the last product-sum for this inner cycle, and if not, jumps to the product-sum subsequence, still with IOC set as the return address.
[5] This similarly adjusts the parametric fields for the first product-sum call for the next outer cycle, i.e. starting on the first row of X again and the next column of Y. It then tests if it has already done the last column, and if not jumps to the product-sum subsequence, again with IOC still set as the return address.
[1] Having completed the matrix multiplication, a jump is made back to the sequence calling the matrix multiplication subsequence. The calling sequence plants the address in this instruction directly, rather then setting the return address as a 9th formal parameter held in section (1). Note that the setting of the values in the result matrix is done direct from the product-sum subsequence, as controlled by its parameter F

Note that there are some unintentional imperfections in the example. Mooers admits that the code in fact leaves the transpose of the correct answer in Z! Also there is no removal of the stop-order tags set in section (4).

The example might have been improved, and been a better advertisement for stop-order tags, if the specification of the matrix multiplication subsequence had demanded that there were already tags on the last row and column of (at least) the first input matrix, of a form passed as parameters, as in the product-sum subsequence. Even though they could not be used to terminate the inner and outer cycles directly (as they are only accessed from the product-sum sequence) it would be easy enough to do a dummy access on return from each product-sum call to check for termination.