Tutorial on Adding Functions to the Icon Run-Time System Kenneth Walker Department of Computer Science, The University of Arizona 1.__Introduction Operations in the Icon run-time system are written in a spe- cial language called RTL [1]. This language is based on C but contains features to help the Icon compiler produce better exe- cutable code. While RTL was designed for the compiler, the Icon interpreter is now also built from the same run-time system coded in RTL. Anyone adding functions to either the compiler or inter- preter writes them in RTL. RTL code is translated into C by the program rtt. The actions performed by rtt vary depending on whether it is performing a translation for the compiler or for the interpreter. A transla- tion for the interpreter is triggered by rtt's -x option and is reasonably straightforward. RTL source files have a .r suffix (or .ri for included files). The translated C files for the interpreter have a suffix of .c with an x prepended to the file names to distinguish them from C files produced for the compiler. For example, the output of translating fmisc.r is xfmisc.c. These files are then compiled with a C compiler. See the Makefile that builds the interpreter for details of the entire translation process. This tutorial introduces the features of RTL that are most often used in writing built-in functions. The same features are also used to create new keywords and operators, but adding these operations to Icon require changes to the translator, icont, and is beyond the scope of this tutorial. 2.__Making_Existing_C_Functions_Available_to_Icon One way to add functionality to the run-time system is to make C functions from existing libraries available to Icon as built-in functions. If the arguments and result types of these C functions consist only of integers, doubles (or floats), and strings, then adding them to Icon's run-time system is simple. A number of the Icon functions in the standard run-time system are created from IPD173 - 1 - October 23, 1991 corresponding C functions. Several of these are used as examples in this section. In order to use existing C functions in Icon built-in func- tions, it is usually necessary to convert values between Icon representations and C representations. The simplest such built-in functions perform three actions: + Convert arguments from Icon representations to C representations. + Call the C function with the converted arguments. + Convert the function result value from a C representation to an Icon representation. Four parts of RTL are used when adding a C function to the Icon run-time system: a function header, type conversions, an abstract type computation, and embedded C code. Consider the existing built-in function sin(). It simply makes the C sin() function available to Icon. A function header introduces the definition of the built-in function. The function header starts with function. This is fol- lowed by a specification of the length of the function's possible result sequences (this is only used by the compiler, but is required by rtt even when translating RTL for the interpreter). The sin() function produces exactly one result. A result sequence of length one is indicated by {1}. As explained below, the Icon version of sin() terminates with an error message if the argument is of the wrong type. It might seem that error termination should be considered a result sequence of length zero. However, a zero-length result sequence is used only to indicate failure (in the Icon sense). Possible error termination must be ignored when determining the length of a result sequence. The next item in the header is the name of the Icon function. It may be the same as the C function or it may be different. The final item is a comma-separated list of parameters enclosed in parentheses. The only form of parameter needed here is a simple identifier. The header for the sin() function is function{1} sin(x) The parameter x can be any Icon value. Even if it is of type real, it has a different representation than the corresponding C type, double. Type checking and conversion must be done before the C function is called with the value. RTL has several features that perform type checking and conversions. The conversion that is needed here has the form IPD173 - 2 - October 23, 1991 cnv:C-type(parameter) where C-type is C_integer, C_double, or C_string. C_integer indicates a C integer type (it is long rather than int on 16-bit machines), C_double indicates the C double type, and C_string indicates a null-terminated character string. This form of cnv performs the standard Icon conversions between strings, integers, reals, and csets, but the result is a C value. The conversion establishes a new scope for the parameter's identifier. Within this scope the identifier refers to the C value. The conversion does not appear by itself; it is used in an RTL if-then statement to check the success of the conversion. Typically, the success is checked with a negated conversion, as it is in the conversion for sin(): if !cnv:C_double(x) then runerr(102, x) runerr() is an executable statement that results in a run-time error message; it automatically handles error conversion (that is, the conversion of an error condition to failure when &error is nonzero). The message is determined by looking up the error number in a table (see the file src/runtime/data.r). If a value is supplied, it is printed. For example, if sin() is called with a null value, the program terminates with the message Run-time error 102 numeric expected offending value: &null Because runerr() is on the execution path for the failure of the conversion, it sees the original parameter x, which is an Icon value. Abstract type computations are used to inform the compiler's type inferencing system of the effects of the function. While these effects can be complicated, particularly for functions that update Icon data structures, for the functions addressed here, they are very simple. They indicate the type of value returned by the built-in function and are of the form abstract { return icon-type } Where the icon-types likely to be needed with pre-existing C functions are null, string, integer, and real. sin() uses real. Abstract type computations are not needed for the interpreter, but are required for the compiler and should be supplied for functions that might be used in the compiler. The last feature of RTL needed for sin() is embedded C code. It takes the form IPD173 - 3 - October 23, 1991 inline { C code with extensions } inline indicates that the C code is reasonable for the compiler to put in-line in generated code. The interpreter makes no dis- tinction between this and body that is used in examples below. In RTL, the C return statement is replaced by statements that perform an Icon-style return, suspend, or fail. Some forms of these statements also do C-to-Icon type conversions. For many C functions all that is needed is a return statement that does such a conversion. These return statments have the form return C-type expr; The expr must produce a C value of the indicated type. In this example, expr is simply the call to the C sin() function: return C_double sin(x); Note that this use of x is in the scope of the type conversion and refers to a C value. Putting these pieces together and preceding them by an optional comment in the form of a string literal results in the following declaration. Note that while semicolons are needed in inline clause because it is essentially C, in the pure RTL por- tion of the code, they are optional. When in doubt, it is always safe to use semicolons in places they would be used in C. "sin(x), x in radians." function{1} sin(x) if !cnv:C_double(x) then runerr(102, x) abstract { return real } inline { return C_double sin(x); } end Sometimes a straightforward implementation of a C function as an Icon built-in function is rather un-Iconish. For example, the C getenv function returns NULL if the requested environment vari- able does not exist. It makes sense for an Icon version of the function to fail under these circumstances rather than return the null value. The actual implementation of the getenv function is shown below. This function can either produce a result or fail, so it has a maximum result sequence length of 1 and a minimum IPD173 - 4 - October 23, 1991 result sequence length of 0. This is represented by {0,1} in the function header. Note the use of the extented C feature fail in the inline code. "getenv(s) - return contents of environment variable s." function{0,1} getenv(s) if !cnv:C_string(s) then runerr(103,s) abstract { return string } inline { register char *p; if ((p = getenv(s)) != NULL) /* get environment variable */ return C_string p; else /* fail if not in environment */ fail; } Another useful form of type conversion is def. This is used for supplying a default value. It has the form def:C-type(parameter, value) def is similar to cnv. However, if parameter is null, the conver- sion does not fail. Instead, value is used for the converted value. The following built-in function makes the Unix sleep() function available to Icon. It uses a default interval of 1 second. This function uses a return statement with no type conversion. The return statement's expression must evaluate to an Icon value, which is represented in C as a descriptor (see [2]). nulldesc is descriptor that always contains the null value. function{1} sleep(n) if !def:C_integer(n, 1) then runerr(101, n) abstract { return null } inline { sleep((unsigned)n); return nulldesc; } end It is always a good idea to cast a C_integer to the desired C integer type in a C function call as demonstrated here. IPD173 - 5 - October 23, 1991 The C sleep() function returns an integer. This could be used as the result of the Icon built-in function, but the following will not work in the compiler (it will work in the interpreter but is a bad programming practice): inline { return sleep((unsigned)n); } This is because of a coding convention required by the compiler. This requirement is that expressions on return statements do not have side effects. If an Icon program ignores the result of the built-in function, the compiler may eliminate the entire return expression from in-line code, removing the side effect along with the result. An auxiliary variable must be used in cases like this. The Icon built-in function atan() provides an example of a more complex function. It calls either the C function atan() or atan2() depending on whether a second argument is given. It uses a type check of the form is:icon-type(parameter) This type check performs no conversions. It simply checks to see if parameter is of the desired type. Like the type conversions, it is used in an if statement. The code for atan() is "atan(x,y) -- x,y in radians; if y is present, produces atan2(x,y)." function{1} atan(x,y) if !cnv:C_double(x) then runerr(102, x) abstract { return real } if is:null(y) then inline { return C_double atan(x); } if !cnv:C_double(y) then runerr(102, y) inline { return C_double atan2(x,y); } end IPD173 - 6 - October 23, 1991 3.__Writing_Operations_From_Scratch Sometimes new Icon functions are needed that have no counter- parts in a C library, or existing C library functions are too low-level to make reasonable Icon built-in functions. In these cases, more substantial built-in functions must be written. Some facilities are best implemented as generators. Even if there are C functions that provide such facilities, they are, of course, not generators. As explained in Section 2, RTL includes an Icon-style suspend statement that is used to create genera- tors. Like the Icon suspend expression, execution continues after the statement if the operation is resumed. The suspend statement comes in the same forms as the return statement. A result sequence of arbitrary length is indicated by {*} in the opera- tion header. The following code implements the seq() function. "seq(i, j) - generate i, i+j, i+2*j, ... ." function{*} seq(from, by) if !def:C_integer(from, 1) then runerr(101, from) if !def:C_integer(by, 1) then runerr(101, by) abstract { return integer } body { /* * Produce error if by is 0, i.e., an infinite sequence of from's. */ if (by == 0) { irunerr(211, by); errorfail; } /* * Suspend sequence, stopping when largest or smallest integer * is reached. */ while ((from <= MaxLong && by > 0) || (from >= MinLong && by < 0)) { suspend C_integer from; from += by; } fail; } end Note the use of body instead of inline. When it is used, the com- piler does not try to put the code in-line, but instead calls a IPD173 - 7 - October 23, 1991 function that contains the body code. Whether inline or body is used is a value judgement made by the programmer who writes the operation. The choice only affects the compiler, not the inter- preter. irunerr() is a function that acts like runerr(), except that its second argument is a C integer rather than a descriptor. In addition, it does not automatically convert errors into Icon failure when &error is nonzero. All it does is return to the code following its call. The Icon failure is accomplished by error- fail. errorfail acts like fail, but tells the compiler that failure only occurs when error conversion is enabled. errorfail is built into runerr() because runerr() is used extensively, but errorfail supplied as an separate feature of RTL for flexibility when runerr() is not appropriate. Note that there is also a drunerr() function, similar to irunerr(), that takes a C double as a value argument. Some operations are polymorphous and take different actions based on the type of an argument. The type_case statement of RTL can be used for this. It similar to a C switch statement (or an Icon case expression), but selection is based on the type of an argument rather than a value. The built-in function type() makes use of it: "type(x) - return type of x as a string." function{1} type(x) abstract { return string } type_case x of { string: inline { return C_string "string"; } null: inline { return C_string "null"; } integer:inline { return C_string "integer"; } real: inline { return C_string "real"; } cset: inline { return C_string "cset"; } file: inline { return C_string "file"; } procedure:inline { return C_string "procedure"; } list: inline { return C_string "list"; } table: inline { return C_string "table"; } set: inline { return C_string "set"; } record: inline { return BlkLoc(x)->record.recdesc->proc.recname; } co_expression:inline { return C_string "co-expression"; } default: runerr(123,x); } end See [2] for an explanation of the record entry. Icon's storage management system includes a garbage collector. This places constraints on other parts of the run-time system. When a garbage collection occurs, it must be able to locate all IPD173 - 8 - October 23, 1991 references to objects in the string and block regions, and to co-expressions. Operations must be careful to keep all such references visible at times when storage allocations (including malloc()s on some systems) might occur. It is important to note that storage allocations may occur while an operation is suspended. RTL has a tended declaration that insures that a variable is visible to garbage collection. For example, a block pointer is tended by the declaration tended union block *variable; This declaration is used by the key() function while generating the keys of a table. The details of this code that deal with data structure manipulations are beyond the scope of this tutorial. See [2] for an explanation of the code that loops through the table. See [1] for an explanation of the abstract type computa- tion. The implementation of key() function is "key(t) - generate successive keys (entry values) from table t." function{*} key(t) if !is:table(t) then runerr(124, t) abstract { return store[type(t).key] } inline { tended union block *ep; # tended since function suspends register int i; for (i = 0; i < TSlots; i++) { for (ep = BlkLoc(t)->table.buckets[i];ep != NULL;ep = ep->telem.clink) suspend ep->telem.tref; } fail; } end Other types of pointers, including pointers into the string region, can be tended. See [1] for a complete list along with declaration syntax to accomplish the tending. 4.__Additional_Notes RTL files may contain definitions of ordinary C functions to use as support routines for operations. Several RTL features may IPD173 - 9 - October 23, 1991 be used in these otherwise ordinary C functions. These features include tended declarations, and type checking and conversions. Note that the usual convention is to put C functions in separate files from the operation definitions even when they use RTL features. rtt and the Icon compiler contain hard-coded information about Icon's type system. It is currently very difficult to augment this information. Attempting to add new types to Icon is not recommended until this problem is solved. Information needed to write functions more complicated than those presented here can be found by studying [1], [2], and the source code for the run-time system. In particular, for writing many functions, it is necessary to have some understanding of Icon's data structures and its storage management system. Note, however, that malloc() and free() may be used as they are in ordinary C functions. 5.__Building_Icon_with_New_Functions A new function is put in a file in the directory src/runtime. If a new file is created for the function, the Makefile must be updated. Note that the Makefile may be copied from elsewhere dur- ing the Icon configuration process; be sure to update the origi- nal if another configuration will be done. The interpreter requires an entry for the new function in src/h/fdefs.h. The entry consists of a call to the macro FncDef() with the function name as the first argument and the number of parameters to the function as the second argument. When the new function is in place and fdefs.h is updated, recompile icont and the runtime system. The runtime system con- sists of the interpreter iconx and, if Icon includes the com- piler, rt.db and a link library. For the compiler, iconc need not be recompiled; this is because iconc dynamically determines from rt.db what functions are available in the link library. References 1. K. Walker, An Implementation Language for Icon Run-Time Routines, The Univ. of Arizona Icon Project Document IPD79, 1992. 2. R. E. Griswold and M. T. Griswold, The Implementation of the Icon Programming Language, Princeton University Press, 1986. IPD173 - 10 - October 23, 1991