Variables

18 Jan

Introduction

Ada variables are objects. When developing an Ada compiler, one must definitely take that into account.

Note that the compiler must be able to handle any variable defined in any context. This is because it needs to be capable of running all the available operations on all the constant variables in order to better optimize the output of the compilation. This includes static and dynamic variables (dynamic variables are those that do not have a fixed size.)

Note that "all the operations" includes all the integrity tests that the compiler is expected to apply to all computations.

Basic Type Libraries

In order to allow for such operations to work in the compiler as well as in the resulting output, we want to make use of libraries, one per basic type. Each library defines a set of functions to handle one basic type: data integrity tests, conversions, etc.

Integer Example

Here I present an example of such a definition for the Integer type.

Value

Since I want to support integers of pretty much any size, the definition of the Integer type is dynamic, meaning that the size is specified and a pointer to a buffer is used for the actual value.

The imposed limit is there to avoid exhausting the memory too quickly, but it can be really large (e.g. 1024 bits by default, yet it could grow to 1 MB per number, thus values of some 8 million bits.)

The inline_value flag is used to know whether the value fits a long integer or not. If it doesn't fit, then a buffer is allocated and the value is defined in that buffer. Note that the integer and long_integer types as defined in this definition would be fixed types (i.e. with inline_value = true.)

Note that the value or long_value do not have defaults. Whether the value is initialized is determined using the status flags (see below.) As far as I know, it is not possible to remove the initialization flag since the type and whether it is always initialized to a default value are two completely separate things. Plus, there are cases where even types with a default value do not get assigned their default value automatically.

Note that the size parameter is expected to be constant in most cases. In those cases, it can be optimized later (removed from the structure for that particular type.) For dynamic types, which can vary in size, the size parameter must stay.
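
To make the layout described above more concrete, here is a minimal sketch in Python (used only for illustration, not as the language of the libraries). The field names size, inline_value, long_value and buffer follow the text; the 64-bit width of the long integer, the little-endian buffer and the helper names make_value() and read_value() are my assumptions.

```python
from dataclasses import dataclass
from typing import Optional

LONG_BITS = 64  # assumed width of the inline long integer


@dataclass
class IntegerValue:
    size: int                           # size of the value, in bits
    inline_value: bool                  # True when the value fits long_value
    long_value: Optional[int] = None    # used when inline_value is True
    buffer: Optional[bytearray] = None  # used when inline_value is False


def make_value(number: int, size: int) -> IntegerValue:
    """Store number inline when the size allows it, else in a buffer."""
    if size <= LONG_BITS:
        return IntegerValue(size, True, long_value=number)
    data = number.to_bytes((size + 7) // 8, "little", signed=True)
    return IntegerValue(size, False, buffer=bytearray(data))


def read_value(value: IntegerValue) -> int:
    """Read the number back, whichever representation is in use."""
    if value.inline_value:
        return value.long_value
    return int.from_bytes(value.buffer, "little", signed=True)
```

A 32-bit value stays inline, while a 1024-bit value gets a 128-byte buffer. Whether the value is initialized is deliberately not represented here, since that belongs to the status flags.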

Values are used for:

  • The current value of the data type
  • The invalid value (used when an error occurs)
  • The initialization value of the data type (optional)
  • The minimum value of the data type (may be implied)
  • The maximum value of the data type (may be implied)
  • The modulo of the data type (optional)

For any one type, all of its values are defined in the same way. However, a dynamic integer may still have dynamic entries with different sizes.

Status

Computations on these objects may generate errors. Those are reported directly in the object, and a function can be called to deal with the status accordingly. For instance, one may want to ignore errors caused by values that wrap around.

Overflow

The overflow flag is set whenever the result of a computation is too large to fit in the value. Too large means larger than the maximum value of the data type.

By default, an overflow generates an exception. Set overflow_mask to false to avoid the overflow exceptions.

The result of the computation is saved in the invalid value field for access by the exception.

Underflow

This is the same as Overflow, except that it applies when the result of a computation is lower than the minimum value.

Note that in some circumstances we cannot easily determine whether the result is too small or too large (in the event the value may have wrapped around.) In that case, both the Overflow and Underflow flags are set.

The underflow_mask works like the overflow_mask.
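
The interaction between the flags and their masks can be sketched as follows. This is only an illustration in Python: the flag and mask names come from the text, whereas the Status class, the check_range() function and the IntegerStatusError exception are hypothetical names, and a mask set to true is taken to mean that the exception is raised.

```python
from dataclasses import dataclass


class IntegerStatusError(Exception):
    """Hypothetical exception raised for an unmasked status error."""


@dataclass
class Status:
    overflow: bool = False
    underflow: bool = False
    overflow_mask: bool = True   # True: an overflow raises an exception
    underflow_mask: bool = True  # True: an underflow raises an exception
    invalid_value: int = 0       # out-of-range result, kept for the handler


def check_range(status, result, minimum, maximum):
    """Flag result when out of range; return True when it is valid."""
    status.overflow = result > maximum
    status.underflow = result < minimum
    if not (status.overflow or status.underflow):
        return True
    status.invalid_value = result
    if (status.overflow and status.overflow_mask) or (
            status.underflow and status.underflow_mask):
        raise IntegerStatusError(result)
    return False  # the error is only recorded in the flags
```

With overflow_mask set to false, a result of 11 in a -10 to 10 type only sets the overflow flag and saves 11 as the invalid value, so the caller can decide later what to do with the status.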

Precision

The precision flag is masked by default. It is used to signal errors in computations whose result fits the value but loses bits. Computations will be optimized whenever precision flagging is not required.

A simple example of precision loss is to divide 3 by 2 and then multiply the result by 2. Since 3 is odd, the division returns 1 in integer math. Now when we apply the opposite operation, 1 x 2, we do not get 3 again. This is a loss of precision. In most cases, this check is not turned on since that is the expected behavior of an integer.

With integers, precision checking applies to divisions, modulus, and shifts.

Fixed-point types also check the shifts that happen when copying a value between two different types, since bits may get lost.
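
As a small illustration of the shift case, the check can be done by undoing the operation and comparing, exactly like the 3 / 2 x 2 example. This is a Python sketch under my own assumptions: shift_right() is a hypothetical name and only non-negative values are considered.

```python
def shift_right(value, count):
    """Shift a non-negative value right and report whether bits were lost."""
    result = value >> count
    # Undo the shift: any difference means that low bits were dropped.
    precision_lost = (result << count) != value
    return result, precision_lost
```

Shifting 12 right by 2 gives 3 without any loss, while shifting 13 right by 2 also gives 3 but drops the lowest bit, which is what the precision flag would report.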

Dynamic Constraints

There can be constraints on the values of a type other than just the minimum and maximum values. Constraints are functions attached to a type. For instance, you could define a type as a value from -100 to +100 that only accepts even numbers. The constraint is written as a function, and a pragma attaches that function to the type.

The pragma attaches the even_only() function to the my_percent type. Every time the value of a variable of my_percent type is set, the function is called. If the function returns true, the value is accepted. If the function returns false, the set fails with an exception being raised.
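
The behavior described above can be sketched as follows. This is a Python illustration, not the Ada declaration: even_only and my_percent are the names used in the text, while ConstrainedType, Variable and ConstraintError are hypothetical names that stand in for the type definition, the object and the raised exception.

```python
class ConstraintError(Exception):
    """Hypothetical exception raised when a value is refused."""


def even_only(value):
    return value % 2 == 0


class ConstrainedType:
    def __init__(self, minimum, maximum, constraint=None):
        self.minimum = minimum
        self.maximum = maximum
        self.constraint = constraint  # None when no function is attached

    def accepts(self, value):
        in_range = self.minimum <= value <= self.maximum
        return in_range and (self.constraint is None or self.constraint(value))


class Variable:
    def __init__(self, type_):
        self.type = type_
        self.value = None  # not initialized yet

    def set(self, value):
        # The attached function is called every time the value is set.
        if not self.type.accepts(value):
            raise ConstraintError(value)
        self.value = value


my_percent = ConstrainedType(-100, 100, even_only)
```

Setting a my_percent variable to 42 succeeds, whereas setting it to 43 or to 102 raises the exception and leaves the previous value untouched.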

Note that the function is also used by the succ() and pred() functions. The succ() implementation works with value'type'without_constraint, a sub-type of value'type which is the same type without the constraints.

The range constraint is implemented in the same way. The constraint pointer may be null if there is no range constraint (e.g. a modulo that perfectly fits a multiple of 8 bits.)

We also want to support pragmas to create a type_succ() and a type_pred(). These are a lot faster than the default functions when the next/previous values have large gaps in between, or when complicated validity checks would otherwise slow down the system. In our example, type_succ() would do result := result + 2; instead of the default + 1, which would always be refused the first time.
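
The difference between the two approaches can be sketched in Python as follows. The loop in default_succ() is my assumption of how a default successor would cooperate with the constraint function; default_succ() itself and the use of OverflowError are illustrative choices, only type_succ() and even_only() are names from the text.

```python
def even_only(value):
    return value % 2 == 0


def default_succ(value, maximum, constraint):
    """Try value + 1, value + 2, ... until the constraint accepts one."""
    result = value + 1
    while result <= maximum:
        if constraint(result):
            return result
        result += 1
    raise OverflowError("no successor within the range")


def type_succ(value, maximum):
    """User supplied successor of an even-only type: step by 2 directly."""
    result = value + 2
    if result > maximum:
        raise OverflowError("no successor within the range")
    return result
```

Both functions return 44 for 42, but the default version first tries 43 and gets refused, which is the cost a user supplied type_succ() avoids.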

Note

It would be possible to declare functions with "the correct name and parameter type". Yet, those are reserved to the user so I think it is preferable to have pragmas. Plus, having functions with the correct type sounds like magic (uncontrolled behavior.)

Dynamic

In order to accommodate large numbers, we want to have a large number library. This library handles numbers of arbitrary length (e.g. 4096-bit numbers). When handling small values with those numbers, we want to be able to use small buffers. In this case, we want to be able to resize the value buffer as the value changes over time.

We want to have a flag to ensure that we know that an integer is dynamic. The flag prevents the compiler from optimizing out the size parameter of the variable when a variable is dynamic. Note that the dynamism may be optimized out when it is possible to tell that it is not required (e.g. if the value is defined between -256 and +256, it fits in 16 bits and thus we do not need to use the dynamic size.)

By default values are considered static (pre-allocated with a static size.) However, this flag may be set to true at run time.
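
The test that lets the compiler drop the dynamic size could look something like the Python sketch below. The function names and the list of machine widths are assumptions of mine, and two's complement storage is assumed as well.

```python
def bits_needed(minimum, maximum):
    """Smallest two's complement width, in bits, holding minimum..maximum."""
    bits = 1
    while not (-(1 << (bits - 1)) <= minimum
               and maximum <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


def storage_bits(minimum, maximum, widths=(8, 16, 32, 64)):
    """First machine width that fits, or None if a dynamic size is needed."""
    needed = bits_needed(minimum, maximum)
    for width in widths:
        if needed <= width:
            return width
    return None
```

For the range -256 to +256, bits_needed() returns 10, so the value fits in 16 bits and the dynamic flag can be optimized out. A range that reaches 2 to the power of 100 fits none of the assumed widths and keeps its dynamic size.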

Constant flag

It is very important for the optimizer to know whether a value is a constant. A constant can completely be optimized out since it does not change. Not only that, it can be converted to its machine code in the final program (watch out for necessary debug information.)

Type

All values have a type definition and a pointer to their type. The pointer may be implied in some cases, in which case the type will somehow be saved in the debug data of the program.

The type includes all the data that does not need to be defined in the value directly (e.g. precision mask, range, etc.)

In the other field definitions, the keyword [TYPE] is noted when the type defines that value and not directly the object. This means the value is the same for all the variables.

Types are themselves composed of objects, which means that some basic types need to be declared internally to get started (e.g. integer, boolean, etc.)

Array of Values

When dealing with an array of values, many of the flags are common to all the values. These can be kept in the array definition as one flag. However, some flags need to be duplicated for each value, specifically, the initialization flag needs to be duplicated for each value. Although just one bit is enough, it still represents (array size / 8) bytes rounded up.
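
The per-value initialization flags can be packed as one bit each, as in this minimal Python sketch (the InitFlags class and its method names are hypothetical):

```python
class InitFlags:
    """One initialization bit per array element, packed 8 per byte."""

    def __init__(self, count):
        self.count = count
        self.bits = bytearray((count + 7) // 8)  # count / 8, rounded up

    def mark(self, index):
        self.bits[index // 8] |= 1 << (index % 8)

    def is_initialized(self, index):
        return bool(self.bits[index // 8] & (1 << (index % 8)))
```

An array of 10 values thus needs 2 bytes of flags, and marking element 9 does not affect element 8.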

Note that the overflow, underflow and precision error flags should be repeated too, especially if the mask is false (i.e. no error generated), so they can be checked on a per value basis. On the other hand, the overflow and underflow values can be computed on the fly, so we can have them once in the array definition, and when a value generates such an error, the global flag gets turned on.

Arrays also include boundaries that may be defined on multiple layers in the case of a multi-dimensional array. In this case we want an array of dimensions defining the boundaries of each dimension. The array itself can be one large buffer of data (as opposed to arrays of pointers repeated for all the dimensions except the last, which is the array of data.) Note that in this case we may want to forbid dynamically allocated values, although supporting them would just require an array of sizes and access pointers, one per value.

Note that in this case we can easily pass a slice or sub-array by creating another array header and defining the boundaries one layer down.
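
Here is a rough Python sketch of such a header. It assumes a row-major layout in one flat buffer; the ArrayHeader class, its fields and its methods are hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ArrayHeader:
    bounds: List[Tuple[int, int]]  # (first, last) index of each dimension
    data: list                     # one flat buffer shared by all headers
    offset: int = 0                # where the elements of this header start

    def _stride(self, dim):
        stride = 1
        for first, last in self.bounds[dim + 1:]:
            stride *= last - first + 1
        return stride

    def get(self, *indexes):
        position = self.offset
        for dim, index in enumerate(indexes):
            first, last = self.bounds[dim]
            if not first <= index <= last:
                raise IndexError(index)
            position += (index - first) * self._stride(dim)
        return self.data[position]

    def sub_array(self, index):
        """New header one layer down; the data buffer is not copied."""
        first, last = self.bounds[0]
        if not first <= index <= last:
            raise IndexError(index)
        return ArrayHeader(self.bounds[1:], self.data,
                           self.offset + (index - first) * self._stride(0))
```

For a matrix with bounds (1, 2) and (0, 2), sub_array(2) returns a one-dimension header with bounds (0, 2) that points into the same buffer, so passing a sub-array costs one small header and no copy.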

In the case of an array of arrays, each sub-array has to be defined the same way as the top array. This is important if we want to be able to call a function with a sub-array as a parameter.

Examples of value handling

The definition of value comes along with a large set of functions. With quite heavy optimization (e.g. knowing that some variables are constants of a known static size), the result may end up being one assembly language instruction.

Addition

A statement that assigns the sum of b and c to a is handled with a call to the internal function _integer_lib_add(). Notice that the name of the function is an internal name (it starts with an underscore.) The parameter a is out only, and b and c are in.

The function can then handle all cases as required. The last parameter is used to know whether exceptions should be raised before returning. This is important since internally many operations should not raise exceptions until later.

The implementation of _integer_lib_add() is very complex. It is assumed, however, that the types of a, b and c are all the same, since it is not otherwise possible to write the addition statement. Say a and b are of type MySmallInt and c is of type MyOtherInt; then you would have to explicitly cast c to MySmallInt in the statement to do the addition.

In other words, you do not need to cast within the add function itself (the cast is another operation altogether.)

The first test in the function can be a test of the type.

There are limits to this because we want to be able to add constants, and those can appear as internal constant types, such as when the literal 3 is added to a variable of type MySmallInt. In that case, the compiler has to cast 3 to MySmallInt in some automatic fashion [1] (3 actually uses the special integer type called Universal Integer.) Yet, again, this should be done before calling the add function.

Now that we tested the type, we want to do the addition. Assuming that we always have access to a type that is larger than the largest type the user is given access to, we can write a very much simplified addition around ta := tb + tc, where + is the actual "processor level" addition. Notice that we first convert b and c to a new type that supports the addition without overflows or underflows. Then we do the operation and compare the result.

There cannot be any loss of precision so we do not check that flag.

Division

The basics for the division are the same as for the addition. The main difference is that the division itself needs to be checked for a remainder.

Here we get the quotient and the remainder. If the remainder is not zero, then we have a precision error. Whether the precision error raises an exception depends on the precision_mask flag as stated before.

[1] It is important to note that it is not a true cast because 3 may not be a legal value for the type named MySmallInt and yet the addition may very well be valid.