One of Many Worlds: Draft 2 of a proposal for generics in Go

N.B. This is draft version 2 of a work in progress. It is intended to help me understand the subject better, elicit comments from more knowledgeable people, and elucidate the nuances of generics in Go's context. The following proposal makes use of a syntax for ease of presentation. The syntax is, however, not intended to be the primary subject, nor is it intended to represent the complexities or possible complications of the underlying mechanisms.

Generics

In Go, there are fundamental differences between built-in types and user-defined types. When we try to come up with mechanisms that help treat both of them on a completely equal footing, they seem to stretch the language along difficult dimensions. Particularly troublesome are issues such as the following.

Built-in types (like int and float64) do not have methods; they only have a pre-defined set of operators. Users cannot redefine those operators, nor can they define new operators.
Operator overloading is not allowed, preventing user-defined types from exhibiting operator compatibility with the built-in types.
Operators are not methods themselves, and hence do not allow a symmetry between the built-in operations and the user-defined.

Rather than explore possible solutions to the above, the current proposal accepts them as facts that will not change. In other words, the inherent differences between built-in types and user-defined types are preserved, and no unification is attempted. By adding one new concept (that of type classes, see hereunder) to the language, backwards-compatibility can be maintained even as generics are introduced into the language.

This proposal considers the following points as the base requirements for a working model of generics in Go.

Enabling generics for built-in types as well as user-defined types.
Avoiding boxing/unboxing and implicit casts.
Preserving Go's (package-level) separate compilation model.
Maintaining reasonably high compilation speed.
Promoting (generated) code sharing.
Allowing for reflection of generics.

Storage Classes

Types T1 and T2 belong to the same storage class if elements of T1 can either be assigned to those of T2, or converted to T2, and vice versa. Please note that only conversion is considered, not a type assertion.

Type Classes

Types T1 and T2 belong to the same type class if the following are true:

T1 and T2 belong to the same storage class and
every operation defined on elements of T1 is also defined on those of T2, and vice versa.

As per the above definition, for instance, int32 and int64 belong to the same type class, while int64 and complex64 do not. Thus, type classes play the equivalent role of interfaces for built-in types. However, type classes do not introduce new concrete types.

Predefined Type Classes

Go language defines a set of predefined type classes for numeric types, as enumerated here.

Integer This type class encompasses the following built-in types: int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64 and byte.
Float This type class encompasses the following built-in types: float32 and float64.
Real This type class composes the type classes Integer and Float above.
Complex This type class encompasses the following built-in types: complex64 and complex128.

In addition, the set of all named interface types in a program constitutes a single type class. This class is unnamed, and is not accessible to the users; it is defined solely for the purposes of implementation.

It is not possible to define new type classes.

Inability to define new type classes may be a limitation. However, my current thinking is that the capability to define such may not be a pressing requirement. In any case, such a capability cannot be provided until the mechanisms governing the composition of type classes is unambiguously understood.

Generic Types

A type G with a type parameter T is said to be generic in T. It is denoted by G<T>. It is allowed for a generic type to be generic in more than one type. Where multiple type parameters need to be specified together, they are comma-separated. Go's generic types are reified, and are amenable to introspection at run time.

A type argument T can be any valid, assignable (within the package scope of applicability) Go type, with the exception of interface{}. The semantics of interface{} interfere with the compile-time checks that generics require. Moreover, the choice of interface{} for a type expressly avoids compile-time specification of a concrete type on the part of the user. In a sense, the compile-time mechanisms of generics constitute the conjugate space of that constituted by the run-time mechanisms of interface{}.

The semantics of generic types are dependent on the type parameter constraints (described hereunder) and actual type arguments; in particular, value types and reference types lead to different semantics.

A type parameter T can appear in the following positions (or sites):

type parameter declaration,
variable type specification,
function/method parameter type specification and
allocation.

Type Parameter Declaration

Type parameter declaration makes a new type parameter available for subsequent use within the applicable scope. Such a declaration does not introduce a new concrete type. It is illegal to declare a type parameter that is unused within that scope; similarly, it is illegal to use a type parameter that is not declared in that scope or any of its enclosing scopes. Type parameters are not shadowed. Hence, it is illegal to declare a type parameter that is already in scope, again.

A generic type G<T> with a specification for T introduces a new concrete type. This new concrete type behaves in full compliance with the rules governing normal Go types, including those for type inference, with explicit exceptions where noted.

Given a generic type G that is dependent on type parameters T₁, ..., T_m, it is possible to construct a generic type H that is the same as G with n, n<m type parameters instantiated. This results in H being generic in m-n parameters.

type HashTable<K, V> struct {
    ...
}

type HashSet<K> HashTable<K, bool>

Variable Type Specification

Using a generic type as the type of a variable requires specification of concrete types for the generic type's type parameters. The number of specified type arguments should match that in the generic type's declaration. It is illegal to leave any type parameter uninstantiated, unless such parameter is in scope from an enclosing declaration.

var h1 HashTable<string, float64>  //   Valid.
var h2 HashTable<string, T>        //   Invalid; T is not instantiated.

It is allowed to specify a type parameter as the type of a variable, as long as the type parameter has a valid declaration in the current scope.

It is illegal for a naked type parameter to itself act as a generic type. For instance, the following is illegal.

type A<T, U> struct {
    x T<U>  // Invalid.
}

Function/Method Parameter Type Specification

A function or a method can utilize type parameters available in scope. The function/method will then be generic in those type parameters utilized.

func (h *HashTable<K, V>) At(k K) V {
    ...
}

A function/method may itself be generic in one type parameter or more. This is independent of its being generic in one type parameter or more from an outer scope.

func modulus<T>(v T) float64 {
    ...
}

Allocation

In the presence of type parameters, allocation requirements may be determined at run time. The type class of the type parameter determines the allocation mechanism. Also, value types and reference types have different semantics, as described later.

type S struct {
    ...
}

var v = new(HashTable<string, S>)

An Example

The following example illustrates the above.

// T appears in the position of slice type specification.
type Vector<T> []T

// Note that T needs to be specified again in Vector<T> for it to be in scope.
// T appears in the position of method parameter type specification.
func (v *Vector<T>) Push(el T) {
    *v = append(*v, el)
}

 // U is a method-specific type parameter.
func (v *Vector<T>) Map<U>(mapper func (T) U) *Vector<U> {
    u := new(Vector<U>)  // Allocation depends on U.
    for _, el := range *v {
        u.Push(mapper(el))
    }
    return u
}

Constrained Generic Types

Since an unconstrained generic type cannot assume anything specific about the nature of the types eventually instantiated into its type parameters, the set of possible operations on the elements of such types is very small. Constraints can improve the situation by allowing a wider set of operations.

A type parameter can optionally be constrained to only those types that satisfy a particular interface. This enables all the methods from the interface to be utilized in the context of variables belonging to the applicable type parameter.

type Comparable<E> interface {
    Compare(E) int
}

type OrderedSet<E : Comparable<E>> struct {
    ...
}

The above declaration allows ordered sets to be constructed over any type whose elements are ordered according to the Compare() relation.

Similarly, a type parameter can optionally be constrained to only those types that belong to a particular type class.

type Vector<T : Integer> []T // A vector over all integer types.

The above declaration allows a generic vector type to be written with algorithms that provide operations for any integral type.

Constraints on type parameters can appear at all sites except allocation sites. A constraint on a type parameter need only be specified when it is introduced for the first time. In the example above, for instance, the methods of OrderedSet need not repeat the constraint on E.

It is illegal to specify more than one constraint for a given type parameter.

Implementation Notes

Caveat: I do not understand the current Go compilers. Hence, the following is only a hint. More knowledgeable people should be able to comment on the naïveness, the feasibility and the complexity of implementing such a scheme as what is suggested here. I have represented some of the logic in the form of a high-level pseudocode, in the hope of its making it easier to discuss the same.

The following implementation notes cover various aspects of compilation, linking and reflection.

Compilation of Generics

In Go, compilation of a generic type generates two outputs: (a) generic type metadata and (b) partially compiled form of the generic.

Generic Type Metadata

At least the following metadata is generated for a given generic type, during compilation. This information is packed into a structure, which we refer to as type handle, that is amenable to reflection.

unique identifier of the generic,
number of type parameters,
type handles of all type parameters,
function descriptors of all methods and
a pointer to the method table.

A similar type handle is generated for each type parameter. It contains at least the following metadata.

unique identifier of the type parameter,
a slot to hold the size of an instantiated type,
any constraint specified for it and
a pointer to its entry in the applicable function/method table.

In the case of functions and methods, a function descriptor is generated. As with a type handle, this too is amenable to reflection. It contains at least the following metadata.

unique identifier of the function/method and
type handles of all type parameters.

It should be noted that the implementation distinguishes function-specific, or method-, type parameters from those declared in any applicable enclosing generic type.

If the generic type or function is exported, the above information is exported from the package of origin, and is available to the compiler, the linker and the runtime.

Partial Compilation

Each type parameter of the generic type serves as a slot that is filled for each instance of the generic type. A pointer to the metadata structure mentioned above fills this slot.

When compiling code, the size of the involved data types needs to be known. The following mechanism is used in order to arrive at one, for each type parameter.

When a generic type is compiled, the compiler first checks to see if its type parameters are constrained. For each type parameter constrained to a type class, the size of its elements is chosen to be that of the widest type (not in the variance sense, since Go does not support classical inheritance, but in the storage sense) in that class.

For each type parameter that is either constrained to an interface or unconstrained, the size is looked up from its type handle. This size is only a placeholder; it is filled, and may differ, for each instantiation of the generic type. This is applicable to all user-defined types supplied as value type arguments to any generic type.

compileGeneric() {
    if generic is a struct or an interface {
        generate type handle metadata placeholder
    } else {  // Function or method.
        generate a function descriptor placeholder
    }

    for each type-parameter {
        if type-parameter is constrained {
            if constrained to a type-class {
                use the size of the widest type in class
                point to the predefined pseudo-method-set of this type-class
            } else {  // Constrained to an interface.
                set a mark to obtain the size from type handle
                setup, or point to, the method set of the interface
            }
        } else {
            set a mark to obtain the size from type handle
            point to a predefined empty method set
        }
    }

    type check the code as per the above type handles and method sets

    generate Go AST using the above type handles and function descriptors
    serialize the generated AST into the package binary
    set pointers for functions, methods and method tables
}

Notes

The choice of an AST representation of the generic's code is with the intention of preserving the separate compilation model of Go. It also relieves us of the need to invent a new exportable intermediate language; we just use a standard part of the language. The package go/ast may have to be enhanced to support this requirement.
Please note that the generic code is type checked only once, in its package of origin. For each instantiation, the supplied type arguments are checked only against any specified constraints. This speeds up compilation, since there is no need for repeated full type checks. Even though we are using an approach opposite to that of C++ (where type checks are completely use-site based), we are not going anywhere near H-M type system.
The above mechanism - as described - is not space-efficient for type parameters constrained to type classes. For instance, a vector of bytes will occupy more than a byte per byte. This is a trade-off between monomorphizing (code bloat) and user data space-efficiency. However, this could be tuned.
This look up of the actual size from the corresponding type handle introduces a small run time overhead. However, it greatly facilitates code sharing. It may be possible, and even easy, to avoid this overhead through the use of some clever trick. I do not know enough!
I also have not thought through any possible repercussions of differently-instantiated recursive functions and methods. For instance, foo<T1>(x T1) may have a call to foo<T2>(y), where y has type T2.

Compilation of Client Code

Once the generic type is compiled as above, client code can be written to instantiate and use it.

Metadata

When the generic type is instantiated in client code, and type parameters are specified, those type arguments are substituted in a copy of the generic's metadata. The result is a new concrete type. For most type checking purposes, the generic origins of this new concrete type can be ignored, as long as the applicable constraints are taken into account.

It should be noted that the unique identifier of the generic can be used to trace this instance back to its generic type definition. Similarly for type parameters and their constraints. This filled copy of the metadata is available to the compiler, the linker and the runtime.

Instance Code Generation

Generation of machine code for a specific instance of the generic type depends on factors such as whether a supplied type argument is a value type or not, and whether it is constrained or not. The following high-level logic could help illustrate the mechanism.

generateClientCode() {
    create a local copy of the generic type's metadata structure
    fill the metadata with actual type arguments

    // Now we have a new concrete type.
    if new concrete type in package dictionary {
        // Not really a new type.
        register this type to use the corresponding code
        return
    }

    if new concrete type type-class-compatible with an existing type {
        // We can still share code.
        // Also note that interface types also fit in here.
        register this type to use the corresponding code
        return
    }

    read the serialized AST of the generic from the package of origin

    for each type-argument {
        type check against any applicable constraint

        if type-argument is value-type {
            handleValueType(type-argument)
        } else {
            handleReferenceType(type-argument)
        }
    }

    generate binary code from the AST, filling type information

    register new concrete type in package dictionary with pointer to generated code
}

//

handleValueType(type-argument) {
    if type-argument belongs to a built-in type-class {
        use the pre-selected (widest) element size
    } else {
        use the actual size of the given type
    }
}

handleReferenceType(type-argument) {
    use the predefined size of standard interface
}

The package dictionary referred to is always exported, i.e., it is available to the compiler, the linker and the runtime. This is independent of whether a particular generic instance is itself exported.

I currently think that the above exported information need not contain package namespace qualifiers. But, I could be wrong in thinking so, and namespace qualifiers may be necessary for some purpose.

In order to facilitate the process of looking up the size of a given value type at run time, a hidden parameter is inserted into all applicable function and method signatures. The corresponding type handle of the supplied type argument is passed into the said hidden parameter at run time. When multiple type arguments are involved, a single consolidated meta handle of their type handles is passed into the hidden parameter. This simplifies the internal signatures of the functions/methods.

Notes

Since Go does not support traditional inheritance, there is no transitive closure of types that we need to take into account. Thus, the amount of code explosion is much less than what it would be had transitive closure been applicable. It should be noted that this applies independent of whether generic metadata is used to reuse generated code or not. But, the combination of the two results in a big win over what causes code bloat otherwise.
A topic that is not discussed relates to partial specialization. It can be dealt with once the mechanisms governing the simpler cases are understood and established.

Linking

During linking, the generic metadata exported from various packages is checked to see if there are redundant copies of:

the same metadata in different packages and
the same code in different packages (library files).

A global dictionary is created by merging the above information from all packages. This global dictionary is constructed using the exported package-level dictionaries themselves. The detected redundant copies are not linked into the final executable.

Reflection Capabilities

Reflection of generic types at run time is possible. Type handles and function descriptors are utilized for this purpose. At least the following capabilities are provided through the reflection API.

Query if a type is generic.
Query the number of type parameters.
Query the number of type-specific parameters.
Query the number of function-specific (or method-) parameters.
Retrieve specific or all type handles.
Retrieve the information within a type handle or a function descriptor.

The specific form of the API can be determined later to suit the philosophy of the current reflection API.

3 comments:

Gln Brwn said...: In my experiments with the grammar, I found the '' syntax to be incompatible with semicolon insertion, but that '~(T)' worked, and could be simplified to ~T when T is a simple type name.

Haskell uses type classes ( http://learnyouahaskell.com/types-and-typeclasses#typeclasses-101 ) . We'd do well to consider them.

Your idea of type classes for basic types could naturally be extended to interfaces by defining an interface{..} for each type class, but only if something like 'receiver type matching' a.k.a 'Generic Operators' at http://learnyouahaskell.com/types-and-typeclasses#typeclasses-101 were implemented. But if one does this and also adds inter-module inlining of small methods, which seems inevitable in the long run, then you can write generics to the interface instead of using basic operators like '<' and '==' and not sacrifice runtime performance. To me, this seems the simplest full-featured design.

write generics to the interface (instead of using basic operators like '<' and '==') to avoid having; 2 May 2012 at 04:49
Gln Brwn said...: That should be "'<T>' syntax", not "'' syntax" but blogspot ate the apparent-html markup...; 2 May 2012 at 04:52
Unknown said...: Thanks for the comment Glenn! Yes, perhaps `<T>' isn't the best syntax. I am fine with any unambiguous syntax. On the latter point, though, you may have noticed that I expressly do not propose user-definable interface-like behaviour for primitive types. The proposed type classes are compiler magic. My proposal also assumes that Go will not have an equivalence between operators and methods (implied by interfaces).; 3 May 2012 at 08:58

2011-11-11

Draft 2 of a proposal for generics in Go

Generics

Storage Classes

Type Classes

Predefined Type Classes

Generic Types

Type Parameter Declaration

Variable Type Specification

Function/Method Parameter Type Specification

Allocation

An Example

Constrained Generic Types

Implementation Notes

Compilation of Generics

Generic Type Metadata

Partial Compilation

Notes

Compilation of Client Code

Metadata

Instance Code Generation

Notes

Linking

Reflection Capabilities

3 comments: