2011-11-02

A proposal for generics in Go

N.B. This is a work in progress. It is intended to help me understand the subject better, elicit comments from more knowledgeable people, and elucidate the nuances of generics in Go's context. The following proposal makes use of a syntax for ease of presentation. The syntax is, however, not intended to be the primary subject, nor is it intended to represent the complexities or possible complications of the underlying mechanisms.

Background

In Go, there are fundamental differences between built-in types and user-defined types. When we try to come up with mechanisms that help treat both of them on a completely equal footing, it appears to be stretching the language along difficult dimensions. Particularly troublesome are issues such as the following.

  • Built-in types (like int and float64) do not have methods; they only have a pre-defined set of operators. Users cannot redefine those operators, nor can they define new operators.
  • Operator overloading is not allowed, preventing user-defined types from exhibiting operator compatibility with the built-in types.
  • Operators are not methods themselves, and hence do not allow a symmetry between the built-in operations and the user-defined.

Rather than explore possible solutions to the above, the current proposal accepts them as facts that will not change. In other words, the inherent differences between built-in types and user-defined types are preserved, and no unification is attempted. By adding one new concept (that of , see hereunder) to the language, backwards-compatibility can be maintained even as generics are introduced into the language.

Storage Classes

Types T1 and T2 belong to the same if elements of T1 can either be assigned to those of T2, or converted to T2, and vice versa. Please note that only conversion is considered, not a type assertion.

Type Classes

Types T1 and T2 belong to the same if the following are true:

  1. T1 and T2 belong to the same storage class and
  2. every operation defined on elements of T1 is also defined on those of T2, and vice versa.

As per the above definition, for instance, int32 and int64 belong to the same type class, while int64 and complex64 do not. Thus, type classes play the equivalent role of interfaces for built-in types. However, type classes do not introduce new concrete types.

Predefined Type Classes

Go language defines a set of predefined type classes for numeric types, as enumerated here.

  • Integer This type class encompasses the following built-in types: int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64 and byte.
  • Float This type class encompasses the following built-in types: float32 and float64.
  • Complex This type class encompasses the following built-in types: complex64 and complex128.

In addition, the set of all named interface types in a program constitutes a single type class. This class is unnamed, and is not accessible to the users; it is defined solely for the purposes of implementation. It is not possible to define new type classes.

Generic Types

A type G with a type parameter T is said to be in T. It is denoted by G<T>. It is allowed for a generic type to be generic in more than one type. Where multiple type parameters need to be specified together, they are comma-separated. Go's generic types are reified, and are amenable to introspection at run time.

A type parameter T can represent any valid, assignable Go type, with the exception of interface{}. The semantics of interface{} interfere with the compile-time checks that generics require. Moreover, the choice of interface{} for a type expressly avoids compile-time specification of a concrete type on the part of the user.

The semantics of generic types are dependent on the type parameters; in particular, value types and pointer types lead to different semantics. Go's generic types are reified, and additional type information is passed through hidden parameters (i.e., the dictionary approach). In the most general case, for efficiency reasons, a distinct reified copy may be generated for each instantiated type class. However, this is an implementation issue.

The dictionary approach incurs some run time cost, but usually does not need the compiler to be available at run time. In my opinion, it is desirable, since that preserves the current Go model.

A type parameter T can appear in the following positions (or sites):

  1. type parameter declaration,
  2. variable type specification,
  3. function/method parameter type specification and
  4. allocation.

Type Parameter Declaration

Type parameter declaration makes a new type parameter available for subsequent use within the applicable scope. Such a declaration does not introduce a new concrete type. It is illegal to declare a type parameter that is unused within that scope; similarly, it is illegal to use a type parameter that is not declared in that scope or any of its enclosing scopes. Type parameters are not shadowed. Hence, it is illegal to declare a type parameter that is already in scope, again.

A generic type G<T> with a specification for T introduces a new concrete type. This new concrete type behaves in full compliance with the rules governing normal Go types, including those for type inference, with explicit exceptions where noted.

Given a generic type G that is dependent on type parameters T1, ..., Tm, it is possible to construct a generic type H that is the same as G with n, n<m type parameters instantiated. This results in H being generic in m-n parameters.

type HashTable<K, V> struct {
    ...
}

type HashSet<K> HashTable<K, bool>

Variable Type Specification

Using a generic type as the type of a variable requires specification of concrete types for the generic type's type parameters. The number of specified type arguments should match that in the generic type's declaration. It is illegal to leave any type parameter uninstantiated.

var h1 HashTable<string, float64>  
var h2 HashTable<string, T>        

It is allowed to specify a type parameter as the type of a variable, as long as the type parameter has a valid declaration in the current scope.

Function/Method Parameter Type Specification

A function or a method can utilize type parameters available in scope. The function/method will then be generic in those type parameters utilized.

func (h *HashTable<K, V>) At(k K) V {
    ...
}

A function/method may itself be generic in one type parameter or more. This is independent of its being generic in one type parameter or more from an outer scope.

func modulus<T>(v T) float64 {
    ...
}

Allocation

In the presence of type parameters, allocation requirements may be determined at run time. The type class of the type parameter determines the allocation mechanism. Also, value types and pointer types have different semantics, as with non-generic types.

type S struct {
    ...
}

var v = new(HashTable<string, S>)

An Example

The following example illustrates the above.

type Vector<T> []T  

func (v *Vector<T>) Push(el T) {  
                                  
    *v = append(*v, el)
}

func (v *Vector<T>) Map<U>(mapper func (T) U) *Vector<U> {  
    u := new(Vector<U>)                                     
    for _, el := range *v {
        u.Push(mapper(el))
    }
    return u
}

Constrained Generic Types

Since an unconstrained generic type cannot assume anything specific about the nature of the types eventually instantiated into its type parameters, the set of possible operations on the elements of such types is very small. Constraints can improve the situation by allowing a wider set of operations. A type parameter can optionally be constrained to only those types that satisfy a particular interface. This enables all the methods from the interface to be utilized in the context of variables belonging to the applicable type parameter.

type Comparable<E> interface {
    Compare(E) int
}

type OrderedSet<E : Comparable<E>> struct {
    ...
}

The above declaration allows ordered sets to be constructed over any type whose elements are ordered according to the Compare() relation.

Similarly, a type parameter can optionally be constrained to only those types that belong to a particular type class.

type Vector<T : Integer> []T  

The above declaration allows a generic vector type to be written with algorithms that provide operations for any integral type.

Constraints on type parameters can appear at all sites except allocation sites. It is illegal to specify more than one constraint for a given type parameter.

No comments: