2014-03-04

On Effects

Background

Today, I happened to watch a panel discussion from Lang.NEXT 2012. It featured Anders Hejlsberg, Martin Odersky, Gilad Bracha and a certain Peter Alvaro, and was moderated by the inimitable Erik Meijer. At the end of what turned out to be a very interesting discussion, there was a Q&A session. The last few questions related to effects-checking at the language level. Odersky responded that effects-checking in static type systems is currently at about the same primitive level that static typing itself was at in Pascal: clunky and cumbersome!

Watching that video finally prompted me to collect my thoughts into a written note.

The Beginning

The first time I pondered the question of effects was around 1994-5, when I started working on C++ programs ranging from a hundred thousand to a million lines of code. A prominent stimulus was the const annotation on methods: the C++ compiler could detect changes made to the current object by a method so annotated. In particular, I took note of the transitive nature of const, which was simultaneously useful and painful. The const-ness problem in C++ is well known and well documented.

Curiously, a method annotated const could still call external code, perform I/O, etc., as long as it did not mutate the current object. Often, I wanted a guarantee that a const method would not get too clever and perform unspecified operations. This was particularly true of third-party code shipped without source. But C++ provided no mechanism to declare and enforce any such restriction.

Unmet Need

I went on to write programs for the IBM 360/370 family of mainframe operating systems, a few flavours of UNIX, Windows NT, etc. Over the years, I repeatedly felt the same need for better guarantees on methods (and functions/procedures). I found it interesting that none of the languages that I worked with, in any of those environments, provided a solution to this problem.

Every once in a while, I would think of effects. Those were mostly unstructured thoughts, though. Moreover, I was a typical applications programmer, with no formal background in computer science or programming languages. Having moved from theoretical physics into programming, I often tried drawing analogies and parallels. Some of them were useful, sometimes and to some extent, but they always broke down eventually.

Explicit Effects

By 1998-9, I had begun developing a better appreciation for dynamically-typed languages. Not weakly-typed languages such as Perl and Tcl, but strongly-typed ones such as Smalltalk, Python and (later) Ruby. I had accidentally come across the Smalltalk Blue Book by Goldberg and Robson. It opened my eyes to several new windows and doors! I employed Python and Ruby in a variety of projects, with great results for my clients. In the process, for a few years, I did not explore statically-typed languages. Nonetheless, the issue of effects surfaced time and again, particularly as the sizes of code bases and teams increased.

I returned to a large (more than a million LoC) C++ project in 2002, and that work stirred my thoughts on effects yet again. Based on my experiences, I began collecting a wish list of the kinds of effects that I wanted the compiler to track. Towards that end, I began comparing my thoughts with the facilities provided by the languages that I used or became aware of.

Java

Java's checked exceptions force a method either to handle a checked exception thrown by the code it calls, or to let it propagate and declare so in the method signature. While no other effects can be declared, a consumer knows from the signature that such a method may, instead of returning a value, throw one of the declared exceptions.
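A minimal Java sketch of that rule follows; the class and method names are hypothetical. A caller of a method that declares throws IOException must either re-declare the exception or catch it. Omitting both is a compile-time error, which is what turns the exception into an effect the compiler tracks.

```java
import java.io.IOException;

// Hypothetical example; names are illustrative only.
final class CheckedExceptions {
    // The throws clause advertises the effect: instead of returning, this may throw IOException.
    static String readSetting(boolean unavailable) throws IOException {
        if (unavailable) {
            throw new IOException("setting unavailable");
        }
        return "value";
    }

    // Choice 1: do not handle it; re-declare it, so the effect propagates to our callers.
    static String forward(boolean unavailable) throws IOException {
        return readSetting(unavailable);
    }

    // Choice 2: handle it, which stops the propagation; no throws clause is needed.
    static String withDefault(boolean unavailable) {
        try {
            return readSetting(unavailable);
        } catch (IOException e) {
            return "default";
        }
    }

    // Removing both the try/catch and the throws clause from a caller will not compile.
}
```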

Haskell

I came across Haskell in 2003. I found the basics easy enough to follow, and wrote small exercise programs to gain some familiarity with it. In those days, there were few easy tutorials for beginners, so I frequently had to dig through the scant documentation. As I read about Haskell, three interesting aspects stood out [1]:

  • all Haskell functions are technically unary,
  • its system of type classes, and
  • its effects system.

The last of these, of course, is relevant to the current discussion. Haskell does not require us to say anything specific about a pure function. On the other hand, when a function is not pure, Haskell requires us to use an appropriate type (usually a monad) to indicate the specific manner in which the function causes side effects. User-defined monads can thus specialise the kind of effects caused by a function. Once defined and used, these monads are checked by Haskell's powerful type system to ensure consistency of use across the program.

Much later, I happened to watch the video of a talk by Erik Meijer, in which he remarked that there are many ways for a function to be impure, but only one way to be pure. And the dots connected!

D

D follows the approach of C++, but takes it further. Unlike a const method in C++, a pure function in D must be free of side effects. This is a much stronger guarantee, and helps significantly. However, notice the difference between the philosophies of Haskell and D: functions (and methods) in D are assumed impure by default, and thus pure functions (and methods) have to be explicitly marked pure.

Nimrod

An interesting variation can be found in Nimrod. It provides some pragmas to specify effects[2]. In particular, we can specify the possible exceptions thrown by a proc or a method. If it does not declare any exceptions, it is assumed to throw the base exception type. To avoid that, it has to expressly declare an empty list of exceptions.

There are plans to implement read and write tracking in Nimrod. In addition, an interesting feature is the capability to tag a proc or a method with some types. The meaning of those types is ascribed by the user; Nimrod doesn't appear to care! However, once specified, these tagged types are tracked by the compiler analogously to how exceptions are tracked. Thus, it provides an expressive mechanism to introduce user-defined effect types as long as they behave similarly to exceptions.

Evolution of My Thoughts

The numerous projects that I worked on shaped the development of my own thoughts on effects. Apart from working on huge assembler, COBOL, PL/I and Rexx code bases on IBM mainframes, I worked on large projects that used C, C++, Java, Python, Ruby, etc. in a wide variety of application domains. Particular combinations of application domains and languages sometimes led to specific realisations.

Tracking Effects

I believe that effects tracking can be effectively implemented in both statically-typed and dynamically-typed languages. Type systems for effects appear to be orthogonal to those for values. Accordingly, the following discussion does not distinguish between the static vs. dynamic nature of types for values. Similarly, it does not distinguish between object-oriented and non-object-oriented languages. It does, on the other hand, assume that there is an ahead-of-time or just-in-time compilation phase, i.e. parsing the source should not produce an AST that is executed directly.

Compiler-Defined Effects

An analysis of the program by the compiler is necessary for any effects system to be useful. The signature of each function or method in the program has to be verified against the inferred effects of that function or method. All deviations have to be reported as errors, and the compiler should refuse to compile such a program. Effects should be annotated as any combination of the following:

  • mutates: mutates object state,
  • mutates_params: mutates one or more input parameters passed by reference,
  • reads: reads input from the world (heap, message queues, files, network, etc.),
  • writes: writes output to the world (heap, message queues, files, network, etc.),
  • tainted: invokes untrusted external code,
  • recursive: may not return due to self recursion,
  • i_recursive: may not return due to mutual or more indirect recursion, and
  • throws: may throw one or more exceptions.

A function or method with none of the above effects is considered pure: it is a mathematical function!
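To make the verification rule concrete, here is a small Java sketch, assuming effects are modelled as an enum; the names Effect and EffectSignature are hypothetical and not any real compiler's API. It reads "all deviations" strictly: an effect that is inferred but not declared, or declared but not inferred, is reported. A real language might reasonably permit over-declaration instead.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical encoding of the compiler-defined effects listed above.
enum Effect { MUTATES, MUTATES_PARAMS, READS, WRITES, TAINTED, RECURSIVE, I_RECURSIVE, THROWS }

final class EffectSignature {
    // Pure: none of the effects is present.
    static boolean isPure(Set<Effect> effects) {
        return effects.isEmpty();
    }

    // Effects present in exactly one of the declared and inferred sets;
    // under the strict reading above, each one is a compile-time error.
    static Set<Effect> deviations(Set<Effect> declared, Set<Effect> inferred) {
        EnumSet<Effect> result = EnumSet.noneOf(Effect.class);
        result.addAll(inferred);
        result.removeAll(declared);          // inferred but not declared
        EnumSet<Effect> unused = EnumSet.noneOf(Effect.class);
        unused.addAll(declared);
        unused.removeAll(inferred);          // declared but not inferred
        result.addAll(unused);
        return result;
    }
}
```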

User-Defined Effects

User-defined effects are second-class citizens. They are tracked like compiler-defined ones, but the compiler itself cannot interpret the meaning of such effects.

Propagation of Effects

throws and user-defined effects are different: it should be possible to handle them, and thereby stop their propagation. Whether the matching of handled exceptions is nominal, involves subtyping, etc., depends on the type system of the language. All other effects are fundamentally transitive in nature: each propagates up the call hierarchy from where it occurs.

This mandates run-time compilation for languages supporting fully separate compilation of modules/packages/…. Cross-boundary passing of lambdas and methods to higher-order functions and methods necessitates dynamic compiler checks at run time. Violations of effects guarantees should lead to a designated run-time exception that cannot be handled.

This could still be avoided if whole-program analysis were possible upon dynamic linking. But maybe that opens a different Pandora's box!
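The propagation rules described above can be sketched as a fixpoint computation over a call graph. The model is a hypothetical simplification: each function has a set of local effects and a list of callees, and some functions are marked as handling every exception their callees throw. Every effect except THROWS flows from callee to caller; THROWS stops at a handler. Iterating until nothing changes keeps recursive call chains from looping forever. The code assumes every caller and callee appears in the map of local effects.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: names and data structures are illustrative, not a real compiler's.
final class EffectPropagation {
    enum Effect { MUTATES, READS, WRITES, THROWS }

    // Infers each function's effects from its own local effects and those of its callees.
    // Functions listed in handlers handle THROWS coming from their callees.
    static Map<String, Set<Effect>> infer(Map<String, Set<Effect>> local,
                                          Map<String, List<String>> calls,
                                          Set<String> handlers) {
        Map<String, Set<Effect>> inferred = new HashMap<>();
        for (Map.Entry<String, Set<Effect>> entry : local.entrySet()) {
            EnumSet<Effect> effects = EnumSet.noneOf(Effect.class);
            effects.addAll(entry.getValue());
            inferred.put(entry.getKey(), effects);
        }
        // Repeat until nothing changes, so recursive call chains still terminate.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<String>> entry : calls.entrySet()) {
                String caller = entry.getKey();
                Set<Effect> callerEffects = inferred.get(caller);
                for (String callee : entry.getValue()) {
                    for (Effect effect : inferred.get(callee)) {
                        // THROWS stops at a handler; every other effect is transitive.
                        if (effect == Effect.THROWS && handlers.contains(caller)) {
                            continue;
                        }
                        changed |= callerEffects.add(effect);
                    }
                }
            }
        }
        return inferred;
    }
}
```

For example, if load reads a file and may throw, parse calls load, and main calls parse inside a handler, then parse is inferred to have READS and THROWS, while main has only READS.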


[1] At that time, I could not comprehend the machinery behind those. Not that I fully comprehend it now, either; my current understanding is only marginally better!

[2] http://nimrod-lang.org/manual.html#effect-system

2014-01-08

Another go at Go ... failed!

After a considerable gap, I gave Go another go!

The Problem

As part of a consulting engagement, I accepted a project to develop some statistical inference models in the area of drug (medicine) repositioning. Input data comprises three sets of associations: (i) between drugs and adverse effects, (ii) between drugs and diseases, and (iii) between drugs and targets (proteins). With drugs as hub elements, associations are inferred pair-wise between the other three kinds of elements.

The statistics actually computed range from simple measures such as sensitivity (e.g. how sensitive is a given drug to a set of query targets?) and clustering coefficients of the multi-mode graph, through the construction of rather complex confusion matrices and the computation of measures such as the Matthews Correlation Coefficient, to the construction of generalised profile vectors for drugs, diseases, etc. Accordingly, the computational intensity varies considerably across parts of the models.

For the test subset of input data, the in-memory graph of direct and transitive associations currently has about 15,000 vertices and over 14,000,000 edges. This is expected to grow by two orders of magnitude (or more) when the full data set is used for input.
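For readers unfamiliar with the confusion-matrix measures mentioned above, the following Java sketch computes sensitivity and the Matthews Correlation Coefficient from true/false positive/negative counts, using their standard textbook definitions. It is not the project's code, and the project's own notion of a drug's sensitivity may be defined differently. Returning 0 when a denominator is 0 is a common convention, assumed here; products are taken in double to avoid overflow on large counts.

```java
// Standard textbook definitions from confusion-matrix counts; not the project's actual code.
final class ConfusionMeasures {
    private ConfusionMeasures() {}

    // Sensitivity (recall, true positive rate): TP / (TP + FN).
    static double sensitivity(long tp, long fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    // Matthews Correlation Coefficient, in [-1, 1]:
    // (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    static double mcc(long tp, long tn, long fp, long fn) {
        double numerator = (double) tp * tn - (double) fp * fn;
        double denominator = Math.sqrt((double) (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        // Assumed convention: report 0 when any marginal sum is 0.
        return denominator == 0.0 ? 0.0 : numerator / denominator;
    }
}
```

For instance, with TP = 6, TN = 3, FP = 1 and FN = 2, sensitivity is 6/8 = 0.75, and the MCC is 16/sqrt(1120), roughly 0.478.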

Programming Language

Initially, I was somewhat tempted to prototype the first model (or two) in a language like Ruby. Giving due weight to the volume of data, though, I decided to use Ruby for ad hoc validation of parts of the computations, with the coding proper happening in a faster, compiled language. I have been using Java for most of my work (both open source and for clients). However, considering that statistics instances are write-only, I hoped that Go could help me parallelise the computations easily[1].

My choice of Go caused some discomfort to my client's programmers, since they would have to maintain the code down the road. Nevertheless, no serious objections were raised. So, I went ahead and developed the first three models in Go.

Practical Issues With Go

The Internet is abuzz with success stories involving Go; I have no additional perspective to add there! The following factors, in no particular order, inhibited my productivity as I worked on the project.

No Set in the Language

Through (almost) every hour of this project, I found myself needing an efficient implementation of a set data structure. Go does not have a built-in set; it has arrays, slices and maps (hash tables). And Go lacks generics. Consequently, any generic data structure that is not built into the language cannot be implemented, with type safety, in a library. I ended up using maps as sets. Everyone who does that realises the pain involved, sooner rather than later. Maps provide uniqueness of keys, but I needed sets for their set-like properties: being able to compute minus, union, intersection, etc. I had to code those in-line every time. I have seen several people argue vehemently (even arrogantly) on golang-nuts that it costs just a few lines each time, and that it makes the code clearer. Nothing could be further from the truth. In-lining those operations only reduced readability and obscured my intent. I had to consciously train my eyes to recognise those blocks as union, intersection, etc. They were also very inconvenient when I tried different sequences of computations for better efficiency, since a quick glance never sufficed!

Also, I found the performance of Go maps wanting. Profiling showed that get operations consumed a significant share of the total running time. Of course, many of those get operations were actually checks for the presence of a key.
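For contrast, in a language with generics such as Java, the set operations described above can be written once and reused for any element type. A minimal sketch follows; SetOps is a hypothetical helper, not a standard class, built on the standard addAll, retainAll and removeAll methods of java.util.Set. Each operation copies its first argument, so the inputs are left untouched.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: written once, usable for every element type.
final class SetOps {
    private SetOps() {}

    // Elements in a or b.
    static <T> Set<T> union(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    // Elements in both a and b.
    static <T> Set<T> intersection(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Elements in a but not in b.
    static <T> Set<T> minus(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.removeAll(b);
        return result;
    }
}
```

A call such as SetOps.intersection(queryTargets, drugTargets) then states its intent at a glance, whatever the element type.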

No BitSet in the Standard Library

Since the performance of maps was holding the computations back, I investigated the possibility of changing the algorithms to work with bit sets. However, there is no BitSet or BitArray in Go's standard library. I found two packages in the community: one on code.google.com and the other on github.com. I selected the former both because it performed better and because it provided convenient iteration over only the bits set to true. Mind you, the data is mostly sparse, and hence both were desirable characteristics.

Incidentally, both bit set packages showed varying performance. I could not determine the sources of those variations, since I could not easily construct test data to reproduce them on a small scale. A well-tested, high-performance bit set in the standard library would have helped greatly.
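For comparison, Java's standard java.util.BitSet supports both of the needs described above. The sketch below uses its documented nextSetBit idiom to visit only the bits set to true, and clone plus and() for an intersection that leaves the inputs intact; the class name BitSetOps and the use of bit indices as vertex ids are assumptions for illustration. One caveat for sparse data: java.util.BitSet allocates storage up to its highest set bit, so its memory grows with the largest index rather than with the number of set bits.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Illustrative use of java.util.BitSet; bit indices stand in for hypothetical vertex ids.
final class BitSetOps {
    private BitSetOps() {}

    // Visits only the bits set to true, using the nextSetBit idiom.
    static List<Integer> members(BitSet bits) {
        List<Integer> ids = new ArrayList<>();
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            ids.add(i);
            if (i == Integer.MAX_VALUE) {
                break; // avoid overflow of i + 1
            }
        }
        return ids;
    }

    // Intersection without mutating the inputs: clone, then AND in place.
    static BitSet intersection(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }
}
```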

Generics, or Their Absence

The general attitude in the Go community towards generics seems, unfortunately, to have degenerated into a mix of disgust and condescension. Well-made cases illustrating problems best served by generics are being dismissed with such impudence and temerity as to cause repulsion. That Russ Cox's original formulation of the now-famous trilemma is incomplete at best has not sunk in despite four years of discussions. Enough said!

In my particular case, I have six sets of computations that differ in:

  • types of input data elements held in the containers, and upon which the computations are performed (a unique combination of three types for each pair, to be precise),
  • user-specified values for various algorithmic parameters for a given combination of element types,
  • minor computational steps and
  • types (and instances) of containers into which the results aggregate.

These differences meant that I could not write common template code that could be used to generate six versions using extra-language tools (as inconvenient as that already is). The amount of boiler-plate needed externally to handle the differences very quickly became both too much and too confusing. Eventually, I resorted to six fully-specialised versions each of data holders, algorithms and results containers, just for manageability of the code.

This had an undesirable side effect, though: now, each change to any of the core containers or computations had to be manually propagated to all the corresponding remaining versions. It soon led to a disinclination on my part to quickly iterate through alternative model formulations, since the overhead of trying new formulations was non-trivial.

Poor Performance

This was simply unexpected! With fully-specialised versions of graph nodes, edges, computations and results containers, I was expecting very good performance. Initially, it was not very good. In single-threaded mode, a complete run of three models on the test set of data took about 9 minutes 25 seconds. I re-examined various computations. I eliminated redundant checks in some paths, combined two passes into one at the expense of more memory, pre-identified query sets so that the full sets need not be iterated over, etc. At the end of all that, in single-threaded mode, a complete run of three models on the test set of data took about 2 minutes 40 seconds. For a while, I thought that I had squeezed it to the maximum extent. And so thought my client, too! More on that later.

Enhancement Requests

At that point, my client requested three enhancements, two of which affected all six + six versions of the models. I ploughed through the first change and propagated it through the other eleven specialised versions. I got a full taste of what was to come, though, when it hit me that I was still working on Phase 1 of the project, which had seven proposed phases in all!

Back to Java!

I took a break of one full day, and did a hard review of the code (and my situation, of course). I quickly identified three major areas where generics and (inheritance-based) polymorphism would have presented a much more pleasant solution. I had already spent 11 weeks on the project, the bulk of that going into developing and evaluating the statistical models. With the models now ready, I estimated that a re-write in Java would cost me about 10 working days. I decided to take the plunge.

The full re-write in Java took 8 working days. The ease with which I could model the generic data containers and results containers came as no surprise. Java's BitSet class was of tremendous help. I had some trepidation about the algorithmic parts. However, they turned out to be easier than I anticipated! I made the computations themselves parts of formally-typed abstract classes, with the concrete parts, such as substitution of actual types, the user-specified parameters and minor variations, implemented by the subclasses. Conceptually, it was clear and clean: the base computations were easy to follow in the abstract classes. The overrides were clearly marked as such, and were narrowly focused.
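A hypothetical Java sketch of the structure just described: the abstract class owns the shared algorithm, and each final subclass supplies the element types, a user-specified parameter, the minor step that differs and the results container. The names, the String ids and the 0/1 score are illustrative assumptions, not the project's actual code or models.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Shared algorithm, written once; A and B are element types, R is the results container.
abstract class PairComputation<A, B, R> {
    protected abstract double threshold();          // user-specified parameter
    protected abstract double score(A a, B b);      // minor computational step
    protected abstract R newResults();              // results container
    protected abstract void store(R results, A a, B b, double score);

    // Core computation, identical for every combination of types.
    final R run(List<A> left, List<B> right) {
        R results = newResults();
        for (A a : left) {
            for (B b : right) {
                double s = score(a, b);
                if (s >= threshold()) {
                    store(results, a, b, s);
                }
            }
        }
        return results;
    }
}

// One hypothetical specialisation: drugs and targets as plain String ids.
final class DrugTargetComputation extends PairComputation<String, String, List<String>> {
    private final Map<String, Set<String>> knownTargets;

    DrugTargetComputation(Map<String, Set<String>> knownTargets) {
        this.knownTargets = knownTargets;
    }

    @Override protected double threshold() { return 0.5; }

    // Illustrative score: 1.0 for a known association, otherwise 0.0.
    @Override protected double score(String drug, String target) {
        Set<String> targets = knownTargets.get(drug);
        return targets != null && targets.contains(target) ? 1.0 : 0.0;
    }

    @Override protected List<String> newResults() { return new ArrayList<>(); }

    @Override protected void store(List<String> results, String drug, String target, double score) {
        results.add(drug + "->" + target);
    }
}
```

Adding another of the six computations then means writing one more small subclass, while a change to run() reaches all of them at once.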

Naturally, I expected a reduction in the size of the code base; I was not sure by how much, though. The actual reduction was by about 40%. This was nice, since it came with the benefit of more manageable code.

The most unexpected outcome concerned performance: a complete run of the three models on the test set of data now took about 30 seconds! My first suspicion was that something went so wrong as to cause a premature (but legal) exit somewhere. However, the output matched what was produced by the Go version (thanks Ruby), so that could not have been true. I re-ran the program several times, since it sounded too good to be true. Each time, the run completed in about 30 seconds.

I was left scratching my head. My puzzlement continued for a while, before I noticed something: the CPU utilisation reported by /usr/bin/time was around 370-380%! I was now totally stumped. conky showed that all processor cores were indeed being used. How could that be? The program was very much single-threaded.

After some thought and Googling, I identified a few factors that could potentially have enabled the use of multiple cores.

  • All the input data classes were final.
  • All the results classes were final, with all of their members being final too.
  • All algorithm subclasses were final.
  • All data containers (masters), the multi-mode graph itself, and all results containers had only insert and look-up operations performed on them. None had a delete operation.

Effectively, almost all of the code involved only final classes. And, all operations were append-only. The compiler may have noticed those; the run-time must have noticed those. I still do not know what is going on inside the JRE as the program runs, but I am truly amazed by its capabilities! Needless to say, I am quite happy with the outcome, too!

Update: As several responses (both here and on Hacker News) pointed out, Java's multi-threaded GC appears to be the primary reason for the utilisation of all the processor cores.

Conclusions

  • If your problem domain involves patterns that benefit from type parameterisation or[2] polymorphism that is easily achievable through inheritance, Go is a poor choice.
  • If you find your Go code evolving into having few interfaces but many higher-order functions (or methods) that resort to frequent type assertions, Go is a poor choice.
  • Go runtime can learn a trick or two from JRE 7 as regards performance.

These may seem obvious to better-informed people; but to me, they were something of an enlightenment!


[1] I tried Haskell and Elixir as candidates, but nested data holders with multiply circular references appear to be problematic to deal with in functional languages. Immutable data presents interesting challenges when it comes to cyclic graphs! The solutions suggested by the respective communities involved considerable boiler-plate. More importantly, the resulting code lost direct correspondence with the problem's structural elements. Eventually, I abandoned that approach.

[2] Not an exclusive or.