2012-12-27

Return to blogging!

After several months of silence, here I return to blogging! A few quick updates are in order, in no particular order.

Trivia

  • When the battery of my MacBook Pro began failing in May, I purchased a relatively low-end HP Pavilion dv6 7040TX pre-installed with Windows 7. I mostly like it. It generates very little heat. By contrast, the MacBook Pro is a mini heater for the winter. Another noticeable feature is battery life: I am getting about five hours of development time per charge. The only downside is the low screen resolution, which is 1366x768. In practice, though, it has proved to be adequate for my development needs.
  • It was a rare occasion when I surprised myself by impulsively upgrading the HP computer to Windows 8! My unfamiliarity with Windows was amply proved by the numerous device and driver difficulties I encountered upon upgrading. Reading a related Microsoft Knowledge Base article revealed that there was an important step that I had missed. [For the curious, we are supposed to uninstall and re-install the devices when upgrading in-place.]
  • VMware Player now offers OpenGL-based 3D support for Linux guests. Upon upgrading to the new version of Player, I realised promptly that Debian Wheezy had a problem that prevented it from recognising and utilising 3D devices. It appears as if Sid has this problem as well, since my experimental Aptosid image failed to turn on desktop effects.
  • Thus, I now run Linux Mint 14 KDE. [Of course, it is KDE!] It has been quite stable for my daily development needs (several Emacs windows, Eclipse 3.7 and several Konsole windows and tabs). This is in stark contrast to the frustrating experience with the Cinnamon version, which I downloaded first, mistaking it to be the KDE version. This demonstrates — yet again — why choice is so important, and why it underlies the philosophy of free and open source software!

Largely distracted months

I went through several months of non-work distractions. I am glad that those are nearing their respective conclusions. Not being able to concentrate on work can be really frustrating. More so if one's To Do list is long.

Experiments with languages

During these largely unproductive months, I studied a few languages, peripherally. Here is a summary.

Haskell

I had briefly looked at Haskell in 2000. It looked so different that I promptly left it. Having gained a little more of functional thinking in the meantime, I decided to take another look at it. A good motivation was Hughes' influential paper "Why Functional Programming Matters". Some Haskell features are straightforward: type classes, pattern matching, guards in functions, list comprehensions, etc. Some others are deep: higher-order functions, currying, lazy evaluation, etc. A few others go even deeper: functors, applicatives, monads, etc. Haskell demands considerable time and effort: not only the language proper, but the tool chain too. The syntax is layout-sensitive, and I could not even find a decent major mode for Emacs. The package tool called cabal repeatedly left my Haskell environment in an unstable state. Tooling is certainly a problem for the beginner, but otherwise, Haskell is a profound language that makes your thinking better!

Dart

Dart is a curious mix of the semantics of Smalltalk, the syntax of Java and JavaScript, and memory-isolated threads that communicate by sending messages. Added into this curious mix is a compile-time type system that does not affect the run-time object types! Mind you, Dart is strongly-typed. Even though there is a compile-time type system, it is optional and is primarily intended for better tooling, and the language itself is dynamically-typed. The types are carried by the objects, but the variables themselves are untyped. Dart's biggest promise is the ability to write scalable Web applications using a single language on both the server side and the client. The server side seems to present no problems, but the Web programming community is divided in its opinion on Dart's client side promise. The contention arises because Dart has its own virtual machine. Using the VM requires the user to install it as a plug-in in her browser. For those who do not want to use the VM, Dart comes with a cross-compiler that outputs equivalent JavaScript code.

D

I had known of the existence of D for several years, even though I never looked at it in detail. Reading a little about the history of D, I realised that it underwent a rather tumultuous adolescence. With D2, it appears to have entered adulthood. The motivation to look at D was an MSDN Channel 9 video of a presentation by Andrei Alexandrescu. D was designed to be a better C++. Several of the design decisions behind it can be better appreciated if we hold that premise in mind. It has a simplified, more uniform syntax, garbage collection, a decent standard library and easier generics. It maintains the ability to directly call a C function in an external library, immediately making a vast set of C libraries accessible. Scoped errors and compile-time function evaluation are examples of D's interesting features. Another notable feature is the partitioning of the language features into safe, trusted and unsafe subsets, with the ability to verify that a module is completely safe, etc. D has good performance, reasonably close to that of C++.

Others

I also looked briefly at Erlang and Clojure. However, I did not spend enough time on them to be able to form an opinion.

2012-04-23

The changing face of urban Hyderabad

A few days ago, my family went shopping in the Ameerpet area of Hyderabad. We shopped for about an hour and a half; by then, it was 17:00. My wife wanted to have some coffee (so did I, in fact). We could not find a place that served coffee in the immediate vicinity. We walked in the general direction of a few restaurants. Thus began an amazing hour of discovery!

We went into the first restaurant that we came across. We seated ourselves at the first table available. Presently, a waiter turned up. "Two strong, hot coffees," I said. "No, sir," he replied promptly, "we don't serve coffee." I was surprised. We picked up the bags, and walked on.

At the next restaurant, we were cautious. We did not go as far as seating ourselves; rather, we waited for a waiter to approach us. "Do you serve coffee?" I enquired. We got the same reply, "No." I was more surprised. We walked on.

The third restaurant was a familiar one. It has been around for over twenty-five years. When I had last visited it, it served coffee, tea and snacks. However, that was several years ago. My four-year-old son complained of hunger by this time. He wanted a pesarattu (a special Telugu dish that is a kind of thin-and-large pancake). I felt that there was a high probability that this restaurant would serve both pesarattu and coffee. So, we climbed up a floor to the restaurant. "No, sir. We used to serve South Indian food until about six months ago. We no longer do. Now, we serve Mughalai, Tandoori and Chinese!" I was mildly astonished. My wife and I sighed simultaneously, and we walked on.

My son was very disappointed. As we walked, he was eagerly watching for another restaurant. This time, we had to walk quite some distance before we came across another. Its look made it clear that it was a very non-vegetarian-oriented restaurant. We did not bother to walk in. My wife and I had a quick consultation, and decided to turn around, pass the shopping area, and try in the other direction.

My son's disappointment grew with each passing twenty-five metres or so. He started getting petulant. We negotiated the distance back to the shopping area with some difficulty, coaxing my son along the way. As we walked past that, we soon realised that there were no restaurants within sight! By this time, we had spent close to an hour covering a total of a little over a kilometre, without finding a place that served South Indian snacks and coffee! Resigned, we got into the car and drove back home.

The episode left me wondering, however, about the dramatic transformation that Hyderabad has undergone in the last couple of decades. It is very difficult these days to find decent (or even semi-decent) restaurants that serve Telugu vegetarian food. I have noticed the same trend in Bengaluru too, particularly for supper. A large number of restaurants have colluded to systematically eliminate South Indian menus. A key reason is that Mughalai, Tandoori, Chinese and similar cuisines are priced much higher, so the restaurants earn significantly more per table-hour when they serve them. The constant in-flow of North Indians into Hyderabad has only made it easier for the restaurants to switch over.

Another dimension that has seeped in over the years is that of western fast food (pizzas, burgers, etc.). In the name of maintaining international quality at an international price, the western chains charge ridiculously high prices (by Indian standards) for such fast food. We have to remember, however, that economic liberalisation has placed sudden money and means in the hands of an entire new crop of employees and entrepreneurs (and their pizzas-and-potato-chips brats). India has, consequently, been witnessing rapid changes in urban social patterns. The new-found affluence has resulted in a large number of families dining out several times a week. And, in the name of novelty, a vast majority of them patronise the more expensive varieties. The smaller restaurants, obviously, do not wish to let the opportunity slip by. We see, thus, a steady decline in the number of restaurants serving native food.

Craving for the new often dislodges the old! In this instance, Telugu (South Indian, in general) food and beverages are the casualty!

2012-04-07

Graphs and molecules - 2

Note: This post utilises MathJax to display mathematical notation. It may take a second or two to load, interpret and render; please wait!

If you have not read the previous post in this series, please read it first here.

Ordering

The notion of ordering is very intuitive in the context of natural numbers. Indeed, when we learn natural numbers, their representation \(\bbox[1pt]{\{1, 2, 3, 4, \ldots\}}{}\) itself imprints an ordering relationship in our minds. Soon enough, we learn to assign a sense of relative magnitude to those numbers: 4 is larger than 2, etc. This concept extends naturally to negative numbers and rational numbers too.

A little rigour

Suppose that we represent the ordering relationship between two elements of a set using the symbol \(\le\). Then, we can define the properties that a set \(S\) should satisfy for it to be ordered.

  • Reflexivity: \(a \le a\ \forall a \in S\)
  • Antisymmetry: if \(a \le b\) and \(b \le a\), then \(a = b\ \forall a, b \in S\)
  • Transitivity: if \(a \le b\) and \(b \le c\), then \(a \le c\ \forall a, b, c \in S\)

We can readily see that integers and rational numbers satisfy the above properties. Accordingly, we say that integers and rational numbers are ordered, if we assign the meaning "smaller than or equal to" to the ordering relationship \(\le\).

In fact, we can see that integers and rational numbers also satisfy an additional property.

  • Totality: \(a \le b\) or \(b \le a\ \forall a, b \in S\)

A distinction

Totality is a stricter requirement than the preceding three. It mandates that an ordering relationship exist between any and every pair of elements of the set. While reflexivity is easy enough to comprehend, the next two specify the conditions that must hold if the elements concerned do obey an ordering relationship.

It is easy to think of sets that satisfy the former three properties, but without satisfying the last. As an example, let us consider the set \(X = \{1, 2, 3\}\). Now, let us construct a set of some of its subsets \(S = \{\{2\}, \{1, 2\}, \{2, 3\}, \{1, 2, 3\}\}\). Let us define the ordering relationship \(\le\) to mean "subset of", represented by \(\subseteq\). Exercise: verify that the first three properties hold in \(S\).

We see that \(\{1, 2\}\) and \(\{2, 3\}\) are elements of \(S\), but neither is a subset of the other.
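
For readers who want to check their answer, here is one way to work the exercise (a sketch; the labels \(A\), \(B\), \(C\) and \(D\) are shorthand introduced here, not part of the original example). Writing \(A = \{2\}\), \(B = \{1, 2\}\), \(C = \{2, 3\}\) and \(D = \{1, 2, 3\}\), the relation \(\subseteq\) on \(S\) consists of exactly the following pairs.

\[
A \subseteq A,\quad B \subseteq B,\quad C \subseteq C,\quad D \subseteq D,\quad
A \subseteq B,\quad A \subseteq C,\quad A \subseteq D,\quad B \subseteq D,\quad C \subseteq D
\]

Reflexivity holds because every element appears paired with itself. Antisymmetry holds because no two distinct elements appear in both orders. Transitivity holds because the only chains of distinct elements are \(A \subseteq B \subseteq D\) and \(A \subseteq C \subseteq D\), and \(A \subseteq D\) is in the list. Totality fails because neither \(B \subseteq C\) nor \(C \subseteq B\) is in the list.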

Therefore, mathematicians distinguish sets satisfying only the first three from those satisfying all the four. The former are said to have partial ordering, and they are sometimes called posets or partially-ordered sets. The latter are said to have total ordering.

More ordering

Now, let us expand the discussion to include irrational numbers. Do our definitions apply? There is an immediate difficulty: irrational numbers have non-terminating decimal parts! How do we compare two such numbers? How should we define the ordering relationship? The integral part is trivial; it is the decimal part that presents the difficulty.

Sequence comparison

In order to be able to deal with irrational numbers, we have to introduce an additional notion: sequences. A sequence is an ordered collection (finite or infinite) in which the relative positions of the elements matter. Unlike in a set, elements can repeat, occurring at multiple places. The number of elements in a sequence, if it is finite, is called its length. Thus, sequences can be used to represent the decimal parts of irrational numbers.

Let \(X = \{x_1, x_2, x_3, \ldots\}\) and \(Y = \{y_1, y_2, y_3, \ldots\}\) be two sequences. We can define an ordering relationship between sequences as follows. We say \(X \le Y\) if one of the following holds.

  • \(X\) is finite with a length \(n\), and \(x_i = y_i\ \forall i \le n\) and \(y_{n+1}\) exists.
  • \(X\) and \(Y\) are infinite, and \(\exists\ n\) such that \(x_i = y_i\ \forall i \le n\), and \(x_{n+1} < y_{n+1}\), i.e., the first position at which they differ decides.

Armed with the above definition, we can readily see that we can compare two irrational numbers — in fact, any two sequences. Exercise: verify this claim by comparing two irrational numbers and two sequences of non-numerical elements!
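
A worked instance of each rule may help (the particular numbers and words are chosen here purely for illustration). For the second rule, take the decimal parts of \(\pi = 3.14159\ldots\) and \(\sqrt{10} = 3.16227\ldots\):

\[
X = \{1, 4, 1, 5, 9, \ldots\}, \qquad Y = \{1, 6, 2, 2, 7, \ldots\}
\]

With \(n = 1\), we have \(x_1 = y_1 = 1\) and \(x_2 = 4 < 6 = y_2\), so \(X \le Y\). The integral parts being equal, it follows that \(\pi \le \sqrt{10}\). For the first rule, take the letter sequences \(X = \{c, a, r\}\) and \(Y = \{c, a, r, t\}\). Here, \(X\) has length \(3\), it agrees with \(Y\) in the first three positions, and \(y_4 = t\) exists. So, \(X \le Y\), which is the familiar dictionary order.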

Bottom and top elements

If an ordered set \(S\) has an element \(b\) such that \(b \le s\ \forall s \in S\), the element \(b\) is called the bottom element of the set. The element \(t\) that we get if we replace \(\le\) with \(\ge\) is called the top element of the set. The bottom element and the top, when they exist, are unique in a given set.

In our first example above, \(\{2\}\) is the bottom element of the set, while \(\{1, 2, 3\}\) is the top. However, it is important to understand that bottom and top elements may not exist in a given set of sequences. Exercise: think of one such set.

Minimal and maximal elements

When a set does not have a bottom element, it is yet possible for it to have minimal elements. For an element \(m\) to be a minimal element of the set \(S\), \(s \le m \implies s = m\) should hold. If we replace \(\le\) with \(\ge\), we get maximal elements.
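
As an illustration (a variation on the earlier example, constructed here and not part of the original discussion), let us remove \(\{2\}\) from \(S\).

\[
S' = \{\{1, 2\}, \{2, 3\}, \{1, 2, 3\}\}
\]

Under \(\subseteq\), \(S'\) has no bottom element, since neither \(\{1, 2\}\) nor \(\{2, 3\}\) is a subset of the other. Both are minimal, however: the only element of \(S'\) that is a subset of \(\{1, 2\}\) is \(\{1, 2\}\) itself, and likewise for \(\{2, 3\}\). The top element \(\{1, 2, 3\}\) is also the only maximal element.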

Minimal and maximal elements are difficult to establish (and, sometimes, even understand) in the context of infinite sets or complex ordering relationships. The same applies to bottom and top elements, too.

Conclusion

You may have begun wondering if the title of this post was set by mistake. On the contrary, these concepts are very important to understand before we tackle canonical representation of molecules, ring systems in molecules, etc., which we shall encounter in future posts.

2012-03-20

Linux distribution chosen!

I had promised to post an update towards the end of January. I did not. However, even the casual reader may have noticed my recent posts related to Debian Wheezy. Those must have served as hints. So, Debian Wheezy it is! I have finally settled on Debian Wheezy with KDE.

Perhaps apt's mechanisms suit my thinking. yum is very powerful, yet the rpm family could not win me over. In fact, at one point, I went close to going back to Arch Linux. However, it is often too cutting-edge — even for a development system.

At the same time, GUI played a non-trivial role in my decision. It also explains why Ubuntu – with its Unity desktop – did not survive in my computer. I felt GNOME to be too restrictive. Some people think that KDE has too many knobs and switches; that it is daunting. Again, perhaps its mechanisms suit my thinking.

With the decision made, I have removed the ISO files of well over a dozen distributions and the VMware images of about half-a-dozen. Peace!

2012-03-17

Debian Wheezy : updating Java alternatives

Debian Wheezy (or even Sid) defaults to Java 6. Originally, my computer had openjdk-6-jdk. I wanted to utilise the newer features in Java 7 such as higher performance and lower memory footprint, and try the Fork-Join framework. Accordingly, I installed openjdk-7-jdk. It updated the Debian alternatives for Java to point to the newer version. So far, so good!

Dependencies can upset the apple cart

Then, I installed Eclipse using apt-get. The version of Eclipse installed is 3.7.1, which is fine. However, it pulls in Java 6 as a dependency. I somehow did not pay attention to that. As the installation completed, I noticed several messages informing me that the alternatives for Java were being reset to Java 6. I bit my lip hard! I think that apt-get should explicitly warn the user if an installation downgrades a package, or more, due to dependencies.

Simple remedy

Fortunately, a simple remedy is possible. But before we begin, we should check the priorities with which both versions are installed. To check the same, issue the following command in a terminal.

> update-alternatives --display javac


javac - auto mode
  link currently points to /usr/lib/jvm/java-6-openjdk-amd64/bin/javac
/usr/lib/jvm/java-6-openjdk-amd64/bin/javac - priority 1061
  slave javac.1.gz: /usr/lib/jvm/java-6-openjdk-amd64/man/man1/javac.1.gz
/usr/lib/jvm/java-7-openjdk-amd64/bin/javac - priority 100
Current 'best' version is '/usr/lib/jvm/java-6-openjdk-amd64/bin/javac'.

Please note the numbers at the end of the full paths of javac. So, both Java 6 and Java 7 are installed, but Java 6 has a higher priority — 1061 to 100. It is, therefore, considered the best version. We can check where /etc/alternatives/javac points, too, for confirmation.

The remedy to apply itself utilises update-alternatives. In order to take care of all the important JDK components in one shot, I collected the commands into a shell script.

> cat up-java-alt.sh

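#!/usr/bin/env sh
#
# Update Debian alternatives for Java.
# Run as root. Each call registers the OpenJDK 7 binary for one tool
# with priority 1100, which outranks the 1061 assigned to OpenJDK 6.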

update-alternatives --install \
        /usr/bin/java java \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/java 1100

update-alternatives --install \
        /usr/bin/appletviewer appletviewer \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/appletviewer 1100

update-alternatives --install \
        /usr/bin/apt apt \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/apt 1100

update-alternatives --install \
        /usr/bin/extcheck extcheck \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/extcheck 1100

update-alternatives --install \
        /usr/bin/idlj idlj \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/idlj 1100

update-alternatives --install \
        /usr/bin/jar jar \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/jar 1100

update-alternatives --install \
        /usr/bin/jarsigner jarsigner \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/jarsigner 1100

update-alternatives --install \
        /usr/bin/javac javac \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/javac 1100

update-alternatives --install \
        /usr/bin/javadoc javadoc \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/javadoc 1100

update-alternatives --install \
        /usr/bin/javah javah \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/javah 1100

update-alternatives --install \
        /usr/bin/javap javap \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/javap 1100

update-alternatives --install \
        /usr/bin/jconsole jconsole \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/jconsole 1100

update-alternatives --install \
        /usr/bin/jdb jdb \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/jdb 1100

update-alternatives --install \
        /usr/bin/jhat jhat \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/jhat 1100

update-alternatives --install \
        /usr/bin/jinfo jinfo \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/jinfo 1100

update-alternatives --install \
        /usr/bin/jmap jmap \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/jmap 1100

update-alternatives --install \
        /usr/bin/jps jps \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/jps 1100

update-alternatives --install \
        /usr/bin/jrunscript jrunscript \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/jrunscript 1100

update-alternatives --install \
        /usr/bin/jsadebugd jsadebugd \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/jsadebugd 1100

update-alternatives --install \
        /usr/bin/jstack jstack \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/jstack 1100

update-alternatives --install \
        /usr/bin/jstat jstat \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/jstat 1100

update-alternatives --install \
        /usr/bin/jstatd jstatd \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/jstatd 1100

update-alternatives --install \
        /usr/bin/native2ascii native2ascii \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/native2ascii 1100

update-alternatives --install \
        /usr/bin/rmic rmic \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/rmic 1100

update-alternatives --install \
        /usr/bin/schemagen schemagen \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/schemagen 1100

update-alternatives --install \
        /usr/bin/serialver serialver \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/serialver 1100

update-alternatives --install \
        /usr/bin/wsgen wsgen \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/wsgen 1100

update-alternatives --install \
        /usr/bin/wsimport wsimport \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/wsimport 1100

update-alternatives --install \
        /usr/bin/xjc xjc \
    /usr/lib/jvm/java-7-openjdk-amd64/bin/xjc 1100

Please note that we used a priority value of 1100, so that we can assign Java 7 a higher priority than that of Java 6. Now, we run the above script, and check again the alternatives status, and where /etc/alternatives/javac points.

> update-alternatives --display javac


javac - auto mode
  link currently points to /usr/lib/jvm/java-7-openjdk-amd64/bin/javac
/usr/lib/jvm/java-6-openjdk-amd64/bin/javac - priority 1061
  slave javac.1.gz: /usr/lib/jvm/java-6-openjdk-amd64/man/man1/javac.1.gz
/usr/lib/jvm/java-7-openjdk-amd64/bin/javac - priority 1100
Current 'best' version is '/usr/lib/jvm/java-7-openjdk-amd64/bin/javac'.

Enjoy Java 7 again! Don't forget to change the default JVM path in Eclipse, though.

What about the man pages installed as slaves? That part is left as an exercise :-)
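
As an aside, the long script above could be condensed into a loop. The following is only a sketch under a few assumptions: the names JVM_BIN, PRIORITY, DRY_RUN and register_tool are introduced here for illustration, and the dry-run default is my addition so that the commands can be inspected before they are run as root. The tool list and the priority are the ones used in the original script.

```shell
#!/usr/bin/env sh
#
# Sketch: register the OpenJDK 7 tools as Debian alternatives in a loop.
# JVM_BIN, PRIORITY, DRY_RUN and register_tool are illustrative names.

JVM_BIN=/usr/lib/jvm/java-7-openjdk-amd64/bin
PRIORITY=1100

# By default, only print the commands. Set DRY_RUN=0 (as root) to apply them.
DRY_RUN=${DRY_RUN:-1}

register_tool() {
    tool=$1
    if [ "$DRY_RUN" = 1 ]; then
        echo "update-alternatives --install /usr/bin/$tool $tool $JVM_BIN/$tool $PRIORITY"
    else
        update-alternatives --install "/usr/bin/$tool" "$tool" \
            "$JVM_BIN/$tool" "$PRIORITY"
    fi
}

# Same tools as in the original script.
for tool in java appletviewer apt extcheck idlj jar jarsigner javac \
            javadoc javah javap jconsole jdb jhat jinfo jmap jps \
            jrunscript jsadebugd jstack jstat jstatd native2ascii rmic \
            schemagen serialver wsgen wsimport xjc; do
    register_tool "$tool"
done
```

Running it as is prints one update-alternatives command per tool; running it as root with DRY_RUN=0 should have the same effect as the original script.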

2012-03-08

Convergent synthesis, finally!

The legacy C++ version of my chemistry product has finally gained the capability to do convergent synthesis — in a primitive form for now.

What is convergent synthesis?

Consider a complex molecule that we wish to synthesise. In the usual method, the synthesis steps proceed linearly. Suppose that the following is the sequence of synthesis steps, where Goal designates the molecule to synthesise.

A → B → C → D → E → F → G → H → I → J → Goal

The above sequence has 10 steps in the process. We start with a simple, available molecule A. Presumably, a functional group is either added, replaced or deleted at each step. [Reality includes several more dimensions, but let us keep the discussion simple.] While easy to comprehend, the biggest problem with this is the effective yield of the route. For the purposes of discussion, let us assume an average yield of 85% per step. With 10 steps, the effective yield is less than 20%!

Contrast this with the following scheme.

P → Q → R ⌉
          | → Goal
S → T → U ⌋

In this case, we have two independent paths leading to moderately complex molecules R and U. Then, these two paths converge to give rise to a more complex molecule, which in this case is our goal molecule. The expectation is that since R and U are only moderately complex, they can be independently synthesised in a couple of steps each. The effective yield for the convergent route, then, is about 44%! This is, obviously, much more attractive.
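
The arithmetic behind these figures can be made explicit. The sketch below assumes, as the text does, a uniform yield of 85% per step, and takes the effective yield of a route to be the product of the step yields along its longest linear sequence (the symbol \(Y_k\) is introduced here for illustration).

\[
Y_k = 0.85^{k}, \qquad Y_{10} = 0.85^{10} \approx 0.197, \qquad Y_{5} = 0.85^{5} \approx 0.444, \qquad Y_{3} = 0.85^{3} \approx 0.614
\]

The linear route has a longest sequence of 10 steps, giving just under 20%. The quoted figure of about 44% corresponds to a convergent route whose longest linear sequence is five steps. For the scheme exactly as drawn, where the longest sequence P → Q → R → Goal has three steps, the same arithmetic would give about 61%.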

How does it work in retro-synthesis?

My product actually does retro-synthesis, i.e., it starts with the goal molecule, and constructs the steps in reverse order. At each step, the product molecule is broken into reactants. In most scenarios, the coreactant is a trivial molecule; or, it is directly available for purchase from a company like Sigma-Aldrich.

If we wish to take advantage of convergent synthesis, on the other hand, how we break a product molecule into possible sets of reactants becomes a matter of extreme significance. In the step R + U → Goal, effective convergence is possible if and only if R and U have about the same degree of complexity.

What next?

The immediate challenge is to locate such reactions, and build a repertoire of them. Another, of course, is to be able to resolve a product molecule to utilise one such! Our chances depend on being able to identify a reasonably central atom that is suitable for initiating the cleavage of the product molecule. The algorithms have to be refined to exploit this new capability, as well!

2012-03-07

Difficult decision - 2

Firstly, I am surprised at the amount of traffic my previous post has generated. I had over a thousand visitors within a few hours: redirected from Google, Reddit, Hacker News, etc.! I wonder how many of them would have read the post had it been the opposite way, i.e., a declaration that I was adopting Go for the next version of my chemistry product? It appears to demonstrate an intriguing aspect of human nature!

Plug-in

Several people have suggested various ways in which out-of-process plug-ins are better. Some of these suggestions probably arose because I had not described a plug-in adequately. I tried to remedy the situation in individual responses. I am collecting some of those points hereunder.

I am looking at plug-ins for at least the following benefits. In all cases, the said plug-in could potentially be supplied by me, a third-party developer or the user herself.

  • Substitute an algorithm for another.
  • Substitute an algorithm implementation for another.
  • Add a new algorithm not originally shipped with the product.
  • Add a calculator or a transformer as a hook in a particular processing step.

Performance

My application has to process millions (sometimes tens of millions) of molecules per run. A large number of them are processed by the proposed plug-ins. The number reduces with each advancing stage of processing, owing to elimination in each stage. The load is, hence, lower towards the tail of the work flow. But, upstream plug-ins are invoked for most molecules.

An out-of-process plug-in will require the following steps for communication:

  • serialisation of input in the main program,
  • deserialisation of input in the plug-in,
  • serialisation of output in the plug-in, and
  • deserialisation of output in the main program.

The above steps are in addition to the unavoidable protocol handshake for each request. Evidently, the larger the amount of data that needs to be exchanged between the main program and the plug-in, the slower the above process will be. Let us look at the data structure that will be exchanged most often in my application, viz., Molecule. It has the following information, at a minimum:

  • a unique identifier,
  • a list of atoms, where each atom has:
    • a unique identifier,
    • element type,
    • spatial coordinates,
    • net charge,
    • stereo configuration,
    • number of implicit H atoms attached to it,
    • aromaticity,
    • list of rings it is a part of,
    • whether it is a bridgehead,
  • a list of bonds, where each bond has:
    • a unique identifier,
    • the atoms it joins,
    • order: single, double, triple, aromatic, …,
    • stereo configuration,
    • aromaticity,
    • list of rings it is a part of,
  • a list of rings with their own properties,
  • a list of components,
  • a list of functional groups,
  • … .

It may be possible to have a protocol to allow a plug-in to declare the subset of the above data that it actually needs. However, checking the protocol and selectively serialising the data has a cost itself.

By contrast, an in-memory plug-in accesses the object using a pointer, with just a transfer of ownership but no transfer of data.

Manageability

External plug-ins also raise the subject of their life cycle management and resolution. The questions that need to be addressed include the following.

  • When is a plug-in activated? Together with the main program? On demand?
  • Should plug-ins die with the main program? How do we handle the main program terminating abnormally?
  • How long should a plug-in continue to run, if idle?
  • How should zombie plug-ins be handled?
  • If socket-based, how should port numbers be managed?
  • How should multiple plug-ins providing the same capability be resolved? How about versions?

While in-memory plug-ins do not automatically solve all the above, they do eliminate some of them easily.

Interesting Options Suggested

An interesting option that was suggested was to package the Go binary distribution as part of my product. Then, when a plug-in is downloaded, the main program itself could be re-compiled and re-linked to include the plug-in. This is a possibility. Some infrastructure code has to be written, though.

Another family relates to embedding a scripting language. This is another possibility. However, it is neither fair nor acceptable to force third-party plug-ins to have to always suffer (the relatively) lower performance because of the scripting language itself. This may become very important if the plug-in is intended for use in the initial stages of the work flow.

The Need for a Java API

Independent of the above, there is no straightforward way to provide a Java API on top of an application written in Go. This issue remains unaddressed.

2012-03-01

Difficult decision

Considering the number of hits this post continues to have on a weekly basis, I should point out that a follow-up post can be found at A Difficult Decision - 2.

I like Go as a language. It strikes a rare, good balance between simplicity, power and expressiveness. I was so impressed with it that I began developing the next version of my chemistry product in it. There were minor hiccups, but I made progress, and wrote about 12,000 lines of code (including comments).

Then, I paused.

Now, about two months later, I have decided to shift the development of the new version of my product to … Java. If that sounds anti-climactic, so it is!

There are two significant reasons for this decision.

Plug-ins

The intended users of my product (and its API) are organic chemists and cheminformatics programmers. In practice, several organic chemists neither program themselves nor have programming assistants.

My product has several basic capabilities built into it. These are exposed as callable or configurable algorithms. What I also want to do is enable the user to extend the capabilities of the product through plug-ins.

In the Java land, this is a well-studied problem, with more than one established way of solving it. In the Go scenario, on the other hand, this is not straightforward. Go programs are linked statically, so enabling plug-ins has prerequisites.

What would I need to do?

  1. Supply the statically-linked full product.
  2. Supply *.a files of my product's packages, and *.o files of the driver part.

What would my customer need to do?

  1. Install gcc and family and their dependencies. [Once; update: not needed after Go1. Thanks Andrew Gerrand!]
  2. Install (and maintain) the Go development environment. [Once, at least after Go1.]
  3. Download the plug-in package file .a, and place it in a designated directory.
  4. Build and install the application.[1]

Why will this not work?

It could, with programmers. It, unfortunately, will not, with chemists. For many chemists, the exertion of supplying input to a computer program and reading its output using a suitable viewer, is a considerable condescension. Should I ask them to compile code, and that too for each plug-in, typical chemists will laugh me out of court!

The API Consumption Issue

Most foot soldier programmers at pharmaceutical companies, biotechnology companies and informatics services providers have graduated in the last ten years, or so. Most of them learned Java as their first language, and have used it the most. Naturally, they tend to gravitate towards solutions with a published Java API.

Chemistry product companies acknowledge this. Almost all of them provide a Java API for their products and libraries.[2] Those that have remained with a non-Java API are certainly not the leaders today!

Therefore, for me to appeal to those programmers, I must provide a fully capable Java API. My prospects of commercial success depend on that factor in no minor way.

Conclusion

Accordingly, I have shifted the product to Java. My team and I have begun re-designing and re-implementing the legacy product, this time in a more Java-like style, to appeal to a Java audience!



[1] goinstall does not help, since it needs source code to be available. Plug-in authors may not be willing to publish source code.

[2] If they cannot provide a proper Java API, they at least provide a JNI wrapper.

2012-02-27

Debian Wheezy : Removing unused locales and translations

Removing Locales

Debian includes several locales and translations in its default installation. Typically, we need very few of them, mostly just one! Here is how I removed unused locales and translations from my Debian Wheezy installation. This should apply to Sid too, but I have not verified it. The following steps should be executed as root.

  • Issue the command locale -a -v to see a full list of the locales currently installed. Should this list already match your requirement, you can skip the remainder of this section.
  • Edit (or create) the file /etc/default/locale. Include the entries for your desired locale. My file looks as follows.
    #  File generated by update-locale
    LANG=en_IN
    LANGUAGE="en_IN:en"
    
  • Remove all the files in the directory /usr/lib/locale.
  • Now, regenerate the locales based on your default configuration.

Here is the sequence of steps.

> locale -a -v


> vi /etc/default/locale


> cd /usr/lib/locale
> rm -fr *

> locale-gen

Removing Translations

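Unneeded translations consume disk space and network bandwidth (when updating or upgrading), and can potentially make glibc larger. To remove unused translations, execute the following steps as root. After creating the file 99translations, edit it so that it contains the single line shown in the output of cat.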

> cd /etc/apt/apt.conf.d
> touch 99translations


> cat 99translations
Acquire::Languages "none";


> cd /var/lib/apt/lists
> rm -f *Translation*

If you want to be sure, you can reboot the system. Now, when you apt-get update or apt-get upgrade, you should no longer have unused translations checked for, updated or downloaded.

2012-01-09

Debian Wheezy : Running in VMware Fusion 4.1

Note: The following applies to Debian Sid as well.

Here is some information on how I got Debian Wheezy to run successfully in VMware Fusion 4.1.

The installation itself was uneventful. Once Wheezy was installed, I proceeded to unpack VMware Tools and run the installation script. However, the version of VMware Tools that comes with VMware Fusion 4.1 does not work properly with kernel 3.2.x.

After a few failed attempts at patching the code and trying again, I uninstalled VMware Tools. I decided to try open-vm-tools instead.

Here is the sequence of steps that I executed (as root, of course).

  • apt-get update
  • apt-get install build-essential
  • apt-get install open-vm-tools

You will see some errors about modules not being available. This is normal. Please ignore them, and proceed.

  • apt-get install open-vm-dkms

The above should install dkms itself as a dependency. By the way, dkms (Dynamic Kernel Module Support) is a framework that builds loadable kernel modules from source and installs them. The open-vm-dkms package contains the source code of the Open VM Tools kernel modules.

  • cd /usr/src
  • ls

You should see a directory for open-vm-tools. Please note the version number, which is the part after open-vm-tools-. Now, issue the command

  • dkms install open-vm-tools/<version>

where <version> is the version number that you noted above. That installs the necessary kernel modules. Next, create an appropriate entry in /etc/fstab for your host Mac's share. An example entry is shown here.

.host:/mac    /mnt/hgfs/mac    vmhgfs    defaults    0  1

For X automatic re-sizing and copy-and-paste between OS X and Wheezy, you have to install one last package.

  • apt-get install open-vm-toolbox

That is it. Now, reboot the virtual image, and enjoy your Debian Wheezy with better host integration!

2012-01-07

Braces, scopes and compilers

I ran into an interesting problem with my legacy chemistry application (not the new one). We have over 5,700 tests for the application. Each test includes a molecule that characterises a family of products and another of reactants. Outside those tests, we use several standard molecules (as in, drugs) to test the capabilities of the program.

For some time now, due to an error in the test script driver, the tests have not actually been run. The driver has been falsely reporting success for each test. Two days ago, I noticed that, and corrected the driver script. I then ran the corrected script. 5,743 of the 5,744 tests passed.

Test m5212, the failing molecule, was an interesting case. After some debugging, I localised the error to a file SRC/map.cpp, function initialmaps(). Here is the code segment in question.

4533:  if (ignoretop)
4534:      for (j = 1; j <= pmol.numat; ++j)
4535:        if ((uniqueinmol(pmol,j) || (pmol.at[j].top == j))
4536:          &&
4537:          (1 != pmol.at[j].unsat))
4538:              for (g = 1; g <= mol1.numat; ++g)
4539:                  if ((uniqueinmol(mol1,g) || mol1.at[g].top)
4540:                  &&
4541:                  ((pmol.at[j].attribute == mol1.at[g].attribute) ||
4542:                  ((pmol.at[j].atnum == mol1.at[g].atnum) &&
4543:                  pmol.at[j].heteroaromatic &&
4544:                  mol1.at[g].heteroaromatic) ||
4545:                  tautomeric(pmol,mol1,j,g)))
4546:                      if (fit(pmol,tempmap,j,g))
4547:                          {
4548:                          tempmap.addapair(pmol,j,g);
4549:                          nmap.AddToTail(tempmap);
4550:                          check_nummap_limit();
4551:                          tempmap.clear();
4552:                          }
4553:  else
4554:      for (j = 1; j <= pmol.numat; ++j)
4555:          if ((uniqueinmol(pmol,j) ||
4556:          (pmol.at[j].top == j) ||
4557:          ((6 != pmol.at[j].atnum) && !pmol.at[j].top) ||
4558:          ((gring = atring.ringcontaining(pmol,j)) &&
4559:          pmol.ringlist[gring].present(pmol.at[j].top)))
4560:          &&
4561:          (1 != pmol.at[j].unsat))
4562:              for (g = 1; g <= mol1.numat; ++g)
4563:                  if ((uniqueinmol(mol1,g) ||
4564:                  (mol1.at[g].top == g) ||
4565:                  ((rring = atring.ringcontaining(mol1,g)) &&
4566:                  (mol1.ringlist[rring].present(mol1.at[g].top))))
4567:                  &&
4568:                  ((pmol.at[j].attribute == mol1.at[g].attribute) ||
4569:                  ((pmol.at[j].atnum == mol1.at[g].atnum) &&
4570:                  pmol.at[j].heteroaromatic && mol1.at[g].heteroaromatic)
4571:                  || tautomeric(pmol,mol1,j,g))) // ex.  MOL/m2944
4572:                      if (fit(pmol,tempmap,j,g))
4573:                          {
4574:                          tempmap.addapair(pmol,j,g);
4575:                          nmap.AddToTail(tempmap);
4576:                          check_nummap_limit();
4577:                          tempmap.clear();
4578:                          }

Experienced programmers must have already spotted the issue. For the novices, here is a little explanation. From the indentation, it appears that the programmer intended the else on line 4553 to pair with the if on line 4533. Unfortunately, though, C++ does not look at the code that way. It matches the else on line 4553 with the nearest preceding if that does not yet have an else of its own. That if occurs on line 4546. Here is how the code looks when re-indented to match.

4533:      if (ignoretop)
4534:          for (j = 1; j <= pmol.numat; ++j)
4535:              if ((uniqueinmol(pmol,j) || (pmol.at[j].top == j))
4536:                      &&
4537:                      (1 != pmol.at[j].unsat))
4538:                  for (g = 1; g <= mol1.numat; ++g)
4539:                      if ((uniqueinmol(mol1,g) || mol1.at[g].top)
4540:                              &&
4541:                              ((pmol.at[j].attribute == mol1.at[g].attribute) ||
4542:                               ((pmol.at[j].atnum == mol1.at[g].atnum) &&
4543:                                pmol.at[j].heteroaromatic &&
4544:                                mol1.at[g].heteroaromatic) ||
4545:                               tautomeric(pmol,mol1,j,g)))
4546:                          if (fit(pmol,tempmap,j,g))
4547:                          {
4548:                              tempmap.addapair(pmol,j,g);
4549:                              nmap.AddToTail(tempmap);
4550:                              check_nummap_limit();
4551:                              tempmap.clear();
4552:                          }
4553:                          else
4554:                              for (j = 1; j <= pmol.numat; ++j)
4555:                                  if ((uniqueinmol(pmol,j) ||
4556:                                              (pmol.at[j].top == j) ||
4557:                                              ((6 != pmol.at[j].atnum) && !pmol.at[j].top) ||
4558:                                              ((gring = atring.ringcontaining(pmol,j)) &&
4559:                                               pmol.ringlist[gring].present(pmol.at[j].top)))
4560:                                          &&
4561:                                          (1 != pmol.at[j].unsat))
4562:                                      for (g = 1; g <= mol1.numat; ++g)
4563:                                          if ((uniqueinmol(mol1,g) ||
4564:                                                      (mol1.at[g].top == g) ||
4565:                                                      ((rring = atring.ringcontaining(mol1,g)) &&
4566:                                                       (mol1.ringlist[rring].present(mol1.at[g].top))))
4567:                                                  &&
4568:                                                  ((pmol.at[j].attribute == mol1.at[g].attribute) ||
4569:                                                   ((pmol.at[j].atnum == mol1.at[g].atnum) &&
4570:                                                    pmol.at[j].heteroaromatic && mol1.at[g].heteroaromatic)
4571:                                                   || tautomeric(pmol,mol1,j,g))) // ex.  MOL/m2944
4572:                                              if (fit(pmol,tempmap,j,g))
4573:                                              {
4574:                                                  tempmap.addapair(pmol,j,g);
4575:                                                  nmap.AddToTail(tempmap);
4576:                                                  check_nummap_limit();
4577:                                                  tempmap.clear();
4578:                                              }

How nice! This disconnect between the programmer's intention and C++'s understanding of the code is, evidently, caused by the absence of braces { and } in appropriate places to delimit the scopes. For that reason, and to aid my own comprehension, I always use braces, even if the scope has only one statement.
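The same pitfall can be reduced to a few lines. What follows is only a minimal sketch, not code from the application: the functions without_braces, with_braces and check_binding are hypothetical names made up for this illustration. The first function is indented the way initialmaps() was, with the else lined up under the outer if; the second adds the braces that make the intended pairing explicit.

```cpp
#include <cassert>
#include <string>

// Hypothetical example: the indentation suggests that the else belongs
// to `if (outer)`, but the compiler pairs it with `if (inner)`.
std::string without_braces(bool outer, bool inner) {
    std::string result = "none";
    if (outer)
        if (inner)
            result = "inner-then";
    else  // misleading indentation: this else pairs with `if (inner)`
        result = "else";
    return result;
}

// Same logic with explicit braces: the else now pairs with `if (outer)`.
std::string with_braces(bool outer, bool inner) {
    std::string result = "none";
    if (outer) {
        if (inner) {
            result = "inner-then";
        }
    } else {
        result = "else";
    }
    return result;
}

// Small self-check of the behaviour described above.
void check_binding() {
    assert(without_braces(true, false) == "else");   // else ran although outer was true
    assert(without_braces(false, false) == "none");  // nothing ran at all
    assert(with_braces(true, false) == "none");      // inner if has no else any more
    assert(with_braces(false, false) == "else");     // else now follows the outer if
}
```

Calling check_binding() from a test or from main exercises both variants. As far as I know, GCC and Clang can both warn about the unbraced form (GCC does so under -Wall, suggesting explicit braces to avoid an ambiguous else), so compiling legacy code with warnings enabled is a cheap way to find other occurrences.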

Now, if I insert a { at the end of line 4533 and a } before the else on line 4553, test m5212 passes. But, four other tests, all of which pass otherwise, fail! They are m3651, m3747, m4750 and m5211. Sigh! This needs a deeper investigation.