## 2013-10-16

### Captions on Indian trucks - an unexpected lesson

Several years ago, when I lived in Bengaluru, my company's offices used to be in Electronics City (for a few years). Owing to the distance and the disgusting volume of traffic, I used to leave for the office rather early, around 07:00.

In those early hours, long-distance trucks were permitted to travel through the city. As I overtook them, I used to read the captions written on those trucks. A particular caption was very, very common on trucks coming from the North: burI nazar vAlE, tErA muH kAlA" (roughly oh you who cast an evil eye on me, your face shall become blackened").

After a while, I grew so familiar with it, that I usually read only the first word before turning my attention to the next truck.

On a particular day, I had this truck right ahead of me, when we stopped at a traffic signal. The caption began with the usual burI", and I almost turned in another direction, but something pulled my attention back. The caption read: burI nazar vAlE, tErA bhI bhalA hO" (roughly oh you who cast an evil eye on me, I wish you well in spite of that").

I was stunned! It took me a while to digest that. Am I equal to that spirit? I don't think so. Nonetheless, it has had a rather mysterious affect on my thinking!

## 2013-08-14

### Ring Detection - 1

In this two-part series, we look at how Ojus Chemistry Toolkit (OCT) currently implements ring detection. In this part, detection of rings and ring systems is described.

## Preparation

N.B. OCT's molecule reader converts any hydrogen atoms that it encounters into implicit hydrogens, i.e., the list of atoms and bonds in an OCT molecule never contains a hydrogen atom. Similarly, the number of hydrogens attached to an atom is determined looking at the atom's valence and any charge specified for it.

N.B. The Frerejacque value of the molecule is checked to see if there are any cycles at all, before the following procedure is employed.

When we are interested in finding cycles (and only cycles), we should not look at open branches — those that contain only leaves. Accordingly, we remove them. In reality, all of this computation happens on temporary data structures, not the actual molecule itself.

### Removing Terminal Chains

Remember terminal atoms from this old article? Quickly, a terminal atom is one which has a single neighbour. We remove all such atoms. However, removing a terminal atom could make its sole neighbour a terminal atom itself! We have to remove it, too. But, then, removing it … ad infinitum. Here is the pseudocode.

    var removing := true.

while (removing) {
removing := false.

for each atom a {
if a is a terminal {
remove a.
removing := true.
}
}
}


Note that removing an atom entails removing all the bonds in which it participates, etc. Thus, by the time the outer loop exits, all terminal chains will have been removed.

## How Many Rings?

### One Ring or More?

At this point, if all the atoms have exactly two neighbours, there is only one ring in the molecule. Why? More than one ring implies that a second ring is either fused to the first one, or is independent of it, but connected through a linking chain of atoms. In either case, there should be at least one atom that is a junction — it should have at least three bonds.

## Cycle Detection

### Case of One Ring

In the case of the molecule having only one ring, we mark that ring as the sole ring. We also create a single ring system with that sole ring. And, we are done!

### Case of Multiple Rings

Detection of two or more rings present in the molecule is (rather) evidently more complex than that of a single ring. Over the last decade, or so, I have written four or five distinct algorithms to detect multiple rings in a molecule. All of them employed a recursive depth-first approach. The recursive approach made certain parts of the algorithm easier and clearer, while complicating certain others.

This time, though, I chose to employ a breadth-first approach. The outline is as follows.

    var candidates := new list of paths.
var path := new list of atoms.

var a := the first non-junction atom in the molecule.
if no such atom exists {
a := the first atom in the molecule.
}
path := path with a appended.
candidates := candidates with path appended.

while (candidates is not empty) {
var test-path := fetch the first path in candidates.
try-path test-path.
}


As you can see, I have a queue into which I inject all candidate paths. A candidate path is potentially incomplete, and may need to be extended by adding a neighbour of the last atom in it, should one such exist. But that is part of processing the path.

#### Process a Test Path

    var current := last atom in test-path.

for each neighbour nbr of current {
if nbr was visited already in test-path {
var ring := path from previous occurrence of nbr to current.
if ring is valid {
record ring.
}
continue loop.
}

var new-path := test-path with nbr appended.
candidates := candidates with new-path appended.
}


Thus, we add each path extension to the queue of candidates, at each atom that we encounter. Now, given a candidate ring, how do we validate it? First, the easy case: if the candidate has only three atoms, then it is certainly a genuine ring. Good! But, what if it has more than three?

#### Preliminary Validation

    if size of test-path is 3 {
answer true.
}

for each atom a in test-path {
if more than 2 neighbours of a exist in test-path {
/* Must be a pair of rings fused by a single bond. */
answer false.
}
}

/* Must be a simple closed curve. */
answer true.


## Ring System Detection

Once we have recorded all rings, we sort them by size. The next task is to partition this set of rings into equivalence classes of ring systems. Two given rings belong to the same ring system if they have at least one bond or at least one atom in common.

### Partitioning

The actual task of partitioning the set of rings is quite straightforward.

    var ring-systems := new list.

outer:
for each recorded ring r {
for each ring system rs {
if r and rs share a bond or
r and rs share an atom {
mark r as belonging to rs.
rs := rs with r appended.
continue outer.
}
}

var new-rs := new ring system.
new-rs := new-rs with r appended.
ring-systems := ring-systems with new-rs appended.
}


## Conclusion

We have, now, identified the rings in the molecule, and have partitioned the set into ring systems. However, this set could contain potentially spurious rings. Eliminating the false candidates is a surprisingly more difficult procedure. We shall look into it in the next post!

## 2013-04-30

### C++11's strongly-typed enums

C++11 provides strongly-typed enums. These new enums are no longer simple sets of integers; they are classes in their own right. And, their instances are objects, too!

In the following example, we define several enumerated values for possible stereo configurations of a chemical bond.

enum class BondStereo : int8_t
{
NONE
, UP //!< Bond points up from the plane.
, DOWN //!< Bond points into the plane.
, UP_OR_DOWN //!< Bond may point up from or into the plane.
, E //!< Bond may be single or double, and has entgegen configuration.
, Z //!< Bond may be single or double, and has zusammen configuration.
, CIS //!< Double bond having cis configuration.
, TRANS //!< Double bond having trans configuration.
, UNSPECIFIED //!< Bond has unknown stereo geometry.
};


Note that the above enum class BondStereo has int8_t as its underlying data type. C++11 allows you to specify the precise integral data type that should be used. In turn, that allows us to choose specific types to best suit the required ranges of the enumerated values.

However, with this felicity comes an inconvenience. Since the instances of these enums are no longer simple integers, they cannot be used as such! The following is a simple and generic way of obtaining the underlying integral value of a given instance.

template <typename E>
auto enumValue(const E e) -> typename std::underlying_type<E>::type
{
return static_cast<typename std::underlying_type<E>::type>(e);
}


Note the use of auto and the trailing return type declaration. This is needed, since we do not know in advance the exact underlying type of a given instance.

Great, you say, but which headers should I include to be able to use this functionality? Well, that is left as an exercise :-) Enjoy!

## 2013-04-23

### Operational Disaster

My return to blogging did not turn out to be a real return, evidently! Here is an update after several months, again!

## Operational Disaster

An unprecedented, peculiar and curious sequence of quick events took place in the middle of March. I was investigating the possible use of VirtualBox image as a distribution mechanism for my organic synthesis planning product. My scientist created a CentOS 6.4 image with the program and its dependencies, and shared it. I made a copy of it, and imported the same into VirtualBox in my Windows 8 computer. It worked well, as intended. Thus far, the experiment was successful!

To ease testing the program in the VirtualBox environment, I mounted my primary data virtual disk in the CentOS image. The testing itself was successful. It appeared as if this mechanism was proving to be technologically sound and convenient as well.

### Disaster Strikes!

Some more testing later, I decided to re-test the entire procedure. I removed the CentOS image from VirtualBox. However, it complained of an existing image when I tried to import the appliance the next time. Since the old image was no longer needed, I asked VirtualBox to delete it from the disk. It did so!

I ran through the procedure again, with the same initial success. I was quite satisfied with this approach. Then, exactly as before, I tried mounting my primary data virtual disk in this new CentOS image. Navigating to the drive where it was located, I was astonished to not find it there! Concerned, I opened Windows explorer, and navigated to the said drive. Indeed, the .vmdk file was gone!

As of the time that I deleted the CentOS image, the mounted external virtual disk had an entry in its /etc/fstab. So, when VirtualBox deleted the image, it deleted the mounted disk's file as well! VirtualBox claims that mounted disks that are used by other images are not deleted. How did this deletion happen, then?

The answer is straight-forward: the mounted disk was not used by any other VirtualBox image! It was the home partition of another Linux image, but that was a VMware image! So, VirtualBox was indeed true to its promise. But, my data was lost!

### More Disaster!

I was distressed at having lost my home partition in this unforeseen manner. The lost virtual disk was my primary data drive. The host Windows home folder did not have anything of consequence, except in the Downloads folder. Fortunately, I had a backup from January. It was stored in a separate NTFS drive in the laptop, as well as an external USB disk.

So, losing no time, I proceeded to restore the contents from that backup. It was an encrypted backup. Since I always name my top-level folders with a common prefix, I asked for the files to be restored to my Windows home folder. Towards the very end of the unpacking exercise, something went wrong, and the system froze! I gave it several minutes to recover, but to no avail. There was no disk activity; no sign of life; even the Caps Lock key stopped responding after a while. I waited patiently ... with my fingers crossed, for over half-an-hour. The computer had really frozen!

Disappointed, I switched the power off. With considerable trepidation, I switched it on a few seconds later. Truly enough, the system froze while booting! I felt really let down. I tried restarting the computer a few times, with the same inevitable result — the login screen wouldn't appear. My entire Windows had become unusable! Two strokes in a row.

### Rescue Attempt

I used my wife's computer to read about rescuing Windows. There was a Windows rescue partition (drive) in the hard disk. I tried to restore Windows using that rescue partition, following the instructions that I found in Microsoft's Web site. Alas! My earlier impulse upgrade to Windows 8 haunted me unpleasantly there. The rescue image was that of Windows 7, which came pre-installed with the HP hardware. When I attempted a rescue, it kept informing me that I had a more recent version of Windows installed, and that it couldn't rescue it!

For an hour, or so, I was quite dumbstruck. But then, life has to go on, of course! I contemplated possible courses of action. And finally chose what I knew best.

## Go The Linux Way!

I burned Kubuntu 12.04 LTS to a DVD, and booted into the Live image. After making sure that all essential hardware worked as expected, I installed it to the hard disk as the only operating system. During the installation, I opted for the proprietary firmware for the Broadcom 4313GN wireless card, etc., to be downloaded and installed. Everything went smoothly. The only irritating aspect was the download of over a dozen English language packs! Readers will remember that I was irritated by this in my old Debian (virtual) installation too.

Upon re-booting, I found – as expected – a few hundreds of updates. Accordingly, I allowed the system to update. Next, from the Kubuntu PPA, I upgraded KDE to 4.10.1. After verifying that the system was working properly, I restored the backup from the external USB disk. It Just Worked!

### Laptop Power Management

Initially, the laptop battery powered the system for only about 2 hours. Windows (both 7 and 8) used to last between 4 and 5 hours on a single charge. I read several articles and blog posts on what all improve battery life in Linux. None of them improved the situation measurably! powertop showed a baseline discharge rate of over 23W when idling. That was disappointing!

My HP laptop has an integrated Intel 4000 HD graphics card and an nVidia GEFORCE 630M discrete graphics card. I realised that I had to try Bumblebee. Following the instructions in one of the Ubuntu fora, I installed Bumblebee. I installed the primus configuration rather than the optimus one.

Having read a few bad things about Bumblebee, I had a trepidation similar to that I had when I re-booted the frozen Windows system. Fortunately, though, Kubuntu booted normally, and to my great relief, Bumblebee worked. powertop showed a new baseline consumption rate of a little over 10W when idling! Now I get the same 4-5 hours of battery life on a single charge!

### What Do I Not Like?

The power management is a little too eager. It puts the wireless interface to sleep every few minutes. For it to wake up takes several seconds upon next use. I have to keep staring at the screen until then, sometimes a little impatiently. These days, though, I use powertop to put the Broadcom 4313GN wireless card into bad state so that it is not put to sleep so aggressively.

### What Do I Miss?

All of this is fine, of course, but do I miss anything? What I miss most is Google Drive native application. I usually do not sign on-line petitions, but I did sign the one at Drive4Linux. I was disappointed to find only about 1,500 signatures. Nevertheless, I signed it, and requested my G+ contacts to follow suit if they use Linux and Google Drive.

Other than the above, I have not felt any notable inconvenience or loss of functionality, so far! Thus, after a break of about five years, I have returned to running Linux natively in my primary computer!!

## 2012-12-27

After several months of silence, here I return to blogging! A few quick updates are in order, in no particular order.

## Trivia

• When the battery of my MacBook Pro began failing in May, I purchased a relatively low-end HP Pavilion dv6 7040TX pre-installed with Windows 7. I mostly like it. It generates very little heat. By contrast, the MacBook Pro is a mini heater for the winter. Another noticeable feature is battery life: I am getting about five hours of development time per charge. The only downside is the low screen resolution, which is 1366x768. In practice, though, it has proved to be adequate for my development needs.
• It was a rare occasion when I surprised myself by impulsively upgrading the HP computer to Windows 8! My unfamiliarity with Windows was amply proved by the numerous devices and driver difficulties I encountered upon upgrading. Reading a related Microsoft Knowledge Base article revealed that there was an important step that I missed. [For the curious, we are supposed to uninstall and re-install the devices when upgrading in-place.]
• VMware Player now offers OpenGL-based 3D support for Linux guests. Upon upgrading to the new version of Player, I realised promptly that Debian Wheezy had a problem that prevented it from recognising and utilising 3D devices. It appears as if Sid has this problem as well, since my experimental Aptosid image failed to turn on desktop effects.
• Thus, I now run Linux Mint 14 KDE. [Of course, it is KDE!] It has been quite stable for my daily development needs (several Emacs windows, Eclipse 3.7 and several Konsole windows and tabs). This is in stark contrast to the frustrating experience with the Cinnamon version, which I downloaded first, mistaking it to be the KDE version. This demonstrates — yet again — why choice is so important, and why it underlies the philosophy of free and open source software!

## Largely distracted months

I went through several months of non-work distractions. I am glad that those are nearing their respective conclusions. Not being able to concentrate on work can be really frustrating. More so if one's To Do list is long.

## Experiments with languages

During these largely unproductive months, I studied a few languages, peripherally. Here is a summary.

I had briefly looked at Haskell, in 2000. It looked so different, I promptly left it. Having gained a little more of functional thinking in the meantime, I decided to take another look at it. A good motivation was Hughes' influential paper "Why Functional Programming Matters". Some Haskell features are straight-forward: type classes, pattern matching, guards in functions, list comprehensions, etc. Some others are deep: higher-order functions, currying, lazy evaluation, etc. A few others go even deeper: functors, applicatives, monads, etc. Haskell demands considerable time and effort — not only the language proper, but the tool chain too. The syntax is layout-sensitive, and I could not even find a decent major mode for Emacs. The package tool called cabal repeatedly left my Haskell environment in an unstable state. Tooling is certainly a problem for the beginner, but otherwise, Haskell is a profound language that makes your thinking better!

### Dart

Dart is a curious mix of the semantics of Smalltalk, the syntax of Java and JavaScript, and memory-isolated threads that communicate by sending messages. Added into this curious mix is a compile-time type system that does not affect the run-time object types! Mind you, Dart is strongly-typed. Even though there is a compile-time type system, it is optional and is primarily intended for better tooling, and the language itself is dynamically-typed. The types are carried by the objects, but the variables themselves are untyped. Dart's biggest promise is the ability to write scalable Web applications using a single language on both the server side and the client. The server side seems to present no problems, but the Web programming community is divided in its opinion on Dart's client side promise. The contention arises because Dart has its own virtual machine. Using the VM requires the user to install it as a plug-in in her browser. For those who do not want to use the VM, Dart comes with a cross-compiler that outputs equivalent JavaScript code.

### D

I had known of the existence of D for several years, even through I never looked at it in detail. Reading a little about the history of D, I realised that it underwent a rather tumultuous adolescence. With D2, it appears to have entered adulthood. The motivation to look at D was an MSDN Channel 9 video of a presentation by Andre Alexandrescu. D was designed to be a better C++. Several of the design decisions behind it can be better appreciated if we hold that premise in mind. It has a simplified, more uniform syntax, garbage collection, a decent standard library and easier generics. It maintains the ability to directly call a C function in an external library, immediately making a vast set of C libraries accessible. Scoped errors and compile-time function evaluation are examples of D's interesting features. Another notable feature is the partitioning of the language features into safe, trusted and unsafe subsets, with the ability to verify that a module is completely safe, etc. D has good performance that is reasonable compared to that of C++.

### Others

I also looked briefly at Erlang and Clojure. However, I did not spend enough time on them to be able to form an opinion.

## 2012-04-23

### The changing face of urban Hyderabad

A few days ago, my family went shopping to the Ameerpet area of Hyderabad. We shopped for about an hour-and-a-half; the time was 17:00. My wife wanted to have some coffee (so did I, in fact). We could not find a place that served coffee in the immediate vicinity. We walked in the general direction of a few restaurants. Thus began an amazing hour of discovery!

We went into the first restaurant that we came across. We seated ourselves at the first table available. Presently, a waiter turned up. Two strong, hot coffees," I said. No, sir," he replied promptly, we don't serve coffee." I was surprised. We picked up the bags, and walked on.

At the next restaurant, we were cautious. We did not go as far as seating ourselves; rather, we waited for a waiter to approach us. Do you serve coffee?" I enquire. We get the same reply, No." I was more surprised. We walked on.

The third restaurant was a familiar one. It has been around for over twenty five years. The last I had visited it, it used to serve coffee, tea and snacks. However, that was several years ago. My four-year-old son complained of hunger by this time. He wanted a pesarattu (a special Telugu dish that is a kind of thin-and-large pancake). I felt that there was a high probability that this restaurant would serve both pesarattu and coffee. So, we climbed up a floor to the restaurant. No, sir. We used to serve South Indian food until about six months ago. We no longer do. Now, we serve Mughalai, Tandoori and Chinese!" I was mildly astonished. My wife and I sighed simultaneously, and we walked on.

My son was very disappointed. As we walked, he was eagerly watching for another restaurant. This time, we had to walk quite some distance before we came across another. Its look made it clear that it was a very non-vegetarian-oriented restaurant. We did not bother to walk in. My wife and I had a quick consultation, and decided to turn around, pass the shopping area, and try in the other direction.

My son's disappointment grew with each passing twenty five metres, or so. He started getting petulant. We negotiated the distance back to the shopping area with some difficulty, coaxing my son along the way. As we walked past that, we soon realised that there were no restaurants within sight! By this time, we had spent close to an hour covering a total of a little over a kilometre, without finding a place that served South Indian snacks and coffee! We resigned, got into the car, and drove back home.

The episode left me wondering, however, about the dramatic transformation that Hyderabad has undergone in the last couple of decades. It is very difficult these days to find decent (or even semi-decent) restaurants that serve Telugu vegetarian food. I have noticed the same trend in Bengaluru too, particularly for supper. A large number of restaurants have colluded to systematically eliminate South Indian menus. A key reason is that Mughalai, Tandoori, Chinese, etc. food is much more expensive. The restaurants earn significantly more per table-hour when they serve them. The constant in-flow of North Indians into Hyderabad has only made it easier for the restaurants to switch over.

Another dimension that has seeped in over the years is that of western fast food (pizzas, burgers, etc.). In the name of maintaining international quality at an international price, the western chains charge ridiculously high prices (by Indian standards) for such fast food. We have to remember, however, that economic liberalisation has placed sudden money and means in the hands of an entire new crop of employees and entrepreneurs (and their pizzas-and-potato-chips brats). India has, consequently, been witnessing rapid changes in urban social patterns. The new-found affluence has resulted in a large number of families dining out several times a week. And, in the name of novelty, a vast majority of them patronise the more expensive varieties. The smaller restaurants, obviously, do not wish to let the opportunity slip by. We see, thus, a steady decline in the number of restaurants serving native food.

Craving for the new often dislodges the old! In this instance, Telugu (South Indian, in general) food and beverages are the casualty!

## 2012-04-07

### Graphs and molecules - 2

Note: This post utilises MathJax to display mathematical notation. It may take a second or two to load, interpret and render; please wait!

## Ordering

The notion of ordering is very intuitive in the context of natural numbers. Indeed, when we learn natural numbers, their representation $$\bbox[1pt]{\{1, 2, 3, 4, \ldots\}}{}$$ itself imprints an ordering relationship in our minds. Soon enough, we learn to assign a sense of relative magnitude to those numbers: 4 is larger than 2, etc. This concept extends naturally to negative numbers and rational numbers too.

### A little rigour

Suppose that we represent the ordering relationship between two elements of a set using the symbol $$\le$$. Then, we can define the properties that a set $$S$$ should satisfy for it to be ordered.

• Reflexivity: $$a \le a\forall a \in S$$
• Antisymmetry: if $$a \le b$$ and $$b \le a$$, then $$a = b\ \forall a, b \in S$$
• Transitivity: if $$a \le b$$ and $$b \le c$$, then $$a \le c\ \forall a, b, c \in S$$

We can readily see that integers and rational numbers satisfy the above properties. Accordingly, we say that integers and rational numbers are ordered, if we assign the meaning smaller than or equal to to the ordering relationship $$\le$$.

In fact, we can see that integers and rational numbers also satisfy an additional property.

• Totality: $$a \le b$$ or $$b \le a\ \forall a, b \in S$$

### A distinction

Totality is a stricter requirement than the preceding three. It mandates that an ordering relationship exist between any and every pair of elements of the set. While reflexivity is easy enough to comprehend, the next two specify the conditions that must hold if the elements concerned do obey an ordering relationship.

It is easy to think of sets that satisfy the former three properties, but without satisfying the last. As an example, let us consider the set $$X = \{1, 2, 3\}$$. Now, let us construct a set of some of its subsets $$S = \{\{2\}, \{1, 2\}, \{2, 3\}, \{1, 2, 3\}\}$$. Let us define the ordering relationship $$\le$$ to mean subset of represented by $$\subseteq$$. Exercise: verify that the first three properties hold in $$S$$.

We see that $$\{1, 2\}$$ and $$\{2, 3\}$$ are elements of $$S$$, but neither is a subset of the other.

Therefore, mathematicians distinguish sets satisfying only the first three from those satisfying all the four. The former are said to have partial ordering, and they are sometimes called posets or partially-ordered sets. The latter are said to have total ordering.

## More ordering

Now, let us expand the discussion to include irrational numbers. Do our definitions apply? There is an immediate difficulty: irrational numbers have non-terminating decimal parts! How do we compare two such numbers? How should we define the ordering relationship? The integral part is trivial; it is the decimal part that presents the difficulty.

### Sequence comparison

In order to be able to deal with irrational numbers, we have to introduce an additional notion — sequences. A sequence is a set (finite or infinite) where the relative positions of the elements matter. Another distinction is that elements can repeat, occurring at multiple places. The number of elements in a sequence, if it is finite, is called its length. Thus, sequences can be used to represent the decimal parts of irrational numbers.

Let $$X = \{x_1, x_2, x_3, \ldots\}$$ and $$Y = \{y_1, y_2, y_3, \ldots\}$$ be two sequences. We can define an ordering relationship between sequences as follows. We say $$X \le Y$$ if one of the following holds.

• $$X$$ is finite with a length $$n$$, and $$x_i = y_i\ \forall i \le n$$ and $$y_{n+1}$$ exists.
• $$X$$ and $$Y$$ are infinite, and $$\exists\ n$$ such that $$x_i = y_i\ \forall i \le n$$, and $$x_{n+1} \le y_{n+1}$$.

Armed with the above definition, we can readily see that we can compare two irrational numbers — in fact, any two sequences. Exercise: verify this claim by comparing two irrational numbers and two sequences of non-numerical elements!

### Bottom and top elements

If a set $$S$$ of sequences has an element $$b$$ such that $$b \le s\ \forall s \in S$$, the element $$b$$ is called the bottom element of the set. The element t that we get if we replace $$\le$$ with $$\ge$$ is called the top element of the set. The bottom element and the top are unique in a given set.

In our first example above, $$\{2\}$$ is the bottom element of the set, while $$\{1, 2, 3\}$$ is the top. However, it is important to understand that bottom and top elements may not exist in a given set of sequences. Exercise: think of one such set.

### Minimal and maximal elements

When a set does not have a bottom element, it is yet possible for it to have minimal elements. For an element $$m$$ to be a minimal element of the set $$S$$, $$s \le m \implies s = m$$ should hold. If we replace $$\le$$ with $$\ge$$, we get maximal elements.

Minimal and maximal elements are difficult to establish (and, sometimes, even understand) in the context of infinite sets or complex ordering relationships. The same applies to bottom and top elements, too.

## Conclusion

You may have begun wondering if the title of this post was set by mistake. On the contrary, these concepts are very important to understand before we tackle canonical representation of molecules, ring systems in molecules, etc., which we shall encounter in future posts.

## 2012-03-20

### Linux distribution chosen!

I had promised to post an update towards the end of January. I did not. However, even the casual reader may have noticed my recent posts related to Debian Wheezy. Those must have served as hints. So, Debian Wheezy it is! I have finally settled on Debian Wheezy with KDE.

Perhaps apt's mechanisms suit my thinking. yum is very powerful, yet the rpm family could not win me over. In fact, at one point, I went close to going back to Arch Linux. However, it is often too cutting-edge — even for a development system.

At the same time, GUI played a non-trivial role in my decision. It also explains why Ubuntu – with its Unity desktop – did not survive in my computer. I felt GNOME to be too restrictive. Some people think that KDE has too many knobs and switches; that it is daunting. Again, perhaps its mechanisms suit my thinking.

With the decision made, I have removed the ISO files of well over a dozen distributions and the VMware images of about half-a-dozen. Peace!

## 2012-03-17

### Debian Wheezy : updating Java alternatives

Debian Wheezy (or even Sid) defaults to Java 6. Originally, my computer had openjdk-6-jdk. I wanted to utilise the newer features in Java 7 such as higher performance and lower memory footprint, and try the Fork-Join framework. Accordingly, I installed openjdk-7-jdk. It updated the Debian alternatives for Java to point to the newer version. So far, so good!

### Dependencies can upset the apple cart

Then, I installed Eclipse using apt-get. The version of Eclipse installed is 3.7.1, which is fine. However, it pulls in Java 6 as a dependency. I somehow did not pay attention to that. As the installation completed, I noticed several messages informing me that the alternatives for Java were being reset to Java 6. I bit my lip hard! I think that apt-get should explicitly warn the user if an installation downgrades a package, or more, due to dependencies.

### Simple remedy

Fortunately, a simple remedy is possible. But before we begin, we should check the priorities with which both versions are installed. To check the same, issue the following command in a terminal.

> update-alternatives --display javac

# You should see something like the following.
javac - auto mode
/usr/lib/jvm/java-6-openjdk-amd64/bin/javac - priority 1061
slave javac.1.gz: /usr/lib/jvm/java-6-openjdk-amd64/man/man1/javac.1.gz
/usr/lib/jvm/java-7-openjdk-amd64/bin/javac - priority 100
Current 'best' version is '/usr/lib/jvm/java-6-openjdk-amd64/bin/javac'.


Please note the numbers at the end of the full paths of javac. So, both Java 6 and Java 7 are installed, but Java 6 has a higher priority — 1061 to 100. It is, therefore, considered the best version. We can check where /etc/alternatives/javac points, too, for confirmation.

The remedy to apply itself utilises update-alternatives. In order to take care of all the important JDK components in one shot, I collected the commands into a shell script.

> cat up-java-alt.sh

#!/usr/bin/env sh
#
# Update Debian alternatives for Java.

update-alternatives --install \
/usr/bin/java java \
/usr/lib/jvm/java-7-openjdk-amd64/bin/java 1100

update-alternatives --install \
/usr/bin/appletviewer appletviewer \
/usr/lib/jvm/java-7-openjdk-amd64/bin/appletviewer 1100

update-alternatives --install \
/usr/bin/apt apt \
/usr/lib/jvm/java-7-openjdk-amd64/bin/apt 1100

update-alternatives --install \
/usr/bin/extcheck extcheck \
/usr/lib/jvm/java-7-openjdk-amd64/bin/extcheck 1100

update-alternatives --install \
/usr/bin/idlj idlj \
/usr/lib/jvm/java-7-openjdk-amd64/bin/idlj 1100

update-alternatives --install \
/usr/bin/jar jar \
/usr/lib/jvm/java-7-openjdk-amd64/bin/jar 1100

update-alternatives --install \
/usr/bin/jarsigner jarsigner \
/usr/lib/jvm/java-7-openjdk-amd64/bin/jarsigner 1100

update-alternatives --install \
/usr/bin/javac javac \
/usr/lib/jvm/java-7-openjdk-amd64/bin/javac 1100

update-alternatives --install \

update-alternatives --install \
/usr/bin/javah javah \
/usr/lib/jvm/java-7-openjdk-amd64/bin/javah 1100

update-alternatives --install \
/usr/bin/javap javap \
/usr/lib/jvm/java-7-openjdk-amd64/bin/javap 1100

update-alternatives --install \
/usr/bin/jconsole jconsole \
/usr/lib/jvm/java-7-openjdk-amd64/bin/jconsole 1100

update-alternatives --install \
/usr/bin/jdb jdb \
/usr/lib/jvm/java-7-openjdk-amd64/bin/jdb 1100

update-alternatives --install \
/usr/bin/jhat jhat \
/usr/lib/jvm/java-7-openjdk-amd64/bin/jhat 1100

update-alternatives --install \
/usr/bin/jinfo jinfo \
/usr/lib/jvm/java-7-openjdk-amd64/bin/jinfo 1100

update-alternatives --install \
/usr/bin/jmap jmap \
/usr/lib/jvm/java-7-openjdk-amd64/bin/jmap 1100

update-alternatives --install \
/usr/bin/jps jps \
/usr/lib/jvm/java-7-openjdk-amd64/bin/jps 1100

update-alternatives --install \
/usr/bin/jrunscript jrunscript \
/usr/lib/jvm/java-7-openjdk-amd64/bin/jrunscript 1100

update-alternatives --install \

update-alternatives --install \
/usr/bin/jstack jstack \
/usr/lib/jvm/java-7-openjdk-amd64/bin/jstack 1100

update-alternatives --install \
/usr/bin/jstat jstat \
/usr/lib/jvm/java-7-openjdk-amd64/bin/jstat 1100

update-alternatives --install \
/usr/bin/jstatd jstatd \
/usr/lib/jvm/java-7-openjdk-amd64/bin/jstatd 1100

update-alternatives --install \
/usr/bin/native2ascii native2ascii \
/usr/lib/jvm/java-7-openjdk-amd64/bin/native2ascii 1100

update-alternatives --install \
/usr/bin/rmic rmic \
/usr/lib/jvm/java-7-openjdk-amd64/bin/rmic 1100

update-alternatives --install \
/usr/bin/schemagen schemagen \
/usr/lib/jvm/java-7-openjdk-amd64/bin/schemagen 1100

update-alternatives --install \
/usr/bin/serialver serialver \
/usr/lib/jvm/java-7-openjdk-amd64/bin/serialver 1100

update-alternatives --install \
/usr/bin/wsgen wsgen \
/usr/lib/jvm/java-7-openjdk-amd64/bin/wsgen 1100

update-alternatives --install \
/usr/bin/wsimport wsimport \
/usr/lib/jvm/java-7-openjdk-amd64/bin/wsimport 1100

update-alternatives --install \
/usr/bin/xjc xjc \
/usr/lib/jvm/java-7-openjdk-amd64/bin/xjc 1100


Please note that we used a priority value of 1100, so that we can assign Java 7 a higher priority than that of Java 6. Now, we run the above script, and check again the alternatives status, and where /etc/alternatives/javac points.

> update-alternatives --display javac

# You should see something like the following.
javac - auto mode
/usr/lib/jvm/java-6-openjdk-amd64/bin/javac - priority 1061
slave javac.1.gz: /usr/lib/jvm/java-6-openjdk-amd64/man/man1/javac.1.gz
/usr/lib/jvm/java-7-openjdk-amd64/bin/javac - priority 1100
Current 'best' version is '/usr/lib/jvm/java-7-openjdk-amd64/bin/javac'.


Enjoy Java 7 again! Don't forget to change the default JVM path in Eclipse, though.

What about the man pages installed as slaves? That part is left as an exercise :-)

## 2012-03-08

### Convergent synthesis, finally!

The legacy C++ version of my chemistry product has finally gained the capability to do convergent synthesis — in a primitive form for now.

### What is convergent synthesis?

Consider a complex molecule that we wish to synthesise. In the usual method, the synthesis steps proceed linearly. Suppose that the following is the sequence of synthesis steps, where Goal designates the molecule to synthesise.

A → B → C → D → E → F → G → H → I → J → Goal


The above sequence has 10 steps in the process. We start with a simple, available molecule A. Presumably, a functional group is either added, replaced or deleted at each step. [Reality includes several more dimensions, but let us keep the discussion simple.] While easy to comprehend, the biggest problem with this is the effective yield of the route. For the purposes of discussion, let us assume an average yield of 85% per step. With 10 steps, the effective yield is less than 20%!

Contrast this with the following scheme.

P → Q → R ⌉
| → Goal
S → T → U ⌋


In this case, we have two independent paths leading to moderately complex molecules R and U. Then, these two paths converge to give rise to a more complex molecule, which in this case is our goal molecule. The expectation is that since R and U are only moderately complex, they can be independently synthesised in a couple of steps each. The effective yield for the convergent route, then, is about 44%! This is, obviously, much more attractive.

### How does it work in retro-synthesis?

My product actually does retro-synthesis, i.e., it starts with the goal molecule, and constructs the steps in reverse order. At each step, the product molecule is broken into reactants. In most scenarios, the coreactant is a trivial molecule; or, it is directly available for purchase from a company like Sigma Aldrich.

If we wish to take advantage of convergent synthesis, on the other hand, how we break a product molecule into possible sets of reactants becomes a matter of extreme significance. In the step R + U → Goal, effective convergence is possible if and only if R and U have about the same degree of complexity.

### What next?

The immediate challenge is to locate such reactions, and build a repertoire of them. Another, of course, is to be able to resolve a product molecule to utilise one such! Our chances depend on being able to identify a reasonably central atom that is suitable for initiating the cleavage of the product molecule. The algorithms have to be refined to exploit this new capability, as well!