2013-10-16

Captions on Indian trucks - an unexpected lesson

Several years ago, when I lived in Bengaluru, my company's offices used to be in Electronics City (for a few years). Owing to the distance and the disgusting volume of traffic, I used to leave for the office rather early, around 07:00.

In those early hours, long-distance trucks were permitted to travel through the city. As I overtook them, I used to read the captions written on those trucks. A particular caption was very, very common on trucks coming from the North: ``burI nazar vAlE, tErA muH kAlA" (roughly ``oh you who cast an evil eye on me, your face shall become blackened").

After a while, I grew so familiar with it, that I usually read only the first word before turning my attention to the next truck.

On a particular day, I had this truck right ahead of me, when we stopped at a traffic signal. The caption began with the usual ``burI", and I almost turned in another direction, but something pulled my attention back. The caption read: ``burI nazar vAlE, tErA bhI bhalA hO" (roughly ``oh you who cast an evil eye on me, I wish you well in spite of that").

I was stunned! It took me a while to digest that. Am I equal to that spirit? I don't think so. Nonetheless, it has had a rather mysterious affect on my thinking!

2013-08-14

Ring Detection - 1

In this two-part series, we look at how Ojus Chemistry Toolkit (OCT) currently implements ring detection. In this part, detection of rings and ring systems is described.

Preparation

N.B. OCT's molecule reader converts any hydrogen atoms that it encounters into implicit hydrogens, i.e., the list of atoms and bonds in an OCT molecule never contains a hydrogen atom. Similarly, the number of hydrogens attached to an atom is determined looking at the atom's valence and any charge specified for it.

N.B. The Frerejacque value of the molecule is checked to see if there are any cycles at all, before the following procedure is employed.

When we are interested in finding cycles (and only cycles), we should not look at open branches — those that contain only leaves. Accordingly, we remove them. In reality, all of this computation happens on temporary data structures, not the actual molecule itself.

Removing Terminal Chains

Remember terminal atoms from this old article? Quickly, a terminal atom is one which has a single neighbour. We remove all such atoms. However, removing a terminal atom could make its sole neighbour a terminal atom itself! We have to remove it, too. But, then, removing it … ad infinitum. Here is the pseudocode.

    var removing := true.

    while (removing) {
        removing := false.

        for each atom `a` {
            if `a` is a terminal {
                remove `a`.
                removing := true.
            }
        }
    }

Note that removing an atom entails removing all the bonds in which it participates, etc. Thus, by the time the outer loop exits, all terminal chains will have been removed.

How Many Rings?

One Ring or More?

At this point, if all the atoms have exactly two neighbours, there is only one ring in the molecule. Why? More than one ring implies that a second ring is either fused to the first one, or is independent of it, but connected through a linking chain of atoms. In either case, there should be at least one atom that is a junction — it should have at least three bonds.

Cycle Detection

Case of One Ring

In the case of the molecule having only one ring, we mark that ring as the sole ring. We also create a single ring system with that sole ring. And, we are done!

Case of Multiple Rings

Detection of two or more rings present in the molecule is (rather) evidently more complex than that of a single ring. Over the last decade, or so, I have written four or five distinct algorithms to detect multiple rings in a molecule. All of them employed a recursive depth-first approach. The recursive approach made certain parts of the algorithm easier and clearer, while complicating certain others.

This time, though, I chose to employ a breadth-first approach. The outline is as follows.

    var candidates := new list of paths.
    var path := new list of atoms.

    var a := the first non-junction atom in the molecule.
    if no such atom exists {
        a := the first atom in the molecule.
    }
    path := `path` with `a` appended.
    candidates := `candidates` with `path` appended.

    while (`candidates` is not empty) {
        var test-path := fetch the first path in `candidates`.
        try-path `test-path`.
    }

As you can see, I have a queue into which I inject all candidate paths. A candidate path is potentially incomplete, and may need to be extended by adding a neighbour of the last atom in it, should one such exist. But that is part of processing the path.

Process a Test Path

    var current := last atom in `test-path`.

    for each neighbour `nbr` of `current` {
        if `nbr` was visited already in `test-path` {
            var ring := path from previous occurrence of `nbr` to `current`.
            if `ring` is valid {
                record `ring`.
            }
            continue loop.
        }

        var new-path := `test-path` with `nbr` appended.
        candidates := `candidates` with `new-path` appended.
    }

Thus, we add each path extension to the queue of candidates, at each atom that we encounter. Now, given a candidate ring, how do we validate it? First, the easy case: if the candidate has only three atoms, then it is certainly a genuine ring. Good! But, what if it has more than three?

Preliminary Validation

    if size of `test-path` is 3 {
        answer `true`.
    }

    for each atom `a` in `test-path` {
        if more than 2 neighbours of `a` exist in `test-path` {
            
            answer `false`.
        }
    }

    
    answer `true`.

Ring System Detection

Once we have recorded all rings, we sort them by size. The next task is to partition this set of rings into equivalence classes of ring systems. Two given rings belong to the same ring system if they have at least one bond or at least one atom in common.

Partitioning

The actual task of partitioning the set of rings is quite straightforward.

    var ring-systems := new list.

    outer:
    for each recorded ring `r` {
        for each ring system `rs` {
            if `r` and `rs` share a bond or
               `r` and `rs` share an atom {
                mark `r` as belonging to `rs`.
                rs := `rs` with `r` appended.
                continue `outer`.
            }
        }

        var new-rs := new ring system.
        new-rs := `new-rs` with `r` appended.
        ring-systems := `ring-systems` with `new-rs` appended.
    }

Conclusion

We have, now, identified the rings in the molecule, and have partitioned the set into ring systems. However, this set could contain potentially spurious rings. Eliminating the false candidates is a surprisingly more difficult procedure. We shall look into it in the next post!

2013-04-30

C++11's strongly-typed enums

C++11 provides strongly-typed enums. These new enums are no longer simple sets of integers; they are classes in their own right. And, their instances are objects, too!

In the following example, we define several enumerated values for possible stereo configurations of a chemical bond.

enum class BondStereo : int8_t
{
    NONE
    , UP 
    , DOWN 
    , UP_OR_DOWN 
    , E 
    , Z 
    , CIS 
    , TRANS 
    , UNSPECIFIED 
};

Note that the above enum class BondStereo has int8_t as its underlying data type. C++11 allows you to specify the precise integral data type that should be used. In turn, that allows us to choose specific types to best suit the required ranges of the enumerated values.

However, with this felicity comes an inconvenience. Since the instances of these enums are no longer simple integers, they cannot be used as such! The following is a simple and generic way of obtaining the underlying integral value of a given instance.

template <typename E>
auto enumValue(const E e) -> typename std::underlying_type<E>::type
{
    return static_cast<typename std::underlying_type<E>::type>(e);
}

Note the use of auto and the trailing return type declaration. This is needed, since we do not know in advance the exact underlying type of a given instance.

Great, you say, but which headers should I include to be able to use this functionality? Well, that is left as an exercise :-) Enjoy!

2013-04-23

Operational Disaster

My return to blogging did not turn out to be a real return, evidently! Here is an update after several months, again!

Operational Disaster

An unprecedented, peculiar and curious sequence of quick events took place in the middle of March. I was investigating the possible use of VirtualBox image as a distribution mechanism for my organic synthesis planning product. My scientist created a CentOS 6.4 image with the program and its dependencies, and shared it. I made a copy of it, and imported the same into VirtualBox in my Windows 8 computer. It worked well, as intended. Thus far, the experiment was successful!

To ease testing the program in the VirtualBox environment, I mounted my primary data virtual disk in the CentOS image. The testing itself was successful. It appeared as if this mechanism was proving to be technologically sound and convenient as well.

Disaster Strikes!

Some more testing later, I decided to re-test the entire procedure. I removed the CentOS image from VirtualBox. However, it complained of an existing image when I tried to import the appliance the next time. Since the old image was no longer needed, I asked VirtualBox to delete it from the disk. It did so!

I ran through the procedure again, with the same initial success. I was quite satisfied with this approach. Then, exactly as before, I tried mounting my primary data virtual disk in this new CentOS image. Navigating to the drive where it was located, I was astonished to not find it there! Concerned, I opened Windows explorer, and navigated to the said drive. Indeed, the .vmdk file was gone!

What Had Happened?

As of the time that I deleted the CentOS image, the mounted external virtual disk had an entry in its /etc/fstab. So, when VirtualBox deleted the image, it deleted the mounted disk's file as well! VirtualBox claims that mounted disks that are used by other images are not deleted. How did this deletion happen, then?

The answer is straight-forward: the mounted disk was not used by any other VirtualBox image! It was the home partition of another Linux image, but that was a VMware image! So, VirtualBox was indeed true to its promise. But, my data was lost!

More Disaster!

I was distressed at having lost my home partition in this unforeseen manner. The lost virtual disk was my primary data drive. The host Windows home folder did not have anything of consequence, except in the Downloads folder. Fortunately, I had a backup from January. It was stored in a separate NTFS drive in the laptop, as well as an external USB disk.

So, losing no time, I proceeded to restore the contents from that backup. It was an encrypted backup. Since I always name my top-level folders with a common prefix, I asked for the files to be restored to my Windows home folder. Towards the very end of the unpacking exercise, something went wrong, and the system froze! I gave it several minutes to recover, but to no avail. There was no disk activity; no sign of life; even the Caps Lock key stopped responding after a while. I waited patiently ... with my fingers crossed, for over half-an-hour. The computer had really frozen!

Disappointed, I switched the power off. With considerable trepidation, I switched it on a few seconds later. Truly enough, the system froze while booting! I felt really let down. I tried restarting the computer a few times, with the same inevitable result — the login screen wouldn't appear. My entire Windows had become unusable! Two strokes in a row.

Rescue Attempt

I used my wife's computer to read about rescuing Windows. There was a Windows rescue partition (drive) in the hard disk. I tried to restore Windows using that rescue partition, following the instructions that I found in Microsoft's Web site. Alas! My earlier impulse upgrade to Windows 8 haunted me unpleasantly there. The rescue image was that of Windows 7, which came pre-installed with the HP hardware. When I attempted a rescue, it kept informing me that I had a more recent version of Windows installed, and that it couldn't rescue it!

For an hour, or so, I was quite dumbstruck. But then, life has to go on, of course! I contemplated possible courses of action. And finally chose what I knew best.

Go The Linux Way!

I burned Kubuntu 12.04 LTS to a DVD, and booted into the Live image. After making sure that all essential hardware worked as expected, I installed it to the hard disk as the only operating system. During the installation, I opted for the proprietary firmware for the Broadcom 4313GN wireless card, etc., to be downloaded and installed. Everything went smoothly. The only irritating aspect was the download of over a dozen English language packs! Readers will remember that I was irritated by this in my old Debian (virtual) installation too.

Upon re-booting, I found – as expected – a few hundreds of updates. Accordingly, I allowed the system to update. Next, from the Kubuntu PPA, I upgraded KDE to 4.10.1. After verifying that the system was working properly, I restored the backup from the external USB disk. It Just Worked!

Laptop Power Management

Initially, the laptop battery powered the system for only about 2 hours. Windows (both 7 and 8) used to last between 4 and 5 hours on a single charge. I read several articles and blog posts on what all improve battery life in Linux. None of them improved the situation measurably! powertop showed a baseline discharge rate of over 23W when idling. That was disappointing!

My HP laptop has an integrated Intel 4000 HD graphics card and an nVidia GEFORCE 630M discrete graphics card. I realised that I had to try Bumblebee. Following the instructions in one of the Ubuntu fora, I installed Bumblebee. I installed the primus configuration rather than the optimus one.

Having read a few bad things about Bumblebee, I had a trepidation similar to that I had when I re-booted the frozen Windows system. Fortunately, though, Kubuntu booted normally, and to my great relief, Bumblebee worked. powertop showed a new baseline consumption rate of a little over 10W when idling! Now I get the same 4-5 hours of battery life on a single charge!

What Do I Not Like?

The power management is a little too eager. It puts the wireless interface to sleep every few minutes. For it to wake up takes several seconds upon next use. I have to keep staring at the screen until then, sometimes a little impatiently. These days, though, I use powertop to put the Broadcom 4313GN wireless card into bad state so that it is not put to sleep so aggressively.

What Do I Miss?

All of this is fine, of course, but do I miss anything? What I miss most is Google Drive native application. I usually do not sign on-line petitions, but I did sign the one at Drive4Linux. I was disappointed to find only about 1,500 signatures. Nevertheless, I signed it, and requested my G+ contacts to follow suit if they use Linux and Google Drive.

Other than the above, I have not felt any notable inconvenience or loss of functionality, so far! Thus, after a break of about five years, I have returned to running Linux natively in my primary computer!!