One of Many Worlds: 11/01/2010

2010-11-22

Large files and multiple cores

As part of my product, I have an available chemicals database (ACD) compiled from third-party catalogs. Such catalogs are typically distributed as SD files, in the SDF format specified by MDL. I ran into some issues when importing them, which resulted in aborted runs. Locating the troublesome molecule in the vastness of the SD files was getting tedious. After a while, the programmer in me woke up! The result is a small (~750 LoC) utility written in Go, which I blandly called sdf.

~ % sdf help

Usage:

sdf help
Prints this usage notes message, and exits.

sdf show in=file [from=m] [to=n]
Fetches and displays molecules from the file 'file',
starting with the molecule numbered 'm', optionally to the
molecule numbered 'n'. If 'from' is not specified, the
first molecule is used as the starting molecule. If 'to'
is not specified, all molecules until the end of file are
displayed. Specifically, to display only the m'th molecule,
you should specify 'from=m to=m'.

sdf copy in=file1 out=file2 [from=m] [to=n]
Similar to 'show' above, difference being that the output
is written to 'file2'. Any existing 'file2' will be
truncated.

sdf searcha in=file [from=m] [to=n] symbol=count [symbol=count] [mx=c]
Performs a search for the first molecule in the given range
that has the specified number of atoms of each element type.
The number of processor cores to use can be specified using
'mx'; default is 2.

sdf searcht in=file [from=m] [to=n] tag=name tagval=value [tag=name tagval=value] [mx=c]
Performs a search for the first molecule in the given range
that has the specified tags and values. The number of
processor cores to use can be specified using 'mx'; default
is 2.
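The two search commands spread the work over `mx` cores and report the *first* match in the given range. Below is a rough Go sketch of a `searcht`-style search. It assumes the records have already been split into strings and that data items follow the MDL `> <NAME>` / value / blank-line layout. Two simplifications apply: only the first value line of each item is compared, and `mx` here is simply the number of worker goroutines. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// tagValue returns the first value line of data item 'name' in one SDF record.
// Data items are assumed to look like:
//
//	> <NAME>
//	value
//	(blank line)
func tagValue(record, name string) (string, bool) {
	lines := strings.Split(record, "\n")
	header := "<" + name + ">"
	for i := 0; i+1 < len(lines); i++ {
		if strings.HasPrefix(lines[i], ">") && strings.Contains(lines[i], header) {
			return strings.TrimSpace(lines[i+1]), true
		}
	}
	return "", false
}

// matches reports whether every wanted tag is present with the wanted value.
func matches(record string, want map[string]string) bool {
	for tag, val := range want {
		if got, ok := tagValue(record, tag); !ok || got != val {
			return false
		}
	}
	return true
}

// firstMatch returns the 0-based index of the first matching record, or -1.
// mx worker goroutines check records concurrently. Workers finish out of
// order, so the smallest matching index is kept, not the first one reported.
func firstMatch(records []string, want map[string]string, mx int) int {
	if mx < 1 {
		mx = 1
	}
	jobs := make(chan int)
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		best = -1
	)
	for w := 0; w < mx; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				if matches(records[i], want) {
					mu.Lock()
					if best == -1 || i < best {
						best = i
					}
					mu.Unlock()
				}
			}
		}()
	}
	for i := range records {
		mu.Lock()
		found := best != -1 // indices are dispatched in order, so best < i here
		mu.Unlock()
		if found {
			break // no later record can be the first match
		}
		jobs <- i
	}
	close(jobs)
	wg.Wait() // in-flight lower indices are finished before we read best
	return best
}

func main() {
	recs := []string{
		"mol1\n> <CAS>\n50-00-0\n\n$$$$\n",
		"mol2\n> <CAS>\n64-17-5\n\n$$$$\n",
	}
	fmt.Println(firstMatch(recs, map[string]string{"CAS": "64-17-5"}, 2)) // prints 1
}
```

Stopping dispatch as soon as any match is seen is safe. Every index below that match has already been handed to a worker, and `wg.Wait` ensures those checks complete before the result is returned.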

2010-11-13

Java and native code

Last week, one of my clients' projects ran into a problem. The team had introduced a new feature in the latest release. It ran fine in testing, but began crashing mysteriously in production. Upon investigation, they found that a particular third-party native library (a .so file) was causing the errors. This native library was used to compute the InChI value of a given molecule. The application itself is a Java Web application written using the Struts framework.

Initially, there was a path-related problem. They solved it. Then there was a 32-bit vs. 64-bit problem (or so they thought). They solved it. Then there was a version mismatch. They solved it. At that point, they moved it into production. Soon, the mysterious crashes started. Yesterday, I was called in.

Even before I arrived there, my thinking was: (a) the offending molecules could be known corner cases of the third-party library, or (b) there was memory corruption (e.g., the JVM trying to garbage-collect memory allocated by the C runtime).

I began by understanding the issue's history, and familiarized myself with the part of the code that interfaced with the native library. Together with the team, I set up various scenarios to reproduce the crash. After a couple of hours, the first crash did occur. Using the trace information, we searched the forums of the third-party vendor who supplied the native library and its Java wrapper library. The forums had discussions of some of exactly the same scenarios that we had set up for testing, and these were listed as known problems. We also saw that a newer version of the vendor library was available. At this point, it appeared that my first guess was probably correct.

Nevertheless, re-running the scenarios with the updated library caused similar crashes! After a couple more hours, we seemed to have reached a dead end.

Then I suddenly saw a pattern in the crashes -- they all happened only in multi-user scenarios! That reminded me of my second line of reasoning. It was memory corruption after all, but probably due to global data in the native library! We quickly verified this by launching the same functionality simultaneously in independent JVM instances. There were no crashes!

So, that indeed was the problem: the native library was not thread-safe! We then rewrote that part of the application to factor the native call out into a separate process, run in a new JVM instance. Each client request invoking that particular functionality would run in its own JVM, which handed the results back to the parent program in a text file. With the thread-safety problem out of the way, the application went back to running happily!
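The client's fix was in Java: a new JVM for each request. The same isolation pattern is sketched below in Go, to match the other examples. Each request runs a helper in a fresh child process, the helper writes its result to a text file, and the parent reads that file back. The helper program and its "result path as last argument" convention are hypothetical. The demo in `main` uses a POSIX `sh` as a stand-in for it.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// runIsolated runs one unit of work in a new child process and returns the
// text the child wrote to its result file. The (hypothetical) helper is
// expected to take the result-file path as its last argument.
func runIsolated(command string, args ...string) (string, error) {
	dir, err := os.MkdirTemp("", "isolated-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir) // each request gets, and cleans up, its own directory
	resultPath := filepath.Join(dir, "result.txt")

	argv := append(append([]string{}, args...), resultPath) // don't alias caller's slice
	cmd := exec.Command(command, argv...)
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		// A crash in the child (e.g. a segfault in native code) surfaces here
		// as an error instead of taking the parent process down.
		return "", fmt.Errorf("helper failed: %w", err)
	}
	data, err := os.ReadFile(resultPath)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(data)), nil
}

func main() {
	// Demo only: sh stands in for the helper; $0 is the request, $1 the result file.
	out, err := runIsolated("sh", "-c", `printf 'result-for-%s' "$0" > "$1"`, "req-42")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out) // prints result-for-req-42
}
```

The native library's global state lives only in each child's address space, so concurrent requests cannot corrupt one another. The price is one process launch per request.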

2010-11-04

Happy ending!

I had been thinking of filing an issue about the stat problem in Go for Windows. I was about to do so this evening when I received my daily digest of the Go mailing list. In it, I found an announcement that a new release of Go had been made last night. The release notes said, to my disbelief, that the stat problem on Windows was solved!

I eagerly downloaded the new version for Windows and recompiled the SHA-1 summing program with it. I ran it, I admit, with some trepidation. It ran as it should (have) -- smoothly!

Now, that is some coincidence! Happy ending, thusly!

2010-11-03

Fruition :-)

I spent a couple of hours rewriting the Go version of the program in C++ using Qt. This version worked flawlessly, much to my relief!

Qt is a very well-designed library! That feeling gets reinforced every time I work with it.

I used a class called QDirIterator, whose purpose is evident from its name. While Qt supports STL-style iterators, it also offers Java-style iterators, introduced in Qt 4. These iterators use a hasNext()/next() pair to traverse a container (or, in QDirIterator's case, a directory's entries). Needless to say, they are much easier to use than the corresponding STL-style iterators.
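The Java-style protocol is easy to illustrate outside Qt. The Go type below is a hypothetical illustration, not Qt's API and not how QDirIterator is implemented. `HasNext` reports whether another element remains, and `Next` returns it while advancing, so a traversal is a single loop with no begin/end pair to manage. It needs Go 1.18+ for generics.

```go
package main

import "fmt"

// javaStyleIter walks a slice front to back using a hasNext()/next() protocol.
type javaStyleIter[T any] struct {
	items []T
	pos   int // index of the element Next will return
}

func newIter[T any](items []T) *javaStyleIter[T] {
	return &javaStyleIter[T]{items: items}
}

// HasNext reports whether Next can be called again.
func (it *javaStyleIter[T]) HasNext() bool { return it.pos < len(it.items) }

// Next returns the current element and advances; it panics if HasNext is false.
func (it *javaStyleIter[T]) Next() T {
	v := it.items[it.pos]
	it.pos++
	return v
}

func main() {
	it := newIter([]string{"a.txt", "b.txt"})
	for it.HasNext() {
		fmt.Println(it.Next()) // prints a.txt, then b.txt
	}
}
```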

All in all, I enjoyed rewriting the program, and the exercise also came to fruition!