2011-07-01

Performance improvement - 2

No sooner had I finished my previous post than I began reviewing the algorithmic aspects of ring discovery in the program. I observed two points standing out.

  1. During the analysis of the molecule's graph, we build a matrix of pair-wise distances between all atoms of the molecule.
  2. During ring discovery, we remove all terminal chains (remember terminal nodes from here?), and then explore the remaining graph to detect rings.

As I reviewed the code, I saw that I was recursively walking the graph in both cases. When building the distance matrix, the logic is simple: directly bonded neighbors are at a distance of 1; recurse over the neighbors of each neighbor, and increment the distance by 1 at each level. Whenever we encounter a shorter path between the same pair, we record this shorter distance. Exercise: Why do multiple paths exist at all? A similar algorithm is followed for ring discovery, but walking only the subgraph comprising non-terminal atoms.

Whenever a ring is detected, it is necessary to determine if it is a Jordan curve (simple closed path) or not. We are interested in Jordan curves; we mark others as fused systems. How do we, then, determine if the current path is a Jordan curve? The parent molecule of every component has a map that holds information about bonds. The key of the map is a hash; the value, the bond itself. Suppose that the bond is between the atoms a1 and a2. The hash is computed as: 10000 * aL.id + aG.id, where aL is the atom with the lower ID of the two, and aG is that with the higher. This map is constructed when parsing the input, and is, consequently, ready by the time ring discovery is requested. Clearly, it is adequate to determine if a given path is a Jordan curve or not. How? :-)

I suddenly recollected that in the first version of the ring discovery algorithm, the distance matrix was used to test if a path was a Jordan curve. In that version, it was necessary to pre-calculate the distance matrix. With it being no longer mandatory, I removed the call to build it during ring discovery.

Lo and behold — the running time dropped from 18.5-19s to 6.5-7s! This time, the improvement in performance had nothing to do with Go, and everything to do with my own oversight. Needless to say, I am happier now! :-)

No comments: