2010-10-28

Fruitless effort :-(

I wrote a small program to compute SHA-1 sums of the files in a directory tree, to easily identify and locate files occurring more than once. I wrote it on Linux, in Go. I tested it on large trees with sizes in the range 1-4 GB. I cross-checked the results, through random sampling, against `sha1sum', and I was happy! I then booted into a Windows 7 VMware image and compiled it there. Then the trouble began.

Go is not officially supported on Windows yet. There is an unofficial port that closely follows the official releases. It is clearly marked `experimental'. I should have realized that when an OS is not yet officially supported, the weakest areas would be OS interfaces. The inane dud that I am, I did not think enough!

While walking the directory tree, I was doing a `stat' of each entry: (a) to find whether it is a directory or a file, and (b) if it is a file, to know its size in bytes. On Windows, the `stat' call was failing. After some gymnastics, the program was walking the directory tree. But some directories were being recognized as files. In addition, I could not make it `stat' the entries whose full path contained spaces. I exhausted all the conventional mechanisms, to no avail.

So, I give up! At least, for now. I have begun using Go for serious work. I have not encountered any issues with the runtime so far. But ... that is on Linux. From a few small programs that I run on both Linux and Windows, I see that there are no issues with Go ... as long as OS interfaces are not used on Windows!

My choice of language for this particular job was wrong, leading to this fruitless effort. Sigh!

Modern computing

I came across this on the Go mailing list today. Irresistible!
Such is modern computing: everything simple is made too complicated because it's easy to fiddle with; everything complicated stays complicated because it's hard to fix.

Rob Pike

2010-10-04

Time problems

Last week, I encountered two interesting problems.

Background: The product that I am building has a newly-developed licensing feature. The feature uses hardware information of the user's machine a la Windows activation. In addition, licenses are annual, so the program must commit suicide at the end of one year.

Problem 1: After the first successful run of the product, all attempts at running it again were failing with the error message `Current time older than that of the most recent run.' The time of last run is updated in the license file (which is an encrypted file) at the beginning of each run. After several attempts at debugging, I could find nothing. I switched to other work, and returned a few hours later. To my surprise, the program ran successfully, but just once! It then returned to the now-familiar error message.

I noticed that 4 hours had elapsed between the two successful runs. I suddenly remembered that I was storing the time of last run as UTC, and that I was signed in to my collaborator's computer in Toronto (UTC-4:00). Upon searching the code for all instances of storing/reading time, I noticed that conversions of local time to and from UTC were inconsistent. I made them consistent, and Problem 1 disappeared.

Problem 2: No sooner had I finished resolving the above than I needed to run the program - in batch - about 6000 times. I wrote a shell script to do that. I tested it for 10 runs. Surprise! The error message was back, and this time it was even more confusing. Jobs 1 and 2 ran successfully. Job 3 failed with the above error. Job 4 ran successfully. Job 5 failed. Jobs 6 and 7 ran successfully. Job 8 failed. Jobs 9 and 10 ran successfully. The set of jobs that ran successfully changed only slightly each time I re-ran the shell script. It really left me scratching my head.

I then noticed that there was a pattern. The jobs before the failed ones completed really quickly. I inserted a `sleep 1' after each job in the shell script. Now, every job completed successfully!

I then checked the license verifier code. One of the important checks was that the current time should be greater than that of the last run. However, the resolution of time was in seconds. Thus, if a job completed in less than a second, the very next job would see a current time equal to the stored one and fail. So, I modified the condition: the current time should not be less than that of the last run. The shell script ran happily ever after!