My thoughts on programming

Introduction

My older brother Michael got me hooked on programming when I was about 12, and I've been doing it ever since.

Source control management

When I first learned about this amazing concept called revision control, I started using perhaps the most ancient revision control system out there, CVS. CVS works, but only just. Merging branches in CVS is a catastrophe. You could do a comparable job with the adept use of diff, tar, and patch. To hear Linus Torvalds's splendid and often hilarious views on CVS and other source control management topics, watch this Google Tech Talk from 2007.

Now I use git. What makes it nice is that 1.) it's fast, and 2.) it's distributed, which means "nobody's special", i.e., there is no central repository like in SVN or CVS.

Writing pointless code

Many computational physicists are guilty of reinventing the wheel when they write code. Among other things, they write their own integrators for solving differential equations, they write their own interpolators, and they write their own Makefiles. For small code bases, sometimes this is the easiest way to proceed, but often we write code in physics to solve large, complex problems, in which case this approach inevitably leads to buggy, incomprehensible code that future maintainers/graduate students will have to spend countless hours trying to decipher.

If you have the luxury of writing a new code (as opposed to inheriting an unwieldy existing one), there is no reason to write your own Runge-Kutta integrator or Newton-Krylov solver. The best computer scientists and numerical analysts on the planet have devoted their entire careers to writing software that does all of the things I just mentioned (and many more), and the code they write is generally both robust and efficient. And best of all, much of this software is freely available for anybody to use!

Software I use almost daily

GNU Scientific Library (GSL) - a C library which contains every numerical method you can imagine. Being a GNU project, it adheres to strict coding style standards, which makes it a nice tool to learn the language.
Boost - a bleeding-edge library for doing all kinds of weird and amazing things with C++. Many features of Boost have eventually become part of the C++ standard. Be careful about using Boost features in your code, though: it's a massive, sometimes unwieldy library that stresses even the most sophisticated compilers.
The ultimate vim configuration - self-explanatory. I use vim for most text editing needs, and this adds a huge amount of versatility to it.
Git - the most sophisticated source control management software I've ever seen. Linus Torvalds himself wrote the first few versions of it, so that's not too surprising.
Bazaar - Canonical's answer to the mind-boggling complexity of git. It features much of the same utility as git (although the Linux kernel developers would probably disagree), and although it's slower, Canonical argues it's far easier to use. Since I'm not a kernel developer juggling millions of patches and trees, using git for my code is sort of like taking the space shuttle to the grocery store. (Thanks for the metaphor, Joe Bonamassa.)
CMake - a new build system. It's not a replacement for the GNU build system, because it doesn't build your source code itself. Rather, it writes the Makefiles for you, and then you invoke gmake or whatever other flavor of make you prefer. The documentation for CMake is sadly lacking and I've learned most of what I know from mailing lists and Google, but once you figure out how it works, it's a joy to use.

C vs. C++ vs. Fortran (for physics)

The age-old question. Here is what I have learned in my experience with all three:

C: If you want your code to be portable, write it in C. It's an old language, the standard is mostly "complete", and there is a C compiler on pretty much every *NIX machine in existence. Also, if you get stuck, it's the language your computer science friends will most likely be able to help you fix, since almost everybody's first language is C.
Fortran: The language originally written for solving problems in science and math. It's an easy language to read, which is no coincidence given that it was designed in the first place to be useful pretty much only for crunching numbers and not for doing, e.g., low-level hardware manipulation. I find it to be very useful for writing small codes to do quick calculations, if only because it has trigonometric and linear algebra operations built into the language itself, whereas C/C++ need a library for them. Unfortunately it has many other "features" which I, as a young gun who didn't grow up programming in the 1970s, don't particularly enjoy. Such features include requiring compilers to create separate files for modules (which forces one to compile files in a very specific order if one uses modules), and maintaining 100% backward compatibility with all previous Fortran standards back to FORTRAN 77. This has helped battle-hardened libraries such as LAPACK and BLAS survive for many years, but it also provides little incentive for programmers to break the bad habits they had to form when they were stuck with common blocks and compile-time-only memory allocation in FORTRAN 77. Having inherited a great deal of decades-old code from my research group, I can say this feature of Fortran has forced me to become a very, very, very patient person. However, despite its sometimes backwards-looking features, it nevertheless remains the most popular low-level language in computational science and mathematics, and the ISO keeps writing new standards, so if you plan on collaborating with anybody to write code in the physics or astronomy community, chances are you'll need to be able to read it.
C++: It's growing on me. Migrating from procedural programming to object-oriented programming is not always easy when writing physics and astronomy codes. In general the programmer is simply implementing a deterministic mathematical algorithm which has already been worked out on paper, so the interactions among "objects" in code are generally simple, which means a great deal of the rich object-oriented features available in C++ go unused. However, as codes become larger and take more and more physics into account, I suspect that will change. Just recently I discovered a quite clever code called MOOSE which uses many of the OOP features of C++ to great effect.

When I write new codes I like to cycle among C, C++, and Fortran. This exercise has helped me understand the more subtle differences among these languages and has provided me a deep appreciation for all three.