Introduction
My older brother Michael got me hooked on programming when I was about
12, and I've been doing it ever since.
Source control management
When I first learned about this amazing concept called revision control,
I started using perhaps the most ancient revision control system out
there, CVS. CVS
works, but only just. Merging branches in CVS is a catastrophe. You could
do a comparable job with the adept use of diff, tar, and patch. To hear Linus Torvalds's
splendid and often hilarious views on CVS and other source control
management systems, watch his Google
Tech Talk from 2007.
Now I use git. What makes it nice is
that (1) it's fast, and (2) it's distributed, which means "nobody's
special": every clone carries the full history, so there is no mandatory
central repository as in SVN or CVS.
Writing pointless code
Many computational physicists are guilty of reinventing the wheel when
they write code. Among other things, they write their own integrators for
solving differential equations, they write their own interpolators, and
they write their own Makefiles. For small code bases, sometimes this is the
easiest way to proceed, but often we write code in physics to solve large,
complex problems, in which case this approach inevitably leads to buggy,
incomprehensible code that future maintainers/graduate students will have
to spend countless hours trying to decipher.
If you have the luxury of writing a new code (as opposed to inheriting
an unwieldy existing one), there is no reason to write your own Runge-Kutta
integrator or Newton-Krylov solver. The best computer scientists and
numerical analysts on the planet have devoted their entire careers
to writing software that does all of the things I just mentioned (and many
more), and the code they write is generally both robust and efficient. And
best of all, much of this software is freely available for anybody to use!
Software I use almost daily
- GNU Scientific Library
(GSL) - a C library which contains nearly every numerical method you can
imagine. Being a GNU project, it adheres to strict coding style
standards, which makes it a nice tool for learning C.
- Boost - a bleeding-edge library
for doing all kinds of weird and amazing things with C++. Many features
of Boost have eventually become part of the C++ standard. Be careful
about using Boost features in your code, though: it's a massive,
sometimes unwieldy library that stresses even the most sophisticated
compilers.
- The ultimate
vim configuration - self-explanatory. I use vim for most text
editing needs, and this adds a huge amount of versatility to it.
- Git - the most sophisticated
source control management software I've ever seen. Linus Torvalds
himself wrote the first few versions of it, so that's not too
surprising.
- Bazaar - Canonical's
answer to the mind-boggling complexity of git. It features much of the
same utility as git (although the Linux kernel developers would
probably disagree), and although it's slower, Canonical argues it's
far easier to use. Since I'm not a kernel developer juggling
millions of patches and trees, using git for my code is sort of like
taking the space shuttle to the grocery store. (Thanks for the metaphor,
Joe Bonamassa.)
- CMake - a relatively new build system.
It's not a replacement for make,
because it doesn't build your source code itself. Rather, it
generates the Makefiles, and then you invoke gmake or whatever other
flavor of make you prefer. The documentation for CMake is sadly lacking,
and I've learned most of what I know from mailing lists and Google, but
once you figure out how it works, it's a joy to use.
C vs. C++ vs. Fortran (for physics)
The age-old question. Here is what I have learned in my experience with all three:
- C: If you want your code to be portable, write it in C. It's an
old language, the standard is mostly "complete", and there is a C compiler
on pretty much every *NIX machine in existence. Also, if you get stuck,
it's the language your computer science friends will most likely be able to
help you fix, since almost everybody's first language is C.
- Fortran: The language originally written for solving problems
in science and math. It's an easy language to read, which is no
coincidence given that it was designed in the first place to be useful
pretty much only for crunching numbers and not for doing, e.g., low-level
hardware manipulation. I find it to be very useful for writing small codes
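to do quick calculations, if only because it has things like built-in
trigonometric and linear algebra operations (matmul and dot_product, for
example), whereas C/C++ offer trigonometry only through the standard math
library and no linear algebra at all without a third-party library.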
Unfortunately, it has many other "features" which I, as a young gun who
didn't grow up programming in the 1970s, don't particularly enjoy. Such
features include requiring compilers to create separate files for modules
(which forces one to compile files in a very specific order if one uses
modules), and maintaining almost complete backward compatibility with all
previous Fortran standards back to FORTRAN 77. This has helped battle-hardened
libraries such as LAPACK and BLAS survive for many years, but it also
provides little incentive for programmers to break the bad habits they
had to form when they were stuck with common blocks and
compile-time-only memory allocation in FORTRAN 77. Since I inherited a great
deal of decades-old code from my research group, this feature of Fortran
has forced me to become a very, very, very patient person. However, despite
its sometimes backwards-looking features, it nevertheless remains the most
popular low-level language in computational science and mathematics, and
the ISO keeps writing new standards,
so if you plan on collaborating with anybody to write code in the physics
or astronomy community, chances are you'll need to be able to read it.
- C++:
It's growing on me. Migrating from procedural programming to
object-oriented programming is not always easy when writing physics and
astronomy codes. In general the programmer is simply implementing a
deterministic mathematical algorithm which has already been worked out on
paper, so the interactions among "objects" in code are generally simple,
which means a great deal of the rich object-oriented features available in
C++ go unused. However, as codes become larger and take more and more
physics into account, I suspect that will change. Just recently I
discovered a quite clever code called MOOSE which uses many of the OOP
features of C++ to great effect.
When I write new codes I like to cycle among C, C++, and Fortran. This
exercise has helped me understand the more subtle differences among these
languages and has provided me a deep appreciation for all three.