The HOW and the WHAT

No, a low-level programming language is not defined by being closer to the metal. And a high-level programming language is not defined by being closer to English. Heck, BASIC was probably the closest to English, and you’d be hard pressed to defend that it’s higher-level than Haskell, which is probably as difficult for a native English speaker to understand as any real programming language can get, with the sole exception of APL.

Anyway, some of those things are indeed true. But they are only secondary attributes. They are like the fact that vi allows you to do much more in fewer keystrokes than any other editor: the core reason is that vi was designed to work over 300 baud phone lines, and this involved, among many other things, reducing keystrokes. That reduction was not the goal, but a side effect.

Lower-level programming languages are created first; higher-level programming languages are created afterwards, in order to make programmers’ lives less painful. You know, programming is in some aspects very similar to being tortured: you have to keep dozens (if not hundreds) of things in mind while you write every line of code. And if you forget any of them, you write a bug, and you don’t find out right away. Back in the day when the web wasn’t gobbling up all other platforms like The Nothing in Michael Ende’s The Neverending Story, you used C++ and the compiler would punish you with a basic warning or error in most cases, which is similar to getting a little pinch. Nowadays, it’s JavaScript or Python, and you will be punished for the bug with a crash in front of your user or customer, which makes the minimum torture threshold more similar to getting a nail torn out.

But even if the driving force is improving the life of programmers and reducing the cost of software development, the real way this is achieved is not by getting further away from the proverbial metal, but by allowing you to describe processes better. A higher-level language allows you to describe things more succinctly, without having to get into the nitty-gritty details of every single excruciating little step. See a sample in assembly language:

                MOV EAX, [first_value]
                ADD EAX, [second_value]
                MOV [result], EAX

This calculates and stores the sum of two values. In three steps. Read the first value into EAX. Add the second value into EAX. And store the value of EAX in the result variable. Kids, this is how we did things back in the day.

And see this in C:

result = first_value + second_value;

Thank goodness progress exists.

But still, even if this shows real evolution in programming, from a decidedly lower-level language to a higher-level one, it doesn’t show what the underlying key is. Removing the references to specific CPU registers and instructions is one step, but if we don’t understand the core, we can’t take this process further and create even more useful programming languages. So, what is it?

In Turing’s spirit, all (Turing-complete) programming languages are equivalent in what you can compute with them. But the truth is that they are not the same at all. Something like the SKI combinator calculus is fully capable of any computation, and what’s more, it has just three combinators, and one of them is redundant (I behaves exactly like S K K). See a sample program:

(S (S (K S) (S (K K) I))   (S (S (K S) (S (K K) I)) (K I)))

(The piece above is the number 2, written as a Church numeral. Imagine what an XML parser written using the SKI combinators must look like.)

You could say SKI combinators are as far from the metal as you can imagine. Functional combinator application. Rewriting rules. Church numerals. No conventions, only a couple of basic core definitions. And whatnot.

But even with this, I doubt anyone would dare call SKI combinators a high-level programming language. It’s about as low-level as it gets, similarly to assembly language. OK, let’s cut SKI calculus some slack: at the very least, it’s probably the only calculus with its own Facebook page (http://www.facebook.com/pages/SKI-combinator-calculus/138648626160295 – and all of 6 people worldwide like it).

So, if closeness to the metal does not define the high/low level distinction, what does? Because it’s obvious that C, Java, Fortran, and Haskell share something useful and valuable that assembly language and the SKI combinator calculus don’t. (Sorry, just kidding about Java.)

And here is the key: if you compare assembly and SKI, you see that most of the code you write deals with the nasty details of each step of doing something: reading a value into a register, or substituting a combinator argument in some way to rearrange information. The sample C code above, on the other hand, relies on the underlying infrastructure for those steps and just states WHAT the end goal is.

That is the whole distinction: lower-level programming languages make you worry about HOW each thing is done, while a higher-level language lets you get busy with WHAT to do and leaves it to the tools to worry about how that goal is achieved.
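The same contrast shows up inside a single language. Here is a minimal sketch in C, with hypothetical function names (sort_how, sort_what, compare_ints) invented for illustration: the first function spells out every shift of an insertion sort, while the second only states the ordering it wants and lets the standard library's qsort decide the steps.

```c
#include <stddef.h>
#include <stdlib.h>

/* HOW: every comparison and shift of an insertion sort, written by hand. */
void sort_how(int *values, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        int current = values[i];
        size_t j = i;
        /* Shift larger elements one slot right to open a gap. */
        while (j > 0 && values[j - 1] > current) {
            values[j] = values[j - 1];
            j--;
        }
        values[j] = current;
    }
}

/* The ordering rule handed to qsort: negative, zero, or positive. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* WHAT: "sort these ints in ascending order"; the library picks the steps. */
void sort_what(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}
```

Both functions leave the array in ascending order. Only the first one forces you to keep loop indices and element shifting in your head while you write it.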

Why is people’s code so bad? Even after getting a degree in CompSci?

I’m fixing code written by some quite junior programmers. I’m always amazed at the code people write. Only in these moments do I really see what good code is about. When you read or write good code, it just seems the obvious thing to do, and no big deal.

Principle #1: if one small piece of code is repeated, and it would be wrong if you changed it in one place but not the other, then THE CODE IS BAD! You have to REWRITE it. Just so that you can’t change things in one place and not in the other(s)! It’s that simple.
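A rough sketch of what that rewrite can look like in C, using a made-up example (the tax rule, TAX_PERCENT, and all function names are hypothetical): two callers need the same small calculation, so it lives in one function, and changing the rule in one place but not the other is no longer possible.

```c
#include <stddef.h>

/* Hypothetical rule: 21% tax, named once. */
#define TAX_PERCENT 21

/* The shared piece of logic lives in exactly one place.
   Assumes a non-negative amount; rounds to the nearest cent. */
static long add_tax_cents(long net_cents)
{
    return net_cents + (net_cents * TAX_PERCENT + 50) / 100;
}

/* Caller 1: total of an invoice, tax applied per item. */
long invoice_total_cents(const long *items, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += add_tax_cents(items[i]);
    return total;
}

/* Caller 2: the price shown next to a single product. */
long displayed_price_cents(long net_cents)
{
    return add_tax_cents(net_cents);
}
```

If the rounding rule or the rate ever changes, there is a single line to edit, and the invoice and the displayed price cannot disagree.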

The simplest case, and one I STILL see too often, is numeric constants. You are adjusting some things (say, things in a UI layout done programmatically). You change a value in one place, run the program, and it doesn’t work. Or you do something indirect (like rotating the screen and then going back to the original orientation), and things break. You check it, and the position is set in two places, to a numeric constant, and you had only changed it in one place!

The solution is obvious: have a #define at the top giving the right name to the value, and use the constant name in both places. In Java, you can just use some “private static final int” or so. Whatever your language buys you, but there’s certainly a way (or switch languages, pronto!).
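As a minimal sketch of the #define approach in C, assuming a hypothetical button whose position is set both at startup and after a screen rotation (BUTTON_X, BUTTON_Y, Point, and both function names are invented for illustration):

```c
/* Hypothetical layout values, each named exactly once. */
#define BUTTON_X 16
#define BUTTON_Y 40

typedef struct {
    int x;
    int y;
} Point;

/* Place 1: the initial layout reads the named constants. */
Point initial_button_position(void)
{
    Point p = { BUTTON_X, BUTTON_Y };
    return p;
}

/* Place 2: restoring the layout after a rotation reads the same names,
   so the two code paths cannot drift apart. */
void restore_after_rotation(Point *button)
{
    button->x = BUTTON_X;
    button->y = BUTTON_Y;
}
```

Adjusting the layout now means editing one #define line, and both places pick up the new value.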

I had this idea that if someone indents code wrong, you should stop looking at it, because the rest will be wrong too. If a developer can’t get indentation right, it’s impossible they’ll get conceptual consistency right. I’m expanding on this idea now: if someone repeats small non-independent pieces of code or numeric constants, instead of refactoring to some simple function or definition, then I won’t bother with it (unless totally unavoidable).

May you never break the DRY principle: Don’t Repeat Yourself. I have a theory that every good development practice boils down to just a circumstantial version of this principle. There is one practice that doesn’t fit the bill, which is why I’m holding off on sharing the theory until I can really prove its elegance. I can boil it all down to “WRITE HONEST CODE”, but that just isn’t so catchy.