She Bangs My AWK Into Oblivion


Published on 2010-12-13


You think the title could’ve been generated by a bot, right? But this is basically going to be a religious piece about scripting, how to be a better “unix citizen” [1], and why it doesn’t matter.

Anyway it’s not uncommon that you write some script and then execute it in the following manner:

usr@lcl ~> ./my_innovative_app.py
My greatest app ever!

So what is wrong with this?

Well any executable script like “my_innovative_app.py” will have a shebang line at the top telling your shell what interpreter to use, i.e. something like:

usr@lcl ~> cat my_innovative_app.py
#!/usr/bin/python
def main():
    print 'My greatest app ever!'

if __name__ == '__main__':
    main()

The “#!” line is the shebang line. The error here is that you are both using a “.py” file extension and naming the interpreter in the shebang. You’re effectively giving the file type twice, and confusing an application script with a source code file.

Furthermore, as a rule of thumb, if you’re going to be unixy, keep the name short so that it is easy to execute it in command line mode but keep it long enough so that it doesn’t collide with your other tools…

But you already knew this and you make a pass at the above script with the following remark:

But Edward. Your python script is not really portable since the path to the python binary is different on different platforms. This will make lots of trouble for people developing scripts in a distributed environment. Like when you’re using git as a collaboration tool.

And you are right, my god, we need to change that too. So after we put all these changes together we get something like:

usr@lcl ~> cat mia
#!/usr/bin/env python2.6
def main():
    print 'My greatest app ever!'

if __name__ == '__main__':
    main()

In this case the env command resolves the actual interpreter path on your platform by searching your PATH, and env itself should be present at “/usr/bin/env” on all *nix platforms.

Which is great! Especially if you’re writing bash scripts in FreeBSD, where it is obfuscatedly located under “/usr/local/bin/bash”… Man is env giving me the greatest hard on ever just by thinking about it!
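As a rough picture of what that lookup involves, here is a minimal Python sketch of a PATH search. The function name resolve_like_env is made up for this post, and the real env hands the search over to the exec* family, which has more corner cases (a default search path when PATH is unset, for instance), so treat it as an approximation only.

```python
import os


def resolve_like_env(name, path=None):
    """Return the first executable called `name` found on PATH, or None.

    Hypothetical helper: a simplified model of the lookup, not env's code.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        # An empty PATH entry traditionally means the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Which interpreter would "#!/usr/bin/env python" pick up on this machine?
print(resolve_like_env("python"))
```

The first match wins, which is why the same script can end up running under different interpreters depending on whose PATH it is started with.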

On a side note: the env command is really for executing commands in a modified environment, i.e. you use env to change or set specific environment variables.
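To make that original purpose concrete, here is a small sketch driven from Python. The variable name GREETING is invented for the example, and it assumes both env and sh can be found on your PATH.

```python
import subprocess

# Same as typing this at a prompt:
#   env GREETING=hello sh -c 'echo $GREETING'
# GREETING is set only for the command that env starts, not for this process.
output = subprocess.check_output(
    ["env", "GREETING=hello", "sh", "-c", "echo $GREETING"]
)
print(output.decode().strip())  # prints: hello
```

Finding “python” through PATH is just a side effect of env having to locate the command it was asked to run.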

Unfortunately relying on shebang, and fooling yourself with env, is a complete let down as a reliable mechanism for portable scripting, and this is why:

Actual Post

You might think that the shebang line is a directive to your shell to execute the file contents using the supplied interpreter and its supplied arguments. You might even make the “dynamic” assumption that the env command looks something like the following:

usr@lcl ~> cat /usr/bin/env
#!/bin/sh
prg=`which "$1"`
shift
exec "$prg" "$@"

Which basically means you’ve made the assumption that the shebang is composable, since the shebang line is calling another file with a shebang in it and letting the called executable decide what to do with any pass-along arguments… BAD PROGRAMMER! BAD! NO! NO MORE COOKIES FOR YOU!

The shebang line is actually handled at a lower level, by the kernel, when one of the exec* system calls runs the file. But wait… why are we talking about this, is this a problem? Yes. This means that you can shoot yourself in the foot using non-portable shebang lines, since how they get parsed depends on your operating system’s kernel rather than on your shell.

Now, let’s say you are using AWK instead of Python.

usr@lcl ~> cat mia
#!/usr/bin/env awk -W traditional -f
//{print "Haha I ate your input"}

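This will not work. The “-f” will be interpreted as a path on OS X. On Linux everything after the interpreter path is handed over as one single argument, so “awk -W traditional -f” becomes the name of the program that env goes looking for… So we’re stuck.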
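The Linux half of that can be modelled in a few lines of Python. This is only a sketch of the splitting rule: the function name linux_shebang_argv is made up, it assumes a well-formed shebang line, and the real kernel code also deals with things like line length limits.

```python
def linux_shebang_argv(shebang_line, script_path):
    """Model (simplified) the argv Linux builds for a script with a shebang."""
    # Drop the leading "#!" and any surrounding whitespace.
    rest = shebang_line[2:].strip()
    # Split only ONCE: interpreter path first, everything else stays glued
    # together as one optional argument.
    parts = rest.split(None, 1)
    argv = [parts[0]]
    if len(parts) > 1:
        argv.append(parts[1])
    # The script's own path is appended last.
    argv.append(script_path)
    return argv


print(linux_shebang_argv("#!/usr/bin/env awk -W traditional -f", "./mia"))
# ['/usr/bin/env', 'awk -W traditional -f', './mia']
```

So env receives “awk -W traditional -f” as a single word, and no program with that name exists on anybody’s PATH.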


  1. Here I am referring to the fact that you should try to write portable shell scripts, especially using awk, sed, blah..

