The programs listed here are provided as exhibitions of just how compressed a Linux ELF executable can be and still function.
Take a look through /usr/bin: how big is the smallest ELF executable in there? Now try writing the smallest hello-world program you can. Can you get it under 1K? Can you get it under 100 bytes?
This is precisely what I set out to do one day, and I ended up with a minor obsession over creating the smallest possible executables.
The distribution comes with preassembled binaries along with the assembly source code. If you wish to be able to rebuild them yourself, you will need to have a copy of Nasm, the excellent x86 assembler. Nasm's home page is at http://www.web-sites.co.uk/nasm/.
Your average ELF executable, even after being stripped, contains a fair bit of overhead, some of it completely superfluous. And when you rip out everything that doesn't contribute to a program's memory image, much of what remains, beside the program code itself, is information needed by the dynamic linker. This is of course very important information to have, normally -- but if the program makes its own system calls, it can dispense with that overhead as well.
Of course, there are few tools out there that will create an executable that has none of this extra machinery. But if one is willing to define the file entirely by hand, an executable can be created with the absolute minimum contents, including only the information that Linux needs in order to get it into memory and start it running.
The minimum information an ELF executable file needs to have is an ELF header and a program header table with at least one entry. (For more information about these entities, consult the ELF standard. You can download a copy of it as a Postscript document from ftp://tsx.mit.edu/pub/linux/packages/GCC/ELF.doc.tar.gz, or from the equivalent location of any repository containing GNU's gcc package for Linux. Or, you can also retrieve a transcription of the document as a flat-text file from http://www.muppetlabs.com/~breadbox/software/ELF.txt.) But, even in these two remaining structures, there are fields that aren't used, particularly with a bare-bones executable. Furthermore, there are other fields in these structures which do serve a purpose, but which are not, at this point in time, actually examined by the Linux OS. So, these fields can be used, if the programmer is desperate enough, to hold other bits of data, or even small amounts of code....
Let me pause for a moment and make this perfectly clear: I emphatically do not advocate the wilfull, widespread disregard of standards. Certainly, just because the ELF standard says that a given field will hold a certain value does not automatically mean that Linux ought to examine that field and reject any ELF file with something else there. But by the same token, just because (the current version of) Linux ignores the contents of that field does NOT give programmers implicit permission to fill it in with anything they like. When programmers en masse adhere to existing implementations instead of existing standards is exactly when a well-written standard becomes an obstacle instead of a tool, when future standards are forced to canonize ludicrous practices in the name of backward-compatibility, and when everything generally goes to hell. So: where these programs violate requirements of the ELF standard, do not take them seriously. These programs have been offered up as entertainment, and perhaps as an education in existing practice. But please do not seek to emulate such behavior in other, more serious programming work.
In fact, not every program here is portable among all the 2.2 kernels. Beginning with 2.2.17 (?), Linux began verifying one of fields (namely e_phentsize) in the ELF header that it had ignored in previous kernels. This field is also still ignored in the 2.4 kernel. I'm unclear as to reason for the divergence, but that is the risk one takes when ignoring standards. In order to accommodate users who are running the pickier kernels, I have supplied alternate versions of the three programs whose original version ignored this field. Since these are written specifically for the 2.2.17+ kernels, I have managed to keep them at their original size by taking advantage of yet another assumption that would not otherwise be portable -- namely, that all the general registers are initialized to zero on program startup. (This behavior can only be relied on with 2.2 and later kernels.)
Of course, violating requirements of the ELF standard is only one way in which these programs achieve such compression. Other aspects regarding the organization of the ELF structures in these programs is admittedly unorthodox, but nonetheless in complete adherence to the standard. Most important of all, of course, is the creative reworking of algorithms. Finally, as with any CISC assembly programming, there are also plenty of uncommon instructions that can be used to accomplish common tasks. (This can be an especially fruitful avenue of exploration on the Intel x86 as so many of the short instructions date back to the 8086, or worse, and are usually considered to be completely obsolescent, as Intel has made no attempt to keep their timings abreast of the rest of the instruction set. These days, one usually programs in assembly only in order to optimize the running time, in which case it is very important to stick with the core instruction set as much as possible. But when optimizing for space instead of speed, one suddenly finds new uses for all those obscure one- and two-byte instructions.)
The path of optimizing an ELF executable for size is described in my essay "A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux", at http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html.
This program returns an exit code of either zero or one, depending on whether it is invoked with the name "true" or "false", respectively. This one is the runt of the litter. Its size is 45 (count them) bytes. I believe that this is the smallest it is possible for a Linux ELF executable to be.
The final version of the program that started me off on this whole pursuit: hello world. It is 59 bytes long. This may well be the densest one here. With the program header table overlaid on top of the ELF header, and program itself running through both of them, some of the bytes have no less than three completely different purposes.
fgconsole.asm
fgconsole-2.2.17.asm
fgconsole was the smallest ELF executable in my old /usr/bin directory (2640 bytes), so I decided to see how small my own version could be. I managed to fit it into 72 bytes. fgconsole simply prints the number of the currently active virtual console. If run from somewhere other than the console, it returns a non-zero exit code.
keepalive.asm
keepalive-2.2.17.asm
Here's a program that I actually use, quite frequently in fact. It simply runs forever, printing a bell character every five minutes or so. I keep it in the background after telnetting from another machine that times out connections after 15 minutes of idle time. At 57 bytes, it's a bit longer than the one-line shell script I originally used, but on the plus side it doesn't take up an extra process in order to sleep.
Brainfuck is a very simplistic programming language, which was invented by Urban Mueller solely for the purpose of being able to create a compiler that was less than 256 bytes in size, for the Amiga OS. (More information about Brainfuck can be found at http://www.muppetlabs.com/~breadbox/bf/.) I eventually decided to take up the challenge as well, and create a Brainfuck compiler for Linux using less than 256 bytes. The compiler works as a filter: redirect the source code to the compiler's input and the compiler's output to a file. (And don't forget to chmod +x the output file.) At the beginning I didn't expect that I would actually succeed, since I thought I would need almost half of that just for the headers. I think my first cut was 458 bytes in size. I was quite pleased with myself when I trimmed that down to less than 300 bytes, ecstatic when I finally reached the 255-byte goal, and downright triumphant when I later got it to work in 238 bytes. By the time I had a 198-byte version, I was just plain dumbfounded. It's now at 171 bytes. And though I can't quite believe it, it works just as well as the first one did. As useless as it is, I think this one would have to be the crown jewel of this little collection. It is quite possibly also the ugliest and most opaque program I have ever written.
This is your classic hexdump program. I wrote it one day in a fit of pique, after trying (and failing) to convince the standard Linux hexdump program to use the canonical hexdump formatting style. This program, and the ones that follow, are different from the previous ones in that these adhere 100% to the requirements of the ELF standard. The reason I did this is because, unlike the previous programs, these can actually accomplish something useful. I wanted to be able to keep them around and know that they would continue to work. This one is actually larger than earlier versions. The first version only read from stdin, but I found myself using this program often enough to be annoyed by this limitation, so I added command-line parsing. As a result of these extra goals, hexdump is a whopping 226 bytes.
After hexdump, I came into contact with Konstantin Boldyshev, who heads the asumutils project (you can visit the asmutils home page at http://www.linuxassembly.org/asmutils.html), and I got involved in creating tiny programs that were actually useful. ls is my first real achievement along these lines. It is 1017 bytes in size, and sports many of the important command-line options:
-C | columnar output; default if stdout is a tty |
-1 | one file per output line; default if stdout is not a tty |
-l | "long" output |
-F | show file type suffixes |
-i | show inode numbers |
-s | show size of files in 1K-blocks |
-d | don't expand directories |
-a | show all (hidden) files |
-B | don't show files ending in ~ (backups) |
-N | don't replace non-graphic characters with a question mark |
-b | show non-graphic characters as octal escape sequences |
-R | recursively list contents of subdirectories |
(I'm especially proud of getting that last feature to work.)
The "long" output format only displays numeric user IDs, and it displays timestamps as an age, instead of the actual timestamp. There is no sorting capability, and the columnar output is much less intelligent than GNU's (though it looks pretty good most of the time). Beyond that, however, it conforms pretty closely to the standard ls program we all know and love.
date was my the next creation after ls for the asmutils project. (Note that the programs here are slightly different from the ones in asmutils. The asmutils programs are portable to other x86-based OSes, such as BSD. The versions you see here are strictly for Linux, and this allows me to squeeze out a couple more bytes here and there.) This version of date doesn't have clock-setting capabilities; it just does date output formatting. It honors the local time zone, if set, as well as the LANG environment variable. It also contains its own default locale if LANG isn't set, which makes it a bit large -- 1448 bytes, to be precise.
This is an implementation of the standard Unix program. It displays the prime factors of the integers specified on the command line, or on standard input if no arguments are given. This program is the oddball of this collection. Instead of simply trying to achieve the smallest possible size, I decided to shoot for real portability, for a change. This program not only conforms to the requirements of the ELF standard, it is also the only program in this collection that actually uses functions from libc (as opposed to making its own system calls). Thus it should continue to work with any future version of Linux, as long as new versions of libc and the ELF standard remain backwards-compatible. In addition, I decided to try to balance optimizing for both size and speed. This program takes advantage of the fact that floating-point division can be done in parallel with integer instructions on all Pentium processors. As a result it is significantly faster than the GNU implementation of factor for Linux. Finally, I added online help and an appropriate error message for invalid input. The resulting program therefore arguably stands as a completely functional replacement for the existing utility. Its size is 896 bytes. (A much smaller version, which does not use libc, is also available in the asmutils collection.)
By now you're probably thinking: "Okay, the first couple of programs were impressive, but now it's just repetitive. Isn't there anything interesting in here?" Well, here's something to reward you for reading all the way to the end: a game! Snake is a classic computer game -- in fact it's so old that you can hardly find it anymore in its original form. I originally wrote this as an Easter Egg for a contract involving text-terminal interfaces. Now it's available to all as a tiny program. In case you've never played it, the object of the game is to guide your snake around the playing field, eating the food blocks that appear at random locations, and dodging the walls and your own tail. As you eat the blocks, your score increases but so does your length. The program should work on any VT100-compatible terminal (although see the program comments regarding the line-drawing character set -- not all Linux console fonts provide correct support for them). Plus, at 1496 bytes, it's the perfect addition to a rescue floppy ... when you've just lost all your data, what's better than a game to take your mind off your troubles?