Chapter 2. Hello World

Table of Contents
Assembling the Hello World Program
The echo Program
Loops and Recursion
The getopt Programs

In this chapter I will show the basics of running simple SML programs. I will start with the classic hello world program to show how to build a complete runnable program. Then I will move on to more elaborate programs following the classic development with programs like echo and word count.

During this development I will stop to examine some of the programming idioms that are peculiar to functional programming and that often give an imperative programmer difficulties. I will pay particular attention to writing loops using recursion, which is one of the biggest differences between imperative and functional programming.

By the time you get here you should have studied one of the texts or tutorials on SML cited in Appendix A.

All programming examples are available as complete source files that you can run. I quote only pieces of each program in the text, so you should also read the complete programs.

Assembling the Hello World Program

The least you have to do to make SML say "hello world" is to use the top level. This is a classic Read-Eval-Print loop where you type in an SML declaration and it is immediately evaluated. For example:

> sml
Standard ML of New Jersey, Version 110.0.7
- print "hello world\n";
hello world
val it = () : unit
- ^D 

You type in the print expression at the prompt and terminate it with a semicolon. Evaluating the expression has the side effect of printing the string. The compiler then shows the return value, which is () of type unit, and binds it to the special variable it. The type unit plays the same role as void in C. Use Control-D to exit from the compiler.
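
To make the role of the variable it more concrete, here is a small sketch of a few more things you could type at the same prompt. The name greeting is invented for this illustration.

```sml
(* A bare expression is evaluated and its value is bound to it. *)
1 + 2;

(* it can be used in the next expression like any other variable. *)
it * 10;

(* A val declaration binds a name of your own choosing instead of it. *)
val greeting = "hello world\n";

(* print returns (), the only value of type unit, in the way that a C
   function declared void returns nothing useful. *)
print greeting;
```

The compiler should answer the first two lines with val it = 3 : int and val it = 30 : int. After the last line, it holds () again.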

But this doesn't get you a program that you can run as a command. For this you need to do some more work. The SML/NJ system can save a program in an (almost) ready-to-run form called a heap file. The steps it goes through are shown in Figure 2-1.

Figure 2-1. Compiling and Running a Program

When you run the sml command you start off with the runtime, which is a C executable, and a heap containing the compiler. The compiler compiles your source file to one or more modules, here A, B and C, which are left in the heap. Then you arrange for the contents of the heap to be dumped into a file using the built-in exportFn function. Just before the heap is dumped a garbage collection is performed which removes all objects in the heap that are not reachable from your program. This will get rid of the compiler.

Later you can run your program by starting another copy of the runtime and loading your program into its heap.

You can find out where the runtime executable is on your computer by looking for where the sml command is kept. In my installation I have these files.

> which sml
/usr/local/bin/sml

> ls -l /usr/local/bin/sml
... /usr/local/bin/sml -> /src/smlnj/current/bin/sml

> cd /src/smlnj/current/bin
> ls -a
./
../
.arch-n-opsys*
.heap/
.run/
.run-sml*
ml-burg -> .run-sml*
ml-lex -> .run-sml*
ml-yacc -> .run-sml*
sml -> .run-sml*
sml-cm -> .run-sml*

The sml command leads back to the .run-sml shell script. This script runs the executable in the .run subdirectory with a heap file that contains the compiler, found in the .heap subdirectory.

To build your own program you need to duplicate this arrangement. Here is a basic hello world program.

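structure Main =
struct

    (* The entry point: arg0 is the program name, argv the arguments. *)
    fun main(arg0, argv) =
    (
        print "hello world\n";
        OS.Process.success
    )

    (* Evaluated when the structure is loaded; writes the heap file. *)
    val _ = SMLofNJ.exportFn("hw", main)
end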

All compiled programs are divided into modules called structures. Here I've called the structure Main, but the name doesn't matter. After the structure is compiled each of its declarations is evaluated in turn. Evaluating the declaration of the function main doesn't do anything except define the function. But when the val declaration is evaluated the exportFn function (in the built-in structure SMLofNJ) is called. This writes the heap into a file named hw with a suffix that depends on the operating system and architecture you are using. For Linux on x86 the file name will be hw.x86-linux.
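
The difference between defining a function and evaluating a val declaration can be seen in a small sketch. The structure name Demo and the function greet are invented for this illustration and are not part of the hello world program.

```sml
structure Demo =
struct
    (* Defining a function only binds the name; the body does not run yet. *)
    fun greet () = print "greet was called\n"

    (* A val declaration is evaluated as soon as the structure is evaluated,
       so this message is printed at that point. *)
    val _ = print "structure Demo is being evaluated\n"
end

(* Only now does the body of greet run. *)
val _ = Demo.greet ()
```

Loading this should print the message from the val declaration first, and the message from greet only when it is called on the last line. The exportFn call in the hello world program runs at the same point as the first print here.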

The second argument to exportFn names the function that will be called when the heap file is read back in. This function must return a success or failure code, which becomes the exit code (0 or 1) of the program. These codes are defined in the built-in OS.Process structure as OS.Process.success and OS.Process.failure.
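
As a sketch of how the two codes might be used, the following variant returns the failure code under a rule made up for this illustration: any command line argument counts as an error. The structure name ExitDemo is hypothetical, and the exportFn call is left out so that the function can be tried from the top level.

```sml
structure ExitDemo =
struct
    (* Assumed rule for this sketch only: hello world takes no arguments. *)
    fun main (arg0 : string, argv : string list) : OS.Process.status =
        case argv of
            [] => (print "hello world\n"; OS.Process.success)
          | _  => (print (arg0 ^ ": unexpected arguments\n");
                   OS.Process.failure)
end
```

Passing ExitDemo.main to exportFn in place of main would give a program whose exit code tells the shell whether it was given arguments.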

The next step is to compile this program. The SML/NJ system comes with a built-in compilation manager that does a job similar to the Unix make command. First you need a CM file that describes what you are going to compile. Call it hw.cm. The least it needs to contain is

group is 
    hw.sml

Then compile the program as follows.[1]

> CM_ROOT=hw.cm sml
Standard ML of New Jersey, Version 110.0.7, September 28, 2000
- CM.make();
[starting dependency analysis]
[scanning hw.cm]
[checking CM/x86-unix/hw.cm.stable ... not usable]
[parsing hw.sml]
[Creating directory CM/DEPEND ...]
[dependency analysis completed]
[compiling hw.sml -> CM/x86-unix/hw.sml.bin]
[Creating directory CM/x86-unix ...]
[wrote CM/x86-unix/hw.sml.bin]
GC #1.1.1.1.1.10:   (10 ms)
write 1,0: 1356 bytes [0x40cd0000..0x40cd054c) @ 0x1000
........ stuff deleted
write 5,0: 28 big objects (271 pages) @ 0x15410

The most convenient way to pass in the name of the CM file is through the CM_ROOT environment variable. (If you don't set CM_ROOT then a default of sources.cm is used.) At the prompt type CM.make(); to run the compilation manager. Don't forget the semicolon: the compiler will keep prompting you until you enter it.

The messages show the compilation manager working out that it needs to compile the source file. It then caches the compiled form in the file CM/x86-unix/hw.sml.bin. Finally the export step writes the contents of the heap to the heap file, which produces the write messages.

Now that you have the heap file you need a shell script to run it. Here is a generic script.

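#!/bin/sh
# Name the heap file after this script so one script serves many programs.
heap=`basename "$0"`
# Directory where the heap file is installed.
install=.
# Directory containing the SML/NJ .run-sml script.
smlbin=/src/smlnj/current/bin

exec "$smlbin/.run-sml" @SMLload="$install/${heap}.x86-linux" "$@"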

This script starts the runtime and specifies the heap file to load. The heap file name is taken from the name of the script, so the same script can be used for different programs by adding links to it or just copying it. The install variable allows you to move the heap file to some installation directory. Any command line arguments are passed through to the SML main function.
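
To see what the script passes through, a main function along the following lines could be exported instead. This is only a sketch: the structure name ShowArgs and the helper describe are hypothetical, and the exportFn call is left out so that the function can also be tried from the top level.

```sml
structure ShowArgs =
struct
    (* Format one argument together with its position in the list. *)
    fun describe (i : int, arg : string) : string =
        "argv[" ^ Int.toString i ^ "] = " ^ arg ^ "\n"

    fun main (arg0 : string, argv : string list) : OS.Process.status =
    let
        (* Walk the argument list recursively, counting as we go. *)
        fun loop (_, []) = ()
          | loop (i, a :: rest) = (print (describe (i, a)); loop (i + 1, rest))
    in
        print ("arg0 = " ^ arg0 ^ "\n");
        loop (0, argv);
        OS.Process.success
    end
end
```

For example, typing ShowArgs.main("hw", ["a", "b"]); at the top level prints arg0 followed by one line for each argument.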

Now you can run it.

> hw
hello world

Don't be too worried about the large size of the heap file for such a small program. Some people argue that there is something wrong with a language if a program as small as this doesn't produce a correspondingly small executable, and cry "Bloat!". But few people write hello world programs. The overhead in the heap file becomes much more modest in proportion when you develop programs of a serious size.

That's a fair bit of work to get a single program going, but you only have to do it once and can then copy it as a template for future programs.

Notes

[1]

The details of specifying the CM file name will change in a future release of SML/NJ.