[Caml-list] Executable size?
From: Brian Hurt <bhurt@s...>
Subject: Re: [Caml-list] Executable size?
On Wed, 12 Nov 2003, John J Lee wrote:

> On Wed, 12 Nov 2003, Brian Hurt wrote:
> 
> > On Wed, 12 Nov 2003, Richard Jones wrote:
> [...]
> > > This is not a criticism of OCaml, but the executables do tend to be
> > > quite large. This seems mainly down to the fact that OCaml links the
> > > runtime library in statically. There was previous discussion on this
> [...]
> > This isn't as bad as it sounds.  A simplistic "hello world!" application
> > in Ocaml weighs in at 112K, versus 11K for the equivalent (dynamically
> > linked) C program.  The difference is almost entirely statically linked
> > standard libraries and infrastructure (garbage collection, etc.), stuff
> > that doesn't grow with larger programs.
> 
> OK.  Is that 100K difference for "hello world" (which doesn't necessarily
> stay the same for larger programs, as you say below) simply a result of
> the fact that C has the "unfair" advantage of already having its runtime
> sitting on everyone's hard drive?

Actually, I think Ocaml uses C's runtime libraries and builds on top of
them.  For example, if I understand things correctly, Ocaml's printf is a
wrapper which calls C's printf.  That is why I haven't bothered comparing
Ocaml's executable sizes to those of statically linked C programs.  Ocaml
is at least nice enough to link in only the library modules you actually
use (see the print_string vs. printf results).

In addition to a more complicated and complete standard library and 
builtins, Ocaml also has garbage collection, which is non-trivial to 
implement.  I wouldn't be surprised if half or more of that 100K of 
overhead is just the GC.  Currying, exceptions, etc. also carry small size 
penalties.

On the other hand, I would argue that these features, while bloating 
small applications, let you write larger ones in much less code, which is 
exactly the sort of thing small "benchmark" programs don't show.  I don't 
know how many times I've read or written C code like:

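#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Copy src to dst; return 0 on success or an errno-style code. */
int copy_file(const char * src, const char * dst) {
    char * buf;
    FILE * inf;
    FILE * outf;
    size_t n;
    int err = 0;

    if ((src == NULL) || (dst == NULL)) {
        return EINVAL;
    }

    inf = fopen(src, "rb");
    if (inf == NULL) {
        return errno;
    }

    outf = fopen(dst, "wb");
    if (outf == NULL) {
        err = errno;          /* save it: fclose() may overwrite errno */
        fclose(inf);
        return err;
    }

    buf = (char *) malloc(4096);
    if (buf == NULL) {
        fclose(outf);
        fclose(inf);
        return ENOMEM;        /* ISO C does not require malloc to set errno */
    }

    /* Copy in 4K chunks, checking every write and the final read status. */
    while ((n = fread(buf, 1, 4096, inf)) > 0) {
        if (fwrite(buf, 1, n, outf) != n) {
            err = EIO;
            break;
        }
    }
    if (ferror(inf)) {
        err = EIO;
    }

    free(buf);
    fclose(inf);
    if ((fclose(outf) != 0) && (err == 0)) {
        err = EIO;            /* a failed flush on close is a write error */
    }
    return err;
}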

Versus the same code in Ocaml:

let copyfile src dst =
    let inf = open_in_bin src in
    let outf = open_out_bin dst in
    let buf = Bytes.create 4096 in    (* modern OCaml's input fills bytes, not string *)
    let rec loop () =
        let c = input inf buf 0 4096 in
        if c > 0 then
            begin
                output outf buf 0 c;
                loop ()
            end
    in
    loop ();
    close_in inf;
    close_out outf                    (* also flushes the output buffer *)

The compiled code for the Ocaml copyfile function will be smaller than for 
the C version, simply because the Ocaml version takes advantage of features 
of the larger Ocaml library and runtime, especially (in this case) 
exceptions and garbage collection: errors propagate as exceptions instead 
of being checked after every call, and the buffer is reclaimed by the GC.  
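To make the exception side of this concrete, here is a minimal sketch, assuming OCaml 4.03 or later (for `match ... with exception` and the `result` type). It repeats copyfile with a `bytes` buffer so that it compiles on its own, and adds a hypothetical `try_copyfile` helper that turns the `Sys_error` exception raised by the standard channel functions into a result value. The path used in the usage line is made up and is meant not to exist.

```ocaml
(* copyfile as in the text, with a bytes buffer for current OCaml.
   Errors surface as Sys_error exceptions; for brevity this sketch does
   not close the channels when one is raised. *)
let copyfile src dst =
  let inf = open_in_bin src in
  let outf = open_out_bin dst in
  let buf = Bytes.create 4096 in
  let rec loop () =
    let c = input inf buf 0 4096 in   (* returns 0 at end of file *)
    if c > 0 then begin
      output outf buf 0 c;
      loop ()
    end
  in
  loop ();
  close_in inf;
  close_out outf

(* Hypothetical helper: one handler covers failures from open_in_bin,
   open_out_bin, input, output, and the closes. *)
let try_copyfile src dst =
  match copyfile src dst with
  | () -> Ok ()
  | exception Sys_error msg -> Error msg

(* Usage: a missing source file is reported, not silently ignored. *)
let () =
  match try_copyfile "/nonexistent/dir/input.txt" "/nonexistent/dir/output.txt" with
  | Ok () -> print_endline "copied"
  | Error msg -> prerr_endline ("copy failed: " ^ msg)
```

The C version needs a separate check, and separate cleanup, after each call; here all of that collapses into a single `Sys_error` branch at whatever level the caller chooses to handle it.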

> 
> 
> > A naive assumption would be that an Ocaml program is about 100K or so
> > larger than the equivalent C program.  Not much, considering how easy it
> > is to get executables multiple megabytes in size.
> 
> [...]
> > Ocaml gets a lot more code reuse, and thus can actually lead to smaller
> > executables.
> 
> I don't understand what you mean by that (probably my fault).  What do you
> mean by "code reuse" here?  I usually understand that phrase to mean using
> code written by people other than me, but you seem to mean it in a
> different sense.
> 

I was using it in the most literal sense: using code more than once, in
more than one way.  In general, it's much better to have one copy of a
function, used in two places, than two copies of the function.  The 
trick is that the two copies are generally not exactly identical.  If 
the functions compute, for example, the length of a linked list, one 
might operate on a list of integers and the other on a list of floats.  
Ocaml encourages you to program generically; you actually have to work 
at it to write a linked list length routine that *isn't* generic, since 
the naive implementation is generic (and so is the optimized version).
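A sketch of the length example, for illustration only (the names `length_naive`, `length`, and `length_ints` are made up here; the standard library already provides `List.length`). Both the naive version and the tail-recursive "optimized" version get the inferred type `'a list -> int` with no annotation at all. As far as I know, because OCaml uses a uniform value representation, ocamlopt compiles each of them once rather than once per element type. Getting a non-generic version takes an explicit type annotation.

```ocaml
(* Naive version: not tail-recursive, but already polymorphic;
   inferred type 'a list -> int. *)
let rec length_naive = function
  | [] -> 0
  | _ :: rest -> 1 + length_naive rest

(* "Optimized" tail-recursive version with an accumulator;
   still 'a list -> int. *)
let length l =
  let rec go acc = function
    | [] -> acc
    | _ :: rest -> go (acc + 1) rest
  in
  go 0 l

(* A non-generic version has to be asked for explicitly. *)
let length_ints (l : int list) = length l

(* One copy of the code serves both element types. *)
let () =
  Printf.printf "%d %d\n" (length [1; 2; 3]) (length [1.5; 2.5])
  (* prints "3 2" *)
```

In C, the same situation typically ends with one list type and one length function per element type, or with `void *` and casts.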

Again, this generally isn't a problem in small programs, which easily fit 
into your brain as a whole.  Code reuse becomes harder in moderate to 
large programs, especially ones with more than one programmer.  How many 
times have we reimplemented linked lists in C?

> 
> > Unless you have special constraints, the difference between C program
> > sizes and Ocaml program sizes is not enough to be worth worrying about.
> 
> I don't really agree that the problem of distributing simple (few lines of
> code) applications in small executables is all that "special".  Certainly
> there are *many* applications where you don't need that; equally, there
> are quite a few where you do need/want that.

I was thinking of special cases where a difference of 100K or 1M or so 
is the difference between working and not working.  If you are, for 
example, trying to fit your program into a 512K ROM, Ocaml's overhead 
might be a problem.  

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners