Version française
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] Memory management dominates running time
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Damien Doligez <damien.doligez@i...>
Subject: Re: [Caml-list] Memory management dominates running time
On Friday, January 3, 2003, at 10:58 PM, Jörgen Gustavsson wrote:

> I profiled my program with gprof on a moderately sized input which gave
> the following top ten time consumers.
>    %  cumulative    self              self    total
>  time   seconds   seconds    calls  ms/call  ms/call name
>  55.5      63.29    63.29                            fl_allocate [1]
>  11.0      75.79    12.50                            mark_slice [2]
>   5.7      82.32     6.53                            sweep_slice [3]
>   4.0      86.91     4.59                            _brk_unlocked [4]
>   2.5      89.80     2.89                            oldify_one [5]
>   2.3      92.42     2.62                            oldify_mopup [6]
>   1.6      94.23     1.81                            modify [7]
>   1.5      95.90     1.67                            alloc_shr [8]
>   0.9      96.91     1.01                            allocate_block [9]
>   0.9      97.90     0.99                            compare_val [10]

fl_allocate, alloc_shr, and allocate_block are O'Caml allocation 
   for the major heap.
mark_slice and sweep_slice are the major collector.
brk_unlocked is a C library function, I'm surprised to see it use
   so much run time.
oldify_one and oldify_mopup are the minor collector.
modify is the write barrier that is called when an assignment is done.

> Unfortunately it gets worse when we increase the input size.

This behaviour is a bit unusual.

> I am not an expert on garbage collection but the only explanation that
> I can see for this behaviour is that the free list that fl_allocate
> searches grows as the program runs. If there are lots of small blocks 
> in
> the begining of the free list then fl_allocate potentially has to 
> search
> very far when it allocates somewhat larger blocks.
> This situation can occur, for example, if the program allocates many 
> small
> blocks in the begining which are then reclaimed by the garbage 
> collector
> and then later the program allocates mostly larger blocks.

This phenomenon is called "fragmentation" and O'Caml does a few things
to prevent fragmentation-related problems:

1. Whenever two free blocks are adjacent, they are merged into one
    bigger block.
2. The free list is not searched from the beginning every time, because
    that would lead to an accumulation of small blocks at the beginning
    of the list, even under normal conditions.  Instead, each search
    starts where the previous one stopped, so that no block is examined
    more often than any other.
3. The heap compaction algorithm is designed to remove all fragmentation
    from the free list by moving objects around to make sure that all
    free blocks are adjacent.
4. When fragmentation is too high, the typical behaviour is that the
    allocation requests cannot be satisfied from the free list, and
    the heap size has to be increased indefinitely.  The GC can detect
    when this is the case, and automatically trigger a heap compaction.

It seems that you have found a case where this strategy fails to detect
a situation of heavy fragmentation.

> One reason for why I doubt it, is that if it is correct then the ocaml
> memory management can asymptoticly dominate the running time. Is it 
> really
> so?

It looks surprising, but we don't have an analysis of the run-time
cost of memory management, and such an analysis would be quite hard
to do precisely.

> Finally, what can I do about this problem?

You can have good control of the GC parameters through the Gc module
of the standard library.  The things you should try are:
1. Increase the minor heap size.  This will let the minor GC deallocate
    more blocks, and thus reduce the load of the major GC.  For this
    parameter, the point of diminishing returns depends quite a lot
    on the allocation behaviour of your program.
2. Increase the space overhead.  The major GC has a trade-off between
    space and time.  If you increase the space overhead, your program
    will need more memory, but the major GC load will decrease.
    Reasonable values for this parameter are between 50 and 500.
3. Decrease max_overhead.  This will trigger more automatic heap
    compactions, thus reducing fragmentation.  Be aware, however,
    that heap compaction is costly and not incremental, so it will
    pause the execution of your program (for up to a few seconds).
    This can be a problem if your program is real-time or interactive.
4. Call Gc.compact from your program.  This is a good idea if you
    can easily identify in your program the point where it switches
    from allocating small blocks to allocating big blocks.

If none of the above helps, I would be interested in getting a copy
of your program, to see what is going on in more detail.  After all,
the behaviour that you are observing might simply be caused by a bug.

-- Damien

To unsubscribe, mail Archives:
Bug reports: FAQ:
Beginner's list: