Version française
Home     About     Download     Resources     Contact us    
Browse thread
OCaml runtime using too much memory in 64-bit Linux
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Brian Hurt <bhurt@j...>
Subject: Re: [Caml-list] OCaml runtime using too much memory in 64-bit Linux
Xavier Leroy wrote:

>
>If the problem is confirmed, there are several ways to go about it.
>One is to implement the page table with a sparse data structure,
>e.g. a hash table.  However, the major GC and some primitives like
>polymorphic equality perform *lots* of page table lookups, so a
>performance hit is to be expected.  
>

I've been contemplating doing this on my own (not at work, I comment), 
just to see how much of a hit it is.  If no one else steps up to the 
plate, I will.

One important comment I will make, in moving to a hash table the size of 
pages has to increase.  The current implementation uses 1 byte per (4K) 
page for the map- a hash table would use, I project, about 4 words (16 
or 32 bytes) per page.  To keep the memory utilization equal, we'd need 
to go for a larger page size- 64K for 32-bit systems.  I'd be inclined 
to go larger than that- the 4K page size was standardized back when the 
average system (using memory protection) has 1MB, and 16MB was a huge 
amount of memory.  Memory sizes have increased 1024-fold in the same 
time, meaning I could make an argument for 4M pages.  I'm not sure I 
would (I think you'd run into a fragmentation problem on 32-bit), but it 
makes the page sizes I'd probably settle on (256K for 32-bit, 1M for 
64-bit) a lot more palatable. :-)

An advantage large pages would have is that they'd make the table a lot 
smaller, and thus a lot more cache friendly.  I mean, think about it- 
1GB of 4K pages at 1 byte per page is 256K, while 1GB of 256K pages at 
16 bytes per page is 64K, 1/4 the size.  1GB of 1M pages at 32 bytes per 
page is 32K, smaller yet.  The smaller table is more likely to fit into 
cache, and cheaper to load into cache.  While I wouldn't want to 
gaurentee anything, I could easily see the smaller table size that fits 
into cache actually gives a performance boost.  I've certainly seen 
weirder things happen.

>The other is to revise OCaml's
>data representations so that the GC and polymorphic primitives no
>longer need to know which pointers fall in the major heap.  This seems
>possible in principle, but will take quite a bit of work and break a
>lot of badly written C/OCaml interface code.  You've been warned :-)
>
>  
>
Of the two, I think I like the hashtable idea better.

If it isn't clear, this whole email is just speaking for myself.  I'm 
not allowed to speak for Jane Street (heck, if I'm luck, I'm allowed to 
speak *to* Jane Street :-).

Brian