[Caml-list] Memory mapped values
From: Xavier Leroy <xavier.leroy@i...>
Subject: Re: [Caml-list] Memory mapped values
> Would it be possible in theory for "input_value" to work by
> memory-mapping the file being read, rather than by immediately reading
> the file?  The idea would be that the structured value would then only
> actually be realised in physical memory as it is touched by execution
> and the corresponding pages of the memory-mapped file dragged in by the
> virtual memory mechanism.  (To be honest, I haven't actually checked if
> this is how input_value currently works, though I'm certain it can't
> be.)

No, that's not how input_value currently works :-)

What you describe sounds feasible, with two caveats:

- You need a serialization format that is "isomorphic" to the memory
representation of the data, i.e. that occupies the same space.  The
original Caml Light implementation of serialization used such a
format: the on-disk representation was essentially produced by a
copying GC applied to the value being externed, and input_value would
just read it into the heap and replace offsets with pointers.

There were two problems with this approach.  One is 32/64 bit
interoperability, where you need to expand or shrink the data
accordingly during input_value; this is expensive and would prevent
direct access to a page as you describe.  The other is that this
serialization format wastes space, resulting in huge files that are
slow to read.  The "compact" format that OCaml uses (basically, a
prefix notation for the DAG of memory blocks composing the externed
value) is much more compact (by a factor of 10, roughly), and while it
takes more CPU time to do input_value, this is well offset by the
reduced file reading time.

- You need to relocate offsets into pointers when a page is first
accessed.  Under Unix, this could possibly be done by mapping the file
without read and write access, then catching the segmentation violation
that occurs when one of the pages is accessed, patching the pointers, and
changing the page protections to read-write.  All this is highly
non-portable and quite slow, though.  (I think it's Appel and Li that
tried VM tricks to implement concurrent copying GC in the late 80s;
they found out later that the cost of changing page permissions is so
high under all Unix implementations they tested that the scheme was
impractical.) 
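
To make the relocation step in the two caveats concrete, here is a
minimal sketch in C.  It assumes a hypothetical image layout (a linked
list of cells whose "next" field is a byte offset from the start of the
image); this is only an illustration, not the actual Caml Light or OCaml
format, and the names cell, NO_NEXT, relocate and demo_relocation are
invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk cell.  "next" holds a byte offset from the start
   of the image; NO_NEXT marks the end of the list.  Both fields are one
   machine word wide, so sizeof(cell) differs between 32- and 64-bit
   machines: that is the interoperability problem mentioned above. */
#define NO_NEXT ((uintptr_t)-1)

typedef struct cell {
    intptr_t  value;
    uintptr_t next;   /* offset in the file, address after relocation */
} cell;

/* Rewrite every offset in the image into an absolute address. */
void relocate(unsigned char *image, size_t ncells)
{
    cell *cells = (cell *)image;
    for (size_t i = 0; i < ncells; i++) {
        if (cells[i].next == NO_NEXT) {
            cells[i].next = (uintptr_t)NULL;
        } else {
            /* Offsets must point at the start of a cell. */
            assert(cells[i].next % sizeof(cell) == 0);
            cells[i].next = (uintptr_t)(image + cells[i].next);
        }
    }
}

/* Walk the relocated list through real pointers. */
intptr_t sum_list(const cell *c)
{
    intptr_t total = 0;
    while (c != NULL) {
        total += c->value;
        c = (const cell *)c->next;
    }
    return total;
}

/* Build a 3-cell image as it might sit in a file, relocate it, walk it. */
intptr_t demo_relocation(void)
{
    static cell image[3];
    image[0] = (cell){ 10, 2 * sizeof(cell) };  /* cell 0 -> cell 2 */
    image[1] = (cell){ 30, NO_NEXT };           /* end of list      */
    image[2] = (cell){ 20, 1 * sizeof(cell) };  /* cell 2 -> cell 1 */
    relocate((unsigned char *)image, 3);
    return sum_list(&image[0]);
}
```

Here relocate walks the whole image eagerly, which is what reading the
value in one step amounts to; the lazy variant would run the same loop
on one page at a time, on first access.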

Because of this cost issue, your scheme would be interesting only if
the program accesses a small fragment of the memory-mapped data.  If
you're going to use all of the data, reading it in one step is more
efficient (it saves the cost of trapping SEGV and changing page
protections).
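
The trap-and-patch scheme can be sketched roughly as follows.  This is
an illustration under several assumptions, not a portable
implementation: an anonymous mapping stands in for the memory-mapped
file, each page carries a single offset in its first word, and the
handler calls mprotect, which POSIX does not list as async-signal-safe
(it works on Linux and most Unix systems in practice).  The names
on_fault and demo_lazy_pages are invented for the example.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

static unsigned char *region;          /* start of the protected mapping */
static size_t region_len;
static size_t page_size;
static volatile sig_atomic_t faults;   /* number of pages patched so far */

/* The first touch of a page lands here: unprotect it, then patch it. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    unsigned char *addr = (unsigned char *)info->si_addr;
    if (addr < region || addr >= region + region_len)
        _exit(1);                      /* a genuine crash, not our trap */
    unsigned char *page =
        region + ((size_t)(addr - region) / page_size) * page_size;
    if (mprotect(page, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    /* Patch step: turn the offset in the first word into an address. */
    *(uintptr_t *)page += (uintptr_t)region;
    faults++;
}

/* Touch 2 of 8 pages; return the number of traps taken, or -1 on error. */
int demo_lazy_pages(void)
{
    enum { NPAGES = 8 };
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    assert(page_size >= sizeof(uintptr_t));
    region_len = NPAGES * page_size;
    region = mmap(NULL, region_len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    /* Page i stores the offset of page i+1, as a file image would. */
    for (size_t i = 0; i < NPAGES; i++)
        *(uintptr_t *)(region + i * page_size) =
            ((i + 1) % NPAGES) * page_size;

    /* Revoke all access, as if the file had been mapped without it. */
    if (mprotect(region, region_len, PROT_NONE) != 0)
        return -1;

    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);  /* some systems report SIGBUS */

    faults = 0;
    /* Each first read traps once; the retried read sees the patched word. */
    uintptr_t first  = *(volatile uintptr_t *)region;
    uintptr_t second = *(volatile uintptr_t *)first;
    uintptr_t again  = *(volatile uintptr_t *)region;  /* no trap now */

    int ok = first  == (uintptr_t)(region + page_size)
          && second == (uintptr_t)(region + 2 * page_size)
          && again  == first;
    int result = ok ? (int)faults : -1;

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(region, region_len);
    return result;
}
```

Only the two pages actually read are trapped and patched; the other six
stay protected and untouched.  Each trap costs a signal delivery plus an
mprotect call, which is the overhead discussed above.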

> But if it could work, then that could
> make for one of the very best and easiest ways of persisting data
> structures - easier than moving to a relational database, and directly
> related to the programming model.

I'm pretty ignorant about databases, but still what you describe is
vaguely reminiscent of some OO databases (ObjectStore, maybe?).  Two
issues remain to be addressed, though: how to modify the data structure
incrementally (modifying it in core and re-dumping it whole to disk
doesn't suffice), and how to deal with atomicity of updates...

Best wishes,

- Xavier Leroy