Mantis Bug Tracker

View Issue Details
ID: 0005215
Project: OCaml
Category: OCaml general
View Status: public
Date Submitted: 2011-01-28 13:44
Last Update: 2012-07-16 13:09
Reporter: hnrgrgr
Priority: normal
Severity: feature
Reproducibility: N/A
Status: resolved
Resolution: fixed
Product Version: 3.13.0+dev
Fixed in Version: 4.00.0+dev
Summary: 0005215: Serialization of dynlinked closure
Description: A dynlinked code pointer could be serialized as a pair:
 - the checksum of the original .cmo or .cmx;
 - the offset from the beginning of the dynlinked code.

Unmarshaling would then be allowed in any executable that has the same .cmo/.cmx already loaded.
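The proposed encoding can be sketched as follows. This is a minimal model, assuming a hypothetical `loaded_unit` record that pairs each unit's checksum with the address where its code happens to be loaded; the checksums and addresses are made up. The point is that the offset is position-independent, so the pair resolves in any process that has the same unit loaded, wherever that unit sits in memory.

```ocaml
(* Hypothetical model: each loaded compilation unit is identified by the
   checksum of its .cmo/.cmx and by the address where its code happens to
   be loaded in the current process. *)
type loaded_unit = { checksum : Digest.t; base : int }

(* Serialize a code pointer into unit [u] as (checksum, offset from base). *)
let encode (u : loaded_unit) (addr : int) : Digest.t * int =
  (u.checksum, addr - u.base)

(* Resolve (checksum, offset) against the units loaded in this process;
   None if no loaded unit has that checksum. *)
let decode (units : loaded_unit list) ((checksum, offset) : Digest.t * int)
    : int option =
  List.find_opt (fun u -> Digest.equal u.checksum checksum) units
  |> Option.map (fun u -> u.base + offset)

(* The same plugin, loaded at different addresses in two executables. *)
let plugin = Digest.string "contents of plugin.cmx"
let exe1 = [ { checksum = plugin; base = 0x1000 } ]
let exe2 =
  [ { checksum = Digest.string "contents of other.cmx"; base = 0x2000 };
    { checksum = plugin; base = 0x8000 } ]

(* A code pointer at 0x1040 in executable 1 ... *)
let wire = encode (List.hd exe1) 0x1040

(* ... resolves to the corresponding address in executable 2. *)
let () =
  match decode exe2 wire with
  | Some addr -> Printf.printf "resolved to 0x%x\n" addr (* 0x8040 *)
  | None -> print_endline "plugin not loaded here"
```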
Additional Information: The attached patch gives a detailed explanation of the proposal.

At runtime, this requires keeping a record for each dynlinked module of:
- its checksum;
- the start and end addresses of its dynlinked (byte)code.
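A minimal sketch of such a runtime table follows. The `fragment` record and the `register`, `fragment_of_addr` and `serialize_code_pointer` helpers are hypothetical (not part of any OCaml API), and the end address is assumed to be exclusive.

```ocaml
(* Hypothetical runtime record for one dynlinked module: its checksum and
   the address range of its loaded (byte)code, taken here as [start, end). *)
type fragment = { checksum : Digest.t; code_start : int; code_end : int }

let registry : fragment list ref = ref []

(* Called (in this model) whenever a module is dynlinked. *)
let register checksum ~code_start ~code_end =
  registry := { checksum; code_start; code_end } :: !registry

(* Find the dynlinked module whose code range contains [addr]. *)
let fragment_of_addr addr =
  List.find_opt (fun f -> f.code_start <= addr && addr < f.code_end) !registry

(* Code pointer -> (checksum, offset), or None if the address lies outside
   every registered module. *)
let serialize_code_pointer addr =
  Option.map (fun f -> (f.checksum, addr - f.code_start)) (fragment_of_addr addr)

let () =
  register (Digest.string "a.cmo") ~code_start:0x1000 ~code_end:0x1800;
  register (Digest.string "b.cmo") ~code_start:0x1800 ~code_end:0x2400
```

Keeping both bounds lets the marshaler reject pointers that belong to no known module instead of emitting a meaningless offset.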

Motivation
----------

The Ocsigen HTTP server provides a user-session mechanism. To save memory, some sessions could be temporarily saved to disk. Allowing sessions to contain one or more closures would simplify our code. However, since we make intensive use of dynlinked modules, and the Marshal module doesn't handle dynlinked closures, we can't include closures directly in the session record.
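For context, closures whose code belongs to the main program can already be marshaled with the standard `Marshal.Closures` flag, as in this small sketch; per the report, closures whose code comes from a dynlinked module could not be handled this way at the time. `make_adder` is an illustrative name, and the type annotation on the unmarshaled value is a promise that `Marshal` does not check.

```ocaml
(* Build a closure that captures a value from its environment. *)
let make_adder n = fun x -> x + n

(* Marshal.Closures permits code pointers in the marshaled data. *)
let serialized = Marshal.to_string (make_adder 10) [ Marshal.Closures ]

(* Unmarshaling is untyped: the annotation is trusted, not verified. *)
let restored : int -> int = Marshal.from_string serialized 0

let () = Printf.printf "%d\n" (restored 5) (* prints 15 *)
```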

Additionally, it would be great if a serialized dynlinked closure could be reloaded after the server restarts.
Tags: No tags attached.
Attached Files:
- dyn.patch (9,370 bytes) 2011-01-28 13:44
- test-dynlink.tar.bz2 (2,889 bytes) 2012-03-14 19:42
- dynlink-bytecode.patch (3,094 bytes) 2012-03-14 19:50

Relationships
related to 0005687 [resolved, frisch] dynlink broken when used from "output-obj" main program (bytecode)

Notes
(0007055)
xleroy (administrator)
2012-03-13 15:56

At long last, this feature is now implemented in the version/4.00 branch (commit 12227) and in trunk (commit 12229). Will be part of release 4.00.

I departed a bit from the proposed implementation by not including the name of the dynlinked module in the serialized data, only its MD5. Run-time error messages are less clear, but this way we do not need to allocate a new marshalling "code" value, and the same logic handles both closures coming from the main program and closures coming from dynamically-loaded modules.

There are small tests in testsuite/tests/lib-dynlink-{bytecode,native}. More testing is always welcome.
(0007077)
hnrgrgr (developer)
2012-03-14 19:49
edited on: 2012-03-14 19:51

Thanks a lot !

I also wrote a small test program that exchanges closures between different programs that share the same plugin. The native-code version works, but the bytecode version fails.

In the bytecode version, a plugin's digest differs when it is loaded in different executables. It also differs depending on the load order within the same program.

I initially thought it would be sufficient to compute the digest before calling "Symtable.patch_object", but that fails in a very specific situation where two modules with different dependencies generate the same bytecode sequence (see testsuite/tests/lib-dynlink-marshall/bis/g.ml and gg.ml in the tarball): the two modules would then share the same digest.
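Both failure modes can be illustrated with a toy model. The `instr` type, the `render`/`patched`/`unpatched` helpers and the instruction strings are hypothetical, not the real bytecode format or the `Symtable` API: patching substitutes link-time slot numbers that depend on what was loaded before, while the unpatched code cannot tell g and gg apart.

```ocaml
(* Toy model: references to other modules' globals are placeholders that
   patching replaces with link-time slot numbers. *)
type instr = Op of string | Global of string

let render slot code =
  String.concat ";"
    (List.map (function Op s -> s | Global m -> "GETGLOBAL " ^ slot m) code)

(* Before patching: every global reference renders the same way. *)
let unpatched code = render (fun _ -> "?") code

(* After patching: slot numbers depend on what was loaded, and in which order. *)
let patched slots code =
  render (fun m -> string_of_int (List.assoc m slots)) code

(* g.ml and gg.ml: same instructions, different dependencies (H vs HH). *)
let g = [ Op "CONST 1"; Global "H"; Op "RETURN" ]
let gg = [ Op "CONST 1"; Global "HH"; Op "RETURN" ]

let () =
  let dg slots = Digest.to_hex (Digest.string (patched slots g)) in
  (* Same plugin, different slot for H: different digests. *)
  Printf.printf "g with H in slot 3: %s\n" (dg [ ("H", 3) ]);
  Printf.printf "g with H in slot 7: %s\n" (dg [ ("H", 7) ]);
  (* Different dependencies, same unpatched code: same digest. *)
  Printf.printf "unpatched digests equal: %b\n"
    (Digest.string (unpatched g) = Digest.string (unpatched gg))
```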

The attached patch uses the 'unpatched' bytecode and the list of imported interfaces to compute the digest of a .cmo.
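The proposed digest can be sketched under the same kind of toy model. `cmo_digest` and the byte strings are hypothetical, and a real implementation might also hash the interface CRCs, which this sketch omits. Since no slot numbers enter the hash, the result does not depend on load order, yet g and gg now differ through their imports.

```ocaml
(* Toy sketch: hash the unpatched bytecode together with the names of the
   imported interfaces, separated by NUL bytes. *)
let cmo_digest ~unpatched_code ~imports =
  Digest.string (String.concat "\000" (unpatched_code :: imports))

(* g.ml and gg.ml: identical unpatched bytecode, different dependencies. *)
let code = "CONST 1;GETGLOBAL ?;RETURN"
let d_g = cmo_digest ~unpatched_code:code ~imports:[ "Stdlib"; "H" ]
let d_gg = cmo_digest ~unpatched_code:code ~imports:[ "Stdlib"; "HH" ]

let () = print_endline (if d_g = d_gg then "collision" else "distinct")
```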

(0007100)
xleroy (administrator)
2012-03-19 09:48

> In the bytecode version, a plugin's digest differs when it is loaded in different executables. It also differs depending on the load order within the same program.

Thanks for spotting this problem.

> The attached patch uses the 'unpatched' bytecode and the list of imported interfaces to compute the digest of a .cmo.

I'm not sure this is unique enough, because the relocation info isn't included in the digest. As a consequence, it could be (I haven't checked) that the following two modules

A: let x = "A"
B: let x = "B"

end up having the same digest. (The string literals are stored in the relocation info, IIRC.)
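The concern can be illustrated with a toy model that assumes, as the note recalls but has not checked, that string literals live in the relocation info rather than in the bytecode. The `compunit` record, `digest_code_and_imports`, and the instruction strings are hypothetical.

```ocaml
(* Toy model: literals are kept apart from the code, like relocation info. *)
type compunit = {
  unpatched_code : string;
  imports : string list;
  literals : string list; (* relocation info, not covered by the digest *)
}

(* Digest over code and imports only, as in the proposed scheme. *)
let digest_code_and_imports u =
  Digest.string (String.concat "\000" (u.unpatched_code :: u.imports))

(* A: let x = "A"   and   B: let x = "B" *)
let a =
  { unpatched_code = "CONST ?;SETGLOBAL ?"; imports = [ "Stdlib" ];
    literals = [ "A" ] }
let b = { a with literals = [ "B" ] }

let () =
  print_endline
    (if digest_code_and_imports a = digest_code_and_imports b
     then "same digest" else "different digests")
```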

Your original proposal used the MD5 of the whole .cmo/.cma file as the digest. I think this is safe for .cmo files, which contain just one compilation unit, but not for .cma files containing multiple units, which would all receive the same digest. The latter issue could be fixed by adding some diversification for each compunit of the .cma. In the end, the digests obtained this way should be safe, but they will change if, e.g., only the debug info changes.

Let me think about these issues. If you have ideas, feel free to share.



(0007123)
xleroy (administrator)
2012-03-21 15:34

Commit r12253 in version/4.00: use MD5(MD5(file contents), unit-name) as the unique identifier for dynlinked bytecode fragments. Please test again if you can; thanks.
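The scheme from r12253 can be sketched with the standard `Digest` module; `fragment_digest` and the file contents are hypothetical. Each unit of a .cma gets a distinct identifier, and any change to the file, such as new debug info, changes every identifier derived from it, as noted above.

```ocaml
(* Sketch of MD5(MD5(file contents), unit-name). *)
let fragment_digest ~file_contents ~unit_name =
  (* The inner digest is always 16 bytes, so the unit name that follows it
     is unambiguously separated from the file digest. *)
  Digest.string (Digest.string file_contents ^ unit_name)

let cma = "contents of lib.cma with units Foo and Bar"
let d_foo = fragment_digest ~file_contents:cma ~unit_name:"Foo"
let d_bar = fragment_digest ~file_contents:cma ~unit_name:"Bar"

let () =
  Printf.printf "Foo: %s\nBar: %s\n" (Digest.to_hex d_foo) (Digest.to_hex d_bar)
```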
(0007167)
hnrgrgr (developer)
2012-03-26 16:01

It works for me. Thanks.
(0007175)
xleroy (administrator)
2012-03-26 19:47

Thanks for the testing. I pushed the change on trunk as well (commit 12278).

Issue History
Date Modified Username Field Change
2011-01-28 13:44 hnrgrgr New Issue
2011-01-28 13:44 hnrgrgr File Added: dyn.patch
2011-05-17 16:40 doligez Status new => acknowledged
2012-03-13 15:56 xleroy Note Added: 0007055
2012-03-13 15:56 xleroy Status acknowledged => resolved
2012-03-13 15:56 xleroy Resolution open => fixed
2012-03-13 15:56 xleroy Fixed in Version => 4.00.0+dev
2012-03-13 15:56 xleroy Description Updated
2012-03-14 19:42 hnrgrgr File Added: test-dynlink.tar.bz2
2012-03-14 19:49 hnrgrgr Note Added: 0007077
2012-03-14 19:50 hnrgrgr File Added: dynlink-bytecode.patch
2012-03-14 19:51 hnrgrgr Note Edited: 0007077
2012-03-19 09:48 xleroy Note Added: 0007100
2012-03-19 09:48 xleroy Status resolved => feedback
2012-03-21 15:34 xleroy Note Added: 0007123
2012-03-21 15:34 xleroy Status feedback => resolved
2012-03-26 16:01 hnrgrgr Note Added: 0007167
2012-03-26 19:47 xleroy Note Added: 0007175
2012-07-16 13:09 frisch Relationship added related to 0005687

