Serialization of dynlinked closure #5215

vicuna · 2011-01-28T12:44:50Z

Original bug ID: 5215
Reporter: hnrgrgr
Status: closed (set by @xavierleroy on 2015-12-11T18:07:21Z)
Resolution: fixed
Priority: normal
Severity: feature
Version: 3.13.0+dev
Fixed in version: 4.00.0+dev
Category: ~DO NOT USE (was: OCaml general)
Related to: #5687
Monitored by: dario @glondu

Bug description

A dynlinked code pointer could be serialized as a couple :

checksum of the original .cmo or .cmx
offset with the beginning of the dynlinked code.

Unmarshaling would be allowed in any executable with the same cmo/cmx already loaded.

Additional information

Attached patch is a detailed explanation of the proposition.

At runtime, it requires to keep records for each dynlinked modules of:

checksum ;
the start address and the end address of dynlinked (byte)code.

Motivation

The Ocsigen http server offers a mechanism of user session. In order to save memory, some sessions could temporary be saved on disk. If session contains one or more closures, then this will simplify our code. However, since we have an intensive use of dynlinked modules, and the Marshal module doesn't handle dynlinked closures, we can't include closures directly in the session record.

Additionally, it would be great if the serialization of a dynlinked closure could be reloaded after a restart of the server.

File attachments

vicuna · 2012-03-13T14:56:39Z

Comment author: @xavierleroy

At long last, this feature is now implemented in the version/4.00 branch (commit 12227) and in trunk (commit 12229). Will be part of release 4.00.

I departed a bit from the proposed implementation, by not including the name of the dynlinked module in the serialized data, just its MD5. Run-time error messages are less clear, but this way we do not need to allocate a new marshalling "code" value, and the same logic handles both closures coming from the main program and closures coming from dynamically-loaded modules.

There are small tests in testsuite/tests/lib-dynlink-{bytecode,native}. More testing is always welcome.

vicuna · 2012-03-14T18:49:25Z

Comment author: hnrgrgr

Thanks a lot !

I wrote also a small test program that exchanges closure between different program that shares the same plugin. The native code version works but the bytecode version fails.

In the bytecode version, a plugin's digest differs when loaded in different executable. It also differs when loaded in different order in the same program.

I initially thought it should be sufficient to compute the digest before to call "Symtable.patch_object" but it fails in a very specific situation where two modules with different dependencies generate the same sequence of bytecode (see testsuite/tests/lib-dynlink-marshall/bis/g.ml and gg.ml in the tarball). The modules would share the same digest.

The attached patch use the 'unpatched' bytecode and the list of imported interfaces to compute the digest of a cmo.

vicuna · 2012-03-19T08:48:17Z

Comment author: @xavierleroy

In the bytecode version, a plugin's digest differs when loaded in different executable. It also differs when loaded in different order in the same program.

Thanks for spotting this problem.

The attached patch use the 'unpatched' bytecode and the list of imported interfaces to compute the digest of a cmo.

I'm not sure this is unique enough, because the relocation info isn't included in the digest. As a consequence, it could be (I haven't checked) that the following two modules

A: let x = "A"
B: let x = "B"

end up having the same digest. (The string literals are stored in the relocation info, IIRC.)

Your original proposal used the MD5 of the whole .cmo/.cma file as the digest. I think it is safe for .cmo files, containing just one compilation unit, but not for .cma files containing multiple units, which will all receive the same digest. The latter issue could be fixed by adding some diversification for each compunit of the .cma. In the end, the digests obtained this way should be safe, but they will change if e.g. debug info changes (and nothing else).

Let me think about these issues. If you have ideas, feel free to share.

vicuna · 2012-03-21T14:34:15Z

Comment author: @xavierleroy

Commit r12253 in version/4.00: use MD5(MD5(file contents), unit-name) as unique identifier for dynlinked bytecode fragments. Thanks for testing again if you can.

vicuna · 2012-03-26T14:01:05Z

Comment author: hnrgrgr

It works for me. Thanks.

vicuna · 2012-03-26T17:47:26Z

Comment author: @xavierleroy

Thanks for the testing. I pushed the change on trunk as well (commit 12278).

vicuna closed this as completed Dec 11, 2015

vicuna mentioned this issue Mar 14, 2019

dynlink broken when used from "output-obj" main program (bytecode) #5687

Closed

vicuna added the feature-wish label Mar 20, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Serialization of dynlinked closure #5215

Serialization of dynlinked closure #5215

vicuna commented Jan 28, 2011

vicuna commented Mar 13, 2012

vicuna commented Mar 14, 2012

vicuna commented Mar 19, 2012

vicuna commented Mar 21, 2012

vicuna commented Mar 26, 2012

vicuna commented Mar 26, 2012

Serialization of dynlinked closure #5215

Serialization of dynlinked closure #5215

Comments

vicuna commented Jan 28, 2011

Bug description

Additional information

Motivation

File attachments

vicuna commented Mar 13, 2012

vicuna commented Mar 14, 2012

vicuna commented Mar 19, 2012

vicuna commented Mar 21, 2012

vicuna commented Mar 26, 2012

vicuna commented Mar 26, 2012