Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialization of dynlinked closure #5215

Closed
vicuna opened this issue Jan 28, 2011 · 6 comments
Closed

Serialization of dynlinked closure #5215

vicuna opened this issue Jan 28, 2011 · 6 comments

Comments

@vicuna
Copy link

vicuna commented Jan 28, 2011

Original bug ID: 5215
Reporter: hnrgrgr
Status: closed (set by @xavierleroy on 2015-12-11T18:07:21Z)
Resolution: fixed
Priority: normal
Severity: feature
Version: 3.13.0+dev
Fixed in version: 4.00.0+dev
Category: ~DO NOT USE (was: OCaml general)
Related to: #5687
Monitored by: dario @glondu

Bug description

A dynlinked code pointer could be serialized as a couple :

  • checksum of the original .cmo or .cmx
  • offset with the beginning of the dynlinked code.

Unmarshaling would be allowed in any executable with the same cmo/cmx already loaded.

Additional information

Attached patch is a detailed explanation of the proposition.

At runtime, it requires to keep records for each dynlinked modules of:

  • checksum ;
  • the start address and the end address of dynlinked (byte)code.

Motivation

The Ocsigen http server offers a mechanism of user session. In order to save memory, some sessions could temporary be saved on disk. If session contains one or more closures, then this will simplify our code. However, since we have an intensive use of dynlinked modules, and the Marshal module doesn't handle dynlinked closures, we can't include closures directly in the session record.

Additionally, it would be great if the serialization of a dynlinked closure could be reloaded after a restart of the server.

File attachments

@vicuna
Copy link
Author

vicuna commented Mar 13, 2012

Comment author: @xavierleroy

At long last, this feature is now implemented in the version/4.00 branch (commit 12227) and in trunk (commit 12229). Will be part of release 4.00.

I departed a bit from the proposed implementation, by not including the name of the dynlinked module in the serialized data, just its MD5. Run-time error messages are less clear, but this way we do not need to allocate a new marshalling "code" value, and the same logic handles both closures coming from the main program and closures coming from dynamically-loaded modules.

There are small tests in testsuite/tests/lib-dynlink-{bytecode,native}. More testing is always welcome.

@vicuna
Copy link
Author

vicuna commented Mar 14, 2012

Comment author: hnrgrgr

Thanks a lot !

I wrote also a small test program that exchanges closure between different program that shares the same plugin. The native code version works but the bytecode version fails.

In the bytecode version, a plugin's digest differs when loaded in different executable. It also differs when loaded in different order in the same program.

I initially thought it should be sufficient to compute the digest before to call "Symtable.patch_object" but it fails in a very specific situation where two modules with different dependencies generate the same sequence of bytecode (see testsuite/tests/lib-dynlink-marshall/bis/g.ml and gg.ml in the tarball). The modules would share the same digest.

The attached patch use the 'unpatched' bytecode and the list of imported interfaces to compute the digest of a cmo.

@vicuna
Copy link
Author

vicuna commented Mar 19, 2012

Comment author: @xavierleroy

In the bytecode version, a plugin's digest differs when loaded in different executable. It also differs when loaded in different order in the same program.

Thanks for spotting this problem.

The attached patch use the 'unpatched' bytecode and the list of imported interfaces to compute the digest of a cmo.

I'm not sure this is unique enough, because the relocation info isn't included in the digest. As a consequence, it could be (I haven't checked) that the following two modules

A: let x = "A"
B: let x = "B"

end up having the same digest. (The string literals are stored in the relocation info, IIRC.)

Your original proposal used the MD5 of the whole .cmo/.cma file as the digest. I think it is safe for .cmo files, containing just one compilation unit, but not for .cma files containing multiple units, which will all receive the same digest. The latter issue could be fixed by adding some diversification for each compunit of the .cma. In the end, the digests obtained this way should be safe, but they will change if e.g. debug info changes (and nothing else).

Let me think about these issues. If you have ideas, feel free to share.

@vicuna
Copy link
Author

vicuna commented Mar 21, 2012

Comment author: @xavierleroy

Commit r12253 in version/4.00: use MD5(MD5(file contents), unit-name) as unique identifier for dynlinked bytecode fragments. Thanks for testing again if you can.

@vicuna
Copy link
Author

vicuna commented Mar 26, 2012

Comment author: hnrgrgr

It works for me. Thanks.

@vicuna
Copy link
Author

vicuna commented Mar 26, 2012

Comment author: @xavierleroy

Thanks for the testing. I pushed the change on trunk as well (commit 12278).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant