Dynlinking duplicate module clobbers host program state #6462

Closed
vicuna opened this issue Jun 18, 2014 · 3 comments

Comments


vicuna commented Jun 18, 2014

Original bug ID: 6462
Reporter: stephenrkell
Assigned to: @mshinwell
Status: resolved (set by @mshinwell on 2017-06-09T16:02:17Z)
Resolution: duplicate
Priority: normal
Severity: minor
Platform: amd64
OS: Linux
OS Version: Ubuntu 12.04
Version: 4.01.0
Target version: later
Category: otherlibs
Related to: #4229 #4231 #4839 #6950 #6957
Monitored by: @ivg @gasche @rixed @hcarty

Bug description

If you inadvertently duplicate a module between the executable and a dynamically loaded library, for example by adding an extraneous -linkpkg when building a .cmxs, loading the library will "re-initialize" the static data owned by the executable's copy of the module.

I've attached a tarball which demonstrates this. I would expect there to be a private copy of "myval" in lib2.cmxs, so that main continues to see 69105 instead of 42. (Alternatively, there could be an explicit treatment of symbol visibility and overriding, so that the user can control what happens, but that seems to be opening a can of worms.)
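The tarball is not reproduced here, so the following is only a toy model of the reported behaviour, not the actual test case and not how the runtime is implemented. It assumes a flat table of named symbols standing in for the linker's global namespace; `symbols`, `init_lib1` and `myval ()` are hypothetical names, and only `myval`, 42 and 69105 come from the report.

```ocaml
(* Toy model only: a flat symbol table standing in for the linker's
   global namespace. This is an illustration, not the runtime's mechanism. *)
let symbols : (string, int ref) Hashtbl.t = Hashtbl.create 8

(* Running a module's initializer (re)binds its symbol to fresh data. *)
let init_lib1 () = Hashtbl.replace symbols "Lib1.myval" (ref 42)

(* The host looks the symbol up by name each time it uses it. *)
let myval () = Hashtbl.find symbols "Lib1.myval"

let observed_before, observed_after =
  init_lib1 ();                  (* the executable's copy of Lib1 is initialized *)
  myval () := 69105;             (* main changes the value *)
  let before = !(myval ()) in
  init_lib1 ();                  (* the plugin's duplicate Lib1 runs its initializer *)
  let after = !(myval ()) in
  (before, after)

let () =
  Printf.printf "before load: %d, after load: %d\n"
    observed_before observed_after
```

In this model the host reads 69105 before the second initialization and 42 after it, which is the clobbering described above. A private copy would correspond to the plugin binding a differently named symbol.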

This is related to issue #4839, but applies even if you have compatible signatures. It's not a type-correctness problem so much as a general semantic bug.

It seems worth mentioning that this also appears to make the GC corrupt the program. Perhaps the root set gets clobbered somehow? The smallest example I have is a null CIL plugin, which is also included in the tarball: "make run-cilly". This segfaults on my machine. If you dig around in gdb using watchpoints, you find that the storage allocated by the second initializer (e.g. try watching Pretty.aligns, which for me is at &camlPretty + 0x190 bytes) gets silently re-used as if it were unreachable (e.g. I have seen it being updated to point to a function, not a list, which is clearly wrong). Since the old pointer is still live, this quickly crashes the program. I'll be happy to help anybody reproduce this.

Steps to reproduce

Extract tarball, run make.

To see the GC problem, make sure you have CIL installed and then run "make run-cilly".

Additional information

I was hoping the simple test case would illustrate the GC problems too, which is why I made it run in a loop and keep allocating... but it doesn't crash for me.

File attachments


vicuna commented Feb 6, 2015

Comment author: @damiendoligez

> I would expect there to be a private copy of "myval" in lib2.cmxs

We would like to do that, but the Unix linker does not support it: its namespace is desperately flat, and it provides no renaming facilities. GNU binutils does provide renaming for object files, but that introduces its own set of problems. BTW, this is also the reason for the existence of the -for-pack option.

For the GC-related crash, I'm not surprised: the second copy of the module will indeed clobber the roots of the first. I'm not sure how this leads to a dangling pointer, but that's probably not worth investigating.

The only solution I can see is to forbid dynlinking a module that has the same name as an existing module (static or dynlinked). I just hope this can be done without big changes to the compiler.
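A minimal sketch of the kind of check proposed here, assuming the loader keeps a set of the unit names already present in the program. Everything in it is hypothetical (`Duplicate_unit`, `register_units`, `try_load`, and the unit lists); it is not the Dynlink implementation, only an illustration of refusing a load by name.

```ocaml
(* Hypothetical loader-side check; not the actual Dynlink implementation. *)
exception Duplicate_unit of string

module Names = Set.Make (String)

(* Names of all compilation units currently present, static or dynlinked. *)
let loaded = ref Names.empty

(* Check every unit first, then commit, so a refused load leaves no trace. *)
let register_units units =
  List.iter
    (fun u -> if Names.mem u !loaded then raise (Duplicate_unit u))
    units;
  loaded := List.fold_left (fun acc u -> Names.add u acc) !loaded units

(* Turn the exception into a result the caller can inspect. *)
let try_load units =
  match register_units units with
  | () -> Ok ()
  | exception Duplicate_unit u -> Error u

(* Units linked into the executable. *)
let host = try_load [ "Lib1"; "Main" ]

(* A plugin built with an extraneous copy of Lib1. *)
let plugin = try_load [ "Lib1"; "Lib2" ]
```

Here `host` is `Ok ()` and `plugin` is `Error "Lib1"`: the plugin is rejected before any of its initializers could run, so the host's state is never touched.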


vicuna commented Feb 9, 2015

Comment author: stephenrkell

In hindsight I was over-hasty to say that lib2 should have a private instance of myval. It's more reasonable for there to be a unique global myval. So the real issue is: why does lib2 want to initialise state belonging to lib1? Surely lib1, and only lib1, should do that? There's nothing in Unix linking that prevents this; it's the usual behaviour.

(Of course it should be possible to use static linking to arrange that lib2 has a private copy of myval. But that is a separate configuration and probably should not be the default.)

I don't think namespacing is the issue. Although quirky, there is quite a bit of namespacing support available in Unix linkers: symbol scope, symbol visibility and (at dynamic-load time) RTLD_LOCAL.

If I'm understanding -pack correctly, it's like ld -r (relocatable output). With both this and the problem I reported, it seems as though ocamlopt is duplicating linker functionality in a way that adds overall complexity. What are the reasons for doing this, rather than using the linker's actual features directly? Would it be possible/helpful to write down a mapping from OCaml's linking semantics to ELF linker features? I could potentially help with such an effort, since I know a little about linkers.


vicuna commented Jun 9, 2017

Comment author: @mshinwell

This has been fixed by #1063.
Since this requires CIL and there are other similar testcases already present, I'm not going to put this report in the testsuite.

$ make
ocamlfind ocamlopt -o "lib1.cmx" -c "lib1.ml"
ocamlfind ocamlopt -a -o "lib1.cmxa" lib1.cmx
ocamlfind ocamlopt -o "prog" -linkpkg -package dynlink lib1.cmxa main.ml
ocamlfind ocamlopt -o "lib2.cmx" -c "lib2.ml"
ocamlfind ocamlopt -shared -o "lib2.cmxs" -linkpkg lib1.cmxa lib2.cmx
./prog
Lib1's !myval is 42
changing it to 69105
Fatal error: exception Dynlink.Error(Dynlink.Module_already_loaded "Lib1")
make: *** [run-prog] Error 2
