Mantis Bug Tracker

ID: 0006462
Project: OCaml
Category: OCaml runtime system
View Status: public
Date Submitted: 2014-06-18 21:26
Last Update: 2015-08-13 15:28
Reporter: stephenrkell
Assigned To:
Priority: normal
Severity: minor
Reproducibility: always
Status: acknowledged
Resolution: open
Platform: amd64
OS: Linux
OS Version: Ubuntu 12.04
Product Version: 4.01.0
Target Version: 4.03.0+dev
Fixed in Version:
Summary: 0006462: Dynlinking duplicate module clobbers host program state
Description: If you inadvertently duplicate a module between the executable and a dynamically loaded library (for example, by adding an extraneous -linkpkg when building a .cmxs), loading the library will "re-initialize" the static data owned by the executable's copy of the module.

I've attached a tarball which demonstrates this. I would expect there to be a private copy of "myval" in lib2.cmxs, so that main continues to see 69105 instead of 42. (Alternatively, there could be an explicit treatment of symbol visibility and overriding, so that the user can control what happens, but that seems to be opening a can of worms.)
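The effect described above can be pictured with a toy model of a flat data namespace, in which every static symbol name owns exactly one storage cell no matter how many copies of the defining module are linked. This is only a sketch: it assumes (as the report observes) that the duplicate copy's initializer ends up writing to the executable's storage, and it assumes the module's initial value is 42. `data_segment` and `run_initializer` are hypothetical names, not OCaml runtime functions.

```ocaml
(* Hypothetical flat namespace: one storage cell per symbol name,
   shared by every copy of the module that defines that symbol. *)
let data_segment : (string, int ref) Hashtbl.t = Hashtbl.create 16

(* Hypothetical module initializer: find (or create) the cell for the
   symbol and store the initial value, as top-level module code would. *)
let run_initializer symbol init =
  let cell =
    match Hashtbl.find_opt data_segment symbol with
    | Some c -> c (* symbol already defined: reuse its storage *)
    | None ->
        let c = ref 0 in
        Hashtbl.add data_segment symbol c;
        c
  in
  cell := init

let () =
  (* The executable's copy of the module initializes myval (assumed 42). *)
  run_initializer "myval" 42;
  (* The main program then updates it. *)
  Hashtbl.find data_segment "myval" := 69105;
  (* Loading a plugin with a duplicate copy of the module re-runs the
     initializer against the same storage, clobbering main's update. *)
  run_initializer "myval" 42;
  Printf.printf "myval = %d\n" !(Hashtbl.find data_segment "myval")
```

Running it prints `myval = 42`: the update made by the main program is lost, which matches the behaviour reported for the attached test case.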

This is related to issue 0004839, but applies even if you have compatible signatures. It's not a type-correctness problem so much as a general semantic bug.

It is also worth mentioning that this seems to make the GC corrupt the program. Perhaps the root set gets clobbered somehow? The smallest example I have is a null CIL plugin, which is also included in the tarball ("make run-cilly"). This segfaults on my machine. If you dig around in gdb using watchpoints, you find that the storage allocated by the second initializer (e.g. try watching Pretty.aligns, which for me is at &camlPretty + 0x190 bytes) gets silently reused as if it were unreachable (e.g. I have seen it being updated to point to a function rather than a list, which is clearly wrong). Since the old pointer is still live, this quickly crashes the program. I'll be happy to help anybody reproduce this.
Steps To Reproduce: Extract the tarball and run make.

To see the GC problem, make sure you have CIL installed and then make run-cilly.
Additional Information: I was hoping the simple test case would illustrate the GC problems too, which is why I made it run in a loop and keep allocating... but it doesn't crash for me.
Tags: No tags attached.
Attached Files: ocaml-dynlink-clobber-bug.tar.gz (1,126 bytes), 2014-06-18 21:26

Relationships
related to 0004839 (acknowledged): natdynlink reproducible segfault

Notes
(0013243)
doligez (administrator)
2015-02-06 19:07

> I would expect there to be a private copy of "myval" in lib2.cmxs

We would like to do that, but the Unix linker does not support it: its namespace is desperately flat, and it provides no renaming facilities. GNU binutils do provide renaming for object files, but that introduces its own set of problems. BTW, this is also the reason for the existence of the -for-pack option.
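To illustrate why a flat namespace leads to -for-pack: a compilation unit that will end up inside a pack has to be compiled knowing its final, pack-qualified symbol name, because the linker offers no way to rename it afterwards. The sketch below assumes ocamlopt's `caml<Unit>` / `caml<Pack>__<Unit>` naming convention; `symbol_of_unit` and `link` are hypothetical helpers that only model name clashes, not real compiler or linker functions.

```ocaml
(* Hypothetical: build the native-code symbol for a compilation unit,
   optionally qualified by the pack it was compiled for (-for-pack). *)
let symbol_of_unit ?pack unit_name =
  match pack with
  | None -> "caml" ^ unit_name
  | Some p -> "caml" ^ p ^ "__" ^ unit_name

(* Hypothetical flat-namespace link step: a single table keyed only by
   the symbol string. Returns every symbol that was defined twice. *)
let link symbols =
  let table = Hashtbl.create 16 in
  List.filter_map
    (fun sym ->
      if Hashtbl.mem table sym then Some sym (* clash: already defined *)
      else (
        Hashtbl.add table sym ();
        None))
    symbols
```

Two unrelated units both named `Util` clash as `camlUtil`, whereas compiling them for packs `A` and `B` yields the distinct names `camlA__Util` and `camlB__Util`, so the flat namespace never sees a collision.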

For the GC-related crash, I'm not surprised: the second copy of the module will indeed clobber the roots of the first. I'm not sure how this leads to a dangling pointer, but that's probably not worth investigating.

The only solution I can see is to forbid dynlinking a module that has the same name as an existing module (static or dynlinked). I just hope this can be done without big changes to the compiler.
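The proposed restriction could look roughly like the following loader-side check. This is a sketch under assumptions: the loader is taken to know the names of all units already linked (statically or dynamically) and the names of the units a plugin defines; `loaded`, `load_plugin` and `Duplicate_module` are hypothetical and not part of the Dynlink API. All names are checked before anything is registered, so a rejected plugin leaves no partial state.

```ocaml
module Names = Set.Make (String)

(* Hypothetical error raised when a plugin redefines a known unit. *)
exception Duplicate_module of string

(* Hypothetical loader state: units already present in the program. *)
let loaded = ref (Names.of_list [ "Lib1"; "Main" ])

(* Reject the plugin if any of its units is already loaded; only when
   every name is new are they all registered (and, in a real loader,
   their initializers run). *)
let load_plugin (units : string list) =
  List.iter
    (fun u -> if Names.mem u !loaded then raise (Duplicate_module u))
    units;
  loaded := List.fold_left (fun s u -> Names.add u s) !loaded units
```

With this check, `load_plugin ["Lib2"]` succeeds, while `load_plugin ["Lib3"; "Lib1"]` raises `Duplicate_module "Lib1"` and leaves `Lib3` unregistered.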
(0013261)
stephenrkell (reporter)
2015-02-09 13:18

In hindsight I was over-hasty to say that lib2 should have a private instance of myval. It's more reasonable for there to be a unique global myval. So the real issue is: why does lib2 want to initialise state belonging to lib1? Surely lib1, and only lib1, should do that? There's nothing in Unix linking that prevents this; it's the usual behaviour.
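The semantics suggested here (one global instance, initialised only by its owner) can be modelled as an "initialize once" registry: the first copy of a unit to run creates its state, and any later duplicate simply binds to that existing instance instead of re-running the initializer. This is a minimal sketch; `instances` and `init_once` are hypothetical names, and the unit's state is reduced to a single `int ref`.

```ocaml
(* Hypothetical registry of unit instances, keyed by unit name. *)
let instances : (string, int ref) Hashtbl.t = Hashtbl.create 16

(* Run [init] only if no copy of [unit_name] has been initialised yet;
   otherwise return the existing instance untouched. *)
let init_once unit_name (init : unit -> int ref) =
  match Hashtbl.find_opt instances unit_name with
  | Some existing -> existing
  | None ->
      let inst = init () in
      Hashtbl.add instances unit_name inst;
      inst

let () =
  let myval = init_once "Lib1" (fun () -> ref 42) in (* lib1's own copy *)
  myval := 69105; (* main updates the shared value *)
  let dup = init_once "Lib1" (fun () -> ref 42) in (* lib2's duplicate *)
  Printf.printf "same instance: %b, myval = %d\n" (dup == myval) !myval
```

This prints `same instance: true, myval = 69105`: the duplicate shares lib1's state rather than overwriting it.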

(Of course it should be possible to use static linking to arrange that lib2 has a private copy of myval. But that is a separate configuration and probably should not be the default.)

I don't think namespacing is the issue. Although quirky, there is quite a bit of namespacing support available in Unix linkers: symbol scope, symbol visibility and (at dynamic-load time) RTLD_LOCAL.
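A simplified model of the ELF rules mentioned here: a reference to a default-visibility symbol is resolved through the global lookup scope (executable first, then objects loaded with RTLD_GLOBAL), so the executable's definition interposes on a shared object's own copy; hidden or protected visibility binds the object's references to its own definition; and RTLD_LOCAL keeps an object's symbols out of the global scope. Assumptions: the executable exports its symbols (as when linked with -Wl,-E), dependency scopes and copy relocations are ignored, and all types, names and helpers are hypothetical.

```ocaml
type visibility = Default | Protected | Hidden

type obj = {
  name : string;
  defs : (string * visibility) list; (* symbols this object defines *)
  global : bool; (* executable, or loaded with RTLD_GLOBAL *)
}

(* Whether [o] makes [sym] visible to other objects. *)
let exports sym o =
  match List.assoc_opt sym o.defs with
  | Some (Default | Protected) -> true
  | Some Hidden | None -> false

(* Resolve a reference to [sym] made from object [from]. *)
let resolve objs ~from sym =
  match List.assoc_opt sym from.defs with
  | Some (Hidden | Protected) -> from.name (* binds locally *)
  | Some Default | None -> (
      let scope = List.filter (fun o -> o.global) objs in
      match List.find_opt (exports sym) scope with
      | Some o -> o.name (* first definition in global scope wins *)
      | None ->
          (* RTLD_LOCAL object falls back to its own definition *)
          if List.mem_assoc sym from.defs then from.name else raise Not_found)

let exe = { name = "main"; defs = [ ("camlLib1", Default) ]; global = true }
let lib2 = { name = "lib2.so"; defs = [ ("camlLib1", Default) ]; global = false }
let lib2_hidden = { lib2 with defs = [ ("camlLib1", Hidden) ] }
```

In this model, `resolve [exe; lib2] ~from:lib2 "camlLib1"` gives `"main"` (a single shared instance), while the hidden variant gives `"lib2.so"` (a private copy). Note that symbol resolution only decides *which* storage a duplicate's code touches, not *whether* its initializer runs; that second question is the one raised in the note above.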

If I'm understanding -pack correctly, it's like ld -r (relocatable output). With both this and the problem I reported, it seems as though ocamlopt is duplicating linker functionality in a way that adds overall complexity. What are the reasons for doing this, rather than using the linker's actual features directly? Would it be possible/helpful to write down a mapping from ocaml's linking semantics to ELF linker features? I could potentially help with such an effort, since I know a little about linkers.

Issue History
Date Modified | Username | Field | Change
2014-06-18 21:26 | stephenrkell | New Issue
2014-06-18 21:26 | stephenrkell | File Added | ocaml-dynlink-clobber-bug.tar.gz
2014-07-16 10:22 | doligez | Relationship added | related to 0004839
2014-07-16 10:23 | doligez | Status | new => acknowledged
2014-07-16 16:43 | doligez | Target Version | => 4.02.1+dev
2014-09-04 00:25 | doligez | Target Version | 4.02.1+dev => undecided
2014-09-14 22:38 | doligez | Target Version | undecided => 4.02.2+dev / +rc1
2015-02-06 18:56 | doligez | Description Updated
2015-02-06 19:07 | doligez | Note Added | 0013243
2015-02-06 19:07 | doligez | Target Version | 4.02.2+dev / +rc1 => 4.03.0+dev
2015-02-09 13:18 | stephenrkell | Note Added | 0013261

