Re: Reverse-Engineering Bytecode: A Possible Commercial Objection To O'Caml

From: Gerd Stolpmann (gerd@gerd-stolpmann.de)
Date: Sat Jun 10 2000 - 16:41:36 MET DST

  • Next message: Ken Wakita: "Re: polymorphic variants (2) + magic"

    On Fri, 09 Jun 2000, Daniel Ortmann wrote:
    >Note that althought I brought up the idea of such encryption, I don't really
    >have much opinion as to whether it is a good idea. I *do* believe it should
    >be discussed.
    >
    >... and not mainly philosophically. But technically. Hopefully that email
    >got some people thinking. If so, then that small mission of mine has been
    >accomplished.

    >From a purely technical point of view: I think it is impossible to protect
    executable code by encryption if the code itself contains the key: Simply start
    the encrypted program, and wait until the bytecode is decrypted. The
    unencrypted bytecode will be in memory, and it is possible to examine memory
    contents and search the regions containing bytecode (which is quite simple).

    There is no advantage if asymmetric encryption is used.

    The only way to make it harder to get the bytecode is to permute the
    instruction codes. The permutation is applied both to the bytecode and to the
    virtual machine itself. However, the level of protection is not very high,
    because you can compare the permuted VM with the original VM, and because you
    can analyze the structure of the bytecode (which is still the same).

    I think it is not possible to improve the level of protection by encryption.

    >Also, isn't "information hiding" part of the purpose of .cmi files? What I am
    >suggesting might be viewed as an stronger type of .cmi file ... with REAL
    >"implementation hiding". :-)

    But your "implementation hiding" has a different intention. Normally, such a
    property is discussed to improve the maintainability of software; i.e. you can
    exchange implementations without changing the module interface. In contrast
    to this, your proposal has a commercial background.

    >I just "reverse engineered" emacs byte code by doing
    ><control> x <control> r ~/.emacs.elc ... and easily viewed actual lisp code.
    >
    >That's how easy it was. That's the kind of thing I was thinking about
    >avoiding.

    This is not possible with OCaml bytecode which is relatively close to machine
    code. This means: You cannot easily find out the structure of the source code
    by analyzing the byte code because compilation "flattens" the structure to some
    extend. Recursions are often transformed to ordinary
    loops. The memory layout of values can be difficult if some type of closures
    are used.

    Perhaps some type of automated reverse engineering is possible (without tools
    I cannot imagine it), but much of the structure of the source program will be
    lost.

    I would suppose that it is cheaper to develop the program again than to crack
    it.

    Note that Java byte code contains even more structure than OCaml byte code, and
    I have never heard that somebody complained about missing protection of Java
    investments.

    I hope my remarks make the discussion more to the point.

    Gerd

    -- 
    ----------------------------------------------------------------------------
    Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
    Viktoriastr. 100             
    64293 Darmstadt     EMail:   gerd@gerd-stolpmann.de
    Germany                     
    ----------------------------------------------------------------------------
    



    This archive was generated by hypermail 2b29 : Mon Jun 12 2000 - 16:08:03 MET DST