From: skaller <skaller@u...>
Subject: Re: [Caml-list] type inference for python
On Tue, 2005-02-01 at 16:26, Philippe Fremy wrote:
> 	Hi,
> 
> I would like to implement something similar to the type inference of 
> ocaml for the python language. I have always found it very impressive 
> (although I have only used caml light).
> 
> I have no experience with the topic, 

I do. I did considerable work on this. The project
was called Vyper. The key idea was to build an 
interpreter which was used to initialise all the
dictionaries, then analyse them for type information,
and generate code that used that type information.
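A rough sketch of that idea in modern Python (a hypothetical illustration of "initialise the dictionaries, then mine them for types", not Vyper's actual code):

```python
def infer_module_types(module):
    """Walk an already-loaded module's dictionary and record the
    concrete type of each top-level binding observed after import."""
    info = {}
    for name, value in vars(module).items():
        if name.startswith("__"):
            continue  # skip dunder housekeeping entries
        info[name] = type(value).__name__
    return info

import math
print(infer_module_types(math)["pi"])   # observed as 'float'
```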

I got as far as getting the interpreter to work very well,
running most standard pure python plus some emulations of
popular C extensions, and with reasonable performance.
In addition, I made some extensions, many of which
got into later versions of Python .. including proper
lexical scoping, garbage collection (well, my version
used Ocaml for that .. ) and a few other things.

However, the type analysis was never done, and I lost
interest in the project, primarily because it would
never support continuation passing in the style of 
Stackless Python. (And the source is lost now).

A few technical problems also got in the way:

(1) I needed finalisers for __del__ methods.
Ocaml has finalisers for Ocaml values today as a consequence
(C finalisers existed, but I needed Ocaml ones).

Unfortunately, Python uses ref counting and finalisation
is therefore synchronous and well ordered for non-cyclic
references, whereas my implementation just used GC.
This meant certain programs didn't quite work as
they did in CPython.
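The synchronous, well-ordered finalisation that ref counting gives you is easy to observe (an illustration of the behaviour in question, written for current CPython):

```python
log = []

class Noisy:
    def __init__(self, name):
        self.name = name
    def __del__(self):
        log.append(self.name)

def scope():
    n = Noisy("a")
    # Under CPython's reference counting, __del__ runs the moment
    # the last reference dies, i.e. when this function returns.
    # Under a tracing GC it would run at some unspecified later
    # point, so programs that depend on this ordering break.

scope()
print(log)   # CPython: ['a'], immediately after scope() returns
```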

(2) The Ocaml native code compiler can't generate shared libraries,
so compiling Python to Ocaml will never emulate dynamic
loading. Perhaps you could do this with Ocaml bytecode,
but the interpreter needed to be fast.

This was a very serious obstacle, since every Python
C extension I wanted to emulate had to be hardwired
into the program (even though using it emulated
the usual import semantics).

(3) The Python specs weren't written by people versed in
language design and standardisation. By far the worst problem
was exceptions: in many cases errors are specified
to throw exceptions... which means the behaviour is well
defined.

This alone makes static type analysis almost impossible.
To do static typing you need a specification that
says the result of a wrong typing is undefined.
Otherwise the program can *rely* on catching an exception,
and a type error isn't an error (since the behaviour is
well defined). Trying to explain that to Pythonistas and
van Rossum in particular proved difficult.. although they
seemed to understand the problem which dynamic mutations
to modules caused and considered 'freezing' modules after
they'd been imported. This may have been done in later
versions of Python: some changes to 'eval' and friends
were made for this reason (you have to specify the dictionary
to use for an eval now).
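Here is the kind of program that makes the point. Because the exception is specified behaviour, the "type error" is ordinary, catchable control flow, and a compiler cannot treat the mistyped call as undefined and optimise around it:

```python
def is_listlike(x):
    # This program *relies* on the type error being a defined,
    # catchable event; the except branch is normal control flow,
    # not an error path a static analyser may assume unreachable.
    try:
        x.append(0)
        return True
    except AttributeError:
        return False

print(is_listlike([]))       # True
print(is_listlike("text"))   # False: the "error" is part of the program's logic
```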

> 1. analyse all the constraints of the code
> Ex:
> def f(a): a.append(1)
> def g(a): a=a+1; f(a)
> g('coucou')
> 
> => a must support append

No, you cannot assume that due to the 'exception' problem
mentioned above.

> => a must also be an int

And you can't assume that either. a could be a class
with an __add__ method. Alternatively, it could be anything,
and the caller of g() is again relying on a type error
to throw an exception.
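To make that concrete, here is a single (hypothetical) class that satisfies both of the inferred "constraints" at once, so neither one pins a down to int or list:

```python
class Accumulator:
    # Nothing forces 'a' to be an int or a list: this one class
    # supports both '+' (via __add__) and .append().
    def __init__(self):
        self.items = []
    def __add__(self, n):
        self.items.append(("added", n))
        return self
    def append(self, x):
        self.items.append(x)

def f(a): a.append(1)
def g(a): a = a + 1; f(a)

acc = Accumulator()
g(acc)
print(acc.items)   # [('added', 1), 1]
```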


> 2. cross-validate the constraints consistency
> => inconsistent assertions

Python variables are all polymorphic. It isn't
inconsistent to have

	a is an int
	a is a string

This is possible, at different times .. as an example:

	for i in [1,"string"]: ..

i is first an int then a string.
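Run that loop and record the type of i on each iteration, and you can see that any flow-insensitive type assigned to i would have to be a union:

```python
observed = []
for i in [1, "string"]:
    observed.append(type(i).__name__)

print(observed)   # ['int', 'str']: i is first an int, then a string
```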

> The 2. and 3. looks like the most difficult aspect of it.

Actually the difficulty is deciding how to modify
the semantics of Python so analysis is possible.
This will break some programs, so you need to choose
a reasonable set of constraints.

You would typically assume that modules were immutable
after loading, whilst class objects remained dynamic.
This means you can apply some static analysis
by actually executing the program up to the main
function call, then examining what got imported.
At least that was my theory .. in particular,
you can type module level variables, and you can
actually lookup functions defined in modules and
examine the code. By using global analysis, you can
then optimise some functions. For example,
you might determine that some function

	def f(a): return a+1

really is always called with an int argument,
and thus optimise it.
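A toy version of that whole-program check, using Python's ast module to scan every call site of f for literal arguments (a sketch of the idea only, not how Vyper worked; the source string and function names are made up):

```python
import ast

src = """
def f(a): return a + 1

def g(): return f(41)
def h(): return f(7)
"""

tree = ast.parse(src)

# Collect the type of every literal argument passed to f.
arg_types = set()
for node in ast.walk(tree):
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "f"):
        for arg in node.args:
            if isinstance(arg, ast.Constant):
                arg_types.add(type(arg.value).__name__)

print(arg_types)   # {'int'}: every call site passes an int literal,
                   # so f could be specialised to int -> int
```

A real analysis would also have to prove that f is never rebound and never called from outside the analysed modules, which is exactly why freezing modules after import matters.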


-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net