Version française
Home     About     Download     Resources     Contact us    
Browse thread
type inference for python
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: skaller <skaller@u...>
Subject: Re: [Caml-list] type inference for python
On Tue, 2005-02-01 at 21:54, Philippe Fremy wrote:

> But unlike vyper, my goal is not to provide an alternative python 
> implementation with type inference and other goodies.

That wasn't mine either.

> I aim at something like pychecker. That is, a tool that you can run to 
> check the types consistency of a python. 

Perhaps you missed the point of my description. In order to
check the types of something, where there are no type annotations,
you need to use the known types as a starting point: namely
the literals.

Often in Python you will see:

	def f(a):
		return M.x + a

where you don't know the type of a. But what about M.x?

Perhaps you can look up x in M. But attributes are
not declared in Python, let alone their types. What you need
to do is actually find the value of M.x to see what type
it is.

IN order to do this, you observe there is a statement:

	import M

in the program. There is a simple way to find the type of M.x,
and that is to actually *execute* the import statement.

In other words, you have no choice but to implement an
interpreter -- or use the existing one.

More precisely, you could look at the code in M, do flow
analysis, and try to guess at the type of x from that.

But that is much more complicated and difficult. Why bother
to do that when you can just import M and then examine
the module dictionary??

So you see my purpose was not to implement a python interpreter,
I did that because it is *necessary*. There is no better way
to find the types of attributes in modules.

The reason this works is based on the fact that *usually*
in Python, a program does three things:

(a) import some modules
(b) do some initialisations in the mainline
(c) call the main function

It is a bit tricky to distinguish (b) and (c).
The point however, is that (a) is usually expected
to just create a set of values and functions
which the mainline then uses. So if you want to type check
the mainline code, you need to establish the types of the
values and functions in the imported modules as a starting
point... and as mentioned you do that by actually importing
them. This is basically the only possibility: it is reliable
because you don't have to guess about how to do it, the
semantics are defined in the language reference.

In addition, for many modules the typing is known
because it is documented in the Library Reference.

After you have done this, then you must analyse the mainline.

You CANNOT execute it, in the same fashion, to find the types,
because you don't have the input data, it could be a server,
etc etc. But most of the modules, you can expect them
to execute and terminate quickly, since the execution
serves mainly to just to store things in the module dictionary.

The difficulty you will face with 'pycheck' or whatever is that
you really do have to establish with some certainty what
some of the types are, so that a violation can be noted
and warned about.

Otherwise, if whenever you're in doubt you just say
'polymorphic' then clearly every doubtful use will just
lead to saying 'polymorphic' and you won't be able to
warn about *any* misuses .. :)


> It is perfectly acceptable for 
> it to be limited and to report false warning for twisted constructs.

The question is whether you can provide a reasonable signal to noise
ratio. To do that you really have to be right about the intended
typing most of the type.

>  I keep thinking about the interview of a mailman 
> developer, who said that a big problem of python is that eventhough you 
> have a program running for a long time, there might be some hidden place 
> where you did type mistake that will break your whole program with an 
> exception.
> 
> Regarding this, python is no better than C program segfaulting.

C is probably more reliable because it is statically
typed, however there is a loss of reliability in C because the type
system isn't very expressive, and the workaround -- casting --
is used far more often than one would like. Python, on the
other hand, is easier to test (no long compile times),
and dynamic type checks catch errors earlier than in C.

Either way, what you would like to do is a laudible goal,
if you can get it to work .. but i suspect you will find
it much harder than you might think -- I certainly did :)


> But since I am taking the "helper tool" approach, it is ok to make 
> assumptions that will trigger real warning in 90% of the cases and false 
> warning when you trick python too much. 

Sure, but what do you mean 'trick Python'? Writing code
based on the specified semantics -- including exception handling --
is not a trick.

I myself write lots of Python even today (all my build scripts
and utilities are Python). I do things often that you might
think are tricks:

try:
	x = f()
except:
	x = default

I do that one heaps, my code is littered with it,
almost the entire Felix configuration script works
that way.


> I am ready to make a few assumptions as a starter:
> - operations on int/float will return an int or float
> - operation + involves two objects of the same type and return an object 
> of the same type
> - AttributesError are not supposed to be raised

My literate programming tool interscript relies on that heavily.
It is in fact the central technique I use to build polymorphic
weavers (a weaver is a driver for a typesetter). you are
allowed to call the moral equivalent of:

	x.start_emphasise()
	x.output("jhgjhgjhg")
	x.end_emphasise()

and an error finding the start/end emphasise method is just
ignored. If they're found and fail that's an error.

It isn't a bug to leave out the emphasise methods 
for the text weaver -- plain text doesn't support it.
Latex and HTML do. This allows you to encode pretty
features (like emphasis) whether or not the typesetter
supports it -- that is, without knowing which typesetter
the client will want to use for your document.

You also might have some trouble finding this kind of thing
out -- since the weavers are all plugins. They're loaded
dynamically on demand.

Is this trickery? Well, if it is, if this is bad programming,
then why bother using Python? One uses it *because* it
is a dynamic language.

> Did you find a lot of code relying on this idiom ? 

No, most code doesn't do anything fancy. Most Python code
is 'well typed' and reasonbly static. By well typed I mean
a variable is either a single type, or clearly polymorphic.

Unfortunately, 'most code' is easily type checked by a
single test case. The real advantage of a static checker
is weeding out corner cases -- programmers don't get
the ordinary cases wrong that often, and when they
do the first test run usually finds the bug.


> Iterating over lists with variable types is sure tricky. But again, 
> that's not a common programming pattern (or is it my imagination).

The common case is (im Ocaml notation):

	'a option

That is, some value of a fixed type,
or None. That one you have to handle.

Luckily, there's lots of theory for this. None is a bottom
or ground or Error or whatever, so for each plain type

	int, string, ... 

you need to handle

	Some int | None, Some string | None

which are called 'pointed types'.

> You had the constraint of supporting every language feature. I can 
> afford to support only 50% of them. That would already be a big win over 
> the current situation.

Not really. The problem is that doubt propagates exponentially.
So you have to handle quite a bit :)

You probably need to distinguish

(a) monomorphic unpointed type (int, string etc)
(b) monomorphic pointed (allows None too)
(c) dynamic (any type is allowed)
(d) don't know

Don't confuse (c) and (d). Dynamic means you DO know
that the type varies (no further deduction required).
Don't know means you're open to further constraints
as you see more code.

> I am puzzled because the industry uses poor languages to do important 
> things. Our life depends so much on computers those days that it is a 
> pity to uses such insecure languages. There have been a lot of good 
> languages produced but few of them really caught on.

You will get much agreement with that sentiment here :)

> Whay I like in python is that it has some characteristics that make it 
> an easy to pick language, and it can replace C++, VB and Java in many 
> areas. But this type fully dynamic types is one of its strength but also 
> the number one error made by beginners.

I suspect dynamic typing is mostly a workaround
for lack of polymorphism and the lack of sum types (variants).
It does have the advantages of avoiding the difficulties
of find a sound, simple, and expressive type system,
which continues to be a bleeding edge research topic :)

If you're serious, and you think programmers will take
your tool seriously, you might consider establishing
a protocol for type annotations -- in comments for
example, perhaps like:

	def f(a): #a:int
	  x = y + z #z:int

Using such annotations, your tool could be helped
quite a bit. In particular, when you can't determine a
type, the diagnostic is a hint to add an annotation.
Software houses might then require enough annotations
as part of their coding standards to shut your tool up :)

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net