Version française
Home     About     Download     Resources     Contact us    
Browse thread
[oliver: Re: [Caml-list] Strings as arrays or lists...]
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: David Chase <chase@w...>
Subject: Re: [Caml-list] [RANT] String representation (was: Strings as arrays or lists...)
At 01:43 PM 3/4/2003 +0100, Diego Olivier Fernandez Pons wrote:
>    Bonjour,
>
>Some of the features you wish are 'not so hard to implement', at least
>if you already have 'conceptually bugged low-level' strings :

I hope you aren't proposing that everyone implement the "String" data
structure that they wish they really had.  Strings are too basic, and
too common, to be left to this.

Consider Java's approach of String and StringBuffer.  My main gripe
(after years of use) is that too many functions should take a StringBuffer
(which should probably be an abstract type, though there is indeed some
loss of efficiency) and return a StringBuffer, rather than simply returning
a String.  From an optimization point-of-view, the most useful thing
for performance would be to find a way to have an abstract type with
non-abstract synchronization.  All StringBuffer ops are synchronized,
and probably should be, but a compiler and idioms can help IF THE
SYNCHRONIZATION IS EXPOSED.  That is, in Java, this code

  StringBuffer sb ...
  synchronized (sb) { // I know that sb is thread-private
     while (...) {
        sb.append(...);
     }
  }

is likely to run faster than it would without the outer synchronization
because the JIT compiler can see that the (non-abstract, exposed, final)
StringBuffer synchronizations are redundant.

On-the-other-hand, multithreaded programming appears to be so hard
for the typical and even good-typical programmer, that I'd be interested
in seeing if there were better ways to do it (say, threads communicating
only through queues).

>> - Arrays need fast random access by numeric index, strings do not
>> need that.
>
>You can easily hava a (log n/k + 1) acces where n is the total size of
>the string and k is the size of each bucket (if you choose a data
>structure with constant buffer size)

That will be noticeably slower.  In the Java compiler that I worked
on, the String operations were specially recognized by the compiler
so that it could infer success of string bounds checks, and that
"minor" optimization was noticeably beneficial.  On the other hand,
with some cleverness and a biggish K, you might be able to get most
of the benefit simply from the "bounds check" (as in, small strings
fit in their first and only bucket, illegal indices necessarily also
overflow off the end of the first bucket).

David Chase


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners