Date: 2000-12-11 (17:29)
From: Gerd Stolpmann <gerd@g...>
Subject: Re: features of PCRE-OCaml
On Sat, 09 Dec 2000, Markus Mottl wrote:
>Gerd Stolpmann wrote on Friday, 08 December 2000:
>> There are two functions making it easy: enter_blocking_section and
>> leave_blocking_section. For example, the stub for the read syscall of the Unix
>> library:
>Ok, I have found an article by Xavier on these functions:
>  http://caml.inria.fr/archives/199905/msg00035.html
>So if I am not mistaken, a function that calls the GC or allocates memory
>on the OCaml-heap cannot be considered reentrant even if its semantics
>is otherwise referentially transparent. This means that just "tagging"
>a function as "pure" is no guarantee that it won't mess up the runtime
>when e.g. calling the GC concurrently - right?

For example, a situation must never arise in which one thread is initializing
memory and is interrupted by another thread that allocates memory and triggers
the GC. One precondition of the GC is that memory is always initialized.

"Reentrancy" is an abstract view on the function interface; it is not true for
lower coding levels because (heap) memory is nothing but a large global variable
implicitly shared by every piece of code.

>In other words, I can put those functions around the largest section of
>C code that doesn't interfere with the OCaml runtime system, and then I
>should be safe.

I think so.

>The only question now is: would it really pay off for pattern matching in
>PCRE? I have taken a look at the implementation of these functions and at
>their use, but have only found cases where some function really blocks for
>either an indefinite (e.g. read) or at least a potentially very long amount
>of time (e.g. gethostbyaddr, which might need to contact a nameserver).
>Without threads we won't benefit anyway, and if we use threads, there
>is a small overhead associated with calling these functions. Pattern
>matching perhaps does not take so much time in the average case that this
>is justified. Any experiences or suggestions as to when using these
>functions is advisable?

I would say it depends on the problem size. For example, when searching in a
long text it is definitely worthwhile to release the master lock.

The more interesting case is the average text-processing program with many
invocations of the PCRE engine on average problem sizes. The question is
whether the total time of all invocations is large enough for an effect to be
measurable. In the ideal case, I can imagine a two-processor system in which one
CPU only executes Caml code and the other only regexps. From the Caml CPU's
point of view, the PCRE calls are (ideally) free, because they are delegated to
the other CPU. However, there is a synchronization overhead, and nothing is
gained if the Caml CPU must spend more time on synchronization than it would
spend executing the regexp itself.

I think it is worth an experiment.

Gerd Stolpmann      Phone: +49 6151 997705 (home)
Viktoriastr. 100
64293 Darmstadt     Email:   gerd@gerd-stolpmann.de