OCaml program crashes after computing fine for 2 days during grep on multiMB output file
Date: -- (:)
From: Gerd Stolpmann <info@g...>
Subject: Re: [Caml-list] OCaml program crashes after computing fine for 2 days during grep on multiMB output file
On Wednesday, 01.03.2006, at 12:03 +0100, Andries Hekstra wrote:
> 
> Dear OCaml-list, 
> 
> I use OCaml under 64-bit Linux to do signal processing simulations of
> next generation optical storage devices. So far, I have really enjoyed
> programming in OCaml, e.g. as program texts are considerably shorter
> than in C++ for computations that involve many arrays. My computations
> run for many days if not a week, and produce output files of ca. 20
> MB. I run them in a job queue.  
> 
> Recently I have been plagued by programs that crash when I do a "grep"
> on the output file (opened with open_out). E.g. the program has been
> running successfully for a few days. I do a "grep @ *.out" in the
> directory to monitor progress as important lines in the output file
> start with a "@". A few minutes later I receive mails from the queuing
> system saying that everything crashed. 
> 
> What is the cause of these crashes? Can somebody give me a clue? 

A stale NFS file handle normally means that the file has disappeared on
the NFS server. (The server does not keep files open while clients have
them open, as proper POSIX semantics would require; it simply re-opens
them whenever clients access the files.) Since grep can still read the
file, this cannot be the cause here.

Stale handles may also result if the NFS server is rebooted and
something goes wrong. Normally, the server keeps file handles across
reboots, but there are many reports that this does not work for some
users. Maybe these NFS servers are just buggy. (For example, some
operating systems do not guarantee stable device numbers, so every time
the system boots the disks get new numbers, and all file handles become
stale.)

You should also ensure that you are hard-mounting (option "-o hard" in
the mount command). Use NFS version 3 if possible.

In general, I would advise not to use NFS for long-running processes.
Write the file to /var/tmp and move it to its final location when it is
fully written.
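
A minimal sketch of that approach in OCaml, using only the standard
library. The helper names (copy_file, move_file, with_local_output) are
made up for illustration and are not part of any library. Since /var/tmp
and the NFS directory are usually on different filesystems, a plain
rename is expected to fail, so the sketch falls back to copy-and-delete.

```ocaml
(* Hypothetical helper names; only the OCaml standard library is used. *)

(* Copy [src] to [dst] in 64 KB chunks. *)
let copy_file src dst =
  let ic = open_in_bin src in
  let oc =
    try open_out_bin dst
    with e -> close_in_noerr ic; raise e
  in
  let buf = Bytes.create 65536 in
  let rec loop () =
    let n = input ic buf 0 (Bytes.length buf) in
    if n > 0 then begin
      output oc buf 0 n;
      loop ()
    end
  in
  (try loop ()
   with e -> close_in_noerr ic; close_out_noerr oc; raise e);
  close_in ic;
  close_out oc

(* Move [src] to [dst]. Sys.rename fails when the two paths live on
   different filesystems, so fall back to copying and then deleting. *)
let move_file src dst =
  try Sys.rename src dst
  with Sys_error _ ->
    copy_file src dst;
    Sys.remove src

(* Run [write] on a channel opened on the local [scratch] file, and move
   the file to [final] only after it has been written and closed. *)
let with_local_output ~scratch ~final write =
  let oc = open_out scratch in
  (try write oc
   with e -> close_out_noerr oc; raise e);
  close_out oc;
  move_file scratch final
```

A call could look like
with_local_output ~scratch:"/var/tmp/Exp107.out" ~final:"Exp107.out"
(fun oc -> output_string oc "@ step done\n"), where both paths are
placeholders. While the job is running, the progress lines then have to
be grepped in the scratch file on the machine that runs the job.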

Gerd

> ------------------------------------------------------------
> # LSBATCH: User input
> qtb -par Exp107.txt > Exp107.txt.log -codes
> gallager_10b_1023l_1048576w.txt
> ------------------------------------------------------------
> 
> Exited with exit code 2.
> 
> Resource usage summary:
> 
>    CPU time   : 163606.88 sec.
>    Max Memory :      3014 MB
>    Max Swap   :      3044 MB
> 
>    Max Processes  :         3
> 
> The output (if any) follows:
> 
> Fatal error: exception Sys_error("Stale NFS file handle")
> 
> 
> 
> 
> ------------------------------------------------------------------------
> Dr. Ir. Andries P. Hekstra
> Philips Research 
> High Tech Campus 27  (WL-1-4.15)
> 5656 AG Eindhoven
> Tel./Fax/Secr. +31 40 27 42048/42566/44051 
>   *  Good open source break software for computer users :
> http://www.workrave.org   
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------