Mysterious Thread.kill #2779

Closed
vicuna opened this issue Jun 7, 2001 · 4 comments

vicuna commented Jun 7, 2001

Original bug ID: 380
Reporter: administrator
Status: closed
Resolution: not a bug
Priority: normal
Severity: minor
Category: ~DO NOT USE (was: OCaml general)

Bug description

Full_Name: Michael McDougall
Version: 3.01
OS: Linux (Redhat 7.1)
Submission from: halifax.cis.upenn.edu (158.130.66.10)

I've written a client/server system in OCaml. The server is
multi-threaded: on the server's side, each new TCP connection
between the client and server is handled by a separate thread.

Ideally, the server would be able to run for weeks at a time without
being restarted. Unfortunately it crashes after a day or two under
light load, and it crashes within minutes if the load is heavy.

I can't figure out why it is crashing. When it does crash I get
the following message on stdout:

Fatal error: uncaught exception Failure("Thread.kill: killed thread")

I don't call "Thread.kill" anywhere in the code, so I don't see what
I can do to fix the problem.

All my code is surrounded by "try .. with" blocks (in the main thread
and the child threads) so I don't know why the exception isn't
being caught inside the program. This leads me to suspect that there
is some problem in the threads implementation.
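
For reference, the kind of wrapping described above can be sketched as follows. This is an illustration only, not the actual server code: `guard` and its `log` parameter are hypothetical names, and wrapping a thread body this way is not claimed to prevent the crash reported here.

```ocaml
(* Hypothetical helper, not part of the server code in this report.
   [guard ~log name f] builds a function usable as a thread body:
   it runs [f x] and, if any exception escapes, logs it through [log]
   instead of letting it propagate out of the thread. *)
let guard ~log name f x =
  try f x
  with ex ->
    log (Printf.sprintf "%s: exception: %s" name (Printexc.to_string ex))
```

With the threads library linked, a session thread could then be started as `Thread.create (guard ~log "session" session_handler) (in_chan, out_chan, addr, file_des)`, assuming `log` has type `string -> unit`.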

I can't give simple instructions for reproducing the bug short
of downloading my OCaml program and running it. I'll be happy to
supply the code if you want it.

I will appreciate any help. There is a group of us at UPenn who have
invested a lot of time and effort in this project, and this bug is
sinking the whole project.

Here's a snippet of the server's code. It shows the main loop
and the creation of child threads.

let main () =
  log "QcmdServer starting up";
  log "Loading initial state";
  begin
    try
      List.iter initFileInfo file_info_list
    with
      excep ->
        fatal_error ("Exception occurred while getting file information:"
                     ^ (Printexc.to_string excep));
  end;
  (* Main event loop: one thread per incoming connection. *)
  while true do
    try
      log "Waiting for message";
      let new_event = get_next_event () in
      match new_event with
        EvConnection (in_chan, out_chan, addr, file_des) ->
          log (Printf.sprintf "Forking thread for  %s"
                 (sockaddr_to_string addr));
          let t = Thread.create session_handler (in_chan,
                                                 out_chan,
                                                 addr,
                                                 file_des)
          in ()
      | EvTick ->
          log "Tick";
          tick_counter := !tick_counter + 1;
          if (!tick_counter >= !broadcast_interval) then
            begin
              tick_counter := 0;
              log "broadcasting";
              broadcast_updates ();
            end
          else ()
    with
      Unix.Unix_error (err, s1, s2) ->
        begin
          log "Unix error raised during main loop:";
          log ((Unix.error_message err) ^ "," ^ s1 ^ "," ^ s2);
        end
    | ex ->
        begin
          log "Exception raised during main loop";
          log (Printexc.to_string ex);
        end
  done
;;

log "Starting main";;
Printexc.catch main ();;

I'll be happy to supply more code if you think it will help.

Merci,

Michael McDougall

vicuna commented Jun 12, 2001

Comment author: administrator

> Ideally, the server would be able to run for weeks at a time without
> being restarted. Unfortunately it crashes after a day or two under
> light load, and it crashes within minutes if the load is heavy.
>
> I can't figure out why it is crashing. When it does crash I get
> the following message on stdout:
>
> Fatal error: uncaught exception Failure("Thread.kill: killed thread")
>
> I don't call "Thread.kill" anywhere in the code, so I don't see what
> I can do to fix the problem.

Actually, Thread.exit() is implemented as Thread.kill(Thread.self()).
Still, the current thread should never be in the killed state if it's
running! So, it looks like the thread library got seriously confused.

To say more, I'd really need to be able to reproduce the crash -- but
I know this is very hard with threads... Do you think you could set
up a fake client program that simply bombs the server with requests,
in order to reproduce the problem?

- Xavier Leroy
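
A fake client of the kind suggested above could look roughly like the sketch below. It is only an assumption about what such a load generator might be: the function name `hammer`, its parameters, and the payload are hypothetical, and the real protocol spoken by the server is not shown in this report. It needs the unix library.

```ocaml
(* Hypothetical stress client (requires the unix library).
   Opens [connections] TCP connections one after another, writes
   [payload] on each, then closes it. Returns the number of
   connections on which both the connect and the write succeeded. *)
let hammer ~addr ~port ~connections ~payload =
  let ok = ref 0 in
  for _i = 1 to connections do
    let sock = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
    (try
       Unix.connect sock (Unix.ADDR_INET (addr, port));
       let len = String.length payload in
       let _written = Unix.write_substring sock payload 0 len in
       incr ok
     with Unix.Unix_error (err, fn, _) ->
       (* Report the failure and move on to the next connection. *)
       prerr_endline (fn ^ ": " ^ Unix.error_message err));
    (try Unix.close sock with Unix.Unix_error _ -> ())
  done;
  !ok
```

A call such as `hammer ~addr:Unix.inet_addr_loopback ~port:4000 ~connections:10000 ~payload:"ping\n"` would then issue ten thousand short-lived connections; the port and payload here are placeholders.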

vicuna commented Jun 12, 2001

Comment author: administrator

> > Ideally, the server would be able to run for weeks at a time without
> > being restarted. Unfortunately it crashes after a day or two under
> > light load, and it crashes within minutes if the load is heavy.
> >
> > I can't figure out why it is crashing. When it does crash I get
> > the following message on stdout:
> >
> > Fatal error: uncaught exception Failure("Thread.kill: killed thread")
> >
> > I don't call "Thread.kill" anywhere in the code, so I don't see what
> > I can do to fix the problem.
>
> Actually, Thread.exit() is implemented as Thread.kill(Thread.self()).
> Still, the current thread should never be in the killed state if it's
> running! So, it looks like the thread library got seriously confused.
>
> To say more, I'd really need to be able to reproduce the crash -- but
> I know this is very hard with threads... Do you think you could set
> up a fake client program that simply bombs the server with requests,
> in order to reproduce the problem?
>
> - Xavier Leroy

I've put together a tgz with instructions and a Makefile that
should allow you to reproduce the crash. The tgz is about 100k.
What's the best way of getting it to you? Attachment? Stick
it on a web site?

Michael

vicuna commented Jun 18, 2001

Comment author: administrator

Thanks for the nicely packaged test case! It compiled and ran out of
the box, but after 3 days running non-stop, it still did not crash.
The server runs on a Mandrake 7.2 machine and the client on a RedHat
6.2 machine, both pretty standard.

There are some "Bad file descriptor,close" errors on each connection
that seem harmless and can be fixed by removing the "Unix.close" in

    close_out par_out_chan;
    close_in par_in_chan;
    Unix.close par_in_fd)

- Xavier Leroy
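
The "Bad file descriptor,close" message is consistent with closing a descriptor twice: closing a channel also closes the file descriptor underneath it, so a later `Unix.close` on that same descriptor fails with `EBADF`. The sketch below reproduces this on a pipe rather than on the server's sockets; the function name is hypothetical and the code needs the unix library.

```ocaml
(* Illustration only (requires the unix library): closing a channel
   closes its underlying descriptor, so closing the descriptor again
   raises Unix_error with EBADF. Returns true when that happens. *)
let double_close_is_ebadf () =
  let (rd, wr) = Unix.pipe () in
  let ic = Unix.in_channel_of_descr rd in
  close_in ic;                      (* this also closes [rd] *)
  let got_ebadf =
    try Unix.close rd; false        (* second close of the same fd *)
    with Unix.Unix_error (Unix.EBADF, _, _) -> true in
  Unix.close wr;
  got_ebadf
```

Dropping the explicit `Unix.close`, as suggested above, leaves the channel close as the only close of that descriptor.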

vicuna commented Jul 2, 2001

Comment author: administrator

Cannot reproduce. -XL, 2001-07-02

@vicuna vicuna closed this as completed Jul 2, 2001
@vicuna vicuna added the bug label Mar 19, 2019