Mantis Bug Tracker

View Issue Details

ID:               0007669
Project:          OCaml
Category:         threads
View Status:      public
Date Submitted:   2017-11-09 15:10
Last Update:      2017-11-24 00:34
Reporter:         jacques-henri.jourdan
Assigned To:
Priority:         normal
Severity:         minor
Reproducibility:  always
Status:           acknowledged
Resolution:       open
Platform:         x86-64
OS:               Linux
OS Version:
Product Version:
Target Version:
Fixed in Version:
Summary:          0007669: Thread preemption
Description: When using the systhread library with at least one computation-intensive thread, the scheduler behaves very badly: preemption happens only rarely. This is demonstrated, for example, by the following program when run in native mode:

  let rec generate_list n =
    let rec aux acc = function
      | 0 -> acc
      | n -> aux (float n :: acc) (n-1)
    in
    aux [] n

  let long_list = generate_list 1000000
  let short_list = generate_list 100

  let rec slow_loop () =
    Printf.printf "SLOW %d\n%!" (List.length (List.rev_map sin long_list));
    slow_loop ()

  let rec fast_loop () =
    Printf.printf "FAST %d\n%!" (List.length (List.rev_map sin short_list));
    fast_loop ()

  let () =
    ignore (Thread.create slow_loop ());
    fast_loop ()
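
For reference, this can be built and run in native mode along these lines (saving the program as preempt.ml is my choice of name):

  ocamlopt -thread unix.cmxa threads.cmxa preempt.ml -o preempt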

In the output, there are roughly as many "FAST" lines as "SLOW" lines, meaning that preemption happens only when printing to standard output. I would expect to see many more "FAST" lines than "SLOW" lines (i.e., the slow computation should be preempted by the fast one even when the slow one does not do any I/O).

Note that both computations allocate memory through the GC, so the signals used for preemption can indeed be handled. This can be seen by adding the following lines at the beginning of the previous code, which restores the expected behavior:

  (* On each SIGVTALRM tick, suspend the running thread very briefly,
     forcing the OS to schedule another runnable thread. *)
  let preempt _signal = Thread.delay 1e-6
  let _ =
    Sys.set_signal Sys.sigvtalrm (Sys.Signal_handle preempt);
    ignore (Unix.setitimer Unix.ITIMER_VIRTUAL
              { Unix.it_interval = 1e-2; Unix.it_value = 1e-2 })

That is, we periodically force preemption by suspending the currently executing thread for a very short time.


My analysis of the problem is that there are two flaws in the implementation of the systhread library:

1- The "tick thread" periodically sends a signal so that yield() is called regularly. But on Linux, the only thing yield() does is release and re-acquire the master lock. Since sched_yield() apparently doesn't do what we want here, one solution could be to use usleep(1) instead. My experiments seem to indicate that the performance loss is not measurable.
2- Actually, 1- would be fine if the implementation of the master lock guaranteed fairness. But it doesn't. Another solution would therefore be to use a fair implementation of the master lock (such as a ticket lock; see the sketch below). This should not make the implementation much more complicated, given that the master lock is already implemented "by hand".
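
For illustration, here is a minimal ticket-lock sketch in OCaml, using Mutex and Condition from the threads library. The real master lock lives in the C runtime, so this only models the fairness idea (the lock is granted strictly in arrival order), not an actual patch:

  (* Each thread takes a ticket on arrival and waits until its number
     is served, so the lock is handed out in FIFO order. *)
  type ticket_lock = {
    mutable next_ticket : int;   (* next ticket to hand out *)
    mutable now_serving : int;   (* ticket currently holding the lock *)
    mutex : Mutex.t;             (* protects the two counters *)
    served : Condition.t;        (* signaled when now_serving advances *)
  }

  let create () =
    { next_ticket = 0; now_serving = 0;
      mutex = Mutex.create (); served = Condition.create () }

  let lock l =
    Mutex.lock l.mutex;
    let my_ticket = l.next_ticket in
    l.next_ticket <- l.next_ticket + 1;
    while l.now_serving <> my_ticket do
      Condition.wait l.served l.mutex
    done;
    Mutex.unlock l.mutex

  let unlock l =
    Mutex.lock l.mutex;
    l.now_serving <- l.now_serving + 1;
    Condition.broadcast l.served;
    Mutex.unlock l.mutex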
Tags: No tags attached.
Attached Files

- Relationships

- Notes
(0018644)
shinwell (developer)
2017-11-09 15:43

Jane Street did some experiments with using [nanosleep] (similar to your point 1) and as far as I remember, the conclusion was that we would be keen for the runtime to provide an option to use that, rather than the current [yield].
(0018648)
xleroy (administrator)
2017-11-12 17:37

The fairness issue you mention is especially visible under Linux, because the Linux scheduler favors throughput over fairness, and because sched_yield() was made unusable in Linux 2.6. Windows is more fair as far as I can remember.

I'm open to using nanosleep(1 ns) instead of sched_yield() under Linux, provided someone conducts serious experiments to measure the overhead, if any.

But it's unclear to me why we would want fairness in thread scheduling and be willing to invest time and possibly degrade performance to get it. What is wrong with the behavior reported in this MPR?
(0018665)
jacques-henri.jourdan (manager)
2017-11-16 14:03

> But it's unclear to me why we would want fairness in thread scheduling and be willing to invest time and possibly degrade performance to get it. What is wrong with the behavior reported in this MPR?

To me, one of the typical uses of threads, apart from performance, is to have a worker thread and a UI thread. This bug makes that impossible.
(0018685)
seliopou (reporter)
2017-11-22 23:24
edited on: 2017-11-22 23:29

There are a few different schemes that we considered and experimented with. The two that I have numbers for are:

  1. dropping the `#ifndef` in `yield` to re-instate the `sched_yield` call; and
  2. using `nanosleep(1ns)` instead of `sched_yield`.

In addition to these modifications, we modified the Async scheduler to call `Thread.yield` in its main loop. You can see the location of this change here, though that location is currently occupied by a nanosleep that can be enabled and disabled through a configuration flag (disabled by default):

  https://github.com/janestreet/async_unix/blob/19e9656db3797c6df73c951208174bd9b6fa5c9c/src/raw_scheduler.ml#L721-L723

You can find the experimental setup for the tests we ran in this gist:

  https://gist.github.com/seliopou/2fb86817ec7dae2c61ff02b49b274031

Briefly, we have an Async-based client and server that support two commands: `get-updates` and `ping`. A `get-updates` request will cause the server to continuously stream bytes to the client until the client closes the connection. A `ping` request on the other hand will cause the server to send a short response and close the connection.

What we're testing, then, is the latency of pings to a server that is servicing a `get-updates` request. We'd expect a fair system to result in low ping times, on the order of tens of milliseconds on average.

You can see the results in the gist linked above (units are seconds and include client process startup time, so subtract 50 ms to approximate the real latency). Unfortunately, the tabular data is not rendered properly in Mantis.

As you can see, the baseline had a good median, a bad average and tail, and correspondingly high variance. `sched_yield` had a worse median, but that bought you a better average and tail.

And as is clear from the table, `nanosleep` is just an across-the-board improvement. It's a clear winner in terms of thread fairness, though it doesn't really speak to the performance impact on the long-running operation. In addition, it certainly has the downside that it makes the current thread unrunnable. How severe a penalty you take by making the current thread unrunnable really depends on how much work the other threads in your system have to do before releasing the lock again. Ideally, we would stay runnable but go to the back of the run queue, which is exactly the behavior that was changed in `sched_yield` so long ago.

It might make sense to somehow expose the number of threads waiting on the lock. That way, decisions on how to effect a yield can be left up to the application. In Async we approximate this using the number of threads that are in use from our thread pool. This quantity conflates threads that may be making progress, threads that are blocked in a system call, and threads that are blocked waiting to acquire the OCaml lock. In the absence of a change to `Thread.yield`, it'd be nice to tease out just that latter component so that use cases like ours don't have to over-approximate the condition under which yielding is a productive action.
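
As a purely hypothetical sketch of what this could look like from OCaml (the function below does not exist in the threads library; the constant stands in for a value the runtime would have to report):

  (* Hypothetical stand-in for a runtime primitive reporting how many
     threads are currently blocked on the OCaml master lock. *)
  let waiting_on_master_lock () = 0

  (* A scheduler main loop could then yield only when it is productive,
     i.e. when some other thread is actually waiting for the lock. *)
  let maybe_yield () =
    if waiting_on_master_lock () > 0 then Thread.yield ()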

(0018686)
jacques-henri.jourdan (manager)
2017-11-24 00:34

Thanks for these experiments, which indeed confirm what I observed informally.

I think Xavier was rather interested in measuring the overhead in terms of computation time. If a worker thread (i.e., a thread that uses CPU time) gets interrupted 20 times per second and (at least) one context switch occurs each time, this has a cost. If this cost is indeed negligible (which I guess it should be, but I have never done serious experiments), then I see no reason not to make this change.
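
As a minimal sketch of such a measurement (not a serious experiment; it reuses the itimer workaround from the description, i.e. roughly 100 interruptions per CPU second):

  (* Time a CPU-bound loop twice, without and then with forced
     preemption; the difference estimates the context-switch cost. *)
  let work () =
    let acc = ref 0.0 in
    for i = 1 to 100_000_000 do
      acc := !acc +. sin (float i)
    done;
    !acc

  let time label f =
    let t0 = Unix.gettimeofday () in
    ignore (f ());
    Printf.printf "%s: %.2fs\n%!" label (Unix.gettimeofday () -. t0)

  let () =
    time "baseline" work;
    Sys.set_signal Sys.sigvtalrm
      (Sys.Signal_handle (fun _ -> Thread.delay 1e-6));
    ignore (Unix.setitimer Unix.ITIMER_VIRTUAL
              { Unix.it_interval = 1e-2; Unix.it_value = 1e-2 });
    time "with forced preemption" work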

I can try to do these experiments soon.

- Issue History
Date Modified Username Field Change
2017-11-09 15:10 jacques-henri.jourdan New Issue
2017-11-09 15:43 shinwell Note Added: 0018644
2017-11-12 17:37 xleroy Note Added: 0018648
2017-11-12 17:37 xleroy Status new => acknowledged
2017-11-16 14:03 jacques-henri.jourdan Note Added: 0018665
2017-11-22 23:24 seliopou Note Added: 0018685
2017-11-22 23:29 seliopou Note Edited: 0018685
2017-11-24 00:34 jacques-henri.jourdan Note Added: 0018686

