Show
Ignore:
Timestamp:
07/13/09 18:05:57 (5 years ago)
Author:
robert
Message:

From David Fries, "Here is a fix for a deadlock seen under Windows using OpenThreads?
Barrier operations. The error is with atomic operations in the
win32 condition implementation. The attached sample program will
reliably trigger with as few as three threads and a dual core system,
though sometimes it will take 65,000 iterations.

2.8.1 was the base for these changes

Win32ConditionPrivateData.h
Win32ConditionPrivateData::wait does two operations to decrement
waiters_ then read, when InterlockedDecrement? decrements and returns
the value in one operation. The two operations allows another thread
to also decrement with both getting 0 for an answer.

Win32ConditionPrivateData::broadcast is using waiters_ directly
instead of using the w value read earlier, if it was safe to use
waiters_ directly there would be no need for InterlockedGet? or w.

overview of deadlock in barrier with three threads
one thread in broadcast, 2 threads in wait,
release semaphore 2, waits on waiters_done_
both threads wake, decrement waiters_, get 0 for w,

<logic error here>

one calls set waiters_done_,
broadcast thread comes out of waiters_done_,
other thread calls waiters_done_, (which leaves waiters_done_ in the
signaled state)

<sets the trap>

broadcast thread returns releases mutex, other threads get
mutex and also return,
next barrier, first two threads enter wait, one goes to broadcast, release
semaphore 2, skips waiters_done_ as it had been released last time
returns, processes, enters the barrier for the next barrier operation
and waits,
three threads are now in wait, two have the previous barrier phase,
one the current phase, there's one count left in the semaphore which a
thread gets, returns, enters the barrier as a waiter, sleeps, and the
deadlock is completed"

Files:
1 modified

Legend:

Unmodified
Added
Removed
  • OpenSceneGraph/trunk/src/OpenThreads/win32/Win32ConditionPrivateData.h

    r9958 r10457  
    6969        { 
    7070            // Wake up all the waiters. 
    71             ReleaseSemaphore(sema_.get(),waiters_,NULL); 
     71            ReleaseSemaphore(sema_.get(), w, NULL); 
    7272 
    7373            cooperativeWait(waiters_done_.get(), INFINITE); 
     
    113113        catch(...){ 
    114114            // thread is canceled in cooperative wait , do cleanup 
    115             InterlockedDecrement(&waiters_); 
    116             long w = InterlockedGet(&waiters_); 
     115            long w = InterlockedDecrement(&waiters_); 
    117116            int last_waiter = was_broadcast_ && w == 0; 
    118117 
     
    124123         
    125124        // We're ready to return, so there's one less waiter. 
    126         InterlockedDecrement(&waiters_); 
    127         long w = InterlockedGet(&waiters_); 
     125        long w = InterlockedDecrement(&waiters_); 
    128126        int last_waiter = was_broadcast_ && w == 0; 
    129127