Anonymous | Login | 01-20-2021 11:02 UTC |
Main | My View | View Issues | Change Log | Docs |
Viewing Issue Simple Details [ Jump to Notes ] | [ View Advanced ] [ Issue History ] [ Print ] | ||||||||
ID | Category | Severity | Reproducibility | Date Submitted | Last Update | ||||
0006576 | [Squeak] Kernel | major | always | 07-24-07 07:28 | 11-25-08 23:12 | ||||
Reporter | andreas | View Status | public | ||||||
Assigned To | |||||||||
Priority | normal | Resolution | fixed | ||||||
Status | closed | Product Version | |||||||
Summary | 0006576: Delay is not thread-safe | ||||||||
Description | Delay is not thread-safe since currently the calling process updates the internal Delay structure itself. If the process gets terminated while doing this (e.g., adding/removing from SuspendedDelays) the whole system is left in an inconsistent state and breaks. Attached changes fix the problem. | ||||||||
Additional Information | It is *very* hard to recreate this problem under controlled circumstances which is the reason why there are no tests accompanying the changes. I've not been able to create the problem on a single machine but we have seen it for months on our servers and with these changes the problem simply went away. | ||||||||
Attached Files |
![]() ![]() ![]() |
||||||||
|
![]() |
||||||
|
![]() |
|
(0010928 - 837 - 895 - 895 - 895 - 895 - 895) andreas 07-24-07 07:29 |
"Change Set: SafeDelay Date: 23 July 2007 Author: Andreas Raab This change set fixes a set of severe problems with concurrent use of Delay. Previously, many of the delay-internal structures were modified by the calling process which made it susceptible to being terminated in the middle of manipulating these structures and leave Delay (and consequently the entire system) in an inconsistent state. This change set fixes this problem by moving *all* manipulation of Delay's internal structures out of the calling process. As a side-effect it also removes the requirement of Delays being limited to SmallInteger range; the new code has no limitation on the duration of a delay. No tests are provided since outside of true asynchronous environments (networks) it is basically impossible to recreate the situation reliably." |
(0010931 - 61 - 61 - 61 - 61 - 61 - 61) andreas 07-24-07 16:05 |
Updated the change set to one including the class definition. |
(0010936 - 162 - 218 - 218 - 218 - 218 - 218) Keith_Hodges 07-25-07 17:17 edited on: 07-25-07 17:19 |
To install this fix use: Installer mantis ensureFix: '5676 make Delay thread safe'. "fix begin" Installer mantis bug: 6576 fix: 'SafeDelay-2.cs'. "fix end" |
(0011083 - 471 - 535 - 535 - 535 - 535 - 535) andreas 08-31-07 03:38 |
I've added a set of tweaks that should go on top of the previous changes: "Change Set: DelayTweaks-ar Date: 30 August 2007 Author: Andreas Raab Some more delay tweaks: - Ensure proper unscheduling of delays which are terminated in the wait - Only unschedule delays that are beingWaitedOn - Bullet-proof handleTimerEvent so that it only polls the clock once (each poll is a potential cause for msecs clock overflow) and cope with an obscure race condition" |
(0011091 - 80 - 86 - 86 - 86 - 86 - 86) edgardec 09-04-07 11:13 |
This now is 7146DelayTweaks-ar.cs and was in updates for 3.10 Thanks Andreas ! |
(0011095 - 227 - 251 - 251 - 251 - 251 - 251) edgardec 09-07-07 09:50 |
Reminder sent to: andreas Andreas: I sorry to tell i should remove the .cs from updates stream I found I can't save the image and all freezes. It's happening on PPC Mac OS X with John VM., If you have new files for me to test, I put again when works |
(0011096 - 218 - 218 - 218 - 218 - 218 - 218) andreas 09-07-07 21:34 |
Without some more detail there is nothing I can do. The only thing I can say is that we run this code in our production servers 24/7 and that we run in our production clients and that we run it under load and it works. |
(0011100 - 234 - 252 - 252 - 252 - 252 - 252) edgardec 09-08-07 10:12 |
Reminder sent to: andreas I take the 7143 image, filed in the two changes sets and again I can't save the image. This time I try on my old Windows 98 Pc whit SqueakVM-Win32-3.10.4. I appreciate any leads why you have many prodution things and I can't save. |
(0011104 - 341 - 341 - 341 - 341 - 341 - 341) andreas 09-11-07 02:08 |
I finally got around to look at this problem and it turns out it's a weird problem with (as one would expect in Delay) all sorts of tricky implications. It worked for us only because all of our images are contaminated by artifacts of earlier delay problems. We are currently testing a fix for all the problems; I'll let you know how it goes. |
(0011218 - 194 - 200 - 326 - 326 - 326 - 326) GazzaGuru 09-28-07 15:14 |
I've also noticed that these changes bring back a resumption time bug. Try interrupting [true] whileTrue. Then, after saving the image, try again... C.f. http://bugs.squeak.org/view.php?id=381 [^] |
(0011230 - 458 - 458 - 458 - 458 - 458 - 458) andreas 10-01-07 08:30 |
Just as an FYI, this still isn't resolved completely. The problem reported by Edgar can be fixed simply by changing system startup/shutdown (turns out that delay is shut down too early which causes a problem when Sensor is shut down) but it seems as if there is still another problem that leads to Delay's AccessProtect mutex being signaled multiple times. I still haven't tracked this one down - I have a suspicion and added a guard but it hasn't fired yet. |
(0011235 - 207 - 249 - 249 - 249 - 249 - 249) edgardec 10-05-07 10:07 |
Reminder sent to: andreas Could send or put here the right startup order ? I try to change doing Smalltalk removeFromStartUpList: Delay. Smalltalk addToStartUpList: Delay And with Sensor, but don't work. Too deep for me :-) |
(0011236 - 1495 - 1929 - 1929 - 1929 - 1929 - 1929) GazzaGuru 10-05-07 21:48 |
Just a test for "allegedly" correct operation.. s _ Semaphore forMutualExclusion. i _ 1. "Loop while there's not too many signals on the semaphore (should be 0 or 1)" [[(s instVarNamed: #excessSignals) < 2] whileTrue: [ "Fork two processes to do things inside the semaphore" p _ [[true] whileTrue: [s critical: [(Delay forMilliseconds: 10) wait]]] forkNamed: 'p', i printString. q _ [[true] whileTrue: [s critical: [(Delay forMilliseconds: 10) wait]]] forkNamed: 'q', i printString. "Increment the counter just to make it easy to ID the processes" i _ i + 1. "Delay to give the processes a chance to resume and potentially get into the critical on the Semaphore" (Delay forMilliseconds: 500 atRandom) wait. p terminate. (Delay forMilliseconds: 500 atRandom) wait. q terminate. "After terminating the two processes, the excess signals should be 1 on the Semaphore because nothing is in critical any more, assuming the processes terminated properly and the unwind happened as it should, but... " "In our images with Andreas' recent Delay and Semaphore fixes, we always get one of these two error conditions (although normally too many signals). This would cause deadlock in 'real world' situations" (s instVarNamed: #excessSignals) = 0 ifTrue: [ WorldState addDeferredUIMessage: [self inform: 'Too few signals after ', i printString, ' loops']. self halt]]. self inform: 'Too many signals after ', i printString, ' loops'] fork |
(0011238 - 98 - 110 - 110 - 229 - 229 - 229) andreas 10-06-07 01:22 edited on: 10-06-07 01:22 |
DelayStartup-ar should bring order in the Delay startup/shutdown. Also note the changes to 0006588 |
(0011240 - 133 - 145 - 145 - 145 - 145 - 145) edgardec 10-06-07 10:05 |
Thanks Andreas ! I put again 7146DelayTweaks-ar.cs and was in updates for 3.10. Now I could save the image and continue the updates |
(0011256 - 384 - 417 - 417 - 417 - 417 - 417) edgardec 10-08-07 12:16 |
Seems this is causing the "go limbo" on both Mac and Windows XP when I try to run the 'all test" of SUnit. At this moment I must delete from updates again until was solved. I also try the related 6588, but still I could complete the all test run. People wishing this must try the Installer way and not our updates way until Ralph and Andreas agree how to deal this important issue |
(0012804 - 55 - 55 - 55 - 55 - 55 - 55) KenCausey 11-25-08 23:12 |
Harvested as update 7144 and released with Squeak 3.10. |
Mantis 1.0.8[^]
Copyright © 2000 - 2007 Mantis Group
133 total queries executed. 71 unique queries executed. |