Mantis Bugtracker
  

ID: 0006830
Category: [Squeak] Kernel
Severity: major
Reproducibility: always
Date Submitted: 12-28-07 21:38
Last Update: 08-14-12 00:30
Reporter: orgsow
View Status: public
Assigned To: leves
Priority: normal
Resolution: reopened
Status: feedback
Product Version: 3.10
Summary: 0006830: A mutex can wind up with a semaphore with more than 1 excessSignal
Description: I am new to Smalltalk and was playing around with processes to see how they work, and I encountered something that should never happen.

It may be related to this bug: http://bugs.squeak.org/view.php?id=6576 since I am using Delay.

The semaphore in a mutex should never go above 1 excess signal.

This script here will cause that to happen if you put it into a Transcript window and "do it":

START

|p keepGoing i t resume suspend stop semcounts m |
m := Mutex new.
p := Processor activeProcess.
keepGoing := true.
i := 0.
t := Transcript.
m critical: [t cr; cr; cr; cr; cr].
m critical: [ t cr; show: 'init: ', Processor activeProcess oopString; cr].
semcounts := ''.
semcounts := 'init:', m excessSignals asString, ' ', semcounts.

resume := [
    m critical: [ t cr; show: 'resume: ', Processor activeProcess oopString; cr].
    semcounts := 'resume1:', m excessSignals asString, ' ', semcounts.
    (Delay forSeconds: 20) wait.
    m critical: [ t cr; show: 'resuming'; cr.
        p resume].
    semcounts := 'resume2:', m excessSignals asString, ' ', semcounts.
].
suspend := [
    m critical: [ t cr; show: 'pause: ', Processor activeProcess oopString; cr].
    (Delay forSeconds: 10) wait.
    m critical: [ t cr; show: 'pausing'; cr.
        p suspend].
    semcounts := 'suspend:', m excessSignals asString, ' ', semcounts.
].
stop := [
    m critical: [ t cr; show: 'stop: ', Processor activeProcess oopString; cr].
    (Delay forSeconds: 30) wait.
    m critical: [ t cr; show: 'stopping'; cr.
        keepGoing := false].
    semcounts := 'stop:', m excessSignals asString, ' ', semcounts.
].

resume fork.
suspend fork.
stop fork.

m critical: [ t cr; show: 'end: ', Processor activeProcess oopString; cr].
semcounts := 'end:', m excessSignals asString, ' ', semcounts.

[keepGoing] whileTrue: [ m critical: [t show: i; space].
     "Processor yield."
     i := i + 1].
semcounts := 'beforedone:', m excessSignals asString, ' ', semcounts.

(Delay forSeconds: 10) wait.

m critical: [ t cr; show: 'done.';cr.].
semcounts := 'done:', m excessSignals asString, ' ', semcounts.

Transcript show: semcounts asString.


END

The output of the last line is: done:2 beforedone:2 stop:1 resume2:1 suspend:1 resume1:0 end:1 init:1

I have run a version of this that stores the mutex in the Smalltalk dictionary for reuse, and if the mutex is reused it hangs the system. In fact, that's why I originally started trying to figure out what was going on: it was hanging on the second run every time.

Also, notice the commented-out "Processor yield." If you uncomment it, the program functions as expected (i.e., excessSignals does not go over 1). That may help diagnose the problem.

This happens in 3.9 and 3.10
Additional Information: I added 2 methods to the system to expose excessSignals. You need to fileIn the attached change set to get them, or you can add them manually from here:


!Mutex methodsFor: '*miles-debug' stamp: 'mbg 12/28/2007 13:19'!
excessSignals
    ^semaphore excessSignals.! !


!Semaphore methodsFor: '*miles-debug' stamp: 'mbg 12/28/2007 13:19'!
excessSignals
    ^excessSignals.! !
Attached Files: miles-debug.1.cs (714 bytes) 12-28-07 21:38
 mutex_bug.st (1,707 bytes) 12-28-07 21:42

- Relationships

- Notes
(0014215)
andreas
02-06-12 10:35
edited on: 02-06-12 10:36

Just stumbled over this. The example is a bit obscure, but the problem is an old one, namely that suspend and resume of processes waiting on semaphores don't work properly. To wit:

| sema process |
sema := Semaphore new.
process := [sema critical:[]] fork.
(Delay forSeconds: 1) wait.
process suspend.
process resume.

When the process is suspended, it is taken off the semaphore's list. When it is resumed, it is not put back onto that list (which would be very difficult to do), but the code in Semaphore>>critical: will signal the semaphore regardless.

Basically, external process manipulation is always tricky and should be done with great care, or better, left only for the debugger.
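
The suspend/resume example above can be extended to make the stray signal visible. This is only a minimal sketch: it assumes the reporter's Semaphore>>excessSignals accessor from the attached change set has been filed in (it is not part of the stock image), and the count it is expected to show is inferred from the explanation above rather than verified on a particular image or VM.

```smalltalk
"Sketch only. Requires the Semaphore>>excessSignals accessor from the
 attached change set; a stock image does not provide it."
| sema process |
sema := Semaphore new.                  "starts with 0 excess signals"
process := [sema critical: []] fork.    "will block in wait, nobody signals"
(Delay forSeconds: 1) wait.             "let the forked process reach the wait"
process suspend.                        "per the note: removed from the semaphore's list"
process resume.                         "per the note: not put back onto the list"
(Delay forSeconds: 1) wait.             "let the resumed process run"
Transcript cr; show: 'excessSignals: ', sema excessSignals printString.
```

If suspend and resume were transparent to the wait, the forked process would still be blocked and the count would stay at 0. If the behaviour described in the note applies, the process falls through the wait, runs the empty block, and the signal in critical: leaves the count above 0, even though nothing ever signalled the semaphore from outside.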

 
(0014234)
laza
07-14-12 11:12

Basically I read this as "not fixable"
 
(0014247)
leves
08-14-12 00:30

I have an idea for how to fix it. I wrote a mail about it once, but can't find it now. The idea is to add another variable to Process to hold a reference to the Semaphore the process was waiting on when it was suspended. When the process is resumed, it can #wait again on the same Semaphore. I started hacking the VM and got a partially working solution (the suspend/resume mechanism is spread out across many methods), but the code is probably gone due to a hard disk failure. It might still be available in the backups, but I don't have access to them right now.
 

- Issue History
Date Modified Username Field Change
12-28-07 21:38 orgsow New Issue
12-28-07 21:38 orgsow File Added: miles-debug.1.cs
12-28-07 21:42 orgsow File Added: mutex_bug.st
12-10-11 03:56 lewis Issue Monitored: lewis
02-06-12 10:35 andreas Note Added: 0014215
02-06-12 10:36 andreas Note Edited: 0014215
07-14-12 11:12 laza Status new => resolved
07-14-12 11:12 laza Resolution open => not fixable
07-14-12 11:12 laza Assigned To  => laza
07-14-12 11:12 laza Note Added: 0014234
07-14-12 11:12 laza Status resolved => closed
08-14-12 00:30 leves Assigned To laza => leves
08-14-12 00:30 leves Status closed => feedback
08-14-12 00:30 leves Resolution not fixable => reopened
08-14-12 00:30 leves Note Added: 0014247

