Mantis - Squeak
Viewing Issue Advanced Details
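ID: 7343
Category: Network
Severity: minor
Reproducibility: always
Date Submitted: 04-28-09 04:22
Last Update: 02-06-11 23:47
Reporter: andreas
Assigned To: cdegroot
Priority: normal
Status: closed
Product Version: 3.10
Resolution: fixed
Projection: none
ETA: none
Fixed in Version: trunk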
0007343: Socket clock rollover issues
On servers with enough uptime and load, class Socket can hang: several of its loops wait against a deadline that may be out of range for the (rolling over) millisecond clock value, in which case the operation may never complete.
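To make the failure mode concrete, here is a minimal Python sketch (not Squeak code) of a masked, wrapping millisecond clock. The mask value `CLOCK_MASK` and the function names are assumptions for illustration only. A deadline computed as start + timeout can land past the wrap point, where the masked clock can never reach it; comparing the elapsed time modulo the clock period instead keeps working across one wrap.

```python
# Minimal model of a masked (wrapping) millisecond clock. CLOCK_MASK is an
# assumed value chosen for illustration, not taken from the Squeak VM.
CLOCK_MASK = 0x1FFFFFFF


def naive_timed_out(start_ms: int, timeout_ms: int, now_ms: int) -> bool:
    """Buggy pattern: compare the masked clock against an absolute deadline.

    If start_ms + timeout_ms exceeds CLOCK_MASK, the masked clock can never
    reach the deadline, so this returns False forever.
    """
    deadline = start_ms + timeout_ms
    return now_ms >= deadline


def elapsed_ms(start_ms: int, now_ms: int) -> int:
    """Elapsed milliseconds, correct across at most one wrap of the clock."""
    return (now_ms - start_ms) & CLOCK_MASK


def safe_timed_out(start_ms: int, timeout_ms: int, now_ms: int) -> bool:
    """Rollover-tolerant check: compare elapsed time, not absolute times."""
    return elapsed_ms(start_ms, now_ms) >= timeout_ms


# Start 100 ms before the wrap; 5101 ms later the masked clock reads 5000.
start = CLOCK_MASK - 100
print(naive_timed_out(start, 1000, 5000))  # False: deadline is unreachable
print(safe_timed_out(start, 1000, 5000))   # True: 5101 ms have elapsed
```

The elapsed-time comparison only holds for timeouts shorter than one full clock period (about 6.2 days with the assumed mask).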
Related to: 0006857 (closed, cdegroot) Socket>>#waitForDataFor:ifClosed:ifTimedOut: large delay hangs process
SocketClockRollover-ar.1.cs (3,743 bytes) 04-28-09 04:23
M7343-Socket-Timeout-Rollover-nice.1.cs (36,017 bytes) 04-28-09 20:37
M7343-Socket-Timeout-Rollover-nice.2.cs (36,351 bytes) 04-29-09 20:19

Notes
(0013096)
andreas   
04-28-09 04:23   
Attached change set fixes the issues.
(0013097)
nicolas cellier   
04-28-09 06:53   
Yes, amazing to see such code still uncorrected! Is Squeak used at all?
There are other senders of #deadlineSecs: in HTTPSocket waiting for a fix.
(0013098)
andreas   
04-28-09 07:07   
We don't use HTTPSocket, so no fixes for that from Qwaq ;-) Also, the problem is more subtle than you may think: to trigger it you need to hit a condition that would cause a timeout just as the clock is rolling over. That is not very common, and I'm sure it gets overlooked when it happens in practice. I know I've ignored similar cases in the past. Today, though, we just had a server upgrade and I couldn't for the life of me imagine what in the upgrade would cause such a problem. Basically I just got lucky looking at it: after discussing with Eliot whether a VM problem or an issue with waitTimeoutMSecs: might be involved, I checked for a clock rollover on the off-chance that this was the cause. Which it was.
(0013099)
nicolas cellier   
04-28-09 20:37   
Change Set: M7343-Socket-Timeout-Rollover-nice
Date: 28 April 2009
Author: nice

Well, I kept the spirit of Andreas' correction but changed some details:

1) do not use (deadline - Time millisecondClockValue) > 0 as a loop condition.
It can stay > 0 forever because deadline is potentially a LargeInteger, which the clock value never reaches
2) thus avoid the 500ms busy loop
3) return true where required for HTTPSocket
4) correct deadlineSecs: and *Until: senders
5) move *Until: into a deprecated category
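Point 1 can be sketched as a wait loop that never tests an absolute deadline: it recomputes the remaining time on every pass from the time elapsed since the start, using modular subtraction so a clock wrap cannot keep the loop alive. This is a Python model under stated assumptions, not the change set's code: `CLOCK_MASK`, `wait_until_ready`, and its `is_ready`, `wait_for`, and `clock` parameters are hypothetical stand-ins for the socket condition, a bounded semaphore wait, and Time millisecondClockValue. Waiting for the whole remaining time in one call, instead of in 500 ms slices, corresponds to point 2.

```python
from typing import Callable

# Assumed wrap mask for a masked millisecond clock (illustration only).
CLOCK_MASK = 0x1FFFFFFF


def wait_until_ready(is_ready: Callable[[], bool],
                     wait_for: Callable[[int], None],
                     clock: Callable[[], int],
                     timeout_ms: int) -> bool:
    """Wait until is_ready() holds or timeout_ms elapses.

    Instead of testing (deadline - clock()) > 0, the remaining time is
    recomputed on every pass from the elapsed time since start, using
    modular subtraction so a clock wrap cannot make the loop run forever.
    Returns False on timeout, in the style of the wait*Until: selectors.
    """
    start = clock()
    while not is_ready():
        elapsed = (clock() - start) & CLOCK_MASK
        remaining = timeout_ms - elapsed
        if remaining <= 0:
            return False
        wait_for(remaining)  # e.g. a semaphore wait bounded by remaining ms
    return True


class FakeClock:
    """Test double: a clock that only advances when someone waits on it."""

    def __init__(self, t: int) -> None:
        self.t = t

    def now(self) -> int:
        return self.t & CLOCK_MASK

    def advance(self, ms: int) -> None:
        self.t += ms


clk = FakeClock(CLOCK_MASK - 100)  # the clock wraps during the wait
print(wait_until_ready(lambda: False, clk.advance, clk.now, 1000))  # False
```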

TODO: use self deprecated: or remove these old rollover-bugged selectors
(or correct them, but that's just boring)
(0013100)
nicolas cellier   
04-29-09 20:20   
M7343-Socket-Timeout-Rollover-nice.1.cs is bogus because it did not take care of the subtle difference between these protocols:

wait*Until: returns false if the connection is closed or the wait times out
wait*For: signals an error
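The contract difference between the two families can be sketched in Python as two thin wrappers over one shared low-level wait. Everything here is hypothetical: `poll` stands in for the underlying socket wait and returns "data", "closed" or "timeout", and the exception classes are defined locally for this sketch. The point is that both wrappers see the same outcomes but report failure differently, so swapping one for the other silently changes what callers must check.

```python
from typing import Callable


class ConnectionClosed(Exception):
    """Raised by the *For-style wait when the peer closed the connection."""


class ConnectionTimedOut(Exception):
    """Raised by the *For-style wait when no data arrived in time."""


# `poll` is a hypothetical low-level wait returning "data", "closed" or
# "timeout"; both wrappers below share it and differ only in reporting.
Poll = Callable[[], str]


def wait_for_data_until(poll: Poll) -> bool:
    """wait*Until:-style: closed or timed out is reported as False."""
    return poll() == "data"


def wait_for_data_for(poll: Poll) -> None:
    """wait*For:-style: closed or timed out signals an error."""
    outcome = poll()
    if outcome == "closed":
        raise ConnectionClosed()
    if outcome == "timeout":
        raise ConnectionTimedOut()
```

A caller of the Until-style wrapper branches on the Boolean; a caller of the For-style wrapper needs an exception handler instead.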

Use M7343-Socket-Timeout-Rollover-nice.2.cs instead.
(0013102)
nicolas cellier   
04-29-09 20:25   
Andreas explained that the primary goal of the 500ms soft busy loop is to work around missed events (an external socket semaphore is sometimes not signalled when it ought to be).

See http://lists.squeakfoundation.org/pipermail/squeak-dev/2009-April/135833.html
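One way to read the 500ms soft busy loop: never block on the socket semaphore for longer than one slice, and re-check the condition after each slice, so a lost signal costs at most about 500 ms of extra latency instead of the whole timeout. The Python sketch below is an illustration under assumptions, not the Squeak implementation: a `threading.Semaphore` stands in for the external socket semaphore, `is_ready` is a hypothetical predicate, and `wait_with_soft_poll` is an invented name. It measures time with `time.monotonic()`, a monotonic float clock, so the plain deadline subtraction used here does not have the rollover problem of a masked millisecond clock.

```python
import threading
import time
from typing import Callable

POLL_SLICE_S = 0.5  # the 500 ms slice mentioned in the note


def wait_with_soft_poll(sem: threading.Semaphore,
                        is_ready: Callable[[], bool],
                        timeout_s: float) -> bool:
    """Wait for is_ready() or timeout, tolerating a missed semaphore signal.

    Each blocking wait is capped at POLL_SLICE_S, so if the signal that
    should wake us is lost, the condition is still re-checked soon after.
    """
    deadline = time.monotonic() + timeout_s
    while not is_ready():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        sem.acquire(timeout=min(remaining, POLL_SLICE_S))
    return True


# Simulate a lost signal: the condition becomes true but sem is never released.
sem = threading.Semaphore(0)
ready = threading.Event()
threading.Timer(0.1, ready.set).start()
print(wait_with_soft_poll(sem, ready.is_set, 5.0))  # True
```

Without the slice cap, the example call would block for the full 5 seconds even though the condition became true after 0.1 seconds.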
(0013950)
leves   
11-25-10 23:12   
Fixed in Network-ul.99, except for the busy waiting part, but that's originally a VM issue, isn't it?