Mantis Bugtracker
  

Viewing Issue Simple Details Jump to Notes ] View Advanced ] Issue History ] Print ]
ID Category Severity Reproducibility Date Submitted Last Update
0006721 [Squeak] Collections major sometimes 10-11-07 05:47 11-09-10 05:38
Reporter wiz View Status public  
Assigned To
Priority high Resolution open  
Status feedback   Product Version 3.10
Summary 0006721: In 7149 Identity dictionarys sometimes can't find their keys
Description
This is a circumstantial report.

I have a project which when I save it to disk and then read it back loses track of a crucial morph property which causes a formerly working handle to fail.

I have track the problem down to the fact that when the identity dictionary scans the dictionary array for the association it finds a nil first. see the uploaded debugger gif.


Additional Information I have no idea how to determine what is different between before and after the save.

Either the array is different from what it was before or that identity hash was different.
Clearly with the array as it is the hash \\ finish needs to = 4 not 8 inorder to find the key.

Other data points. My computer going into sleep mode and then waking up can sometimes cause the problem.

I changed the name of the key to something I did not use as a temp var. No change in the bug.


So..
What happened???

Yours in curiosity and service, --Jerome Peace
Attached Files  Error-key not found.gif [^] (21,944 bytes) 10-11-07 05:47
 IntegrityTests-wiz-M6721.3.cs [^] (3,039 bytes) 10-13-07 04:52
 IntegrityTests-wiz-M6721.001.pr [^] (52,703 bytes) 10-13-07 04:54
 nilKeyCleanup-wiz-M6721.2.cs [^] (923 bytes) 10-13-07 04:55
 ImageSegReload2.jpeg [^] (100,109 bytes) 10-14-07 04:27
 IntegrityTests-wiz-M6721.7.cs [^] (4,033 bytes) 02-28-10 06:43
 nilKeyFix2-wiz-M6730.4.cs [^] (2,943 bytes) 02-28-10 06:44

- Relationships

SYSTEM WARNING: Creating default object from empty value

SYSTEM WARNING: Creating default object from empty value

has duplicate 0006839feedback wiz #comeFullyUpOnReload: not sent to MorphExtension's otherProperties (anIdentityDictionary) 
related to 0006730new  GeeMailMorph initialzation causes nil to be used as a dicitonary key 
related to 0002788closed  Bug: Use of == for arithmetic equality 

- Notes
(0011297 - 264 - 300 - 300 - 300 - 300 - 300)
wiz
10-11-07 06:41

More data:

I played some more. The dictionary size can change on read back.

I didn't actually see the bug this time but the dictionary array on read in was size 5 and the original was size 8 . The tally was 3 in this case.

What determines the readin size?
 
(0011300 - 790 - 838 - 838 - 838 - 838 - 838)
RalphJohnson
10-11-07 10:11

Could you give us some code that recreates this problem? The project, and perhaps an image that displays the problem?

What version of the image are you using?

Maybe somebody has seen this before. Otherwise, there isn't enough information in the bug report to tell.

Dictionaries can certainly change their basic size when you read them back in. Look at the class method sizeFor: in Set. If the size of the old dictionary is n, the basic size of the new one is (n+1)*4//3. (3+1)*4//3 is 5. But this should not mess up the hash table, because every element should be reinserted.

For a regular dictionary, sometimes the hash changes after you insert an element. But an identity dictionary uses basicHash, which should never change. So, I can't imagine how it is happening.
 
(0011301 - 1512 - 2500 - 2500 - 2500 - 2500 - 2500)
johnmci
10-11-07 17:41

Well you can choose in forwardBecome to copy or not copy copy the identity hash so you can violate the premise that an object and it's identity hash can't change.

elementsForwardIdentityTo: otherArray copyHash: copyHash
    "This primitive performs a bulk mutation, causing all pointers to the elements of this array to be replaced by pointers to the corresponding elements of otherArray. The identityHashes remain with the pointers rather than with the objects so that the objects in this array should still be properly indexed in any existing hashed structures after the mutation."
    <primitive: 249>
    self primitiveFailed

via
becomeForward: otherObject copyHash: copyHash
    "Primitive. All variables in the entire system that used to point to the receiver now point to the argument.
    If copyHash is true, the argument's identity hash bits will be set to those of the receiver.
    Fails if either argument is a SmallInteger."

    (Array with: self)
        elementsForwardIdentityTo:
            (Array with: otherObject)
                copyHash: copyHash

and

testBecomeForwardDontCopyIdentityHash
    "Check that
        1. the argument to becomeForward: is NOT modified to have the receiver's identity hash.
        2. the receiver's identity hash is unchanged."

     | a b hb |

    a := 'ab' copy.
    b := 'cd' copy.
    hb := b identityHash.

    a becomeForward: b copyHash: false.

    self
        assert: a identityHash = hb;
        assert: b identityHash = hb.

Also then try copyHash: true.

But I don't see a test for that????
 
(0011304 - 1541 - 1713 - 1823 - 1823 - 1823 - 1823)
wiz
10-12-07 04:30

Hi Ralph

The 7149 in the report refered to squeak 3.10 update 7149. It was a fresh image before the project is loaded.

rampHandles-wiz.056.pr on Bob's superswiki is identical to the project I was using when I first ran into the bug.

It can be gotten from http://209.143.91.36/super/730 [^] (Taming Fills).

I didn't post it originally because I was hoping to isolate the problem in a simpler form. That didn't seem to work.

This is an intermittent bug. Sometime reading it back in will leave the handles working. But often they would fail.

The handle that noticibly breaks is the brown "reference handle". It should move the origin of the fill.

I used the existence of a property which in the project is called #refHandle to distinguish a host handle from its submorphs which are also handles.
I later changed the name to #mainHandle to insure the name was unique.

The other problem was that Morph #hasProperty will respond with false when the error is encountered. This caused the true refHandle to defer to its owner (world) which didn't know what to do with what came next in the code.

So if the bug really exists. And if #hasProperty is sent at a point when a false negative can cause mischief , the mischief will be caused.

So #hasProperty has a bug also in that it doesn't use the dictionary in a way that reports the error.

Dictionaries that sometimes loose track of their keys are a very uncomfortable thing to have lurking around.

I intend to keep looking. Thanks for the help and support.

--Jer
 
(0011306 - 530 - 620 - 620 - 620 - 620 - 620)
wiz
10-13-07 01:41

Ah, squeak bugs, I bet you can't find just one.

I created an integrity test to check all instances of a dictionary class an have used it to check Indentity dictionaries. I haven't found the problem I was looking for because other problems turned up first. Like a dictionary with nil as a key.

I suspect getting this test to pass will require solving lots of problems.

Then one can try it on other dictionaries.

Cheers.

Yours in service and curiosity, --Jerome Peace




 IntegrityTests-wiz-M6721.1.cs uploaded.
 
(0011307 - 380 - 422 - 422 - 422 - 422 - 422)
wiz
10-13-07 02:51

 IntegrityTests-wiz-M6721.2.cs uploaded.
Added two finer grain tests.

One collects the problems with nil keys. The other trys all the other dictionaries for probems with finding keys.

The news is that in 7158 there are exactly two Id dictionaries with nil keys both involve GeeMailMorph.

The other news is that there are no keys in a fresh 7158 with key lookup problems.
 
(0011308 - 808 - 850 - 850 - 850 - 850 - 850)
wiz
10-13-07 03:01

Other thougths on this bug.

I am running on a OS9.1 running iMac. Which means my vm is stuck at 3.8.9b7 Classic. The identity hash is a primitive which means it involves the vm. So one though is that my problem may not be recreatable on other systems. (I have not checked.)
And may be due to a vm bug that has since been fixed. So if someone else has seen the problem or has been able to load my project w/o running into any problem with the brown handle, please let me know.

Other data and observations.

When loading back the project the identity hash for #refHandle is different from when it was saved. So both the array of the dictionary and the hash key change on reload. I still haven't found the smoking gun to point to what gets the two out of sync. Intermittent bugs are harder to find.
 
(0011309 - 1095 - 1173 - 1173 - 1173 - 1173 - 1173)
wiz
10-13-07 05:14

Ok. Three uploads.

nilKeyCleanup does a quick patch to the initialization code that produces the nil keys and the postscript removes the offending dictionaries and replaces with the new. This is a quick patch for the purpose of making the tests pass.

IntegrityTests-wiz-M6721.3.cs is the changeset that contains the tests.

IntegrityTests-wiz-M6721.001.pr is a project that contains the tests also.
The code is the same but the project also causes the bug to occur.

when the test fail debug #testOkIdicIntegrity look at rejects to see which keys are unfindable and inspect the eachDictionary .

Because I can't tell if the problem is only on my system if you can load the project and not get the problem please report that. If you are willing please check several times since the bug is intermittent. Report your results in either case. Worthy bugs prove their worth by fighting back.

In 7158 I got this project to fail twice in two loads. The first time I had on unfindable key the second time two. So if this problem exists on other systems it should not be to hard to find.
 
(0011313 - 619 - 685 - 685 - 685 - 685 - 685)
wiz
10-14-07 02:29

Ho,

I finally read Johns comment slowly enough to let it penetrate.

#becomeForward: otherObject copyHash: copyHash is somewhat of a red-herring. No one calls the message except the test.

The comment was key to finding something important.

Apparently on image reload someone must be doing a becomeForward on the keys. This is (sometimes) making the hashes invalid for the loaded dictionaries. I tried rehashing the offending dictionary after loading the project and the lost key bug now worked.

So my main suspicion falls on the loading programs not doing enough.
Or not doing enough at the right time.
 
(0011315 - 100 - 146 - 146 - 146 - 146 - 146)
wiz
10-14-07 04:28

Ha,

Found the smoking gun.

see: ImageSegReload2.jpeg

This problem is due to == to = "fixes"
 
(0011318 - 198 - 210 - 210 - 210 - 210 - 210)
RalphJohnson
10-15-07 13:41

Very good detective work!

However, it is possible that this code really should be = and not == and the real problem is that hashes are changing. I don't know the answer, but I will look into it.
 
(0011321 - 367 - 391 - 391 - 391 - 391 - 391)
wiz
10-15-07 17:07

Hi Ralph,

I am pretty convinced that this method is clearly broken. One of the identity comparisons really wanted to be an indentity comparison and it controlled whether a becomeForward was preformed or not.

Also the problem with loading the project does not cause the bug in sq7067 or in sq7137 but it does in sq7143 and sq7158. That is enough to convince me.
 
(0011322 - 1282 - 1406 - 1406 - 1406 - 1406 - 1406)
wiz
10-15-07 17:25
edited on: 10-15-07 18:02

[OT] but relevant.

There are several meta-causes of this problem.

The 3dot10 team advocacy for tests is right on. And should have been insisted on here.

The change touched massive amounts of code and was commited all at once to core and peripheral methods at the same time.
Large changes always portend trouble. When I saw what was being done I cringed without knowing why. Just based on my own past experience.

Now I've seen one effect and I wonder if there are more.

The resources at the disposal of the release team are not sufficient for checking all the code. What can be done to insure that changes commited to the image are not going to cause other obscure bugs?

The other large problem I have with the way the change was committed was that the timestamp was silently left in place. All my instincts as a maintainer say that this is very very wrong. A clear audit trail must be kept and two methods that are not identical must differ in at least their time stamp. Else in the future someone will substitute one method for the other never knowing they fundamentally changed the code.

I encourage you and others to follow the chain of "why's" to the root causes of this bug getting into the image.

Yours in service and curiosity, --Jerome Peace

 
(0013515 - 1142 - 1267 - 1267 - 1383 - 1383 - 1383)
wiz
02-24-10 00:10

Fun squeak 9400 and Fun squeak 9408 both have this problem.

On startup there are instances of IdentityDictionary that are not properly hashed. So the test fails.

In a fresh sq 3791 the no improperly hashed IdentityDictionary. I presume there are some ways of making them because of the Fun Squeak problems. Loading projects is probably a good suspect.

 
I can get the test to pass in 3.10.2 final although I first have to load the 0006730 patch that removes GeeMailMorph initialization from UndefinedObject.

Once I have done that all three dictionary tests run green in 3.10.2 final.


Even in sq 3791 two of the Dictionary Integrety tests fail. The old fix for GeeMail morph does not seem adequate to cure the problem. Ominous.

In the trunk image the GeeMailMorph initialization is absent but there are still some Identity dictionaries lying around with the nil keyed associations. So I can't get two tests to run green. In fact when tested they are flagged with errors.

I am too confused to give a recipe right now.

I want this record thought before I look further.

Yours in curiosity and service, --Jerome Peace
 
(0013518 - 1227 - 1364 - 1364 - 1364 - 1364 - 1364)
wiz
02-28-10 06:59

I have uploaded two files:

IntegrityTests-wiz-M6721.7.cs
nilKeyFix2-wiz-M6730.4.cs

These have been made to work in sq trunk 3971.

The IntegrityTests provide three tests to check that dictionarys can find their keys.

The nilKeyFix is essentially to remove a stray identity dictionary for Gee Mail Morph (at one point it was being inited from the UndefindedObject class. So it still sits around in the image and has a nil key. Which we test for.
The postscript of the fix insures the problem dictionary is cleared out.
 
The test currently only tests identity dictionaries though other dictionary classes could be added.

Edgar in the process of building Fun Squeak was able to create identity dictionaries that fail the integrity test at startup. The dictionaries are not rehashed so a certain number of them will find nils before they find a valid value.

This causes all sorts of transient havoc until something causes the dictionaries to be rehashed or recreated.

Needless to say an image with dictionaries that can't find its keys is very unstable.

In the main image once both changesets have been installed or filed in the 3 tests have run green.

Yours in curiosity and service, --Jerome Peace
 
(0013608 - 899 - 975 - 975 - 975 - 975 - 975)
nicolas cellier
03-30-10 23:15

Levente has extended the tests for various dictionary.
Now the question is what do we do with this bug report ?

ImageSegment are broken and will always be, because they save low level information and try to load it back bit identical. They are not robust to internal structural changes. All we can do is put a workaround here and there, but we won't solve the root of the problem.

So shall we let the bug open forever ?
Personnally, I would tag it : "won't fix".

I have no generic solution.
Some personnal maybe: since late 80s, I always saved my data as Smalltalk scripts made of small chunks and using kind of workspace variables. Maintaining a public API is easier than maintaining internal structure compatibility in the long term.
It was each Behavior responsibility to save itself on a stream in a well behaved fashion.
By all mean, avoid embedding WrongBehavior in your image :)
 
(0013609 - 1234 - 1422 - 1422 - 1422 - 1422 - 1422)
wiz
03-31-10 04:28

Hi Nicolas,

What to do with the bug report.

Closing this report with a will not fix seems to me wrong.

The bugs caused when dictionaries lose their keys will happen again and again until this problem is resolved.

Originally I had a piece of code with a boolean that had to be "true" but was returning "false". That is one of the consequences of dictionaries not coming fully up on reload.

I thought we had fixed the problem before releasing 3.10.2 but now it has reappeared in trunk.

The cure and therefore resolution of the dictionary problem is to find all instance where dictionary images image segments are reloading and make very sure they are rehashed.

The inability of image segments to keep stored bits relevant is a separate bug. A much deeper bug. Right now it almost sounds like a different solution to storing projects would be needed to solve that one.

So What to do with this report?

Suggestions:

1) Make a good faith attempt to rehash dictionaries when they are reloaded then call this problem resolved.

2) Make a second report of the larger problem. Mark this as a duplicate of that.

3) Leave this problem open for now.

I'm in favor of a combination of 1) and 2).

Cheers -Jer
 
(0013610 - 783 - 882 - 882 - 882 - 882 - 882)
nicolas cellier
03-31-10 20:11

1) Maybe there would be a generic fix: perform a postLoad action on every object loaded from an imageSegment.

This would be ^self for Object, self rehash for HashedCollection.

2) I don't see larger report that helpful in mantis, because we end up with plenty of non closable reports.
My policy would rather be:
   1 elementary reproducible problem = 1 report
My rationale:
Small report => Small fix => Small progress
Large report => Lot of work for reading/understanding/reviewing/fixing => no progress

3) My rationale was that ImageSegment will continually bring compatibility issues, because inherently dependent on private implementation details rather than public API. So we can't fix it.
But if you think this is fixable (see 1), it's better to let this one open.
 
(0013613 - 380 - 380 - 380 - 380 - 380 - 380)
leves
04-01-10 14:03

The postLoad action sounds like a good idea for HashedCollections but it probably won't solve all problems. I wonder if we can add version information (the release version of the image and maybe the vm which created the imagesegment) to imagesegments and raise a warning if the information is absent or doesn't match with the current environment during loading. What do you think?
 
(0013623 - 1942 - 2124 - 2124 - 2124 - 2124 - 2124)
wiz
04-04-10 06:38

Hi Nicolas,

The title of this report is that Identity Dicitonarys loose their keys under certain circumstances.

Its an important problem. This report or one like it deserves to remain open until that problem is fixed.

Reloading image segments seems to still be an immediate cause of the problem.

So the scope of this report is
a) Testing for the problem in general.
 We've done some of that with the Dictionary Integrity tests.
b) Insuring that reloading image segments results in dictionaries with integrity.
 Rehashing identity dictionaries on reload is necessary.
 The other solution is to disallow image segments (and therefore projects). That is feasible but a tough sell. Because then you would need someone to reinvent a way to save projects that didn't use image segments.

Once a solution to b is chosen this report could be resolved.

The larger report is connected with abolishing image segments because they are too brittle a means of storage. That is what would deserve its own report.

[OT] Your point two goes continues a meta-discussion we both find interesting about what should go into a mantis report. It is somewhat off the topic of this report proper.
 
I agree with you that a good point at which to open a report is when there is a useful recipe for reproducing an error. Reports without that kind of information are mostly disrespectful of the readers time. An exception would be documenting a larger issue discovered while pursuing a smaller one.

That said my rationale for mantis is that it is a record of problem and analysis and solution. It serves as a searchable repository of valuable information on problems. Open reports are more visible than closed reports. So a previous solution to a problem that has just recurred is more discoverable. So I am much more comfortable with open topics than most people.

That's all I can think to write about that right now.

Cheers --Jer
 
(0013906 - 180 - 180 - 180 - 180 - 180 - 180)
leves
11-07-10 02:09

Jerome, I modified the loading of HashedCollections from ImageSegments and ReferenceStreams in the Trunk. Please check if your project can be saved and reloaded with these changes.
 
(0013921 - 935 - 1029 - 1029 - 1029 - 1029 - 1029)
wiz
11-09-10 05:38

Hi leves,

Thanks for working on the problem.

Too much time and change has past. I don't have a clean system to test or know which of my old projects was causing the problem (probably the fill handles project. I believe Edgar has a project in his fun squeak.) The ability to upload that project to the new regime is questionable and would take a lot of work. Especially from my current level of inactivity squeak-wise.

The dictionary integrity tests are there to detect the problem. The project was just a "bird in a coal mine" and when the problem was present, it usually broke in a pattern I came to recognize.

Then I'd run the test and usually find out I was right.

When you run into unexplained failures in the image. Run the integrity tests. If they fail a problem is still present.

As you can see from the initial report this is a hard one to reproduce reliably.

Yours in curiosity and service, --Jerome Peace
 

- Issue History
Date Modified Username Field Change
10-11-07 05:47 wiz New Issue
10-11-07 05:47 wiz File Added: Error-key not found.gif
10-11-07 06:41 wiz Note Added: 0011297
10-11-07 10:11 RalphJohnson Note Added: 0011300
10-11-07 17:41 johnmci Note Added: 0011301
10-12-07 04:30 wiz Note Added: 0011304
10-13-07 01:36 wiz File Added: IntegrityTests-wiz-M6721.1.cs
10-13-07 01:41 wiz Note Added: 0011306
10-13-07 02:46 wiz File Added: IntegrityTests-wiz-M6721.2.cs
10-13-07 02:51 wiz Note Added: 0011307
10-13-07 03:01 wiz Note Added: 0011308
10-13-07 04:52 wiz File Added: IntegrityTests-wiz-M6721.3.cs
10-13-07 04:52 wiz File Deleted: IntegrityTests-wiz-M6721.1.cs
10-13-07 04:52 wiz File Deleted: IntegrityTests-wiz-M6721.2.cs
10-13-07 04:54 wiz File Added: IntegrityTests-wiz-M6721.001.pr
10-13-07 04:55 wiz File Added: nilKeyCleanup-wiz-M6721.2.cs
10-13-07 05:14 wiz Note Added: 0011309
10-14-07 02:29 wiz Note Added: 0011313
10-14-07 04:24 wiz Relationship added related to 0006730
10-14-07 04:27 wiz File Added: ImageSegReload2.jpeg
10-14-07 04:28 wiz Note Added: 0011315
10-14-07 05:10 wiz Relationship added related to 0002788
10-15-07 13:41 RalphJohnson Note Added: 0011318
10-15-07 17:07 wiz Note Added: 0011321
10-15-07 17:25 wiz Note Added: 0011322
10-15-07 18:02 wiz Note Edited: 0011322
01-09-08 03:58 wiz Relationship added has duplicate 0006839
02-24-10 00:10 wiz Note Added: 0013515
02-28-10 06:43 wiz File Added: IntegrityTests-wiz-M6721.7.cs
02-28-10 06:44 wiz File Added: nilKeyFix2-wiz-M6730.4.cs
02-28-10 06:59 wiz Note Added: 0013518
03-30-10 23:15 nicolas cellier Note Added: 0013608
03-31-10 04:28 wiz Note Added: 0013609
03-31-10 20:11 nicolas cellier Note Added: 0013610
04-01-10 14:03 leves Note Added: 0013613
04-04-10 06:38 wiz Note Added: 0013623
11-07-10 02:09 leves Note Added: 0013906
11-07-10 02:09 leves Status new => feedback
11-09-10 05:38 wiz Note Added: 0013921


Mantis 1.0.8[^]
Copyright © 2000 - 2007 Mantis Group
158 total queries executed.
79 unique queries executed.
Powered by Mantis Bugtracker