Mantis Bugtracker
  

Viewing Issue Simple Details Jump to Notes ] View Advanced ] Issue History ] Print ]
ID Category Severity Reproducibility Date Submitted Last Update
0005703 [Squeak] Collections minor always 01-05-07 10:39 01-06-07 05:43
Reporter wiz View Status public  
Assigned To
Priority normal Resolution open  
Status new   Product Version 3.9
Summary 0005703: The hash for class Interval seems pretty weak.
Description Read the code:
"Interval>>hash
    "Hash is reimplemented because = is implemented."

    ^(((start hash bitShift: 2)
        bitOr: stop hash)
        bitShift: 1)
        bitOr: self size"

That seemed like it gave a mighty good chance for hashes to clash.
Especially the bitOr instead of bitXor.
Additional Information So I tried to satisfy my curiosity by seeing how many matched more than one interval.
Copy the uploaded file into a workspace and see for yourself.

I was surprised with how small a sample needed to be taken in order to show the weakness.

Less than 10% of the intervals had unique hashes. And I was seeing a lot of multiple clashes.


I am not an expert on hashes. So this is here as a heads up and an alert.

If you know enough to evaluate the importance of these results say so.
(in other words if I'm all wet and the strength of this hash doesn't make a difference, I'd appreciate being clued in.)

Yours in curiosity and service, --Jerome Peace
Attached Files  Interval-hash-probe.text [^] (1,472 bytes) 01-05-07 10:39

- Relationships
child of 0007002new  The Mother of all issues involving interval 

- Notes
(0008838 - 88 - 112 - 112 - 272 - 272 - 272)
nicolas cellier
01-06-07 02:33
edited on: 04-30-07 19:07

It is weak and non compliant with = (when compared to an array)
see 0003380
also 0003488

 
(0008840 - 390 - 438 - 438 - 438 - 438 - 438)
wiz
01-06-07 03:49

Thanks Nicolas,

The information was informative.

Its not clear from the reports if things were resolved or not. I suspect not.

I think even the simple hash can be improved if the number of elements becomes a strong focus for the hash.
Things with a different number of elements will not be equivalent.
Then check the first and last elements and then what ever else is necessary.
 
(0008843 - 944 - 1028 - 1154 - 1154 - 1154 - 1154)
nicolas cellier
01-06-07 05:43

Yes proposed patches in http://bugs.impara.de/view.php?id=3488 [^] solve the problems.

1) first last and size are used in Interval comparison
(not increment, in order to avoid floating point bug)

2) Interval uses super hash in order to have same hash code when equal to an array.

3) Super hash is redefined for efficiency in order to not hash each and every element.
(Time millisecondsToRun: [(1 to 1000000) hash])

There is just a discussion about how many elements to be used in super hash.
5 first ? more ? (at beginning and end)
There is no universal efficient hash code... You can always build a counter example. One has to decide for a magic number compromise.

Concerning interval, of course using 3 elements or two elements plus size would be enough to obtain a correct hash distribution... Just replace bitOr: with bitXOr: in current implementation... Unfortunately we must use super hash because of equality with array...
 

- Issue History
Date Modified Username Field Change
01-05-07 10:39 wiz New Issue
01-05-07 10:39 wiz File Added: Interval-hash-probe.text
01-06-07 02:33 nicolas cellier Note Added: 0008838
01-06-07 03:49 wiz Note Added: 0008840
01-06-07 05:43 nicolas cellier Note Added: 0008843
04-30-07 19:07 nicolas cellier Note Edited: 0008838
08-21-10 13:10 nicolas cellier Relationship added child of 0007002


Mantis 1.0.8[^]
Copyright © 2000 - 2007 Mantis Group
52 total queries executed.
35 unique queries executed.
Powered by Mantis Bugtracker