|Anonymous | Login||10-19-2021 18:07 UTC|
|Main | My View | View Issues | Change Log | Docs|
|Viewing Issue Simple Details [ Jump to Notes ]||[ View Advanced ] [ Issue History ] [ Print ]|
|ID||Category||Severity||Reproducibility||Date Submitted||Last Update|
|0005703||[Squeak] Collections||minor||always||01-05-07 10:39||01-06-07 05:43|
|Summary||0005703: The hash for class Interval seems pretty weak.|
Read the code:
"Hash is reimplemented because = is implemented."
^(((start hash bitShift: 2)
bitOr: stop hash)
bitOr: self size"
That seemed like it gave a mighty good chance for hashes to clash.
Especially the bitOr instead of bitXor.
So I tried to satisfy my curiosity by seeing how many matched more than one interval.
Copy the uploaded file into a workspace and see for yourself.
I was surprised with how small a sample needed to be taken in order to show the weakness.
Less than 10% of the intervals had unique hashes. And I was seeing a lot of multiple clashes.
I am not an expert on hashes. So this is here as a heads up and an alert.
If you know enough to evaluate the importance of these results say so.
(in other words if I'm all wet and the strength of this hash doesn't make a difference, I'd appreciate being clued in.)
Yours in curiosity and service, --Jerome Peace
|Attached Files||Interval-hash-probe.text [^] (1,472 bytes) 01-05-07 10:39|
(0008838 - 88 - 112 - 112 - 272 - 272 - 272)
edited on: 04-30-07 19:07
It is weak and non compliant with = (when compared to an array)
(0008840 - 390 - 438 - 438 - 438 - 438 - 438)
The information was informative.
Its not clear from the reports if things were resolved or not. I suspect not.
I think even the simple hash can be improved if the number of elements becomes a strong focus for the hash.
Things with a different number of elements will not be equivalent.
Then check the first and last elements and then what ever else is necessary.
(0008843 - 944 - 1028 - 1154 - 1154 - 1154 - 1154)
Yes proposed patches in http://bugs.impara.de/view.php?id=3488 [^] solve the problems.
1) first last and size are used in Interval comparison
(not increment, in order to avoid floating point bug)
2) Interval uses super hash in order to have same hash code when equal to an array.
3) Super hash is redefined for efficiency in order to not hash each and every element.
(Time millisecondsToRun: [(1 to 1000000) hash])
There is just a discussion about how many elements to be used in super hash.
5 first ? more ? (at beginning and end)
There is no universal efficient hash code... You can always build a counter example. One has to decide for a magic number compromise.
Concerning interval, of course using 3 elements or two elements plus size would be enough to obtain a correct hash distribution... Just replace bitOr: with bitXOr: in current implementation... Unfortunately we must use super hash because of equality with array...
|01-05-07 10:39||wiz||New Issue|
|01-05-07 10:39||wiz||File Added: Interval-hash-probe.text|
|01-06-07 02:33||nicolas cellier||Note Added: 0008838|
|01-06-07 03:49||wiz||Note Added: 0008840|
|01-06-07 05:43||nicolas cellier||Note Added: 0008843|
|04-30-07 19:07||nicolas cellier||Note Edited: 0008838|
|08-21-10 13:10||nicolas cellier||Relationship added||child of 0007002|
| Mantis 1.0.8[^]
Copyright © 2000 - 2007 Mantis Group
52 total queries executed.|
35 unique queries executed.