Mantis - Squeak
Viewing Issue Advanced Details
1876 Collections minor always 10-06-05 18:45 02-06-11 23:48
closed 3.9  
none 4.1  
0001876: [ENH] identityHash
In 2001, Scott A Crosby posted a couple of changesets to fix the poor scalability of collections that
use #identityHash.

The idea: put the primitive into a method #hashBits, then define #identityHash as hashBits multiplied by a prime number.
So we do the upscaling directly in the hash method, not later in scanFor:.

The good effect: scanFor: no longer needs to test for size, and all objects that use #identityHash get usable
hash values.
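
A minimal sketch of what this looks like, assuming a hypothetical #hashBits method that holds the primitive (the prime 100799 is the one used in the benchmark below; the actual changesets may differ in detail):

    Object >> hashBits
        "Answer the raw identity hash bits from the VM.
        (Sketch; selector name is illustrative.)"
        <primitive: 75>
        self primitiveFailed

    Object >> identityHash
        "Scale the raw bits up front, so every client of
        #identityHash gets a well-spread value directly."
        ^self hashBits * 100799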

Negative: #identityHash is now slower due to the multiplication... but that should not be noticeable.

Phillipe tested the idea by implementing #hash that way and using an OrderedCollection. The
attached picture shows the result.

TODO: check if such a big scale is needed... maybe a smaller number is enough?
 objectchart.png (17,200 bytes) 10-06-05 18:45
 001IdentityHash.cs (412 bytes) 10-06-05 19:02
 002IdentityHash.cs (1,921 bytes) 10-06-05 19:03
 003dentityHash.cs (246 bytes) 10-06-05 19:03
 004IdentityHash.cs (2,416 bytes) 10-06-05 19:03
 IdentityHashFixForDystemTrace.cs (1,319 bytes) 10-06-05 19:03
 hiccups.png (21,542 bytes) 02-19-06 15:36
 new_dictionary.png (29,068 bytes) 02-21-06 20:28

10-06-05 22:56   
Some clarification:

The picture shows the time for accessing 100 Objects in a Dictionary in relation to the size of the Dictionary.
The Object line is the access time as is. The PrimeObject line is the access time when multiplying #identityHash by 100799.
It should be constant, because the number of objects accessed (100) is constant.
The OrderedCollection is just used to store the keys.

The code (originally posted by Lukas):
| test ord |
test := Dictionary new.
[ test size >= 10000 ] whileFalse: [
    ord := OrderedCollection new: 100.
    100 timesRepeat: [
        test at: (ord add: Object new) put: nil ].
    Transcript show: test size asString.
    Transcript show: [
        10 timesRepeat: [
            ord do: [ :each | test at: each ] ]
        ] timeToRun asString.
    Transcript cr ]

01-28-06 20:51   
Just stumbled across this. Two things to consider:
- If the problem is in Dictionary (contrary to IDDict), why not implement Object>>hash as "self identityHash * BigPrime"?
- There is still a scalability problem in IDSet/IDDict due to an unfortunate choice of the initial size of sets. This should really be changed to something like:
  Set class>>new
     ^self basicNew init: 5.
(try the benchmark with an IDDict with and without this and notice the major difference around 4k elements)

02-19-06 15:40   
Yes, Set new is a different issue. See hiccups.png: it shows that this is not fixed by multiplying #identityHash by 100799. But changing Set new doesn't fix Dictionaries, so both changes should be integrated.
04-22-06 13:43   
Ok guys,
I would like to conclude...
So could you give some code? :)
12-17-08 04:17   
"fix begin"
Installer mantis bug: 1876 fix: '004IdentityHash.cs'.
"fix end"
nicolas cellier   
09-07-10 18:46   
This has been resolved in trunk (4.1) with the introduction of this method:

    Object >> scaledIdentityHash
        "For identityHash values returned by primitive 75, answer
        such values times 2^18. Otherwise, match the existing
        identityHash implementation"

        ^self identityHash * 262144 "bitShift: 18"

#scaledIdentityHash must be used as the hash index (IdentityDictionary, IdentitySet, ...).
#identityHash can still be used for printing, or when the collection is known to be small enough.
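
For illustration, a sketch of how an identity collection would use it as the hash index (the shape of scanFor: below is approximate, not the exact trunk code):

    IdentitySet >> scanFor: anObject
        "Answer the index of anObject, or of the first empty
        slot along its linear probe path. Uses the scaled hash
        so large tables still get well-spread start indices."
        | index start size |
        size := array size.
        index := start := anObject scaledIdentityHash \\ size + 1.
        [ | element |
          ((element := array at: index) == nil
              or: [ element == anObject ])
                  ifTrue: [ ^index ].
          (index := index \\ size + 1) = start ] whileFalse.
        ^0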