Mantis - Squeak
Viewing Issue Advanced Details
1876 Collections minor always 10-06-05 18:45 02-06-11 23:48
MarcusDenker  
andreas  
normal  
closed 3.9  
fixed  
none    
none 4.1  
0001876: [ENH] identityHash
In 2001, Scott A Crosby posted a couple of changesets to fix the bad scalability of collections that
use identityhash.

The idea: put the primitive into a method "hashBits", then define "identityHash" as hashBits*primNumber.
So we do the upscaling directly in the hash method, not later in scanFor:

The good effect: scanFor: does not need to test for size, all object that use identityHash get useable
hash values.

Negative: #identityHash now is slower due to the multiplication... but that should not be noticable.

Phillipe did a test of the idea by implementing #hash like that and using OrderdCollection. The
picture attached show the result.

TODO: check if such a big scale is needed... maybe a smaller number is enough?
 objectchart.png [^] (17,200 bytes) 10-06-05 18:45
 001IdentityHash.cs [^] (412 bytes) 10-06-05 19:02
 002IdentityHash.cs [^] (1,921 bytes) 10-06-05 19:03
 003dentityHash.cs [^] (246 bytes) 10-06-05 19:03
 004IdentityHash.cs [^] (2,416 bytes) 10-06-05 19:03
 IdentityHashFixForDystemTrace.cs [^] (1,319 bytes) 10-06-05 19:03
 hiccups.png [^] (21,542 bytes) 02-19-06 15:36
 new_dictionary.png [^] (29,068 bytes) 02-21-06 20:28

Notes
(0002786)
pmm   
10-06-05 22:56   
Some clarification:

The picture shows the time for accessing 100 Objects in a Dictionary in relation to the size of the Dictionary.
The Object line is the access time as is. The PrimeObject line is the access time when multiplying #identityHash by 100799.
It should be contant, because the number of Objects accessed (100) is constant.
The OrderedCollection is just used to store the keys.

The code (originally posted by Lukas):
| test ord |
test := Dictionary new.
[ test size >= 10000] whileFalse: [
    ord := OrderedCollection new: 100.
    100 timesRepeat: [ 
        test at: ( ord add: Object new ) put: nil ].
    Transcript
        show: test size asString;
        tab.
    Transcript show:
        [
            10 timesRepeat: [ 
                ord do: [ :each | test at: each ] ]
        ] timeToRun asString.
    Transcript cr ]


(0003646)
andreas   
01-28-06 20:51   
Just stumbled across this. Two things to consider:
- If the problem is in Dictionary (contrary to IDDict), why not implement Object>>hash as "self identityHash * BigPrime"?
- There is still a scalability problem in IDSet/IDDict due to an unfortunate choice of the initial size of sets. This should really be changed to something like:
  Set class>>new
     ^self basicNew init: 5.
(try the benchmark with an IDDict with and without this and notice the major difference around 4k elements)

(0003906)
pmm   
02-19-06 15:40   
Yes, Set new is a different issue. See hiccups.png. It shows this is not fixed by multiplying #identityHash by 100799. But changing Set new doesn't fix Dictionaries. So they should both be integrated.
(0004774)
ducasse   
04-22-06 13:43   
Ok guys
I would like to conclude....
So could you give some code :)
(0012857)
Keith_Hodges   
12-17-08 04:17   
"fix begin"
Installer mantis bug: 1876 fix: '004IdentityHash.cs'.
"fix end"
(0013858)
nicolas cellier   
09-07-10 18:46   
This has been resolved in trunk (4.1) with introduction of this method:
scaledIdentityHash
    "For identityHash values returned by primitive 75, answer
    such values times 2^18. Otherwise, match the existing
    identityHash implementation"

    ^self identityHash * 262144 "bitShift: 18"

#scaledIdentityHash must be used as hash index (IdentityDictionary, IdentitySet, ...)
#identityHash can still be used for printing or when collection is known small enough.