Mantis Bugtracker

Viewing Issue Simple Details Jump to Notes ] View Advanced ] Issue History ] Print ]
ID Category Severity Reproducibility Date Submitted Last Update
0007549 [Squeak] VM major always 06-15-10 07:08 06-22-10 02:14
Reporter johnmci View Status public  
Assigned To lewis
Priority normal Resolution fixed  
Status resolved   Product Version trunk
Summary 0007549: 64bit VM, 32bit image, sweepPhase hangs if the first word in the heap is the start of a free chunk
Description Friends, Romans, and SqueakVMers,

As some of you know, I am working on my own Squeak VM, and I recently ran into a strange situation where the 64-bit Squeak VM would loop forever reading one of my snapshots. The 32-bit version--that is, just running it in 32-bit mode--worked fine. I spent some time tracking this down, and believe that the problem was caused by a bug in the 64-bit Squeak VM GC code that is only excited when the first word in the heap is the start of a free chunk. I don't know where I should be sending this report, so I'm sending it to you, in hopes that you may be able to forward it to the right place and/or people.

Here is the explanation:

Since my object format uses extra words per object preceding the standard ST header, when I write out a snapshot, I write those words out as "Free chunks", the official ST way of saying there are X words not part of any object.
In the 32-bit version of the real Squeak VM, Oops are (tagged) addresses of the target objects.
The 64-bit version of the real Squeak VM, each 32-bit oop in the image is added to an offset to product the (64-bit) address of the object. Thus, oops in the image are relative to the start of the heap.

The sweep phase of the Squeak VM GC is this:

    "Sweep memory from youngStart through the end of memory. Free all
    inaccessible objects and coalesce adjacent free chunks. Clear the mark
    bits of accessible objects. Compute the starting point for the first pass of
    incremental compaction (compStart). Return the number of surviving
    objects. "
    "Details: Each time a non-free object is encountered, decrement the
    number of available forward table entries. If all entries are spoken for
    (i.e., entriesAvailable reaches zero), set compStart to the last free
    chunk before that object or, if there is no free chunk before the given
    object, the first free chunk after it. Thus, at the end of the sweep
    phase, compStart through compEnd spans the highest collection of
    non-free objects that can be accomodated by the forwarding table. This
    information is used by the first pass of incremental compaction to
    ensure that space is initially freed at the end of memory. Note that
    there should always be at least one free chunk--the one at the end of
    the heap."
    | entriesAvailable survivors freeChunk firstFree oop oopHeader oopHeaderType hdrBytes oopSize freeChunkSize endOfMemoryLocal |
    self inline: false.
    self var: #oop type: 'usqInt'.
    self var: #endOfMemoryLocal type: 'usqInt'.
    entriesAvailable := self fwdTableInit: self bytesPerWord * 2.
    survivors := 0.
    freeChunk := nil.
    firstFree := nil.
    "will be updated later"
    endOfMemoryLocal := endOfMemory.
    oop := self oopFromChunk: youngStart.
    [oop< endOfMemoryLocal]
        whileTrue: ["get oop's header, header type, size, and header size"
            statSweepCount := statSweepCount + 1.
            oopHeader := self baseHeader: oop.
            oopHeaderType := oopHeader bitAnd: TypeMask.
            hdrBytes := headerTypeBytes at: oopHeaderType.
            (oopHeaderType bitAnd: 1) = 1
                ifTrue: [oopSize := oopHeader bitAnd: self sizeMask]
                ifFalse: [oopHeaderType = HeaderTypeSizeAndClass
                        ifTrue: [oopSize := (self sizeHeader: oop) bitAnd: self longSizeMask]
                        ifFalse: ["free chunk" oopSize := oopHeader bitAnd: self longSizeMask]].
            (oopHeader bitAnd: self markBit) = 0
                ifTrue: ["object is not marked; free it"
                    "<-- Finalization support: We need to mark each oop chunk as free -->"
                    self longAt: oop - hdrBytes put: HeaderTypeFree.
                    freeChunk ~= nil
                        ifTrue: ["enlarge current free chunk to include this oop"
                            freeChunkSize := freeChunkSize + oopSize + hdrBytes]
                        ifFalse: ["start a new free chunk"
                            freeChunk := oop - hdrBytes.
                            "chunk may start 4 or 8 bytes before oop"
                            freeChunkSize := oopSize + (oop - freeChunk).
                            "adjust size for possible extra header bytes"
                            firstFree = nil ifTrue: [firstFree := freeChunk]]]
                ifFalse: ["object is marked; clear its mark bit and possibly adjust
                    the compaction start"
                    self longAt: oop put: (oopHeader bitAnd: self allButMarkBit).
                    "<-- Finalization support: Check if we're running about a weak class -->"
                    (self isWeakNonInt: oop) ifTrue: [self finalizeReference: oop].
                    entriesAvailable> 0
                        ifTrue: [entriesAvailable := entriesAvailable - 1]
                        ifFalse: ["start compaction at the last free chunk before this object"
                            firstFree := freeChunk].
                    freeChunk ~= nil "BUG cannot handle free start--dmu! ***********************************************"
                        ifTrue: ["record the size of the last free chunk"
                            self longAt: freeChunk put: ((freeChunkSize bitAnd: self longSizeMask) bitOr: HeaderTypeFree).
                            freeChunk := nil].
                    survivors := survivors + 1].
            oop := self oopFromChunk: oop + oopSize].
    freeChunk ~= nil
        ifTrue: ["record size of final free chunk"
            self longAt: freeChunk put: ((freeChunkSize bitAnd: self longSizeMask) bitOr: HeaderTypeFree)].
    oop = endOfMemory
        ifFalse: [self error: 'sweep failed to find exact end of memory'].
    firstFree = nil
        ifTrue: [self error: 'expected to find at least one free object']
        ifFalse: [compStart := firstFree].

    ^ survivors

Notice the line with the comment I added, it has a lot of asterisks. It is trying to go back to the last free chunk found and update its length, now that its found non-free chunk.
But, the test works by initializing the "freeChunk" variable to zero (called "nil" in this code)!!
If the first free chunk is at the start of the heap, it's "oop" will be zero in the 64-bit case, and it's length will NEVER be initialized.
As a result, when the compaction algorithm scans the heap, it gets stuck on this zero-length free chunk and loops forever.
(I think the other nil tests are troublemakers, too, BTW).

The workaround for my VM is simply to skip those first free words, so that the first word in the heap is a non-free object. (The snapshot code does a GC so all real free objects are gone by this time).

However, you might want to redo the tests so the algorithm tolerates a free chunk at the start of the heap.

Thank you for all your efforts!

- David
Additional Information
Attached Files  M7549-ObjectMemory-sweepPhase-dtl.2.cs [^] (4,507 bytes) 06-16-10 10:29

- Relationships

- Notes
(0013810 - 970 - 1036 - 1036 - 1036 - 1036 - 1036)
06-15-10 07:53

Likely of course this signage issue is a concern because what if freeChunk := 0x9000000 making a negative number then later it does the freeChunkSize := oopSize + (oop - freeChunk) and I think the math would be wrong, well assuming the (oop - freeChunk) is a signed operatin, I'd guess both firstFree and freeChunk should be unsigned ints just to clarify.

Er isn't freeChunkSize := oopSize + (oop - freeChunk). actually freeChunkSize := oopSize + hdrBytes. Or am I confused tonight?

Anyway the problem is that freeChunk = zero is also being used as a test for freeChunk has no value versus freeChunk is zero is a valid value.

Therefore to fix I think we have to set freeChunk to 0xFFFFFFFF then check for freeChunk ~= 0xFFFFFFFF
Also we need to set firstFree to 0xFFFFFFFF and check both firstFree = 0xFFFFFFFF ifTrue:

In 64bit mode we need to use 0xFFFFFFFFFFFFFFFF

That would allow 0x0 as a valid value and where 0xFFFFFFFF... is a non-value.
(0013811 - 261 - 261 - 261 - 261 - 261 - 261)
06-16-10 03:40

I think that this is accomplished by using -1 rather than nil to represent invalid oop. Change set M7549-ObjectMemory-sweepPhase-dtl attached. Note that -1 is 0xFFFF.... for either 4 or 8 byte sqInt, so this hopefully implements the change as per your proposal.
(0013812 - 236 - 236 - 236 - 236 - 236 - 236)
06-16-10 03:49

Ah, well David Ungar's thoughts are fiddling with the OFFSET value over in the memory allocation logic is 100 miles away from where the code is important. Thus likely to be removed because no-one understands the issue. So -1 might work.
(0013813 - 200 - 317 - 317 - 317 - 317 - 317)
06-16-10 03:52

Ok, what about

firstFree = nil ifTrue: [firstFree := freeChunk]]]

    firstFree = nil
        ifTrue: [self error: 'expected to find at least one free object']

I think firstFree also has to be -1?
(0013814 - 83 - 83 - 83 - 83 - 83 - 83)
06-16-10 10:31

Replaced change set with fixed version, fix checks for both freeChunk and firstFree
(0013815 - 459 - 459 - 459 - 459 - 459 - 459)
06-17-10 02:06

Note to reviewers: In 2's complement representation (i.e. a C integer), -1 equals 0xFFFFFFF for a 32 bit sqInt, and -1 equals 0xFFFFFFFFFFFFFFFF for a 64 bit sqInt. Thus the use of -1 as the flag value is equivalent to John's suggestion of 0xFFFFFFFF or 0xFFFFFFFFFFFFFFFF for 32 and 64 bit images respectively. The generated C is as per John's proposal, and when running in an interpreter simulator the -1 is just a -1 which is of course also an invalid oop.
(0013819 - 45 - 45 - 45 - 45 - 45 - 45)
06-22-10 02:14

Patch included in SqS/VMMaker VMMaker-dtl.183

- Issue History
Date Modified Username Field Change
06-15-10 07:08 johnmci New Issue
06-15-10 07:08 johnmci Status new => assigned
06-15-10 07:08 johnmci Assigned To  => lewis
06-15-10 07:53 johnmci Note Added: 0013810
06-16-10 03:34 lewis File Added: M7549-ObjectMemory-sweepPhase-dtl.1.cs
06-16-10 03:40 lewis Note Added: 0013811
06-16-10 03:49 johnmci Note Added: 0013812
06-16-10 03:52 johnmci Note Added: 0013813
06-16-10 10:29 lewis File Added: M7549-ObjectMemory-sweepPhase-dtl.2.cs
06-16-10 10:29 lewis File Deleted: M7549-ObjectMemory-sweepPhase-dtl.1.cs
06-16-10 10:31 lewis Note Added: 0013814
06-17-10 02:06 lewis Note Added: 0013815
06-22-10 02:14 lewis Status assigned => resolved
06-22-10 02:14 lewis Resolution open => fixed
06-22-10 02:14 lewis Note Added: 0013819

Mantis 1.0.8[^]
Copyright © 2000 - 2007 Mantis Group
66 total queries executed.
42 unique queries executed.
Powered by Mantis Bugtracker