Mantis Bugtracker
  

ID: 0005205
Category: [Squeak] Kernel
Severity: major
Reproducibility: always
Date Submitted: 10-09-06 05:13
Last Update: 06-22-13 00:23
Reporter: wiz
View Status: public
Assigned To:
Priority: normal
Resolution: open
Status: resolved
Product Version: 3.10
Summary 0005205: Versions, sources, and changes. Repairing the current system to eliminate limitations.
Description The problem:

Squeak periodically runs out of room in the source and change files.
The number of files is currently fixed at two.
The size of each file is limited to 32M.
Changes can only be added to the second file.

While compressing changes and sources works for distribution, not having an image of Squeak with the history of all changes from the current source severely hampers maintenance and repair.

Conclusion: Squeak needs to have a version source system that does not have these gross limitations.
Additional Information So what to do about them.

See 0004369 for an enhancement that removes the file size limitation by using the adjunct class to compiled method to reference larger source and changes files.

I'm about to argue that this alone is not the best way to solve the problem stated above.

The essence of a good solution for me would allow a squeak to have several levels of versions. Each version level would have a source or changes file(s) connected to it. You could compress the lowest level of changes into the next higher version.

The other orthogonal aspect of the solution is that, rather than letting the file size get too large, a source or change 'file' would actually be a series of modestly sized files: part 1, part 2, part 3, etc.
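A minimal sketch of the "series of modestly sized files" idea, in Python rather than Squeak for brevity. The part size, the function names, and the file naming pattern are all assumptions made for illustration; nothing here is taken from an existing Squeak implementation.

```python
from typing import NamedTuple

# Assumed cap per part; chosen to match the 32M limit mentioned in the description.
PART_SIZE = 32 * 1024 * 1024


class PartLocation(NamedTuple):
    part: int    # 1-based part number (part 1, part 2, ...)
    offset: int  # byte offset inside that part


def locate(position: int, part_size: int = PART_SIZE) -> PartLocation:
    """Map a position in the logical 'file' to a part number and a local offset."""
    if position < 0:
        raise ValueError("position must be non-negative")
    part_index, offset = divmod(position, part_size)
    return PartLocation(part_index + 1, offset)


def part_name(base: str, part: int) -> str:
    """Hypothetical naming pattern: <base>.part<N>.changes."""
    return f"{base}.part{part}.changes"
```

Only the last part would ever be appended to, so earlier parts could be distributed once and left alone.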

Only the growth tip of the changes file changes. So this would allow the separate distribution of an image and (small) changetip with the option of downloading the unchanging sources once when necessary. We do this now by maintaining the sources separate from the changes file. This method would just be an extension of that.

The above is the needfully vague user story. The details need to be fleshed out in the implementation.

I understand the same mechanism that Klaus used to suggest the 512M files could be used to extend sources.

The main challenge would be working out a useable naming scheme for the files.

Squeak has a wonderful version numbering system already in place that can be used to generate levels.

The other necessary challenge is to build this scheme to be as friendly to the past method of doing sources and changes as possible. It will help greatly if old sources and changes are still referenceable from within the new scheme of things.

I leave this problem here for now in hopes of feedback and support. I have put the major label on it because it represents a major change, and if it is to do the most good, the sooner a solution is incorporated the more good it will do.

Attached Files  ExpandedSourceFileArray-part1-dtl.2.cs (20,117 bytes) 12-27-09 18:59
 ExpandedSourceFileArray-part2-dtl.2.cs (678 bytes) 12-27-09 19:00

- Relationships

related to 0005783 (new) [RFI] Needed a way to condense sources/changes that preserves version trails better.
related to 0007239 (assigned, andreas) Refactoring Accesses to SourceFiles
related to 0004369 (new) [Patch] Both source files now with 512MB capacity

- Notes
(0008677)
wiz
12-13-06 10:53
edited on: 12-13-06 10:55

Hi Tim,
Thanks for assigning yourself to this.
The next good step to take would be to post whatever code you've got so far here.

I'll have some time over Xmas to pursue my curiosity. And the more data I can give it the better it will serve me.

 
(0008867)
tim
01-11-07 10:34

Text of email I sent on jan 10 2007 hoping to spark some discussion of ways to proceed:-
    From: tim@rowledge.org
    Subject: Re: Version Histories (was Whats Happening with 3.10. etc.)
    Date: January 10, 2007 11:14:09 PM PST (CA)
    To: squeak-dev@lists.squeakfoundation.org

Here is my schema, such as it is thus far, for improving the source referencing. It's not complete. I need some suggestions for ways to tackle a few items.

The problem we face is that a lot needs to be changed in order to use anything other than indices into file; so many facilities rely upon it. Some serious refactoring would be needed in assorted changelist, version listing etc methods. There are complications in ImageSegment code too. RemoteString is pretty yucky. Class comments are also mixed up in file/index encoding assumptions. All the source compression, tempname-in-method etc code will need altering.

During the writing (and indeed installing later) of the code we need all our normal tools to keep working so we can stay sane while doing the writing.

The basic idea is to add a proper oop for a source reference object so that we can later implement classes that use a database, a web search, access multiple files, decompile, guess or whatever. Initially I propose simply using an integer with a simple encoding scheme (not the rather convoluted one currently in use - take a look at StandardSourceFileArray>>fileIndexFromSourcePointer: and friends) and just reusing the files.
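To make "an integer with a simple encoding scheme" concrete, here is one possible sketch in Python. It is not the scheme tim proposed (the email does not specify one): the field width and both function names are assumptions, chosen only to show how a file index and a file position can share one integer without the bit-juggling of the current pointer format.

```python
from typing import Tuple

# Assumed width of the position field; not specified anywhere in the email.
POSITION_BITS = 40
POSITION_MASK = (1 << POSITION_BITS) - 1


def encode_source_reference(file_index: int, position: int) -> int:
    """Pack a file index and a byte position into a single integer."""
    if file_index < 0:
        raise ValueError("file_index must be non-negative")
    if not 0 <= position <= POSITION_MASK:
        raise ValueError("position does not fit in the position field")
    return (file_index << POSITION_BITS) | position


def decode_source_reference(reference: int) -> Tuple[int, int]:
    """Unpack a reference into (file_index, position)."""
    return reference >> POSITION_BITS, reference & POSITION_MASK
```

Because the file index sits in the high bits and is unbounded, adding more files later would not require changing the encoding.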

First possibly contentious idea
========================
Do this work in a 3.8.1 image to avoid the change in sources/changes files done during the 3.9 cycle and some issues with the introduction of Traits. I am *not* suggesting abandoning all the hard work in 3.9 and anyone implying that I did so will get a late night visitation. You Don't Want That.
*After* incorporating the improved source referencing, add the 3.9 packages but leave out the source condensing step(s). This would leave the SqueakV3.sources file untouched and could either leave us with a changes file that is simply appended to (and obviously quite big) or slightly reformatted (and still pretty big).

Step 1
Add a new source reference ivar to the method properties object that all methods now have. Well, except that method properties were added somewhere during 3.9 - so they'd have to be added early. Drat. Ideas on the minimum disruptive way to add this?

Step 2
Add new method creation methods that do not use the trailerBytes stuff. We are not using them yet...

Step 3
Change source access method(s) to check the value of the source reference in the methodProperties and use the 'old' access if it is nil - which of course it is for now. Also change endPC similarly.
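The fallback described in Step 3 could look roughly like the following Python sketch. The class and field names are stand-ins invented for illustration (they are not Squeak's actual classes), and the two reader callables represent whatever code would fetch source text under the new and old schemes.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class MethodProperties:
    # Stays None (nil) until the method is recompiled under the new scheme.
    source_reference: Optional[int] = None


@dataclass
class Method:
    # Stand-in for the legacy source pointer held in the trailer bytes.
    trailer_pointer: int
    properties: MethodProperties = field(default_factory=MethodProperties)


def source_for(
    method: Method,
    read_new: Callable[[int], str],
    read_old: Callable[[int], str],
) -> str:
    """Use the new source reference when present, else the old access path."""
    reference = method.properties.source_reference
    if reference is None:
        return read_old(method.trailer_pointer)
    return read_new(reference)
```

Until Step 5 recompiles everything, every method takes the `read_old` branch, so the existing tools keep working while the new path is being written.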

Step 4
Change #generate methods to refer to a global flag that says whether they use trailerBytes or not. Set the flag *before* that... DAMHIKT!

Step 5
A big do-it to flip the global flag, recompile all methods and thereby set the new shiny source pointers. This is where things can get very complicated. We have to decide what to do about the changes file and all those historical versions.
a) we could effectively condenseSources so that there is no need to worry about handling encrypted pointers to older versions. This would of course mean a new .sources file.
b) we could condenseChanges so that any methods in the changelog that are also in the old sources file have a correct back pointer, thus losing most but not all history.
c) we could try to be very clever and copy the entire history chain for each method across to a new changes file, keeping history.
d) we could be insane and try to make code that works out when a history pointer is 'old format' and still handle it and then just append new format sources to the current changes. I'm not going to write that one myself...
Ideas?

Step 6
Change the #generate methods to ignore trailerBytes completely, then drop the global flag and remove it.
Remove all the redundant method creation methods etc.

Step 7
rework ImageSegment, source code compression, abandonSources, etc etc. Ideas?

There is a *lot* needing changing to do this well. I'm horrified how poorly factored and written some of this core code is and how it has just accreted more and more crap over the last few years. Yuck. Yuck. Bad taste in mouth.

tim
--
tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim
Strange OpCodes: EFBI: Emulate Five-volt Battery Intermittently
 
(0009064)
wiz
01-19-07 10:09

Hi Tim,

In thinking about a transition strategy here is what I've come up with so far.

It's a somewhat vague strategy but useful because it works within the framework that exists.

1) keep the oldest two source/change files limited to their current size.
2) Use some values of the prior pointer as an extension flag.
A prior pointer in the range p < n or p > max - n, where n is some number with 1 < n < about 10, would give you lots of flags.

The presence of these values in the pointer field tells you whether to retrieve things the old way or the new way.

Then just wrap a filter around the things that use the prior pointers.

if the pointer is a normal value do what has always been done.

If it is a special value use the extended scheme to find recent changes.
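A small Python sketch of this filter, under stated assumptions: `MAX_POINTER` is a hypothetical upper bound for the prior-pointer field (the note does not give one), and the two lookup callables stand in for the existing retrieval code and the extended scheme.

```python
from typing import Callable

# Hypothetical largest value the prior-pointer field can hold.
MAX_POINTER = 2**25 - 1


def is_extension_flag(p: int, n: int, max_pointer: int = MAX_POINTER) -> bool:
    """True when p lies in one of the reserved bands: p < n or p > max - n."""
    return p < n or p > max_pointer - n


def resolve_prior(
    p: int,
    n: int,
    old_lookup: Callable[[int], str],
    new_lookup: Callable[[int], str],
    max_pointer: int = MAX_POINTER,
) -> str:
    """Wrap the existing lookup: special values go to the extended scheme."""
    if is_extension_flag(p, n, max_pointer):
        return new_lookup(p)
    return old_lookup(p)
```

Raising `n` widens both reserved bands, so more and more pointer values are routed to the new scheme without touching the callers.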

The advantage is it would give a chance to add the new stuff in while still using the old stuff. Which has been doing its job up until recently.

It would allow additional change file parts to take up the load of the overflow changes.

When you get to a point that you have all the changes stuff in the new format you can set n higher so that everything filters into the new scheme.

At some point you have a safe point at which to remove the old now vestigial code in a clean up phase.

There should be some way to get to a point where versions of code exist in both the old and new format at the same time.

And you could probably gen up some exhaustive test to make sure you get
both the old scheme and new scheme to give equivalent answers to queries for versions.
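As a rough illustration of such an exhaustive check (in Python, with invented names), the function below asks both schemes for the version list of every method and reports the ones where the answers differ. How methods are keyed and how each scheme is queried are assumptions; the callables are placeholders for the real accessors.

```python
from typing import Callable, Hashable, Iterable, List, Sequence


def find_mismatches(
    method_keys: Iterable[Hashable],
    old_versions: Callable[[Hashable], Sequence[str]],
    new_versions: Callable[[Hashable], Sequence[str]],
) -> List[Hashable]:
    """Return the keys whose version history differs between the two schemes."""
    return [
        key
        for key in method_keys
        if list(old_versions(key)) != list(new_versions(key))
    ]
```

An empty result over every method in the image would be the evidence that the two schemes give equivalent answers.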

This is a slow cautious way to get from scheme a to scheme b with both of them existing side by side.

That's the current thinking.

Yours in service, --Jerome Peace
 
(0009065)
wiz
01-19-07 10:13

Reminder sent to: tim

 
(0013449)
lewis
12-27-09 19:03

Uploaded two change sets that correspond to the changes added to Squeak trunk on 26-Dec-2009:
 ExpandedSourceFileArray-part1-dtl.2.cs
 ExpandedSourceFileArray-part2-dtl.2.cs

These changes take advantage of the new CompiledMethodTrailer to permit essentially unlimited expansion of the sources and changes files, and are fully backward compatible with the traditional StandardSourceFileArray.
 
(0013451)
wiz
12-30-09 06:07

Hi Mr. Lewis,

Thanks for addressing this issue.

 I agree that not having the limit of 32M on the source and changes files will improve things marginally.

 I am still disturbed by two (missing) things.

 I ask all who submit changes for a test that fails before the patch and passes after it. This does not have to be a catastrophic failure. Simply something that proves the patch changes something that needed changing. It might simply be a test for the presence of the new classes. It would be even better if something substantial could be proven with the tests. (Very short of playing with 32M+ files let alone 512M. Tests like those fall into the realm of acceptance tests. I am looking more for SUnit and regression tests here.) Beyond guidance they will guard against future changes reverting things. The change process tends to let things fall through the cracks.

 The second missing thing is removing the limitation of one change file per image. There should be a way to have a change file per version level. Whole numbers, one decimal, two decimals etc.

The point being that when you ship an image at a certain version level you can ship just the image and a small changes file associated with the image. Other levels of changes would not be affected. If someone is getting a new image they would need all the version levels of changes that go with that image that they have not already downloaded.

 The change files mimic the version tree.
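One way to picture change files that mimic the version tree, sketched in Python. The `Squeak<level>.changes` naming pattern and the function names are purely hypothetical; the note deliberately leaves the naming scheme open.

```python
from typing import Iterable, List


def version_levels(version: str) -> List[str]:
    """Split '3.10.2' into its levels: ['3', '3.10', '3.10.2']."""
    parts = version.split(".")
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"not a dotted numeric version: {version!r}")
    return [".".join(parts[:depth]) for depth in range(1, len(parts) + 1)]


def change_files_for(version: str, prefix: str = "Squeak") -> List[str]:
    """Hypothetical file names, one changes file per version level."""
    return [f"{prefix}{level}.changes" for level in version_levels(version)]


def files_to_download(
    version: str, already_have: Iterable[str], prefix: str = "Squeak"
) -> List[str]:
    """Change files needed for this image that are not already on disk."""
    have = set(already_have)
    return [name for name in change_files_for(version, prefix) if name not in have]
```

Someone who already holds the whole-number level would then fetch only the smaller files for the deeper levels that come with the new image.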

 My belief is that the 512M scheme is missing this vital component. Experience has shown me that large files are potential problems in their own right. A small corruption in one part can kerflarg a great deal of data. Smaller more modular files usually fare better.

 This scheme assumes there would be other tools in place to manage the small files. Different levels of consolidation and redundancy removal. I am not addressing that here because that could be addressed once the ability to have multiple change levels is in place.

 My comments are old and my thinking is old. I never did adapt to the presence of MC. Still I think the need to aim for small modular change files will help squeak in the long run. No matter how far MC has come.

 Thank you for taking the time to think about these remarks.

Yours in curiosity and service, --Jerome Peace
 

- Issue History
Date Modified Username Field Change
10-09-06 05:13 wiz New Issue
12-10-06 21:49 tim Status new => assigned
12-10-06 21:49 tim Assigned To  => tim
12-13-06 10:53 wiz Note Added: 0008677
12-13-06 10:55 wiz Note Edited: 0008677
01-11-07 10:34 tim Note Added: 0008867
01-19-07 10:09 wiz Note Added: 0009064
01-19-07 10:13 wiz Note Added: 0009065
01-19-07 10:40 wiz Relationship added related to 0005783
12-03-08 21:33 Keith_Hodges Relationship added related to 0007239
12-13-09 18:05 lewis Issue Monitored: lewis
12-27-09 18:59 lewis File Added: ExpandedSourceFileArray-part1-dtl.2.cs
12-27-09 19:00 lewis File Added: ExpandedSourceFileArray-part2-dtl.2.cs
12-27-09 19:03 lewis Note Added: 0013449
12-27-09 19:09 lewis Relationship added related to 0004369
12-30-09 06:07 wiz Note Added: 0013451
06-22-13 00:23 tim Assigned To tim =>
06-22-13 00:23 tim Status assigned => resolved


Mantis 1.0.8[^]
Copyright © 2000 - 2007 Mantis Group