Mantis Bugtracker
  

Viewing Issue Simple Details Jump to Notes ] View Advanced ] Issue History ] Print ]
ID Category Severity Reproducibility Date Submitted Last Update
0001821 [Squeak] Collections feature N/A 09-30-05 21:39 12-17-08 04:21
Reporter Ron View Status public  
Assigned To
Priority normal Resolution suspended  
Status new   Product Version 3.8
Summary 0001821: [ENH] Collection Utilities for consideration --Ron
Description Attached is a change set for a group of collection utilities that have been very usefull for me in the past. The most usefull methods are selectUnique: and merge. The selectN code is mostly useful for what I'm working on now, but you never know what someone could use. --Ron Teitelbaum.

Collection >> max: aBlock
    "return an object in reciever that has the higest value of aBlock, this will return the last lowest object, not a collection"

Collection >> merge
    "return to reciever a collection with each element concatenated to remove imbeded collections"
    "#(#(1 2) #(3 4)) merge = #(1 2 3 4), #('ab' 'cd') merge = 'abcd'"

Collection >> min: aBlock
    "return an object in reciever that has the lowest value of aBlock, this will return the last lowest object, not a collection"

Collection >> selectMax: aBlock
    "return all objects in reciever as a collection that have the higest value of aBlock, "

Collection >> selectMin: aBlock
    "return all objects in reciever as a collection that have the higest value of aBlock, "

Collection >> selectN: numberToReturn max: aBlock
    "return N objects in reciever as a collection that have the higest value of aBlock, "

Collection >> selectN: numberToReturn min: aBlock
    "return N objects in reciever as a collection that have the lowest value of aBlock, "

SequencableCollection >> selectUnique: aBlock
    "Returns to the sender a collection of assoications with the result of aBlock as the key and a collection of matching items as a value"


Additional Information
Attached Files  Collection Utilities.1.cs.gz [^] (959 bytes) 09-30-05 21:39

- Relationships

- Notes
(0002740 - 274 - 274 - 274 - 274 - 274 - 274)
dvf
10-01-05 03:54

Cool. I also like extending the collection protocol. Have you seen groupBy:having:? seems to me that a #groupBy: variation would be a more easily identifiable variation on your #selectUnique: (unless I misunderstand). After all select is supposed to not select everything...
 
(0002741 - 1104 - 1221 - 1221 - 1221 - 1221 - 1221)
Ron
10-01-05 04:27

Agreed. There are two arguments for leaving it selectUnique:

1) I've been using it for 9 years and that's how I remember it. <-kinda selfish.

2) The name implies selecting an object that is the same as the elements in the collection for each key. As opposed to collect which we expect will return something different.

The groupBy:having implementation is basically the same. You can get simular results from groupBy:having by doing

#('a' 'a' 'b') groupBy: [:a | a] having: [:a | true]

I think that the dictionary approach is two heavy weight for this method since the main use of this method is to identify keys. If you knew the key before hand you would use a select. (you would not normally access the result dictionary by key) Normally after using selectUnique: you would use some sort of iterator. Although the dictionary does give you a sorted collection of keys which is nice and the dictionary does allow for iteration.

An implementation of

groupBy: aBlock
^ self groupBy: aBlock having: [:a | true].

instead of my selectUnique works for me.

Thoughts? --Ron
 
(0002742 - 308 - 342 - 342 - 342 - 342 - 342)
dvf
10-01-05 04:43

Works for me. BTW, #merge is a good idea. Have you considered implementing it with gather:? (or the other way around)

BTW, how are min: max: different from detectMin: detectMax:?

And come on, variables named myAss? ;-) and seriously, I've never seen the "my" prefix actually contain useful information.
 
(0002743 - 1071 - 1119 - 1119 - 1119 - 1119 - 1119)
Ron
10-01-05 05:35

The merge is different in that is creates a collection the same spieces as the first collection element. This allows things like subString collections to be combined. gather works again with the addition of the unnecessary block but not for strings.

#('a' 'b' 'c') merge 'abc' gather: [:a | a] #($a $b $c)

#(#(1 2) #(3 4) #(5 6)) gather: [:a | a] #(1 2 3 4 5 6).

Sorry about the myAss it's an old habit working with associations. When I first learned Associations I was so taken with their power I went nuts. We had to write a transform for oracle so I could store them!! It was pretty funny. I've leared to back off some but variable names die hard.

There is no difference in detectMin: and detectMax: except style. I didn't know about detectMax: when I sent this in (I saw it when you mentioned groupBy:having:) I would suggest that min: and max: and also sum: (which I didn't send in) should replace these since detect does not stop at any element but instead goes through the whole collection. Detect should be used for interuptable scans. --Ron
 
(0002744 - 91 - 91 - 91 - 91 - 91 - 91)
Ron
10-01-05 05:42

By the way thank you for taking an interest. My name is Ron Teitelbaum. What's yours dvf?
 
(0002745 - 89 - 89 - 139 - 139 - 139 - 139)
dvf
10-01-05 08:00

I'm Daniel Vainsencher, danielv@tx.technion.ac.il. I you want to chat some more, mail me.
 
(0002746 - 536 - 551 - 551 - 551 - 551 - 551)
andreas
10-01-05 11:22

Eeek. Is this really necessary? Collection with its >130 methods is already among the top 20 classes in a basic 3.8 image (out of 1659 classes total). Nothing personal, mind you, but there are so many methods in collection that nobody uses anyway why would we want to add even more?

This kind of proliferation of base classes is already a real problem in Squeak (in my understanding one of the largest that we're facing long term), please let's only add new stuff to base classes if we have an excellent and proven reason to do that.
 
(0002748 - 733 - 757 - 757 - 757 - 757 - 757)
dvf
10-01-05 16:30

necessary? of course not. I certainly never considered inject:into: necessary until I found it exists. Not even select:. Its just useful.

About size of Collection - all extensions and all classes are not equal. Adding more protocol to Collection makes it richer, but I claim it doesn't make it harder to use the existing protocol, as opposed to extending Morph or Object, simply because the concept space is easier - yet another read only conversion function.

To be specific - I certainly use groupBy: more often than I do inject:into:, even though currently only the less convenient groupBy:having: exists. I would use merge lots if it was in the image. I can't recall needing the others more than once, so I'd leave them out.
 
(0002749 - 1624 - 1972 - 1972 - 1972 - 1972 - 1972)
andreas
10-01-05 17:44

For the fun of it, let's look at the classic collection protocols like here:

((Collection organization listAtCategoryNamed: 'enumerating') collect:[:each|
  (SystemNavigation new allCallsOn: each) size -> each
]) asArray sort reversed

The result (3.8 full) should be enlightening:

2505->#do:
861->#collect:
423->#select:
267->#detect:ifNone:
127->#associationsDo:
76->#reject:
74->#inject:into:
71->#anySatisfy:
45->#select:thenCollect:
45->#detect:
32->#do:separatedBy:
28->#allSatisfy:
25->#collect:thenSelect:
12->#intersection:
11->#difference:
8->#union:
8->#count:
5->#detectSum:
5->#detectMax:
4->#do:without:
4->#detectMin:
2->#groupBy:having:
1->#noneSatisfy:
0->#select:thenDo:
0->#reject:thenDo:
0->#collect:thenDo:

There is a very obvious (and very small) set of messages that are used almost everywhere: #do: #select: #collect: #detect:ifNone: (I don't really count associationsDo: since it is mostly used for Dictionaries). In fact, we can remove almost half of the protocol if we throw out those messages that are in "single digit use".

Unfortunately, you can't do that. Once a method has been added in Collection it basically there forever - somebody "might" use it and therefore you can't really remove it. It's because of that reason that I'm opposed to adding more methods without having good justifications for it.

About #merge - are you aware that #gather: will trivially do a merge for you? E.g., #( ( 1 2) (4 5) ) gather:[:x| x] => #(1 2 3 4 5). That's another one of those pet extensions now proliferating the Collection protocol.
 
(0002757 - 3093 - 3195 - 3195 - 3195 - 3195 - 3195)
Ron
10-02-05 02:14

Andreas, Thank you for you input. I agree that changes to base classes should only be done when there is a good reason to do so. In my last job there were no class owners so any developer was free to make changes to base classes to do what ever they wanted. Within VW we were able to make extensions to classes for our own use, which helped keep base class envy applications clean. Is it possible to make extensions in squeak? (forgive my ignorance I'm still getting up to speed on the difference between squeak and VW)

I think that you are right that you will find few senders within squeak but that doesn't consider how people outside are using the code. My expierence is that the methods I wrote are useful and I have had many developers that have used them in code since I wrote them.

Admittedly I should have looked closer before opening my mouth about things I though were usfull. I did not know about groupBy:having: or gather:.

I do believe that simplifying methods (methods with fewer parameters) are very useful. These methods groupBy: and gather (without a paramter) should be considered. Although I prefer the names selectUnique: and merge.

I also think that the detect... methods are improperly named, but such is life.
max: , min: , sum: ... would make more sense.

I also beleive that adding methods that are named for what people would think of when looking for them is a good idea as long as differences in implementation are clear from the name, or there are no differences in implementation. This is why I withdrew selectUnique: and suggested groupBy: which called groupBy:having: even though I think that my implentation of selectUnique: is better. groupBy: and groupBy:having: as names still bothers me since they are really sql constructs and users would expect to have sorted and filtered lists not dictionaries with keys, but since dictionaries can be used as iterators this point seems silly.

I leave all the methods up for consideration, since I'm new here, and let everyone decide what should be included or what should not.

Also I should point out there is no particular virture to having less methods on a class. (other then what I mentioned before about bad naming or differences in implemenation without it being clearly described by it's name) Anything that could be added to make things easier for someone trying to learn or someone trying to be productive should be considered. The size of the class in bytes is quickly justified by cool code that someone writes with it. If it's image size you are worried about running code quickly races past anything you might add. (do we have an image stripper here? if so add the point about image stipping, if you don't use the method it's not in your production image anyway)

Daniel, I see you worked on Montecello. I was very happy to see that it was there!! So far it's very intuitive and I've started tracking my code. Do people ever post changes here with with .mcz files or should we only use .cs.gz's? I'm looking forward to working with you somemore. Thanks again.
 
(0002763 - 1781 - 1865 - 1865 - 1865 - 1865 - 1865)
andreas
10-04-05 06:09

Ron - you have some good arguments and some not so good. Let's start with the most obvious problematic observation: That size doesn't matter. If you need a counter-example go look at class Morph and try to make sense of a class with about 2000 methods. No matter how well-named the messages might be, a monster like Morph (which after all doesn't even do that much) is simply overwhelming. The problem with the "size-doesn't-matter" approach in general is that the "good stuff" is hidden by the useless stuff. Wading through endless amounts of methods is not helpful and therefore every method (even more so if base classes are concerned) has to have a good reason to be there. Merely being "useful" is not enough IMO - I can think of hundreds of "useful" methods in Collection but it would make them less understandable by the shere amount of stuff.

Class extensions: They can be created and managed using Monticello but the problem here is that their scope is unlimited. For example, #gather: used to be an extension added by PackageInfo and should therefore only be used by packages requiring PackageInfo. Alas! Right after getting in people started using that extension, making it a de-facto "standard" method. The same is true for many other parts of the system and what I mean by "unable to remove" such methods. Since Squeak doesn't have a module system, nobody ever tests their stuff with the packages that are being claimed as required.

Improper naming: What's named good for one person might be named oddly for others. I remember the original discussion about #allSatisfy: and #anySatisfy: which started with Dan disliking the (VisualWorks) method #contains: (equivalent to anySatisfy:) So I'd be careful with claiming which names are proper and which one's aren't.
 
(0002764 - 992 - 1050 - 1050 - 1050 - 1050 - 1050)
dvf
10-04-05 07:15

Hi Andreas.

First, you ignored my arguments why the size of the enumerating protocol does not matter, as opposed to the size of Morph. Since these methods are read only, they can be understood JIT.

Second, I consider "numbers of uses in the image" to be a bad metric in regard to inclusion. select:thenCollect: may be common, but if it was absent, that would affect nothing but performance, since (a select: [..]) collect: [...] does the same thing and is almost as short. Whereas if #allSatisfy: or #intersection: had not been there, collection would have been functionally incomplete in a way. To my view, not having #groupBy: makes it incomplete in the same way - a useful way of looking at collections is not afforded.

I think almost every time I used gather: it was to implement merge in my code. #merge makes something that's commonly needed more easily available - why not add it?

About all of the other methods Ron proposed, I'm not convinced they carry their own weight.
 
(0002770 - 800 - 836 - 836 - 836 - 836 - 836)
Ron
10-05-05 02:00

Thank you for pointing out allSatisfy: and anySatisfy: . It keeps me from recommending confirm: and conform: . Which I left out and did the same thing. This again helps to point out that I should spend more time looking through the classes for implemntations of methods I used all the time (and still need) under different names. (I like allSatisfy and anySatisfy better those are good names, better then mine). This does help make the argument that size does matter. (having a good chuckle)

Can we agree then that dispite arguements of size and function, completeness or not that there is merit to the methods groupBy: and merge. I will confine my suggestions to well researched missing functionality in the future?

If we agree I will submit a new cs with only these methods.

Andres?
 
(0004550 - 1024 - 1192 - 1416 - 1416 - 1416 - 1416)
nicolas cellier
03-23-06 02:25
edited on: 03-23-06 02:49

I perfectly agree with
count: aBlock
  ^1 to: self size inject: 0 into: [:count :each |
    (aBlock value: each) ifTrue: [count+1] ifFalse: [count]]

It is clear (clearer than inject:into: pattern), short understandable and usefull. I have it for long time and vote for it.

I also have arithmetic maxOf: minOf: sumOf: productOf:
findMaxOf: findMinOf: findAll: (find the index of/all the indices)
Why not max: ? because it might be interpreted as
#( 1 4 7) max: 5 => #(5 5 7)
#(1 4 7) max: #(5 2 2) => #(2 4 7)

But these are not well accepted by community as kernel value adding...
There is also the problem of empty collections, in which case inject:into: is better...

also have the more specific cumulativeSum: cumulativeProduct:
I also like the idea to access a collection with a mask of booleans (a kind of select:).
But all this is related to array programming and you should look at http://www.iam.unibe.ch/~scg/Archive/Papers/Moug03aOOPALOOPSLA.pdf [^]

Nicolas Cellier (nice) ncellier@ifrance.com

 

- Issue History
Date Modified Username Field Change
09-30-05 21:39 Ron New Issue
09-30-05 21:39 Ron File Added: Collection Utilities.1.cs.gz
10-01-05 03:54 dvf Note Added: 0002740
10-01-05 04:27 Ron Note Added: 0002741
10-01-05 04:43 dvf Note Added: 0002742
10-01-05 05:35 Ron Note Added: 0002743
10-01-05 05:42 Ron Note Added: 0002744
10-01-05 08:00 dvf Note Added: 0002745
10-01-05 11:22 andreas Note Added: 0002746
10-01-05 16:30 dvf Note Added: 0002748
10-01-05 17:44 andreas Note Added: 0002749
10-02-05 02:14 Ron Note Added: 0002757
10-04-05 06:09 andreas Note Added: 0002763
10-04-05 07:15 dvf Note Added: 0002764
10-05-05 02:00 Ron Note Added: 0002770
03-23-06 02:25 nicolas cellier Note Added: 0004550
03-23-06 02:41 nicolas cellier Note Edited: 0004550
03-23-06 02:49 nicolas cellier Note Edited: 0004550
12-17-08 04:21 Keith_Hodges Resolution open => suspended


Mantis 1.0.8[^]
Copyright © 2000 - 2007 Mantis Group
96 total queries executed.
57 unique queries executed.
Powered by Mantis Bugtracker