Mantis Bugtracker
  

Viewing Issue Simple Details Jump to Notes ] View Advanced ] Issue History ] Print ]
ID Category Severity Reproducibility Date Submitted Last Update
0007564 [Squeak] Collections major always 10-14-10 18:14 02-06-11 23:48
Reporter Peter Kenny View Status public  
Assigned To leves
Priority normal Resolution fixed  
Status closed   Product Version 4.1
Summary 0007564: Failure of concatenation operator
Description This causes failure of an existing package, which works in earlier versions. The construct ('/',Character separators) is used to generate an array of terminators for String>>findTokens: Instead of the array, we get a ByteString consisting of the slash character followed by the PrintString version of Character separators. To verify this, check whether the following is true:
('/',Character separators) asSortedcollection = (Character separators,'/') asSortedCollection
It should answer true, and does in Squeak 3.8; in Squeak 4.1 it answers false. To see what is happening, try in Squeak 4.1:
('/',Character separators) = ('/',Character separators printString)
On my machine this answers true, but it is obvious nonsense.
Additional Information My machine is running Windows 7 Home Premium 64-bit. Squeak was downloaded yesterday (13 October 2010) from www.squeak.org. The image calls itself Squeak 4.1, but the .exe says it is 4.0.2. The package I am testing is the HTML and CSS Validating Parser from www.squeaksource.org/htmlcssparser, which works fine on earlier versions.
Attached Files  CharacterSepFix.1.cs [^] (1,851 bytes) 10-26-10 02:09
 filein.jpg [^] (206,861 bytes) 10-26-10 15:36

- Relationships

- Notes
(0013888 - 671 - 677 - 677 - 677 - 677 - 677)
Peter Kenny
10-15-10 16:53

I have done a little exploration (I know Dolphin Smalltalk, but I am a complete beginner on Squeak). It seems clear that the problematic behaviour is a consequence of the definition of String>>, (the comma operator), which does not exist in 3.8 - in that version String inherits from SequenceableCollection (I think). The last line of the definition applies 'asString' to the argument, and for an array the definition of asString is inherited from Object, and simply gives the printString. Hence the behaviour is explained, but I cannot trace back enough to see the need for a separate definition of the comma operator in Class String in version 4.1. Over to the experts.
 
(0013890 - 984 - 1032 - 1032 - 1032 - 1032 - 1032)
Peter Kenny
10-23-10 22:55

Having now read the wiki page on bug reporting, I see that we are advised to verify that the supposed bug still exists in the latest alpha. I have downloaded the latest image I can find (Squeak4.2-10548-alpha.image dated 22-Sep-2010) and the latest Windows 32 VM (Squeak4.1.1.exe dated 27-Jul-2010). I can confirm that the behaviour I reported in 4.1 still exists in this version.

Note that the comment in the Squeak 4.1 code for String>>, implies that the change from earlier versions has been made to allow what would formerly have been an improper use of string concatenation. The example quoted is:
Transcript cr; show: 'The value is ',3.
which is illegal in earlier versions of Squeak (also in Pharo and most other Smalltalks that I know); they all require the more explicit:
Transcript cr; show: 'The value is ',3 printString.
This suggests to me that String>>, should just be deleted (unless there are methods that use the improper concatenation elsewhere in the image).
 
(0013891 - 1129 - 1512 - 1512 - 1512 - 1512 - 1512)
Peter Kenny
10-24-10 20:57

On the other hand, if it is essential to retain this new 'improper' concatenation, it is possible to modify the code of String>>, so that it will function as in previous versions of Squeak if the argument is any kind of SequenceableCollection, otherwise it will behave as in the present code for 4.1. The following is the complete code for the method; I have commented line I have inserted:

, anObject
    "Concatenate the argument to the receiver.
        Transcript cr; show: 'The value is: ', 3.
    "
    "Next line inserted by PK 24-Oct-2010"
    (anObject isCollection and: [anObject isSequenceable]) ifTrue: [^super , anObject].
    ^ self copyReplaceFrom: self size + 1
          to: self size
          with: anObject asString

I have tested that the example in the method comment still works, and also that correct behaviour is restored for the problem case quoted in my original post.

Does anybody read these posts, or am I just talking to myself? This is a bug which breaks an existing application by misinterpreting a perfectly correct Smalltalk construct. I posted details ten days ago, and so far it has not even been acknowledged.
 
(0013893 - 416 - 428 - 428 - 428 - 428 - 428)
andreas
10-25-10 22:10

The easiest way to work around this issue is by being explicit on your end. If you intend the result to be an Array you can simply do, e.g., ('/' asArray, Character separators), if you want the result to be a string you can do, e.g., ('/', (String withAll: Character separators)).

Ultimately I think Character separators should return a string instead of an Array; there is little reason to return an Array here.
 
(0013894 - 754 - 778 - 778 - 778 - 778 - 778)
Peter Kenny
10-25-10 22:38

Andreas
The work round I used is even simpler - I wrote (Character separators, '/'), which works fine - it is an input to String>>findTokens:, which is happy with any sequenceableCollection. Note, however, that I am not the maintainer of the HTMLCSS Parser, which is where this construct occurs. If anyone downloads the parser and tries to use it in Squeak 4.1, it will fail on this point. The introduction of String>>, in 4.1 has broken an established package, and all to allow a non-standard usage. The problem could of course recur in other contexts; it is quite legitimate to concatenate a string and an array of characters, but the new String>>, will foul it up. My suggested extra line will restore the behaviour to what it was in earlier Squeaks.
 
(0013895 - 482 - 488 - 488 - 488 - 488 - 488)
andreas
10-26-10 00:14

The problem with the suggested change is that it will fail in cases where (I think) it should not fail, for example when doing something like '/', #(1 2 3). I'm not convinced that character-array concatenation is a common enough use case to warrant the change (this is the first time ever that this kind of issue was reported) so I would rather change Character>>separators to return a string since there's really no good reason to return an array of characters instead of a string.
 
(0013896 - 130 - 130 - 130 - 130 - 130 - 130)
andreas
10-26-10 02:10

I've uploaded the fix I posted into the trunk. This should work just fine with 4.1; please try it out and let me know if it works.
 
(0013897 - 188 - 188 - 188 - 188 - 188 - 188)
Peter Kenny
10-26-10 15:19

Andreas: Idiot question I know, but I am a Squeak beginner. How do I file in the change set you sent me? I have spent half an hour searching every menu I can find, and I can't work it out.
 
(0013898 - 117 - 117 - 117 - 117 - 117 - 117)
andreas
10-26-10 15:35

Go to the Tools menu, choose Filelist. Select the .cs file and click on the [fileIn] button. See attached screenshot.
 
(0013899 - 1061 - 1085 - 1085 - 1085 - 1085 - 1085)
Peter Kenny
10-26-10 16:18

Right. I had made changes I could not be sure of reverting, so for a fair test I (1) re-installed Squeak 4.1 from the download file (2) filed in your change set (3) re-installed the parser from the download file. After this it worked fine; I was able to read and parse the front page of a web newspaper. So your change has fixed my problem - thanks!

Just so I understand the situation (I'm not sure what 'posted to the trunk' means), will the changes appear only in the next release (4.2), or will they also affect future downloads of 4.1? I should perhaps contact the maintainers of the parser and try to persuade them to alter ('/', Character separators) to (Character separators, '/'), which will have the same effect but will work in the unchanged 4.1.

I won't prolong the argument - it would just be wasting your time - but I can't help thinking that your first suggestion to me, to make the intention more explicit, should be used to anyone who wrote ('/', #(1 2 3)) - I would not automatically assume that the printString of the array was required.
 
(0013916 - 342 - 342 - 498 - 498 - 498 - 498)
leves
11-08-10 00:29

Posted to trunk means, that it will be available in the next release. To test it: download the following image http://ftp.squeak.org/trunk/Squeak4.2-10548-alpha.zip [^] . Then update the image by clicking the Squeak logo on the upper left corner to open the Squeak menu and select 'Update Squeak'. If this is done, the image will contain the fix.
 

- Issue History
Date Modified Username Field Change
10-14-10 18:14 Peter Kenny New Issue
10-15-10 16:53 Peter Kenny Note Added: 0013888
10-23-10 22:55 Peter Kenny Note Added: 0013890
10-24-10 20:57 Peter Kenny Note Added: 0013891
10-25-10 22:10 andreas Note Added: 0013893
10-25-10 22:38 Peter Kenny Note Added: 0013894
10-26-10 00:14 andreas Note Added: 0013895
10-26-10 02:09 andreas File Added: CharacterSepFix.1.cs
10-26-10 02:10 andreas Note Added: 0013896
10-26-10 15:19 Peter Kenny Note Added: 0013897
10-26-10 15:35 andreas Note Added: 0013898
10-26-10 15:36 andreas File Added: filein.jpg
10-26-10 16:18 Peter Kenny Note Added: 0013899
11-08-10 00:29 leves Status new => resolved
11-08-10 00:29 leves Fixed in Version  => trunk
11-08-10 00:29 leves Resolution open => fixed
11-08-10 00:29 leves Assigned To  => leves
11-08-10 00:29 leves Note Added: 0013916
02-06-11 23:48 leves Status resolved => closed


Mantis 1.0.8[^]
Copyright © 2000 - 2007 Mantis Group
82 total queries executed.
52 unique queries executed.
Powered by Mantis Bugtracker