Mantis - Squeak
ID: 7789 | Category: Morphic | Severity: major | Reproducibility: always | Submitted: 09-30-13 23:25 | Last update: 10-08-13 22:24
closed 4.4  
none 4.4  
0007789: CharacterScanner improvements to handle fonts and byte/wide Strings more cleanly
With the newly cleaned up CharacterScanner hierarchy done, it's time to clean up the way the scanning method is chosen.
related to 0001650 (closed, tim): [BUG] CharacterScanner primitive is broken; char scanning generally in a mess
 Scanner-scanning-refactor.1.cs (12,787 bytes) 10-03-13 20:24
 Scanner-scanning-hookup.1.cs (562 bytes) 10-03-13 20:26

09-30-13 23:27   
Email 30 Sept 2013-
Now that Nicolas & I have pretty much finished this stage of cleaning up the scanners etc, we have at least achieved the major aim I had in mind; getting back to a single class tree for scanning text. So far as I can tell everything is working ok and I haven't managed to cause any errors.

The magic keys (cmd - & cmd +) that are supposed to kern the selection do not work, but then they don't in a vanilla 4.5-12461 image from before this work was done. So we didn't break it…

The next thing to do is try to simplify the choices between byte and wide strings, fonts and encodings and language environments. I hate seeing #isKindOf: or #isMemberOf: type tests in running code (you can excuse it in prototypes, for a few minutes at least) and #isByteString is not much better. We have classes and inheritance for a reason; nobody should be writing C code in Smalltalk.

Trying to list the factors involved in working out how to scan a text (and *please* correct whatever I get wrong):-

the String -
byteString; so far as I can see ByteStrings are single-byte characters (duh) with an assumed encoding. That appears to be 'mac roman' which is almost but not quite latin1 or iso-something-or-other.
wideString; 32 bit characters where the top (ish) 8 bits are used as a leading character (not to be confused with leading in the typographic sense of affecting line spacing - isn't English wonderfully clear…) that defines an EncodedCharSet (or LanguageEnvironment, sigh) which provides for a specific scanning message to use. To complicate life further, a later character in a WideString can change the encoding to use, which may well change the font, oh frabjous day.
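
To make the leading-character idea concrete, here is a minimal sketch in Python (not Squeak code) of splitting a 32-bit character value into its leading character and character code, and of grouping a string into runs that share one encoding. The bit layout (low 22 bits for the code, next 8 bits for the leading character) is an assumption for illustration, as are all the function names and the example leading-character value 5.

```python
# Assumed layout, for illustration only: the low 22 bits hold the
# character code and the next 8 bits hold the leading character.
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1
LEADING_MASK = 0xFF


def make_wide_char(leading, code):
    """Pack a (hypothetical) leading character and a code into one value."""
    return ((leading & LEADING_MASK) << CODE_BITS) | (code & CODE_MASK)


def leading_char(value):
    """Extract the leading character that selects the encoding."""
    return (value >> CODE_BITS) & LEADING_MASK


def char_code(value):
    """Extract the character code within that encoding."""
    return value & CODE_MASK


def encoding_runs(values):
    """Group consecutive characters that share a leading character.

    Returns a list of (leading, [codes]) pairs; a new pair starts each
    time the leading character changes mid-string.
    """
    runs = []
    for value in values:
        lead = leading_char(value)
        if runs and runs[-1][0] == lead:
            runs[-1][1].append(char_code(value))
        else:
            runs.append((lead, [char_code(value)]))
    return runs
```

Each run would then be scanned with whatever encoding (and possibly font) its leading character selects, which is why a mid-string change is awkward for a single scanning loop.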

the Font -
we have several classes of fonts, not all in the base image right now.
I think I'd divide them into two phyla at the moment;
a) StrikeFonts and other simple bitmap glyphs. This would include StrikeFont itself, HostFont and TTCFont (since it generates bitmaps that are simply bitblt'd to use)
b) ComplicatedPluginFonts where an interface to a more complex and sophisticated renderer is used to leverage a library such as TrueType, Cairo, Pango, Wayland or whatever. These may well need to completely usurp the actual scanner to do the work.

There's another font aspect that is important too, but for now at least it is tied to a & b above - whether pair-kerning is supported. I'm sure we could make a variant of StrikeFonts that does it if we wanted but let's keep things tolerably intelligible for now, eh?

I'm going to take a quick swing at changing the scanning to delegate to
1) the string, which will then delegate to
2) the font, which for all the classes in the image right now will then delegate back to
3) the scanner, but having already worked out which form of scanning is required.

OK; I'm going in! Cover me!
10-03-13 20:24   
After some going around in convoluted convolvings of conundra, I have a proposed solution.

Instead of splitting out byte and wide strings, and kernable and non-kernable fonts within the scanner scanCharactersFrom:to:… method we now
a) send #scanCharactersFrom:to:with:rightX:font: to the string.
b) a ByteString assumes no clever encodings and forwards #scanByteCharactersFrom:to:in:with:rightX: to the font
b-1) a non-pair-kerning font (i.e. all the strike font related classes currently in the image) passes #scanByteCharactersFrom:to:in:rightX: back to the characterscanner, which now knows exactly what to do
b-2) a pair-kerning font (FreeType? Cairo?) can send #scanKernableByteCharactersFrom:to:in:rightX: instead
c) a WideString finds the relevant EncodedCharSet to use and sends #scanMultibyteCharactersFrom:to:in:with:rightX:font: to it
c-1) the encoding thingy then sends #scanXXXXCharactersFrom:to:in:with:rightX: to the font - where XXXX is relevant for the encoding. Currently the only special version is #scanMultibyteJapaneseCharactersFrom:to:in:with:rightX:
c-2) similarly to b-2, the font forwards to the character scanner.
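
The chain a) to c-2) can be modelled roughly as follows. This is a sketch in Python rather than Squeak code: the class names and method names are hypothetical stand-ins that mirror the selectors listed above, the WideString is simply handed its encoding instead of deriving it from a leading character, and each scanner method returns a tag naming the routine chosen instead of doing any real scanning.

```python
class CharacterScanner:
    """Entry point; ends up running exactly one specialised routine."""

    def scan_characters(self, string, start, stop, right_x, font):
        # a) ask the string, which knows whether it is byte or wide
        return string.scan_characters(start, stop, self, right_x, font)

    def scan_byte_characters(self, start, stop, string, right_x):
        return "byte"

    def scan_kernable_byte_characters(self, start, stop, string, right_x):
        return "kernable-byte"

    def scan_multibyte_characters(self, start, stop, string, right_x):
        return "multibyte"

    def scan_japanese_characters(self, start, stop, string, right_x):
        return "multibyte-japanese"


class StrikeLikeFont:
    """b-1 / c-2) a non-pair-kerning font hands control back to the scanner."""

    def scan_byte_characters(self, start, stop, string, scanner, right_x):
        return scanner.scan_byte_characters(start, stop, string, right_x)

    def scan_multibyte_characters(self, start, stop, string, scanner, right_x):
        return scanner.scan_multibyte_characters(start, stop, string, right_x)

    def scan_japanese_characters(self, start, stop, string, scanner, right_x):
        return scanner.scan_japanese_characters(start, stop, string, right_x)


class KerningFont(StrikeLikeFont):
    """b-2) a pair-kerning font picks the kernable variant instead."""

    def scan_byte_characters(self, start, stop, string, scanner, right_x):
        return scanner.scan_kernable_byte_characters(start, stop, string, right_x)


class DefaultEncoding:
    """c-1) the encoding chooses which font-level message to send."""

    def scan_multibyte_characters(self, start, stop, string, scanner, right_x, font):
        return font.scan_multibyte_characters(start, stop, string, scanner, right_x)


class JapaneseEncoding(DefaultEncoding):
    def scan_multibyte_characters(self, start, stop, string, scanner, right_x, font):
        return font.scan_japanese_characters(start, stop, string, scanner, right_x)


class ByteString:
    def __init__(self, text):
        self.text = text

    def scan_characters(self, start, stop, scanner, right_x, font):
        # b) no clever encodings: go straight to the font
        return font.scan_byte_characters(start, stop, self, scanner, right_x)


class WideString:
    def __init__(self, text, encoding):
        self.text = text
        self.encoding = encoding  # simplification: supplied, not looked up

    def scan_characters(self, start, stop, scanner, right_x, font):
        # c) delegate to the encoding, which then talks to the font
        return self.encoding.scan_multibyte_characters(
            start, stop, self, scanner, right_x, font)
```

The point of the arrangement is that no method tests the class of its argument; each receiver only knows its own part of the decision and forwards the rest.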

The minor inconvenience here is that anyone adding a new encoding has to implement a couple of methods
a) override {encoding}>>scanMultibyteCharactersFrom:to:in:with:rightX:font:
b) add AbstractFont>>scanXXXXCharactersFrom:to:in:with:rightX:
c) maybe do the same for font classes that are not in the image
d) add CharacterScanner>>scanXXXXCharactersFrom:to:in:with:rightX:

Of course, at any step along the way, the functionality can be usurped and refocussed. A new scanner could be built and used, the font might use a sophisticated plugin instead of an in-image class, and even the encoding used might have other ideas.

The two attached changesets have the current it-works-ok-but-isnt-complete code. If we use this, there are several scanner-related methods that ought to be changed to avoid a wasted send through CharacterScanner>>scanCharactersFrom:to:in:rightX:stopConditions:kern:. The FreeType code will also need updating.
10-08-13 22:24   
and finally
Graphics-tpr.256 for the hookup.