Mantis Bugtracker
  

ID: 0007789
Category: [Squeak] Morphic
Severity: major
Reproducibility: always
Date Submitted: 09-30-13 23:25
Last Update: 10-08-13 22:24
Reporter: tim
View Status: public
Assigned To: tim
Priority: high
Resolution: fixed
Status: closed
Product Version: 4.4
Summary: 0007789: CharacterScanner improvements to handle fonts and byte/wide Strings more cleanly
Description: With the newly cleaned up CharacterScanner hierarchy done, it's time to clean up the way the scanning method is chosen.
Attached Files:
 Scanner-scanning-refactor.1.cs (12,787 bytes) 10-03-13 20:24
 Scanner-scanning-hookup.1.cs (562 bytes) 10-03-13 20:26

Relationships
related to 0001650 (closed, tim): [BUG] CharacterScanner primitive is broken; char scanning generally in a mess

Notes
(0014461)
tim
09-30-13 23:27

Email, 30 Sept 2013:
Now that Nicolas & I have pretty much finished this stage of cleaning up the scanners, etc., we have at least achieved the major aim I had in mind: getting back to a single class tree for scanning text. So far as I can tell, everything is working OK and I haven't managed to cause any errors.

The magic keys (cmd - and cmd +) that are supposed to kern the selection do not work, but then they don't work in a vanilla 4.5-12461 image from before this work was done either. So we didn't break it…

The next thing to do is to try to simplify the choices between byte and wide strings, fonts and encodings, and language environments. I hate seeing tests like #isKindOf: or #isMemberOf: in running code (you can excuse them in prototypes, for a few minutes at least), and #isByteString is not much better. We have classes and inheritance for a reason; nobody should be writing C code in Smalltalk.

Trying to list the factors involved in working out how to scan a text (and *please* correct whatever I get wrong):-

the String -
ByteString: so far as I can see, ByteStrings hold single-byte characters (duh) with an assumed encoding. That appears to be 'mac roman', which is almost but not quite Latin-1 (ISO 8859-1).
WideString: 32-bit characters in which the top(ish) 8 bits hold a leading character (not to be confused with leading in the typographic sense of line spacing; isn't English wonderfully clear…). The leading character identifies an EncodedCharSet (or LanguageEnvironment, sigh), which determines the specific scanning message to use. To complicate life further, a later character in a WideString can change the encoding, which may well change the font; oh frabjous day.
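A minimal Python sketch of the WideString layout just described. It assumes that Squeak keeps the leading character in bits 22-29 and the code point in the low 22 bits (the masks behind Character>>leadingChar and Character>>charCode); the helper names are hypothetical. It also shows why a later character can switch the encoding: the string breaks into runs wherever the leading character changes.

```python
# Hypothetical helpers modelling the assumed Squeak wide-character layout:
# bits 22-29 hold the "leading character" (the encoding tag) and
# bits 0-21 hold the code point within that encoding.
LEADING_CHAR_MASK = 0x3FC00000
CODE_POINT_MASK = 0x003FFFFF
LEADING_CHAR_SHIFT = 22


def leading_char(value: int) -> int:
    """Return the 8-bit leading character (encoding tag) of a wide char value."""
    return (value & LEADING_CHAR_MASK) >> LEADING_CHAR_SHIFT


def char_code(value: int) -> int:
    """Return the code point part of a wide char value."""
    return value & CODE_POINT_MASK


def make_wide_char(leading: int, code: int) -> int:
    """Pack a leading character and a code point into one value."""
    if not 0 <= leading <= 0xFF:
        raise ValueError("leading character must fit in 8 bits")
    if not 0 <= code <= CODE_POINT_MASK:
        raise ValueError("code point must fit in 22 bits")
    return (leading << LEADING_CHAR_SHIFT) | code


def encoding_runs(values):
    """Split wide char values into (leading_char, start, stop) runs.

    A new run starts whenever the leading character changes; each run would
    need its own scanning message and possibly its own font.
    Indices are 0-based and stop is inclusive, like Smalltalk's from:to:.
    """
    runs = []
    for i, v in enumerate(values):
        lead = leading_char(v)
        if runs and runs[-1][0] == lead:
            runs[-1] = (lead, runs[-1][1], i)
        else:
            runs.append((lead, i, i))
    return runs
```

For example, encoding_runs([make_wide_char(0, 0x41), make_wide_char(0, 0x42), make_wide_char(5, 0x3042)]) gives [(0, 0, 1), (5, 2, 2)]; the tag 5 is arbitrary here.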

the Font -
we have several classes of fonts, not all in the base image right now.
I think I'd divide them into two phyla at the moment:
a) StrikeFonts and other simple bitmap-glyph fonts. This would include StrikeFont itself, HostFont, and TTCFont (since it generates bitmaps that are simply BitBlt'd when used).
b) ComplicatedPluginFonts, where an interface to a more complex and sophisticated renderer is used to leverage a library such as FreeType, Cairo, Pango, Wayland or whatever. These may well need to usurp the actual scanner completely to do the work.

There's another important font aspect too, though for now at least it is tied to a) and b) above: whether pair-kerning is supported. I'm sure we could make a variant of StrikeFont that does it if we wanted, but let's keep things tolerably intelligible for now, eh?
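Pair kerning, in the sense meant here, is a width adjustment applied to specific adjacent glyph pairs. A minimal Python sketch with invented classes, widths, and kerning table (none of this is Squeak's StrikeFont API): a plain font sums advance widths, and a pair-kerning subclass adds a per-pair adjustment by overriding a single method.

```python
class SimpleBitmapFont:
    """Hypothetical StrikeFont-like font: fixed advance widths, no kerning."""

    def __init__(self, widths, default_width=6):
        self.widths = widths              # char -> advance width in pixels
        self.default_width = default_width

    def width_of(self, ch):
        return self.widths.get(ch, self.default_width)

    def kern(self, left, right):
        return 0                          # plain fonts never adjust pairs

    def string_width(self, text):
        """Sum advance widths, adding the kerning adjustment between pairs."""
        total = 0
        previous = None
        for ch in text:
            if previous is not None:
                total += self.kern(previous, ch)
            total += self.width_of(ch)
            previous = ch
        return total


class PairKerningFont(SimpleBitmapFont):
    """Hypothetical font that looks up (left, right) pairs in a kerning table."""

    def __init__(self, widths, kerning_pairs, default_width=6):
        super().__init__(widths, default_width)
        self.kerning_pairs = kerning_pairs   # (left, right) -> adjustment

    def kern(self, left, right):
        return self.kerning_pairs.get((left, right), 0)
```

With widths {"A": 7, "V": 7} and a kerning table {("A", "V"): -2}, SimpleBitmapFont measures "AV" as 14 and PairKerningFont measures it as 12. The difference lives in which class answers #kern-style questions rather than in a type test.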

I'm going to take a quick swing at changing the scanning to delegate to
1) the string, which will then delegate to
2) the font, which for all the classes in the image right now will then delegate back to
3) the scanner, but having already worked out which form of scanning is required.
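The three-step delegation above can be sketched as double dispatch. This hypothetical Python model covers only the byte-string path: the string knows it needs byte scanning, the font forwards, and the scanner does the work already knowing which path applies. The class and method names are illustrative stand-ins, not Squeak code.

```python
class CharacterScanner:
    """Hypothetical scanner: does the work once the path has been chosen."""

    def __init__(self):
        self.trace = []                   # records which object handled each step

    def scan_byte_characters(self, start, stop, string):
        self.trace.append("scanner: byte scan")
        return string[start:stop + 1]     # stop is inclusive, as in from:to:


class StrikeLikeFont:
    """Hypothetical non-kerning font: hands the work straight back."""

    def scan_byte_characters(self, start, stop, string, scanner):
        scanner.trace.append("font: forward to scanner")
        return scanner.scan_byte_characters(start, stop, string)


class ByteString(str):
    """Hypothetical byte string: knows its own path, so no class tests needed."""

    def scan_characters(self, start, stop, scanner, font):
        scanner.trace.append("string: byte path")
        return font.scan_byte_characters(start, stop, self, scanner)
```

Calling ByteString("hello").scan_characters(1, 3, scanner, StrikeLikeFont()) returns "ell" and leaves the trace string, font, scanner, in that order.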

OK; I'm going in! Cover me!
 
(0014463)
tim
10-03-13 20:24

After some going around in convoluted convolvings of conundra, I have a proposed solution.

Instead of distinguishing byte from wide strings, and kernable from non-kernable fonts, inside the scanner's scanCharactersFrom:to:… method, we now:
a) send #scanCharactersFrom:to:with:rightX:font: to the string.
b) a ByteString assumes no clever encodings and forwards #scanByteCharactersFrom:to:in:with:rightX: to the font.
b-1) a non-pair-kerning font (i.e. all the StrikeFont-related classes currently in the image) passes #scanByteCharactersFrom:to:in:rightX: back to the CharacterScanner, which now knows exactly what to do.
b-2) a pair-kerning font (FreeType? Cairo?) can send #scanKernableByteCharactersFrom:to:in:rightX: instead.
c) a WideString finds the relevant EncodedCharSet to use and sends #scanMultibyteCharactersFrom:to:in:with:rightX:font: to it.
c-1) the EncodedCharSet then sends #scanXXXXCharactersFrom:to:in:with:rightX: to the font, where XXXX is relevant for the encoding. Currently the only special version is #scanMultibyteJapaneseCharactersFrom:to:in:with:rightX:.
c-2) as in b-1), the font forwards to the character scanner.
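The a/b/c routing above can be sketched end to end in Python. This is a hypothetical model, not the changeset code: snake_case methods stand in for the Smalltalk selectors (scan_characters for #scanCharactersFrom:to:with:rightX:font:, scan_byte_characters for #scanByteCharactersFrom:to:in:with:rightX:, and so on), the rightX: and stop-condition arguments are omitted, the scanner simply returns the chosen path plus the scanned slice, and JapaneseCharSet is an invented name. A real WideString picks its EncodedCharSet from each character's leading character rather than storing a single one.

```python
class CharacterScanner:
    """Hypothetical scanner; each method reports which path was taken."""

    def scan_byte_characters(self, start, stop, string):
        return ("byte", string[start:stop + 1])

    def scan_kernable_byte_characters(self, start, stop, string):
        return ("kernable byte", string[start:stop + 1])

    def scan_multibyte_characters(self, start, stop, string):
        return ("multibyte", string[start:stop + 1])

    def scan_multibyte_japanese_characters(self, start, stop, string):
        return ("multibyte japanese", string[start:stop + 1])


class AbstractFont:
    """b-1 / c-2: a non-pair-kerning font forwards to the scanner."""

    def scan_byte_characters(self, start, stop, string, scanner):
        return scanner.scan_byte_characters(start, stop, string)

    def scan_multibyte_characters(self, start, stop, string, scanner):
        return scanner.scan_multibyte_characters(start, stop, string)

    def scan_multibyte_japanese_characters(self, start, stop, string, scanner):
        return scanner.scan_multibyte_japanese_characters(start, stop, string)


class PairKerningFont(AbstractFont):
    """b-2: a pair-kerning font asks for the kernable byte scan instead."""

    def scan_byte_characters(self, start, stop, string, scanner):
        return scanner.scan_kernable_byte_characters(start, stop, string)


class EncodedCharSet:
    """c / c-1: the default encoding sends the generic multibyte message."""

    def scan_multibyte_characters(self, start, stop, string, scanner, font):
        return font.scan_multibyte_characters(start, stop, string, scanner)


class JapaneseCharSet(EncodedCharSet):
    """c-1: hypothetical encoding that picks its own font message."""

    def scan_multibyte_characters(self, start, stop, string, scanner, font):
        return font.scan_multibyte_japanese_characters(start, stop, string, scanner)


class ByteString:
    """b: no clever encodings, go straight to the font's byte path."""

    def __init__(self, text):
        self.text = text

    def scan_characters(self, start, stop, scanner, font):
        return font.scan_byte_characters(start, stop, self.text, scanner)


class WideString:
    """c: hands the choice to its encoding (assumed: one encoding per string)."""

    def __init__(self, text, encoding):
        self.text = text
        self.encoding = encoding

    def scan_characters(self, start, stop, scanner, font):
        return self.encoding.scan_multibyte_characters(
            start, stop, self.text, scanner, font)
```

No method in the sketch asks what class the string or font is; each decision falls out of which object receives the message, which is the point of the refactoring.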

The minor inconvenience here is that anyone adding a new encoding has to implement several methods:
a) override {encoding}>>scanMultibyteCharactersFrom:to:in:with:rightX:font:
b) add AbstractFont>>scanXXXXCharactersFrom:to:in:with:rightX:
c) possibly add the same method to font classes that are not in the image
d) add CharacterScanner>>scanXXXXCharactersFrom:to:in:with:rightX:
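The checklist above lends itself to a mechanical check. This hypothetical Python sketch uses stand-in classes rather than Squeak's and reports which of the participants (encoding, fonts, scanner) still lack a method for a new encoding. The naming scheme scan_<tag>_characters is an assumption mirroring #scanXXXXCharactersFrom:to:in:with:rightX:, and scan_multibyte_characters stands in for #scanMultibyteCharactersFrom:to:in:with:rightX:font:.

```python
# Hypothetical stand-ins for what exists before a new encoding is added.
class EncodedCharSet:
    def scan_multibyte_characters(self, start, stop, string, scanner, font):
        return font.scan_multibyte_characters(start, stop, string, scanner)


class AbstractFont:
    def scan_multibyte_characters(self, start, stop, string, scanner):
        return scanner.scan_multibyte_characters(start, stop, string)


class CharacterScanner:
    def scan_multibyte_characters(self, start, stop, string):
        return string[start:stop + 1]


def missing_methods(tag, encoding_cls, font_classes, scanner_cls):
    """List 'Class.method' entries still needed before a new encoding works."""
    per_encoding = f"scan_{tag}_characters"
    missing = []
    # a) the encoding must override the generic entry point on its own class
    if "scan_multibyte_characters" not in vars(encoding_cls):
        missing.append(f"{encoding_cls.__name__}.scan_multibyte_characters")
    # b), c) every font class must understand the encoding-specific message
    for font_cls in font_classes:
        if not hasattr(font_cls, per_encoding):
            missing.append(f"{font_cls.__name__}.{per_encoding}")
    # d) the scanner must do the actual scanning
    if not hasattr(scanner_cls, per_encoding):
        missing.append(f"{scanner_cls.__name__}.{per_encoding}")
    return missing


# A hypothetical new encoding with nothing written yet.
class ExampleCharSet(EncodedCharSet):
    pass


# The same encoding once a) to d) are done.
class DoneCharSet(EncodedCharSet):
    def scan_multibyte_characters(self, start, stop, string, scanner, font):
        return font.scan_multibyte_example_characters(start, stop, string, scanner)


class DoneFont(AbstractFont):
    def scan_multibyte_example_characters(self, start, stop, string, scanner):
        return scanner.scan_multibyte_example_characters(start, stop, string)


class DoneScanner(CharacterScanner):
    def scan_multibyte_example_characters(self, start, stop, string):
        return string[start:stop + 1]
```

missing_methods("multibyte_example", ExampleCharSet, [AbstractFont], CharacterScanner) lists all three gaps, while the Done* classes pass with an empty list.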

Of course, at any step along the way the functionality can be usurped and refocussed: a new scanner could be built and used, the font might use a sophisticated plugin instead of an in-image class, and even the encoding might have other ideas.

The two attached changesets contain the current it-works-ok-but-isn't-complete code. If we adopt it, several scanner-related methods ought to be changed to avoid a wasted send through CharacterScanner>>scanCharactersFrom:to:in:rightX:stopConditions:kern:, and the FreeType code will need updating.
 
(0014465)
tim
10-08-13 22:24

See
Collections-tpr.539
Graphics-tpr.255
Multilingual-tpr.185
and finally
Graphics-tpr.256 for the hookup.
 

Issue History
Date Modified | Username | Field | Change
09-30-13 23:25 | tim | New Issue |
09-30-13 23:26 | tim | Status | new => assigned
09-30-13 23:26 | tim | Assigned To | => tim
09-30-13 23:27 | tim | Note Added: 0014461 |
09-30-13 23:27 | tim | Status | assigned => pending
09-30-13 23:27 | tim | Relationship added | related to 0001650
10-03-13 20:24 | tim | Note Added: 0014463 |
10-03-13 20:24 | tim | File Added: Scanner-scanning-refactor.1.cs |
10-03-13 20:26 | tim | File Added: Scanner-scanning-hookup.1.cs |
10-08-13 22:24 | tim | Status | pending => resolved
10-08-13 22:24 | tim | Fixed in Version | => 4.4
10-08-13 22:24 | tim | Resolution | open => fixed
10-08-13 22:24 | tim | Note Added: 0014465 |
10-08-13 22:24 | tim | Status | resolved => closed


Mantis 1.0.8
Copyright © 2000 - 2007 Mantis Group