Mantis - Squeak
Viewing Issue Advanced Details
7174 Kernel tweak N/A 08-31-08 20:34 04-18-10 22:05
nicolas cellier  
nicolas cellier  
closed 3.10  
none trunk  
0007174: Make SqNumberParser parse LargeInteger faster
Composing a LargeInteger digit by digit from left to right involves 2 LargeInteger arithmetic operations at each digit (large*base + digit).

'1234etc...' -> (((1*10+2)*10+3)*10+4)*10+etc...

Composing a LargeInteger by packets of digits involves more SmallInteger operations and less largeInteger operations.
This is similar to divide and conquer algorithm used for printing LargeInteger (see 0006887).

'12345678' -> (1234*10000+5678)
1234 -> (12*100+34)
5678 -> (56*100+78)
12 -> (1*10+2)

This changes further enhance speed up proposed at 0006976
See also subject 'SpeedingUp parsing of LargeInteger' on squeak-dev [^]

The strategy in .cs attached is to:
- stop reading digits 1by1 as soon as a LargeInteger is formed
  This forms an elementary LargeInteger.
- aggregate packets made of 1, then 2, then 3, etc... elementary LargeInteger

An alternate strategy would be to pre-scan the stream and gather all digits in memory before starting processing the digits...
Then divide the digits in two parts, recursively until there is not enough digits to form a LargeInteger.

Current algorithm avoid digits string duplication in memory and avoid calculating expected number of digits required to form a LargeInteger (should be calculated for any base, not only 10).
related to 0006976closed nicolas cellier [ENH] Speed-up SqNumberParser 
 SqNumberParser-LargeInteger-speedUp-M7174-nice.1.cs [^] (3,984 bytes) 08-31-08 21:04 [^] (1,274 bytes) 09-02-08 10:31

nicolas cellier   
08-31-08 21:13   
Current performances, 32 bit Athlon 1GHz, with some patches installed (0006976 0007109 0007113 0007172 0007174 and 0006982 which makes readFrom: employ SqNumberParser, thus column 2 and 3 of results is the same see 0006976 and 0007172 for comparison):

{Float pi printString.
20 factorial printString} collect: [:e | e->
    {Time millisecondsToRun: [10000 timesRepeat: [e readStream nextFloat]].
    Time millisecondsToRun: [10000 timesRepeat: [SqNumberParser parse: e readStream]].
    Time millisecondsToRun: [10000 timesRepeat: [Number readFrom: e readStream]].}].

{'3.141592653589793'->#(1024 1478 1484) .
 '0.0'->#(88 240 243) .
 '0.0001'->#(110 438 445) .
 '1.2000000'->#(133 434 438) .
 '3.14159'->#(118 459 466) .
 '-3.14159'->#(119 467 468) .
 '314159.276'->#(133 489 496) .
 '3.14159e10'->#(214 465 466) .
 '3.14159e-10'->#(228 817 813) .
 'NaN'->#(93 192 199) .
 '-Infinity'->#(131 290 294) .
 '2432902008176640000'->#(1005 541 539)}

Also 500 factorial is parsed in 5ms instead of 18ms (with 0006976) or 22ms (without).

09-02-08 09:27   

your suggestion on faster elementary number parsing is a very good one, reminds me of Donald E. Knuth's #gcd: improvements :)

No offense but I'm not exited about #skip: negative programming, even so that is common and used in so many routines by Parser & friends. The reason is, that #peek can indeed be done by every possible stream (buffering on thing ahead) but #skip: negative is out of reach for very fast streams in principle (example: radio transmission) and *elementary* parsing should *not* assume its existence IMHO.

A one-method concise number parser is attached which may be of interest for your work; it is based on genius SmallWorld/LittleSmalltalk number parsing; the stream useage and other mistakes where added by me.

The routine is faster than Squeak's Number>>#readFrom: on #pi and consumes very little resources (greetings from LittleSmalltalk :) it can be used on any 'ddd{.ddd}' part of a nontrivial number ()r()e()s and leaves the stream positioned precisely and without any #skip: negative, neither by itself nor by its caller.
09-02-08 09:42   
Oops, I think I forgot to use #digitValue ;)
nicolas cellier   
09-02-08 19:21   
Yes, thanks, that is interesting.
I made some version for non positionnable streams too once.
But I set lower requirements in this case.

There are certainly some skip: -1 than can be replaced by peek along the code.
But for example:
    (SqNumberParser parse: '1.20east,42.15north' readStream)
will leave the stream position before the $e.
And that makes it mandatory to peek two characters ahead.
That's the reason why I used skip:.
The radix/scale/exponent stuff r/s/e makes it harder.

Also note that answering nearest floating point value comes at a small cost, which also makes SqNumberParser less efficient than your version or Andreas one.
Intuitive algorithm:
    ^answer asFloat / denominator.
potentially cumulates 3 rounding errors (2 asFloat conversions can loose bits if answer or denominator highBit > 53 and Float division is also inexact).
You should use:
   ^(Fraction numerator: answer denominator: denominator) asFloat.
which both cares to do one nearest Float rounding since 3.9,
and avoid cost of gcd: unlike (answer / denominator) asFloat.
Also, if an exponent is following, your asFloat conversion is premature and will result in lost accuracy again when the exponent is processed... Of course your application might not care about a few bits off. But a general framework should IMO. Look what every other language in the planet does.

Then the code you proposed does not deal with base > 10 as I understand it, yet another requirement for SqNumberParser.

Last, i did not inline all sends in the tight loops in SqNumberParser and lost some performance again. If feel this task should be delegated to Exupery.
I'm more interested in the algorithmic part, and my aim was LargeIntegers indeed (like 2000 factorial printString). Your version cannot read such monster efficiently. If SqNumberParser can do it, without spoiling smaller number too much (and it does not), I'm happy.

I would say SqNumberParser could be subclassed with a relaxed position requirement, then the subclass could use peek only for some application.

Alternatively, the non positionnable stream could be wrapped/buffered in a positionnable one (like read line by line or up to something else, depending on application requirements).

The NumberParser does not need a random access to the stream like offered by skip:.
The idea would be to implement a MultiCharPeekableWrapperStream that would buffer multiple characters and use the buffer at next next
    ^buffer atEnd
        ifTrue: [sourceStream next]
        ifFalse: [buffer next]
peek: nChar
   | available |
    available := buffer size - buffer position.
    available < nChar ifTrue: [buffer nextPutAll: (sourceStream nextAvailable: nChar - available)].
    ^buffer nextAvailable: nChar

Above method is not well written at all, but you get the idea.
A lot of grammar require very few characters ahead, I think that can be usefull.
Implementation of peek: nChar in positionnable Stream is trivial.
nicolas cellier   
09-02-08 19:33   
My implementation of peek: is stupid :(
Better answer the nth char ahead, rahter than a collection of n chars!

peek: n
    buffer size < n ifTrue: [buffer addAll: (sourceStream nextAvailable: buffer size - n)].
    ^buffer at: n "ifAbsent: [nil] if you do not like an error atEnd"

    ^buffer isEmpty
        ifTrue: [sourceStream next]
        ifFalse: [buffer removeFirst]

An <OrderedCollection> buffer would do the job well.
nicolas cellier   
09-02-08 19:43   
"fix begin"
Installer mantis bug: 7174 fix: 'SqNumberParser-LargeInteger-speedUp-M7174-nice.1.cs'.
"fix test"
"fix end"
nicolas cellier   
08-24-09 20:19   
Fixed in [^]