Published 1999-07-27
In a previous issue, Clarion Magazine challenged Clarion developers to write object-oriented code to parse strings. The five respondents listed in Table 1 made it through to the final evaluation, and Carl Barnes submitted the winning entry. All timings were taken over 10,000 iterations of the test strings on a P233 laptop running NT4 SP3, using 32-bit code compiled with debug off.
Table 1. Test timings (lower is faster)
| Name | Test 1 | Test 2 |
|---|---|---|
| Carl Barnes | 7.1 | 8.7 |
| Phil Will | 10.3 | n/a |
| Gordon Smith | 11.0 | 14.6 |
| Jesper Lorentzen and Maarten Bijl | 12.9 | 13.6 |
| Chris Hargett | 14.7 | n/a |
Test 1 was included in the example application provided to all participants. It consisted of an English sentence to be parsed by the participants' code; the test then alternated the words between upper and lower case and wrote them back into the string. The code that performed the test is shown in Listing 1.
ParserBaseClass.Test procedure
TempString string(200)
x long
  code
  self.Reset()                       ! clear any text and delimiters
  self.AddDelimiter(' ')             ! tokens are separated by spaces
  self.SetString('This is the test string, which should have '|
    & 'its words alternating between upper case and lower '|
    & 'case. The actual test will parse Clarion code and '|
    & 'capitalize keywords.')
  self.BeforeTest()                  ! participant's code, usually the parse
  loop x = 1 to self.GetTokenCount()
    TempString = self.GetToken(x)
    if x % 2                         ! odd-numbered tokens
      TempString = upper(TempString)
    else                             ! even-numbered tokens
      TempString = lower(TempString)
    end
    self.PutToken(x,TempString)      ! write the token back into the string
  end
  return
The Test method in Listing 1 is straightforward. The Reset method clears any current text in the parser and removes any existing list of delimiters (strings used to separate words, or "tokens"). The test then adds a single space character as a delimiter and sets the string the parser will parse. BeforeTest is a placeholder virtual method that lets participants call their own code from within the test method, much the way embed code is added to a Clarion application; typically this is where the string is actually parsed. Finally, the Test method loops through the string's tokens, converting odd-numbered tokens to upper case and even-numbered tokens to lower case, and writes each one back with PutToken.
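The interface can also be sketched outside Clarion. What follows is a minimal Python sketch, not any contestant's code, of a parser that does a one-time parse in before_test, the approach three of the entries took. The method names mirror the Clarion methods; the internals are assumptions for illustration: tokens are stored as (start, finish) offsets, delimiters are single characters, consecutive delimiters produce no empty tokens, and put_token assumes the replacement has the same length as the token it replaces (true for the ASCII text in these tests).

```python
class ParserSketch:
    """Hypothetical Python analogue of ParserBaseClass (not contest code)."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Clear the text, the delimiter list and any parsed tokens.
        self.text = ""
        self.delimiters = set()   # assumed: single-character delimiters
        self.tokens = []          # (start, finish) offsets, finish exclusive

    def add_delimiter(self, delim):
        self.delimiters.add(delim)

    def set_string(self, text):
        self.text = text

    def before_test(self):
        # One-time parse: record each maximal run of non-delimiter characters.
        self.tokens = []
        start = None
        for i, ch in enumerate(self.text):
            if ch in self.delimiters:
                if start is not None:
                    self.tokens.append((start, i))
                    start = None
            elif start is None:
                start = i
        if start is not None:
            self.tokens.append((start, len(self.text)))

    def get_token_count(self):
        return len(self.tokens)

    def get_token(self, n):
        # n is 1-based, matching the Clarion loop.
        start, finish = self.tokens[n - 1]
        return self.text[start:finish]

    def put_token(self, n, value):
        # Assumes value is the same length as the token it replaces.
        start, finish = self.tokens[n - 1]
        self.text = self.text[:start] + value + self.text[finish:]


TEST1_STRING = ("This is the test string, which should have "
                "its words alternating between upper case and lower "
                "case. The actual test will parse Clarion code and "
                "capitalize keywords.")


def run_test1():
    """Python rendering of the Test method in Listing 1."""
    p = ParserSketch()
    p.reset()
    p.add_delimiter(" ")
    p.set_string(TEST1_STRING)
    p.before_test()
    for x in range(1, p.get_token_count() + 1):
        token = p.get_token(x)
        p.put_token(x, token.upper() if x % 2 else token.lower())
    return p.text


if __name__ == "__main__":
    print(run_test1())
```

Because the tokens are located once, each get_token and put_token call is a simple slice rather than a fresh scan of the string.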
As I indicated in the initial challenge, there was also a second test, which the contestants did not receive and which involved parsing Clarion code. That test is shown in Listing 2.
ParserBaseClass.Test2 procedure
TempString string(200)
x long
  code
  self.Reset()
  self.AddDelimiter(' ')             ! characters that can separate
  self.AddDelimiter('.')             ! words in Clarion source code
  self.AddDelimiter('&')
  self.AddDelimiter('(')
  self.AddDelimiter(')')
  self.AddDelimiter('=')
  self.AddDelimiter(',')
  self.AddDelimiter('-')
  self.AddDelimiter('+')
  self.SetString('self.Text=sub(self.Text,1,self.TokenQ.'|
    & 'Start-1)&clip(Text)&sub(self.Text,self.TokenQ.'|
    & 'Finish+1,Len(Self.Text))')
  self.BeforeTest()                  ! participant's code, usually the parse
  loop x = 1 to self.GetTokenCount()
    TempString = self.GetToken(x)
    case upper(TempString)           ! change only the four keywords
    of 'SELF'
    orof 'SUB'
    orof 'CLIP'
    orof 'LEN'
      TempString = upper(TempString)
      self.PutToken(x,TempString)
    end
  end
  return
Test2 follows the same approach as Test but adds more delimiters so that the parser can pick out Clarion keywords. The test string (taken from Jesper Lorentzen and Maarten Bijl's entry) contains four different keywords (SELF, SUB, CLIP and LEN), so a simple CASE statement is all that's needed. Interestingly, the string contains no spaces at all, so every token must be separated by one of the other delimiters and no "lucky" parsing can happen. (Lucky parsing would be, for example, the parser detecting the string (sub, which the test code changes to (SUB; because UPPER has no effect on the parenthesis character, the result looks correct even though the token was parsed incorrectly.)
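A standalone Python sketch of what Test2 checks follows. The helper names tokenize and capitalize_keywords are hypothetical, and tokens are assumed to be maximal runs of non-delimiter characters. The second call shows why this string is a stricter test: a parser that split on spaces alone would see the whole string as one token and capitalize nothing.

```python
TEST2_STRING = ("self.Text=sub(self.Text,1,self.TokenQ.Start-1)&clip(Text)"
                "&sub(self.Text,self.TokenQ.Finish+1,Len(Self.Text))")
TEST2_DELIMITERS = set(" .&()=,-+")   # the nine delimiters added in Test2
KEYWORDS = {"SELF", "SUB", "CLIP", "LEN"}


def tokenize(text, delimiters):
    """Return (start, finish) offsets of each maximal run of non-delimiters."""
    spans = []
    start = None
    for i, ch in enumerate(text):
        if ch in delimiters:
            if start is not None:
                spans.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(text)))
    return spans


def capitalize_keywords(text, delimiters, keywords):
    """Mirror Test2: upper-case a token only if it is exactly a keyword."""
    result = text
    for start, finish in tokenize(text, delimiters):
        token = text[start:finish]
        if token.upper() in keywords:
            # upper() keeps ASCII lengths, so the original offsets stay valid.
            result = result[:start] + token.upper() + result[finish:]
    return result


if __name__ == "__main__":
    # Full Test2 delimiter set: every keyword is found.
    print(capitalize_keywords(TEST2_STRING, TEST2_DELIMITERS, KEYWORDS))
    # Space-only delimiters: one giant token, so the string is unchanged.
    print(capitalize_keywords(TEST2_STRING, {" "}, KEYWORDS))
```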
On the whole, the difference between the fastest and slowest code is not dramatic (the slowest Test 1 time, 14.7, is roughly twice the fastest, 7.1, rather than an order of magnitude larger), but the participants' implementations differed considerably. Chris Hargett and Phil Will both parsed the string on the fly, while the other three entries used the BeforeTest method to do a one-time parse of the string. There were some novel approaches to storing the delimiter data, and several possible variations on the implementation of PutToken. As Carl Barnes noted, allowing PutToken to change a token's size complicates matters somewhat; I declined to make this part of the test.
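To illustrate why variable-length tokens complicate PutToken, here is a hypothetical Python sketch (not any contestant's code) in which tokens are stored as 1-based, inclusive start and finish positions, loosely echoing the TokenQ.Start and TokenQ.Finish fields that appear in the Test2 string. Replacing a token splices the text much like that string's sub(...)&clip(...)&sub(...) expression, and every later token's offsets must then be shifted by the change in length. The class and field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Token:
    start: int   # 1-based position of the token's first character
    finish: int  # 1-based position of its last character (inclusive)


class ResizingParser:
    """Hypothetical parser whose put_token may change a token's length."""

    def __init__(self, text, delimiters):
        self.text = text
        self.tokens = []
        start = None
        for i, ch in enumerate(text, start=1):
            if ch in delimiters:
                if start is not None:
                    self.tokens.append(Token(start, i - 1))
                    start = None
            elif start is None:
                start = i
        if start is not None:
            self.tokens.append(Token(start, len(text)))

    def get_token(self, n):
        tok = self.tokens[n - 1]
        return self.text[tok.start - 1:tok.finish]

    def put_token(self, n, value):
        tok = self.tokens[n - 1]
        # Equivalent to sub(Text,1,Start-1) & value & sub(Text,Finish+1,...)
        self.text = self.text[:tok.start - 1] + value + self.text[tok.finish:]
        delta = len(value) - (tok.finish - tok.start + 1)
        tok.finish += delta
        # Every token after n has moved by delta characters.
        for later in self.tokens[n:]:
            later.start += delta
            later.finish += delta


if __name__ == "__main__":
    p = ResizingParser("a bb ccc", {" "})
    p.put_token(1, "alpha")
    print(p.text)          # alpha bb ccc
    print(p.get_token(3))  # ccc
```

The offset-shifting loop is the extra cost: with same-length replacements, as in both tests, it is never needed.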
My thanks to all of the participants. This one has been somewhat grueling because of the complexity of the requirement and the amount of code that had to be written. As a result, future challenges will return to the original goal of requiring no more than a few lines of code, though still a bit of thought.
Copyright © 1999-2009 by CoveComm Inc. All Rights Reserved. Reproduction in any form without the express written consent of CoveComm Inc., except as described in the subscription agreement, is prohibited.
Clarion Magazine ISSN 1718-9942