Clarion Challenge String Parser Final Results

Published 1999-07-27

In a previous issue, Clarion Magazine challenged Clarion developers to write some object-oriented code to parse strings. The five respondents listed in Table 1 made it through to the final evaluation, and Carl Barnes submitted the winning entry. All timings were taken over 10,000 iterations of the test strings on a P233 laptop running NT4 SP3, compiled as 32-bit with debug off.

Table 1. Timings for the five finalists.
Name                                 Test 1   Test 2
Carl Barnes                             7.1      8.7
Phil Will                              10.3      n/a
Gordon Smith                           11.0     14.6
Jesper Lorentzen and Maarten Bijl      12.9     13.6
Chris Hargett                          14.7      n/a

Test 1 was included in the example application provided to all participants. It consisted of an English phrase to be parsed by the participants' code. The test then alternated the words between upper and lower case and wrote them back to the string. The code which performed the test is shown in Listing 1.

Listing 1. The first test (English text).
ParserBaseClass.Test                    procedure
TempString  string(200)
x           long
   code
   self.Reset()
   self.AddDelimiter(' ')
   self.SetString('This is the test string, which should have '|
      & 'its words alternating between upper case and lower '|
      & 'case. The actual test will parse Clarion code and '|
      & 'capitalize keywords.')
   self.BeforeTest()
   loop x = 1 to self.GetTokenCount()
      TempString = self.GetToken(x)
      if x % 2
         TempString = upper(TempString)
      else
         TempString = lower(TempString)
      end
      self.PutToken(x,TempString)
   end
   return

The Test method in Listing 1 is straightforward. The Reset method clears any current text in the parser and removes any existing list of delimiters (which are strings used to separate words or "tokens"). The test then adds a single space character delimiter and sets the string the parser will parse.

The BeforeTest method is a placeholder virtual method which allows participants to call their own code from within the test method, much the way embed code is added to a Clarion application. Typically this is where the string is actually parsed. The Test method then loops through the string's tokens and alternately sets the case to upper or lower.
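As a rough illustration of the kind of one-time parse that BeforeTest might trigger, the following standalone sketch scans a string once and records each token's start and finish position in a queue. It is not taken from any of the entries. The names SrcText, Delimiters, TokenQ, StartPos, FinishPos and ParseTokens are all assumptions made for this example, and single-character delimiters are assumed for simplicity.

```clarion
  PROGRAM

  MAP
ParseTokens   PROCEDURE()
  END

Delimiters    STRING(' .&()=,-+')    ! hypothetical: one character per delimiter
SrcText       STRING(200)            ! the string being parsed
TokenQ        QUEUE                  ! one entry per token found
StartPos        LONG                 ! position of the token's first character
FinishPos       LONG                 ! position of the token's last character
              END
Result        STRING(400)
x             LONG

  CODE
  SrcText = 'x=sub(SomeText,1,2)'
  ParseTokens()                      ! one pass over the string
  LOOP x = 1 TO RECORDS(TokenQ)      ! later token access is a queue lookup
    GET(TokenQ, x)
    Result = CLIP(Result) & '[' & SrcText[TokenQ.StartPos : TokenQ.FinishPos] & ']'
  END
  MESSAGE(CLIP(Result))              ! shows each token in square brackets

ParseTokens   PROCEDURE()
CharPos       LONG
TokenStart    LONG                   ! 0 means "not currently inside a token"
TextLen       LONG
  CODE
  FREE(TokenQ)
  TextLen = LEN(CLIP(SrcText))
  TokenStart = 0
  LOOP CharPos = 1 TO TextLen
    IF INSTRING(SrcText[CharPos], Delimiters, 1, 1)
      IF TokenStart                  ! a delimiter closes the current token
        TokenQ.StartPos  = TokenStart
        TokenQ.FinishPos = CharPos - 1
        ADD(TokenQ)
        TokenStart = 0
      END
    ELSIF NOT TokenStart             ! first character of a new token
      TokenStart = CharPos
    END
  END
  IF TokenStart                      ! token running to the end of the text
    TokenQ.StartPos  = TokenStart
    TokenQ.FinishPos = TextLen
    ADD(TokenQ)
  END
```

With the positions stored up front, a GetToken-style call becomes a queue read plus a string slice, which is the trade-off a one-time parse makes against scanning the string on every call.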

As I indicated in the initial challenge, there was a second test which the contestants did not receive, and which involved parsing Clarion code. That test is shown in Listing 2.

Listing 2. The second test (Clarion code).
ParserBaseClass.Test2                    procedure
TempString  string(200)
x           long
   code
   self.Reset()
   self.AddDelimiter(' ')
   self.AddDelimiter('.')
   self.AddDelimiter('&')
   self.AddDelimiter('(')
   self.AddDelimiter(')')
   self.AddDelimiter('=')
   self.AddDelimiter(',')
   self.AddDelimiter('-')
   self.AddDelimiter('+')
   self.SetString('self.Text=sub(self.Text,1,self.TokenQ.'|
     & 'Start-1)&clip(Text)&sub(self.Text,self.TokenQ.'|
     & 'Finish+1,Len(Self.Text))')
   self.BeforeTest()
   loop x = 1 to self.GetTokenCount()
      TempString = self.GetToken(x)
      case upper(TempString)
      of 'SELF'
      orof 'SUB'
      orof 'CLIP'
      orof 'LEN'
         TempString = upper(TempString)
         self.PutToken(x,TempString)
      end
   end
   return

Test2 follows the same approach as Test but adds more delimiters so that the parser can pick out Clarion keywords. The test string (obtained from Jesper Lorentzen and Maarten Bijl's entry) contains four different keywords, so a simple CASE statement is all that's needed. Interestingly, no spaces actually occur as delimiters in the test string, which means that no "lucky" parsing can happen (i.e. the parser detects the string (sub and the test code changes this to (SUB, with the UPPER having no effect on the parenthesis character).

Although all the entries passed the first test, two had problems with the second, and those times are shown as n/a in Table 1.

On the whole, the difference between the fastest and slowest code can't be considered order-of-magnitude dramatic (in Test 1 the slowest entry took roughly twice as long as the fastest), but there were considerable differences in the participants' implementations. Chris Hargett and Phil Will both parsed the string on the fly, while the other three entries used the BeforeTest method to do a one-time parse of the string. There were some novel approaches to storing the delimiter data, and a number of possible variations on the implementation of PutToken. As Carl Barnes noted, the ability to change a token's size complicates the matter somewhat. I declined to make this part of the test.
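To show why a change in token size complicates PutToken, here is a minimal sketch of a replacement routine for a parser that stores token positions in a queue. It is an assumption-laden illustration, not any participant's code: ReplaceToken, SrcText, TokenQ, StartPos and FinishPos are hypothetical names, and the queue is filled in by hand to keep the example short. The splice expression follows the same SUB/CLIP pattern that appears in the Listing 2 test string.

```clarion
  PROGRAM

  MAP
ReplaceToken  PROCEDURE(LONG TokenNo, STRING NewText)
  END

SrcText       STRING(200)            ! fixed size: text grown past 200 is truncated
TokenQ        QUEUE                  ! hypothetical token position list
StartPos        LONG
FinishPos       LONG
              END

  CODE
  SrcText = 'x=sub(a)'
  ! Token positions entered by hand for this sketch
  TokenQ.StartPos = 1
  TokenQ.FinishPos = 1
  ADD(TokenQ)                        ! x
  TokenQ.StartPos = 3
  TokenQ.FinishPos = 5
  ADD(TokenQ)                        ! sub
  TokenQ.StartPos = 7
  TokenQ.FinishPos = 7
  ADD(TokenQ)                        ! a
  ReplaceToken(2, 'instring')        ! replacement is longer than the original
  GET(TokenQ, 3)                     ! third token must still be located correctly
  MESSAGE(CLIP(SrcText) & ' / ' & SrcText[TokenQ.StartPos : TokenQ.FinishPos])

ReplaceToken  PROCEDURE(LONG TokenNo, STRING NewText)
Delta         LONG                   ! change in token length, may be negative
i             LONG
  CODE
  GET(TokenQ, TokenNo)
  IF ERRORCODE()
    RETURN                           ! no such token
  END
  Delta = LEN(CLIP(NewText)) - (TokenQ.FinishPos - TokenQ.StartPos + 1)
  ! Splice: text before the token, the new token, then the text after it
  SrcText = SUB(SrcText, 1, TokenQ.StartPos - 1) & CLIP(NewText) & |
            SUB(SrcText, TokenQ.FinishPos + 1, LEN(SrcText))
  TokenQ.FinishPos += Delta
  PUT(TokenQ)
  LOOP i = TokenNo + 1 TO RECORDS(TokenQ)   ! every later token moves by Delta
    GET(TokenQ, i)
    TokenQ.StartPos  += Delta
    TokenQ.FinishPos += Delta
    PUT(TokenQ)
  END
```

The extra loop is the cost of allowing a size change: a same-length replacement, which is all the two tests require, can overwrite the characters in place and leave the stored positions alone.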

My thanks to all of the participants. This one's been somewhat grueling because of the complexity of the requirement and the amount of code that had to be written. As a result, future challenges will revert to the original goal of requiring no more than a few lines of code, though still a bit of thought.

