![]() |
|
Published 2002-10-17 Printer-friendly version
My thanks to all who entered. The challenge, while simple enough (thankfully, as any complex challenge would be horrendous to judge), threw up the usual confusions! In particular, many entrants were uncertain about the characters to ignore (the lesson learned was to write a more complete spec, and keep meddling editors away!). To this end any entry which failed the initial tests (and arrived before the deadline) was permitted one resubmission (my apologies to Ladrière Xavier whose email bounced and thus didn't get the second chance).
Okay, on to the challenge. At its core the problem can be summed up as "Does 'xyz' contain ALL the chars 'abc'." The actual challenge was more specific (to ignore all chars except "a-z, A-Z, 0-9"), but the heart of the problem remains the same.
One of the more popular measures of algorithms is the "big-oh" notation ("O").
Definition: A theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items.
For example: For some function f, taking
n as its input, the equation f(n) =
O(g(n)) would mean that it take less than some constant
g times n. This is expressed as "f
of n is big oh of g of n". The following procedure adheres
to this example:
getLeapCount procedure(ulong theDate)
code
loop i = 1 to theDate
!do the magic
end
return ...
Here "do the magic" will be some const value g and
will execute n times. One nice thing about the "O"
notation is that it says to just ignore the trivial stuff and focus
on the main "decelerator" of the algorithm (a loop for
example).
The most common examples are (in order of speed)
O(1) - Not dependant to any variable number, i.e.
a constant timeO(log N) - A binary chop is an example of this.
Remember the old game - pick a number between 1-100, then as the
one person guesses the other says higher or lower, so getting to 84
would look like 50-higher, 75-higher, 87-lower, 81-higher, 84 - you
got it!O(N) - Proportional to the size of N
(this would include a loop to N, also loops to any
constant multiples of N (0.5N, 2N, 100N)). So if you
double N the function would take twice as long.O(N log N) - Similar to the binary chop, but also
doing N activities for each iteration (a merge sort
for example). O(N^2) - Proportional to N*N. So if
you double N, the function would take four times
longer. Two nested loops dependent on N would yield
this result. O(N^3), O(N^4),..., O(N^n) - (triple/quad
nested loops etc.).O(N!) - which is a function which slows down
exponentially on the size of N.The above of course are generalities. It is easy to code a
O(N) function which is slower than a
O(N^2) function. In fact the challenge showed this
nicely - anyone who used a QUEUE came last (even when
their algorithms were more "efficient") - the analogy being why
would you use your car to get from your bed to the toilet?
NOTE: For an expanded discussion of the various equations see Mark Dunlop's " A Rough Guide To Big-Oh Notation."
Right, time to look at the entries. Since the
CompareMatch function takes two main parameters,
txt and chars, I will be using
N and M to represent them. Basically all
the entries fell into two categories, the O(N+M) and
the O(NM). Since LOWER,
UPPER, INSTRING, memchr, and
CLIP are all O(N) functions, the
following typicalcode falls into the O(NM) category
(bad):
loop i = 1 to len(clip(chars))
if not instring(choose(noCase = true, lower(chars[i]), |
chars[i]), choose(noCase = true, lower(txt), txt), 1, 1)
return false
end
end
return true
To prove the point I took the best O(N+M) solution
and the best O(NM) solution and put them head to head
on some small numbers:
|
N |
M |
Winner |
|
1 |
1 |
O(N+M) |
|
2 |
1 |
O(N+M) |
|
1 |
2 |
O(N+M) |
|
2 |
2 |
O(N+M) |
The test was a bit disappointing as I was hoping the
O(NM) solution would be quicker for the very small
cases; the results where, however, very, very close!
It's also worth noting that the results where very different in debug mode versus release mode; naturally the final results are based on release mode.
The following results only include functions that passed the test (see the source code for a complete list). The scores are the percentage of the speed of the overall winner:
Nothing wrong with Johns approach, it was basically an
O(N+M) solution, but he used a QUEUE to
store his character hits, had he used an array, he would have been
easily in the top half.
Matt's approach was very nearly a O(N+M) but ended
up a O(NL + ML), where L was a list of valid chars
(note the double loop because of the inner
INSTRING):
...
StringLen# = len(clip(P:Chars))
loop char# = 1 to StringLen#
SingleChar = choose(P:Nocase=True,|
upper(P:Chars[ char# : char#]),|
P:Chars[ char# : char#])
CharPos# = instring(SingleChar,MasterMap,1,1)
if CharPos#
SrchMap[CharPos# : CharPos#] = SingleChar
.
...
Ville used a classic O(NM) solution:
...
Loop x# = 1 To Len(Tempchars)
Loop y# = 1 To Len(Temptxt)
If Val(Tempchars[x#]) = Val(Temptxt[y#])
RetVal = RetVal + 1
break
End
End !second loop
End !first loop
...
Stephen (not unlike Matt) had a O(N+M) solution,
but used an inefficient test for invalid chars and ended up a
O(L + N + M), where L was in fact three
loops nested inside a 255 loop (INRANGE is a loop
after all), thus pushing his solution down to this position:
...
loop Counter = 1 to 255
if not (inrange(counter,48,57) or |
inrange(counter,65,90) or |
inrange(counter,97,122))
CharArray[Counter] = 2
.
.
...
Brice's solution was, to me, the obvious solution, and the one I
thought I would have seen more of. While small and simple, Brice's
code ended up being in the O(ML + MN + MM + MN)
category (due to two inner INSTRINGs, and two inner
UPPERs). The UPPERs could have been
easily moved outside the loop:
...
LOOP myPos = 1 TO LEN(xChars)
myChar = xChars[myPos]
IF NOT INSTRING(myChar, myQualifiers, 1, 1) Then CYCLE.
IF xNoCase Then
myChar = UPPER(myChar)
xText = UPPER(xText)
END!_IF
IF NOT INSTRING(myChar, xText, 1, 1) Then RETURN FALSE.
END!_LOOP
RETURN TRUE
...
Jeff's approach was the same as Brice's, without the two inner
UPPERs (look at the difference that made) - this made
it a O(ML + MN) solution.
Lew's code was the same as Jeff's, but without the fancy and
unnecessary NEWing of strings.
The code is getting fancy now! This is the first entry using the
clib function memchr, which is basically a variant of
INSTRING but only finds a single char in an array of
chars. Geoff's code is a O(NM) solution, but probably
slower than the other memchr solutions due to the use
of the CASE string code. CASE is
notoriously slow with strings (although fine with numbers):
...
LOOP i = 1 TO L:LenChars
TestChar = chars[i]
CASE TestChar
OF 'a' TO 'z' OROF 'A' TO 'Z' OROF '0' TO '9'
if ~MemChr(address(txt),Char2,L:LenTxt) |
then return FALSE. ! exit if no match
END !case
END !loop
...
Brian supplied a O(NM) solution, but the second
fastest at that, basically because it is using a sensible
conditional to filter out the unwanted chars:
...
loc:byte = loc:charsbyte[loc:charsndx]
if loc:byte < 48 |
or (loc:byte > 57 and loc:byte < 65) |
or (loc:byte > 90 and loc:byte < 97) |
or loc:byte > 122
cycle
end
...
Carl had two entries, a O(NM) and a
O(N+M). I included both (this is the
O(NM)) as Carl had stated a preference for the
O(NM) because it was faster in his tests, and indeed
it is the fastest O(NM) solution in this group. I
disagree with Carl as his O(N+M) was faster in my test
cases.
Phil had two entries, nearly identical. I included them both as
they produced a 5% difference (and two places in the test). This
one is a nice O(N+M) solution, local vars do
not have the AUTO attribute and there is one
extra use of VAL. I suspect, and maybe Phil will (no
pun intended) validate, that it is the lack of AUTO
that made the difference.
Mark's approach at first glance looked expensive to me due to the following code:
...
loop CurrByte = 0 to Val('0') - 1
TxtContains[ CurrByte ] = 1
end
loop CurrByte = Val('9') + 1 to Val('A') - 1
TxtContains[ CurrByte ] = 1
end
loop CurrByte = Val('Z') + 1 to Val('a') - 1
TxtContains[ CurrByte ] = 1
end
loop CurrByte = Val('z') + 1 to maximum(TxtContains,1)
TxtContains[ CurrByte ] = 1
end
..
But if you inspect the above closely you will find it is a <
192 iteration loop and an efficient one at that, making his
procedure a O(L + N+M) procedure, where L
was small.
Carl is back with his O(N+M) solution!
Phil is back with his "clean" O(M+N) version. This
version also gets my stamp of approval on readability, as it very
clearly does what it does (apart from the use of "1" for true and
uppercased keywords - but hey, the latter is subjective):
...
IF noCase
txt=UPPER(txt)
chars=UPPER(chars)
END
CLEAR(TxtArray)
! Load Text Array (ignores 0)
LOOP I=1 TO LEN(CLIP(Txt))
TxtArray[Val(Txt[I])]=TRUE
END
! Check VAL
Found=1
LOOP I=1 TO LEN(CLIP(Chars))
ThisChar=VAL(Chars[I])
CASE ThisChar
OF eValZero TO eValNine |
OROF eValA TO eValZ |
OROF eVala_ TO eValz_
IF TxtArray[ThisChar] THEN CYCLE.
Found=FALSE
BREAK
END
END
RETURN FOUND
...
Drum roll please...
Adriaan's entry (a O(M+N) solution obviously) is
over 8 % faster than his nearest rival:
ReturnValue BYTE(TRUE)
cArray BYTE,DIM(256)
c LONG,AUTO
CODE
! loop 1: mark usage of all characters in txt string in cArray
LOOP c = 1 TO LEN(txt)
IF noCase THEN
! check if char is between 'a' and 'z'
! convert to uppercase and mark usage
IF VAL(txt[c]) >= 97 AND VAL(txt[c]) <= 122 THEN
cArray[VAL(txt[c]) - 32] = TRUE
ELSE
cArray[VAL(txt[c])] = TRUE
END
ELSE
cArray[VAL(txt[c])] = TRUE
END
END
! loop 2: check usage of chars that
! need to exist in txt string
! if a char is not used then set returnvalue
! to false and break out of loop
LOOP c = 1 TO LEN(chars)
! filter out all chars that are not
! between 'A' to 'Z', 'a' to 'z'
! and '0' to '9'
IF (VAL(chars[c]) >= 97 AND VAL(chars[c]) <= 122) OR |
(VAL(chars[c]) >= 65 AND VAL(chars[c]) <= 90) OR |
(VAL(chars[c]) >= 48 AND VAL(chars[c]) <= 57) THEN
IF noCase THEN
IF VAL(chars[c]) >= 97 AND VAL(chars[c]) <= 122 THEN
IF NOT cArray[VAL(chars[c]) - 32] THEN
ReturnValue = FALSE
BREAK
END
ELSIF NOT cArray[VAL(chars[c])] THEN
ReturnValue = FALSE
BREAK
END
ELSIF NOT cArray[VAL(chars[c])] THEN
ReturnValue = FALSE
BREAK
END
END
END
RETURN ReturnValue
Why is Adriaan's code so fast? There are two main reasons:
LEN and
VAL are the only functions called. When code is this
tight, calling other procedures becomes an issue.IF statements: Adriaan tells the compiler
exactly what to do. It might produce more machine code for each
path, but each path does exactly one thing.Summary
I hope these entries have provided some new ideas on how to keep
your utility libraries small and fast. It is always worthwhile
looking at any given algorithm and seeing if you can change it into
a better "big-oh" category. One of the problems with a closed
runtime like c55runx is that you can only guess as to the
efficiency of a given library function. It's interesting to see if
you could improve on, say, CLIP or
INSTRING.
Now that the generic benchmarker is in place it may be a good platform for similar exercises! I would like to propose a series of challenges with the ultimate goal of building up a library of utility functions for all to use (I would be keen to see various "container" classes). To this end I would suggest that any future challenge would be permitted to use any available functionality within this library.
Prior to joining TopSpeed Development Centre, Gordon Smith worked for an Irish company developing software for multi-national pharmaceutical companies. He was also a member of the 3rd party accessories program (Compile Manager 2) and developed the Clarion Class Browser. He is currently working with several of the original TopSpeed Developers at Seisint Inc..
|
Posted on Thursday, October 17, 2002 by David LeYanna I love learning. Thanks. Lets see a series.
Posted on Friday, October 18, 2002 by Geoff Robinson Thanks Gordon for the analysis!
Posted on Friday, October 18, 2002 by Brice Schagane Interesting. You indicate that the fewer calls into the c55runx.dll will increase performance. Does this hold true, regardless of how the program is compiled? What is the difference between compiling as LOCAL and STANDALONE? Posted on Friday, October 18, 2002 by Gordon Smith GeoffR,
Posted on Friday, October 18, 2002 by Gordon Smith Brice Schagane,
Posted on Saturday, October 19, 2002 by Brice Thanks for clearing that up. I was always under the impression that the use of dll's caused a significant performance hit. For that reason, I try to stay away from them. Posted on Monday, October 21, 2002 by Adriaan Ieperen Wow, i actually won...
Posted on Monday, October 21, 2002 by Phil Will My final entry missed the deadline, but is just under 20% faster. It uses a data declaration and array for upper and lower evaluation. This is slightly faster the isUpper and isLower. Comparisons are done only if nocase; it then uppers or lowers depending on the initial value.
Posted on Tuesday, October 22, 2002 by Adriaan Ieperen I took another look at my procedure and it turns out that it isn't that good. With a bit of work it can be made to blow the old procedure out of the water.
Posted on Tuesday, October 22, 2002 by Geoff Robinson Adriaan
Posted on Wednesday, October 23, 2002 by Adriaan Ieperen Geoff you're absolutely right. While experimenting with the procedure i completely missed that. I wonder how much difference a case statement will make compared to an if then else structure.
Posted on Tuesday, May 31, 2005 by Carl Barnes "A Rough Guide To Big-Oh Notation." is no long there, he has a new article "Amorphous guide to Big-Oh Notation" http://www.cis.strath.ac.uk/~mdd/teaching/ac/big_oh.html
|
To add a comment to this article you must log in.
Copyright © 1999-2008 by CoveComm Inc. All Rights Reserved. Reproduction in any form without the express written consent of CoveComm Inc., except as described in the subscription agreement, is prohibited.
Clarion Magazine ISSN 1718-9942
One year: $189
(includes all back issues since '99)
Renewals from $139
Two years: $289
Renewals from $239