Published 1999-04-12
When I sent around the initial design proposal for what later became the ABC system, one of the claimed benefits was a code reduction in user procedures of around half to two thirds. At the time this was viewed with some skepticism, and so we set 30 percent as a reasonable goal. In the end we actually achieved around 92-94 percent, and the field class was chiefly responsible for the extra 30 percent.
To appreciate why, you need to consider how certain parts of the
ABC system would be coded if the FieldClass
didn't exist. For the sake of concreteness I am going to use
the example I used many moons ago when I was trying to persuade Tom
Moseley that OOP could work in the templates.
One constant bugbear in CW2 was overflow of the appname_ru (referential integrity, or RI) module (to >64K) on any sizeable or complex dictionary. One aim of OOP was to reduce this problem. Although there were other procedures, the heart of the referential integrity code can be shown by pseudo-code for the RI update function.
Listing 1 assumes F1 & F2 are the related files (on the keys F1:K & F2:K with 2 & 3 components respectively, KC1, KC2, KC3 etc).
  CLEAR(F2:KC3,-1) ! Clear minor-most components
  F2:KC2 = F1:KC2
  F2:KC1 = F1:KC1
  SET(F2:K,F2:K)
  LOOP
    NEXT(F2)
    IF F2:KC1 <> F1:KC1 OR F2:KC2 <> F1:KC2 THEN
      BREAK ! No longer meeting range
    END
    F2:KC2 = New:F1:KC2
    F2:KC1 = New:F1:KC1
    Cascade_Updates
  END
I have left out several vital details but this is enough to show the nature of the problem. This code fragment (and every bit of file IO/error handling that goes with it) appears for every relation with RI restrictions on it. If you have 100 files, 250 relations means 250 copies of the code. It's not too surprising RI code frequently blows the segment limits in the legacy templates.
The challenge is to "proceduralize" the above code.
When trying to abstract out an algorithm I like to go through the code colouring the lines. I use three colours: blue for base classes; green for parameterized base classes; black for instance-specific code. In my first pass over the code I just pick out the blue stuff: those lines of code which will always be the same no matter which of the 250 copies of the code I am looking at.
  CLEAR(F2:KC3,-1) ! Clear minor-most components
  F2:KC2 = F1:KC2
  F2:KC1 = F1:KC1
  SET(F2:K,F2:K)
  LOOP
    NEXT(F2)
    IF F2:KC1 <> F1:KC1 OR F2:KC2 <> F1:KC2 THEN
      BREAK ! No longer meeting range
    END
    F2:KC2 = New:F1:KC2
    F2:KC1 = New:F1:KC1
    Cascade_Updates
  END
This ranks as grim. The vast bulk of the code is actually different between the 250 copies. You could put the loop in the base class and call out for the header and loop body, but the total lines of code (once you've allowed for two new procedures) actually goes up. Listing 3 shows five lines of code that replace the three blue lines.
RelationClass.UpdateSecondary PROCEDURE
  CODE
  ! Virtual call - override for every relation
  SELF.UpdateSecondaryInit
  LOOP
    IF SELF.UpdateSecondaryIterate THEN BREAK.
  END
Perhaps the green pen will yield better results. This time I can colour lines with variables provided I then add a parameter to the procedure prototype to allow the value to be substituted.
  CLEAR(F2:KC3,-1) ! Clear minor-most components
  F2:KC2 = F1:KC2
  F2:KC1 = F1:KC1
  SET(F2:K,F2:K)
  LOOP
    NEXT(F2)
    IF F2:KC1 <> F1:KC1 OR F2:KC2 <> F1:KC2 THEN
      BREAK ! No longer meeting range
    END
    F2:KC2 = New:F1:KC2
    F2:KC1 = New:F1:KC1
    Cascade_Updates
  END
There is a subtlety here. Why didn't I colour the
CLEAR? Because the number of components to be cleared
is not a function of the relation, it is a function of the key used
by the secondary file. Thus you cannot readily parameterize it. So
now you get the code shown in Listing 5.
RelationClass.UpdateSecondary PROCEDURE( |
    File F, Key K, *? F1Field1, *? F1Field2, |
    *? F2Field1, *? F2Field2, ? New1, ? New2)
  CODE
  SELF.UpdateSecondaryClear ! Virtual call - override for every relation
  F2Field1 = F1Field1
  F2Field2 = F1Field2
  SET(K,K) ! Position on the key, as in Listing 1
  LOOP
    NEXT(F)
    IF F2Field1 <> F1Field1 OR F2Field2 <> F1Field2 THEN
      BREAK ! No longer meeting range
    END
    F2Field2 = New2
    F2Field1 = New1
    SELF.Cascade ! Virtual
  END
Now each of the 250 code lumps becomes two small virtual procedures and one base class call (with 8 parameters). This cuts down the line count, although the actual amount of code generated is still quite high. Each *? parameter costs 30+ bytes, so 6 of them is 180 bytes. I have also snuck a little bug past you: I have been assuming throughout that there are two linking fields. There can (of course) be 1, or 3, or 4, etc. So you need copies of the UpdateSecondary procedure (and the other four) for each possible number of pairs of fields.
Now I have greatly shrunk the code (i.e. there will be no more 64K problems) and just about everything has been abstracted. Every now and then someone will call to complain that ABC doesn't support 9 linking fields in a relation and we can simply write a new UpdateSecondary9 (with 27 *? parameters at 600+ bytes per call).
But code abstraction doesn't have to end here. This design has UpdateSecondary1, UpdateSecondary2 etc and these procedures are really the same except in the number of parameters passed in. You can write a generalized UpdateSecondary procedure (except it won't compile!) as shown in Listing 6.
RelationClass.UpdateSecondaryN PROCEDURE(
    File F, Key K, (*? F1Field1, *? F2Field1, ? New1) * N)
  CODE
  SELF.UpdateSecondaryClear ! Virtual call - override for every relation
  LOOP N Times
    F2FieldN = F1FieldN
  END
  LOOP
    NEXT(F)
    LOOP N Times
      IF F2FieldN <> F1FieldN THEN BREAK OuterLoop.
    END
    LOOP N Times
      F2FieldN = NewN
    END
    SELF.Cascade ! Virtual
  END
Listing 6 won't compile because that isn't a legal procedure prototype. But you can see the idea - I want to be able to pass in any number of fields without having to define the fields ahead of time.
Another place that presented problems was the browse code. Most of the engine can disappear into a procedure (Bruce actually did this for CDD3.0). However there are three very large routines you cannot take down: filling a browse queue from data; filling the record buffer from the browse; and seeing if any data in the browse queue has changed. These three routines essentially look like this:
  BrowseQ:Field1 = File:Field1
  BrowseQ:Field2 = File:Field2
  BrowseQ:Field3 = OtherFile:Field7
where "fill buffer" goes the opposite way to "reset buffer."
Easy you say, that's the same as parameterizing. But think
about it! Restricting the number of linking fields to 9 is one
thing, but the number of browse columns? We would have to go up to
100 just to avoid getting shot by the alpha testers! On the other
hand if only I could get the LOOP N Times code
from Listing 6 to compile then this really would be so easy.
(Some of you may think we could use the :=: syntax
to move across the corresponding fields. In general that
doesn't work because it doesn't allow for browse columns
defined by local variables. It also suffers if you have two files
in the browse with clashing field names).
In my opinion it was this problem that killed the CDD browse engine. Because the engine had to call back to the main code so frequently to do almost anything (and they didn't have the virtual mechanism to clean things up) the code became almost impenetrable. So the engine died, the inline browse appeared, and the browse procedure became our main bugbear for over five years.
What's The Real Requirement?
The job then becomes one of defining what it actually is about
the LOOP N Times code that will solve the problem. I
think it comes down to the following:
I need to be able to pass around a list of one or more field pairs which can then be manipulated as a single entity.
Think about those last two words; they are the key. If I can
embody the LOOP N Times into a single line of code
then I have the problem cracked.
My expression field pairs also betrays another
consideration. In the browse case there are only ever two fields
that are really interesting; for the RI code there are three
interesting values (child fields, parent fields, new parent
fields). The prototype for the UpdateSecondary is also
interesting. Note that the fields pertaining to the files are
prototyped as *?, meaning they can be assigned to and from. The new
fields are only ever used by value. It turns out that (in this
example, at least) there are typically 4 different cases:
CLEAR(keycomponent) into the base class as well!
Because the fourth case is much heavier than the others (although related), we decided to assign it to its own class, which is derived from the field pairs class.
The Implementation - Any Ideas?
In order to understand how this class works you certainly need
to understand queues, but you also need to understand the
ANY datatype. This is given excellent coverage in
the manuals, which I shan't repeat. However, the key here is
this: an ANY can act like a *? parameter
OR a ? parameter, depending upon how you assign to
it.
Specifically, if an ANY variable
is NULL (has no value) then a straight value
assignment to it produces a value ANY, while a
reference assignment to it produces a variant ANY. Listing 7 shows
an example.
  MyAny &= NULL
  Field = 22
  MyAny = Field   ! MyAny = 22
  Field = 42      ! MyAny = 22
  MyAny = 50      ! Field = 42, MyAny = 50
  MyAny &= Field  ! MyAny = 42
  Field = 62      ! MyAny = 62
  MyAny = 72      ! Field = 72
Warning: CLEAR(MyAny) is equivalent to
MyAny = 0. It is NOT the same as MyAny &=
NULL.
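The difference in the warning can be seen in a few lines. This is only a minimal sketch, assuming a small hand-coded program; Field and MyAny are the names from Listing 7, the message texts are illustrative, and the comments state what the rules above predict rather than captured output.

```clarion
  PROGRAM
  MAP
  END

MyAny   ANY
Field   LONG

  CODE
  Field = 42
  MyAny &= Field      ! reference assignment: MyAny is now a variant ANY on Field
  CLEAR(MyAny)        ! per the warning this behaves like MyAny = 0,
                      ! so the assignment goes through to Field
  MESSAGE('After CLEAR, Field = ' & Field)

  Field = 42
  MyAny &= NULL       ! detaches MyAny from Field; Field itself is not assigned
  MyAny = 7           ! MyAny was NULL, so this makes it a value ANY
  MESSAGE('After &= NULL, Field = ' & Field)

  MyAny &= NULL       ! release the value ANY before the program ends
```

The practical consequence is that CLEAR is the wrong tool for detaching an ANY from a field: it overwrites the field instead.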
FieldPairsClass.Init PROCEDURE
The Init procedure is simple enough to use. It
creates the queue that forms the basis of the class. A slight
oddity is the call to Kill first. This is to allow a
FieldPairsClass to be used and reused within a
procedure. (Effectively Init acts as a
Reset.)
FieldPairsClass.AddItem PROCEDURE(*? Left)
There are two notional AddItem methods (the second
called AddPair). This one is used for cases 1 & 2.
Note the ASSERT to ensure Init has been
called. The CLEAR is dealing with some (rather nasty)
memory management issues when dealing with ANY in
queues (see the manual). The incoming variable is
&= into the left hand queue element. It is then
= into the right hand element. This distinction is
crucial (see above). It means that simply
AddIteming a field is enough to snap-shot it so that
it can be reset (or tested for difference) at a later stage. The
parameter is called Left because you can think of it as something
you can assign into (and which therefore appears on the left hand
side of an assignment (=) operator).
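The reference-then-value step described above can be sketched outside the class. This is an illustration only, not the shipped ABC source: PairsQ stands in for the class's internal queue, AddItemSketch is a hypothetical free-standing version of the method (so there is no ASSERT on Init), and Name is a made-up variable.

```clarion
  PROGRAM
  MAP
AddItemSketch PROCEDURE(*? Left)   ! hypothetical stand-in for AddItem
  END

PairsQ  QUEUE                      ! stand-in for the class's internal list
Left      ANY
Right     ANY
        END
Name    STRING(20)

  CODE
  Name = 'Smith'
  AddItemSketch(Name)              ! the snapshot is taken here
  Name = 'Jones'                   ! later change to the real variable
  GET(PairsQ,1)
  ! Left follows Name; Right keeps the value captured at add time
  MESSAGE('Left = ' & CLIP(PairsQ.Left) & ', Right = ' & CLIP(PairsQ.Right))
  PairsQ.Left  &= NULL             ! teardown: null the ANYs, then free
  PairsQ.Right &= NULL
  FREE(PairsQ)

AddItemSketch PROCEDURE(*? Left)
  CODE
  CLEAR(PairsQ)                    ! start from a clean queue buffer
  PairsQ.Left  &= Left             ! reference: tracks the caller's variable
  PairsQ.Right  = Left             ! value: a copy of what it holds right now
  ADD(PairsQ)
```

The two assignments differ by a single character, which is why the distinction is worth spelling out.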
FieldPairsClass.AddPair PROCEDURE(*? Left,*? Right)
This method is used for variant 3. Other comments are the same
as AddItem. Note also that in this case left &
right do not have any real significance. It is just a
non-suggestive way of labeling the two entities.
FieldPairsClass.AssignLeftToRight PROCEDURE
This procedure is really meaningless in variant one (actually it converts a variant one into a variant two). In variant 2 this can be seen as a way of snapshotting the current values of all the variables. In variant 3 all the values from the variables passed in as 'lefts' will be copied into the variables passed in as 'rights'.
Warning: Note the PUT after the assignment.
This is because an assignment to an ANY variable can
actually change the memory block allocated to the ANY.
Hence you have to store the queue after an assignment even if
you know the ANY is a variant
ANY.
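A sketch of the loop that this warning is about, written against a plain queue rather than the class. PairsQ, FileName and BrowseName are hypothetical names, and the pair is set up as a variant 3 entry (both sides are references).

```clarion
  PROGRAM
  MAP
  END

PairsQ      QUEUE                  ! stand-in for the class's internal list
Left          ANY
Right         ANY
            END
FileName    STRING(20)             ! hypothetical "left" variable
BrowseName  STRING(20)             ! hypothetical "right" variable
I           LONG

  CODE
  FileName = 'Smith'
  CLEAR(PairsQ)
  PairsQ.Left  &= FileName         ! both elements are reference-assigned
  PairsQ.Right &= BrowseName
  ADD(PairsQ)

  LOOP I = 1 TO RECORDS(PairsQ)
    GET(PairsQ,I)
    PairsQ.Right = PairsQ.Left     ! value assignment through the references
    PUT(PairsQ)                    ! store the entry back, as the warning requires
  END
  MESSAGE('BrowseName = ' & CLIP(BrowseName))

  LOOP I = 1 TO RECORDS(PairsQ)    ! teardown: null the ANYs before freeing
    GET(PairsQ,I)
    PairsQ.Left  &= NULL
    PairsQ.Right &= NULL
  END
  FREE(PairsQ)
```

AssignRightToLeft would be the same loop with the assignment reversed.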
FieldPairsClass.AssignRightToLeft PROCEDURE
Again the use of this suggests you are not really in the variant
1 case. In variant 2 it has the effect of restoring all the
variables passed in as Lefts to the values they had when an
AssignLeftToRight was last done (which could be the
implicit one at the AddItem point). In variant three
this is an assignment from the variables passed as Rights to the
variables passed as Lefts.
FieldPairsClass.ClearLeft PROCEDURE
This has the same effect for all three variants, it
CLEARs the variables passed in as Lefts. This is not
the same as assigning to zero, because the left-hand side could be
a string. It is also not quite the same as assigning to a
blank string (consider Cstrings & Pstrings). Now you could
argue that it is the same as assigning to a zero length
string, which is true, but only by coincidence. This illustrates
one of the big pitfalls of having a language "guru" doing low-level
classes. You can use your low-level knowledge to build assumptions
into the system that are not required. The fact that
presently all Clarion data-types can be CLEARed
by assigning a zero length string is a very dangerous fact to build
into a set of base classes (consider what would happen if you could
pass a mixed-type group as a *?). The clearing mechanism is there
to protect you from such assumptions, so the base classes use the
full language facility where they can.
Note further that CLEAR(SELF.List.Left) is very
different from SELF.List.Left &= NULL (see above).
FieldPairsClass.ClearRight PROCEDURE
In variant 2 this clears the buffer values, in variant 3 it
clears the variables passed in as Rights. This method is subject to
the same considerations as ClearLeft.
FieldPairsClass.EqualLeftRight PROCEDURE
In variant 2 this compares the current values in each of the Lefts against the last snap-shotted values. It returns a zero if any of the values differ. In variant 3 it compares each Left-Right pair passed in and returns a zero if there are any differences.
Note that this procedure effectively does a short-circuit evaluation which means the function returns as soon as a deviation is found. It demonstrates one of the reasons that I believe certain programming mantras can and should be violated in a controlled environment.
First the controlled environment. EqualLeftRight is
10 lines long, it fits on one screen and (I claim) should be
understandable in one bite by a half-way competent programmer.
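For reference, the early-return shape being defended can be sketched as follows. This is an illustration only and may differ from the shipped source: EqualSketch is a hypothetical free-standing version of the method, PairsQ stands in for SELF.List, and the four variables are made up.

```clarion
  PROGRAM
  MAP
EqualSketch PROCEDURE(),BYTE       ! hypothetical stand-in for EqualLeftRight
  END

PairsQ  QUEUE                      ! stand-in for the class's internal list
Left      ANY
Right     ANY
        END
A1      LONG(1)
B1      LONG(1)
A2      STRING('Smith')
B2      STRING('Jones')
N       LONG

  CODE
  CLEAR(PairsQ)
  PairsQ.Left &= A1
  PairsQ.Right &= B1
  ADD(PairsQ)
  CLEAR(PairsQ)
  PairsQ.Left &= A2
  PairsQ.Right &= B2
  ADD(PairsQ)
  IF EqualSketch()
    MESSAGE('All pairs match')
  ELSE
    MESSAGE('At least one pair differs')
  END
  LOOP N = 1 TO RECORDS(PairsQ)    ! teardown: null the ANYs before freeing
    GET(PairsQ,N)
    PairsQ.Left &= NULL
    PairsQ.Right &= NULL
  END
  FREE(PairsQ)

EqualSketch PROCEDURE
I LONG,AUTO
  CODE
  LOOP I = 1 TO RECORDS(PairsQ)
    GET(PairsQ,I)
    IF PairsQ.Left <> PairsQ.Right
      RETURN 0                     ! first difference found: stop looking
    END
  END
  RETURN 1                         ! reached the end: every pair matched
```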
Now for the mantra. Good structured programming will teach you that any given procedure should have precisely one entry point and precisely one exit point. This procedure has two exit points. Why? Certainly efficiency, and also (I claim) clarity. Consider the obvious alternative in Listing 8.
FieldPairsClass.EqualLeftRight PROCEDURE
I UNSIGNED,AUTO
B BYTE(1)
  CODE
  LOOP I = 1 TO RECORDS(SELF.List)
    GET(SELF.List,I)
    IF SELF.List.Left <> SELF.List.Right
      B = 0
    END
  END
  RETURN B
Now the method has the required one exit point. However there is
an extra line of code and there are two extra assignments
(BYTE(1) is an implicit assignment). But the real pain
is more subtle. Imagine a big field list (100 fields) in which you
are checking for a difference (say after an Edit-In-Place operation
on a browse). This code will check all 100 fields even if the first
one sets B to zero!
So you end up having to put a BREAK into the
IF condition or code an UNTIL at the tail
of the LOOP. The latter is less efficient still. The
former is efficient but if you now draw a flow diagram of
your algorithm you will find exactly the same logical structure as
coding a RETURN but it took you 20% longer to say
it!
This brings me to the Bayliss mantra: keep it short and to the point, but then don't compromise!
FieldPairsClass.Equal PROCEDURE
This is simply a logical short-hand for people using the
FieldPairsClass as opposed to the
BufferedPairsClass (where the explicit
LeftRight is helpful).
FieldPairsClass.Kill PROCEDURE
Check over this code. The destruction sequence of queues with
ANYs needs careful work. First you have to null out
all of the ANY variables, then you can dispose of the list.
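The destruction order, together with the Init-calls-Kill arrangement described earlier, can be sketched like this. It is a minimal illustration under stated assumptions, not the ABC source: PairsQueue and PairsSketch are hypothetical names, and the entry is added by hand in the main code to keep the example short.

```clarion
  PROGRAM
  MAP
  END

PairsQueue  QUEUE,TYPE             ! assumed layout: two ANY fields per entry
Left          ANY
Right         ANY
            END

PairsSketch CLASS,TYPE             ! hypothetical stand-in for FieldPairsClass
List          &PairsQueue
Init          PROCEDURE
Kill          PROCEDURE
            END

Pairs   PairsSketch
Name    STRING(20)

  CODE
  Pairs.Init
  CLEAR(Pairs.List)
  Pairs.List.Left &= Name
  Pairs.List.Right = Name
  ADD(Pairs.List)
  Pairs.Init                       ! reuse: Kill runs first, then a fresh queue
  MESSAGE('Entries after second Init: ' & RECORDS(Pairs.List))
  Pairs.Kill

PairsSketch.Init PROCEDURE
  CODE
  SELF.Kill                        ! makes Init safe to call more than once
  SELF.List &= NEW PairsQueue

PairsSketch.Kill PROCEDURE
I LONG,AUTO
  CODE
  IF SELF.List &= NULL             ! never initialised, or already killed
    RETURN
  END
  LOOP I = 1 TO RECORDS(SELF.List)
    GET(SELF.List,I)
    SELF.List.Left  &= NULL        ! null every ANY first ...
    SELF.List.Right &= NULL
  END
  DISPOSE(SELF.List)               ! ... and only then dispose of the list
  SELF.List &= NULL
```

The guard at the top of Kill is what lets Init call it unconditionally.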
BufferedFieldClass
This class is really just an extension to the
FieldClass to handle case 4. Two fields are paired and
there is a shadow third value. In some ways this makes it easier to
understand than the FieldClass. If ever Left or Right
are assigned to/from then it is the values in the underlying fields
that are being used. Buffer means the shadow, which never
affects any values in the "real" program.
The BufferedFieldClass is derived from the
FieldClass; that is to say whenever a buffered field
class is being used without reference to the shadow value you can
simply call the same functions as you would for a case 3 of the
field class. The buffered field class is an extension for the case
when buffering is needed. Now we could simply have implemented the
BufferedFieldClass and used it for cases 1 through 3.
The main reason we didn't is one of efficiency.
ANY variables work extremely slowly compared to
standard Clarion variables (about 30x slower, or similar to Visual
Basic) and therefore maintaining an extra 1 or 2 for the very
common cases (1 through 3) was deemed unwise. The separation also
enables the field class to have a relatively small, clean
interface.
BufferedPairsClass.Init PROCEDURE
This procedure demonstrates a simple problem, with a simple
enough fix, but to the unwary it can be very confusing. The
FieldClass contains a reference to a
FieldPairs queue (with Left & Right
ANYs). This is NEW/DISPOSEd
in the FieldClass Init and Kill methods.
The BufferedFieldPairs class has a reference to a
buffered queue with three fields. Now here is the problem: if the
FieldClass and BufferedFieldClass both
have Init and Kill called then there will
be two separate queues pointed to by two separate references. So
the BufferedFieldClass Init method does
not call its parent. As a result there is only one copy of
the queue.
But there is a subtler problem. Suppose the Equal
method is called. This drills down to
FieldPairs.EqualLeftRight which expects the
SELF.List reference to be filled in, which it
won't be. Bang!
Here is the fix. The BufferedPairsQueue is (very
deliberately) just the FieldPairsQueue with extra
fields added. The Init method &= the List in the
FieldPairsClass to the RealList in the
derived class. Now the methods in FieldClass can
access the same queue as those of the
BufferedFieldClass but via a different reference.
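The aliasing can be sketched as below. Treat it as an assumption-laden illustration: all names are hypothetical, the buffered queue is declared here by deriving from the plain one (the article only says it has the same fields plus extras), and the sketch assumes the compiler accepts a reference assignment between the two queue types, as the description above implies. Entries is an invented parent method that only knows about List.

```clarion
  PROGRAM
  MAP
  END

PairsQueue     QUEUE,TYPE
Left             ANY
Right            ANY
               END
BufferedQueue  QUEUE(PairsQueue),TYPE   ! same leading fields plus a shadow value
Buffer           ANY
               END

PairsSketch    CLASS,TYPE               ! hypothetical parent class
List             &PairsQueue
Entries          PROCEDURE,LONG         ! invented method that only uses List
               END

BufferedSketch CLASS(PairsSketch),TYPE  ! hypothetical derived class
RealList         &BufferedQueue
Init             PROCEDURE
Kill             PROCEDURE
               END

Buffered  BufferedSketch

  CODE
  Buffered.Init
  CLEAR(Buffered.RealList)
  ADD(Buffered.RealList)                ! one empty entry, added via the derived reference
  MESSAGE('Parent sees ' & Buffered.Entries() & ' entry')
  Buffered.Kill

PairsSketch.Entries PROCEDURE
  CODE
  RETURN RECORDS(SELF.List)             ! works only if List has been filled in

BufferedSketch.Init PROCEDURE
  CODE
  SELF.RealList &= NEW BufferedQueue    ! the only queue that is ever allocated
  SELF.List &= SELF.RealList            ! the parent's reference aliases the same queue

BufferedSketch.Kill PROCEDURE
  CODE
  ! The single entry in this demo holds only null ANYs; real code would
  ! null Left, Right and Buffer in every entry before disposing.
  DISPOSE(SELF.RealList)
  SELF.RealList &= NULL
  SELF.List &= NULL                     ! the alias must not outlive the queue
```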
Tech note: A particularly nice feature of queue and class
references is that they contain type information. Thus
CLEAR(MyQueueReference) will always clear the whole
queue buffer. Similarly ADD(Queue) works on the
whole queue.
BufferedPairsClass.AddPair PROCEDURE(*? Left,*? Right)
This method overrides the equivalent method in the base class.
Later versions of it actually contain some rather intricate code to
fix a subtle bug that I missed on the first lap. All the code is
really trying to do is reference assign Left and Right (as per the
parent function) and then CLEAR the buffer value
(because I don't know whether to assign it to Left or Right).
But the question becomes, what does it mean to clear an ANY
variable? (See the discussion on ClearLeft.) What I really want
is to assign it to a value which will compare equal to the Left or
Right variables if they have been cleared. The only general way I
could think of doing this was to clear the Right variable and then
assign it to the buffer. Of course people might object to me doing
that so I temp-store it first.
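The temp-store trick can be shown with plain variables. This is a sketch of the idea only, not the method itself (where Right arrives as a *? parameter and the buffer is a queue element); Right, Buffer and Temp are stand-in names.

```clarion
  PROGRAM
  MAP
  END

Right   STRING(20)     ! stand-in for the field passed in as Right
Buffer  ANY            ! stand-in for the queue's buffer element
Temp    ANY            ! temporary store for the caller's data

  CODE
  Right = 'Smith'
  Temp = Right         ! value assignment to a NULL ANY: keeps a copy
  CLEAR(Right)         ! let the language decide what "cleared" means here
  Buffer = Right       ! Buffer now holds a value equal to a cleared Right
  Right = Temp         ! put the caller's data back
  IF Buffer <> Right
    MESSAGE('Buffer differs from Right, because Right is not blank')
  END
  Temp &= NULL         ! release both value ANYs
  Buffer &= NULL
```

Because the cleared value is produced by CLEAR on the real variable, the sketch makes no assumption about what a cleared value looks like for any particular data type.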
I think the other methods are fairly self-explanatory given the FieldClass explanations.
ANY variables (and type polymorphism) are key
strengths of the Clarion language that make it possible to code
complex database algorithms in a totally generic and safe way. The
two field classes extend this paradigm up to lists of field pairs.
If you scan the ABC sources you will find the field pairs classes
are intrinsic to files, browses, drop combos and edit-in-place. If
you scan generated source you will find AddPairs
popping up very frequently. The combined effect of these facts is
that most procedures can be generated without any need to derive
the browse or file objects. This simplifies and reduces the amount
of code required to use these classes and gives Clarion an
implementation edge (from template or hand-code) over C++, VB and
Object Pascal.
I hope this FieldClass design overview has given you an insight into one of the fundamental building blocks of the ABC system.
David Bayliss is a Systems Architect for The TopSpeed Development Center. He has worked upon TopSpeed's compiler and was the chief architect of the Application Builder Classes.
Copyright © 1999-2008 by CoveComm Inc. All Rights Reserved. Reproduction in any form without the express written consent of CoveComm Inc., except as described in the subscription agreement, is prohibited.
Clarion Magazine ISSN 1718-9942