David Bayliss on FieldClass

by David Bayliss

Published 1999-04-12

When I sent around the initial design proposal for what later became the ABC system, one of the claimed benefits was a code reduction in user procedures of around half to two thirds. At the time this was viewed with some skepticism, so we set 30 percent as a reasonable goal. In the end we actually achieved around 92-94 percent, and the field class was chiefly responsible for the extra 30 percent.

To appreciate why, you need to consider how certain parts of the ABC system would be coded if the FieldClass didn't exist. For the sake of concreteness I am going to use the example I used many moons ago when I was trying to persuade Tom Moseley that OOP could work in the templates.

Example 1 : Updating a link field

One constant bugbear in CW2 was overflow of the appname_ru (referential integrity, or RI) module (to >64K) on any sizeable or complex dictionary. One aim of OOP was to reduce this problem. Although there were other procedures, the heart of the referential integrity code can be shown by pseudo-code for the RI update function.

Listing 1 assumes F1 & F2 are the related files (on the keys F1:K & F2:K with 2 & 3 components respectively, KC1, KC2, KC3 etc).

Listing 1. Code to cascade RI updates.
CLEAR(F2:KC3,-1) ! Clear minor-most components
F2:KC2 = F1:KC2
F2:KC1 = F1:KC1
SET(F2:K,F2:K)
LOOP
  NEXT(F2)
  IF F2:KC1 <> F1:KC1 OR F2:KC2 <> F1:KC2 THEN 
    BREAK ! No longer meeting range 
  END
  F2:KC2 = New:F1:KC2
  F2:KC1 = New:F1:KC1
  Cascade_Updates
END

I have left out several vital details but this is enough to show the nature of the problem. This code fragment (and every bit of file IO/error handling that goes with it) appears for every relation with RI restrictions on it. If you have 100 files and 250 relations, you get 250 copies of the code. It's not too surprising that RI code frequently blows the segment limits in the legacy templates.

The challenge is to "proceduralize" the above code.

When trying to abstract out an algorithm I like to go through the code colouring the lines. I use three colours: blue for base classes; green for parameterized base classes; black for instance specific code. In my first pass over the code I just pick out the blue stuff: those lines of code which will always be the same no matter which of the 250 copies of the code I am looking at.

Listing 2. Code to cascade RI updates with common code in blue.
CLEAR(F2:KC3,-1) ! Clear minor-most components
F2:KC2 = F1:KC2
F2:KC1 = F1:KC1
SET(F2:K,F2:K)
LOOP
   NEXT(F2)
   IF F2:KC1 <> F1:KC1 OR F2:KC2 <> F1:KC2 THEN 
     BREAK ! No longer meeting range 
   END
   F2:KC2 = New:F1:KC2
   F2:KC1 = New:F1:KC1
   Cascade_Updates
END

This ranks as grim. The vast bulk of the code is actually different between the 250 copies. You could put the loop in the base class and call out for the header and loop body, but the total lines of code (once you've allowed for two new procedures) actually goes up. Listing 3 shows five lines of code that replace the three blue lines.

Listing 3. Method to call RI virtuals.
RelationClass.UpdateSecondary PROCEDURE
  CODE
  ! Virtual call - override for every relation
  SELF.UpdateSecondaryInit
  LOOP
    IF SELF.UpdateSecondaryIterate THEN BREAK.
  END

Perhaps the green pen will yield better results. This time I can colour lines with variables provided I then add a parameter to the procedure prototype to allow the value to be substituted.

Listing 4. Parameterized and base class code in green.
CLEAR(F2:KC3,-1) ! Clear minor-most components
F2:KC2 = F1:KC2
F2:KC1 = F1:KC1
SET(F2:K,F2:K)
LOOP
  NEXT(F2)
  IF F2:KC1 <> F1:KC1 OR F2:KC2 <> F1:KC2 THEN
    BREAK ! No longer meeting range
  END
  F2:KC2 = New:F1:KC2
  F2:KC1 = New:F1:KC1
  Cascade_Updates
END

There is a subtlety here. Why didn't I colour the CLEAR? Because the number of components to be cleared is not a function of the relation, it is a function of the key used by the secondary file. Thus you cannot readily parameterize it. So now you get the code shown in Listing 5.

Listing 5. Parameterized method to handle RI update.
RelationClass.UpdateSecondary PROCEDURE(
               File F, Key K, *? F1Field1, *? F1Field2,
               *? F2Field1, *? F2Field2, ? New1, ? New2)
  CODE
  SELF.UpdateSecondaryClear ! Virtual call - override for every relation
  F2Field1 = F1Field1
  F2Field2 = F1Field2
  SET(K,K) ! Position on the first matching secondary record
  LOOP
    NEXT(F)
    IF F2Field1 <> F1Field1 OR F2Field2 <> F1Field2 THEN
      BREAK ! No longer meeting range
    END
    F2Field2 = New2
    F2Field1 = New1
    SELF.Cascade ! Virtual
  END

Now each of the 250 code lumps becomes two small virtual procedures and one base class call (with 8 parameters). This cuts down the line count, although the actual amount of code generated is still quite high. Each *? parameter costs 30+ bytes, so 6 of them come to 180 bytes or more. I have also snuck a little bug past you. I have been assuming throughout that there are two linking fields. There can (of course) be 1, or 3 or 4 etc. So you need copies of the UpdateSecondary procedure (and the other four) for each possible number of pairs of fields.

Now I have greatly shrunk the code (i.e. there will be no more 64K problems) and just about everything has been abstracted. Every now and then someone will call to complain that ABC doesn't support 9 linking fields in a relation, and we can simply write a new UpdateSecondary9 (with 27 *? parameters at 600+ bytes per call).

But code abstraction doesn't have to end here. This design has UpdateSecondary1, UpdateSecondary2 etc and these procedures are really the same except in the number of parameters passed in. You can write a generalized UpdateSecondary procedure (except it won't compile!) as shown in Listing 6.

Listing 6. A general UpdateSecondary procedure.
RelationClass.UpdateSecondaryN PROCEDURE(
                     File F, Key K, (*? F1Field1,   *? F2Field1, ? New1) * N)
  CODE
  SELF.UpdateSecondaryClear ! Virtual call- override for every relation
  LOOP N Times
    F2FieldN = F1FieldN
  END
  LOOP
    NEXT(F)
    LOOP N Times
      IF F2FieldN <> F1FieldN THEN BREAK OuterLoop.
    END
    LOOP N Times
      F2FieldN = NewN
    END
    SELF.Cascade ! Virtual
  END

Listing 6 won't compile because that isn't a legal procedure prototype. But you can see the idea - I want to be able to pass in any number of fields without having to define the fields ahead of time.

Example 2 : Formatting a browse line

Another place that presented problems was the browse code. Most of the engine can disappear into a procedure (Bruce actually did this for CDD3.0). However there are three very large routines you cannot take down: filling a browse queue from data; filling the record buffer from the browse; and seeing if any data in the browse queue has changed. These three routines essentially look like this :

BrowseQ:Field1 = File:Field1
BrowseQ:Field2 = File:Field2
BrowseQ:Field3 = OtherFile:Field7

where "fill buffer" goes the opposite way to "reset buffer."

Easy you say, that's the same as parameterizing. But think about it! Restricting the number of linking fields to 9 is one thing, but the number of browse columns? We would have to go up to 100 just to avoid getting shot by the alpha testers! On the other hand if only I could get the LOOP N Times code from Listing 6 to compile then this really would be so easy.

(Some of you may think we could use the :=: syntax to move across the corresponding fields. In general that doesn't work because it doesn't allow for browse columns defined by local variables. It also suffers if you have two files in the browse with clashing field names.)

In my opinion it was this problem that killed the CDD browse engine. Because the engine had to call back to the main code so frequently to do almost anything (and they didn't have the virtual mechanism to clean things up) the code became almost impenetrable. So the engine died, the inline browse appeared, and the browse procedure became our main bugbear for over five years.

What's The Real Requirement?

The job then becomes one of defining what it actually is about the LOOP N Times code that will solve the problem. I think it comes down to the following :

I need to be able to pass around a list of one or more field pairs which can then be manipulated as a single entity.

Think about those last two words; they are the key. If I can embody the LOOP N Times into a single line of code then I have the problem cracked.
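As a minimal sketch of that requirement (not the ABC implementation), the program below keeps each pair as a queue entry of two ANY fields, so a single call does the work of the LOOP N Times. PairQueue, AddPair, AssignRightToLeft, FreePairs and the variable names are all hypothetical, and the sketch assumes the ANY reference behaviour and the queue housekeeping described later in this article.

```clarion
  PROGRAM

! Hypothetical queue type for this sketch: one entry per field pair
PairQueue         QUEUE,TYPE
Left                ANY
Right               ANY
                  END

  MAP
AddPair           PROCEDURE(PairQueue Q,*? Left,*? Right)
AssignRightToLeft PROCEDURE(PairQueue Q)
FreePairs         PROCEDURE(PairQueue Q)
  END

Pairs             QUEUE(PairQueue)
                  END
BrwName           STRING(20)   ! stands in for a browse queue field
RecName           STRING(20)   ! stands in for a file field
BrwAge            LONG
RecAge            LONG

  CODE
  RecName = 'Smith'
  RecAge  = 42
  AddPair(Pairs,BrwName,RecName)      ! any number of pairs, of any type
  AddPair(Pairs,BrwAge,RecAge)
  AssignRightToLeft(Pairs)            ! the whole LOOP N Times in one line
  MESSAGE(CLIP(BrwName) & ', ' & BrwAge)
  FreePairs(Pairs)

AddPair PROCEDURE(PairQueue Q,*? Left,*? Right)
  CODE
  CLEAR(Q)                            ! start from empty ANYs
  Q.Left  &= Left                     ! reference to the caller's variable
  Q.Right &= Right
  ADD(Q)

AssignRightToLeft PROCEDURE(PairQueue Q)
I UNSIGNED,AUTO
  CODE
  LOOP I = 1 TO RECORDS(Q)
    GET(Q,I)
    Q.Left = Q.Right                  ! writes through to the real variable
    PUT(Q)
  END

FreePairs PROCEDURE(PairQueue Q)
I UNSIGNED,AUTO
  CODE
  LOOP I = RECORDS(Q) TO 1 BY -1
    GET(Q,I)
    Q.Left  &= NULL                   ! null out the ANYs before the entry goes
    Q.Right &= NULL
    DELETE(Q)
  END
```

The point of the design is that the number and types of the fields are decided by the caller at run time, while the code that walks the list is written once.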

My expression "field pairs" also betrays another consideration. In the browse case there are only ever two fields that are really interesting; for the RI code there are three interesting values (child fields, parent fields, new parent fields). The prototype for the UpdateSecondary is also interesting. Note that the fields pertaining to the files are prototyped as *?, meaning they can be assigned to and from. The new fields are only ever used by value. It turns out that (in this example, at least) there are typically 4 different cases :

  1. Single field. This is a list of fields with no partner. In fact the components of a key are stored this way which makes it possible to bring the CLEAR(keycomponent) into the base class as well!
  2. Single field - buffered. These are fields which have to have a snapshot of their values taken without changing any 'real' program variables so the variables can be later compared to those values.
  3. Two fields. Two sets of fields, either of which can be assigned to and from the other.
  4. Two fields - buffered. This is the most complex case of two sets of fields where either one may need snap-shotting.

Because the fourth case is much heavier than the others (although related) we decided to assign it to its own class, which is derived from the field pairs class.

The Implementation - Any Ideas?

In order to understand how this class works you certainly need to understand queues but you also need to understand the ANY datatype. This is given excellent coverage in the manuals, which I shan't repeat. However, the key here is this: an ANY can act like a *? parameter OR a ? parameter depending on how you assign to it.

Specifically, if an ANY variable is NULL (has no value) then a straight value assignment to it produces a value ANY, while a reference assignment to it produces a variant ANY. Listing 7 shows an example.

Listing 7. Using ANYs to store values and references.
MyAny &= NULL
Field = 22
MyAny = Field         ! MyAny = 22
Field = 42            ! MyAny = 22
MyAny = 50            ! Field = 42, MyAny = 50
MyAny &= Field        ! MyAny = 42
Field = 62            ! MyAny = 62
MyAny = 72            ! Field = 72

Warning: CLEAR(MyAny) is equivalent to MyAny = 0. It is NOT the same as MyAny &= NULL.

FieldPairsClass.Init PROCEDURE

The Init procedure is simple enough to use. It creates the queue that forms the basis of the class. A slight oddity is the call to Kill first. This is to allow a FieldPairsClass to be used and reused within a procedure. (Effectively Init acts as a Reset.)

FieldPairsClass.AddItem PROCEDURE(*? Left)

There are two notional AddItem methods (the second called AddPair). This one is used for cases 1 & 2. Note the ASSERT to ensure Init has been called. The CLEAR deals with some (rather nasty) memory management issues that arise with ANYs in queues (see the manual). The incoming variable is reference assigned (&=) into the left hand queue element. It is then value assigned (=) into the right hand element. This distinction is crucial (see above). It means that simply calling AddItem on a field is enough to snap-shot it so that it can be reset (or tested for difference) at a later stage. The parameter is called Left because you can think of it as something you can assign into (and which therefore appears on the left hand side of an assignment (=) operator).
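The snapshot behaviour can be sketched outside the class as follows. This is an illustration of the steps just described, not the ABC source: PairQueue, the free-standing AddItem procedure and the Price variable are hypothetical, and the ASSERT is left out because there is no Init to check.

```clarion
  PROGRAM

PairQueue   QUEUE,TYPE              ! hypothetical type for this sketch
Left          ANY                   ! will reference the caller's variable
Right         ANY                   ! will hold a snapshot of its value
            END

  MAP
AddItem     PROCEDURE(PairQueue Q,*? Left)
  END

Items       QUEUE(PairQueue)
            END
Price       LONG

  CODE
  Price = 10
  AddItem(Items,Price)              ! snapshot of the current value taken here
  Price = 25                        ! only the live variable changes
  GET(Items,1)
  IF Items.Left <> Items.Right      ! live value versus snapshot
    MESSAGE('Changed from ' & Items.Right & ' to ' & Items.Left)
  END
  Items.Left = Items.Right          ! writes through the reference, restoring Price
  PUT(Items)
  Items.Left  &= NULL               ! null out the ANYs before deleting the entry
  Items.Right &= NULL
  DELETE(Items)

AddItem PROCEDURE(PairQueue Q,*? Left)
  CODE
  CLEAR(Q)                          ! start from empty ANYs
  Q.Left &= Left                    ! reference assignment: follows the variable
  Q.Right = Left                    ! value assignment: frozen copy
  ADD(Q)
```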

FieldPairsClass.AddPair PROCEDURE(*? Left,*? Right)

This method is used for variant 3. Other comments are the same as AddItem. Note also that in this case left & right do not have any real significance. It is just a non-suggestive way of labeling the two entities.

FieldPairsClass.AssignLeftToRight PROCEDURE

This procedure is really meaningless in variant one (actually it converts a variant one into a variant two). In variant 2 this can be seen as a way of snapshotting the current values of all the variables. In variant 3 all the values from the variables passed in as 'lefts' will be copied into the variables passed in as 'rights'.

Warning: Note the PUT after the assignment. This is because an assignment to an ANY variable can actually change the memory block allocated to the ANY. Hence you have to store the queue after an assignment even if you know the ANY is a variant ANY.

FieldPairsClass.AssignRightToLeft PROCEDURE

Again the use of this suggests you are not really in the variant 1 case. In variant 2 it has the effect of restoring all the variables passed in as Lefts to the values they had when an AssignLeftToRight was last done (which could be the implicit one at the AddItem point). In variant 3 this is an assignment from the variables passed as Rights to the variables passed as Lefts.

FieldPairsClass.ClearLeft PROCEDURE

This has the same effect for all three variants, it CLEARs the variables passed in as Lefts. This is not the same as assigning to zero, because the left-hand side could be a string. It is also not quite the same as assigning to a blank string (consider Cstrings & Pstrings). Now you could argue that it is the same as assigning to a zero length string, which is true, but only by coincidence. This illustrates one of the big pitfalls of having a language "guru" doing low-level classes. You can use your low-level knowledge to build assumptions into the system that are not required. The fact that presently all Clarion data-types can be CLEARed by assigning a zero length string is a very dangerous fact to build into a set of base classes (consider what would happen if you could pass a mixed-type group as a *?). The clearing mechanism is there to protect you from such assumptions, so the base classes use the full language facility where they can.

Note further that CLEAR(SELF.List.Left) is very different from SELF.List.Left &= NULL (see above).

FieldPairsClass.ClearRight PROCEDURE

In variant 2 this clears the buffer values, in variant 3 it clears the variables passed in as Rights. This method is subject to the same considerations as ClearLeft.

FieldPairsClass.EqualLeftRight PROCEDURE

In variant 2 this compares the current values in each of the Lefts against the last snap-shotted values. It returns a zero if any of the values differ. In variant 3 it compares each Left-Right passed in and returns a zero if there are any differences.

Note that this procedure effectively does a short-circuit evaluation which means the function returns as soon as a deviation is found. It demonstrates one of the reasons that I believe certain programming mantras can and should be violated in a controlled environment.

First the controlled environment. EqualLeftRight is 10 lines long, it fits on one screen and (I claim) should be understandable in one bite by a half-way competent programmer.
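For reference, a ten-line, two-exit routine matching that description might look like the function at the end of the sketch below. It is reconstructed from the description in this article rather than copied from the ABC source, and it is written as a free-standing function over a hypothetical PairQueue so that the sketch is complete; as a class method it would work on SELF.List instead of the Q parameter.

```clarion
  PROGRAM

PairQueue      QUEUE,TYPE           ! hypothetical type for this sketch
Left             ANY
Right            ANY
               END

  MAP
EqualLeftRight PROCEDURE(PairQueue Q),BYTE
  END

Pairs          QUEUE(PairQueue)
               END
First          LONG(1)
Second         LONG(1)

  CODE
  CLEAR(Pairs)
  Pairs.Left  &= First
  Pairs.Right &= Second
  ADD(Pairs)
  IF EqualLeftRight(Pairs) THEN MESSAGE('Equal').
  Second = 2
  IF ~EqualLeftRight(Pairs) THEN MESSAGE('Different').
  GET(Pairs,1)
  Pairs.Left  &= NULL               ! null out the ANYs before deleting the entry
  Pairs.Right &= NULL
  DELETE(Pairs)

EqualLeftRight PROCEDURE(PairQueue Q)
I UNSIGNED,AUTO
  CODE
  LOOP I = 1 TO RECORDS(Q)
    GET(Q,I)
    IF Q.Left <> Q.Right
      RETURN 0                      ! first exit: stop at the first difference
    END
  END
  RETURN 1                          ! second exit: every pair matched
```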

Now for the mantra. Good structured programming will teach you that any given procedure should have precisely one entry point and precisely one exit point. This procedure has two exit points. Why? Certainly efficiency, and also (I claim) clarity. Consider the obvious alternative in Listing 8.

Listing 8. A single exit point alternative to EqualLeftRight.
FieldPairsClass.EqualLeftRight PROCEDURE
I UNSIGNED,AUTO
B BYTE(1)
  CODE
  LOOP I = 1 TO RECORDS(SELF.List)
    GET(SELF.List,I)
    IF SELF.List.Left <> SELF.List.Right
      B = 0
    END
  END
  RETURN B

Now the method has the required one exit point. However there is an extra line of code and there are two extra assignments (BYTE(1) is an implicit assignment). But the real pain is more subtle. Imagine a big field list (100 fields) in which you are checking for a difference (say after an Edit-In-Place operation on a browse). This code will check all 100 fields even if the first one sets B to zero!

So you end up having to put a BREAK into the IF condition or code an UNTIL at the tail of the LOOP. The latter is less efficient still. The former is efficient but if you now draw a flow diagram of your algorithm you will find exactly the same logical structure as coding a RETURN but it took you 20% longer to say it!

This brings me to the Bayliss mantra: keep it short and to the point, but then don't compromise!

FieldPairsClass.Equal PROCEDURE

This is simply a logical short-hand for people using the FieldPairsClass as opposed to the BufferedPairsClass (where the explicit LeftRight is helpful).

FieldPairsClass.Kill PROCEDURE

Check over this code. The destruction sequence of queues with ANYs needs careful work. First you have to null out all of the ANY variables, then you can dispose of the list.

Derived Classes

BufferedFieldClass

This class is really just an extension to the FieldClass to handle case 4. Two fields are paired and there is a shadow third value. In some ways this makes it easier to understand than the FieldClass. If ever Left or Right are assigned to/from then it is the values in the underlying fields that are being used. Buffer means the shadow, which never affects any values in the "real" program.

Queue Derivation

The BufferedFieldClass is derived from the FieldClass; that is to say whenever a buffered field class is being used without reference to the shadow value you can simply call the same functions as you would for a case 3 of the field class. The buffered field class is an extension for the case when buffering is needed. Now we could simply have implemented the BufferedFieldClass and used it for cases 1 through 3. The main reason we didn't is one of efficiency. ANY variables work extremely slowly compared to standard Clarion variables (about 30x slower, or similar to Visual Basic) and therefore maintaining an extra 1 or 2 for the very common cases (1 through 3) was deemed unwise. The separation also enables the field class to have a relatively small, clean interface.

BufferedPairsClass.Init PROCEDURE

This procedure demonstrates a simple problem, with a simple enough fix, but to the unwary it can be very confusing. The FieldClass contains a reference to a FieldPairs queue (with Left & Right ANYs). This is NEW/DISPOSEd in the FieldClass Init and Kill methods. The BufferedFieldPairs class has a reference to a buffered queue with three fields. Now here is the problem: if the FieldClass and BufferedFieldClass both have Init and Kill called then there will be two separate queues pointed to by two separate references. So the BufferedFieldClass Init method does not call its parent. As a result there is only one copy of the queue.

But there is a subtler problem. Suppose the Equal method is called. This drills down to FieldPairs.EqualLeftRight which expects the SELF.List reference to be filled in, which it won't be. Bang!

Here is the fix. The BufferedPairsQueue is (very deliberately) just the FieldPairsQueue with extra fields added. The Init method &= the List in the FieldPairsClass to the RealList in the derived class. Now the methods in FieldClass can access the same queue as those of the BufferedFieldClass but via a different reference.
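The shape of that fix can be sketched as below. The class names BaseSketch and BufferedSketch, the Entries method and the Pairs object are hypothetical stand-ins, and the queue layouts are assumed from the description above; only the two reference assignments in Init are the point of the example.

```clarion
  PROGRAM
  MAP
  END

FieldPairsQueue     QUEUE,TYPE
Left                  ANY
Right                 ANY
                    END

! The derived queue starts with exactly the same fields, then adds one
BufferedPairsQueue  QUEUE(FieldPairsQueue),TYPE
Buffer                ANY
                    END

BaseSketch          CLASS,TYPE              ! stand-in for the field pairs class
List                  &FieldPairsQueue
Entries               PROCEDURE,LONG        ! only ever looks at SELF.List
                    END

BufferedSketch      CLASS(BaseSketch),TYPE  ! stand-in for the buffered class
RealList              &BufferedPairsQueue
Init                  PROCEDURE
Kill                  PROCEDURE
                    END

Pairs               BufferedSketch

  CODE
  Pairs.Init
  CLEAR(Pairs.RealList)
  ADD(Pairs.RealList)                       ! add an empty entry via the derived reference
  MESSAGE('Entries seen by the base class: ' & Pairs.Entries())
  Pairs.Kill

BaseSketch.Entries PROCEDURE
  CODE
  RETURN RECORDS(SELF.List)

BufferedSketch.Init PROCEDURE
  CODE
  SELF.RealList &= NEW BufferedPairsQueue   ! the only queue ever allocated
  SELF.List &= SELF.RealList                ! base-class reference to the same queue

BufferedSketch.Kill PROCEDURE
  CODE
  ! The one entry in this sketch holds no ANY values; real code must first
  ! null out the ANYs in every entry, as described for Kill above
  DISPOSE(SELF.RealList)
  SELF.List &= NULL
```

Because the base-class method goes through SELF.List, it sees whatever was added through RealList without knowing that the extra Buffer field exists.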

Tech note: A particularly nice feature of queue and class references is that they contain type information. Thus CLEAR(MyQueueReference) will always clear the whole queue buffer. Similarly ADD(Queue) works on the whole queue.

BufferedPairsClass.AddPair PROCEDURE(*? Left,*? Right)

This method overrides the equivalent method in the base class. Later versions of it actually contain some rather intricate code to fix a subtle bug that I missed on the first lap. All the code is really trying to do is reference assign Left and Right (as per the parent function) and then CLEAR the buffer value (because I don't know whether to assign it to Left or Right). But the question becomes, what does it mean to clear an ANY variable? (See the discussion on ClearLeft.) What I really want is to assign it to a value which will compare equal to the Left or Right variables if they have been cleared. The only general way I could think of doing this was to clear the Right variable and then assign it to the buffer. Of course people might object to me doing that so I temp-store it first.
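A minimal sketch of that temp-store sequence follows. It is written from the description above, not taken from the ABC source; BufferedQueue, AddBufferedPair, Saved and the two string variables are hypothetical names, and the clear is done through the queue's reference to the Right variable.

```clarion
  PROGRAM

BufferedQueue   QUEUE,TYPE          ! hypothetical three-field queue
Left              ANY
Right             ANY
Buffer            ANY
                END

  MAP
AddBufferedPair PROCEDURE(BufferedQueue Q,*? Left,*? Right)
  END

Pairs           QUEUE(BufferedQueue)
                END
ScrName         STRING(20)
RecName         STRING(20)

  CODE
  RecName = 'Smith'
  AddBufferedPair(Pairs,ScrName,RecName)
  MESSAGE('RecName is still ' & CLIP(RecName))
  GET(Pairs,1)
  Pairs.Left   &= NULL              ! null out the ANYs before deleting the entry
  Pairs.Right  &= NULL
  Pairs.Buffer &= NULL
  DELETE(Pairs)

AddBufferedPair PROCEDURE(BufferedQueue Q,*? Left,*? Right)
Saved ANY
  CODE
  CLEAR(Q)                          ! start from empty ANYs
  Q.Left  &= Left
  Q.Right &= Right
  Saved = Q.Right                   ! temp-store the caller's value
  CLEAR(Q.Right)                    ! let the language decide what "cleared" means
  Q.Buffer = Q.Right                ! buffer now compares equal to a cleared Right
  Q.Right = Saved                   ! put the caller's value back
  ADD(Q)
```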

I think the other methods are fairly self-explanatory given the FieldClass explanations.

Finally

ANY variables (and type polymorphism) are key strengths of the Clarion language that make it possible to code complex database algorithms in a totally generic and safe way. The two field classes extend this paradigm up to lists of field pairs. If you scan the ABC sources you will find the field pairs classes are intrinsic to files, browses, drop combos and edit-in-place. If you scan generated source you will find AddPairs popping up very frequently. The combined effect of these facts is that most procedures can be generated without any need to derive the browse or file objects. This simplifies and reduces the amount of code required to use these classes and gives Clarion an implementation edge (from template or hand-code) over C++, VB and Object Pascal.

I hope this FieldClass design overview has given you an insight into one of the fundamental building blocks of the ABC system.


David Bayliss is a Systems Architect for The TopSpeed Development Center. He has worked upon TopSpeed's compiler and was the chief architect of the Application Builder Classes.
