The Problem with Embeds, Part 2: Refactoring the TXA Parser
Skip to end of metadata
Go to start of metadata
by David Harms

In the first article in this series I said that most Clarion developers use embed points the wrong way, and by doing so they make their applications more difficult to maintain, test, debug and document. Almost every Clarion developer has done that; I've done it too. In these articles I intend to show how you can improve your code base by taking the majority of that code out of the embed points.

I'll be working through a couple of examples, including the Invoice app which ships with Clarion. But in this article I'll focus on the TXA embed parser example I introduced in Part 1.

In any analysis of embed code and how it might be better deployed, the first thing you need is a convenient way to look at just your embeds. That's not so easy because, embeds being embeds, they're sprinkled throughout your application.

Really you have two options: You can browse it in the embed list or in the embeditor, or you can use a tool to extract just the embeds so you can look at them without the distraction of the generated code.

About TXAs

As far as I know, there's only one way to programmatically extract the embed code from an APP. First, you have to export a TXA, which is an import/export text version of the information contained in an APP file. And second, you to parse the TXA to get the embeds, which can be a bit of a pain as the TXA format isn't documented.

A couple of years I wrote a TXA parser to do just that task, in support of an article series on the most popular embed points. So it seemed natural, when I needed a way to extract embed points for this series, to revisit that code.

Unfortunately, that code isn't very pretty. Most of it is contained in just a couple of embed points. In the TakeAccepted method's data section there are some declarations:

And then a little later on in TakeAccepted, at an embed that's called when the user presses the Import button, the TXA gets parsed. That code loops through the records in a previously created queue of TXA files (txaq):

In have a look at the ImportTXAs procedure in for the complete source.

What I had in mind for my new utility was something more along these lines:

My original parser's functionality was overkill for this new app; I really didn't need to build up an elaborate database of applications, procedures, and embed points. I just needed a list of embed points. But obviously I needed all of the parsing capabilities, gnarly though the code might be.

Unfortunately, there wasn't any way to use my original code unchanged, primarily because that code didn't actually extract the embedded source from the TXA - there was no need to capture the actual embed code since I was just logging embed usage. And I was motivated not to store the embed code: I had asked for TXA submissions which could contain sensitive information, and I didn't want to accidentally expose anyone's embed code to public view.

So what were my options? A few came to mind:

  • Just cut and paste the source. This is what a lot of us do, and it has the obvious drawbacks of creating multiple versions of the code to maintain. If I find a bug in my parsing code, I'll need to hunt down every place I've pasted a copy and make the change there.
  • Put my original procedure in a DLL and call it as needed. In this case my DLL also presents a user interface. That would result in some UI clunkiness in the app I envisioned in Figure 1, which doesn't need to call yet another window just to do the parsing.
  • Put the common source in source files and INCLUDE them. I could even include just portions of the files using labels. The drawback here is that there's no way to know how the various sections of source code might be used, or how any bug fixes to that source might cause unexpected bugs. Using INCLUDE statements this way results in an almost complete loss of control over the source code.
  • Create a template containing the source code. This certainly helps keep the code in one place, but it has a lot of negatives when it comes to maintaining and testing that code since you have to put the template in an app and generate the code, and then you have to port any changes back to the template.

And there were other problems. Because my original app was tied to a particular data store (a PostgreSQL database), any re-use of that code would have to know the table definitions. Since Clarion only supports one dictionary per app, any apps that used this procedure would either need to use the dictionary or import the tables from that dictionary.

Class or procedure?

So what's the answer? If it's not a template, and not a multi-purpose generated procedure and not INCLUDEd source, what's left? Procedures and classes, that's what. But not just any procedures or classes. I wanted to write code that had as few dependencies as possible.

Some of the dependencies I wanted to avoid:

  • Files/tables - not tie my code to a specific database
  • Windows/controls - not tie my code to a specific user interface element
  • Other code - keep calls to other procedures/classes to a minimum

So when should you use a class, and when a procedure?

In almost all cases a class is preferable to a procedure, in the same way that a procedure is almost always preferable to spaghetti code. A procedure presents a single point of entry and a single result. That's not to say you can't return multiple values from a procedure - you clearly can, as Steve Parker has showed. But procedures don't have the flexibility of classes.

In fact, a class method is really just a procedure, so using a class already gets you everything a procedure can do. But it also gets you more, because in a class multiple methods can operate on shared data.

The test app

In recent years I've become a devotee of test-driven development, which embodies the idea that you write your test first and then start in on the code needed to pass the test. I find this provides a lot of clarity to my class design.

So my first question is, what should my test(s) look like? 

I'll need a parser object, of course. And this is going to be part of the DevRoadmaps Clarion Library, so it needs to follow the DCL naming conventions. That means that DCL_ is the first part of the name (Clarion doesn't support namespaces in class names, so I fake these by separating name parts with underscores). 

The second part of the name? Maybe Clarion, since the parser is a class that relates specifically to Clarion's own functionality. If, for instance, I wrote some code to manipulate solution or project files I'd also begin those classes with DCL_Clarion_. If I'm not expecting a lot of classes then two levels of naming is sufficient, although sometimes I'll go to three. 

Provisionally, I'll name my class DCL_Clarion_TXAParser. For more on creating test apps see Creating a ClarionTest test DLL (DCL).

The first test

So what should my first test look like? A reasonable place to start seems to be identifying the number of embed points in a TXA, using code like this:

I'm not creating this code from scratch; instead I'm refactoring a previously created set of classes to use the DCL standards and existing DCL classes. But the original code doesn't have a GetEmbedCount method so I'll have to add that at some point. 

The class

The first decision I have to make is how to store the embed data. Originally I used a SQL database, but this now appears to be a liability. My parser should use something more transient that can be converted to a permanent store or just discarded after use. An in-memory data store, in the form of nested queues, fits the bill:

Since these queues are used by DCL_Clarion_TXAParser I've included them in and have given them a prefix of DCL_Clarion_TXAParser_. 

The parser's job will be to populate these queues so that I end up with a DCL_Clarion_TXAParser_ProcedureQueue containing one or more records. Each of those procedure records has one or more DCL_Clarion_TXAParser_EmbedQueue records, each of which has one or more DCL_Clarion_TXAParser_EmbedLineQueue records for each line in a given embed point. With that queue structure in hand I can easily update a database or create a text file, as I see fit.

At first blush this looks like an ideal task for a procedure. Pass in the name of the TXA and an empty queue and get back a filled queue: presto! But here's why I think the procedural solution is hardly ever a good solution: there's almost always some new functionality you will want to add which doesn't fit into the existing procedure.

Think about error handling. You can have the parsing procedure return an errorcode if the parse fails for any reason, but what if you want to get a more detailed error response? What if you want to enable tracing or logging? How would you do that in a single procedure call, without burdening the procedure with some obscenely large number of parameters?

In fact, once I began rewriting my parsing code as a class I ended up with rather a lot of methods and a few properties as well:

I won't go into all of these methods in detail - you can have a look at the source if you're interested. Briefly, there are a few private methods to break up the internal code into reusable blocks (you'd use routines in a procedural implementation), and there are some public methods to get the procedure and embed counts, parse the specified TXA, reset the parser and specify the particular queue to use.

Now, show me how you'd implement all that in a procedure!

Note that use of the SetQueue method. By default the class will use its own internal DCL_Clarion_TXAParser_ProcedureQueue instance, but if I wish I can tell the class to use an external queue so I can more easily process the data, e.g. to write it to a database or a text file. 

Next time: testing the class!


  • No labels
  1. The TXA structure can be confusing to understand because it resues some sections like PROCEDURE in a nested structure. The HELP is well done. If you need to write a TXA parser a big help is the ABC APP Converter. The source (ConvSrc)  shipped in C4, C5 and C55, but not C6+. One thing that helps is the file TERMS.CLW that contains each Section and what sections or syntax terminates the section. E.g. EMBED is terminated by:

  2. Thanks for the tip, Carl! I had to fire up Search Everything to find TERMS.CLW on my machine. And there it was, in the C55-C6 converter that shipped with C6. 


  3. I know I'm two years late to the party, but is the TERMS.CLW file something that someone with only C10 would be able to get a hold of? It would be incredibly helpful for a side-project of mine, but I haven't had any luck finding someone near me that has an old enough version of Clarion. Thanks!


    1. Hi Michael. I think you must be looking for a different file. Here's what's in my TERMS.CLW:

      !Format of group is:

      !Section Header
      !Termination sections

      Terminators GROUP()
      Number USHORT(18)

      1. Hi Dave

        not sure if you have a different version of terms.clw to me, or if it is a formatting issue, but my version has square brackets as shown above by Carl...  see below


        Geoff R

        !Format of group is:

        !Section Header
        !Termination sections

        Terminators GROUP()
        Number USHORT(18)