![]() |
|
Published 1999-10-26 Printer-friendly version
Recently on the newsgroups someone mentioned that TopScan didn't identify duplicate record errors while building the keys. After digging around in the source code I confirmed this to be indeed true and set about writing a "duplicate record" detector (DRD) which was file neutral. This article will take a look at the resulting class.
The DRD would have to perform the following functions:
The Solution Overview
The solution ended up being quite a bit simpler than I had originally envisaged, although on the down side it involves a brute force approach where every record in the file will need to be tested. The logical flow goes something like this:
DUPLICATE(FILE) function
to see if it is a duplicate.It is important to note that when duplicate records exist in a given key there will always be one valid record, while the rest of the duplicates will simply not exist for that key.
There are several ways a duplicate record can be created, the following are probably the most popular:
The solution has been presented in the form of a class. (The attached source code also has a small example PRJ and CLW file). The definition looks something like this:
FindDupClass class, type
init procedure(file f)
kill procedure
buildKeys procedure
findDup procedure
display procedure
end
The class would typically be used as follows (and since it is a class, the actual code is appropriately simple):
TestFileForDup procedure(FILE TestFile) cFind FindDupClass code cFind.Init(TestFile) cFind.BuildKeys() cFind.FindDup() cFind.Display() cFind.Kill()
BuildKeys Method
Rather than calling BUILD(FILE), this method builds
each key individually by using the PROP:Keys and
PROP:Key file properties. The main reason for this is
to avoid a nasty side effect of BUILD(FILE) where it
aborts BUILDing all keys when an error is encountered
(in this case a duplicate record error!). A nice addition to this
method would be to actually check if any errors occurred during
this operation, if not then there will be no need to check for
duplicates. Another useful addition would be to use the
PROP:Completed and PROP:ProgressEvents
properties to display a nice progress window, with the option to
gracefully cancel.
FindDupClass.BuildKeys procedure
i unsigned, auto
k &key
code
loop i = 1 to self.f{PROP:Keys}
k &= self.f{PROP:Key, i}
build(k)
end
FindDup Method
The FindDup method is the brute force part of the
solution (it loops through the entire file, record by record).
Since files can be large it uses the EVENT:Timer event
on a simple status window. When a duplicate is found, a private
method is called (CalcDupInfo) to analyse the
duplicate.
FindDupClass.findDup procedure
Prog long(0)
Found long(0)
Window WINDOW('Caption'),AT(,,200,44)
,FONT('MS Sans Serif',8,,FONT:regular)
,TIMER(1),SYSTEM,GRAY,DOUBLE
PROGRESS,USE(Prog),AT(5,10,190,10),RANGE(0,100)
PROMPT('Duplicates:'),AT(5,27)
BUTTON('&Cancel'),AT(150,25,45,14),USE(?ButtCancel)
STRING(@s64),AT(50,27),USE(Found)
END
i unsigned, auto
FinFlag byte(FALSE)
code
open(self.f)
assert(~errorcode())
stream(self.f)
assert(~errorcode())
set(self.f)
open(window)
window{PROP:Text} = self.f{PROP:Name}
?Prog{PROP:RangeHigh} = records(self.f)
accept
case event()
of EVENT:Timer
if ~FinFlag
loop 100 times
next(self.f)
if errorcode()
FinFlag = TRUE
post(EVENT:CloseWindow)
break
end
Prog += 1; display(?Prog)
if duplicate(self.f)
Found += 1; display(?Found)
self.CalcDupInfo()
end
end
end
end
end
close(window)
close(self.f)
return FinFlag
calcDupInfo Method
This method goes through the following steps to find duplicates.
GET(SELF.F,K) procedure. Since the
current record buffer for the "error" record "matches" the one
valid record, when the GET is called it will return
the one valid record!AddResult is called twice
(NOTE: AddResult will only add records to the result
queue if they haven't already been added):The original record is then restored with the
RESET, NEXT combination (you must use the
RESET(FILE) form of RESET, for obvious
reasons).
FindDupClass.calcDupInfo procedure
j unsigned, auto
k &key
tmpPos like(ResultQueue.Pos)
tmpPoint long, auto
code
tmpPos = position(self.f)
tmpPoint = pointer(self.f)
loop j = 1 to self.f{PROP:Keys}
k &= self.f{PROP:Key, j}
if duplicate(k)
get(self.f, k)
assert(~errorcode())
self.AddResult(pointer(self.f), k)
self.AddResult(tmpPoint, k, TRUE)
reset(self.f, tmpPos)
next(self.f)
assert(~errorcode())
end
end
AddResult Method
This method adds the duplicate record information to the result
queue for displaying later (in the display method). It checks that
this particular duplicate record hasn't been added already
(based on the key name and its POSITION information),
and if it has it will append the current record number to it.
FindDupClass.addResult procedure(string pnt, key k, byte AppendPos = FALSE)
code
clear(self.qResult)
self.qResult.Rec = pnt
self.qResult.Key = k{PROP:Label}
self.qResult.Pos = position(k)
get(self.qResult, +self.qResult.Key, +self.qResult.Pos)
if errorcode()
add(self.qResult, self.qResult.Rec)
elsif AppendPos
self.qResult.Rec = clip(self.qResult.Rec) & ', ' & pnt
put(self.qResult)
end
All in all it was quite pleasing that the solution ended up being so straightforward. Something similar will undoubtedly find its way into TopScan. One more point: to verify the sample program, try using TopScan to view the created file.
Prior to joining TopSpeed Development Centre, Gordon Smith worked for an Irish company developing software for multi-national pharmaceutical companies. He was also a member of the 3rd party accessories program (Compile Manager 2) and developed the Clarion Class Browser. He is currently working with several of the original TopSpeed Developers at Seisint Inc..
|
Posted on Tuesday, June 12, 2001 by Greg Salyer This is good except that it only shows the record number and key name for each dupe found. It needs a way to specify an additional field to display such as an ID number field. |
To add a comment to this article you must log in.
Copyright © 1999-2008 by CoveComm Inc. All Rights Reserved. Reproduction in any form without the express written consent of CoveComm Inc., except as described in the subscription agreement, is prohibited.
Clarion Magazine ISSN 1718-9942
One year: $189
(includes all back issues since '99)
Renewals from $139
Two years: $289
Renewals from $239