Hardcore Clarion - How to Find GPF's in Your Application

by Paul Attryde

Published 1999-04-01

Download the code here

One of the most annoying things about being a software developer is hearing the word 'GPF'.

Quite often, or so it seems, the GPF will occur in a DLL which you have no control over - CWRUN, or maybe KERNEL32. Even if you do have the source, your client may not, and if you can't reproduce the problem on your machine then the debugger isn't going to be much use. (Of course, if your machine is anything like mine, the debugger GPFs more often than the program you're trying to debug.)

This month I'm going to try and give some pointers (with some rather large hints from Alexey) on how to identify the code which produced the problem. I'll be working with a very simple CW2003 app which can either produce a GPF inside the executable itself (by POKEing memory address 0), or in the Clarion runtime DLL (by manipulating a VIEW when the files for the view aren't actually open).

The technique I'm going to describe this month will only work on DLLs or executables for which you have actual code - i.e., your application and the Clarion runtime. This won't work for GPFs in KERNEL32, because finding the GPF relies on searching for assembler instructions, and we don't have a LIB file for KERNEL32 that includes actual code. We do have 2 .LIB files for CW2RUN32 - CW2RUN32.LIB and CL2RUN32.LIB. The difference, in case you were wondering, is that CW2RUN32 contains no executable code - it's a LIB that you link in to your application so that the linker can resolve the function calls you made to the Clarion runtime functions. The other library, CL2RUN32, does contain assembler code - in fact, it contains assembler instructions for the entire Clarion RTL. That's why we can't find GPFs in KERNEL32, because the only LIB we have is to link against, it doesn't actually contain any code.

The Basics

The key to finding a GPF is in having a .MAP file for the DLL or EXE that produced the error. If you don't have one it may be because you've turned it off in the project file. To turn it back on, choose Project->Edit. Hit the properties button, go to the Link tab and check the box marked "Create Map File".

If you really don't have a map file (i.e., it GPFd inside the Clarion runtime) then it's not quite as simple. In that situation, even though you don't have source code for the DLL, it can still be just as useful to know what Clarion or Windows API caused the problem. For example, if your application GPFs and you can determine that it happened in SetWindowLong, and you have a sub-classed window in your application, then that points to a sub-classing problem in your code. Or if you have a view and it GPFd in the ViewDriver function in the Clarion runtime, that points to a problem with the view.

What happens when an application GPFs can differ, depending on whether it's a 16-bit or 32-bit application, and whether the operating system is set to run a debugger when a GPF occurs or not.

Microsoft KnowledgeBase articles Q103861, Q175644 and Q121434 all relate to configuring the default Windows debugger, but basically under Windows NT this is controlled by the registry key HKLM\Software\Microsoft\Windows NT\CurrentVersion\AeDebug, and under Windows 95 it's supposedly in the [AeDebug] section in WIN.INI.

I say supposedly because MSKB article Q138786 states "If the registry key exists in your Windows 95 registry, its settings will preempt the settings in the Win.ini file". That explains why the 32-bit CW2 and C5EE debuggers both access the registry key used for Windows NT under Windows 95 and Windows 98.

The following table shows what happens for 32-bit applications - for 16-bit applications you only get an error dialog, and no chance to run the debugger.
 

                      | Windows 95 | Windows NT
No debugger installed | Standard GPF dialog | Standard GPF dialog, and Dr Watson produces a DRWTSN32.LOG file in the \WINNT directory
Debugger installed    | Standard GPF dialog with extra 'Debug' button which runs the debugger | Different GPF dialog with the option to run the debugger. No Dr Watson log file.

All the screen shots I've used in the article were taken on machines without a debugger installed, so they should be quite standard.

If you have a map file for a 16-bit application

Once you have generated the map file, you need to know the address where the error occurred. Under both Windows 95/98 and Windows NT you'll be given a segment:offset address.

The following screen shots tell you what to expect to see. You'll notice that Windows NT doesn't show as much information as Windows 95/98, but that doesn't matter.

1695i.gif

16nti.gif

The important thing to note is that they both show us the address where the error occurred. To find the GPF, open up the map file inside an editor and search for the segment:offset prior to the one given in the error message (0004:0131).

Start Length Name Class Group
. . .
0004:0034 000D7H ENTERCODE CODE (none)
0004:010C 00188H GPFNOW_TEXT CODE (none)
Address Publics by Value
. . .
0003:003C __IREG
0004:010C GPF_TYPE1@F
0004:0144 GPF_TYPE2@F
0004:019E _MAIN

In this example we find 2 occurrences. The first part of the MAP file doesn't tell us that much - just what section (GPFNOW_TEXT, or compiled code) the procedure is in. If we had an APP instead of a source program, each module would be given its own code segment, so at least we'd be able to say what module the offending procedure was in. The second part, the 'publics by value' section, tells us that 0004:0131 is between addresses 0004:010C and 0004:0144.

That tells us that the procedure which caused the problem is GPF_TYPE1() (which it is). The name mangling on the procedure (@F) tells us that it doesn't take any parameters. There are only 56 (0144-010C = 0038) bytes in the procedure, so it's not very long. 0131 is 37 bytes into the procedure (0131-010C = 0025), so we can guess that the code which produces the problem is roughly in the middle of the generated assembler code for that procedure. If the procedure were very long, we might not know which part of the source that assembler code relates to, so we might have to resort to looking at the generated assembler, then trying to match that back to the source code.
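The lookup described above is easy to mechanise. The following is only a sketch in Python (not something the article's download includes): it takes the 'Publics by Value' entries from the excerpt, finds the last public symbol at or before the fault address in the same segment, and reports how far into the procedure the fault lies. The function names are my own.

```python
import bisect

# Public symbols copied from the 'Publics by Value' excerpt above.
PUBLICS = [
    ("0003:003C", "__IREG"),
    ("0004:010C", "GPF_TYPE1@F"),
    ("0004:0144", "GPF_TYPE2@F"),
    ("0004:019E", "_MAIN"),
]

def parse_address(text):
    """Split 'SSSS:OOOO' into a (segment, offset) pair of integers."""
    segment, offset = text.split(":")
    return int(segment, 16), int(offset, 16)

def find_procedure(fault, publics=PUBLICS):
    """Return (name, bytes into procedure, procedure length or None)."""
    table = sorted((parse_address(addr), name) for addr, name in publics)
    keys = [key for key, _ in table]
    target = parse_address(fault)
    # Index of the last symbol whose address is <= the fault address.
    index = bisect.bisect_right(keys, target) - 1
    if index < 0 or keys[index][0] != target[0]:
        return None  # no symbol at or before the fault in that segment
    (segment, start), name = table[index]
    length = None
    if index + 1 < len(table) and keys[index + 1][0] == segment:
        # Length is the distance to the next public in the same segment.
        length = keys[index + 1][1] - start
    return name, target[1] - start, length

print(find_procedure("0004:0131"))
```

For 0004:0131 this gives GPF_TYPE1@F, 37 bytes in, out of 56, which matches the arithmetic done by hand.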

If you don't have a map file for a 16-bit application

1695o.gif

16nto.gif

If you don't have a .MAP file for the DLL (or EXE) then the process is more complicated, and what happens next depends on whether you know the assembler instructions that caused us the problem or not. Windows 95 tells us this (it's the part labeled "Bytes at CS:EIP") but Windows NT does not, so if all you have is the information from NT you'll have to run the DLL in question (in this case CW2RUN16) through a dis-assembler. There are plenty to choose from, but I (and Alexey) recommend HIEW (available from ftp://ftp.kemsc.ru/pub/sen).

Dis-assembling the DLL shows us this:

Hiew16.gif

The instructions at 00405511 (shown on the left preceded by a ".") match those shown in the Windows 95 error window under "Bytes at CS:EIP" (26 F6 47 2E 02 74 16 …), so at least we know we've found the correct code in the DLL.

Once we know the assembler code which caused us a problem, we can use a hex editor to search the corresponding LIB for those instructions. Note that we don't search CW2RUN16.LIB, which doesn't have any executable code in it, but CL2RUN16.LIB, which does.

Hw16.gif

Once we've found the code in the LIB, we simply work backwards until we find a procedure name. In this case it's VIEW_CHECK_STATE@FP4. Because we also have the source code in this case, we know it's to do with a view so this simply confirms what we've just found.
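If you'd rather not do the hex-editor search by eye, the same idea can be sketched in a few lines of Python. This is a rough illustration under two assumptions of mine, not a description of the LIB format: that procedure names are stored as plain ASCII somewhere before their code, and that the mangled names contain "@F" as in the examples in this article. The sample data is made up, so the code runs without a real CL2RUN16.LIB.

```python
import re

def bytes_from_dump(text):
    """Turn a 'Bytes at CS:EIP' string such as '26 F6 47 2E' into bytes."""
    return bytes.fromhex(text)

# Assumed shape of a mangled Clarion name: identifier, '@F', optional suffix.
NAME = re.compile(rb"[A-Za-z_$][A-Za-z0-9_$]*@F[A-Za-z0-9_]*")

def name_before(lib, pattern):
    """Find pattern in lib and return the closest mangled name before it."""
    hit = lib.find(pattern)
    if hit < 0:
        return None
    names = NAME.findall(lib[:hit])
    return names[-1].decode("ascii") if names else None

# Synthetic stand-in for the library contents (not real LIB data).
fault_bytes = bytes_from_dump("26 F6 47 2E 02 74 16")
sample = (b"\x00\x10OTHER_PROC@F\x00\x90\x90\xc3"
          b"\x00\x14VIEW_CHECK_STATE@FP4\x00\x01\x8b\xec"
          + fault_bytes)

print(name_before(sample, fault_bytes))
```

Against a real library you would read the file with open(path, "rb").read() and pass that in place of sample. If the byte sequence occurs more than once, only the first hit is examined, so a longer pattern is safer.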

If you have a map file for a 32-bit application

3295i.gif

32nti.gif

In 32-bit under Windows 95 you may be given what appears to be a segment:offset address, but it isn't. Because 32-bit applications use linear addresses, what it's actually showing is the code segment, and the address where the error occurred. Ignore the segment - you only need to note the address part, in this case 00401189. Windows NT, having a different underlying architecture in the first place, only gives you the address. We basically follow the same procedure that we did for 16-bit applications, and turn to the map file and search for that address.

401058 137 Code GPFNOW_TEXT
401190 3B Code IEXE32_TEXT
4011CC 2A Code INIT_TEXT
. . .
401058 _main
401100 GPF_TYPE2@F
401174 GPF_TYPE1@F
402000 $WINDOW
. . .

And again, we find 2 places in the map file that look interesting. The first tells us that the procedure that caused us the problem is in the GPFNOW_TEXT segment. The second section tells us that the procedure is GPF_TYPE1@F (00401189 is between 401174 and 402000).
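Both checks can be written down as a short Python sketch, using only the values from the map excerpt above. A segment matches if the address falls within its start plus length, and the procedure is the public symbol with the highest address that is not above the fault address. The function name is my own.

```python
# (start, length, name) rows from the first part of the map file.
SEGMENTS = [
    (0x401058, 0x137, "GPFNOW_TEXT"),
    (0x401190, 0x3B, "IEXE32_TEXT"),
    (0x4011CC, 0x2A, "INIT_TEXT"),
]

# (address, name) rows from the second part of the map file.
PUBLICS = [
    (0x401058, "_main"),
    (0x401100, "GPF_TYPE2@F"),
    (0x401174, "GPF_TYPE1@F"),
    (0x402000, "$WINDOW"),
]

def locate(address):
    """Return (segment name, procedure name, bytes into the procedure)."""
    segment = next((name for start, length, name in SEGMENTS
                    if start <= address < start + length), None)
    below = [(start, name) for start, name in PUBLICS if start <= address]
    if not below:
        return segment, None, None
    start, name = max(below)  # closest symbol at or before the address
    return segment, name, address - start

print(locate(0x00401189))
```

For 00401189 this reports GPFNOW_TEXT and GPF_TYPE1@F, with the fault 21 (hex 15) bytes into the procedure.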

If you don't have a map file for a 32-bit application

3295o.gif

32nto.gif

If you don't have the map file for a 32-bit application, we basically follow the same procedure that we did for the 16-bit application - but there's a problem. See it? If we're running under Windows NT, we don't know what DLL caused the problem! Windows 95 tells us in the error window, but NT doesn't.

Now, at this point you should be able to go to the Dr Watson log file and work out which DLL caused the problem. As this excerpt shows, the log lists the base address of the DLLs and their name. The names of the Clarion DLLs are conspicuous by their absence. We know that the problem is in the DLL whose address is 00800000 (from the address of 00854904), but we don't know what the DLL is called!

(00400000 - 00408000) 
(77f60000 - 77fbc000) dll\ntdll.dbg
(00800000 - 00800000) 
(77dc0000 - 77dfe000) dll\advapi32.dbg
(77f00000 - 77f5e000) dll\kernel32.dbg
(77e70000 - 77ec4000) dll\user32.dbg
(77ed0000 - 77efc000) dll\gdi32.dbg
(77e10000 - 77e62000) dll\rpcrt4.dbg
(00240000 - 002b3000) COMCTL32.dbg
(77d80000 - 77db2000) dll\comdlg32.dbg
(77c40000 - 77d7c000) dll\shell32.dbg
(5f600000 - 5f618000) drv\winspool.dbg
(006f0000 - 006f0000)
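Picking the owning module out of that list can also be done with a small script. This Python sketch assumes the log lines have exactly the layout shown in the excerpt. Because the nameless entries have identical start and end addresses, it ignores the end address and simply takes the module with the highest base address that is not above the fault address.

```python
import re

# Module list copied from the Dr Watson excerpt above.
LOG = r"""
(00400000 - 00408000)
(77f60000 - 77fbc000) dll\ntdll.dbg
(00800000 - 00800000)
(77dc0000 - 77dfe000) dll\advapi32.dbg
(77f00000 - 77f5e000) dll\kernel32.dbg
(77e70000 - 77ec4000) dll\user32.dbg
(77ed0000 - 77efc000) dll\gdi32.dbg
(77e10000 - 77e62000) dll\rpcrt4.dbg
(00240000 - 002b3000) COMCTL32.dbg
(77d80000 - 77db2000) dll\comdlg32.dbg
(77c40000 - 77d7c000) dll\shell32.dbg
(5f600000 - 5f618000) drv\winspool.dbg
(006f0000 - 006f0000)
"""

# Base address, end address, then an optional name on the same line.
ENTRY = re.compile(r"\(([0-9a-f]{8}) - ([0-9a-f]{8})\)[ \t]*(\S*)", re.I)

def owning_module(address, log=LOG):
    """Return (base, name) for the module with the highest base <= address."""
    modules = [(int(m.group(1), 16), m.group(3) or "(unnamed)")
               for m in ENTRY.finditer(log)]
    below = [entry for entry in modules if entry[0] <= address]
    return max(below) if below else None

base, name = owning_module(0x00854904)
print(hex(base), name)
```

For 00854904 this lands on the entry at 00800000, which has no name in the log, and that is exactly the problem described above.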

About the only way I know of finding the DLL is by using some third-party software, and you have to have that software running when your application GPFs. I use a nifty little utility called HandleEx (from http://www.sysinternals.com) that shows each application, its associated DLLs, and the base memory address for each DLL.

Once we've worked out what DLL caused the problem, there are a number of different options open to us, most of which still involve searching the .LIB for assembler code. The easiest and quickest involves using the built-in 'quick view' function on the DLL. It only works as long as the offending procedure is exported from the DLL. Of course, you won't know that until you try, but that's only a 5 minute exercise.

Run Explorer and right-click on CW2RUN32.DLL. Look for the line that says 'Image base' - for CW2RUN32 it should read 0800000. Subtract 0800000 from our address of 854904 and we get 54904. Look further down in the listing, and you'll see 3 columns labeled 'ordinal', 'entry point' and 'name'. Look for the highest entry point that's still lower than 54904, and you should find 543dc (at ordinal 02AF). That's the procedure (ViewDriver2) that caused the problem. Of course, if the procedure is not exported then this won't work, and we have to resort to another method.
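The arithmetic is simple enough, but here it is as a Python sketch for anyone who wants to script it. Only the 543dc / 02AF / ViewDriver2 row comes from the listing described above. The other two rows are hypothetical neighbours that I've invented so that the lookup has something to choose between.

```python
IMAGE_BASE = 0x0800000      # 'Image base' reported by Quick View
FAULT_ADDRESS = 0x854904    # address reported when the application GPFd

# (entry point, ordinal, name). Only the ViewDriver2 row is real;
# the rows before and after it are hypothetical, for illustration only.
EXPORTS = [
    (0x53F10, 0x02AE, "HypotheticalBefore"),
    (0x543DC, 0x02AF, "ViewDriver2"),
    (0x54B20, 0x02B0, "HypotheticalAfter"),
]

def export_for(address, image_base=IMAGE_BASE, exports=EXPORTS):
    """Return (offset into the DLL, closest export row at or below it)."""
    offset = address - image_base
    candidates = [row for row in exports if row[0] <= offset]
    return offset, (max(candidates) if candidates else None)

offset, row = export_for(FAULT_ADDRESS)
print(hex(offset), row[2], hex(offset - row[0]))
```

With the real export list pasted in, the same function gives the offset (54904) and the exported procedure that contains it, along with how far into that procedure the fault lies.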

If we don't have a debugger installed and we can get the Dr Watson log file, it shows us the assembler which caused us the problem. Then we just search the .LIB for the assembler code and work out the procedure name as before.

008548f4 807f4800 cmp byte ptr [edi+0x48],0x0 ds:01daf77e=??
008548f8 752d jnz 00854927
008548fa 89f8 mov eax,edi
008548fc e817080000 call 00855118
00854901 8b4033 mov eax,[eax+0x33] ds:0119ea06=????????
FAULT ->00854904 8b4039 mov eax,[eax+0x39] ds:0119ea06=????????
00854907 2500020000 and eax,0x200
0085490c 83f800 cmp eax,0x0
0085490f 7416 jz 00854927
00854911 6a00 push 0x0
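To turn a listing like this into something you can search the .LIB for, you need the raw bytes from the second column. The Python sketch below assumes the log lines look exactly like the excerpt above (address, hex bytes, then the instruction, with the faulting line prefixed by "FAULT ->"). It collects the bytes before the fault and the bytes from the fault onwards.

```python
import re

# Disassembly copied from the Dr Watson excerpt above.
LISTING = """\
008548f4 807f4800 cmp byte ptr [edi+0x48],0x0 ds:01daf77e=??
008548f8 752d jnz 00854927
008548fa 89f8 mov eax,edi
008548fc e817080000 call 00855118
00854901 8b4033 mov eax,[eax+0x33] ds:0119ea06=????????
FAULT ->00854904 8b4039 mov eax,[eax+0x39] ds:0119ea06=????????
00854907 2500020000 and eax,0x200
0085490c 83f800 cmp eax,0x0
0085490f 7416 jz 00854927
00854911 6a00 push 0x0
"""

# Optional fault marker, 8-digit address, then the instruction bytes.
LINE = re.compile(r"^(FAULT ->)?([0-9a-f]{8}) ([0-9a-f]+) ", re.M)

def code_bytes(listing=LISTING):
    """Return (bytes before the fault, bytes from the fault onwards)."""
    before, after, seen_fault = b"", b"", False
    for fault, _address, hexbytes in LINE.findall(listing):
        seen_fault = seen_fault or bool(fault)
        if seen_fault:
            after += bytes.fromhex(hexbytes)
        else:
            before += bytes.fromhex(hexbytes)
    return before, after

before, after = code_bytes()
print(after.hex())
```

One caveat, as far as I understand it: the operand of a call instruction (the 17080000 after e8 here) may be filled in by the linker, so it might not appear in the .LIB in the same form. If a search on the full sequence fails, try a shorter run of bytes that doesn't span a call.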

If we don't have a debugger and we don't have the log file either (for whatever reason), we have to follow the same procedure we did for 16-bit applications and dis-assemble the DLL first. Once we have the assembler instructions, we can search the .LIB for the assembler code and work out the procedure name as before.

If we have Visual Studio installed, that will give us the opportunity to debug the application. Of course, it won't do us a lot of good because Visual Studio has no knowledge of Clarion source code and we still end up with a faceful of assembler. However, it does tell us what other DLLs are referenced by the application and what DLL caused the problem. Even though we know that, we still have to search the .LIB for the assembler instructions and work out what procedure caused the problem.

If you have the Clarion debugger installed, you get basically the same information all over again. Under Windows 95 and Windows 98 the debugger can (if you have source code on that machine) show the offending source code that generated the GPF. It should also do that under Windows NT, but (again) there's a problem - the user who's currently logged in to Windows NT may not have security rights to debug the application. Looking in MSDN at the DebugActiveProcess API gives us this:

  • "The debugger must have appropriate access to the target process; it must be able to open the process for PROCESS_ALL_ACCESS access. On Windows 95 and Windows 98, the debugger has appropriate access if the process identifier is valid. However, on Windows NT, DebugActiveProcess can fail if the target process was created with a security descriptor that grants the debugger anything less than full access. Note that if the debugging process has the SE_DEBUG_NAME privilege granted and enabled, it can debug any process."

Looking in MSDN a bit more we find the interesting little snippet:

  • "Microsoft allows a process to override this access check by using a privilege. In Windows NT, a privilege is an attribute assigned to a user that allows the user to override what would normally be a restriction to some part of the operating system. Windows NT 4.0 supports about 24 privileges that allow users to do things like backup and restore files that they wouldn't normally be allowed to access, change the system time, run a process in the realtime priority class, and shut down the system. One of these privileges is SeDebugPrivilege. Normally, the operating system forbids users from debugging processes. This makes sense, of course, because you don't want to allow users (or hackers) the ability to alter a process while it is executing. However, there is a class of users who do need to be able to debug processes. By default, Windows NT automatically assigns the debug privilege to administrators. You can verify this by running the User Manager administrative tool, selecting the Policies | User Rights menu option, clicking the Show Advanced User Rights checkbox, and then choosing the Debug programs."

However, if your system is anything like mine, it doesn't make the slightest bit of difference - all I get is an error message:

  • "An unexpected failure occurred whilst processing a DebugActiveProcess API. You may choose OK to terminate the process or Cancel to ignore the error"

So, basically, you're back where you started - searching for assembler. Looking at the Disassembly window shows the assembler code and contents of the registers, but still doesn't get us any farther.

3295dbg.gif

Whether we have a debugger installed or not basically only eliminates the disassembly step. We still have to search the CL2RUN32 library for the offending bytes, then work out what the procedure is called. Of course, as I said at the beginning, we're only able to do that because we have a CL2RUN32 that has actual compiled code in it. If the GPF occurs in KERNEL32 for instance, we're up a creek without a paddle. There is no library for it with actual compiled code in it, so we can't easily determine which procedure caused the problem. Of course, we can always resort to debugging in assembler, but that's not fun and doesn't get us much further along in the process.

Summary

Finding a GPF can be relatively easy, or downright frustrating. Hopefully this should give you some pointers on where to start. It's not infallible, but it should certainly get you going.

Of course, it would be a damn sight easier if you could use some of the 3rd party debug tools that are out there such as SoftIce or BoundsChecker, but because the only tool that recognises the TSWD debug format is Clarion's own 32-bit debugger, you're out of luck :-(

Next Time …

Synchronous RS232 communication using the Windows API

And Remember …

The best place for information is MSDN. If you don't subscribe, you can still get to the information via the web at http://msdn.microsoft.com/

MSKB articles of interest:

Q133174 - How to Locate Where a General Protection (GP) Fault Occurs

Q103861 - Choosing the Debugger That the System Will Spawn

Q175644 - Dr. Watson Fails to Appear Because of Long File Names in Path

Q121434 - Specifying the Debugger for Unhandled User Mode Exceptions

Q138786 - Just-In-Time Debugging Launches Wrong Debugger

Q178547 - DOTCRASH Helps Debug System Hangs and Memory Leaks in Windows NT
 
 
 
 
