Published 1999-04-01
One of the most annoying things about being a software developer is hearing the word 'GPF'.
Quite often, or so it seems, the GPF will occur in a DLL which you have no control over - CWRUN, or maybe KERNEL32. Even if you do have the source, your client may not, and if you can't reproduce the problem on your machine then the debugger isn't going to be much use. (Of course, if your machine is anything like mine, the debugger GPFs more often than the program you're trying to debug).
This month I'm going to try to give some pointers (with some rather large hints from Alexey) on how to identify the code that produced the problem. I'll be working with a very simple CW2003 app that can produce a GPF either inside the executable itself (by POKEing memory address 0) or in the Clarion runtime DLL (by manipulating a VIEW when the files for the view aren't actually open).
The technique I'm going to describe this month only works on DLLs or executables for which you have actual code, i.e., your application and the Clarion runtime. It won't work for GPFs in KERNEL32, because finding the GPF relies on searching for assembler instructions, and we don't have a LIB file for KERNEL32 that includes actual code. We do have two .LIB files for the Clarion runtime: CW2RUN32.LIB and CL2RUN32.LIB. The difference, in case you were wondering, is that CW2RUN32.LIB contains no executable code; it's an import library that you link into your application so that the linker can resolve the calls you make to Clarion runtime functions. The other library, CL2RUN32.LIB, does contain assembler code; in fact, it contains the assembler instructions for the entire Clarion RTL. That's also why we can't find GPFs in KERNEL32: the only LIB we have for it is one to link against, and it doesn't contain any code.
The key to finding a GPF is having a .MAP file for the DLL or EXE that produced the error. If you don't have one, it may be because map file generation is turned off in the project file. To turn it back on, choose Project->Edit, press the Properties button, go to the Link tab and check the box marked "Create Map File".
If you really can't get a map file (e.g., because the GPF happened inside the Clarion runtime), then it's not quite as simple. Even though you don't have the source code for the DLL, it can still be very useful to know which Clarion runtime function or Windows API call caused the problem. For example, if your application GPFs in SetWindowLong and you have a subclassed window in your application, that points to a subclassing problem in your code. Or if you have a view and the GPF happens in the ViewDriver function in the Clarion runtime, that points to a problem with the view.
What happens when an application GPFs can differ, depending on whether it's a 16-bit or 32-bit application, and whether the operating system is set to run a debugger when a GPF occurs or not.
Microsoft Knowledge Base articles Q103861, Q175644 and Q121434 all relate to configuring the default Windows debugger. Under Windows NT this is controlled by the registry key HKLM\Software\Microsoft\Windows NT\CurrentVersion\AeDebug, and under Windows 95 it's supposedly in the [AeDebug] section of WIN.INI.
I say supposedly because MSKB article Q138786 states "If the registry key exists in your Windows 95 registry, its settings will preempt the settings in the Win.ini file". That explains why the 32-bit CW2 and C5EE debuggers both access the Windows NT registry key even under Windows 95 and Windows 98.
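If you want to check what a given machine is configured to do, the registry key above can be read with Python's standard `winreg` module. This is a minimal, read-only sketch, assuming a Python 3 interpreter on the Windows machine in question; `read_aedebug` is a hypothetical helper name, and the value names `Debugger` and `Auto` are the ones documented for this key. It does not look at the WIN.INI [AeDebug] section.

```python
import sys

# Registry key named in the article (under HKEY_LOCAL_MACHINE).
AEDEBUG_PATH = r"Software\Microsoft\Windows NT\CurrentVersion\AeDebug"


def read_aedebug():
    """Return the 'Debugger' and 'Auto' values of the AeDebug key.

    Returns None when not running on Windows, {} when the key is missing,
    and None for any individual value that is not set.
    """
    if sys.platform != "win32":
        return None  # winreg only exists on Windows builds of Python
    import winreg

    try:
        key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, AEDEBUG_PATH)
    except FileNotFoundError:
        return {}
    values = {}
    with key:  # registry handles support the context-manager protocol
        for name in ("Debugger", "Auto"):
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                value = None
            values[name] = value
    return values
```

As the KB articles describe it, `Auto` set to "1" starts the named debugger without asking, while "0" prompts first.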
The following table shows what happens for 32-bit applications. For 16-bit applications you only get an error dialog, with no chance to run the debugger.
| | Windows 95 | Windows NT |
|---|---|---|
| No debugger installed | Standard GPF dialog | Standard GPF dialog, and Dr Watson produces a DRWTSN32.LOG file in the \WINNT directory |
| Debugger installed | Standard GPF dialog with extra 'Debug' button which runs the debugger | Different GPF dialog with the option to run the debugger. No Dr Watson log file. |
All the screen shots I've used in the article were taken on machines without a debugger installed, so they should be quite standard.
Once you have generated the map file, you need to know the address where the error occurred. For a 16-bit application, both Windows 95/98 and Windows NT give you a segment:offset address.
The following screen shots tell you what to expect to see. You'll notice that Windows NT doesn't show as much information as Windows 95/98, but that doesn't matter.


The important thing to note is that they both show the address where the error occurred. To find the GPF, open the map file in an editor and look for the closest segment:offset that comes before the one given in the error message (0004:0131).
    Start     Length  Name         Class  Group
    . . .
    0004:0034 000D7H  ENTERCODE    CODE   (none)
    0004:010C 00188H  GPFNOW_TEXT  CODE   (none)

    Address   Publics by Value
    . . .
    0003:003C __IREG
    0004:010C GPF_TYPE1@F
    0004:0144 GPF_TYPE2@F
    0004:019E _MAIN
In this example we find two relevant entries. The first part of the MAP file doesn't tell us much, just which segment the procedure is in (GPFNOW_TEXT, a CODE-class segment). If we had an APP instead of a source program, each module would get its own code segment, so at least we'd be able to say which module the offending procedure was in. The second part, the 'Publics by Value' section, tells us that 0004:0131 lies between 0004:010C and 0004:0144.
That tells us that the procedure which caused the problem is GPF_TYPE1() (which it is). The name mangling on the procedure (@F) tells us that it doesn't take any parameters. There are only 56 bytes in the procedure (0144-010C = 0038 hex), so it's not very long. 0131 is 37 bytes into the procedure (0131-010C = 0025 hex), so the code that produces the problem is roughly two-thirds of the way through the generated assembler code for that procedure. If the procedure were very long, we might not know which part of the source that assembler code corresponds to, so we may have to look at the generated assembler and match it back to the source code.
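The lookup just described (find the last public symbol at or below the fault address in the same segment, then subtract) can be sketched in Python. This assumes the 'Publics by Value' lines look like the excerpt above, one `SSSS:OOOO NAME` pair per line; `parse_publics` and `find_public` are hypothetical helper names, not part of any Clarion tool.

```python
import bisect


def parse_segoff(text):
    """'0004:0131' -> (0x0004, 0x0131)."""
    seg, off = text.split(":")
    return int(seg, 16), int(off, 16)


def parse_publics(lines):
    """Collect ((segment, offset), name) pairs from 'Publics by Value' lines.

    Lines that are not exactly 'SSSS:OOOO NAME' (headers, '. . .',
    segment-table rows) are skipped.
    """
    publics = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and ":" in parts[0]:
            publics.append((parse_segoff(parts[0]), parts[1]))
    return sorted(publics)


def find_public(publics, fault):
    """Return (name, bytes into the procedure) for the fault address, or None."""
    seg, off = parse_segoff(fault)
    # Only publics in the faulting segment can contain the offset.
    in_seg = [(o, name) for (s, o), name in publics if s == seg]
    i = bisect.bisect_right([o for o, _ in in_seg], off) - 1
    if i < 0:
        return None
    start, name = in_seg[i]
    return name, off - start


map_excerpt = """\
0003:003C __IREG
0004:010C GPF_TYPE1@F
0004:0144 GPF_TYPE2@F
0004:019E _MAIN
"""
publics = parse_publics(map_excerpt.splitlines())
name, delta = find_public(publics, "0004:0131")
print(name, hex(delta))  # GPF_TYPE1@F 0x25
```

Restricting the search to one segment matters in 16-bit maps, because offsets in different segments are unrelated numbers.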


If you don't have a .MAP file for the DLL (or EXE), the process is more complicated, and what happens next depends on whether or not you know the assembler instructions that caused the problem. Windows 95 tells you (it's the part labeled "Bytes at CS:EIP") but Windows NT does not, so if all you have is the information from NT you'll have to run the DLL in question (in this case CW2RUN16) through a disassembler. There are plenty to choose from, but I (and Alexey) recommend HIEW (available from ftp://ftp.kemsc.ru/pub/sen).
Disassembling the DLL shows us this:

The instructions at 00405511 (shown on the left, preceded by a ".") match the bytes shown in the Windows 95 error window under "Bytes at CS:EIP" (26 F6 47 2E 02 74 16 …), so at least we know we've found the correct code in the DLL.
Once we know the assembler code that caused the problem, we can use a hex editor to search the corresponding LIB for those instructions. Note that we don't search CW2RUN16.LIB, which doesn't have any executable code in it, but CL2RUN16.LIB, which does.
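If you'd rather script the hex-editor search, a few lines of Python will do it. A minimal sketch, assuming you pass the bytes exactly as Windows 95 reports them under "Bytes at CS:EIP"; `find_bytes` and `search_lib` are hypothetical names. It is a plain byte search, so expect either several hits or none: the LIB stores code inside object-file records, and a run of bytes that crosses a record boundary, or that contains an address the linker patches in later, may not appear contiguously. In that case a shorter run is worth trying.

```python
from pathlib import Path


def find_bytes(data, hex_pattern):
    """Return every offset in data where the hex byte pattern occurs.

    hex_pattern is a string such as "26 F6 47 2E 02 74 16"; spaces are optional.
    """
    pattern = bytes.fromhex(hex_pattern)
    hits = []
    pos = data.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = data.find(pattern, pos + 1)  # +1 so overlapping matches are kept
    return hits


def search_lib(lib_path, hex_pattern):
    """Read a LIB file (e.g. CL2RUN16.LIB, not CW2RUN16.LIB) and search it."""
    return find_bytes(Path(lib_path).read_bytes(), hex_pattern)
```

For example, `search_lib("CL2RUN16.LIB", "26 F6 47 2E 02 74 16")` returns the file offsets to inspect in your hex editor.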

Once we've found the code in the LIB, we simply work backwards until we find a procedure name. In this case it's VIEW_CHECK_STATE@FP4. Because we also have the source code in this case, we know the problem has to do with a view, so this simply confirms what we've just found.
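Working backwards from the match to the nearest procedure name can also be approximated in code. This is only a text heuristic, assuming that public names in CL2RUN16.LIB appear as plain ASCII and carry the `@` mangling seen in this article (`GPF_TYPE1@F`, `VIEW_CHECK_STATE@FP4`). It does not decode the LIB's record structure, so treat the result as a hint to confirm in the hex editor; `nearest_name_before` is a hypothetical helper.

```python
import re

# Heuristic pattern (an assumption based on the names in this article):
# an identifier, an '@', then the mangled parameter suffix,
# e.g. b"VIEW_CHECK_STATE@FP4".
MANGLED_NAME = re.compile(rb"[A-Za-z_$][A-Za-z0-9_$]*@[A-Za-z0-9_$@]*")


def nearest_name_before(data, offset):
    """Return (file offset, name) of the last mangled-looking name before offset."""
    last = None
    # finditer(data, pos, endpos) only scans data[0:offset].
    for match in MANGLED_NAME.finditer(data, 0, offset):
        last = (match.start(), match.group().decode("ascii"))
    return last
```

Pass it the LIB contents and the file offset where your byte search matched; a `None` result means no name-like string precedes that offset.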


In a 32-bit application under Windows 95 you may be given what appears to be a segment:offset address, but it isn't. Because 32-bit applications use linear addresses, what it's actually showing is the code segment followed by the address where the error occurred. Ignore the segment; you only need the address part, in this case 00401189. Windows NT, having a different underlying architecture in the first place, gives you only the address. From here we follow basically the same procedure as for 16-bit applications: open the map file and search for that address.
    401058  137  Code  GPFNOW_TEXT
    401190  3B   Code  IEXE32_TEXT
    4011CC  2A   Code  INIT_TEXT
    . . .
    401058  _main
    401100  GPF_TYPE2@F
    401174  GPF_TYPE1@F
    402000  $WINDOW
    . . .
And again, we find two places in the map file that look interesting. The first tells us that the procedure that caused the problem is in the GPFNOW_TEXT segment. The second tells us that the procedure is GPF_TYPE1@F (00401189 is between 401174 and 402000).
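For a 32-bit map the lookup is simpler, because addresses are plain linear numbers: drop anything before the colon, find the segment whose start and length bracket the address, and find the last public at or below it. A sketch, assuming you've copied the two map sections into Python tuples as shown (the lengths in the segment list are read as hex); the helper names are hypothetical and the sample values come from the excerpt above.

```python
import bisect


def parse_fault_address(text):
    """'SSSS:00401189' or '00401189' -> 0x401189 (the segment part is ignored)."""
    return int(text.split(":")[-1], 16)


def find_segment(segments, address):
    """segments: (start, length, name) tuples from the map's segment list."""
    for start, length, name in segments:
        if start <= address < start + length:
            return name
    return None


def find_public_linear(publics, address):
    """publics: (address, name) tuples. Return (name, bytes into it) or None."""
    publics = sorted(publics)
    starts = [start for start, _ in publics]
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return None
    start, name = publics[i]
    return name, address - start


segments = [(0x401058, 0x137, "GPFNOW_TEXT"), (0x401190, 0x3B, "IEXE32_TEXT"),
            (0x4011CC, 0x2A, "INIT_TEXT")]
publics = [(0x401058, "_main"), (0x401100, "GPF_TYPE2@F"),
           (0x401174, "GPF_TYPE1@F"), (0x402000, "$WINDOW")]

address = parse_fault_address("00401189")
print(find_segment(segments, address))       # GPFNOW_TEXT
print(find_public_linear(publics, address))  # ('GPF_TYPE1@F', 21)
```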


If you don't have the map file for a 32-bit application, we basically follow the same procedure that we did for the 16-bit application - but there's a problem. See it? If we're running under Windows NT, we don't know what DLL caused the problem! Windows 95 tells us in the error window, but NT doesn't.
Now, at this point you should be able to go to the Dr Watson log file and work out which DLL caused the problem. As this excerpt shows, the log lists the base address and name of each DLL, but the names of the Clarion DLLs are conspicuous by their absence. We know from the fault address of 00854904 that the problem is in the DLL whose base address is 00800000, but we don't know what that DLL is called!
    (00400000 - 00408000)
    (77f60000 - 77fbc000) dll\ntdll.dbg
    (00800000 - 00800000)
    (77dc0000 - 77dfe000) dll\advapi32.dbg
    (77f00000 - 77f5e000) dll\kernel32.dbg
    (77e70000 - 77ec4000) dll\user32.dbg
    (77ed0000 - 77efc000) dll\gdi32.dbg
    (77e10000 - 77e62000) dll\rpcrt4.dbg
    (00240000 - 002b3000) COMCTL32.dbg
    (77d80000 - 77db2000) dll\comdlg32.dbg
    (77c40000 - 77d7c000) dll\shell32.dbg
    (5f600000 - 5f618000) drv\winspool.dbg
    (006f0000 - 006f0000)
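Because the unnamed entries in this log show the same start and end address, the only usable information is the base address, and the reasoning above amounts to "take the module with the highest base address that is still at or below the fault address". A sketch of that, assuming the module list uses the `(start - end) name` layout shown; the helper names are hypothetical, and the answer is only meaningful if the fault really lies inside one of the listed modules.

```python
import re

# One entry: "(start - end)" in hex, optionally followed by a .dbg file name.
MODULE_ENTRY = re.compile(r"\(([0-9a-fA-F]{8}) - ([0-9a-fA-F]{8})\)[ \t]*(\S+\.dbg)?")


def parse_module_list(text):
    """Return (base address, name or None) for each entry in the log excerpt."""
    return [(int(m.group(1), 16), m.group(3)) for m in MODULE_ENTRY.finditer(text)]


def module_for(modules, address):
    """Pick the module with the highest base address at or below address."""
    candidates = [module for module in modules if module[0] <= address]
    if not candidates:
        return None
    return max(candidates, key=lambda module: module[0])


modules = parse_module_list(r"""(00400000 - 00408000)
(77f60000 - 77fbc000) dll\ntdll.dbg
(00800000 - 00800000)""")
base, name = module_for(modules, 0x00854904)
print(hex(base), name)  # 0x800000 None
```

A `None` name is exactly the situation described above: you have the base address but still need another tool to learn which DLL was loaded there.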
About the only way I know of finding the DLL is to use third-party software, and that software has to be running when your application GPFs. I use a nifty little utility called HandleEx (from http://www.sysinternals.com) that shows each application, its associated DLLs, and the base memory address of each DLL.
Once we've worked out which DLL caused the problem, there are several options open to us, most of which still involve searching the .LIB for assembler code. The easiest and quickest is to use the built-in 'Quick View' function on the DLL. It only works if the offending procedure is exported from the DLL. Of course, you won't know that until you try, but that's only a five-minute exercise.
Run Explorer, right-click on CW2RUN32.DLL and choose Quick View. Look for the line that says 'Image base'; for CW2RUN32 it should read 00800000. Subtract 00800000 from our address of 00854904 and we get 54904. Further down in the listing you'll see three columns labeled 'ordinal', 'entry point' and 'name'. Look for the highest entry point that's lower than 54904, and you should find 543DC (at ordinal 02AF). That's the procedure (ViewDriver2) that caused the problem. Of course, if the procedure is not exported this won't work, and we have to resort to another method.
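The Quick View arithmetic is easy to get wrong by hand, so here it is as a sketch. It assumes you've typed the relevant export rows from Quick View into a list; the only row used below is the one quoted in this article (ordinal 02AF, entry point 543DC, ViewDriver2), and the helper names are hypothetical. It also reports how far into the procedure the fault lies: 0x528 (1320) bytes here, which is plausible for a large routine, but a very large distance is a hint that an unexported procedure might sit between the export and the fault.

```python
def relative_address(address, image_base):
    """Offset of a fault address from the DLL's 'Image base'."""
    return address - image_base


def nearest_export(exports, rel_addr):
    """exports: (ordinal, entry point, name). Highest entry point <= rel_addr."""
    below = [export for export in exports if export[1] <= rel_addr]
    if not below:
        return None
    return max(below, key=lambda export: export[1])


exports = [(0x2AF, 0x543DC, "ViewDriver2")]  # the row quoted in the article
rel = relative_address(0x00854904, 0x00800000)
ordinal, entry, name = nearest_export(exports, rel)
print(hex(rel), name, hex(rel - entry))  # 0x54904 ViewDriver2 0x528
```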
If we don't have a debugger installed and we can get the Dr Watson log file, the log shows us the assembler that caused the problem. Then we just search the .LIB for the assembler code and work out the procedure name as before.
    008548f4 807f4800         cmp     byte ptr [edi+0x48],0x0    ds:01daf77e=??
    008548f8 752d             jnz     00854927
    008548fa 89f8             mov     eax,edi
    008548fc e817080000       call    00855118
    00854901 8b4033           mov     eax,[eax+0x33]             ds:0119ea06=????????
    FAULT ->00854904 8b4039   mov     eax,[eax+0x39]             ds:0119ea06=????????
    00854907 2500020000       and     eax,0x200
    0085490c 83f800           cmp     eax,0x0
    0085490f 7416             jz      00854927
    00854911 6a00             push    0x0
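To search the .LIB you need the raw bytes, which are the second column of the Dr Watson listing. A sketch that pulls them out, assuming the listing layout shown above (address, opcode bytes, then the instruction); `listing_bytes` is a hypothetical helper name. By default it starts at the FAULT line, because earlier instructions such as the `call` carry a displacement the linker fills in, so those bytes may not appear as-is in the LIB.

```python
import re

# "<8 hex digit address> <opcode bytes in hex> <instruction...>", with an
# optional "FAULT ->" marker in front of the faulting line.
LISTING_LINE = re.compile(r"^(?:FAULT\s*->)?\s*[0-9a-fA-F]{8}\s+([0-9a-fA-F]+)(?:\s|$)")


def listing_bytes(listing, start_at_fault=True):
    """Join the opcode bytes of a Dr Watson listing into one bytes object."""
    collecting = not start_at_fault
    chunks = []
    for line in listing.splitlines():
        line = line.strip()
        if line.startswith("FAULT"):
            collecting = True
        match = LISTING_LINE.match(line)
        if collecting and match:
            chunks.append(match.group(1))
    return bytes.fromhex("".join(chunks))


listing = """\
00854901 8b4033 mov eax,[eax+0x33] ds:0119ea06=????????
FAULT ->00854904 8b4039 mov eax,[eax+0x39] ds:0119ea06=????????
00854907 2500020000 and eax,0x200
"""
print(listing_bytes(listing).hex(" "))  # 8b 40 39 25 00 02 00 00
```

`bytes.hex(" ")` needs Python 3.8 or later; its output is in the same spaced form you would type into a hex editor's search box.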
If we don't have a debugger and we don't have the log file (for whatever reason), we have to follow the same procedure we did for 16-bit applications and disassemble the DLL first. Once we have the assembler instructions, we can search the .LIB for them and work out the procedure name as before.
If we have Visual Studio installed, that will give us the opportunity to debug the application. Of course, it won't do us a lot of good because Visual Studio has no knowledge of Clarion source code and we still end up with a faceful of assembler. However, it does tell us what other DLLs are referenced by the application and what DLL caused the problem. Even though we know that, we still have to search the .LIB for the assembler instructions and work out what procedure caused the problem.
If you have the Clarion debugger installed, you get basically the same information all over again. Under Windows 95 and Windows 98 the debugger can (if you have the source code on that machine) show the source code that generated the GPF. It should also do that under Windows NT, but (again) there's a problem: the user who's currently logged in to Windows NT may not have the security rights to debug the application. Looking in MSDN at the DebugActiveProcess API gives us this:
Looking in MSDN a bit more we find the interesting little snippet:
However, if your system is anything like mine, it doesn't make the slightest bit of difference - all I get is an error message:
So, basically, you're back where you started: searching for assembler. The Disassembly window shows the assembler code and the contents of the registers, but that still doesn't get us any further.

Having a debugger installed basically only saves us the disassembly step. We still have to search the CL2RUN32 library for the offending bytes, then work out what the procedure is called. Of course, as I said at the beginning, we're only able to do that because we have a CL2RUN32 library with actual compiled code in it. If the GPF occurs in KERNEL32, for instance, we're up a creek without a paddle. There is no library with compiled code in it, so we can't easily determine which procedure caused the problem. We can always resort to debugging in assembler, but that's not fun and doesn't get us much further along in the process.
Finding a GPF can be relatively easy, or downright frustrating. Hopefully this should give you some pointers on where to start. It's not infallible, but it should certainly get you going.
Of course, it would be a damn sight easier if you could use some of the 3rd party debug tools that are out there such as SoftIce or BoundsChecker, but because the only tool that recognises the TSWD debug format is Clarion's own 32-bit debugger, you're out of luck :-(
The best place for information is MSDN. If you don't subscribe, you can still get to the information via the web at http://msdn.microsoft.com/
MSKB articles of interest:
Q133174 - How to Locate Where a General Protection (GP) Fault Occurs
Q103861 - Choosing the Debugger That the System Will Spawn
Q175644 - Dr. Watson Fails to Appear Because of Long File Names in Path
Q121434 - Specifying the Debugger for Unhandled User Mode Exceptions
Q138786 - Just-In-Time Debugging Launches Wrong Debugger
Q178547 - DOTCRASH Helps Debug System Hangs and Memory Leaks in Windows NT
Copyright © 1999-2008 by CoveComm Inc. All Rights Reserved. Reproduction in any form without the express written consent of CoveComm Inc., except as described in the subscription agreement, is prohibited.
Clarion Magazine ISSN 1718-9942