4 DEBUGGING KERNEL TRAPS
Sometimes things go wrong and your machine panics. We often see messages on Usenet or help aliases about getting a "Bad Trap" panic and a load of hex with a plea for help.
In most cases, you will need to raise the issue with your support channel so that we at Sun can look at the problem. This is simply because we have the source code and that, combined with a crash dump from your system, helps us to diagnose the problem.
However, if you have the source code, or you just want to know more about how to decipher these panic messages, then read on.
4.1 Finding Out What Went Wrong
The first objective in understanding the problem is to make sure we have all the information to hand. A crash dump is a prerequisite if you are to do any real debugging. When the kernel panics, it writes the core image of the kernel (including modules and the active pages of physical memory) to the primary swap device, and we retrieve that image using the savecore command. So make sure that savecore is enabled in the file /etc/init.d/sysetup; simply uncomment the line containing the "savecore" command. Alternatively, if a panic has occurred and you haven't had a chance to enable it, you can run the command manually and, with luck, still recover the dump from the swap partition. Naturally, if you leave it too long and the OS has started to use the swap device, you run the risk of damaging part or all of the crash dump image, so it's a good idea to get it out as soon as possible. Once we have the crash dump, we can start to decipher that "BAD TRAP" message we saw when the machine went down.
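For example, if the machine has already panicked and rebooted without savecore enabled, retrieving the dump by hand looks something like this (a sketch; /var/crash/`uname -n` is just the conventional place to keep dumps, so substitute any filesystem with enough free space):

# mkdir -p /var/crash/`uname -n`
# savecore -v /var/crash/`uname -n`

savecore writes a pair of files, unix.N and vmcore.N, into that directory; these are the kernel namelist and core image that we will feed to adb shortly.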
4.1.1 Deciphering a BAD TRAP panic message
We can get a lot of information from this output. Don't worry if you don't have a copy of it from the console or /var/adm/messages because you can still see it in your crash dump. Run the "$<msgbuf" macro under adb to get it out. Let's see an example which I can talk you through:
BAD TRAP: type=1 rp=f0586c1c addr=25732564 mmu_fsr=164 rw=3
mv: Text fault
kernel read fault at addr=0x25732564, pme=0x0
MMU sfsr=164: Invalid Address on supv instr fetch at level 1
pid=14825, pc=0x25732564, sp=0xf0586c68, psr=0x40000c6, context=0
g1-g7: 10bf8484, 8000000, ffffffff, 0, 0, 1, fc44de00
Begin traceback... sp = f0586c68
Called from fc3616bc, fp=f0586cd0, args=fc42ac50 e226f000 e226f0ff 1 ffffffff 3e8
Called from fc361588, fp=f0586d98, args=fc4db30c 0 fc42ac08 fc42ac00 fc42ac50 fc4db350
Called from f00e7e24, fp=f0586e00, args=fc4db30c fc087415 fc42ac08 fc378c14 fc4ef480 200
Called from f0072054, fp=f0586ed8, args=0 effff5a4 21c0001 fc42ac08 20 effffaeb
Called from f0041aa0, fp=f0586f58, args=f0160fe8 fc6d0cd4 0 fc6d0cb0 fffffffc ffffffff
Called from 11700, fp=efffd050, args=effffaeb effff5a4 23150 1 0 0
End traceback...

This is an example of a "Bad Trap" from a sun4m running Solaris 2.3. The first few lines give us information about the type of trap that caused the panic. The fields shown in the first line have the following meanings:

    type     - the hardware trap type (type 1 is an instruction access
               exception, reported on the next line as a text fault)
    rp       - the address of the registers saved at the time of the trap
    addr     - the virtual address that caused the fault
    mmu_fsr  - the raw value of the MMU fault status register, which is
               decoded for us a couple of lines further down
    rw       - an indication of the kind of access that was being attempted
Fortunately, we don't need to remember the various architecture-specific trivia to decode this, as the kernel does it for us. As we can see on the following lines, it tells us that this was a kernel read fault and at what address it occurred. On the next line, it expands the MMU status register: in this case it tells us that we took an invalid address fault on a supv (supervisor) instruction fetch at level 1 (of the MMU's page table hierarchy). Still, at least from the above description you can see how these fields are decoded.
The next line under the MMU decoding gives us the process id of the process that was running on that CPU when the trap occurred. It also gives us the value of the program counter (PC), which holds the address of the instruction that caused the trap. In this case, we can see that it is the same address that the MMU information told us was bad. We also know from the MMU information that the problem was a fetch of an instruction from an illegal address, so clearly our PC contained a bogus address and that is what caused the trap.
The other fields of interest on this line are sp and psr, which give us the stack pointer and the processor status register value respectively. We can use the stack pointer to unwind the sequence of function calls to help diagnose the problem, although the kernel does that for us just underneath.
Next we get the global variables, from %g1 onwards. There's no point in dumping %g0 because it is hardwired to always be zero.
Finally, the stack traceback. When we call a function through the CALL instruction, the address of that CALL is written into register %o7. If the function we're calling does a SAVE to change register windows and reserve some stack space (see section 3.2 on Register Windows), then the caller's register frame gets saved on the stack and we can back-track through the stack to see the sequence of calls by looking at the addresses where the contents of the registers are stored. At this point, we might as well look at how to manually walk through a stack to get the call trace out.
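Before we do, here is a minimal sketch of the call and entry sequence the compiler generates. This is hand-written for illustration rather than lifted from the kernel, and foo is just a made-up routine name:

        call    foo             ! the address of this CALL is written to %o7
        nop                     ! delay slot
        ...
foo:
        save    %sp, -96, %sp   ! slide to a new register window and reserve
                                ! the 96-byte minimum stack frame; the caller's
                                ! %o registers are now visible as our %i
                                ! registers, so the return address is in %i7
        ...
        ret                     ! return to %i7 + 8
        restore                 ! pop back to the caller's window (delay slot)

It is the SAVE that matters for stack tracing: if that window later has to be spilled, its local and in registers (including the %i7 return address) are written to the 16 words at its %sp, and that is exactly what we pick apart next.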
4.1.2 Manually Extracting the Stack Trace
Okay, so we have our crash dump and we want to find out a bit more about what went wrong. We can often get a picture of what happened from the stack trace of function calls. For example, taking the crash dump that accompanies the panic message shown in the previous section, we can look at it with adb and try to see what the stack trace was:
# adb -k unix.0 vmcore.0
physmem 1e68
$c
complete_panic(0xf00494d8,0xf0586b04,0xf0586990,0xfc255600,0x0,0x1) + 10c
do_panic(?) + 1c
vcmn_err(0xf015f7e8,0xf0586b04,0xf0586b04,0x80,0xffeec000,0x3)
cmn_err(0x3,0xf015f7e8,0xf015fc00,0x35,0x35,0xf0152400) + 1c
die(0x1,0xf0586c1c,0x25732564,0x164,0x3,0xf015f7e8) + 78
trap(0x1,0xf0586c1c,0xf0182520,0x164,0x3,0xfc34b9c0) + 454
fault(0xfc42ac50,0xfc,0x25732564,0xf015379c,0x200,0x2) + 84

Here I've used adb's call trace command ($c) in an attempt to find out the sequence of calls leading up to the panic. Unfortunately, we don't see anything useful because all we have on the active stack is the sequence of calls from the "fault" trap handler upwards. What we need is the stack pointer at the time of the trap itself and, fortunately, that is one of the registers saved when a panic occurs. In the previous section we saw that the 'rp' field of the panic message gives us the address of the saved registers. Don't worry if you don't happen to have the messages to hand, because it just so happens that the pointer to the saved registers is passed to the trap() and die() routines as the second argument. The arguments to functions (the values of %i0 through %i5) are shown in brackets, so we can see that the pointer to the saved registers is, in this case, 0xf0586c1c. So we take that value and run the "regs" adb macro against it:
0xf0586c1c$<regs
0xf0586c1c:     psr             pc              npc
                40000c6         25732564        25732568
0xf0586c28:     y               g1              g2              g3
                548dd520        10bf8484        8000000         ffffffff
0xf0586c38:     g4              g5              g6              g7
                0               0               1               fc44de00
0xf0586c48:     o0              o1              o2              o3
                fc42ac50        fc              25732564        f015379c
0xf0586c58:     o4              o5              o6              o7
                200             2               f0586c68        f006ea88

Here we can see the values of the registers at the time the trap occurred. The address of the caller is in %o7 and our frame pointer is in %o6. Let's take a look at who called us by looking at the instruction at the address in %o7:
f006ea88?ia
mutex_adaptive_enter+0x2c:      jmpl    %o2, %o7

Okay, so we can see that the mutex_adaptive_enter routine tried to jump to the address in %o2 and, if we look back at the saved registers above, we can see that the address in %o2 is the same as the one in %pc... the address we panicked on! In fact, if we have the source, we can see that in this case someone has corrupted the function pointer used by the mutex code to point to the lock enter routine.
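Even without the source, you can often work out where a register got its value by disassembling a little more of the routine. For example, asking adb for the first 0x10 (16) instructions of the routine shows the code leading up to the offending jump (the output is omitted here, since it depends on your kernel build):

mutex_adaptive_enter,10?ia

Looking for the load that set up %o2 before the jmpl will usually point you at the structure the bogus pointer came from.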
If necessary, we could trace back further by walking on through the stack because, when a window overflow occurs, the window contents are saved on the stack. This means that we can trace our calls through as many levels as there are saved windows on the stack. The format of a stack frame in SunOS looks like this:
  %fp->|                               |
       |-------------------------------|
       |  Locals, temps, saved floats  |
       |-------------------------------|
       |  outgoing parameters past 6   |
       |-------------------------------|-\
       |  6 words for callee to dump   | |
       |  register arguments           | |
       |-------------------------------|  > minimum stack frame
       |  One word struct-ret address  | |
       |-------------------------------| |
       |  16 words to save IN and      | |
  %sp->|  LOCAL registers on overflow  | |
       |-------------------------------|-/

This means that by dumping the first 16 words on the stack, we can extract the saved window's %o6 and %o7 to see the previous stack pointer and the address of the calling instruction respectively. Here's an example of this manual walk-through taking the above %o6 from the saved registers as a starting point:
f0586c68,10/X
0xf0586c68:     0               f0586ed0        f016574c        f0586eb8
                0               f00f4730        0               1
                fc4db30c        fc087415        fc42ac08        fc378c14
                fc4ef480        200             f0586e00        f00e7e24

Here we have the saved local and in registers from the previous window. The first two rows contain registers %l0 through %l7 and the last two rows contain %o0 through %o7. Let's take a peek at the instruction that called us by using the address in the saved %o7:
f00e7e24?ia
vn_rename+0x14c:        jmpl    %g1, %o7
vn_rename+0x150:

We can also take the value in the saved %o6, shown immediately to the left of the saved %o7 above, and repeat the procedure to look at the previous caller:
f0586e00,10/X
0xf0586e00:     0               9               fc68490c        20e
                fc36004c        14f             fc06ce7c        21c0001
                0               effff5a4        21c0001         fc42ac08
                20              effffaeb        f0586ed8        f0072054
f0072054/ia
syscall+0x3ec:          jmpl    %g1, %o7
syscall+0x3f0:
f0586ed8,10/X
0xf0586ed8:     1               0               f0156d28        0
                fc28b000        86              f0586fb4        fc6d0800
                f0160fe8        fc6d0cd4        0               fc6d0cb0
                fffffffc        ffffffff        f0586f58        f0041aa0
f0041aa0?ia
_sys_rtt+0x4d8:         call    syscall

This shows you how you can back-track through the stack to rebuild a call trace. Sometimes the memory at the address in %o6 cannot be dumped, or doesn't contain valid saved registers; this happens when the panic has occurred in such a way that the register windows could not be saved to the stack. Normally when we panic, we flush all the windows to the stack to avoid this problem.
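To summarise, the manual walk boils down to a short loop of adb commands. Here it is as a sketch with placeholder names rather than literal input; substitute the values from your own dump, where rp comes from the BAD TRAP line and each frame's saved %o6 and %o7 are the last two words of the previous dump:

rp$<regs        decode the registers saved at the time of the trap
o7?ia           disassemble the instruction that made the call
o6,10/X         dump the 16-word window save area for that frame

Repeat the last two steps until you run out of saved windows on the stack.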
The fact that register window frames get saved on the stack, allowing us to peek back at the local and input registers as well as the call trace, helps us to build up a picture of the logic flow. The hardest part is correlating the assembler dumps of the affected functions with the source code. This is often quite tricky because the kernel is compiled with optimisation enabled, so the sequence of events in the 'C' source code doesn't seem to relate to what you see when you look at the assembler for the function, especially when loop unfolding has done its evil work! To make matters worse, there are a great many macros in the source which further convolute matters. However, don't be put off. Have a go at figuring out the sequence of events, even if you don't have the source code, because the technique can be very useful for debugging your own device drivers, for which you will have the source.
More information on crash dump analysis can be found in a book called PANIC: UNIX System Crash Dump Analysis by my friend and colleague, Kimberley Brown.