SPARC traps under SunOS

By: Jim Moore, SunSoft, Sun Microsystems Inc
Email: Jim.Moore@UK.Sun.COM
Version: 1.2
Date: 12 September 1997


Chapter 3

3 TRAPS - HOW SUNOS HANDLES THEM

In this section we will look at how SunOS handles traps and at some of the alternatives which were available. Despite all the differences between SPARC v8 and v9 traps, I'll give a fairly generic description here. It really isn't necessary to describe in detail what SunOS does for v9 traps, because the previous section shows how v9 trap processing differs and the SunOS kernel adheres to those rules. Instead, we'll concentrate on the principles used by the kernel when handling various traps.

3.1 Generic Trap Handling

We'll look at some specifics in a moment but first we'll cover the generic trap handling algorithm.
When traps are handled, the typical procedure is as follows:

  1. Check CWP.
    If we need to handle the trap by jumping to 'C' (which would use save and restore instructions between function calls) then we must make sure we won't cause an overflow when we dive into 'C'. If we do detect that this would be a problem we do the overflow processing now.
  2. Is this an interrupt?
    If so, jump to the interrupt handler. Refer to section 3.3 on interrupts.
  3. Enable traps and dive into the standard trap handler.
    We enable traps so that we can catch any exceptions brought about by handling *this* trap without causing a watchdog reset.
  4. On return from the trap handler,
    we compare the current CWP with the CWP we came in with at the start to see if we have to undo the overflow processing we might have done before, so that we don't get an underflow when we return to the trapped instruction (or worse, execution continues in the WRONG window).
  5. Before we actually return to the trapped instruction,
    we check to see if kprunrun is asserted (ie. a higher priority thread is waiting to run). If so, we allow preemption to occur.

Traps are used by SunOS for system calls as well as for machine generated exceptions. To make a system call, a process places the parameters in the output registers and the number of the system call required (see /usr/include/sys/syscall.h) in %g1, and then executes a "ta 0x8" instruction. This appears in the kernel as a trap with TT = 0x88. The system trap handler determines this to be a system call and calls the relevant function according to the system call number in %g1.
Note: SunOS 4.x system calls come in via trap 0x80 (software trap 0x0).
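The relationship between the operand of a "ta" instruction and the trap type the kernel sees can be sketched in C. This is only an illustration and the function names are made up for it. It assumes the SPARC v8 rule that the low 7 bits of the software trap number are added to 0x80 (v9 numbers its software traps differently).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: trap type (TT) produced by "ta n" on SPARC v8.
 * Assumption: TT = 0x80 + (n mod 128), as for v8 software traps. */
static uint8_t software_trap_tt(uint32_t n)
{
    return (uint8_t)(0x80u + (n & 0x7fu));
}

/* Self-check using the two trap numbers quoted in the text. */
static void check_syscall_traps(void)
{
    assert(software_trap_tt(0x8) == 0x88);  /* "ta 0x8" system call trap  */
    assert(software_trap_tt(0x0) == 0x80);  /* SunOS 4.x system call trap */
}
```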
Occasionally, a process will attempt to execute from a page of VM that is not mapped in (ie. it is marked invalid in the MMU) and this will cause a text fault trap. The kernel will then attempt to map in the required text page and resume execution. However, if the process does not have the correct permissions or the mapping cannot be satisfied then the kernel will mark a pending SIGSEGV segmentation violation against that process and then resume execution of the process. A similar scenario applies to data faults; a process attempts to read or write to an address in a page marked invalid in the MMU and the kernel will attempt to map in the corresponding page for this address if possible (ie. maybe the page has been swapped out or this is the first attempt to read from that page and so we demand-page it in). I'll explain all this in detail in another text on process address spaces, paging and swapping which I plan to do as soon as I get time.
A "bad trap" is simply a trap that cannot be handled (or isn't supported). Usually under SunOS a bad trap has a type of 9 or 2, for data or text fault respectively (maybe 7 for alignment in some cases).

3.2 Register Windows

SPARC uses a large number of registers, some of which are globally accessible, some are accessible only from supervisor mode and the rest are divided up into "windows" of 8 local registers, 8 input registers and 8 output registers. The number of register windows is implementation dependent but is usually 7 or 8.
Register windows appear to be arranged in a ring to software. This means that if you move round the ring you will wrap around to the first window again. This gives software the illusion of a seemingly infinite number of register windows. Software uses the "save" instruction to move round to a new window and "restore" to retreat back to a previous window. This is most commonly used for procedure calls so that each procedure has its own private set of local registers for its own exclusive use. To help you visualize the structure and use of register windows, imagine that each register window is a cardboard box. Within each cardboard box there is a set of 8 small compartments which are private to that box (the local registers). On one side of the box, you have an "in" tray for your input registers and on the other, an "out" tray for your output registers. Now imagine that these boxes are arranged in a circle with the "out" tray of each box overlapping the "in" tray of the next box. Finally, place a box in the centre of the ring marked "global" for the global registers.
If you assume that you are the executing code in a register window, you will be standing in one of the boxes in the ring. You will have exclusive use of your 8 local registers and you will be sharing your "out" registers with the next window's "in" registers, and your "in" registers with the previous window's "out" registers.
Imagine for example that you wish to make a function call to add two numbers together. You place these two numbers in the first two "out" registers and call the function. You are now executing the addition function which, for the sake of argument, we will assume requires some local registers to do the addition. The first thing that the addition function code does is a "save" instruction to move into the next register window so that it can have its own set of local registers (step into the next cardboard box). Now, you have a new set of local registers and the parameters to the addition function are in your "in" registers, because they overlap the previous window's "out" registers.
To illustrate, consider this code:

                .global simple_add
                .type   simple_add, #function
        simple_add:
                save    %sp, -96, %sp   ! Change window and save some stack
                mov     %i0, %l0        ! Load parameters into locals
                mov     %i1, %l1
                add     %l0, %l1, %i0   ! Add and write result into %i0
                ret                     ! Return to main
                restore                 ! delay slot: back to previous window

                .global main
                .type   main, #function
        main:
                mov     1, %o0          ! Move '1' into output reg zero
                call    simple_add      ! Call simple_add function
                mov     2, %o1          ! delay slot: move two into %o1
                ...

In this example, we have two functions. "main" loads the values 1 and 2 into the first two output registers zero and one (%o0 and %o1). Then it calls the "simple_add" function which does a "save" to get into a new register window and then it loads the parameters which are now in the "in" registers (%i0 and %i1) into some local registers, adds them together writing the result back into input register zero (%i0). This could be greatly optimized but this is just an example. After the add, the function returns to "main" by the "ret" instruction and the "restore" instruction (which is also executed in the ret delay slot) moves us back into the previous window (step back into the original box). Now we have the result of the addition returned in our first output register (%o0).
At this point you may be wondering what happens if you wrap all the way around the ring of register windows (by doing recursive procedure calls for example). How do we prevent ourselves from writing over a window at the other end of the circle in this case?
The answer to this is that the OS marks the last window in the ring as invalid so that when we get to the penultimate window, the next attempt to move to a new window will generate a trap because it is marked invalid. There is a Window Invalid Mask register (WIM) which is a bit mask of invalid windows. For example, on an implementation with eight register windows, each of the bottom 8 bits of the WIM represents one window (bits are numbered from zero, so bit n corresponds to window n):

        Window 0        Bit 0 is set if invalid (value = 1)
        Window 1        Bit 1 is set if invalid (value = 2)
        Window 2        Bit 2 is set if invalid (value = 4)
        Window 3        Bit 3 is set if invalid (value = 8)
        Window 4        Bit 4 is set if invalid (value = 16)
        Window 5        Bit 5 is set if invalid (value = 32)
        Window 6        Bit 6 is set if invalid (value = 64)
        Window 7        Bit 7 is set if invalid (value = 128)

So by logical bitwise operations we can test and set invalid windows in the WIM. If an attempt is made to move into an invalid window by a "save" instruction, the processor will generate a window overflow trap (a window spill trap on SPARC v9). Conversely, if we attempt to retreat back into a register window that is marked invalid in the WIM via a "restore" instruction, the processor will generate a window underflow trap (a window fill trap on SPARC v9). We can use this behaviour to catch situations where we are about to wrap around the ring of windows onto a previously used window by marking the last window as invalid. Then, when we get the window overflow (or spill) trap, we can circumvent the problem by preserving the next valid window on the stack, making that the new invalid window and validating the currently invalid window so that we can continue into the next window safely. Also, when we retreat back through the register window circle, we can restore the previously saved window from the stack because we will get a window underflow (or fill) trap when we attempt to "restore" back into it.
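The bitwise tests and updates on the WIM can be sketched in C as follows. The helper names and the choice of eight windows are assumptions made for this example only. Real kernel code does this in assembler on the %wim register itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: an implementation with 8 windows. */
enum { NWINDOWS = 8 };

/* True if 'window' is marked invalid in the mask. */
static bool wim_is_invalid(uint32_t wim, unsigned window)
{
    assert(window < NWINDOWS);
    return ((wim >> window) & 1u) != 0;
}

/* Return a copy of the mask with 'window' marked invalid. */
static uint32_t wim_mark_invalid(uint32_t wim, unsigned window)
{
    assert(window < NWINDOWS);
    return wim | (1u << window);
}

/* Return a copy of the mask with 'window' marked valid again. */
static uint32_t wim_mark_valid(uint32_t wim, unsigned window)
{
    assert(window < NWINDOWS);
    return wim & ~(1u << window);
}
```

For example, wim_mark_invalid(0, 7) gives 128, the value shown for window 7 in the table above.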
In addition, because traps are similar to unexpected procedure calls, the SPARC v7/v8 IU will always move us into the next window whenever a trap occurs so that we are guaranteed a free set of local registers to process the trap. This is another reason for marking one window invalid so that we can guarantee there will always be a window free for handling traps. For SPARC v9, the current window is NOT changed for any traps except window_fill and window_spill because the v9 architecture has a concept of trap nesting (see section 2.2.2.1) which allows the processor to preserve its state, making it unnecessary for us to rotate around the register windows in order to get fresh registers. All we do in this case is save any registers to the stack, process the trap, restore the registers and return. The processor will restore the state when we do so (see section 2.2.2).
It isn't a rule that the kernel uses register windows this way. It is up to the OS designer to decide if they want to use the rotating register window facility or not. Another option would be to use one register window for traps and interrupts, one for the kernel and the rest for user processes. This would sacrifice the advantages of register windows but would simplify context switching...just change CWP to change context! This does raise some other implementation problems but for a small embedded system, it may be a viable approach.
SPARC v7/v8 Important Notes:
The Current Window Pointer (CWP) is a field in the Processor Status Register (PSR) and this field contains the number of the current window. A "save" instruction decrements the CWP whereas a "restore" increments it. Also note that an increment or decrement of the CWP is modulo the number of implemented register windows.
SPARC v9 Important Notes:
The Current Window Pointer (CWP) is a register itself. To further confuse matters, a "save" will increment the CWP whereas a "restore" will decrement it. This is the opposite of the v7/v8 behaviour! However, the increment or decrement is still modulo the number of implemented windows.
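To make the difference in direction concrete, here is a small C model of how "save" and "restore" step the CWP on each architecture. It is a sketch only. The enum and function names are invented for the example and the model ignores the WIM and the v9 window state registers completely.

```c
#include <assert.h>

/* Hypothetical model of how "save" and "restore" move the CWP. */
enum sparc_arch { SPARC_V8, SPARC_V9 };

static unsigned cwp_after_save(enum sparc_arch arch, unsigned cwp,
                               unsigned nwindows)
{
    assert(nwindows > 0 && cwp < nwindows);
    if (arch == SPARC_V8)
        return (cwp + nwindows - 1) % nwindows;  /* v7/v8: decrement */
    return (cwp + 1) % nwindows;                 /* v9: increment    */
}

static unsigned cwp_after_restore(enum sparc_arch arch, unsigned cwp,
                                  unsigned nwindows)
{
    assert(nwindows > 0 && cwp < nwindows);
    if (arch == SPARC_V8)
        return (cwp + 1) % nwindows;             /* v7/v8: increment */
    return (cwp + nwindows - 1) % nwindows;      /* v9: decrement    */
}
```

Adding nwindows before subtracting one keeps the arithmetic unsigned, so a "save" from window 0 on v8 wraps round to window nwindows - 1 rather than going negative.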
Let's take a look at how SunOS handles window overflow and underflow on SPARC v7/v8 (fill and spill on v9) but before we go into it, you must remember the following rules:
  1. The CWP (Current Window Pointer) contains the number of the register window we are currently executing in.
  2. On SPARC v7/v8 a trap will ALWAYS cause the CWP to be decremented regardless of whether the next window is valid or not in the WIM. This is so that the trap handler will have a new set of locals with which it can process the trap. This is another point that you must remember to avoid confusion later. However, on SPARC v9 the window only changes when the trap is a register window trap (ie. fill or spill). Other v9 traps do NOT change the CWP.
  3. On SPARC v7/v8, the kernel returns from a trap using the rett instruction. The rett instruction causes the CWP to be incremented, putting us back in the window we were executing in when the trap occurred. On v9, returning from traps is done via the "done" or "retry" instruction.

3.2.1 Register Windows, SPARC v9 State Registers

One of the major differences between SPARC v8 and v9 is that v9 has a set of privileged state registers to describe the state of the register window file. You need to read this section to understand the sections on the SPARC v9 Window Spill and Fill traps below.

CWP - Current Window Pointer
On SPARC v9, the CWP is a separate register instead of being a field in the PSR. Aside from that the purpose of the CWP is the same. Don't forget that on v9 a save increments the CWP and a restore decrements it, which is the opposite direction to SPARC v7/v8.
CANSAVE - Savable Windows
The CANSAVE register contains the number of register windows following the CWP that are not in use and are therefore available for allocation by a save instruction without generating a Window Spill trap.
CANRESTORE - Restorable Windows
Exactly the inverse of the CANSAVE register; this register contains the number of register windows preceding the CWP that are in use by software and can be restored to via the restore instruction without causing a Window Fill trap.
OTHERWIN - Other Windows
The OTHERWIN register contains the number of register windows in the ring outside of those accounted for by CWP, CANSAVE, CANRESTORE and one overlap window (akin to a trap window under v8). The invariant that always holds is CANSAVE + CANRESTORE + OTHERWIN = NWINDOWS - 2.
The windows covered by OTHERWIN are used to spill or fill when CWP moves beyond CANSAVE or CANRESTORE respectively.
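The invariant is easy to check with a small C model. The struct and function names below are invented for this sketch, and it deliberately ignores CLEANWIN. It assumes that a "save" which does not trap decrements CANSAVE and increments CANRESTORE, and that a "restore" which does not trap does the opposite.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the v9 window state registers. */
struct win_state {
    unsigned nwindows, cansave, canrestore, otherwin;
};

/* CANSAVE + CANRESTORE + OTHERWIN must equal NWINDOWS - 2. */
static bool win_state_consistent(const struct win_state *ws)
{
    return ws->cansave + ws->canrestore + ws->otherwin == ws->nwindows - 2;
}

/* Model a "save". Returns false if it would raise a spill trap. */
static bool model_save(struct win_state *ws)
{
    if (ws->cansave == 0)
        return false;
    ws->cansave--;
    ws->canrestore++;
    return true;
}

/* Model a "restore". Returns false if it would raise a fill trap. */
static bool model_restore(struct win_state *ws)
{
    if (ws->canrestore == 0)
        return false;
    ws->canrestore--;
    ws->cansave++;
    return true;
}
```

With eight windows and OTHERWIN = 0, a program starting with CANSAVE = 6 can nest six calls before the seventh "save" would spill, and the sum of the three registers stays at 6 throughout.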
WSTATE - Window State
The WSTATE register contains two fields, OTHER and NORMAL. In each of these fields, the bits are used to select one of eight different trap vectors for spill/fill exceptions. If OTHERWIN = 0 at the time of a trap, then the bits in the WSTATE.NORMAL field are used to determine the trap vector, otherwise the WSTATE.OTHER bits are used. This can be used in conjunction with the OTHERWIN register to segregate one or more contiguous windows for an alternate address space to the current one, if the supervisor software so decides. We'll look at this vectoring in the sections on spill and fill traps below.
CLEANWIN - Clean Windows
This register contains the number of windows that are "clean" from the perspective of the current program, either because they are zeroed or because they contain valid data/addresses for that program's address space. When a clean window is requested via a save instruction and none are available, a Clean Window trap occurs to cause the next window to be scrubbed.
Now that we have explained what the register window state registers are, we can move on to look at the overflow/spill and underflow/fill handlers...

3.2.2 SPARC v7/v8 Window Overflow Handling

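When a window overflow trap occurs under SunOS, the trap handler knows the following:
  1. The trap was caused by a "save" instruction that attempted to move into the window marked invalid in the WIM.
  2. Taking the trap has decremented the CWP, so the handler is now executing in that invalid window.
  3. The next window around the ring has been used previously and its contents have not yet been saved.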
Therefore, the trap handler must validate the invalid window so that the "save" can succeed when we return from the trap. However, if we validate this window, we will need to mark the next window as invalid so that we can catch any subsequent attempt to move into that window. Also, we have to preserve the new invalid window on the stack as we know that it is a window that has been used previously and to which we will ultimately want to "restore" to. So the strategy is that once we have rotated around the register window ring once, we then start to save the windows to the stack before continuing around it again so that when we restore back around the window ring, we can reinstate the windows from the stack as we go.
All this means that an overflow handler has to do the following:
  1. The next window will be the new invalid window, so we "save" to get into that window and store its registers on the stack.
  2. Rotate the WIM right by one, modulo the number of implemented register windows (because "saves" decrement). This makes the next window invalid.
  3. "restore" from the new invalid window back into the trap window we came in. We can now return from the trap to the original window and "save" instruction, which will be executed successfully.
Here is the source to a simple window overflow handler:

                !
                ! On entry:
                !
                ! %l1 = trapped %pc (save)
                ! %l2 = trapped %npc
                !
                .global window_overflow
                .type   window_overflow, #function
        window_overflow:
                !
                ! Read the current WIM
                !
                mov     %wim, %l0

                !
                ! Find out how many register windows are implemented
                ! from 'nwindows', a variable we set when we first
                ! start
                !
                sethi   %hi(nwindows), %l4
                ld      [%lo(nwindows) + %l4], %l4

                !
                ! subtract 1 from the value in nwindows so that our
                ! modulo maths will work
                !
                sub     %l4, 1, %l4

                !
                ! now rotate the WIM right by 1 (modulo nwindows) so
                ! that the next window is marked invalid.  Once we have
                ! moved to the next window (to save it on the stack) we
                ! write the new WIM value.  However, our calculation here
                ! is done using locals so we must preserve a global register
                ! and use that to contain the result so that we can still
                ! see it when we change windows.
                !
                mov     %g1, %l6        ! Preserve a global
                srl     %l0, 1, %l5
                sll     %l0, %l4, %l0
                or      %l0, %l5, %g1   ! %g1 = new WIM

                !
                ! move to the next window, set the new WIM value and save
                ! the volatile window registers (local's and in's) to the
                ! stack, the "outs" don't matter.
                !
                save
                mov     %g1, %wim
                std     %l0, [%sp]              ! %sp is double word aligned
                std     %l2, [%sp + 8]
                std     %l4, [%sp + 16]
                std     %l6, [%sp + 24]
                std     %i0, [%sp + 32]
                std     %i2, [%sp + 40]
                std     %i4, [%sp + 48]
                std     %i6, [%sp + 56]

                !
                ! Return to the trap window and restore the global register
                !
                restore
                mov     %l6, %g1

                !
                ! All done.  Return from the trap
                !
                jmp     %l1
                rett    %l2

The actual SunOS overflow handler has much more to it. The biggest difference is that the above example doesn't differentiate between kernel and user windows whereas the SunOS one is forced to. This means that the SunOS kernel has to check that the user's stack is paged in, aligned and valid for writing. If not, the window has to be saved in a buffer temporarily while we raise a user page fault to get the page into memory. Then we can continue with handling the overflow.
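The srl/sll/or sequence in the example handler is a rotate right by one bit within the implemented windows. A C sketch of the same calculation is shown below. The function name is made up for this example. Unlike the assembler, it also masks off the bits above the implemented windows so that the result is easy to check (the assumption being that the hardware ignores those bits anyway).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rotate the WIM right by one, modulo nwindows. */
static uint32_t wim_rotate_right(uint32_t wim, unsigned nwindows)
{
    uint32_t mask;

    assert(nwindows >= 2 && nwindows <= 32);
    mask = (nwindows == 32) ? 0xffffffffu : ((1u << nwindows) - 1u);
    wim &= mask;

    /* Bit 0 wraps round to bit (nwindows - 1), all others move down. */
    return ((wim >> 1) | (wim << (nwindows - 1))) & mask;
}
```

For example, with eight windows an invalid window 0 (WIM = 0x01) becomes an invalid window 7 (WIM = 0x80) after the rotate, which matches the fact that "save" decrements the CWP.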

3.2.3 SPARC v7/v8 Window Underflow Handling

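As for the overflow case, the window underflow handler can safely make some assumptions about the system state. These are:
  1. The trap was caused by a "restore" instruction that attempted to move back into the window marked invalid in the WIM.
  2. Taking the trap has decremented the CWP, so the handler is executing one window forward of the trapped window and two windows forward of the invalid window.
  3. The contents of the invalid window were saved on the stack by the overflow handler.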
This means that the underflow handler has to restore the invalid window from the stack and rotate the Window Invalid Mask left by one (modulo the number of implemented windows) so that when we return from the trap, we can move to the previous window without causing an exception and also with the correct register values back in place. This is the converse situation to the strategy we described in the overflow handler. In the overflow case, we have run out of register windows and so we save previously used windows on to the stack so that we can wrap around the ring safely. This is the inverse scenario...we are retreating back around the ring restoring the previously saved windows as we go.
Therefore the basic stages that the window underflow handler must go through are as follows:
  1. Adjust the WIM so that the window prior to the invalid one is marked invalid instead.
  2. Move back two windows into the currently invalid window (we move two windows back because the occurrence of the trap has moved us one window forwards).
  3. Restore the window from the stack.
  4. Return to the trap window (two windows forward) and return from the trap.
Here is an example of a simple case window underflow handler:

                !
                ! On entry:
                !
                ! %l1 = trapped %pc (restore)
                ! %l2 = trapped %npc
                !
                .global window_underflow
                .type   window_underflow, #function
        window_underflow:
                !
                ! Read the current WIM
                !
                mov     %wim, %l0

                !
                ! Find out how many register windows are implemented
                ! from 'nwindows', a variable we set when we first
                ! start
                !
                sethi   %hi(nwindows), %l4
                ld      [%lo(nwindows) + %l4], %l4

                !
                ! subtract 1 from the value in nwindows so that our
                ! modulo maths will work
                !
                sub     %l4, 1, %l4

                !
                ! Rotate the WIM left by one and set that new value
                ! in the %wim register
                !
                sll     %l0, 1, %l6
                srl     %l0, %l4, %l5
                or      %l5, %l6, %l5
                mov     %l5, %wim

                !
                ! Writes to the %wim have a potential 3-cycle latency so
                ! we can't change window until then.  Use 'nop' instructions
                ! in the delay cycles...
                !
                nop; nop; nop

                !
                ! Okay, now we restore twice to get into the target window
                ! (the one that was marked invalid) and we restore it from
                ! the stack
                !
                restore
                restore
                ldd     [%sp], %l0
                ldd     [%sp + 8], %l2
                ldd     [%sp + 16], %l4
                ldd     [%sp + 24], %l6
                ldd     [%sp + 32], %i0
                ldd     [%sp + 40], %i2
                ldd     [%sp + 48], %i4
                ldd     [%sp + 56], %i6

                !
                ! Get back to the trap window and return from
                ! the trap
                !
                save
                save
                jmp     %l1
                rett    %l2

Again, just as in the overflow case, the actual SunOS underflow handler has to cope with windows used in user mode and therefore it has to be sure that the user's stack is mapped in and valid.
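To see the overflow and underflow strategies working together, here is a toy C simulation of a v7/v8 style window ring. Everything in it is an assumption made for illustration: there are eight windows, a single integer stands in for each window's registers, and the "stack" is an array indexed by call depth. It does not model the extra window move caused by the trap itself, only the net effect of each handler.

```c
#include <assert.h>
#include <stdint.h>

enum { NWIN = 8, MAXDEPTH = 64 };   /* assumed sizes for this sketch */

struct ring {
    unsigned cwp;          /* current window pointer                   */
    uint32_t wim;          /* window invalid mask, one bit per window  */
    int regs[NWIN];        /* one value stands in for a window's regs  */
    int owner[NWIN];       /* call depth whose frame is in each window */
    int stack[MAXDEPTH];   /* per-frame save area in memory            */
    int depth;             /* current call depth                       */
    int spills, fills;     /* overflow and underflow traps taken       */
};

static uint32_t rot_right(uint32_t wim)
{
    return ((wim >> 1) | (wim << (NWIN - 1))) & ((1u << NWIN) - 1u);
}

static uint32_t rot_left(uint32_t wim)
{
    return ((wim << 1) | (wim >> (NWIN - 1))) & ((1u << NWIN) - 1u);
}

static void ring_init(struct ring *r)
{
    r->cwp = 0;
    r->wim = 1u << 1;      /* the window "behind" us starts off invalid */
    r->depth = 0;
    r->spills = r->fills = 0;
    r->owner[0] = 0;
    r->regs[0] = 1000;
}

/* "save": step to window cwp-1, spilling the oldest window on overflow. */
static void ring_save(struct ring *r)
{
    unsigned target = (r->cwp + NWIN - 1) % NWIN;

    assert(r->depth + 1 < MAXDEPTH);
    if (r->wim & (1u << target)) {               /* window overflow */
        unsigned victim = (target + NWIN - 1) % NWIN;
        r->stack[r->owner[victim]] = r->regs[victim];
        r->wim = rot_right(r->wim);              /* victim now invalid */
        r->spills++;
    }
    r->cwp = target;
    r->depth++;
    r->owner[target] = r->depth;
    r->regs[target] = 1000 + r->depth;           /* frame's private data */
}

/* "restore": step back to window cwp+1, filling it on underflow. */
static void ring_restore(struct ring *r)
{
    unsigned target = (r->cwp + 1) % NWIN;

    assert(r->depth > 0);
    if (r->wim & (1u << target)) {               /* window underflow */
        r->wim = rot_left(r->wim);
        r->regs[target] = r->stack[r->depth - 1];
        r->owner[target] = r->depth - 1;
        r->fills++;
    }
    r->cwp = target;
    r->depth--;
}

/* Nest 'calls' deep, unwind, and check each frame sees its own data.
 * Returns 0 on success, -1 if a frame came back with the wrong data. */
static int ring_demo(int calls, int *spills, int *fills)
{
    struct ring r;

    ring_init(&r);
    for (int i = 0; i < calls; i++)
        ring_save(&r);
    for (int i = 0; i < calls; i++) {
        ring_restore(&r);
        if (r.regs[r.cwp] != 1000 + r.depth)
            return -1;
    }
    *spills = r.spills;
    *fills = r.fills;
    return 0;
}
```

In this model the first six nested calls fit in the ring without trapping. Every call after that spills one window, and each of those windows is filled again on the way back out, so ring_demo(10, ...) reports four spills and four fills.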

3.2.4 SPARC v9 Window Spill

When a save instruction is executed and CANSAVE = 0, we have a window overflow exception, called a Spill Trap. (If CANSAVE means nothing to you, go back and read section 3.2.1 for a description of the window state registers.)
If OTHERWIN is zero, then we know that no other windows are used for alternate address spaces. In this case, the trap vector taken is determined by the value in the WSTATE.NORMAL field. We have a choice of eight normal spill trap vectors and the kernel selects which one to use by writing its number into the WSTATE.NORMAL field. Each vector occupies four trap table entries, so spill_n_normal is trap 0x080 + (4 * n). For example, if WSTATE.NORMAL contains the value '4', the spill trap taken will be spill_4_normal (trap 0x090).
If OTHERWIN is non-zero, the same vectoring strategy is used but this time the vector is determined by the value in the WSTATE.OTHER field, and spill_n_other is trap 0x0A0 + (4 * n). Using a similar example, if a spill trap occurs and the value contained in WSTATE.OTHER is '4', the spill trap taken will be spill_4_other (trap 0x0B0).
When a spill trap occurs, the CWP is changed so that we are in the window we need to "spill". All we need to do is save the window and return.
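The choice of spill vector can be written as a few lines of C. This is a sketch under the assumption, taken from the SPARC v9 trap table layout, that the normal spill vectors start at 0x080, the other spill vectors start at 0x0A0, and each vector takes up four trap table entries (32 instructions). The function name is invented for the example.

```c
#include <assert.h>

/* Hypothetical helper: trap type taken for a v9 window spill. */
static unsigned spill_trap_type(unsigned otherwin,
                                unsigned wstate_normal,
                                unsigned wstate_other)
{
    /* OTHERWIN selects which WSTATE field and which block of vectors. */
    unsigned n    = (otherwin == 0) ? wstate_normal : wstate_other;
    unsigned base = (otherwin == 0) ? 0x080u : 0x0A0u;

    assert(n < 8);              /* each WSTATE field holds 0..7 */
    return base + (n << 2);     /* four table entries per vector */
}
```

So spill_trap_type(0, 4, 0) gives 0x090 (spill_4_normal) and spill_trap_type(1, 0, 4) gives 0x0B0 (spill_4_other).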
The spill trap vector entries in the trap table contain the first 32 instructions of the trap handler. In fact, the entire trap handler can be contained within this 32 instruction space! The basic handler/vector would look similar to this:

                !
                ! This is an example for spilling in a 32-bit address
                ! space.  For 64-bit, use stx instruction and adjust
                ! the %sp offsets accordingly
                !
                ! CWP has been set so that we are in the window that
                ! we need to spill (save).  Save the volatile window
                ! registers to the stack
                !
                st      %l0, [%sp + 0]
                st      %l1, [%sp + 4]
                st      %l2, [%sp + 8]
                st      %l3, [%sp + 12]
                st      %l4, [%sp + 16]
                st      %l5, [%sp + 20]
                st      %l6, [%sp + 24]
                st      %l7, [%sp + 28]
                st      %i0, [%sp + 32]
                st      %i1, [%sp + 36]
                st      %i2, [%sp + 40]
                st      %i3, [%sp + 44]
                st      %i4, [%sp + 48]
                st      %i5, [%sp + 52]
                st      %i6, [%sp + 56]
                st      %i7, [%sp + 60]
                !
                ! Now the window is saved, we do a "saved" instruction
                ! to adjust CANSAVE and CANRESTORE accordingly so that
                ! we can retry the trapped instruction without raising
                ! an exception again.
                !
                saved
                !
                ! Tell the IU to retry the trapped instruction...
                !
                retry
                !
                ! That's all folks...add nops or ".skip"s to fill
                ! the remainder of the 32 instruction space
                !
Note that it is up to the kernel to utilise the WSTATE.NORMAL and WSTATE.OTHER fields in its own way and this means that you may want to write your vector code differently so that you store to different address spaces (remember that OTHERWIN and WSTATE allow us to have more than one address space...we can use primary and secondary address space identifiers in our spill vectors and use the WSTATE fields to select which vector we use).

3.2.5 SPARC v9 Window Fill

Window fill works in the same way as the window spill handlers. The IU will adjust the CWP so that we are in the window that we need to "fill" and so we restore that window from the stack and return.
Just as for the window spill case, the OTHERWIN register determines whether we vector to a normal fill trap or an other fill trap. Each fill vector occupies four trap table entries, starting at 0x0C0 for fill_0_normal and 0x0E0 for fill_0_other. To use a similar example as in the spill section, assume that OTHERWIN is zero and the value in WSTATE.NORMAL is '4'. This would result in us vectoring to the fill_4_normal trap (0x0D0). If the OTHERWIN register was non-zero and WSTATE.OTHER happened to contain the value '4', we would vector to fill_4_other (trap 0x0F0).
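Going the other way, a trap type in the range 0x080 to 0x0FF can be decoded back into "spill or fill", "normal or other" and the vector number. The C sketch below assumes the SPARC v9 trap table layout, where the four blocks of eight vectors are packed one after another and each vector occupies four entries. The struct and function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decoded form of a v9 register window trap type. */
struct win_trap {
    bool is_window_trap;   /* false if tt is not a spill/fill trap  */
    bool is_fill;          /* false = spill, true = fill            */
    bool is_other;         /* false = "normal", true = "other"      */
    unsigned n;            /* vector number, 0..7                   */
};

static struct win_trap decode_window_trap(unsigned tt)
{
    struct win_trap w = { false, false, false, 0 };

    if (tt < 0x080u || tt > 0x0FFu)
        return w;
    w.is_window_trap = true;
    w.is_fill  = (tt & 0x40u) != 0;   /* 0x0C0 and up are fills       */
    w.is_other = (tt & 0x20u) != 0;   /* 0x0A0.. and 0x0E0.. = other  */
    w.n = (tt >> 2) & 0x7u;           /* four table entries per vector */
    return w;
}

/* Usage example: 0x0D0 decodes as fill_4_normal. */
static void decode_example(void)
{
    struct win_trap w = decode_window_trap(0x0D0);

    assert(w.is_window_trap && w.is_fill && !w.is_other && w.n == 4);
}
```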
As explained previously, the kernel could (and does in the case of SunOS) use more than one address space in the register file, making use of the OTHERWIN and WSTATE register fields. However, in order to complete the picture, the basic fill vector would look something like this:

                !
                ! The CWP has been set so that we are in the window
                ! to "fill".  This window has been previously "spilled"
                ! so we restore from the stack
                !
                ld      [%sp + 0], %l0
                ld      [%sp + 4], %l1
                ld      [%sp + 8], %l2
                ld      [%sp + 12], %l3
                ld      [%sp + 16], %l4
                ld      [%sp + 20], %l5
                ld      [%sp + 24], %l6
                ld      [%sp + 28], %l7
                ld      [%sp + 32], %i0
                ld      [%sp + 36], %i1
                ld      [%sp + 40], %i2
                ld      [%sp + 44], %i3
                ld      [%sp + 48], %i4
                ld      [%sp + 52], %i5
                ld      [%sp + 56], %i6
                ld      [%sp + 60], %i7
                !
                ! Adjust CANSAVE and CANRESTORE accordingly
                !
                restored
                !
                ! Retry the trapped instruction
                !
                retry
                !
                ! That's it...fill the remaining 14 instructions
                ! with nops or a ".skip" directive
                !
As I mentioned in the "spill" section previously, you can use the lda (load alternate) instructions to load from a different address space if you are using more than one address space in your register window file. There are Address Space Identifiers (ASIs) which can be used to specify where to load from or store to. These represent primary and secondary address spaces, optionally with little endian access and/or user privileges (refer to the SPARC v9 architecture manual). You would use the WSTATE fields to select an appropriate trap vector and your trap vectors would contain the sta (store alternate) or lda (load alternate) instructions to a specific ASI as required.
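The "saved" and "restored" instructions used at the end of the spill and fill vectors only adjust the window state registers. A C model of what they do is sketched below. It follows the instruction definitions in the SPARC v9 architecture manual as I understand them. The struct and function names are invented for this example.

```c
#include <assert.h>

/* Hypothetical snapshot of the v9 window state registers. */
struct win_regs {
    unsigned nwindows, cansave, canrestore, otherwin, cleanwin;
};

/* Model of "saved": one more window is now free to save into. */
static void model_saved(struct win_regs *w)
{
    w->cansave++;
    if (w->otherwin == 0) {
        assert(w->canrestore > 0);
        w->canrestore--;        /* the spilled window was one of ours */
    } else {
        w->otherwin--;          /* it belonged to the other space     */
    }
}

/* Model of "restored": one more window can now be restored into. */
static void model_restored(struct win_regs *w)
{
    w->canrestore++;
    if (w->cleanwin < w->nwindows - 1)
        w->cleanwin++;          /* the filled window counts as clean */
    if (w->otherwin == 0) {
        assert(w->cansave > 0);
        w->cansave--;
    } else {
        w->otherwin--;
    }
}
```

Note that both operations leave CANSAVE + CANRESTORE + OTHERWIN unchanged, so the invariant from section 3.2.1 still holds when the trapped instruction is retried.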

3.3 Interrupts

When the IU detects that an interrupt is pending, it will generate a trap and vector into the trap table so that the interrupt can be handled (see appendices for the numeric trap values that correspond to the interrupts and their priorities).
The generic process for handling an interrupt in the kernel is to raise the processor interrupt level (PIL) to the same level as the occurring interrupt so that we do not run the risk of a lower priority interrupt butting in on our current interrupt handler. Then we would clear the interrupt pending bit in the system interrupt pending register (SIPR) and handle the interrupt with some code relevant to the interrupt type. When we are finished with handling the interrupt, we restore the PIL to its original value and return from the trap with a rett instruction.
Interrupts are typically handled on a per-cpu interrupt stack and in some cases, the interrupt is cleared and a lower-priority soft interrupt is posted for a device driver or similar to deal with later. We want to do as little as possible on receipt of a high level hard interrupt to avoid difficulties with deadlocks due to blocking I/O and lock contention (not to mention performance). The kernel can determine whether the interrupt is a soft interrupt by checking a specific soft interrupt bit in the interrupt register.
The default trap table in the SunOS kernel directs most traps through a generic trap handler front end which then decides which lower level handler to call based on the trap type. If the trap type is an interrupt (as in this case) the generic system trap front end calls _interrupt() to decide what to do with it.
The _interrupt() routine compares the interrupt level against the high level threshold of the system (the LOCK LEVEL, typically that of the level-10 clock). If the interrupt level is below the lock level, the interrupt will be handled as a separate interrupt thread. If not, then _interrupt() first checks to see if this interrupt is a level-10 clock interrupt and, if it is, jumps directly to the level-10 handler. The level-10 handler typically calls clock(), the function that is the root of all scheduling and callout queue administration. (The level-10 interrupt is set to interrupt every 10 ms by default, although the clock chip is programmable. As of Solaris 2.6, it is possible for system administrators to affect the timing of the clock interrupts.)
If the interrupt is not a level-10 interrupt (or if the interrupt is handled on a lower-priority interrupt thread) the kernel will have to look through all the interrupt service routines (ISRs) that have registered an interest in this level of interrupt. If there are no ISRs registered then the kernel will print out a "spurious level 'x' interrupt..." message. If there are one or more ISRs registered for a given interrupt level, the kernel will call each ISR vector one by one. On return from each ISR, the kernel checks the SIPR mask to see if the ISR has serviced the interrupt and cleared the pending bit. If no ISR services the interrupt then the kernel will print out a message on the console like "level 'x' interrupt not serviced...".
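The decisions made by _interrupt() can be sketched roughly as follows. Again this is a simulation, not the kernel source: dispatch_interrupt(), struct level_handlers, the two sample ISRs and the result codes are hypothetical, both LOCK_LEVEL and CLOCK_LEVEL are assumed to be 10, and I have assumed that the walk stops at the first ISR which clears the pending bit.

```c
#include <assert.h>
#include <stddef.h>

#define LOCK_LEVEL  10   /* assumed: same as the level-10 clock */
#define CLOCK_LEVEL 10
#define MAX_ISRS    4

enum dispatch_result {
    DISPATCH_CLOCK,         /* went straight to the level-10 handler */
    DISPATCH_SERVICED,      /* an ISR cleared the pending bit */
    DISPATCH_SPURIOUS,      /* "spurious level 'x' interrupt..." */
    DISPATCH_NOT_SERVICED   /* "level 'x' interrupt not serviced..." */
};

/* An ISR services the interrupt by clearing its bit in *sipr. */
typedef void (*level_isr_fn)(unsigned level, unsigned *sipr);

struct level_handlers {
    level_isr_fn isr[MAX_ISRS];
    size_t count;
};

/* Sample ISR whose device did not interrupt: leaves the bit set. */
static void isr_ignore(unsigned level, unsigned *sipr)
{
    (void)level;
    (void)sipr;
}

/* Sample ISR which services the interrupt and clears the bit. */
static void isr_claim(unsigned level, unsigned *sipr)
{
    *sipr &= ~(1u << level);
}

static enum dispatch_result dispatch_interrupt(unsigned level,
        const struct level_handlers *h, unsigned *sipr, int *on_thread)
{
    size_t i;

    assert(h != NULL && sipr != NULL && on_thread != NULL);
    assert(h->count <= MAX_ISRS);

    /* Below the lock level we run as a separate interrupt thread. */
    *on_thread = (level < LOCK_LEVEL);

    if (!*on_thread && level == CLOCK_LEVEL)
        return DISPATCH_CLOCK;

    if (h->count == 0)
        return DISPATCH_SPURIOUS;

    /* Call each registered ISR in turn, checking the pending bit. */
    for (i = 0; i < h->count; i++) {
        h->isr[i](level, sipr);
        if ((*sipr & (1u << level)) == 0)
            return DISPATCH_SERVICED;
    }
    return DISPATCH_NOT_SERVICED;
}
```

With no ISRs registered at level 5 the result is DISPATCH_SPURIOUS. With isr_ignore followed by isr_claim registered, the second ISR clears the bit and the result is DISPATCH_SERVICED.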
Note that the higher level interrupts which are not handled with interrupt threads in effect commandeer the current thread running on the processor. This is necessary because scheduling of threads is initiated by the receipt of a level-10 interrupt, so it does not make sense to use an interrupt thread, which is subject to level-10 scheduling, for interrupts of a higher priority! However, this has implications for writers of high level ISRs: they cannot use any blocking calls (or locks) that are affected by lower priority threads, as they run the risk of deadlock. The high priority ISR would be blocked on a resource held by a lower priority thread, and that resource would never be freed because the PIL has been set to the current interrupt level (meaning that level-10 clock interrupts won't occur, so threads won't get scheduled and the owning thread can never release the resource). For this reason, high level ISRs do as much as they are able to and then post a soft interrupt so that an interrupt thread can be started at a lower priority to finish off handling the interrupt, if necessary.
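To illustrate the "do the minimum, then post a soft interrupt" pattern, here is a minimal sketch. It is not taken from any real driver: struct dev_state, high_level_isr() and soft_isr() are hypothetical, the soft interrupt is modelled as a plain flag, and the "work" is just a byte counter. The point is only that the high level part never blocks or takes a lock, and the lower priority part does the rest later.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8   /* assumed size; one slot is always left empty */

/* Hypothetical per-device state shared by the two halves. */
struct dev_state {
    unsigned char ring[RING_SIZE];
    size_t head;                    /* next slot the high level ISR writes */
    size_t tail;                    /* next slot the soft handler reads */
    int soft_pending;               /* stands in for a posted soft interrupt */
    unsigned long bytes_processed;
};

/*
 * High level part: no blocking calls and no locks. Stash the byte and
 * post the soft interrupt. Returns 0 if the ring is full and the byte
 * had to be dropped.
 */
static int high_level_isr(struct dev_state *d, unsigned char byte)
{
    size_t next;

    assert(d != NULL);
    next = (d->head + 1) % RING_SIZE;
    if (next == d->tail)
        return 0;                   /* full: drop rather than wait */
    d->ring[d->head] = byte;
    d->head = next;
    d->soft_pending = 1;            /* "post" the soft interrupt */
    return 1;
}

/*
 * Lower priority part, run later: drain whatever the high level ISR
 * queued. Returns the number of bytes handled.
 */
static size_t soft_isr(struct dev_state *d)
{
    size_t n = 0;

    assert(d != NULL);
    if (!d->soft_pending)
        return 0;
    d->soft_pending = 0;
    while (d->tail != d->head) {
        d->bytes_processed++;       /* placeholder for the real work */
        d->tail = (d->tail + 1) % RING_SIZE;
        n++;
    }
    return n;
}
```

Dropping data when the ring is full is a design choice made for the sketch: the high level half must never wait for the lower priority half to catch up, because that is exactly the deadlock described above.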

3.4 Text and Data Faults

A text or data fault occurs when an access is attempted to an address that is not valid in the MMU. For example, if an attempt is made to fetch an instruction from the address in %pc but that address is not backed by a valid mapping, then the IU will raise a text fault. Alternatively, if executing an instruction attempts to read from or write to an address that is not backed by a valid mapping, a data fault will be raised.

Basically, fetching an instruction (and often executing that instruction) causes virtual addresses to be presented to the MMU (either the sunmmu on sun4 and sun4c systems or the SPARC reference MMU (srmmu) on sun4m).

The MMU will search its translation lookaside buffer (TLB), which is its cache of recent virtual to physical translations, to see if it has an entry for this virtual address. If it does, it returns the appropriate physical address so that the hardware can access the actual target address. If the virtual address does not exist in the TLB, the MMU will then walk through its translation tables in memory to resolve the address to a valid physical address. Ultimately the walk will lead down to a table of page table entries (PTEs), which contain various flags indicating whether the page is valid (mapped in) or not. If the PTE does not exist, or if the valid flag is not set, or if the page is marked as swapped out, the MMU raises an exception, which in turn causes the IU to raise a text or data fault according to the type of access (instruction fetch or data access). This description is pretty generic as it would take quite a lot of detail on the MMU hardware and software table layout to explain it at a lower level. Anyway, you get the picture I hope.

The kernel is notified of this MMU exception when the text/data fault comes in through the trap table. The fault handler will then attempt to correct the fault by mapping in the required page. For example, a text fault may indicate to the kernel that the next page of program text needs to be mapped in from the executable file. The kernel would then access the executable file via the vnode to satisfy that mapping. Likewise, if it's a data fault, the kernel will attempt to retrieve the page either from the executable's data segment in the file or from swap if the page was swapped out.
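The lookup sequence (TLB first, then the tables in memory, then a fault) can be modelled with a toy MMU in C. Everything here is an assumption made for illustration: the 4 KB page size, the four-entry TLB with round-robin refill, and the single flat table of PTEs (the real sunmmu and srmmu layouts are multi-level and rather more involved). The names struct mmu and translate() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12   /* assumed 4 KB pages */
#define TLB_ENTRIES 4
#define NUM_PTES    16

enum access_type  { ACCESS_FETCH, ACCESS_DATA };
enum xlate_status { XLATE_OK, XLATE_TEXT_FAULT, XLATE_DATA_FAULT };

struct pte       { uint32_t pfn; int valid; };
struct tlb_entry { uint32_t vpn; uint32_t pfn; int valid; };

struct mmu {
    struct tlb_entry tlb[TLB_ENTRIES];
    struct pte table[NUM_PTES];     /* flat table standing in for the walk */
    size_t next_slot;               /* round-robin TLB replacement */
    unsigned tlb_hits;
    unsigned table_walks;
};

static enum xlate_status translate(struct mmu *m, uint32_t va,
                                   enum access_type type, uint32_t *pa)
{
    uint32_t vpn = va >> PAGE_SHIFT;
    uint32_t off = va & ((1u << PAGE_SHIFT) - 1);
    struct tlb_entry *slot;
    size_t i;

    assert(m != NULL && pa != NULL);

    /* 1. Look for a cached translation in the TLB. */
    for (i = 0; i < TLB_ENTRIES; i++) {
        if (m->tlb[i].valid && m->tlb[i].vpn == vpn) {
            m->tlb_hits++;
            *pa = (m->tlb[i].pfn << PAGE_SHIFT) | off;
            return XLATE_OK;
        }
    }

    /* 2. TLB miss: consult the translation table in memory. */
    m->table_walks++;
    if (vpn >= NUM_PTES || !m->table[vpn].valid) {
        /* 3. No valid PTE: fault according to the type of access. */
        return type == ACCESS_FETCH ? XLATE_TEXT_FAULT : XLATE_DATA_FAULT;
    }

    /* Cache the translation so the next access hits the TLB. */
    slot = &m->tlb[m->next_slot];
    m->next_slot = (m->next_slot + 1) % TLB_ENTRIES;
    slot->vpn = vpn;
    slot->pfn = m->table[vpn].pfn;
    slot->valid = 1;

    *pa = (slot->pfn << PAGE_SHIFT) | off;
    return XLATE_OK;
}
```

The first access to a mapped page costs a table walk, a second access to the same page is a TLB hit, and an access to an unmapped page comes back as a text or data fault depending on whether it was an instruction fetch.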

If the address is totally bogus (ie. a bad pointer) then there will be no way to satisfy the fault and this will result in either a "bad trap, text/data fault" panic if it's a kernel fault or a segmentation fault signal (SIGSEGV) being sent to the user process if it's a user fault. One other possibility is a SIGBUS to signal a bus error. This is usually because the user fault would normally be satisfied by mapping in the page but for some reason the page doesn't exist any more, or because the underlying executable file has been changed.

Some UNIX implementations disallow writing to a file that is being executed and return ETXTBSY to the writing process, but Solaris will allow writes to executing files. This is a direct consequence of the new vm system. The system must be able to write mapped files, including those that are mapped for execution. With NFS there's no way to determine that someone is executing out of a file, with shared libraries it's difficult to determine that someone is executing out of a library, and there are a number of other applications in which it is important to be able to write mappings that are also being executed (self-modifying code), so ETXTBSY no longer makes sense.

In the case of a kernel initiated fault, we can tell that it is a kernel fault because the processor was in supervisor mode when the trap was taken, and this is indicated in the PSR. Also, we know that kernel pages are never swapped out so that cuts down on page fault processing overhead. The text faulting facility is the basis of demand paging.
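The possible outcomes of a fault described in this section can be summarised in a short C sketch. The names classify_fault(), enum fault_backing and struct fault_outcome are hypothetical, and the three-way "backing" state is my own simplification of what the vm system works out. I have assumed the SPARC v8 PSR layout, where the PS (previous supervisor) bit, 0x40, records whether the processor was in supervisor mode when the trap was taken, and I have treated every kernel fault that cannot be satisfied as a panic.

```c
#include <assert.h>
#include <signal.h>
#include <stdint.h>

#define PSR_PS 0x40u   /* assumed: SPARC v8 PSR previous-supervisor bit */

/* Simplified view of what the fault handler found for the address. */
enum fault_backing {
    BACKING_OK,     /* page can be mapped in from file or swap */
    BACKING_NONE,   /* no mapping at all, eg. a bad pointer */
    BACKING_LOST    /* mapping exists but the object behind it is gone */
};

enum fault_action { FAULT_MAP_PAGE, FAULT_PANIC, FAULT_SIGNAL };

struct fault_outcome {
    enum fault_action action;
    int signo;      /* only meaningful for FAULT_SIGNAL */
};

static struct fault_outcome classify_fault(uint32_t psr,
                                           enum fault_backing backing)
{
    struct fault_outcome out = { FAULT_MAP_PAGE, 0 };

    assert(backing == BACKING_OK || backing == BACKING_NONE ||
           backing == BACKING_LOST);

    if (backing == BACKING_OK)
        return out;                 /* normal demand paging */

    if (psr & PSR_PS) {
        out.action = FAULT_PANIC;   /* "bad trap, text/data fault" */
        return out;
    }

    out.action = FAULT_SIGNAL;      /* user fault: signal the process */
    out.signo = (backing == BACKING_NONE) ? SIGSEGV : SIGBUS;
    return out;
}
```

So a user process dereferencing a bad pointer ends up with SIGSEGV, a user process whose executable has been truncated underneath it ends up with SIGBUS, and the same bad pointer in the kernel ends in a panic.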

