3       TRAPS - HOW SUNOS HANDLES THEM
        In this section we will look at how SunOS handles traps and at
        some of the alternatives which were available.  Despite the
        differences between SPARC v8 and v9 traps, I'll keep the description
        fairly generic: the previous section already covers how trap
        processing differs on v9, and the SunOS kernel adheres to those
        rules.  Instead, we'll concentrate on the principles used by the
        kernel when handling various traps.
        We'll look at some specifics in a moment but first we'll cover the
        generic trap handling algorithm.
	When traps are handled, the typical procedure is as follows:
	SPARC uses a large number of registers, some of which are globally
        accessible, some are accessible only from supervisor mode and
        the rest are divided up into "windows" of 8 local registers,
        8 input registers and 8 output registers.  The number of register
        windows is implementation dependent but is usually 7 or 8.
        Register windows appear to be arranged in a ring to software. This
        means that if you move round the ring you will wrap around to the
        first window again.  This gives software the illusion of a seemingly
        infinite number of register windows.  Software uses the "save"
        instruction to move round to a new window and "restore" to retreat
        back to a previous window.  This is most commonly used for procedure
        calls so that each procedure has its own private set of local
        registers for its exclusive use.  To help you visualize the
        structure and use of register windows, imagine that each register
        window is a cardboard box.  Within each cardboard box there is a
        set of 8 small compartments which are private to that box (the local
        registers).  On one side of the box, you have an "in" tray for your
        input registers and on the other, an "out" tray for your output
        registers.  Now imagine that these boxes are arranged in a circle
        with the "out" tray of each box overlapping the "in" tray of the
        next box.  Finally, place a box in the centre of the ring marked
	"global" for the global registers.
	If you assume that you are the executing code in a register window,
        you will be standing in one of the boxes in the ring.  You will
        have exclusive use of your 8 local registers and you will be
        sharing your "out" registers with the next window's "in" registers,
        and your "in" registers with the previous window's "out" registers.
        Imagine for example that you wish to make a function call to add two
        numbers together.  You place these two numbers in the first two
        "out" registers and call the function.  You are now executing the
        addition function which, for the sake of argument, we will assume
        requires some local registers to do the addition.  The first thing
        that the addition function code does is a "save" instruction to
        move into the next register window so that it can have its own
        set of local registers (step into the next cardboard box).  Now,
        you have a new set of local registers and the parameters to the
        addition function are in your "in" registers, because they overlap
        the previous window's "out" registers.
	To illustrate, consider this code:
                .global simple_add
                .type   simple_add, #function
        simple_add:
                save    %sp, -96, %sp   ! Change window and save some stack
                mov     %i0, %l0        ! Load parameters into locals
                mov     %i1, %l1
                add     %l0, %l1, %i0   ! Add and write result into %i0
                ret                     ! Return to main
                restore                 ! delay slot: back to previous window
                .global main
                .type   main, #function
        main:
                mov     1, %o0          ! Move '1' into output reg zero
                call    simple_add      ! Call simple_add function
                mov     2, %o1          ! delay slot: move two into %o1
                ...
        In this example, we have two functions.  "main" loads the values
        1 and 2 into the first two output registers (%o0 and %o1); note
        that the second load sits in the delay slot of the call, so it
        executes before "simple_add" starts.  The "simple_add" function
        does a "save" to get into a new register window and then loads the
        parameters, which are now in the "in" registers (%i0 and %i1), into
        some local registers and adds them together, writing the result
        back into input register zero (%i0).  This could be greatly
        optimized but this is just an example.  After the add, the function
        returns to "main" with the "ret" instruction, and the "restore"
        instruction in the ret delay slot moves us back into the previous
        window (step back into the original box).  Now we have the result
        of the addition returned in our first output register (%o0).
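The cardboard box picture can also be sketched as a small Python model.  This is only an illustration, not how the hardware is built: the class name RegisterFile, its method names and the choice of 8 windows are assumptions made for the example.  It shows the one property that matters here, namely that the caller's "out" registers and the callee's "in" registers are the same storage.

```python
NWINDOWS = 8  # assumed; the real number is implementation dependent


class RegisterFile:
    """Toy model of the ring: each window's outs are the next window's ins."""

    def __init__(self, nwindows=NWINDOWS):
        self.nwindows = nwindows
        self.cwp = 0  # current window pointer
        self.local_regs = [[0] * 8 for _ in range(nwindows)]
        # shared[w] is the "out" tray of window w and, at the same time,
        # the "in" tray of the window reached from w by a save.
        self.shared = [[0] * 8 for _ in range(nwindows)]

    def outs(self):
        return self.shared[self.cwp]

    def ins(self):
        # Our ins are the outs of the window we were called from.
        return self.shared[(self.cwp + 1) % self.nwindows]

    def locals_(self):
        return self.local_regs[self.cwp]

    def save(self):
        # SPARC v8 decrements the window pointer on a save.
        self.cwp = (self.cwp - 1) % self.nwindows

    def restore(self):
        self.cwp = (self.cwp + 1) % self.nwindows


def simple_add(rf):
    rf.save()                      # step into the next box
    l = rf.locals_()
    l[0], l[1] = rf.ins()[0], rf.ins()[1]
    rf.ins()[0] = l[0] + l[1]      # result goes back through %i0
    rf.restore()                   # step back into the caller's box


rf = RegisterFile()
rf.outs()[0] = 1                   # main: %o0 = 1
rf.outs()[1] = 2                   # main: %o1 = 2
simple_add(rf)
result = rf.outs()[0]              # main reads the sum from %o0
```

The model deliberately ignores the globals, the stack pointer adjustment done by "save" and the possibility of running out of windows.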
        Window 0        Bit 1 is set if invalid (value = 1)
        Window 1        Bit 2 is set if invalid (value = 2)
        Window 2        Bit 3 is set if invalid (value = 4)
        Window 3        Bit 4 is set if invalid (value = 8)
        Window 4        Bit 5 is set if invalid (value = 16)
        Window 5        Bit 6 is set if invalid (value = 32)
        Window 6        Bit 7 is set if invalid (value = 64)
        Window 7        Bit 8 is set if invalid (value = 128)
So by logical bitwise operations we can test and set invalid
windows in the WIM (the Window Invalid Mask register).  If an attempt
is made to move into an invalid window by a "save" instruction, the
processor will generate a window overflow trap (a window spill trap
on SPARC v9).  Conversely, if we attempt to retreat back into a
register window that is marked invalid in the WIM via a "restore"
instruction, the processor will generate a window underflow trap (a
window fill trap on SPARC v9).  We can use this behaviour to catch
situations where we are about to wrap around the ring of windows
onto a previously used window by marking the last window as invalid.
Then, when we get the window overflow (or spill) trap, we can
circumvent the problem by preserving the next valid window on the
stack, making that the new invalid window and validating the
currently invalid window so that we can continue into the next
window safely.  Also, when we retreat back through the register
window circle, we can restore the previously saved window from the
stack because we will get a window underflow (or fill) trap when
we attempt to "restore" back into it.
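The bit tests described above can be sketched in a few lines of Python.  The function names are made up for this example, and the model assumes the SPARC v8 convention that "save" moves to window CWP-1 and "restore" to window CWP+1 (modulo the number of windows).

```python
def wim_bit(window):
    """Mask value for a window, matching the table (window 0 -> 1, ...)."""
    return 1 << window


def is_invalid(wim, window):
    return (wim & wim_bit(window)) != 0


def mark_invalid(wim, window):
    return wim | wim_bit(window)


def mark_valid(wim, window):
    return wim & ~wim_bit(window)


def save_would_trap(wim, cwp, nwindows=8):
    """True if a save from window cwp would raise a window overflow."""
    return is_invalid(wim, (cwp - 1) % nwindows)


def restore_would_trap(wim, cwp, nwindows=8):
    """True if a restore from window cwp would raise a window underflow."""
    return is_invalid(wim, (cwp + 1) % nwindows)


wim = mark_invalid(0, 7)   # only window 7 is invalid
```

With window 7 marked invalid, a "save" from window 0 or a "restore" from window 6 would trap, while the same instructions in any other window would not.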
3.2.1   Register Windows, SPARC v9 State Registers
One of the major differences between SPARC v8 and v9 is that v9 has a set of privileged state registers to describe the state of the register window file. You need to read this section to understand the sections on the SPARC v9 Window Spill and Fill traps below.
3.2.2   SPARC v7/v8 Window Overflow Handling
When a window overflow trap occurs under SunOS, the trap handler knows the following:
                !
                ! On entry:
                !
                ! %l1 = trapped %pc (save)
                ! %l2 = trapped %npc
                !
                .global window_overflow
                .type   window_overflow, #function
        window_overflow:
                !
                ! Read the current WIM
                !
                mov     %wim, %l0
                !
                ! Find out how many register windows are implemented
                ! from 'nwindows', a variable we set when we first
                ! start
                !
                sethi   %hi(nwindows), %l4
                ld      [%lo(nwindows) + %l4], %l4
                !
                ! subtract 1 from the value in nwindows so that our
                ! modulo maths will work
                !
                sub     %l4, 1, %l4
                !
                ! now rotate the WIM right by 1 (modulo nwindows) so
                ! that the next window is marked invalid.  Once we have
                ! moved to the next window (to save it on the stack) we
                ! write the new WIM value.  However, our calculation here
                ! is done using locals so we must preserve a global register
                ! and use that to contain the result so that we can still
                ! see it when we change windows.
                !
                mov     %g1, %l6        ! Preserve a global
                srl     %l0, 1, %l5
                sll     %l0, %l4, %l0
                or      %l0, %l5, %g1   ! %g1 = new WIM
                !
                ! move to the next window, set the new WIM value and save
                ! the volatile window registers (local's and in's) to the
                ! stack, the "outs" don't matter.
                !
                save
                mov     %g1, %wim
                std     %l0, [%sp]              ! %sp is double word aligned
                std     %l2, [%sp + 8]
                std     %l4, [%sp + 16]
                std     %l6, [%sp + 24]
                std     %i0, [%sp + 32]
                std     %i2, [%sp + 40]
                std     %i4, [%sp + 48]
                std     %i6, [%sp + 56]
                !
                ! Return to the trap window and restore the global register
                !
                restore
                mov     %l6, %g1
                !
                ! All done.  Return from the trap
                !
                jmp     %l1
                rett    %l2
	The actual SunOS overflow handler has much more to it.  The biggest
        difference is that the above example doesn't differentiate between
        kernel and user windows whereas the SunOS one is forced to.  This
        means that the SunOS kernel has to check that the user's stack is
        paged in, aligned and valid for writing.  If not, the window has to
        be saved in a buffer temporarily while we raise a user page fault
        to get the page into memory.  Then we can continue with handling
        the overflow. 
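The rotate step in the example handler is easy to get wrong, so here is a sketch of the same arithmetic in Python.  The function names are invented for the illustration.  The first function mirrors the srl/sll/or sequence literally, including the stray bits it leaves above the implemented windows; the second masks them off.  The example handler writes the unmasked value, presumably relying on the hardware ignoring WIM bits for unimplemented windows.

```python
REG_MASK = 0xFFFFFFFF  # 32-bit integer registers


def rotate_wim_right_raw(wim, nwindows):
    """Mirror the srl/sll/or sequence; result may have stray high bits."""
    shift = nwindows - 1                     # sub  %l4, 1, %l4
    low = wim >> 1                           # srl  %l0, 1, %l5
    high = (wim << shift) & REG_MASK         # sll  %l0, %l4, %l0
    return high | low                        # or   %l0, %l5, %g1


def rotate_wim_right(wim, nwindows):
    """Same rotation with the unimplemented bits masked off explicitly."""
    return rotate_wim_right_raw(wim, nwindows) & ((1 << nwindows) - 1)
```

For 8 windows, a WIM of 0x80 gives a raw result of 0x4040, which is 0x40 once the upper bits are discarded, and a WIM of 0x01 wraps around to 0x80.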
3.2.3   SPARC v7/v8 Window Underflow Handling
	As for the overflow case, the window underflow handler can safely
        make some assumptions about the system state.  These are: 
                !
                ! On entry:
                !
                ! %l1 = trapped %pc (restore)
                ! %l2 = trapped %npc
                !
                .global window_underflow
                .type   window_underflow, #function
        window_underflow:
                !
                ! Read the current WIM
                !
                mov     %wim, %l0
                !
                ! Find out how many register windows are implemented
                ! from 'nwindows', a variable we set when we first
                ! start
                !
                sethi   %hi(nwindows), %l4
                ld      [%lo(nwindows) + %l4], %l4
                !
                ! subtract 1 from the value in nwindows so that our
                ! modulo maths will work
                !
                sub     %l4, 1, %l4
                !
                ! Rotate the WIM left by one and set that new value
                ! in the %wim register
                !
                sll     %l0, 1, %l6
                srl     %l0, %l4, %l5
                or      %l5, %l6, %l5
                mov     %l5, %wim
                !
                ! Writes to the %wim have a potential 3-cycle latency so
                ! we can't change window until then.  Use 'nop' instructions
                ! in the delay cycles...
                !
                nop; nop; nop
                !
                ! Okay, now we restore twice to get into the target window
                ! (the one that was marked invalid) and we restore it from
                ! the stack
                !
                restore
                restore
                ldd     [%sp], %l0
                ldd     [%sp + 8], %l2
                ldd     [%sp + 16], %l4
                ldd     [%sp + 24], %l6
                ldd     [%sp + 32], %i0
                ldd     [%sp + 40], %i2
                ldd     [%sp + 48], %i4
                ldd     [%sp + 56], %i6
                !
                ! Get back to the trap window and return from
                ! the trap
                !
                save
                save
                jmp     %l1
                rett    %l2
	Again, just as in the overflow case, the actual SunOS underflow
        handler has to cope with windows used in user mode and therefore
        it has to be sure that the user's stack is mapped in and valid.
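To see the overflow and underflow handlers working together, the following Python sketch walks a call chain that is deeper than the register file.  It is a simplified model under stated assumptions: the class WindowRing and its fields are hypothetical, each window holds a single "frame" value instead of 16 registers, the window consumed by the trap handler itself is not modelled, and window 1 is assumed to start out invalid.

```python
class WindowRing:
    """Simplified v7/v8 window ring with spill on overflow, fill on underflow."""

    def __init__(self, nwindows=8):
        self.n = nwindows
        self.cwp = 0
        self.wim = 1 << 1            # assumed start state: window 1 invalid
        self.frames = [None] * nwindows
        self.stack = []              # spilled windows, newest last
        self.overflows = 0
        self.underflows = 0

    def _mask(self):
        return (1 << self.n) - 1

    def save(self):
        target = (self.cwp - 1) % self.n
        if self.wim & (1 << target):             # window overflow
            victim = (target - 1) % self.n       # next window round the ring
            self.stack.append(self.frames[victim])
            # rotate WIM right: the victim becomes the invalid window
            self.wim = ((self.wim >> 1) | (self.wim << (self.n - 1))) & self._mask()
            self.overflows += 1
        self.cwp = target

    def restore(self):
        target = (self.cwp + 1) % self.n
        if self.wim & (1 << target):             # window underflow
            self.frames[target] = self.stack.pop()
            # rotate WIM left: the window beyond the target becomes invalid
            self.wim = ((self.wim << 1) | (self.wim >> (self.n - 1))) & self._mask()
            self.underflows += 1
        self.cwp = target


ring = WindowRing()
ring.frames[ring.cwp] = 0            # frame 0 is the outermost caller
DEPTH = 20
for depth in range(1, DEPTH + 1):    # 20 nested calls
    ring.save()
    ring.frames[ring.cwp] = depth

unwound = []
for depth in range(DEPTH, 0, -1):    # 20 returns
    ring.restore()
    unwound.append(ring.frames[ring.cwp])
```

Every return lands in a window holding the frame of its caller, even though only 7 of the 21 frames fit in the register file at any one time; the rest travel through the stack.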
	When a save instruction is executed and CANSAVE = 0, we have a
        window overflow exception, called a Spill Trap.  (If CANSAVE means
        nothing to you, go back and read section 3.2.1 for a description
        of the window state registers). 
	If OTHERWIN is zero, then we know that no other windows are used
        for alternate address spaces.  In this case, the trap vector taken
        is determined by the value in the WSTATE.NORMAL field.  We
        have a choice of eight normal spill trap vectors and the kernel
        can select which trap vector to use by writing the vector number
        (0 to 7) into the WSTATE.NORMAL field.  Each vector occupies four
        consecutive trap table entries, so the trap type is 0x080 plus
        four times the field value.  For example, if WSTATE.NORMAL
        contains the value '4', the spill trap taken will be spill_4_normal
        (trap 0x090).
	If OTHERWIN is non-zero, the same vectoring strategy is used but
        this time the vector is determined by the value in the WSTATE.OTHER
        field.  Using a similar example, if a spill trap occurs and the
        value contained in WSTATE.OTHER is '4', the spill trap taken will
        be spill_4_other (trap 0x0B0).
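The vector selection can be written down as a short calculation.  The sketch below follows the trap type layout given in the SPARC v9 architecture manual (a spill/fill bit, an "other" bit and the 3-bit WSTATE field shifted left by two, because each vector occupies four trap table entries).  The function name and argument names are assumptions made for the example; the fill traps described later use the same layout with the fill bit set.

```python
def window_trap_type(kind, otherwin, wstate_normal, wstate_other):
    """Return the trap type taken for a v9 window spill or fill."""
    if kind not in ("spill", "fill"):
        raise ValueError("kind must be 'spill' or 'fill'")
    fill_bit = 1 if kind == "fill" else 0
    other_bit = 0 if otherwin == 0 else 1
    # OTHERWIN picks which WSTATE field supplies the vector number.
    wtype = wstate_other if other_bit else wstate_normal
    if not 0 <= wtype <= 7:
        raise ValueError("WSTATE fields are 3 bits wide")
    return 0x080 | (fill_bit << 6) | (other_bit << 5) | (wtype << 2)
```

With OTHERWIN zero and WSTATE.NORMAL set to 4 this yields 0x090 (spill_4_normal); with OTHERWIN non-zero and WSTATE.OTHER set to 4 it yields 0x0B0 (spill_4_other).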
	When a spill trap occurs, the CWP is changed so that we are in
        the window we need to spill.  All we need to do is save the
        window and return.
	The spill trap vector entries in the trap table contain the first
        32 instructions of the trap handler.  In fact, the entire trap
        handler can be contained within this 32 instruction space!  The
        basic handler/vector would look similar to this: 
                !
                ! This is an example for spilling in a 32-bit address
                ! space.  For 64-bit, use stx instruction and adjust
                ! the %sp offsets accordingly
                !
                ! CWP has been set so that we are in the window that
                ! we need to spill (save).  Save the volatile window
                ! registers to the stack
                !
                st      %l0, [%sp + 0]
                st      %l1, [%sp + 4]
                st      %l2, [%sp + 8]
                st      %l3, [%sp + 12]
                st      %l4, [%sp + 16]
                st      %l5, [%sp + 20]
                st      %l6, [%sp + 24]
                st      %l7, [%sp + 28]
                st      %i0, [%sp + 32]
                st      %i1, [%sp + 36]
                st      %i2, [%sp + 40]
                st      %i3, [%sp + 44]
                st      %i4, [%sp + 48]
                st      %i5, [%sp + 52]
                st      %i6, [%sp + 56]
                st      %i7, [%sp + 60]
                !
                ! Now the window is saved, we do a "saved" instruction
                ! to adjust CANSAVE and CANRESTORE accordingly so that
                ! we can retry the trapped instruction without raising
                ! an exception again.
                !
                saved
                !
                ! Tell the IU to retry the trapped instruction...
                !
                retry
                !
                ! That's all folks...add nops or ".skip"s to fill
                ! the remainder of the 32 instruction space
                !
	Note that it is up to the kernel to use the WSTATE.NORMAL and
        WSTATE.OTHER fields in its own way, and this means that you may
        want to write your vector code differently so that you store to
        different address spaces (remember that OTHERWIN and WSTATE allow
        us to have more than one address space...we can use primary and
        secondary address space identifiers in our spill vectors and use
        the WSTATE fields to select which vector we use).
	Window fill handling works in the same way as window spill
        handling.  The IU will adjust the CWP so that we are in the window
        that we need to "fill" and so we restore that window from the
        stack and return.
	Just as for the window spill case, the OTHERWIN register determines
        whether we vector to a normal fill trap or an other fill trap.
        To use a similar example as in the spill section, assume that
        OTHERWIN is zero and the value in WSTATE.NORMAL is '4'.  This would
        result in us vectoring to the fill_4_normal trap (0x0D0).  If the
        OTHERWIN register was non-zero and WSTATE.OTHER happened to contain
        the value '4', we would vector to fill_4_other (trap 0x0F0).
	As explained previously, the kernel could (and does in the case of
        SunOS) use more than one address space in the register file, making
        use of the OTHERWIN and WSTATE register fields.  However, in order
        to complete the picture, the basic fill vector would look something
        like this: 
                !
                ! The CWP has been set so that we are in the window
                ! to "fill".  This window has been previously "spilled"
                ! so we restore from the stack
                !
                ld      [%sp + 0], %l0
                ld      [%sp + 4], %l1
                ld      [%sp + 8], %l2
                ld      [%sp + 12], %l3
                ld      [%sp + 16], %l4
                ld      [%sp + 20], %l5
                ld      [%sp + 24], %l6
                ld      [%sp + 28], %l7
                ld      [%sp + 32], %i0
                ld      [%sp + 36], %i1
                ld      [%sp + 40], %i2
                ld      [%sp + 44], %i3
                ld      [%sp + 48], %i4
                ld      [%sp + 52], %i5
                ld      [%sp + 56], %i6
                ld      [%sp + 60], %i7
                !
                ! Adjust CANSAVE and CANRESTORE accordingly
                !
                restored
                !
                ! Retry the trapped instruction
                !
                retry
                !
                ! That's it...fill the remaining 14 instructions
                ! with nops or a ".skip" directive
                !
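The bookkeeping done by the "saved" and "restored" instructions in the spill and fill vectors can be modelled as follows.  This is a sketch based on the instruction definitions in the SPARC v9 architecture manual, not on SunOS source; the class name WindowState is invented for the example, and the CLEANWIN register (which "restored" may also adjust) is left out to keep it short.

```python
class WindowState:
    """Model of the v9 CANSAVE, CANRESTORE and OTHERWIN registers."""

    def __init__(self, nwindows=8, cansave=6, canrestore=0, otherwin=0):
        # The registers are expected to add up to nwindows - 2.
        assert cansave + canrestore + otherwin == nwindows - 2
        self.nwindows = nwindows
        self.cansave = cansave
        self.canrestore = canrestore
        self.otherwin = otherwin

    def saved(self):
        """A window has been spilled: one more window can be saved into."""
        self.cansave += 1
        if self.otherwin == 0:
            self.canrestore -= 1
        else:
            self.otherwin -= 1

    def restored(self):
        """A window has been filled: one more window can be restored."""
        self.canrestore += 1
        if self.otherwin == 0:
            self.cansave -= 1
        else:
            self.otherwin -= 1

    def total(self):
        return self.cansave + self.canrestore + self.otherwin


spill = WindowState(cansave=0, canrestore=6)   # a save would trap here
spill.saved()

fill = WindowState(cansave=6, canrestore=0)    # a restore would trap here
fill.restored()
```

After "saved", CANSAVE is 1, so the retried "save" succeeds; after "restored", CANRESTORE is 1, so the retried "restore" succeeds.  In both cases the three registers still add up to the same total.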
	As I mentioned in the "spill" section previously, you can use the
        lda (load alternate) instructions to load from a different address
        space if you are using more than one address space in your register
        window file.  There are Address Space Identifiers (ASI's) which can
        be used to specify where to load or store to.  These represent
        primary and secondary address spaces, optionally with little endian
        access and/or user privileges (refer to the SPARC v9 architecture
        manual).  You would use the WSTATE fields to select an appropriate
        trap vector and your trap vectors would contain the sta (store
        alternate) or lda (load alternate) instructions to a specific ASI
        as required. 
	When the IU detects that an interrupt is pending, it will generate
        a trap and vector into the trap table so that the interrupt can be
        handled (see appendices for the numeric trap values that correspond
        to the interrupts and their priorities).
	The generic process for handling an interrupt in the kernel is to
        raise the processor interrupt level (PIL) to the same level as the
        occurring interrupt so that we do not run the risk of a lower
        priority interrupt butting in on our current interrupt handler.
        Then we would clear the interrupt pending bit in the system
        interrupt pending register (SIPR) and handle the interrupt with
        some code relevant to the interrupt type.  When we are finished
        with handling the interrupt, we restore the PIL to its original
        value and return from the trap with a rett instruction.
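The generic sequence just described (raise the PIL, clear the pending bit, run the handler, restore the PIL) can be sketched like this.  The Cpu class and its method names are hypothetical, the pending field is only a stand-in for the SIPR, and the model simply assumes that an interrupt is delivered when its level is above the current PIL.

```python
class Cpu:
    """Toy model of PIL handling around an interrupt handler."""

    def __init__(self):
        self.pil = 0
        self.pending = 0          # bit n set => level-n interrupt pending

    def post(self, level):
        self.pending |= 1 << level

    def deliverable(self):
        """Highest pending level above the current PIL, or None."""
        for level in range(15, self.pil, -1):
            if self.pending & (1 << level):
                return level
        return None

    def take_interrupt(self, handler):
        level = self.deliverable()
        if level is None:
            return None
        old_pil = self.pil
        self.pil = level                  # block lower priority interrupts
        self.pending &= ~(1 << level)     # clear the pending bit
        try:
            handler(level)                # interrupt specific code
        finally:
            self.pil = old_pil            # restore PIL before returning
        return level


cpu = Cpu()
cpu.post(4)
cpu.post(10)
seen = []
first = cpu.take_interrupt(lambda level: seen.append((level, cpu.pil)))
second = cpu.take_interrupt(lambda level: seen.append((level, cpu.pil)))
third = cpu.take_interrupt(lambda level: seen.append((level, cpu.pil)))
```

The level-10 interrupt is taken first and runs with the PIL at 10; the level-4 interrupt is only delivered once the PIL has dropped back.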
	Interrupts are typically handled on a per-cpu interrupt stack and
        in some cases, the interrupt is cleared and a lower-priority
        soft interrupt is posted for a device driver or similar to deal
        with later.  We want to do as little as possible on receipt of a
        high level hard interrupt to avoid difficulties with deadlocks due
        to blocking I/O and lock contention (not to mention performance).
        The kernel can determine whether the interrupt is a soft interrupt
        by checking a specific soft interrupt bit in the interrupt
        register.
	The default trap table in the SunOS kernel directs most traps
        through a generic trap handler front end which then decides which
        lower level handler to call based on the trap type.  If the trap
        type is an interrupt (as in this case) the generic system trap
        front end calls _interrupt() to decide what to do with it.
	The _interrupt() routine compares the interrupt level against the
        high level threshold of the system (the LOCK LEVEL, typically that
        of the level-10 clock).  If the interrupt level is below the lock
        level, the interrupt will be handled as a separate interrupt thread.
        If not, then _interrupt()  first checks to see if this interrupt
        is a level-10 clock interrupt and if it is, it jumps directly
        to the level-10 handler.  The level-10 handler typically calls
        clock(), the function that is the root of all scheduling and
        callout queue administration.  (The level-10 interrupt is set to
        interrupt every 10 ms by default although the clock chip is
        programmable.  As of Solaris 2.6, it is possible for system
        administrators to affect the timing of the clock interrupts).
	If the interrupt is not a level-10 interrupt (or if the interrupt
        is handled on a lower-priority interrupt thread) the kernel will
        have to look through all the interrupt service routines (ISR's)
        that have registered an interest in this level of interrupt.  If
        there are no ISR's registered then the kernel will print out a
        "spurious level 'x' interrupt..." message.  If there are one or
        more ISR's registered for a given interrupt level, the kernel
        will call each ISR vector one by one.  On return from each ISR,
        the kernel checks the SIPR mask to see if the ISR has serviced
        the interrupt and cleared the pending bit.  If no ISR services
        the interrupt then the kernel will print out a message on the
        console like "level 'x' interrupt not serviced...". 
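The walk through the registered ISR's can be sketched as below.  This is an illustration only: the function and handler names are invented, the sipr dictionary stands in for the real pending register, and the returned strings are shortened placeholders rather than the exact kernel messages.  The sketch calls every registered ISR and checks the pending bit after each one; whether the real kernel stops at the first ISR that claims the interrupt is not stated above, so that detail is an assumption.

```python
def dispatch(level, handlers, sipr):
    """Call the ISRs registered for a level; return a complaint or None."""
    isrs = handlers.get(level, [])
    if not isrs:
        return f"spurious level {level} interrupt"
    serviced = False
    for isr in isrs:
        isr(sipr)
        # Has this ISR cleared the pending bit for our level?
        if not sipr["pending"] & (1 << level):
            serviced = True
    if not serviced:
        return f"level {level} interrupt not serviced"
    return None


def idle_isr(sipr):
    pass                              # this device was not the source


def disk_isr(sipr):
    sipr["pending"] &= ~(1 << 5)      # services the level-5 interrupt


handlers = {5: [idle_isr, disk_isr], 6: [idle_isr]}
ok = dispatch(5, handlers, {"pending": 1 << 5})
unserviced = dispatch(6, handlers, {"pending": 1 << 6})
spurious = dispatch(7, handlers, {"pending": 1 << 7})
```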
	Note that the higher level interrupts which are not handled with
        interrupt threads in effect commandeer the current thread running
        on the processor.  This is necessary because scheduling of threads
        is initiated by the receipt of a level-10 interrupt and so it does
        not make sense to use an interrupt thread which is subject to
        level-10 scheduling for interrupts of a higher priority!  However,
        this has certain implications for high level ISR writers: they
        cannot use any blocking calls (or locks) that are affected by
        lower priority threads.  If they do, they run the risk of a
        deadlock where the high priority ISR is blocked on a resource held
        by a lower priority thread and that resource will never be freed.
        The PIL has been set to the current interrupt level, so level-10
        clock interrupts won't occur, threads won't get scheduled and the
        owning thread can never release the resource.  For this reason,
        high level ISR's will do as much as they are able to and then post
        a soft interrupt so that an interrupt thread can be started at a
        lower priority to finish off handling the interrupt, if necessary.
A text or data fault occurs when an access is attempted to an address that is not valid in the MMU. For example, if an attempt is made to fetch an instruction from the address in %pc but that address is not backed by a valid mapping, then the IU will raise a text fault. Alternatively, if the action of executing an instruction attempts to write or read from an address that is not backed by a valid mapping, a data fault would be raised.
Basically, what happens is that the action of fetching an instruction (and often the action of executing that instruction) will cause virtual addresses to be presented to the MMU (either the Sun MMU on sun4 and sun4c systems or the SPARC Reference MMU (SRMMU) on sun4m).
The MMU will look in its translation lookaside buffer (TLB), which is its cache of recent virtual to physical translations, to see if it has an entry for this virtual address. If it does, it returns the appropriate physical address so that the hardware can access the actual target address. If the virtual address does not exist in the TLB, the MMU will then walk through its translation tables in memory to resolve the address to a valid physical address. Ultimately the walk will lead down to a table of page table entries (PTE's) which contain various flags which indicate whether the page is valid (mapped in) or not. If the PTE does not exist, or if the valid flag is not set, or if the page is marked as swapped out, then the MMU raises an exception which in turn causes the IU to raise a text or data fault according to the type of access (instruction fetch or data access). This description is pretty generic as it would take quite a lot of detail on the MMU hardware and software table layout to explain it at a lower level. Anyway, you get the picture I hope.
The kernel is notified of this MMU exception when the text/data fault comes in through the trap table. The fault handler will then attempt to correct the fault by mapping in the required page. For example, a text fault may indicate to the kernel that the next page of program text needs to be mapped in from the executable file. The kernel would then access the executable file via the vnode to satisfy that mapping. Likewise, if it's a data fault, the kernel will attempt to retrieve the page either from the executable's data segment in the file or from swap if the page was swapped out.
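The lookup sequence described above (TLB first, then the translation tables, then a fault) can be sketched as a toy model.  Everything in it is an assumption made for illustration: the 4K page size, the flat dictionary used as a page table, and the names translate, Fault and fault_kind.  A real MMU uses multi-level tables and hardware specific PTE formats.

```python
PAGE_SIZE = 4096  # assumed page size for the example


class Fault(Exception):
    def __init__(self, kind, vaddr):
        super().__init__(f"{kind} fault at {vaddr:#x}")
        self.kind = kind
        self.vaddr = vaddr


def translate(vaddr, tlb, page_table, access):
    """Translate a virtual address; access is 'fetch' or 'data'."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:                                # TLB hit
        return tlb[vpn] * PAGE_SIZE + offset
    pte = page_table.get(vpn)                     # table walk
    if pte is None or not pte["valid"]:
        # Instruction fetches raise text faults, other accesses data faults.
        raise Fault("text" if access == "fetch" else "data", vaddr)
    tlb[vpn] = pte["pfn"]                         # cache the translation
    return pte["pfn"] * PAGE_SIZE + offset


def fault_kind(vaddr, tlb, page_table, access):
    """Return the kind of fault raised, or None if the access succeeds."""
    try:
        translate(vaddr, tlb, page_table, access)
    except Fault as fault:
        return fault.kind
    return None


page_table = {1: {"valid": True, "pfn": 7}, 2: {"valid": False, "pfn": 0}}
tlb = {}
paddr = translate(0x1004, tlb, page_table, "data")
```

In this model the kernel's fault handler would be whatever catches Fault, maps the page in by updating page_table, and retries the access.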
If the address is totally bogus (i.e. a bad pointer) then there will be no way to satisfy the fault and this will result in either a "bad trap, text/data fault" panic if it's a kernel fault or a segmentation fault signal (SIGSEGV) being sent to the user process if it's a user fault. One other possibility is a SIGBUS to signal a bus error. This usually happens when the user fault would normally be satisfied by mapping in the page but for some reason the page doesn't exist any more, or because the underlying executable file has been changed.
Some UNIX implementations disallow writing to a file that is being executed and return ETXTBSY to the writing process, but Solaris will allow writes to executing files. This is a direct consequence of the new vm system. The system must be able to write mapped files, including those that are mapped for execution. Since with NFS there's no way to determine that someone is executing out of a file, and with shared libraries it's difficult to determine that someone is executing out of a library, and since there are a number of other applications in which it is important to be able to write mappings that are also being executed (self-modifying code), ETXTBSY no longer makes sense.
In the case of a kernel initiated fault, we can tell that it is a kernel fault because the processor was in supervisor mode when the trap was taken and this is indicated in the PSR. Also, we know that kernel pages are never swapped out so that cuts down on page fault processing overhead. The text faulting facility is the basis of demand paging.