Page Table Structure and Hardware Support:

Different operating systems use different methods for implementing page tables. Most allocate one page table per process. A pointer to each process's page table is maintained along with the other register values associated with the process and is stored in its PCB.


Reloading these register values and establishing the correct hardware page-table state from the stored per-process page table is part of a context switch.


The actual implementation of the page table can be accomplished in several ways. The simplest technique is to dedicate a set of registers to hold the page table. Since every memory access goes through the page table, these registers need very high-speed logic associated with them to make the address translation efficient. This register technique works reasonably well if the page table is relatively small.
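As a rough software model of that idea (the names and the 16-entry size are illustrative assumptions, not any particular architecture), the whole table fits in a small register file and the look-up itself never touches main memory:

#include <stdint.h>

#define PAGE_SIZE    4096u
#define PT_REGISTERS 16            /* small, register-resident page table */

/* Each "register" simply holds the frame number for one logical page. */
static uint32_t page_table_regs[PT_REGISTERS];

/* Translate a logical address using only the register-resident table.
   (Sketch only: assumes the page number is less than PT_REGISTERS.) */
uint32_t translate(uint32_t logical)
{
    uint32_t page   = logical / PAGE_SIZE;
    uint32_t offset = logical % PAGE_SIZE;
    return page_table_regs[page] * PAGE_SIZE + offset;
}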


Most modern systems allow extremely large page tables (a million entries or more). For such machines, the register implementation of the page table is not feasible, primarily for cost reasons. Instead of registers, such machines keep the page table in main memory and use a page-table base register (PTBR) to maintain a pointer to the page table. Changing page tables during a context switch then requires changing only the value in this register rather than physically loading a large number of registers.
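A minimal C sketch of this arrangement (the names here are hypothetical, not taken from any particular kernel): the PCB saves the page-table base alongside the other register state, and a context switch only has to reload that single value.

#include <stdint.h>

/* Hypothetical PCB: the page-table base is saved with the other registers. */
struct pcb {
    uintptr_t saved_sp;         /* other saved register state ...         */
    uintptr_t page_table_base;  /* pointer to this process's page table   */
};

/* Simulated page-table base register (PTBR). */
static uintptr_t ptbr;

/* On a context switch, only the PTBR value changes; the page table itself
   stays wherever it already is in main memory. */
void switch_address_space(const struct pcb *next)
{
    ptbr = next->page_table_base;
}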


This approach has an inherent problem: the time required to access a logical memory address. To access a logical address, the page-table entry must first be read from memory at the location given by the PTBR offset by the logical page number; this requires one memory access. That access provides the physical frame number in which the logical page is currently located, and the access to the data at that physical address is a second memory access. This scheme therefore requires two memory accesses for every logical address generated by the CPU, doubling the time required to perform a memory access. Such an increase in memory access time is not tolerable.
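The cost shows up clearly in a small software model of the translation (a sketch only; the page size, table layout, and mem_read helper are assumptions for illustration): one read fetches the page-table entry, and a second read fetches the data itself.

#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

static uint8_t memory[1 << 20];          /* simulated physical memory       */
static size_t  ptbr;                     /* byte offset of the page table   */
static unsigned long mem_accesses;       /* counts trips to "main memory"   */

/* Every read of simulated memory counts as one memory access. */
static uint32_t mem_read(size_t addr)
{
    mem_accesses++;
    return memory[addr];
}

/* Translating and then using a logical address costs two memory accesses:
   one for the page-table entry, one for the data itself. */
uint32_t read_logical(uint32_t logical)
{
    uint32_t page   = logical / PAGE_SIZE;
    uint32_t offset = logical % PAGE_SIZE;

    uint32_t frame = mem_read(ptbr + page);          /* access 1: page table */
    return mem_read(frame * PAGE_SIZE + offset);     /* access 2: the data   */
}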


The standard solution to this problem is to use content-addressable memory (CAM), also called associative memory, associative registers, or a translation look-aside buffer (TLB). CAM is built from extremely high-speed memory in which each cell (these can be thought of as registers) consists of two parts: a key and a value.


When the CAM is presented with an item to match, that item is compared with all of the keys simultaneously; if one of the cells' keys matches the item, that cell's value component is output.
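This behaviour can be modelled in C as an array of key/value cells (the sizes and names are illustrative; real hardware compares every key in parallel, which the loop below only simulates sequentially).

#include <stdbool.h>
#include <stdint.h>

#define CAM_CELLS 64            /* typical CAMs hold 8 to 2048 cells */

struct cam_cell {
    bool     valid;
    uint32_t key;               /* e.g. a logical page number      */
    uint32_t value;             /* e.g. the matching frame number  */
};

static struct cam_cell cam[CAM_CELLS];

/* Present an item to the CAM: hardware checks all keys at once; this loop
   is only a software stand-in for that parallel comparison. */
bool cam_lookup(uint32_t key, uint32_t *value_out)
{
    for (int i = 0; i < CAM_CELLS; i++) {
        if (cam[i].valid && cam[i].key == key) {
            *value_out = cam[i].value;
            return true;
        }
    }
    return false;
}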


When used as a page table, the CAM is presented with a logical page number as the item to match. Each cell in the CAM represents one page-table entry, where the value part of the cell holds the physical frame number in which the logical page currently resides; this is the value output by the cell. If the logical page number is found in the CAM, its frame number is immediately available and is used to access physical memory. While this type of memory is quite expensive, it is also extremely fast. Typically a CAM used for this purpose contains between 8 and 2048 cells. If the page number is not in the CAM, then a memory reference to the page table (in memory) must be made. Once the frame number is obtained from memory, it is used to complete the translation and perform the second memory access. The page number and frame number are then added to the CAM so that on the next request the translation will be found there.
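Putting the pieces together, a hedged sketch of the whole look-up path might look like this (the cam_* helpers and table layout are the illustrative ones from above, not a real MMU interface): try the CAM first, fall back to the in-memory page table on a miss, and remember the translation for next time.

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define CAM_CELLS 64

struct cam_cell { bool valid; uint32_t page; uint32_t frame; };

static struct cam_cell cam[CAM_CELLS];
static uint32_t page_table[1024];   /* in-memory page table: page -> frame */

static bool cam_lookup(uint32_t page, uint32_t *frame)
{
    for (int i = 0; i < CAM_CELLS; i++)
        if (cam[i].valid && cam[i].page == page) { *frame = cam[i].frame; return true; }
    return false;
}

static void cam_insert(uint32_t page, uint32_t frame)
{
    for (int i = 0; i < CAM_CELLS; i++) {
        if (!cam[i].valid) {
            cam[i] = (struct cam_cell){ true, page, frame };
            return;
        }
    }
    /* CAM full: a replacement policy is needed (see the next sketch). */
}

/* Translate a logical address: CAM hit -> frame is available immediately;
   CAM miss -> read the page table in memory, then cache the translation. */
uint32_t translate_with_cam(uint32_t logical)
{
    uint32_t page   = logical / PAGE_SIZE;
    uint32_t offset = logical % PAGE_SIZE;
    uint32_t frame;

    if (!cam_lookup(page, &frame)) {        /* miss: extra memory access */
        frame = page_table[page];
        cam_insert(page, frame);            /* so the next request hits  */
    }
    return frame * PAGE_SIZE + offset;      /* physical address          */
}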


If the CAM is already full, the operating system must select a CAM entry for removal so that the new one can be entered. The operating system uses a CAM entry replacement policy as the basis for this decision. Each context switch also requires that the CAM be flushed to ensure that the next process does not use translation information left behind by the process just switched out.
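A sketch of those two responsibilities, again with purely illustrative names and a deliberately simple round-robin replacement choice (real hardware may use LRU, random, or other policies): pick a victim cell when the CAM is full, and invalidate every cell on a context switch.

#include <stdbool.h>
#include <stdint.h>

#define CAM_CELLS 64

struct cam_cell { bool valid; uint32_t page; uint32_t frame; };

static struct cam_cell cam[CAM_CELLS];
static int next_victim;                 /* round-robin replacement pointer */

/* Add a translation, evicting the round-robin victim if the CAM is full. */
void cam_replace(uint32_t page, uint32_t frame)
{
    for (int i = 0; i < CAM_CELLS; i++) {
        if (!cam[i].valid) {
            cam[i] = (struct cam_cell){ true, page, frame };
            return;
        }
    }
    cam[next_victim] = (struct cam_cell){ true, page, frame };
    next_victim = (next_victim + 1) % CAM_CELLS;
}

/* Flush on a context switch so the next process cannot reuse stale
   translations left behind by the previous one. */
void cam_flush(void)
{
    for (int i = 0; i < CAM_CELLS; i++)
        cam[i].valid = false;
}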


Each logical address request generated by the CPU whose translation information is in the CAM at the time of the request is called a CAM hit. The percentage of requests for which this occurs is called the CAM hit ratio. An 86% hit ratio means that 86% of the time the necessary translation information is in the CAM. For example, if it takes 15 nanoseconds to search the CAM and a memory access requires 100 nanoseconds, then a mapped memory access takes a total of 115 nanoseconds when there is a CAM hit. If there is a CAM miss, the total time required is 215 nanoseconds, since two memory accesses are required in addition to the CAM search. (Assume the time to add the new entry to the CAM is negligible, although in reality it is not.) To find the effective memory access time (in effect, the average access time under these conditions), each case is weighted by its probability, which gives:


Effective memory access time = (0.86 x 115) + (0.14 x 215) = 98.9 + 30.1 = 129


Thus, the effective memory access time is 129 nanoseconds, a slowdown of approximately 29% compared with the basic 100-nanosecond memory access time.
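The same arithmetic can be written out as a small calculation (the numbers are simply the ones from the example above):

#include <stdio.h>

int main(void)
{
    double cam_search = 15.0;    /* nanoseconds to search the CAM   */
    double mem_access = 100.0;   /* nanoseconds per memory access   */
    double hit_ratio  = 0.86;

    double hit_time  = cam_search + mem_access;        /* 115 ns    */
    double miss_time = cam_search + 2.0 * mem_access;  /* 215 ns    */

    /* Weight each case by its probability. */
    double effective = hit_ratio * hit_time + (1.0 - hit_ratio) * miss_time;

    printf("effective access time = %.1f ns\n", effective);   /* 129.0 ns */
    return 0;
}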


The hit ratio is related to the number of cells in the CAM. With between 16 and 512 cells, a hit ratio of 80% to 98% can be achieved. Intel's 80486 chip uses 32 cells. The following diagram shows the address translation that occurs when a CAM is used to speed up page-table look-up.
