(Latest Revision: Nov 18, 2015)
Chapter Eight -- Main Memory -- Lecture Notes
- 8.0 Objectives
- Describe ways of organizing memory hardware
- Discuss a range of memory management techniques
- Describe memory management in the Intel Pentium
- 8.1 Background
- The MMU sees only a stream of memory addresses
- 8.1.1 -- Basic Hardware
- Hardware must provide a mechanism for memory protection
- Use of base and limit registers is one simple example:
- Hardware checks every address generated
in user mode.
- An attempt in user mode to access memory out of bounds
results in a trap.
- Loading the base or limit register is a privileged operation.
- Kernel gets unrestricted access to all memory -
a necessity for performing system tasks such as loading
jobs and fetching parameters of system calls.
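The base/limit check above can be sketched in software (register values are illustrative; real systems perform this check in hardware on every user-mode access):

```python
# Minimal sketch of base-and-limit protection. BASE and LIMIT model
# the two hardware registers; the values are illustrative.

BASE = 300040    # base register: first legal physical address
LIMIT = 120900   # limit register: size of the legal range

def check(addr):
    """Return addr if BASE <= addr < BASE + LIMIT, else trap."""
    if addr < BASE or addr >= BASE + LIMIT:
        raise MemoryError("trap: address %d out of bounds" % addr)
    return addr

check(300040)    # first legal address: OK
check(420939)    # last legal address: OK
# check(420940) would raise MemoryError (trap to the OS)
```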
- 8.1.2 -- Address Binding
- Addresses in the source program are generally symbolic --
e.g. count
- Typically the compiler binds symbolic addresses to relocatable,
relative addresses, given as offsets from the base address of
the program or the containing module.
- The relative addresses may be converted to absolute
addresses by the linkage editor or loader.
- If we want to be able to move programs from one location in
memory to another then it will not be workable to load programs
containing absolute addresses. In a modern operating system it
is typically possible to swap a process out, and swap it back
in to a different area in primary memory.
- 8.1.3 -- Logical- versus Physical-Address Space
- Under execution-time binding (the dominant paradigm), the
memory management unit (MMU) hardware performs the mapping
from logical address to physical address that a process
requires during its fetch/decode/execute cycle.
- The user program deals exclusively with logical addresses.
The MMU hardware translates a logical address only
when a memory access is performed.
- In a simple example situation, the MMU hardware might
translate logical addresses in the range 0..max to the range
R..R+max, where R is the value stored in the relocation
register.
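The relocation-register mapping just described can be sketched as follows (the register value and address-space size are illustrative):

```python
# Sketch of the relocation-register MMU: logical addresses in the
# range 0..MAX-1 map to R..R+MAX-1.

R = 14000        # relocation register (illustrative value)
MAX = 4096       # size of the logical address space (illustrative)

def translate(logical):
    if not 0 <= logical < MAX:
        raise MemoryError("trap: logical address out of range")
    return logical + R

assert translate(0) == 14000
assert translate(346) == 14346
```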
- 8.1.4 -- Dynamic Loading
- Under dynamic loading, a routine is not loaded until it is
called.
- Each routine has a disk image represented in a relocatable
load format.
- Routines that are never called are never loaded. This may
result in considerable savings in memory usage.
- Dynamic loading can be implemented entirely within user
programs. There is no need for any special assistance from
hardware or the OS.
- 8.1.5 -- Dynamic Linking and Shared Libraries
- The in-memory program text originally contains a stub for each
reference to a library routine that the program has. The stub is a
piece of code that tells where in memory or on disk to locate
the library routine.
- When the stub is first executed it is replaced with the address
of the routine. (If need be, the routine is loaded first.)
- All processes share the same copy of each library routine.
- Because of memory management, user processes need help from the
OS to check on the memory locations of routines. The sharing of
library routines requires help from both hardware and the OS.
- 8.2 Swapping
- Some older multiprogramming systems performed swapping to the disk each
time there was a context switch. This was a useful approach on some
systems at a time when memories were too small to contain very many
processes.
- (When you read about "standard swapping" don't let it get you
confused about the difference between swapping and context
switching. They are completely different things.)
- It isn't practical to do swapping like this in a modern interactive
system. Unless the time slice is much larger than the total swap time
- on the order of several seconds - it won't be possible to swap processes
in and out fast enough to give them all their turns in the CPU. Such a large
quantum would make the system very sluggish in its response to
interactive users.
- It is common for an OS to swap out one or more processes when the
system has begun to run out of physical memory.
- Windows 3.1 used a form of swapping. When you clicked on a window
the associated process would be swapped in, if it was not already in
memory.
- 8.3 Contiguous Memory Allocation
- In a contiguous memory allocation set-up each process resides in
some contiguous address range in memory (e.g. in the L addresses
from base address B to B+L-1). The OS would typically reside in low
memory, along with the interrupt vector.
- 8.3.1 -- Memory Mapping and Protection
- A scheme similar to the base-limit registers idea discussed
in chapter two will suffice to keep track of and enforce
memory allocations.
- In this scheme there are both logical and physical address
spaces. A user process works with L legal addresses: the
contiguous range from 0 to L-1. The MMU hardware checks every
logical address generated by the user process to make sure it
is within the legal range and maps each valid logical address
to a corresponding physical address by adding the value of the
relocation (base) register.
- By changing the values of the relocation and limit registers,
the OS can keep track of processes as it relocates and/or
resizes them. The OS can change its own size too.
- 8.3.2 -- Memory Allocation
- Fixed-size partitioning is a very simple memory allocation
methodology.
- Variable-sized partitioning is more flexible than fixed-size
partitioning.
- The OS maintains a list of available "holes" in memory.
- When a process needs to be loaded into memory, the OS
finds a hole from the list that is big enough, and places
the process into an initial contiguous section of the
hole. Any unused remainder of the hole is returned
to the list.
- When a process terminates, it releases its memory
allocation. The OS checks to see if the freed memory can
be merged with adjacent free holes to form a larger free
hole. The resulting hole is inserted into the list.
(Note: holes in the list that are merged with the new hole
have to be deleted from the list.)
- The job of allocating the memory under these conditions is
known as the dynamic storage allocation problem:
"... how to satisfy a request of size N from a list of
free holes"
- The strategy of searching for a hole may affect
performance. First fit, best fit, and worst fit are
possibilities.
- If we order the list of holes by size, we can decrease
the time required to find a suitable hole for a
process, but keeping the list in order requires extra
time.
- Simulations show:
- First fit is generally faster than best fit.
- Both first fit and best fit are better than worst
fit in terms of storage utilization and speed.
- All three algorithms suffer from the effects of
external fragmentation. Statistical analysis of first fit
shows that, given N allocated blocks, roughly another 0.5N
blocks are lost to fragmentation (the 50-percent rule) --
that is, about one third of memory may be unusable.
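The three hole-search strategies can be sketched as follows. The list-of-(start, size) representation of the free list is our own illustration, not something prescribed by the notes:

```python
# Each hole is a (start, size) pair; each function returns the index
# of the chosen hole in the free list, or None if no hole fits.

def first_fit(holes, n):
    # Take the first hole that is big enough.
    for i, (start, size) in enumerate(holes):
        if size >= n:
            return i
    return None

def best_fit(holes, n):
    # Take the smallest hole that is big enough (tightest fit).
    fits = [(size, i) for i, (_, size) in enumerate(holes) if size >= n]
    return min(fits)[1] if fits else None

def worst_fit(holes, n):
    # Take the largest hole, leaving the biggest leftover.
    fits = [(size, i) for i, (_, size) in enumerate(holes) if size >= n]
    return max(fits)[1] if fits else None

holes = [(0, 100), (200, 500), (900, 200), (1500, 300)]
assert first_fit(holes, 150) == 1   # first hole of size >= 150
assert best_fit(holes, 150) == 2    # tightest fit (size 200)
assert worst_fit(holes, 150) == 1   # largest hole (size 500)
```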
- 8.3.3 -- Fragmentation
- Fragmentation can be external or internal.
- External fragmentation is memory that is available but
unusable (for example, holes that are too small).
- Internal fragmentation is memory that is allocated but not
used. (The allocation method may require that processes
sometimes get more than they need. For example, there may be a
minimum allocation, or allocations may be made in chunks of a
specific size.)
- If processes are dynamically relocatable then the OS can move
them around to compact external fragmentation into
usable holes. PROBLEM WITH THIS: it can take a long time and
if you try to do it piecemeal it becomes a complex job that is
very difficult to do correctly.
- The idea of paging and segmentation is to do an "end run"
around these problems by allowing the memory allocation of a
process to consist of non-contiguous chunks of physical memory.
- 8.4 Paging
- When swapping is done in conjunction with variable-size
partitioning, there is typically a dynamic storage allocation
problem to solve on the swap space device in addition to the
problem in main memory. Backing stores are very slow compared to
main memories so compaction is not a realistic option. This makes
it all the more attractive to use paging or segmentation instead of
variable-size partitioning.
- 8.4.1 -- Basic Method
- For purposes of this discussion, let's assume that the
smallest addressable unit of primary memory is a
byte. It should be obvious how to apply the
concepts developed here to situations in which there is a
different word size.
- The hardware has a given page size such as 4 KB (in other
words, 4096 bytes). We divide physical memory and backing store
into page-sized contiguous chunks called frames, and the logical
address space into chunks of the same size called pages. For
example page #0 runs from byte #0 through byte #4095; and page
#1 runs from byte #4096 through byte (4096+4095)=8191.
- Processes are loaded into a number of frames. The frames don't
have to be contiguous with each other. For example the frame
used for the first 4096 bytes of the program could be frame
#17, which has base address 17*(4096)=69632 and runs up through
byte 69632+4095=73727. The second 4096 bytes of the program
could be in frame #3, which runs from byte 3*4096=12288 to byte
12288+4095=16383.
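The frame arithmetic above can be checked with a short sketch. The page size and page-to-frame assignments come from the example; the dict-based page table is our own illustration:

```python
# Page 0 of the process lives in frame 17, page 1 in frame 3,
# exactly as in the worked example above.

PAGE_SIZE = 4096
page_table = {0: 17, 1: 3}   # page number -> frame number

def translate(logical):
    page, offset = divmod(logical, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset

assert translate(0) == 69632       # byte 0 lands at 17*4096
assert translate(4095) == 73727    # last byte of page 0
assert translate(4096) == 12288    # first byte of page 1, frame 3
assert translate(8191) == 16383    # last byte of page 1
```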
- The logical address space is a contiguous extent ranging in
address from 0 to some upper limit.
- As a program runs the MMU hardware does all the
routine translation of logical addresses to physical addresses
by using a page table. The operating system does
not perform this routine address translation -- that
would require an interrupt for every memory access, and would
be extremely slow!
- The OS creates a page table entry for each page when it
first loads the page. When a process attempts a memory access,
hardware uses the first part of the logical address as
an index into the page table. The hardware finds the
base address of a frame in the page table and combines it with
the offset in the logical address. The result is the physical
address corresponding to the logical address. The
hardware then continues with the memory access.
- There is no external fragmentation with paging. However,
typically a process does not need all of the memory in its
"last frame." The remainder is internal fragmentation - about
half a page, on average.
- A small page size reduces internal fragmentation. A large
page size keeps the page table smaller and reduces the total
amount of I/O overhead for copying pages to and from the
backing store.
- Memory protection with paging is pretty straightforward. The
OS creates the page table and uses it to protect memory, much
as an OS on simpler hardware would use base-limit registers.
(The 'bases' are the frame numbers, rather than physical
addresses of memory cells, and the 'limits' are not explicitly
stored, because they're all just equal to the page size.)
- The OS has to keep track of all the allocations of the
physical frames.
- The OS keeps track of a copy of the page table of each process.
- Suppose a user process gives an address as a parameter when
communicating with the OS. For example the address could be the
base address of an array that the process wants to use as an
I/O buffer. The process gives the OS a logical address.
(The process only knows about logical addresses.) The
operating system needs to know the physical address.
The OS will use the page table of the process to translate.
- 8.4.2 -- Hardware Support
- The copy of the page table used by the hardware might be a set
of dedicated CPU registers.
- In a modern general-purpose system the CPU contains a
page-table base register (PTBR) pointing to a large page table
resident in the main memory.
- Such a 'modern' system also uses a fast associative memory
address cache (Translation Lookaside Buffer - TLB) so that the
MMU does not usually have to take the time to access the page
table when performing an address translation.
- Address Space Identifier (ASID) technology allows the TLB to
contain address translation information for several different
processes.
- ASID technology also cuts down on the necessity to do
time-consuming cache flushes during a context switch.
- Effective memory access time is a function of the hit ratio,
memory access time, and TLB search time.
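A worked instance of that formula, with illustrative timings (20 ns TLB search, 100 ns memory access, 80% hit ratio; a TLB miss costs one extra memory access to consult the in-memory page table):

```python
# Effective access time (EAT) with a TLB. All numbers are
# illustrative, not taken from any particular machine.

tlb = 20       # ns, TLB search time
mem = 100      # ns, one memory access
alpha = 0.80   # TLB hit ratio

# Hit: TLB search + one memory access.
# Miss: TLB search + page-table access + the memory access itself.
eat = alpha * (tlb + mem) + (1 - alpha) * (tlb + 2 * mem)
assert abs(eat - 140.0) < 1e-9   # 140 ns effective access time
```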
- 8.4.3 -- Protection
- Some systems make the page table only as long as is necessary
for the size of the process. Such a system would typically
have a page-table length register (PTLR). A process attempting
to access an address "past the end of the table" would generate
a trap to the OS.
- In any case, the valid bit in "extra" page table entries can be
cleared by the OS so that the process will trap if it tries to
use one of those entries.
- Unfortunately a process generally can access the
internal fragment in its last page.
- 8.4.4 -- Shared Pages
- The paging paradigm easily supports shared memory (at least
when "traditional" hierarchical page tables are used.)
- If two processes have the same frame number in both their
page tables then they are able to share that frame.
- The OS can use this idea to allow many processes to share
the same read-only program text.
- Writeable memory may be shared as a means of interprocess
communication.
- 8.5 Structure of the Page Table
- 8.5.1 --- Hierarchical Paging
- Page tables may be quite large. In that case we may divide the
page table into pages.
- In one scheme, the logical address is partitioned as
(p1|p2|d). P1 is used as an index into an outer page
table. This leads us to one page of the page table. P2 and d
are then used in the "normal way" to complete the address
translation.
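The (p1|p2|d) split can be sketched for a 32-bit address with a 4 KB page size. The 10/10/12 bit widths are a common choice for illustration; the exact widths are machine-dependent:

```python
# Split a 32-bit logical address into (p1, p2, d): 10 bits of outer
# index, 10 bits of inner index, 12 bits of page offset.

def split(addr):
    d = addr & 0xFFF            # low 12 bits: offset within the page
    p2 = (addr >> 12) & 0x3FF   # next 10 bits: index into inner table
    p1 = (addr >> 22) & 0x3FF   # top 10 bits: index into outer table
    return p1, p2, d

assert split(0x00000FFF) == (0, 0, 0xFFF)
assert split(0x00401005) == (1, 1, 5)
```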
- For still larger page tables, the SPARC supports three-level
paging and the Motorola 68030 supports four-level paging.
- Generally it is not considered appropriate to map a 64-bit
paged address space with "traditional" hierarchical page
tables. It requires a "ridiculous" number of levels of page
tables -- e.g. seven levels.
- 8.5.2 -- Hashed Page Tables
- Per-process hashed page tables are an alternative to
hierarchical page tables. A hash function is applied to the
virtual address. Collisions are resolved with external
chaining. Each entry on a chain contains a virtual address,
frame number, and pointer for the next item on the chain.
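A hashed page table with external chaining can be sketched as follows (the hash function, bucket count, and entry values are all illustrative):

```python
# Each bucket holds a chain of (virtual page number, frame) pairs;
# collisions are resolved by appending to the chain.

BUCKETS = 8
table = [[] for _ in range(BUCKETS)]

def insert(vpn, frame):
    table[vpn % BUCKETS].append((vpn, frame))

def lookup(vpn):
    # Walk the chain in the bucket the vpn hashes to.
    for v, frame in table[vpn % BUCKETS]:
        if v == vpn:
            return frame
    raise MemoryError("trap: page not present")

insert(3, 17)
insert(11, 42)    # 11 % 8 == 3: collides with vpn 3, chained
assert lookup(3) == 17
assert lookup(11) == 42
```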
- Clustered page tables are a variant in which each entry in the
page table refers to several pages.
- 8.5.3 -- Inverted Page Table
- The UltraSPARC and PowerPC use an inverted page table. This
table has one entry for each frame. The entry contains the
virtual address for the frame and info on the process that is
using the frame.
- There may be some total space savings with this set-up, but
hardware and OS cannot directly index into the table using the
page number, so it would take a long time to search this table
to find the information for a forward address translation.
- The idea of the hashed page table is used in conjunction with
the inverted page table to speed the search for the correct
table entry.
- Of course if there is a cache hit in the TLB, the page table is
not consulted and effective memory access time is nearly
equal to memory access time. If the page table is consulted,
then address translation requires one additional memory access
for each probed link in the hash-overflow chain.
- When entries in the inverted page table are able to contain
only one virtual page number, it becomes difficult to implement
shared memory. All processes sharing a frame have to reference
it with the same virtual page number.
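The forward-translation search through an inverted page table can be sketched as a linear scan keyed by (pid, virtual page number). The table contents are illustrative; as noted above, real systems pair this with a hash table to avoid the full scan:

```python
# One entry per frame; the index into the list IS the frame number.
# Each entry records which process and virtual page occupy the frame.

inverted = [(7, 0), (7, 1), (9, 0), (9, 5)]   # (pid, vpn) per frame

def translate(pid, vpn, page_size=4096, offset=0):
    for frame, entry in enumerate(inverted):
        if entry == (pid, vpn):
            return frame * page_size + offset
    raise MemoryError("trap: no matching entry")

assert translate(9, 0) == 2 * 4096          # found in frame 2
assert translate(7, 1, offset=100) == 4196  # frame 1, offset 100
```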
- 8.6 Segmentation
- 8.6.1 -- Basic Method
- Programmers tend to think of their programs as a collection of
named functions, modules, and data structures -- not arranged
in any particular order.
- Maybe it is not so "natural" to think of the program as
occupying a linear array of word-cells starting at address 0
and running to some upper limit.
- The segmentation memory management scheme makes it a
little easier for the programmer to view the program as that
unordered collection.
- Instead of pages we have 'named' segments of memory of varying
length. Logical addresses consist of a segment 'name'
(a number) followed by an offset within the segment.
- 8.6.2 -- Hardware
- A segment table is indexed by segment number (name). Each
entry of the table contains the base address and limit (length)
of a segment.
- To translate an address we compare the offset part with the
limit in the segment table entry. If the offset is not too
large, the physical address is the sum of the segment base plus
the offset. Otherwise we have to trap a violation of a segment
limit.
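The segment-table translation just described can be sketched as follows (the table contents are illustrative):

```python
# Each segment-table entry is a (base, limit) pair, indexed by
# segment number.

segment_table = [(1400, 1000), (6300, 400), (4300, 1100)]

def translate(segment, offset):
    base, limit = segment_table[segment]
    # Trap if the offset is not strictly less than the limit.
    if offset >= limit:
        raise MemoryError("trap: segment limit violation")
    return base + offset

assert translate(2, 53) == 4353    # segment 2 begins at 4300
assert translate(0, 999) == 2399   # last legal byte of segment 0
# translate(1, 400) would trap: the offset equals the limit
```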
- 8.7 Example: The Intel Pentium
- In the Intel Pentium, there are segmentation and paging units that
together function as "the MMU." The Pentium supports pure
segmentation and paged segments.