(Latest Revision: Nov 18, 2015)
Chapter Eight -- Main Memory -- Lecture Notes
- 8.0 Objectives
- Describe ways of organizing memory hardware
- Discuss a range of memory management techniques
- Describe memory management in the Intel Pentium
- 8.1 Background
- The MMU sees only a stream of memory addresses
- 8.1.1 -- Basic Hardware
- Hardware must provide a mechanism for memory protection
- Use of base and limit registers is one simple example:
- Hardware checks every address generated
in user mode.
- An attempt in user mode to access memory out of bounds
results in a trap.
- Loading the base or limit register is a privileged operation.
- Kernel gets unrestricted access to all memory -
a necessity for performing system tasks such as loading
jobs and fetching parameters of system calls.
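The base/limit check above can be sketched in software (register values are illustrative; real systems perform this check in hardware on every user-mode access):

```python
# Minimal sketch of base-and-limit protection. BASE and LIMIT model
# the two hardware registers; the values are illustrative.

BASE = 300040    # base register: first legal physical address
LIMIT = 120900   # limit register: size of the legal range

def check(addr):
    """Return addr if BASE <= addr < BASE + LIMIT, else trap."""
    if addr < BASE or addr >= BASE + LIMIT:
        raise MemoryError("trap: address %d out of bounds" % addr)
    return addr

check(300040)    # first legal address: OK
check(420939)    # last legal address: OK
# check(420940) would raise MemoryError (trap to the OS)
```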
- 8.1.2 -- Address Binding
- Addresses in the source program are generally symbolic --
e.g. count
- Typically the compiler binds symbolic addresses to relocatable,
relative addresses, given as offsets from the base address of
the program or the containing module.
- The relative addresses may be converted to absolute
addresses by the linkage editor or loader.
- If we want to be able to move programs from one location in
memory to another then it will not be workable to load programs
containing absolute addresses. In a modern operating system it
is typically possible to swap a process out, and swap it back
in to a different area in primary memory.
- 8.1.3 -- Logical- versus Physical-Address Space
- Under execution-time binding (the dominant paradigm), the
memory management unit (MMU) hardware performs the mapping
from logical address to physical address that a process
requires during its fetch/decode/execute cycle.
- The user program deals exclusively with logical addresses.
The MMU hardware translates a logical address only
when a memory access is performed.
- In a simple example situation, the MMU hardware might
translate logical addresses in the range 0..max to the range
R..R+max, where R is the value stored in the relocation
register.
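The relocation-register mapping just described can be sketched as follows (the register value and address-space size are illustrative):

```python
# Sketch of the relocation-register MMU: logical addresses in the
# range 0..MAX-1 map to R..R+MAX-1.

R = 14000        # relocation register (illustrative value)
MAX = 4096       # size of the logical address space (illustrative)

def translate(logical):
    if not 0 <= logical < MAX:
        raise MemoryError("trap: logical address out of range")
    return logical + R

assert translate(0) == 14000
assert translate(346) == 14346
```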
- 8.1.4 -- Dynamic Loading
- Under dynamic loading, a routine is not loaded until it is
called.
- Each routine has a disk image represented in a relocatable
load format.
- Routines that are never called are never loaded. This may
result in considerable savings in memory usage.
- Dynamic loading can be implemented entirely within user
programs. There is no need for any special assistance from
hardware or the OS.
- 8.1.5 -- Dynamic Linking and Shared Libraries
- The in-memory program text originally contains a stub for each
reference to a library routine that the program has. The stub is a
piece of code that tells where in memory or on disk to locate
the library routine.
- When the stub is first executed it is replaced with the address
of the routine. (If need be, the routine is loaded first.)
- All processes share the same copy of each library routine.
- Because of memory management, user processes need help from the
OS to check on the memory locations of routines. The sharing of
library routines requires help from both hardware and the OS.
- 8.2 Swapping
- Some older multiprogramming systems performed swapping to the disk each
time there was a context switch. This was a useful approach on some
systems at a time when memories were too small to contain very many
processes.
- (When you read about "standard swapping" don't let it get you
confused about the difference between swapping and context
switching. They are completely different things.)
- It isn't practical to do swapping like this in a modern interactive
system. Unless the time slice is much larger than the total swap time
- on the order of several seconds - it won't be possible to swap processes
in and out fast enough to give them all their turns in the CPU. Such a large
quantum would make the system very sluggish in its response to
interactive users.
- It is common for an OS to swap out one or more processes when the
system has begun to run out of physical memory.
- Windows 3.1 used a form of swapping. When you clicked on a window
the associated process would be swapped in, if it was not already in
memory.
- 8.3 Contiguous Memory Allocation
- In a contiguous memory allocation set-up each process resides in
some contiguous address range in memory (e.g. in the L addresses
from base address B to B+L-1). The OS would typically reside in low
memory, along with the interrupt vector.
- 8.3.1 -- Memory Mapping and Protection
- A scheme similar to the base-limit registers idea discussed
in chapter two will suffice to keep track of and enforce
memory allocations.
- In this scheme there are both logical and physical address
spaces. A user process works with L legal addresses: the
contiguous range from 0 to L-1. The MMU hardware checks every
logical address generated by the user process to make sure it
is within the legal range and maps each valid logical address
to a corresponding physical address by adding the value of the
relocation (base) register.
- By changing the values of the relocation and limit registers,
the OS can keep track of processes as it relocates and/or
resizes them. The OS can change its own size too.
- 8.3.2 -- Memory Allocation
- Fixed-size partitioning is a very simple memory allocation
methodology.
- Variable-sized partitioning is more flexible than fixed-size
partitioning.
- The OS maintains a list of available "holes" in memory.
- When a process needs to be loaded into memory, the OS
finds a hole from the list that is big enough, and places
the process into an initial contiguous section of the
hole. Any unused remainder of the hole is returned
to the list.
- When a process terminates, it releases its memory
allocation. The OS checks to see if the freed memory can
be merged with adjacent free holes to form a larger free
hole. The resulting hole is inserted into the list.
(Note: holes in the list that are merged with the new hole
have to be deleted from the list.)
- The job of allocating the memory under these conditions is
known as the dynamic storage allocation problem:
"... how to satisfy a request of size N from a list of
free holes"
- The strategy of searching for a hole may affect
performance. First fit, best fit, and worst fit are
possibilities.
- If we order the list of holes by size, we can decrease
the time required to find a suitable hole for a
process, but keeping the list in order requires extra
time.
- Simulations show:
- First fit is generally faster than best fit.
- Both first fit and best fit are better than worst
fit in terms of storage utilization and speed.
- All three algorithms suffer from the effects of
external fragmentation. Statistical analysis of first fit
shows that, given N allocated blocks, roughly another 0.5N
blocks are lost to fragmentation (the 50-percent rule) --
that is, about one third of memory may be unusable.
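The three hole-search strategies can be sketched as follows. The list-of-(start, size) representation of the free list is our own illustration, not something prescribed by the notes:

```python
# Each hole is a (start, size) pair; each function returns the index
# of the chosen hole in the free list, or None if no hole fits.

def first_fit(holes, n):
    # Take the first hole that is big enough.
    for i, (start, size) in enumerate(holes):
        if size >= n:
            return i
    return None

def best_fit(holes, n):
    # Take the smallest hole that is big enough (tightest fit).
    fits = [(size, i) for i, (_, size) in enumerate(holes) if size >= n]
    return min(fits)[1] if fits else None

def worst_fit(holes, n):
    # Take the largest hole, leaving the biggest leftover.
    fits = [(size, i) for i, (_, size) in enumerate(holes) if size >= n]
    return max(fits)[1] if fits else None

holes = [(0, 100), (200, 500), (900, 200), (1500, 300)]
assert first_fit(holes, 150) == 1   # first hole of size >= 150
assert best_fit(holes, 150) == 2    # tightest fit (size 200)
assert worst_fit(holes, 150) == 1   # largest hole (size 500)
```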
- 8.3.3 -- Fragmentation
- Fragmentation can be external or internal.
- External fragmentation is memory that is available but
unusable (for example, holes that are too small).
- Internal fragmentation is memory that is allocated but not
used. (The allocation method may require that processes
sometimes get more than they need. For example, there may be a
minimum allocation, or allocations may be made in chunks of a
specific size.)
- If processes are dynamically relocatable then the OS can move
them around to compact external fragmentation into
usable holes. PROBLEM WITH THIS: it can take a long time and
if you try to do it piecemeal it becomes a complex job that is
very difficult to do correctly.
- The idea of paging and segmentation is to do an "end run"
around these problems by allowing the memory allocation of a
process to consist of non-contiguous chunks of physical memory.
- 8.4 Paging
- When swapping is done in conjunction with variable-size
partitioning, there is typically a dynamic storage allocation
problem to solve on the swap space device in addition to the
problem in main memory. Backing stores are very slow compared to
main memories so compaction is not a realistic option. This makes
it all the more attractive to use paging or segmentation instead of
variable-size partitioning.
- 8.4.1 -- Basic Method
- For purposes of this discussion, let's assume that the
smallest addressable unit of primary memory is a
byte. It should be obvious how to apply the
concepts developed here to situations in which there is a
different word size.
- The hardware has a given page size such as 4 KB (in other
words, 4096 bytes). We divide physical memory and backing store
into page-sized contiguous chunks called frames, and the logical
address space into chunks of the same size called pages. For
example page #0 runs from byte #0 through byte #4095; and page
#1 runs from byte #4096 through byte (4096+4095)=8191.
- Processes are loaded into a number of frames. The frames don't
have to be contiguous with each other. For example the frame
used for the first 4096 bytes of the program could be frame
#17, which has base address 17*(4096)=69632 and runs up through
byte 69632+4095=73727. The second 4096 bytes of the program
could be in frame #3, which runs from byte 3*4096=12288 to byte
12288+4095=16383.
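The frame arithmetic above can be checked with a short sketch. The page size and page-to-frame assignments come from the example; the dict-based page table is our own illustration:

```python
# Page 0 of the process lives in frame 17, page 1 in frame 3,
# exactly as in the worked example above.

PAGE_SIZE = 4096
page_table = {0: 17, 1: 3}   # page number -> frame number

def translate(logical):
    page, offset = divmod(logical, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset

assert translate(0) == 69632       # byte 0 lands at 17*4096
assert translate(4095) == 73727    # last byte of page 0
assert translate(4096) == 12288    # first byte of page 1, frame 3
assert translate(8191) == 16383    # last byte of page 1
```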
- The logical address space is a contiguous extent ranging in
address from 0 to some upper limit.
- As a program runs the MMU hardware does all the
routine translation of logical addresses to physical addresses
by using a page table. The operating system does
not perform this routine address translation -- that
would require an interrupt for every memory access, and would
be extremely slow!
- The OS creates a page table entry for each page when it
first loads the page. When a process attempts a memory access,
hardware uses the first part of the logical address as
an index into the page table. The hardware finds the
base address of a frame in the page table and combines it with
the offset in the logical address. The result is the physical
address corresponding to the logical address. The
hardware then continues with the memory access.
- There is no external fragmentation with paging. However,
typically a process does not need all of the memory in its
"last frame." The remainder is internal fragmentation - about
half a page, on average.
- A small page size reduces internal fragmentation. A large
page size keeps the page table smaller and reduces the total
amount of I/O overhead for copying pages to and from the
backing store.
- Memory protection with paging is pretty straightforward. The
OS creates the page table and uses it to protect memory, much
as an OS on simpler hardware would use base-limit registers.
(The 'bases' are the frame numbers, rather than physical
addresses of memory cells, and the 'limits' are not explicitly
stored, because they're all just equal to the page size.)
- The OS has to keep track of all the allocations of the
physical frames.
- The OS keeps track of a copy of the page table of each process.
- Suppose a user process gives an address as a parameter when
communicating with the OS. For example the address could be the
base address of an array that the process wants to use as an
I/O buffer. The process gives the OS a logical address.
(The process only knows about logical addresses.) The
operating system needs to know the physical address.
The OS will use the page table of the process to translate.
- 8.4.2 -- Hardware Support
- The copy of the page table used by the hardware might be a set
of dedicated CPU registers.
- In a modern general-purpose system the CPU contains a
page-table base register (PTBR) pointing to a large page table
resident in the main memory.
- Such a 'modern' system also uses a fast associative memory
address cache (Translation Lookaside Buffer - TLB) so that the
MMU does not usually have to take the time to access the page
table when performing an address translation.
- Address Space Identifier (ASID) technology allows the TLB to
contain address translation information for several different
processes.
- ASID technology also cuts down on the necessity to do
time-consuming cache flushes during a context switch.
- Effective memory access time is a function of the hit ratio,
memory access time, and TLB search time.
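A worked instance of that formula, with illustrative timings (20 ns TLB search, 100 ns memory access, 80% hit ratio; a TLB miss costs one extra memory access to consult the in-memory page table):

```python
# Effective access time (EAT) with a TLB. All numbers are
# illustrative, not taken from any particular machine.

tlb = 20       # ns, TLB search time
mem = 100      # ns, one memory access
alpha = 0.80   # TLB hit ratio

# Hit: TLB search + one memory access.
# Miss: TLB search + page-table access + the memory access itself.
eat = alpha * (tlb + mem) + (1 - alpha) * (tlb + 2 * mem)
assert abs(eat - 140.0) < 1e-9   # 140 ns effective access time
```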
- 8.4.3 -- Protection
- Some systems make the page table only as long as is necessary
for the size of the process. Such a system would typically
have a page-table length register (PTLR). A process attempting
to access an address "past the end of the table" would generate
a trap to the OS.
- In any case, the valid bit in "extra" page table entries can be
cleared by the OS so that the process will trap if it tries to
use one of those entries.
- Unfortunately a process generally can access the
internal fragment in its last page.
- 8.4.4 -- Shared Pages
- The paging paradigm easily supports shared memory (at least
when "traditional" hierarchical page tables are used.)
- If two processes have the same frame number in both their
page tables then they are able to share that frame.
- The OS can use this idea to allow many processes to share
the same read-only program text.
- Writeable memory may be shared as a means of interprocess
communication.
- 8.5 Structure of the Page Table
- 8.5.1 --- Hierarchical Paging
- Page tables may be quite large. In that case we may divide the
page table into pages.
- In one scheme, the logical address is partitioned as
(p1|p2|d). P1 is used as an index into an outer page
table. This leads us to one page of the page table. P2 and d
are then used in the "normal way" to complete the address
translation.
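The (p1|p2|d) split can be sketched for a 32-bit address with a 4 KB page size. The 10/10/12 bit widths are a common choice for illustration; the exact widths are machine-dependent:

```python
# Split a 32-bit logical address into (p1, p2, d): 10 bits of outer
# index, 10 bits of inner index, 12 bits of page offset.

def split(addr):
    d = addr & 0xFFF            # low 12 bits: offset within the page
    p2 = (addr >> 12) & 0x3FF   # next 10 bits: index into inner table
    p1 = (addr >> 22) & 0x3FF   # top 10 bits: index into outer table
    return p1, p2, d

assert split(0x00000FFF) == (0, 0, 0xFFF)
assert split(0x00401005) == (1, 1, 5)
```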
- For still larger page tables, the SPARC supports three-level
paging and the Motorola 68030 supports four-level paging.
- Generally it is not considered appropriate to map a 64-bit
paged address space with "traditional" hierarchical page
tables. It requires a "ridiculous" number of levels of page
tables -- e.g. seven levels.
- 8.5.2 -- Hashed Page Tables
- Per-process hashed page tables are an alternative to
hierarchical page tables. A hash function is applied to the
virtual address. Collisions are resolved with external
chaining. Each entry on a chain contains a virtual address,
frame number, and pointer for the next item on the chain.
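A hashed page table with external chaining can be sketched as follows (the hash function, bucket count, and entry values are all illustrative):

```python
# Each bucket holds a chain of (virtual page number, frame) pairs;
# collisions are resolved by appending to the chain.

BUCKETS = 8
table = [[] for _ in range(BUCKETS)]

def insert(vpn, frame):
    table[vpn % BUCKETS].append((vpn, frame))

def lookup(vpn):
    # Walk the chain in the bucket the vpn hashes to.
    for v, frame in table[vpn % BUCKETS]:
        if v == vpn:
            return frame
    raise MemoryError("trap: page not present")

insert(3, 17)
insert(11, 42)    # 11 % 8 == 3: collides with vpn 3, chained
assert lookup(3) == 17
assert lookup(11) == 42
```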
- Clustered page tables are a variant in which each entry in the
page table refers to several pages.
- 8.5.3 -- Inverted Page Table
- The UltraSPARC and PowerPC use an inverted page table. This
table has one entry for each frame. The entry contains the
virtual address for the frame and info on the process that is
using the frame.
- There may be some total space savings with this set-up, but
hardware and OS cannot directly index into the table using the
page number, so it would take a long time to search this table
to find the information for a forward address translation.
- The idea of the hashed page table is used in conjunction with
the inverted page table to speed the search for the correct
table entry.
- Of course if there is a cache hit in the TLB, the page table is
not consulted and effective memory access time is nearly
equal to memory access time. If the page table is consulted,
then address translation requires one additional memory access
for each probed link in the hash-overflow chain.
- When entries in the inverted page table are able to contain
only one virtual page number, it becomes difficult to implement
shared memory. All processes sharing a frame have to reference
it with the same virtual page number.
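The forward-translation search through an inverted page table can be sketched as a linear scan keyed by (pid, virtual page number). The table contents are illustrative; as noted above, real systems pair this with a hash table to avoid the full scan:

```python
# One entry per frame; the index into the list IS the frame number.
# Each entry records which process and virtual page occupy the frame.

inverted = [(7, 0), (7, 1), (9, 0), (9, 5)]   # (pid, vpn) per frame

def translate(pid, vpn, page_size=4096, offset=0):
    for frame, entry in enumerate(inverted):
        if entry == (pid, vpn):
            return frame * page_size + offset
    raise MemoryError("trap: no matching entry")

assert translate(9, 0) == 2 * 4096          # found in frame 2
assert translate(7, 1, offset=100) == 4196  # frame 1, offset 100
```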
- 8.6 Segmentation
- 8.6.1 -- Basic Method
- Programmers tend to think of their programs as a collection of
named functions, modules, and data structures -- not arranged
in any particular order.
- Maybe it is not so "natural" to think of the program as
occupying a linear array of word-cells starting at address 0
and running to some upper limit.
- The segmentation memory management scheme makes it a
little easier for the programmer to view the program as that
unordered collection.
- Instead of pages we have 'named' segments of memory of varying
length. Logical addresses consist of a segment 'name'
(a number) followed by an offset within the segment.
- 8.6.2 -- Hardware
- A segment table is indexed by segment number (name). Each
entry of the table contains the base address and limit (length)
of a segment.
- To translate an address we compare the offset part with the
limit in the segment table entry. If the offset is not too
large, the physical address is the sum of the segment base plus
the offset. Otherwise we have to trap a violation of a segment
limit.
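The segment-table translation just described can be sketched as follows (the table contents are illustrative):

```python
# Each segment-table entry is a (base, limit) pair, indexed by
# segment number.

segment_table = [(1400, 1000), (6300, 400), (4300, 1100)]

def translate(segment, offset):
    base, limit = segment_table[segment]
    # Trap if the offset is not strictly less than the limit.
    if offset >= limit:
        raise MemoryError("trap: segment limit violation")
    return base + offset

assert translate(2, 53) == 4353    # segment 2 begins at 4300
assert translate(0, 999) == 2399   # last legal byte of segment 0
# translate(1, 400) would trap: the offset equals the limit
```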
- 8.7 Example: The Intel Pentium
- In the Intel Pentium, there are segmentation and paging units that
together function as "the MMU." The Pentium supports pure
segmentation and paged segments.