(Latest Revision: Sun Sep 19, 2005)
Chapter Eight
--
Memory Management
--
Chapter Notes
- Section 8.1 -- Background
- Section 8.1.1 -- Address Binding
- Addresses in the source program are generally symbolic --
e.g. count
- Typically the compiler binds symbolic addresses to
relocatable, relative addresses, given as offsets from the
base address of the program or the containing module.
- The relative addresses may be converted to absolute
addresses by the linkage editor or loader.
- If we want to be able to move programs from one location in
memory to another then it will not be workable to load
programs containing absolute addresses. In a modern
operating system processes are usually relocatable.
- Section 8.1.2 -- Logical- versus Physical-Address Space
- Under execution-time binding (the dominant paradigm) while a
process is executing, the memory management unit (MMU)
hardware is responsible for any required mapping from
logical address to physical address.
- The user program deals with the logical addresses
exclusively. The (MMU) hardware translates a logical address
only when a memory access is performed.
- In a simple example situation, the MMU might translate
logical addresses in the range 0..max to the range R..R+max,
where R is the value stored in the relocation register.
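The relocation-register mapping can be sketched in a few lines of Python; the values R = 14000 and max = 3000 here are made-up numbers for illustration:

```python
# Model of relocation-register translation. R and MAX_ADDR are
# hypothetical values; real hardware does this addition in the MMU
# on every memory access.
R = 14000        # relocation register
MAX_ADDR = 3000  # largest legal logical address

def translate(logical):
    """Map a logical address in 0..MAX_ADDR to R..R+MAX_ADDR."""
    if not 0 <= logical <= MAX_ADDR:
        raise ValueError("logical address out of range")
    return R + logical
```

So a logical address of 346 would go out on the bus as physical address 14346.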
- Section 8.1.3 -- Dynamic Loading
- Under dynamic loading, a routine is not loaded until it is
called.
- Each routine has a disk image represented in a relocatable
  load format.
- Routines that are never called are never loaded. This may
result in considerable savings in memory usage.
- Section 8.1.4 -- Dynamic Linking and Shared Libraries
- The in-memory program text originally contains a stub for
each reference the program has to a library routine. The
stub is a piece of code that tells where in memory or on
disk to locate the library routine.
- When the stub is first executed it is replaced with the
address of the routine. (If need be, the routine is loaded
first.)
- All processes share the same copy of each library routine.
- Section 8.1.5 -- Overlays
- Can we run a process that is larger than the physical
memory?
- One "primitive" way to do this is through the use of
overlays.
- For example you could write the program so it loads part of
itself, runs for a while, then loads some more of itself on
top of a part of itself that is no longer needed, and then
runs some more.
- It is difficult but not impossible to write such a program.
It used to be done fairly commonly. It is still done on a
limited basis.
- The idea of virtual memory is to automate this task so that
the programmer doesn't have to do anything special with
programs that are bigger than physical memory.
- Section 8.2 -- Swapping
- Some old multiprogramming systems did swapping to the disk
each time there was a context switch. This made some sense
back when memories were so small that very few processes
could be loaded in memory simultaneously.
- (When you read about this don't let it get you confused
about the difference between swapping and context switching.
They are completely different things.)
- It isn't practical to do swapping this often in a modern
system -- the system would be too unresponsive.
- It is common for an OS to swap out one or more processes
when the system has begun to run out of physical memory.
- Windows 3.1 used swapping. When you clicked on a window
the associated process would be swapped in, if it was not
already in memory.
- Section 8.3 -- Contiguous Memory Allocation
- In a contiguous memory allocation set-up each process resides in
some contiguous address range in memory (e.g. in the L addresses
from base address B to B+L-1). The OS would typically reside in low
memory, along with the interrupt vector.
- Section 8.3.1 -- Memory Protection
- A scheme similar to the base-limit registers idea discussed
in chapter two will suffice to keep track of and enforce
memory allocations.
- By changing the values of the base and limit, the OS can
keep track of processes as it relocates or resizes them.
- Section 8.3.2 -- Memory Allocation
- Fixed-size partitioning is a very simple memory allocation
methodology.
- Variable-sized partitioning is more flexible than fixed-size
partitioning.
- The OS maintains a list of available "holes" in memory.
- A process is placed in a hole big enough for it, and
the remainder not used is returned to the list as a
smaller hole.
- Holes are returned to the list when the process
terminates and adjacent holes are merged into one.
- The job of allocating the memory under these conditions
is known as the dynamic storage allocation
problem.
- The strategy of searching for a hole may affect
performance. First fit, best fit, and worst fit are
possibilities.
- If we order the list of holes by size, we can decrease
the time required to find a suitable hole for a
process, but keeping the list in order requires extra
time.
- First fit is generally faster than best fit. Both
first fit and best fit are better than worst fit in
terms of storage utilization and speed.
- All three algorithms suffer from the effects of
external fragmentation. For example, 1/3 of the memory
may be wasted (unusable).
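The three placement strategies above can be sketched as follows; holes are (base, size) pairs, and the sizes are made up for illustration:

```python
# Hypothetical free list and the three classic placement strategies.
def pick_hole(holes, request, strategy):
    """Return the (base, size) hole chosen for a request, or None."""
    candidates = [h for h in holes if h[1] >= request]
    if not candidates:
        return None
    if strategy == "first":
        return candidates[0]                        # first adequate hole
    if strategy == "best":
        return min(candidates, key=lambda h: h[1])  # smallest adequate hole
    if strategy == "worst":
        return max(candidates, key=lambda h: h[1])  # largest hole
    raise ValueError("unknown strategy")

holes = [(0, 100), (300, 500), (1000, 200)]  # kept in base-address order
```

For a request of 150 bytes, first fit and worst fit both pick the 500-byte hole at base 300, while best fit picks the 200-byte hole at base 1000, leaving a smaller leftover.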
- Section 8.3.3 -- Fragmentation
- Fragmentation can be external or internal.
- Internal fragmentation is memory that is allocated but not
used.
- If processes are relocatable then the OS can move them
around to compact external fragmentation into usable
holes. PROBLEM WITH THIS: it can take a long time and if
you try to do it piecemeal it becomes a complex job that is
very difficult to do correctly.
- The idea of paging and segmentation is to do an "end run"
around these problems by allowing the memory allocation of a
process to consist of non-contiguous chunks.
- Section 8.4 -- Paging
- When swapping is done in conjunction with variable-size
partitioning, there is typically a dynamic storage allocation
problem to solve on the swap space device in addition to the
problem in main memory. Backing stores are very slow compared
to main memories so compaction is not a realistic option. This
makes it all the more attractive to use paging or segmentation
instead of variable-size partitioning.
- Section 8.4.1 -- Basic Method
- The hardware has a given page size such as 4 Kbytes (in other
  words, 4096 bytes). Physical memory and the backing store are
  divided into page-sized contiguous chunks called frames, and the
  logical address space is divided into pages of the same size. For
  example page #0 runs from byte #0 through byte #4095, and page
  #1 runs from byte #4096 through byte (4096+4095)=8191.
- Processes are loaded into a number of frames. The frames don't
have to be contiguous with each other. For example the frame
used for the first 4096 bytes of the program could be frame
#17, which has base address 17*(4096)=69632 and runs up through
byte 69632+4095=73727. The second 4096 bytes of the program
could be in frame #3, which runs from byte 3*4096=12288 to byte
12288+4095=16383.
- The logical address space is a contiguous extent ranging in
address from 0 to some upper limit.
- As a program runs the hardware does all the
routine translation of logical addresses to physical addresses
by using a page table. The operating system does
not perform this routine address translation -- that
would be extremely slow!
- The entries of the page table are set up when each page is
first loaded. When a process attempts a memory access,
hardware uses the first part of the logical address as an
index into the page table. The hardware finds the base
address of a frame in the page table and combines it with
the offset in the logical address. This result is the
physical address translation of the logical address.
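The translation described above can be sketched with the frame numbers from the earlier example (page #0 in frame #17, page #1 in frame #3, 4096-byte pages):

```python
# Sketch of paged address translation with a toy page table.
PAGE_SIZE = 4096
page_table = {0: 17, 1: 3}  # page number -> frame number

def translate(logical):
    """Split a logical address into (page, offset) and rebase it."""
    page, offset = divmod(logical, PAGE_SIZE)
    frame = page_table[page]   # a missing entry would trap to the OS
    return frame * PAGE_SIZE + offset
```

Logical address 0 maps to 17*4096 = 69632, and logical address 4096 maps to 3*4096 = 12288, matching the frame layout above.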
- There is no external fragmentation with paging but each
allocation can create almost a full page of internal
fragmentation.
- A small page size reduces internal fragmentation. A large
page size keeps the page table smaller and reduces the total
amount of I/O overhead for copying pages to and from the
backing store.
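As a worked example of the trade-off (the 72,766-byte process size is a made-up figure), with 4096-byte pages:

```python
# Internal fragmentation for a hypothetical process size.
PAGE_SIZE = 4096
size = 72766                        # hypothetical process size in bytes
pages = -(-size // PAGE_SIZE)       # ceiling division -> pages needed
waste = pages * PAGE_SIZE - size    # unused bytes in the last page
```

The process needs 18 frames and wastes 962 bytes of the last one; on average a process wastes about half a page.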
- Memory protection with paging is pretty straightforward. The
OS creates the page table. It works as a set of base-limit
registers.
- The OS has to keep track of all the allocations of the
physical frames.
- The OS keeps track of the page tables for each process.
- Suppose a process gives an address as a parameter when
communicating with the OS. For example the address could be
the base address of an array that the process wants to use
as an I/O buffer. The process gives the OS a logical
address. (The process only knows about logical addresses.)
The operating system needs to know the physical
address. The OS will use the page table of the process to
translate.
- Section 8.4.2 -- Hardware Support
- The copy of the page table used by the hardware might be a
set of dedicated CPU registers.
- In a modern general-purpose system the CPU contains a
page-table base register (PTBR) pointing to a large page table
resident in the main memory.
- Such a 'modern' system also uses an address cache, the
  translation look-aside buffer (TLB), so that the MMU does not
  usually have to take the time to access the page table when
  performing an address translation.
- ASID technology allows the TLB to contain address
translation information for several different processes.
- ASID technology also cuts down on the necessity to do
time-consuming cache flushes during a context switch.
- Effective memory access time is a function of the hit ratio,
memory access time, and TLB search time.
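A sketch of the effective-access-time calculation, assuming a TLB miss costs one extra memory reference for the page-table lookup (the 100 ns and 20 ns figures are illustrative):

```python
def eat(hit_ratio, mem=100, tlb=20):
    """Effective access time in ns: a TLB hit costs tlb + mem, a
    miss costs tlb + 2*mem (page-table access plus the access itself)."""
    return hit_ratio * (tlb + mem) + (1 - hit_ratio) * (tlb + 2 * mem)
```

With an 80% hit ratio this gives 140 ns; raising the hit ratio to 98% brings it down to 122 ns.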
- Section 8.4.3 -- Protection
- Some systems make the page table only as long as is
necessary for the size of the process. Such a system would
typically have a page-table length register (PTLR). A
process attempting to access an address "past the end of the
table" would generate a trap to the OS.
- In any case, the valid bit in "extra" page table entries can
be cleared by the OS so that the process will trap if it
tries to use one of those entries.
- Unfortunately a process generally can access the
internal fragment in its last page.
- Section 8.4.4 -- Structure of the Page Table
- Section 8.4.4.1 --- Hierarchical Paging
- Page tables may be quite large. In that case we may
divide the page table into pages.
- In one scheme, the logical address is partitioned as
  (p1|p2|d). The p1 field is used as an index into an outer
  page table. This leads us to one page of the page table.
  The p2 field and the offset d are then used in the "normal
  way" to complete the address translation.
- For still larger page tables, SPARC systems support
  three-level paging and the Motorola 68030 supports
  four-level paging.
- Generally it is not considered appropriate to map a
64-bit paged address space with "traditional"
hierarchical page tables. It requires a "ridiculous"
number of levels of page tables -- e.g. seven levels.
- Section 8.4.4.2 -- Hashed Page Tables
- Hashed page tables are an alternative to hierarchical
page tables. A hash function is applied to the virtual
address. Collisions are resolved with external
chaining. Each entry on a chain contains a virtual
address, frame number, and pointer for the next item on
the chain.
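A minimal sketch of the chained lookup; the table size and hash function here are made up, and real implementations hash differently:

```python
# Toy hashed page table with external chaining.
TABLE_SIZE = 8
table = [[] for _ in range(TABLE_SIZE)]   # each slot holds a chain

def insert(page, frame):
    table[page % TABLE_SIZE].append((page, frame))

def lookup(page):
    for p, f in table[page % TABLE_SIZE]:  # walk the collision chain
        if p == page:
            return f
    return None                            # unmapped -> trap to the OS

insert(3, 17)
insert(11, 4)   # 11 % 8 == 3, so this collides with page 3 and is chained
```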
- Clustered page tables are a variant in which each entry
in the page table refers to several pages.
- Section 8.4.4.3 -- Inverted Page Table
- The UltraSPARC and PowerPC use an inverted page
table. This table has one entry for each frame. The
entry contains the virtual address for the frame and
info on the process that is using the frame.
- There may be some total space savings with this set-up,
but hardware and OS cannot directly index into the table
using the page number, so it takes a long time to do
forward address translation.
- The idea of the hashed page table is used in
conjunction with the inverted page table to speed the
search for the correct table entry.
- Of course if there is a cache hit in the TLB, the page
table is not consulted and effective memory
access time is nearly equal to memory access time. If
the page table is consulted, then address translation
requires one memory access for each probed link in the
hash-overflow chain.
- Section 8.4.5 -- Shared Pages
- The paging paradigm easily supports shared memory (at least
when "traditional" hierarchical page tables are used.)
- If two processes have the same frame number in both their
page tables then they are able to share that frame.
- The OS can use this idea to allow many processes to share
the same read-only program text.
- Writable memory may be shared as a means of interprocess
  communication.
- Inverted page tables are set up to have just one virtual
  page number for each frame, which makes it difficult to
  implement shared memory.
- Section 8.5 -- Segmentation
- Section 8.5.1 -- Basic Method
- Programmers tend to think of their programs as a collection
of named functions, modules, and data structures -- not
arranged in any particular order.
- Maybe it is not so "natural" to think of the program as
occupying a linear array of word cells starting at address 0
and running to some upper limit.
- The segmentation memory management scheme makes it a
  little easier for the programmer to view the program as that
  unordered collection.
- Instead of pages we have variable length named segments of
memory. Logical addresses consist of a segment name
followed by an offset within the segment. (Well,
really we don't use a segment name -- we use a
segment number :-)
- Section 8.5.2 -- Hardware
- A segment table is indexed by segment number (name). Each
entry of the table contains the base address and limit
(length) of a segment.
- To translate an address we compare the offset part with the
limit in the segment table entry. If the offset is not too
large, the physical address is the sum of the segment base
plus the offset. Otherwise we have to trap a violation of a
segment limit.
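The check-then-add translation can be sketched as follows; the segment bases and limits are made-up example values:

```python
# segment number -> (base, limit); hypothetical values
segment_table = [(1400, 1000), (6300, 400), (4300, 1100)]

def translate(seg, offset):
    """Check the offset against the limit, then rebase it."""
    base, limit = segment_table[seg]
    if offset >= limit:
        raise MemoryError("segment limit violation -> trap to the OS")
    return base + offset
```

A reference to byte 53 of segment 2 maps to 4300 + 53 = 4353, while byte 400 of segment 1 traps, since segment 1 is only 400 bytes long.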
- Section 8.5.3 -- Protection and Sharing
- Often it does not work well to mark pages as read-only or
execute-only -- because a page may contain different kinds
of things.
- On the other hand a segment is likely to contain just one
  kind of thing, and so it usually works out better if we mark
  segments as read-only, read/write, execute-only, and so on.
- Since we can define segments to contain precisely what we
want to share, segmentation offers some advantages over
paging with regard to supporting sharing.
- However if shared code contains references to itself then
all processes sharing the code will need to refer to the
segment using the same segment number -- the one used in the
code itself.
- On the other hand there is no problem sharing read-only code
  that contains no pointers, or that uses only relative
  addressing.
- Section 8.5.4 -- Fragmentation
- Since segments are variable in size, allocating segments is
a dynamic storage allocation problem subject to external
fragmentation.
- It is not hard to relocate segments so it is pretty easy to
do compaction of segmented memory.
- Nevertheless it remains impractical to do compaction of the
segment images on swap space.
- The smaller the segments the less severe will be the
problems with external fragmentation.
- Section 8.6 -- Segmentation with Paging
- If we use segments, but page the segments, we can get the
advantages of segmentation while eliminating external
fragmentation. However, if there are many segments the internal
fragmentation problem can be non-trivial.
- Address translation is more complicated in this setting.
- Section 8.7 -- Summary