(Latest Revision: Dec 5, 2014)
Chapter Ten -- Mass Storage Structure -- Lecture Notes
- 10.0 Objectives
- Describe the physical structure of secondary storage devices
and the effect on the uses of the devices
- Explain the performance characteristics of
mass-storage devices
- Evaluate disk-scheduling algorithms
- Discuss operating-system services provided for mass
storage, including RAID
- 10.1 Overview of Mass-Storage Structure
- 10.1.1 Magnetic Disks
- Terms to know
- platter
- disk head
- disk arm
- track
- sector
- cylinder
- transfer rate
- RPM
- positioning time (random access time)
- seek time
- rotational latency
- head crash
- flash drive
- I/O bus
- advanced technology attachment (ATA)
- serial ATA (SATA)
- eSATA
- universal serial bus (USB)
- Fibre Channel (FC)
- controller
- host controller
- disk controller
- 10.1.2 Solid State Disks
- Non-volatile memory used like a disk drive
- Advantages over magnetic disks
- Reliable - no moving parts
- Faster - no seek or latency time
- Consume less power
- Smaller and lighter
- Disadvantages compared to magnetic disks
- More expensive
- Less capacious
- Shorter life
- 10.1.3 Magnetic Tapes
- Non-Volatile
- Capacious
- Slow Access Time
- Mainly for backup, storage, and transfer
- Once positioned, can read/write at speeds
comparable to disk
- "Never underestimate the bandwidth of a
station wagon full of tapes hurtling down the highway."
-- Tanenbaum, Andrew S. (1989). Computer Networks.
New Jersey: Prentice-Hall. p. 57. ISBN 0-13-166836-6.
- 10.2 Disk Structure
- Addressed as a large one-dimensional array of logical
blocks, usually 512 bytes in size
- Sectors are numbered starting with the first sector of
the outermost track on the uppermost surface. The
numbering proceeds around that track, then around the
track directly beneath it, staying on cylinder 0. It
continues in this way through all the sectors of
cylinder 0, then jumps inward to the next cylinder and
covers it in the same manner, and so on, finishing on
the innermost cylinder.
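- Assuming (unrealistically for modern zoned drives) a
uniform number of sectors per track, the mapping from a
logical block address to a (cylinder, surface, sector)
triple is simple arithmetic. A sketch with invented
geometry values:

```python
# Hypothetical geometry: values are illustrative, not from any real drive.
SURFACES = 4            # tracks per cylinder
SECTORS_PER_TRACK = 16  # assumed uniform (real drives use zoned recording)

def lba_to_chs(lba):
    """Map a logical block address to (cylinder, surface, sector)."""
    blocks_per_cylinder = SURFACES * SECTORS_PER_TRACK
    cylinder = lba // blocks_per_cylinder
    surface = (lba % blocks_per_cylinder) // SECTORS_PER_TRACK
    sector = lba % SECTORS_PER_TRACK
    return (cylinder, surface, sector)

print(lba_to_chs(0))   # (0, 0, 0) -> first sector, outermost cylinder
print(lba_to_chs(70))  # (1, 0, 6) -> second cylinder
```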
- On constant linear velocity (CLV) media such as CD-ROM
and DVD-ROM, the density
of bits throughout the medium is uniform. There are
more sectors on the longer outer tracks. To maintain
constant linear velocity, the disk must spin faster as
the head moves inward.
- On constant ANGULAR velocity media (CAV), such as hard
disks, the density of bits decreases from inner tracks
to outer tracks to keep the data rate constant.
- Outer zones of hard drives typically have several hundred
sectors per track and tens of thousands of cylinders.
- 10.3 Disk Attachment
- Disk storage may be host-attached or network-attached
- 10.3.1 Host-Attached Storage
- Accessed through local I/O ports,
e.g. IDE, ATA, SATA, FC, FC-AL
- 10.3.2 Network-Attached Storage
- Accessed remotely over a data network, typically
via an RPC-based interface (UNIX NFS or Windows CIFS)
using TCP or UDP network protocols
- Typically implemented as a RAID array
- Allows multiple computers in a network to share storage
- Network-attached storage tends to deliver lower
performance than host-attached storage
- 10.3.3 Storage Area Network
- Like Network-attached but uses a separate,
private, network. Thus it avoids contention
with traffic on the
computer-to-computer network
- Utilizes special storage protocols on its network
- Multiple hosts and multiple storage arrays
may attach to a common SAN.
- Settings on a SAN switch determine which hosts can
access which storage arrays.
- SANs are high-performance.
- 10.4 Disk Scheduling
- Disk scheduling refers to the order
in which the pending
requests for I/O on the device are served.
- 10.4.1 -- FCFS Scheduling
- "Intrinsically fair" but often results in
long seeks
- 10.4.2 -- Shortest-Seek-Time-First (SSTF) Scheduling
- Keeps seeks shorter than FCFS but, like SJF,
can result in starvation
- A continual stream of requests near the position of the R/W head
could cause a request for service at a distant cylinder to starve.
- Unlike SJF, SSTF is NOT optimal - it does not always produce
the least amount of disk arm movement.
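- The difference between FCFS and SSTF can be seen by
totaling head movement over a request queue. A minimal
simulation (the 200-cylinder disk, start position 53,
and queue below are example values):

```python
def fcfs(start, requests):
    """Total head movement servicing requests in arrival order."""
    total, pos = 0, start
    for r in requests:
        total += abs(r - pos)
        pos = r
    return total

def sstf(start, requests):
    """Total head movement always choosing the closest pending request."""
    total, pos, pending = 0, start, list(requests)
    while pending:
        nearest = min(pending, key=lambda r: abs(r - pos))
        total += abs(nearest - pos)
        pos = nearest
        pending.remove(nearest)
    return total

# Request queue (cylinder numbers) on a hypothetical 200-cylinder disk.
queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs(53, queue))  # 640 cylinders of head movement
print(sstf(53, queue))  # 236 cylinders of head movement
```

Note that SSTF's greedy choice is not optimal: servicing 37 and 14 before jumping to 98 would reduce total movement further.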
- 10.4.3 -- SCAN Scheduling: The Elevator Algorithm
- Starts at one "end", say the outermost cylinder, and
works its way across the disk toward the other end,
servicing requests on cylinders as it arrives on the
cylinders.
- When it reaches the other end, it starts back in the other
direction, continues until arriving back at the start
point, and then repeats.
- To avoid starvation, only the requests that were on the cylinder when the
head arrived there should be serviced. The rest should wait until
the next time the head moves to the cylinder.
- Problem: When the head changes direction, there are not likely to be
very many requests waiting on the cylinders it moves to next, because
the pending requests there have just been serviced.
- 10.4.4 -- C-SCAN Scheduling (Circular SCAN)
- Designed to solve the problem with SCAN Scheduling
- Upon finishing with the innermost cylinder, the head moves immediately
to the outermost cylinder, without servicing any of the requests on
the cylinders over which it travels. It then begins another sweep from
outside to inside, now servicing each cylinder's requests as it arrives there.
- 10.4.5 -- LOOK Scheduling
- There is a version of SCAN called LOOK, and of C-SCAN called C-LOOK.
It is a simple optimization: the head reverses direction as soon
as there are no more pending requests on cylinders ahead of it
in its current direction of travel.
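- The LOOK variant can be sketched as: service every pending
request at or beyond the head in the current direction, then
reverse. A minimal sketch (start position and queue are
made-up example values):

```python
def look(start, requests, direction=1):
    """Service order under LOOK: sweep one way, reverse at the last request.

    direction=1 sweeps toward higher cylinder numbers first;
    direction=-1 sweeps toward lower numbers first.
    """
    ahead = sorted(r for r in requests if (r - start) * direction >= 0)
    behind = sorted((r for r in requests if (r - start) * direction < 0),
                    reverse=True)
    if direction < 0:
        ahead, behind = ahead[::-1], behind[::-1]
    return ahead + behind

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(look(53, queue, direction=1))   # [65, 67, 98, 122, 124, 183, 37, 14]
print(look(53, queue, direction=-1))  # [37, 14, 65, 67, 98, 122, 124, 183]
```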
- 10.4.6 -- Selection of a Disk-Scheduling Algorithm
- The amount of seeking required while accessing a file depends on where
on disk the file blocks are allocated.
- In order to access a file, directories and other file meta-data
have to be accessed too, so it's helpful to locate meta-data close
to file data, and also helpful to cache file meta-data in primary
memory while a file is in use.
- Different disk scheduling algorithms may be preferable, depending on
details of how the filesystem is implemented.
- In modern disks there may be "bad blocks" (sectors) that are unused, and
so the logical disk block address may not be the actual physical
disk block number. The disk scheduling algorithm may be implemented
in the controller instead of the operating system, because the
controller is aware of the physical locations of the file blocks,
and therefore is able to better optimize rotational latency and seek time.
- Unfortunately, the operating system may need to order I/O operations
for other reasons besides optimizing access times. In that case, it can
implement its own algorithm and feed I/O requests to the controller
one at a time, waiting for each to complete before issuing
the next.
- 10.5 Disk Management
- Disk initialization, booting from disk, and bad-block recovery
- 10.5.1 Disk Formatting
- Low-level formatting divides the tracks into sectors that
can be read by the controller - sectors get a header,
data area (usually 512 bytes),
and a trailer. Header and trailer contain sector number and
error-correcting code (ECC).
- Usually the low-level formatting is done at the factory.
- The operating system partitions the drive into one or more separate
file systems and performs logical formatting to place
data structures on
the disk that are needed to maintain the file system(s). Examples:
free list, and empty top-level directory.
- Sometimes partitions are used "raw" - without any file
system structure.
- 10.5.2 Boot Block
- Most systems have a bootstrap loader in ROM that
starts when the hardware is powered up, copies
a full bootstrap program from disk, and executes it.
- The full bootstrap program resides in "boot blocks" on
disk, typically on the first track.
- The full bootstrap program copies the OS from disk into
memory, and executes it.
- 10.5.3 Bad Blocks
- It's common for disks to have bad sectors.
- Various methods of detecting bad sectors and assigning
substitutes exist, some more automatic than others.
- Concepts:
- Sector sparing (sector forwarding)
- Sector slipping
- Soft Error
- Hard Error
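- A controller that does sector sparing keeps a remapping
table from bad sectors to reserved spares. A toy model
(all sector numbers are invented for illustration):

```python
# Toy model of sector sparing (forwarding): bad sectors map to spares.
class SparingController:
    def __init__(self, spare_sectors):
        self.spares = list(spare_sectors)  # pool of reserved spare sectors
        self.remap = {}                    # bad sector -> spare sector

    def mark_bad(self, sector):
        """Forward a bad sector to the next available spare."""
        if not self.spares:
            raise RuntimeError("no spare sectors left")
        self.remap[sector] = self.spares.pop(0)

    def translate(self, sector):
        """Translate a logical sector to the physical sector actually used."""
        return self.remap.get(sector, sector)

ctrl = SparingController(spare_sectors=[1000, 1001])
ctrl.mark_bad(17)
print(ctrl.translate(17))  # 1000 -> forwarded to a spare
print(ctrl.translate(18))  # 18   -> unaffected
```

Sector slipping differs in that it shifts all sectors between the bad one and the next spare down by one, preserving locality.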
- 10.6 Swap Space Management
- In modern systems, swap space is designed to get
good throughput in the paging and swapping done by
the virtual memory subsystem.
- 10.6.1 -- Swap-Space Use
- Sometimes swap space is needed to back up
whole processes, sometimes just for subsets
of their pages.
- Allocation of sufficient swap space can be
critical.
- If a system runs out of swap space, it may have
to kill processes, or even crash.
- 10.6.2 -- Swap-Space Location
- Usually it's best to locate swap space on a secondary
storage device separate from the one holding the file
systems; this improves throughput.
- For the same reasons, it can be helpful to have multiple
swap spaces located on separate devices.
- Swap files are generally inefficient, due to the need to
traverse file system structures in order to use the swap
file, and possibly other factors.
- The use of raw partitions for swapping is typically
quite efficient.
- 10.6.3 -- Swap-Space Management: An Example
- At one time it was common to back up whole processes
in swap space.
- Now program text is usually just paged from the file
system, and swap space is mostly for anonymous memory,
such as stacks, heaps, and uninitialized data.
- It also used to be common to preallocate all the swap space
that might be needed by a process at the time of process
creation.
- Nowadays, systems tend to have abundant primary memory and
swap space. They do less paging. On such systems, it
is more common for swap space to be allocated
only as the need for it arises - when pages need to be
swapped out. Skipping the initial swap space allocation
helps processes to start up more quickly.
- 10.7 RAID Structure (will not be covered in 2014 testing)
- Redundant arrays of independent (formerly: inexpensive) disks (RAID)
can provide higher reliability and higher data-transfer rates.
- 10.7.1 -- Improvement of Reliability via Redundancy
- 10.7.2 -- Improvement in Performance via Parallelism
- data striping, bit-level striping, block-level striping
- Striping provides high data-transfer rates.
- 10.7.3 -- RAID Levels
- Various RAID levels utilize 'parity bits' and/or
mirroring with striping to achieve both high
reliability and high data-transfer rates.
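- The parity idea behind levels such as RAID 4/5 is
byte-wise XOR across the data disks: if any one disk is
lost, its contents are the XOR of the parity block with
the surviving blocks. A sketch with made-up block
contents:

```python
def xor_blocks(*blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(blocks[0])
    for blk in blocks[1:]:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

# Three data blocks on three data disks (contents invented).
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
parity = xor_blocks(d0, d1, d2)      # stored on the parity disk

# Disk holding d1 fails: reconstruct it from parity plus survivors.
recovered = xor_blocks(parity, d0, d2)
print(recovered == d1)  # True
```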
- 10.7.4 -- Selecting a RAID Level
- 10.7.5 -- Extensions
- Arrays of tapes
- Data broadcasts over wireless systems
- 10.7.6 -- Problems with RAID
- 10.8 Stable Storage Implementation
- Replication
- Multiple storage devices
- Independent failure modes
- Recoverable from all write failures
- Two physical blocks for each logical block
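- The two-physical-blocks scheme can be sketched as: write
copy 1, wait for it to complete, then write copy 2; on
recovery after a crash, repair any damaged copy from the
valid one. A simplified in-memory model (the per-copy
checksum layout is an assumption for illustration):

```python
import zlib

class StableBlock:
    """Two physical copies per logical block; each copy carries a checksum."""

    def __init__(self, data=b""):
        self.copies = [self._pack(data), self._pack(data)]

    @staticmethod
    def _pack(data):
        return (zlib.crc32(data), data)

    @staticmethod
    def _ok(copy):
        checksum, data = copy
        return zlib.crc32(data) == checksum

    def write(self, data):
        # Write the first copy, then the second: a crash between the two
        # writes still leaves at least one complete, valid copy on disk.
        self.copies[0] = self._pack(data)
        self.copies[1] = self._pack(data)

    def recover(self):
        """After a crash, repair a damaged copy from the valid one."""
        if self._ok(self.copies[0]) and not self._ok(self.copies[1]):
            self.copies[1] = self.copies[0]
        elif self._ok(self.copies[1]) and not self._ok(self.copies[0]):
            self.copies[0] = self.copies[1]

    def read(self):
        return next(data for copy in self.copies
                    if self._ok(copy) for data in [copy[1]])

blk = StableBlock(b"old")
blk.write(b"new")
blk.copies[1] = (0, b"garbage")  # simulate a crash mid-write
blk.recover()
print(blk.read())  # b'new'
```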