(Latest Revision: Sun Sep 19, 2005)
Chapter Sixteen -- Distributed System Structures
Lecture Notes
Joking in a CACM publication of June 1992, Leslie Lamport stated:
"A distributed system is one in which
the failure of a computer you didn't even know existed
can render your own computer unusable."
Introduction --
- A DISTRIBUTED (computing) SYSTEM is a LOOSELY COUPLED collection
of processors.
Such computing systems do not share memory or a clock.
Each processor has its own local memory and clock.
The processors do communicate via some medium (e.g. busses, a LAN
or a WAN) and they cooperate to perform system functions.
Sun Microsystems' old slogan sums up the idea of a distributed
system pretty well: "The Network is The Computer."
A distributed system may be highly heterogeneous -- including
many different types of computers and devices.
Distributed systems:
- help users share resources and data more easily,
- help users get work done faster, and
- provide greater reliability through redundancy.
DISTRIBUTED SYSTEMS are to be distinguished from MULTIPROCESSOR
SYSTEMS. MULTIPROCESSOR SYSTEMS are TIGHTLY COUPLED systems
which DO share memory and a clock.
A distributed file system is a file service "whose users,
servers, and storage devices are dispersed among the sites of a
distributed system."
In a distributed system, there are special problems associated
with providing process synchronization, process communication,
handling deadlock, and handling system failures.
Section 16.1 -- Background
- 16.1.1 Advantages of Distributed Systems
The four main reasons for building distributed systems are:
RESOURCE SHARING, COMPUTATION SPEED-UP, RELIABILITY, and
COMMUNICATION.
- 16.1.1.1 Resource Sharing
Sharing printers and other peripherals, files, and CPU
cycles. Sharing expensive special resources physically
located at just one or a few sites -- e.g. an electron
microscope, a supercomputer, or a special database.
- 16.1.1.2 Computation Speedup
Run jobs on more than one CPU -- get computation speed-up
through parallel processing. Also alleviate heavy loads on
one processor by having another processor do some of the
work -- off-loading and load-balancing.
- 16.1.1.3 Reliability
If enough redundancy exists in the system, parts of the
system can fail, and useful work can continue. Is this
working in our local environment? Why or why not?
There must be appropriate mechanisms for detecting failure,
transferring work to alternates, executing recovery
procedures, and re-integrating recovered resources.
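A rough way to see why redundancy helps: assume (an assumption for illustration, not a claim from these notes) that each of k replicas of a service is up independently with probability p. The service is unavailable only when all k replicas are down at once, so its availability is 1 - (1 - p)^k. A minimal sketch:

```python
def availability(p: float, k: int) -> float:
    """Probability that at least one of k replicas is up.

    ASSUMES each replica is up independently with probability p. Real failures
    are often correlated (shared power, a network partition), which lowers this.
    """
    if not 0.0 <= p <= 1.0 or k < 1:
        raise ValueError("need 0 <= p <= 1 and k >= 1")
    return 1.0 - (1.0 - p) ** k

for k in (1, 2, 3):
    print(k, "replica(s):", round(availability(0.9, k), 4))
```

With p = 0.9, one replica gives 0.9, two give 0.99, and three give 0.999, which is why detecting failure and transferring work to alternates pays off.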
- 16.1.1.4 Communication
Given communication capability, the network of machines can,
in principle, function as one computer.
A distributed system is often a cost-effective replacement
for a mainframe system, having greater flexibility.
Some of the services typically available are file transfer,
electronic mail, and remote login.
- 16.1.2 Types of Distributed Operating Systems
- 16.1.2.1 Network Operating Systems
Characteristics of a network operating system:
- The user is aware of the multiplicity of machines.
- Remote resources are reached explicitly, by remote login
(e.g. ssh) or file transfer (e.g. ftp).
- 16.1.2.2 Distributed Operating Systems
Characteristics of a distributed operating system:
- Location independence: The user accesses remote
resources in exactly the same manner as local
resources.
- Data Migration: automatic migration of process data
between local and remote hosts as needed.
Whole files may be copied over the net, or just the
portions that are being read and/or modified -- which
method is used is a design decision that may depend on
the type of file operations that are anticipated.
Differences in data representation used on the machines
on the net may make it necessary for file sharing to
involve a lot of format conversion.
- Computation Migration: automatic remote procedure call
or message passing which causes computation to be
carried out remotely and results returned to the local
process.
Remote Procedure Call is an important "catch phrase" in
distributed systems. This is a means for one machine
to cause a function or program to be run on a remote
machine, and for that function or program to send the
results of the computation back to the caller over the
network.
For example, suppose I log in to altair and do an
ls of my home directory. My home directory
/user/dept/john is actually an NFS mount of
eos:/export/home/dept1. Therefore my ls
command causes some I/O to be done on
my behalf on the computer "eos." The result of the I/O
is the information in my home directory. A process on
eos performs the I/O and sends the output back to my
ls process on altair, which displays it on the
screen.
If one desires to have the results of a computation
performed on some data that resides at a remote site,
there are a couple of ways to get those results:
1. Transfer the data and perform the computation
locally, or
2. Use remote procedure call to have the computation
performed at the remote site, and transfer the
RESULTS to the local site.
In many cases, it is more economical to use the second
method. In other cases, the second method may be the
ONLY practical way to do the computation.
- Process Migration:
Jobs submitted at one site may be sent to another site
for execution. Below are some motivations for this:
- Load balancing
- Computation speed-up -- divide the job up and do
parallel processing of the parts.
- Hardware preference -- special processors more
suited to the task may be employed.
- Software preference -- move the job to a site
where the software needed is running.
- Data Access -- more economical to move the process
to the data than vice-versa.
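The two ways of getting a remote result described under Computation Migration, and the way RPC stubs smooth over differences in data representation, can be sketched as a local Python simulation. Everything here is hypothetical: the "network" is just a function that counts bytes, REMOTE_DATA and the procedure names are made up, and marshalling uses network (big-endian) byte order via the struct module. It is not how any particular RPC package works.

```python
import struct

# Hypothetical data that lives at the remote site (e.g. on "eos").
REMOTE_DATA = list(range(100_000))

def marshal_ints(values):
    """Encode 64-bit integers in network (big-endian) byte order, so hosts whose
    native byte orders differ still agree on what the bytes mean."""
    return struct.pack(f"!{len(values)}q", *values)

def unmarshal_ints(payload):
    return list(struct.unpack(f"!{len(payload) // 8}q", payload))

def network(payload, log):
    """Stand-in for the real network: records how many bytes were 'sent'."""
    log.append(len(payload))
    return payload

# Server side: procedures the remote site is willing to run on its own data.
PROCEDURES = {
    "sum": lambda data: sum(data),
    "max": lambda data: max(data),
}

def server_stub(request):
    name = request.decode("ascii")          # unmarshal the request
    result = PROCEDURES[name](REMOTE_DATA)  # compute where the data lives
    return marshal_ints([result])           # marshal only the result

# Method 1: ship all of the data to the local site, then compute locally.
def transfer_data_then_compute(log):
    data = unmarshal_ints(network(marshal_ints(REMOTE_DATA), log))
    return sum(data)

# Method 2: the client stub sends a request and receives only the result.
def remote_procedure_call(name, log):
    reply = network(server_stub(network(name.encode("ascii"), log)), log)
    return unmarshal_ints(reply)[0]

log1, log2 = [], []
print(transfer_data_then_compute(log1), sum(log1), "bytes moved")
print(remote_procedure_call("sum", log2), sum(log2), "bytes moved")
```

Both methods compute the same sum, but method 1 moves 800,000 bytes while method 2 moves 11 (a 3-byte request and an 8-byte reply). A real RPC system must also cope with lost messages and server crashes, which this sketch ignores.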
- 16.1.3 Putting it All Together
Section 16.2 -- Topology
Topologies may be evaluated based on the following criteria:
- Basic cost -- Cost of establishing the links.
- Communication cost -- Cost (time) of sending a message from host A to host B.
- Reliability (availability) -- How well does the system tolerate individual
site failures?
Fully Connected
    Basic cost            High: the number of links is N(N-1)/2.
    Communication cost    Low: all pairs are directly connected.
    Reliability           High.
Partially Connected
    Basic cost            Lower than fully connected.
    Communication cost    Higher than fully connected.
    Reliability           Lower than fully connected.
    It is a good idea to build it with at least two routes between each
    pair of nodes.
Hierarchy
    Basic cost            Low: O(N).
    Communication cost    Varies.
    Reliability           Low.
    Low levels in the hierarchy are very dependent on their "ancestors" --
    the lack of redundant paths makes for lower reliability.
Star
    Basic cost            Low.
    Communication cost    Low.
    Reliability           Low.
    Equivalent to a hierarchy with only two levels.
Ring
    Basic cost            Low: O(N).
    Communication cost    High: O(N).
    Reliability           Low.
    Reliability can be improved by adding more links -- possibly by using a
    "smart hub" that can instantly bypass failed nodes.
Multi-Access Bus
    Basic cost            Low.
    Communication cost    Low (if the channel has very high capacity).
    Reliability           High (if the channel is dependable).
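To make the link-count and hop-count entries above concrete, the sketch below builds small example topologies and counts their links and their worst-case hop count (the network diameter) using breadth-first search. Assumptions: links are bidirectional, the ring is a bidirectional ring, and the "hierarchy" is taken to be a binary tree. Partially connected networks and buses have no single canonical shape, so they are left out.

```python
from collections import deque
from itertools import combinations

# Each builder returns a set of undirected links (frozensets of two sites 0..n-1).
def fully_connected(n):
    return {frozenset(pair) for pair in combinations(range(n), 2)}

def star(n):
    return {frozenset((0, i)) for i in range(1, n)}             # site 0 is the hub

def ring(n):
    return {frozenset((i, (i + 1) % n)) for i in range(n)}      # assumes n >= 3

def binary_tree(n):
    return {frozenset((i, (i - 1) // 2)) for i in range(1, n)}  # parent of i is (i-1)//2

def max_hops(n, links):
    """Largest number of links on any shortest path (the network's diameter)."""
    adj = {v: set() for v in range(n)}
    for link in links:
        a, b = tuple(link)
        adj[a].add(b)
        adj[b].add(a)
    worst = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:                      # breadth-first search from src
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        worst = max(worst, max(dist.values()))
    return worst

N = 8
for name, build in [("fully connected", fully_connected), ("star", star),
                    ("ring", ring), ("hierarchy (binary tree)", binary_tree)]:
    links = build(N)
    print(f"{name:24} links={len(links):3}  max hops={max_hops(N, links)}")
```

For N = 8 this gives 28 links and 1 hop for the fully connected network, 7 links and 2 hops for the star, 8 links and 4 hops for the ring, and 7 links and 5 hops for the binary-tree hierarchy, matching the cost trade-offs listed above.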
Section 16.3 -- Network Types
- 16.3.1 Local-Area Networks
LANs provide a way for a set of small computers with individualized
applications software to share peripherals and files, and to
support communication among users.
To a great extent, LANs evolved as a more cost-effective and
convenient alternative to mainframe computers.
LANs tend to have expensive, high-speed links.
- 16.3.2 Wide-Area Networks
WANs got started during the 1960s, mainly as an academic
enterprise for communication and sharing. The Department of
Defense funded much of the early development and furnished a
backbone network called the ARPANET. The ARPANET was phased out,
except for the portion that supports military installations. For a
while, the National Science Foundation took up the responsibility of
furnishing a backbone for the academic community. Nowadays the
Internet backbone is maintained by giant communication companies.
See
http://www.navigators.com/internet_architecture.html#ISP_Backbone
and
http://www.isoc.org/
to learn all about the Internet.
The Internet is public. Private WANs exist too.
WANs tend to have links with lower speeds than LANs. It would be
too expensive to extend LAN technology into WANs.
Section 16.4 -- Communication
- 16.4.1 Naming and Name Resolution
- Typically hosts have names -- for the benefit of human users.
- Software performs "automatic" translation: hostname <--> numeric
network address. This is now done with a distributed database
-- DNS (the Domain Name System).
- Numeric addresses are two-part -- one part for the host and
another part to identify a particular process on the host (in
TCP/IP, an IP address plus a port number).
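Name translation and the two-part (host, process) address are both visible through Python's standard socket module. socket.getaddrinfo asks the system resolver, which on a typical configuration consults local files such as /etc/hosts and then DNS; the port number plays the role of the "process" part of the address. A minimal sketch, using a numeric host so that it does not depend on a DNS server being reachable:

```python
import socket

def resolve(host, port):
    """Return the distinct (numeric address, port) pairs for host:port.

    getaddrinfo consults the system resolver. Each result's sockaddr starts
    with (address, port); IPv6 sockaddrs carry two extra fields, dropped here.
    """
    results = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][:2] for info in results})

print(resolve("127.0.0.1", 8080))
```

Passing a real host name instead (for instance "localhost", or the name of a remote machine) exercises the actual hostname-to-address lookup; remote names need a working DNS configuration.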
- 16.4.2 Routing Strategies
- Fixed Routing -- hosts use static routing tables.
- Virtual Routing -- hosts establish a route to be used for each
"session" or "connection."
- Dynamic Routing -- A host chooses a (possibly new) route each
time it sends a message. This helps deal with congestion and
failure but allows messages to arrive out of order. Routers
communicate with each other about the existence of routes.
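The notes do not say how routers exchange route information. One common approach is the distance-vector (Bellman-Ford) method, sketched below as a synchronous simulation: each router repeatedly adopts a cheaper route that a neighbor advertises until nothing changes. The link names and costs are hypothetical.

```python
def distance_vector(links):
    """Compute routing tables by repeated distance-vector (Bellman-Ford) updates.

    links maps an undirected link (a, b) to a positive cost. The result is
    table[router][destination] = (total cost, next hop).
    """
    nodes = {n for pair in links for n in pair}
    neighbors = {n: {} for n in nodes}
    for (a, b), cost in links.items():
        neighbors[a][b] = cost
        neighbors[b][a] = cost
    table = {n: {n: (0, n)} for n in nodes}
    changed = True
    while changed:
        changed = False
        for router in sorted(nodes):
            for nbr, link_cost in sorted(neighbors[router].items()):
                # The neighbor "advertises" its current distances to router.
                for dest, (nbr_cost, _) in list(table[nbr].items()):
                    new_cost = link_cost + nbr_cost
                    if dest not in table[router] or new_cost < table[router][dest][0]:
                        table[router][dest] = (new_cost, nbr)
                        changed = True
    return table

links = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 5}
print(distance_vector(links)["A"]["C"])   # prints (2, 'B'): cheapest route goes via B
del links[("B", "C")]                     # the B-C link goes down
print(distance_vector(links)["A"]["C"])   # prints (5, 'C'): detour over the direct link
```

Recomputing from scratch after a failure is a simplification; real routers update their tables incrementally as neighbors report changes.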
- 16.4.3 Packet Strategies
- Messages are broken up into packets of equal or near-equal
size. This facilitates the sharing of the network. It works
like time-slicing.
- Networks can use packets with connection-based protocols or
with connectionless protocols.
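Breaking a message into packets, and putting it back together when dynamic routing delivers the packets out of order, can be sketched as follows. The (sequence number, chunk) packet format is an assumption for illustration; real protocols carry much more in each header.

```python
import random

def packetize(message: bytes, size: int):
    """Split a message into (sequence number, chunk) packets of at most `size` bytes."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def reassemble(packets):
    """Rebuild the message even if the packets arrived out of order."""
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"The Network is The Computer"
packets = packetize(message, 8)
random.shuffle(packets)   # simulate out-of-order arrival
print(reassemble(packets) == message)
```

The 27-byte message becomes four packets (8, 8, 8, and 3 bytes), and the sequence numbers are what let the receiver restore the original order.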
- 16.4.4 Connection Strategies
- Circuit Switching -- A dedicated physical circuit is set up
for each connection. Usually intermediate switches do most of
the work without human intervention. (telephone paradigm) --
requires set-up and tear-down -- less expensive for longer
messages.
- Message Switching -- Typically permanent links that are
assigned to a process for sending one message, and then
reassigned for another message transfer, and so on. (mailbox
paradigm)
- Packet Switching -- messages are divided up into packets.
Network software dispatches packets according to a post-office
type of paradigm.
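Why circuit switching becomes cheaper for longer messages can be shown with a deliberately crude timing model: a circuit pays a fixed set-up/tear-down cost once, while packet switching pays a header on every packet. All numbers are hypothetical, and the model ignores propagation delay, queueing, and store-and-forward delays at switches.

```python
import math

def circuit_switched_time(bits, rate_bps, setup_s):
    """Set up a dedicated circuit once (setup_s lumps set-up and tear-down
    together), then stream all the bits."""
    return setup_s + bits / rate_bps

def packet_switched_time(bits, rate_bps, payload_bits, header_bits):
    """No set-up, but every packet carries its own header."""
    packets = math.ceil(bits / payload_bits)
    return (bits + packets * header_bits) / rate_bps

RATE, SETUP, PAYLOAD, HEADER = 1_000_000, 0.05, 8_000, 400   # hypothetical values
for bits in (8_000, 80_000_000):
    print(bits, "bits: circuit", circuit_switched_time(bits, RATE, SETUP),
          "s, packet", packet_switched_time(bits, RATE, PAYLOAD, HEADER), "s")
```

With these numbers a one-packet message takes 0.0084 s by packet switching versus 0.058 s by circuit, but an 80,000,000-bit message takes 84 s by packet switching versus 80.05 s by circuit. The break-even point is where total header time equals the set-up time: here, 125 packets.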
- 16.4.5 Contention
- CSMA/CD -- Carrier Sense Multiple Access with Collision
Detection -- uses an exponential back-off algorithm (IEEE 802.3
standard). You need to limit the amount of traffic on each
segment to keep the number of collisions reasonably low.
Ethernet is most efficient when the network operates
substantially below its capacity.
- Token Passing -- Pass the token around a ring. The holder of
the token is allowed to send a message. If the token is lost,
the hosts can start again with an election algorithm.
Some IBM and HP/Apollo systems use token ring. Token ring is
most efficient when the network operates at or near full
capacity.
- Message Slots -- slots move around a ring and a host may put a
message in a slot.
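The exponential back-off used by CSMA/CD can be sketched in Python. IEEE 802.3 uses truncated binary exponential back-off: after the n-th collision a station waits a random number of slot times chosen uniformly from 0 to 2^min(n, 10) - 1, and it gives up after 16 attempts. The contention model in resolve_collision is a simplification of my own: all colliding stations re-pick, whereas on a real Ethernet the stations that picked later slots would sense the carrier and defer.

```python
import random

MAX_ATTEMPTS = 16    # 802.3 stations give up after 16 transmission attempts
BACKOFF_LIMIT = 10   # the random range stops doubling after 10 collisions

def backoff_slots(collisions, rng=random):
    """Truncated binary exponential back-off: after the n-th collision, wait a
    random number of slot times chosen uniformly from 0 .. 2**min(n, 10) - 1."""
    k = min(collisions, BACKOFF_LIMIT)
    return rng.randint(0, 2 ** k - 1)   # randint includes both end points

def resolve_collision(stations, rng=random):
    """Simplified model: every colliding station picks a back-off; if exactly one
    station picked the earliest slot it transmits, otherwise they collide again.
    Returns how many collisions it took, or None if the attempt limit is hit."""
    for collisions in range(1, MAX_ATTEMPTS):
        picks = [backoff_slots(collisions, rng) for _ in range(stations)]
        if picks.count(min(picks)) == 1:
            return collisions
    return None

rng = random.Random(2005)
print([resolve_collision(2, rng) for _ in range(10)])
print([resolve_collision(10, rng) for _ in range(10)])
```

Comparing the two printed lists gives a feel for why heavy traffic hurts Ethernet: with more stations contending, it tends to take more collisions (and longer waits) before one station gets through.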
Section 16.5 -- Communication Protocols
Protocols do things like determine host addresses, establish
connections, and assure reliability of communication.
Protocol "software" may actually be implemented in
software/firmware/hardware.
Typically protocol software is designed in layers -- each layer on the
local host communicates with its peer layer on the remote host.
The International Organization for Standardization (ISO) model of network
protocols (the OSI reference model) is none-too-practical but is nevertheless
often described to illustrate the concept of layered protocols. There are
seven layers:
- Physical Layer -- concerns physical transmission of bits -- is
implemented in the network hardware.
- Data-link Layer -- responsible for organizing data into frames and
making a best effort at reliable delivery (e.g. detecting errors with
checksums).
- Network Layer -- the level at which routers work -- responsible for
making connections, addressing & routing packets, maintaining
routing information.
- Transport Layer -- responsible for end-to-end transfer of messages,
division of messages into packets, reassembling messages, flow
control & effective use of bandwidth.
- Session Layer -- responsible for implementing communication
sessions between processes.
- Presentation Layer -- responsible for resolving differences in data
representation and communication styles.
- Application Layer -- responsible for interacting directly with
users.
Each layer performs some function (and may add header or trailer
information) as data moves up and down the protocol stack.
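Header and trailer handling as data moves down and up the stack can be sketched in Python. The text headers such as [transport] are hypothetical placeholders, not real protocol formats; the data-link layer's trailer is a CRC-32 checksum computed with zlib.crc32, in the spirit of the checksums mentioned above.

```python
import zlib

# Hypothetical text headers, one per layer, listed from the top of the stack down.
LAYERS = ["application", "transport", "network", "data-link"]

def encapsulate(payload: bytes) -> bytes:
    """Going down the stack: each layer prepends its header; the data-link
    layer also appends a CRC-32 trailer so the receiver can detect corruption."""
    data = payload
    for layer in LAYERS:
        data = f"[{layer}]".encode("ascii") + data
    return data + zlib.crc32(data).to_bytes(4, "big")

def decapsulate(frame: bytes) -> bytes:
    """Going up the stack: check the trailer, then each layer strips the
    header its peer layer added, outermost (data-link) first."""
    data, trailer = frame[:-4], frame[-4:]
    if zlib.crc32(data).to_bytes(4, "big") != trailer:
        raise ValueError("checksum mismatch: frame was corrupted")
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode("ascii")
        if not data.startswith(header):
            raise ValueError(f"missing {layer} header")
        data = data[len(header):]
    return data

frame = encapsulate(b"hello")
print(frame)
print(decapsulate(frame))
```

The outermost header belongs to the lowest layer, so each receiving layer only ever looks at the header its peer layer added.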
TCP/IP has fewer layers -- in theory, more difficult to implement but
with less overhead.
TCP/IP                  OSI
Application             Application, Presentation, Session
Transport (TCP, UDP)    Transport
Internet (IP)           Network
Network Interface       Data Link
Physical                Physical
Section 16.6 -- Robustness
The system must deal with failure of hosts, networks, and links. It
must deal with lost and corrupted messages.
- 16.6.1 Failure Detection
- Often it will be difficult or impossible to tell whether
failures are due to a link or host going down, or due to
congestion or some other cause.
- Sites can trade "are you up" messages to determine if
communication is still working.
- Senders of messages can demand acknowledgements.
- Messages may be resent when an appropriate timeout expires.
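Two of the detection techniques above, "are you up" messages checked against a timeout and resending when an acknowledgement does not arrive in time, can be sketched as follows. The clock is passed in explicitly so the example is deterministic; the site names, timeout, and lossy channel are hypothetical.

```python
class HeartbeatMonitor:
    """Suspect a site when no "are you up" reply has arrived within `timeout`.

    A suspicion is only that: the site may be down, the link may be down,
    or the network may simply be congested.
    """
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heard = {}

    def heard_from(self, site, now):
        self.last_heard[site] = now

    def suspects(self, now):
        return sorted(site for site, t in self.last_heard.items()
                      if now - t > self.timeout)

def send_with_retries(send_and_wait_for_ack, message, max_tries=5):
    """Resend `message` each time the acknowledgement timeout expires.
    send_and_wait_for_ack returns True if an ack arrived before the timeout.
    Returns the attempt number that was acknowledged, or None."""
    for attempt in range(1, max_tries + 1):
        if send_and_wait_for_ack(message):
            return attempt
    return None

monitor = HeartbeatMonitor(timeout=3.0)
monitor.heard_from("eos", now=0.0)
monitor.heard_from("altair", now=0.0)
monitor.heard_from("altair", now=4.0)
print(monitor.suspects(now=5.0))   # prints ['eos']

outcomes = iter([False, False, True])   # hypothetical: the first two acks are lost
print(send_with_retries(lambda msg: next(outcomes), "hello"))   # prints 3
```

Note that the monitor cannot distinguish a crashed host from a broken link or a slow network; it only reports who has been silent too long.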
- 16.6.2 Reconfiguration
- When links stop working, routers can tell other routers about
it so that dynamic routes will detour.
- If a server becomes unreachable, clients may elect a new
server. In a partitioned network the result may be two
servers performing conflicting actions.
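How a partition can leave two servers running at once can be sketched in Python: find the groups of sites that can still reach one another, then let each group elect a server. The election rule (the highest reachable site ID wins) and the site numbers are hypothetical choices made for illustration.

```python
def partitions(sites, working_links):
    """Group sites into the connected pieces left by the working links."""
    neighbors = {s: set() for s in sites}
    for a, b in working_links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    unseen, groups = set(sites), []
    while unseen:
        start = unseen.pop()
        group, stack = {start}, [start]
        while stack:                      # depth-first search from start
            for nbr in neighbors[stack.pop()]:
                if nbr in unseen:
                    unseen.remove(nbr)
                    group.add(nbr)
                    stack.append(nbr)
        groups.append(group)
    return sorted(groups, key=min)

def elect(group):
    """Hypothetical rule: the reachable site with the highest ID becomes the server."""
    return max(group)

sites = [1, 2, 3, 4, 5]
links = [(1, 2), (2, 5), (3, 4)]       # the link joining {3, 4} to the rest is down
print([elect(g) for g in partitions(sites, links)])   # prints [5, 4]: two servers
```

Each side of the partition behaves correctly by its own lights, yet sites 5 and 4 now both act as the server; when the link is repaired, their conflicting actions must be reconciled.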
- 16.6.3 Recovery from Failure
- If sites periodically try links with test messages then sites
can learn when non-functional links once again become
operational. Routers can propagate the information to other
routers so that all sites become aware and take it into
account in their routing algorithms.
- A site that has been cut off from the network may need to
download routing information in order to "catch up."
Section 16.7 -- Design Issues
- In a distributed operating system users access remote
resources in exactly the same way as local resources.
- The user should be able to access the same computing environment
regardless of what host s/he logs into.
- The system should be fault-tolerant -- able to continue
functioning when a subset of its resources fail -- hosts, links,
processes, peripherals, servers.
- It is interesting to ask to what degree future systems will
depend on RAID technology to ensure fault tolerance, versus how
dependent they will be on higher-level operating-system protocols.
- Distributed systems should be scalable -- react
proportionally and not unstably to growth and/or increased load.
- It may be essential that a distributed system be well stocked with
spare resources.
- "The service demand from any component of the system should be
bounded by a constant that is independent of the number of nodes in
the system." If not, there is a bound on how large the system can
grow without breaking down.
- Autonomy and symmetry are important goals. (Clustering small
client-server groups that mostly communicate with each other may
approximate autonomy and symmetry sufficiently well.)
- If a service requires blocking system calls, a server process that
uses a persistent pool of lightweight threads seems to be a good
model.
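A persistent pool of threads serving blocking requests can be sketched with Python's concurrent.futures.ThreadPoolExecutor: a fixed set of worker threads is created once and reused, and while one worker blocks, the others keep serving. The handler is hypothetical (time.sleep stands in for a blocking system call), and Python threads are ordinary OS threads rather than any particular lightweight-thread package.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(request_id):
    """Hypothetical request handler; time.sleep stands in for a blocking
    system call such as a disk read or a call to another server."""
    time.sleep(0.05)
    return request_id, threading.current_thread().name

def serve(requests, workers=4):
    """Serve all requests with a fixed pool of worker threads that is created
    once and reused; results come back in request order."""
    with ThreadPoolExecutor(max_workers=workers, thread_name_prefix="worker") as pool:
        return list(pool.map(handle_request, requests))

results = serve(range(12))
print(len(results), "requests served by",
      len({name for _, name in results}), "threads")
```

Twelve requests are handled by at most four threads, which is the point of the persistent pool: no thread is created or destroyed per request.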
Section 16.8 -- An Example: Networking
- LANs typically use the Address Resolution Protocol (ARP) to
map IP addresses to Medium Access Control (MAC) addresses.
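A toy model of ARP on a single LAN: a host that needs the MAC address for an IP address broadcasts a request, the owner of that IP address replies, and the answer is cached so later lookups need no broadcast. The class, addresses (from the 192.0.2.0/24 documentation range), and MAC values are hypothetical; real ARP caches also expire their entries, which is omitted here.

```python
class ArpCache:
    """Toy ARP model: map an IP address to a MAC address, broadcasting a
    request only when the answer is not already cached."""
    def __init__(self, lan_hosts):
        self.lan_hosts = lan_hosts  # hypothetical: IP -> MAC of every host on the LAN
        self.cache = {}
        self.broadcasts = 0

    def lookup(self, ip):
        if ip in self.cache:
            return self.cache[ip]
        self.broadcasts += 1            # "Who has this IP?" goes to every host
        mac = self.lan_hosts.get(ip)    # only the owner of the IP address replies
        if mac is not None:
            self.cache[ip] = mac
        return mac

lan = {"192.0.2.10": "02:00:00:00:00:0a", "192.0.2.20": "02:00:00:00:00:14"}
arp = ArpCache(lan)
print(arp.lookup("192.0.2.20"), arp.lookup("192.0.2.20"), arp.broadcasts)
```

The second lookup is answered from the cache, so only one broadcast is sent.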
Section 16.9 -- Summary