7th ed. chapter 16

(Latest Revision: Sun Sep 19, 2005 )

Chapter Sixteen -- Distributed System Structures -- Lecture Notes

Joking in a CACM publication of June 1992, Leslie Lamport stated:

"A distributed system is one in which 
 the failure of a computer you didn't even know existed
 can render your own computer unusable."

Introduction --
- A DISTRIBUTED (computing) SYSTEM is a LOOSELY COUPLED collection of processors.
  
  Such computing systems do not share memory, or a clock.
  
  Each processor has its own local memory and clock.
  
  The processors do communicate via some medium (e.g. busses, a LAN or a WAN) and they cooperate to perform system functions.
  
  Sun Microsystems' old slogan sums up the idea of a distributed system pretty well: "The Network is The Computer."
  
  A distributed system may be highly heterogeneous -- including many different types of computers and devices.
  
  Distributed systems:
  - help users share resources and data more easily,
  - help users get work done faster, and
  - provide greater reliability through redundancy.
  DISTRIBUTED SYSTEMS are to be distinguished from MULTIPROCESSOR SYSTEMS. MULTIPROCESSOR SYSTEMS are TIGHTLY COUPLED systems which DO share memory and a clock.
  
  A distributed file system is a file service "whose users, servers, and storage devices are dispersed among the sites of a distributed system."
  
  In a distributed system, there are special problems associated with providing process synchronization, process communication, handling deadlock, and handling system failures.
Section 16.1 -- Background
- 16.1.1 Advantages of Distributed Systems
  
  The four main reasons for building distributed systems are: RESOURCE SHARING, COMPUTATION SPEED-UP, RELIABILITY, and COMMUNICATION.
  - 16.1.1.1 Resource Sharing
    
    Sharing printers and other peripherals, files, and CPU cycles. Sharing expensive special resources physically located at just one or a few sites -- e.g. an electron microscope, supercomputer, or a special database.
  - 16.1.1.2 Computation Speedup
    
    Run jobs on more than one CPU -- get computation speed-up through parallel processing. Also alleviate heavy loads on one processor by having another processor do some of the work -- off-loading and load-balancing.
  - 16.1.1.3 Reliability
    
    If enough redundancy exists in the system, parts of the system can fail, and useful work can continue. Is this working in our local environment? Why, or why not?
    
    There must be appropriate mechanisms for detecting failure, transferring work to alternates, executing recovery procedures, and re-integrating recovered resources.
  - 16.1.1.4 Communication
    
    Given communication capability the network of machines can, in principle, function as one computer.
    
    A distributed system is often a cost-effective replacement for a mainframe system, having greater flexibility.
    
    Some of the services typically available are file transfer, electronic mail, and remote login.
- 16.1.2 Types of Distributed Operating Systems
  - 16.1.2.1 Network Operating Systems
    
    Characteristics of a network operating system:
    - The user is aware of the multiplicity of machines.
    - Use of remote login (e.g. ssh) or file transfer (e.g. ftp)
  - 16.1.2.2 Distributed Operating Systems
    
    Characteristics of a distributed operating system:
    - Location independence: The user accesses remote resources in exactly the same manner as local resources.
    - Data Migration: automatic migration of process data between local and remote hosts as needed.
      
      Whole files may be copied over the net, or just the portions that are being read and/or modified -- which method is used is a design decision that may depend on the type of file operations that are anticipated. Differences in data representation used on the machines on the net may make it necessary for file sharing to involve a lot of format conversion.
    - Computation Migration: automatic remote procedure call or message passing which causes computation to be carried out remotely and results returned to the local process.
      
      Remote Procedure Call is an important "catch phrase" in distributed systems. This is a means for one machine to cause a function or program to be run on a remote machine, and for that function or program to send the results of the computation back to the caller over the network.
      
      For example, suppose I log in to altair and do an ls of my home directory. My home directory /user/dept/john is actually an NFS-mount: eos:/export/home/dept1. Therefore my ls command causes some I/O to be executed on my behalf on the computer "eos." The result of the I/O is the information in my home directory. A process on eos performs the I/O and sends the output back to my ls process on altair, which displays it on the screen.
      
      If one desires to have the results of a computation performed on some data that resides at a remote site, there are a couple of ways to get those results:
      
      1. Transfer the data and perform the computation locally, or
      
      2. Use remote procedure call to have the computation performed at the remote site, and transfer the RESULTS to the local site.
      
      In many cases, it is more economical to use the second method. In other cases, the second method may be the ONLY practical way to do the computation.
    - Process Migration:
      
      Jobs submitted at one site may be sent to another site for execution. Below are some motivations for this:
      - Load balancing
      - Computation speed-up -- divide the job up and do parallel processing of the parts.
      - Hardware preference -- special processors more suited to the task may be employed.
      - Software preference -- move the job to a site where the software needed is running.
      - Data Access -- more economical to move the process to the data than vice-versa.
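
The choice between the two methods listed under Computation Migration (move the data, or move the computation) can be sketched as follows. This is only an illustration, not a real RPC mechanism: the `RemoteSite` class, its method names, and the use of `pickle` to count bytes are assumptions made for the example, and the "network" is just a function call.

```python
import pickle


class RemoteSite:
    """Hypothetical stand-in for a remote host that holds a large data set."""

    def __init__(self, data):
        self.data = data

    def fetch_data(self):
        # Method 1: ship the whole data set to the caller.
        return pickle.dumps(self.data)

    def call(self, func_name):
        # Method 2: run the computation here and ship only the result.
        # Only the functions in this table may be invoked remotely.
        procedures = {"sum": sum, "max": max, "len": len}
        result = procedures[func_name](self.data)
        return pickle.dumps(result)


def sum_by_data_migration(site):
    payload = site.fetch_data()
    return sum(pickle.loads(payload)), len(payload)


def sum_by_computation_migration(site):
    payload = site.call("sum")
    return pickle.loads(payload), len(payload)


site = RemoteSite(list(range(100_000)))
local_result, bytes_moved_data = sum_by_data_migration(site)
remote_result, bytes_moved_result = sum_by_computation_migration(site)
print(local_result == remote_result, bytes_moved_data, bytes_moved_result)
```

Both calls produce the same sum, but the second moves only the serialized result across the simulated link, which is why the second method is often the more economical one.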
- 16.1.3 Putting it All Together

Section 16.2 -- Topology

Topologies may be evaluated based on the following criteria:

Basic cost -- Cost of establishing links
Communication cost -- Cost needed to send from host A to host B.
Availability -- How well does the system tolerate individual site failures?


Fully Connected 
Basic cost            		High: number of links is N(N-1)/2
Communication cost		Low: all pairs are directly connected.
Reliability			High

Partially Connected 
Basic cost			Lower than fully connected.
Communication cost		Higher than fully connected
Reliability			Lower than fully connected

Good idea to build it with at least two routes between each pair of nodes.

Hierarchy 
Basic cost			Low:  O(N)
Communication cost		Varies
Reliability			Low

Low levels in hierarchy are very dependent on "ancestors" -- lack of
redundancy of paths makes for lower reliability.

Star 
Basic cost			Low
Communication cost		Low
Reliability			Low

Equivalent to a hierarchy with only two levels.

Ring 
Basic cost			Low:  O(N)
Communication cost		High: O(N)
Reliability			Low

Reliability can be improved by adding more links -- possibly by using a
"smart hub" that can instantly bypass failed nodes.

Multi-Access Bus 
Basic cost			Low
Communication cost		Low (if channel has very high capacity.)
Reliability			High (if channel is dependable.)
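
The basic-cost and communication-cost entries above can be checked on small examples. The sketch below builds three of the topologies as adjacency sets, counts links, and measures the worst-case hop count with a breadth-first search. The function names are invented for this illustration, and the ring is assumed to be bidirectional.

```python
from collections import deque


def fully_connected(n):
    return {i: {j for j in range(n) if j != i} for i in range(n)}


def star(n):
    # Node 0 is the hub; every other node links only to it.
    graph = {i: {0} for i in range(1, n)}
    graph[0] = set(range(1, n))
    return graph


def ring(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def link_count(graph):
    # Each undirected link appears in two adjacency sets.
    return sum(len(neighbors) for neighbors in graph.values()) // 2


def max_hops(graph):
    """Largest shortest-path length (in hops) between any two nodes."""
    worst = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


n = 8
for name, build in [("fully connected", fully_connected),
                    ("star", star),
                    ("ring", ring)]:
    g = build(n)
    print(name, link_count(g), max_hops(g))
```

With N = 8 the fully connected network needs N(N-1)/2 = 28 links, while the star and ring need a number of links that grows only linearly with N.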

Section 16.3 -- Network Types
- 16.3.1 Local-Area Networks
  
  LAN's provide a way for a set of small computers with individualized applications software to share peripherals and files, and to support communication among users.
  
  To a great extent LAN's evolved as a more cost-effective and convenient alternative to mainframe computers.
  
  LAN's tend to have expensive, high-speed links.
- 16.3.2 Wide-Area Networks
  
  WAN's got started during the 1960's, mainly as an academic enterprise for communication and sharing. The Department of Defense funded much of the early development, and furnished a backbone network called the ARPANET. The ARPANET was phased out, except for the portion that supports military installations. For a while the National Science Foundation took up the responsibility of furnishing a backbone for the academic community. Nowadays the Internet backbone is maintained by giant communication companies.
  
  See http://www.navigators.com/internet_architecture.html#ISP_Backbone and http://www.isoc.org/ to learn all about the Internet.
  
  The Internet is public. Private WAN's exist too.
  
  WAN's tend to have links with lower speeds than LAN's. It would be too expensive to extend LAN technology into WAN's.
Section 16.4 -- Communication
- 16.4.1 Naming and Name Resolution
  - Typically hosts have names -- for the benefit of human users.
  - Software performs "automatic" translation: hostname<-->network numeric address. This is now done with a distributed database -- DNS.
  - Numeric addresses are two-part -- one part for the host and another part to identify a particular process on the host.
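
The translation and the two-part address can be illustrated with a toy lookup table. This is a flat, local table, unlike the distributed DNS database, and everything in it is hypothetical: the function names, the port number used in the example, and the addresses (taken from the 192.0.2.0/24 documentation range, so they belong to no real host).

```python
# Hypothetical name table; the addresses come from the 192.0.2.0/24
# documentation range and do not belong to real hosts.
NAME_TABLE = {
    "altair": "192.0.2.10",
    "eos": "192.0.2.20",
}
ADDRESS_TABLE = {addr: name for name, addr in NAME_TABLE.items()}


def resolve(hostname):
    """Translate a host name to its numeric address."""
    return NAME_TABLE[hostname]


def reverse_resolve(address):
    """Translate a numeric address back to a host name."""
    return ADDRESS_TABLE[address]


def process_address(hostname, port):
    """Two-part address: (host part, identifier of a process on that host)."""
    return (resolve(hostname), port)


print(process_address("eos", 2049))
```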
- 16.4.2 Routing Strategies
  - Fixed Routing -- hosts use static routing tables.
  - Virtual Routing -- hosts establish a route to be used for each "session" or "connection."
  - Dynamic Routing -- A host chooses a (possibly new) route each time it sends a message. This helps deal with congestion and failure but allows messages to arrive out of order. Routers communicate with each other about the existence of routes.
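
The difference between fixed and dynamic routing shows up when a link fails. In the sketch below, a route computed once (fixed) keeps pointing at a dead link, while a route recomputed at send time (dynamic) detours around it. The four-host topology, the function names, and the use of breadth-first search to pick a route are assumptions made for the example, not a description of any particular routing protocol.

```python
from collections import deque


def shortest_path(links, source, dest):
    """Breadth-first search; returns a list of hosts, or None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == dest:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(links.get(node, ())):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def without_link(links, a, b):
    """Copy of the topology with the a-b link removed (simulated failure)."""
    copy = {node: set(neighbors) for node, neighbors in links.items()}
    copy[a].discard(b)
    copy[b].discard(a)
    return copy


# Hypothetical four-host network with two routes from A to D.
links = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}

fixed_route = shortest_path(links, "A", "D")        # computed once, never updated
degraded = without_link(links, "B", "D")            # the B-D link goes down
fixed_still_valid = "D" in degraded["B"]            # does the old route still work?
dynamic_route = shortest_path(degraded, "A", "D")   # recomputed when sending
print(fixed_route, fixed_still_valid, dynamic_route)
```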
- 16.4.3 Packet Strategies
  - Messages are broken up into packets of equal or near-equal size. This facilitates the sharing of the network. It works like time-slicing.
  - Networks can use packets with connection-based protocols or with connectionless protocols.
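
Breaking a message into packets and putting it back together can be sketched as below. The packet layout (sequence number, total count, chunk) and the function names are assumptions made for this illustration; real protocols carry considerably more header information.

```python
import random


def packetize(message, size):
    """Split bytes into (sequence number, total packets, chunk) tuples."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets):
    """Rebuild the message; packets may arrive in any order."""
    ordered = sorted(packets, key=lambda p: p[0])
    total = ordered[0][1]
    if [p[0] for p in ordered] != list(range(total)):
        raise ValueError("missing or duplicate packet")
    return b"".join(chunk for _, _, chunk in ordered)


message = b"The Network is The Computer."
packets = packetize(message, 8)
random.Random(0).shuffle(packets)   # simulate out-of-order arrival
print(reassemble(packets) == message)
```

The sequence numbers are what let the receiver restore the original order when packets take different routes.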
- 16.4.4 Connection Strategies
  - Circuit Switching -- A dedicated physical circuit is set up for each connection. Usually intermediate switches do most of the work without human intervention. (telephone paradigm) -- requires set-up and tear-down -- less expensive for longer messages.
  - Message Switching -- Typically permanent links that are assigned to a process for sending one message, and then reassigned for another message transfer, and so on. (mailbox paradigm)
  - Packet Switching -- messages are divided up into packets. Network software dispatches packets according to a post-office type of paradigm.
- 16.4.5 Contention
  - CSMA/CD -- Carrier Sense Multiple Access with Collision Detection -- Exponential back-off algorithm. (IEEE 802.3 standard) -- You need to limit the amount of traffic on each segment to keep the number of collisions reasonably low. Ethernet is most efficient when the network operates substantially below its capacity data rate.
  - Token Passing -- Pass the token around a ring. The holder of the token is allowed to send a message. If the token is lost the hosts can start again with an election algorithm. Some IBM and HP/Apollo systems use token ring. Token ring is most efficient when networks operate at highest capacity traffic rates.
  - Message Slots -- slots move around a ring and a host may put a message in a slot.
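
The exponential back-off used with CSMA/CD can be sketched as follows. After each collision the sender waits a random number of slot times drawn from a range that doubles with every consecutive collision. The cap of 10 doublings and the limit of 16 attempts are the values commonly cited for 802.3 and should be read as assumptions here; `channel_busy` is a hypothetical hook standing in for the real medium.

```python
import random


def backoff_slots(collisions, rng):
    """Slots to wait after the given number of consecutive collisions."""
    exponent = min(collisions, 10)          # the range stops growing after 10
    return rng.randrange(2 ** exponent)     # uniform in 0 .. 2**exponent - 1


def send_with_backoff(channel_busy, rng, max_attempts=16):
    """channel_busy(attempt) -> True if that attempt collides (hypothetical hook).

    Returns the total number of slots spent waiting, or None if the
    frame is dropped after max_attempts collisions.
    """
    waited = 0
    for attempt in range(1, max_attempts + 1):
        if not channel_busy(attempt):
            return waited
        waited += backoff_slots(attempt, rng)
    return None


rng = random.Random(1)
# Example: the first three attempts collide, the fourth succeeds.
print(send_with_backoff(lambda attempt: attempt <= 3, rng))
```

Because the waiting range grows with each collision, a heavily loaded segment spends more and more time backing off, which is one way to see why the scheme works best well below capacity.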
Section 16.5 -- Communication Protocols

Protocols do things like determine host addresses, establish connections, and assure reliability of communication.

Protocol "software" may actually be implemented in software/firmware/hardware.

Typically protocol software is designed in layers -- each layer on the local host communicates with its peer layer on the remote host.

The International Organization for Standardization (ISO) model of network protocols, known as the OSI model, is none-too-practical but nevertheless is often described to illustrate the concept of layered protocols. There are seven layers:
1. Physical Layer -- concerns physical transmission of bits -- is implemented in the network hardware.
2. Data-link Layer -- responsible for organizing data into frames and making a best-effort attempt at reliable delivery (checksums).
3. Network Layer -- the level at which routers work -- responsible for making connections, addressing & routing packets, maintaining routing information.
4. Transport Layer -- responsible for end-to-end transfer of messages, division of messages into packets, reassembling messages, flow control & effective use of bandwidth.
5. Session Layer -- responsible for implementing communication sessions between processes.
6. Presentation Layer -- responsible for resolving differences in data representation and communication styles.
7. Application Layer -- responsible for interacting directly with users.
Each layer performs some function (and may add header or trailer information) as data moves up and down the protocol stack.
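
The way headers accumulate on the way down the stack and are removed on the way up can be sketched as below. The four-layer list and the bracketed text "headers" are simplifications invented for this example; real headers are binary structures, and some layers also add trailers.

```python
LAYERS = ["application", "transport", "network", "data-link"]


def send_down(payload):
    """Each layer, top to bottom, prepends its own header."""
    frame = payload
    for layer in LAYERS:
        frame = f"[{layer}]".encode() + frame
    return frame


def receive_up(frame):
    """Each layer, bottom to top, checks and strips its peer's header."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


wire = send_down(b"hello")
print(wire)
print(receive_up(wire))
```

Each layer on the receiving host reads only the header written by its peer layer on the sending host, which is the sense in which peer layers communicate with each other.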

TCP/IP has fewer layers -- in theory more difficult to implement but having less overhead.
```
TCP/IP                    OSI

Application               Application, Presentation, Session
Transport (TCP-UDP)       Transport
Internet (IP)             Network
Network Interface         Data Link
Physical                  Physical
```
Section 16.6 -- Robustness

The system must deal with failure of hosts, networks, and links. It must deal with lost and corrupted messages.
- 16.6.1 Failure Detection
  - Often it will be difficult or impossible to tell whether failures are due to a link or host going down, or due to congestion or some other cause.
  - Sites can trade "are you up" messages to determine if communication is still working.
  - Senders of messages can demand acknowledgements
  - Messages may be resent when an appropriate timeout expires.
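
The acknowledgement-and-timeout idea can be sketched with a simulated lossy channel. The channel function, its loss model, and the retry limit are assumptions made for the illustration; a missing acknowledgement stands in for an expired timeout, and, as noted above, the sender cannot tell why it is missing.

```python
import random


def lossy_send(message, loss_rate, rng):
    """Hypothetical channel: returns an acknowledgement, or None if the
    message or its acknowledgement was lost (the sender cannot tell which)."""
    if rng.random() < loss_rate:
        return None
    return ("ACK", message)


def reliable_send(message, loss_rate, rng, max_tries=5):
    """Resend until acknowledged; returns the number of tries used,
    or None if the receiver is presumed unreachable."""
    for attempt in range(1, max_tries + 1):
        ack = lossy_send(message, loss_rate, rng)  # None models a timeout
        if ack == ("ACK", message):
            return attempt
    return None


rng = random.Random(7)
print(reliable_send("are you up?", 0.5, rng))
```

Giving up after a fixed number of tries only tells the sender that communication failed, not whether the cause was a dead host, a dead link, or congestion.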
- 16.6.2 Reconfiguration
  - When links stop working, routers can tell other routers about it so that dynamic routes will detour.
  - If a server becomes unreachable, clients may elect a new server. In a partitioned network the result may be two servers performing conflicting actions.
- 16.6.3 Recovery from Failure
  - If sites periodically try links with test messages then sites can learn when non-functional links once again become operational. Routers can propagate the information to other routers so that all sites become aware and take it into account in their routing algorithms.
  - A site that has been cut off from the network may need to download routing information in order to "catch up."
Section 16.7 -- Design Issues
- In a distributed operating system users access remote resources in exactly the same way as local resources.
- The user should be able to access the same computing environment regardless of what host s/he logs into.
- The system should be fault-tolerant -- able to continue functioning when a subset of its resources fail -- hosts, links, processes, peripherals, servers.
- It is interesting to ask the degree to which future systems will depend on RAID technology to ensure fault-tolerance versus how dependent they will be on higher-level operating-system protocols.
- Distributed systems should be scalable -- react proportionally and not unstably to growth and/or increased load.
- It may be essential that a distributed system be well stocked with spare resources.
- "The service demand from any component of the system should be bounded by a constant that is independent of the number of nodes in the system." If not there is a bound on how large the system can grow without breaking down.
- Autonomy and symmetry are important goals. (Clustering small client-server groups that mostly communicate with each other may approximate autonomy and symmetry sufficiently well.)
- If a service requires blocking system calls, servers using persistent sets of lightweight threads seem to be a good server process model.
Section 16.8 -- An Example: Networking
- LAN's typically use an Address Resolution Protocol (ARP) to translate IP Number <--> Medium Access Address.
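
The IP-to-hardware-address direction of this translation can be sketched as a cache that falls back to a broadcast query on a miss. The class below is a toy model, not an implementation of the real protocol: the `ArpCache` name, the `lan_hosts` table standing in for the hosts that would answer a broadcast, and the addresses are all hypothetical.

```python
class ArpCache:
    def __init__(self, lan_hosts):
        # lan_hosts: hypothetical mapping of IP -> hardware address for hosts
        # on the segment, standing in for whoever would answer a broadcast.
        self._lan_hosts = lan_hosts
        self._cache = {}
        self.broadcasts = 0

    def lookup(self, ip):
        if ip not in self._cache:
            self.broadcasts += 1          # "who has <ip>?" sent to everyone
            mac = self._lan_hosts.get(ip)
            if mac is None:
                return None               # nobody answered
            self._cache[ip] = mac
        return self._cache[ip]


cache = ArpCache({"192.0.2.20": "02:00:00:00:00:20"})
print(cache.lookup("192.0.2.20"), cache.lookup("192.0.2.20"), cache.broadcasts)
```

The second lookup is answered from the cache, so only one broadcast is counted.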
Section 16.9 -- Summary
