(Latest Revision: Sun Sep 19, 2005)
Chapter Sixteen -- Distributed System Structures -- Lecture Notes
Joking in a CACM publication of June 1992, Leslie Lamport stated:
"A distributed system is one in which 
 the failure of a computer you didn't even know existed
 can render your own computer unusable."
 Introduction --  
     
     -  A DISTRIBUTED (computing) SYSTEM is a LOOSELY COUPLED collection
	  of processors.  
          The processors in such a system do not share memory or a clock.  
          Each processor has its own local memory and clock.  
          The processors do communicate via some medium (e.g. busses, a LAN
	  or a WAN) and they cooperate to perform system functions.
	  
          Sun Microsystems' old slogan sums up the idea of a distributed
	  system pretty well:   "The Network is The Computer."
	   
          A distributed system may be highly heterogeneous -- including
	  many different types of computers and devices. 
          Distributed systems:
          
          -  help users share resources and data more easily,
          
 -  help users get work done faster, and
          
 -  provide greater reliability through redundancy. 
          
 
 
          DISTRIBUTED SYSTEMS are to be distinguished from MULTIPROCESSOR
	  SYSTEMS.  MULTIPROCESSOR SYSTEMS are TIGHTLY COUPLED systems
	  which DO share memory and a clock. 
          A distributed file system is a file service "whose users,
	  servers, and storage devices are dispersed among the sites of a
	  distributed system."  
          In a distributed system, there are special problems associated
	  with providing process synchronization, process communication,
	  handling deadlock, and handling system failures.  
      
 Section 16.1 -- Background  
     
     -  16.1.1 Advantages of Distributed Systems  
          The four main reasons for building distributed systems are:
	  RESOURCE SHARING, COMPUTATION SPEED-UP, RELIABILITY, and
	  COMMUNICATION. 
          
          -  16.1.1.1 Resource Sharing  
               Sharing printers and other peripherals, files, and CPU
	       cycles. Sharing expensive special resources physically
	       located at just one or a few sites -- e.g. an electron
	       microscope, supercomputer, or a special database. 
           -  16.1.1.2 Computation Speedup  
               Run jobs on more than one CPU -- get computation speed-up
	       through parallel processing.  Also alleviate heavy loads on
	       one processor by having another processor do some of the
	       work -- off-loading and load-balancing. 
           -  16.1.1.3 Reliability  
               If enough redundancy exists in the system, parts of the
	       system can fail, and useful work can continue.  Is this
	       working in our local environment?  Why, or why not?
	       
               There must be appropriate mechanisms for detecting failure,
	       transferring work to alternates, executing recovery
	       procedures, and re-integrating recovered resources.
	       
           -  16.1.1.4 Communication  
               Given communication capability the network of machines can,
	       in principle, function as one computer.  
               A distributed system is often a cost-effective replacement
	       for a mainframe system, having greater flexibility. 
               Some of the services typically available are file transfer,
	       electronic mail, and remote login.  
           
      -  16.1.2 Types of Distributed Operating Systems   
           
     
          -  16.1.2.1 Network Operating Systems  
               Characteristics of a network operating system: 
               
               -  The user is aware of the multiplicity of machines. 
                    
                -  Use of remote login (e.g. ssh) or file transfer (e.g.
		    ftp) 
                
           -  16.1.2.2 Distributed Operating Systems  
               Characteristics of a distributed operating system: 
                
               -  Location independence: The user accesses remote
		    resources in exactly the same manner as local
		    resources.  
                -  Data Migration: automatic migration of process data
		    between local and remote hosts as needed.  
                    Whole files may be copied over the net, or just the
		    portions that are being read and/or modified -- which
		    method is used is a design decision that may depend on
		    the type of file operations that are anticipated.
		    Differences in data representation used on the machines
		    on the net may make it necessary for file sharing to
		    involve a lot of format conversion. 
                -  Computation Migration: automatic remote procedure call
		    or message passing which causes computation to be
		    carried out remotely and results returned to the local
		    process.  
                    Remote Procedure Call is an important "catch phrase" in
		    distributed systems.  This is a means for one machine
		    to cause a function or program to be run on a remote
		    machine, and for that function or program to send the
		    results of the computation back to the caller over the
		    network.   
                    For example, suppose I log in to altair and do an
		    ls of my home directory.  My home directory
		    /user/dept/john is actually an NFS-mount:
		    eos:/export/home/dept1.  Therefore my ls
		    command causes some I/O to be done on
		    my behalf on the computer "eos." The result of the I/O
		    is the information in my home directory. A process on
		    eos performs the I/O and sends the output back to my
		    ls process on altair, which displays it on the
		    screen. 
                    If one desires to have the results of a computation
		    performed on some data that resides at a remote site,
		    there are a couple of ways to get those results:
		    
                   1.  Transfer the data and perform the computation
		       locally, or 
                   
                   2.  Use remote procedure call to have the computation
		       performed at the remote site, and transfer the
		       RESULTS to the local site. 
                   In many cases, it is more economical to use the second
		   method.  In other cases, the second method may be the
		   ONLY practical way to do the computation. 
                -  Process Migration:  
                    Jobs submitted at one site may be sent to another site
		    for execution.  Below are some motivations for this:
		    
                    
                    -  Load balancing
	
	            
 -  Computation speed-up -- divide the job up and do
		         parallel processing of the parts.
	            
 -  Hardware preference -- special processors more
			 suited to the task may be employed.
	
	            
 -  Software preference -- move the job to a site
			 where the software needed is running.
	            
 -  Data Access -- more economical to move the process
			 to the data than vice-versa.
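
The choice between the two options listed under Computation Migration (transfer the data, or use a remote procedure call and transfer only the results) can be sketched with a rough cost model. Everything in this sketch is an assumption made for illustration: the function names, the "latency plus size divided by bandwidth" model, and the sample figures. It also ignores any difference in computation time between the two sites.

```python
def transfer_seconds(size_bytes, bandwidth_bytes_per_s, latency_s):
    """Time to move one message: fixed latency plus size / bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def cheaper_option(data_bytes, request_bytes, result_bytes,
                   bandwidth_bytes_per_s, latency_s):
    """Compare option 1 (ship the data) with option 2 (ship request + results).

    Returns (name_of_cheaper_option, seconds_option_1, seconds_option_2).
    """
    move_data = transfer_seconds(data_bytes, bandwidth_bytes_per_s, latency_s)
    remote_call = (
        transfer_seconds(request_bytes, bandwidth_bytes_per_s, latency_s)
        + transfer_seconds(result_bytes, bandwidth_bytes_per_s, latency_s)
    )
    name = "move data" if move_data < remote_call else "remote call"
    return name, move_data, remote_call


if __name__ == "__main__":
    # Hypothetical figures: 1 GB of data, 1 MB/s link, 50 ms latency.
    print(cheaper_option(10**9, 200, 1000, 10**6, 0.05))
    # Hypothetical figures: only 100 bytes of data on the same link.
    print(cheaper_option(100, 200, 1000, 10**6, 0.05))
```

Under these assumed figures the remote call wins for the large data set and moving the data wins for the tiny one, which is the trade-off described above.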
                    
 
                
 
           
  
      -  16.1.3 Putting it All Together  
      
 
      
 Section 16.2 -- Topology  
     Topologies may be evaluated based on the following criteria: 
     
     -  Basic cost -- Cost of establishing links 
      -  Communication cost -- Cost needed to send from host A to host B.
          
      -  Availability -- How well does the system tolerate individual link
	  or site failures?  (Listed as "Reliability" in the comparisons
	  below.) 
      
Fully Connected
     Basic cost            High: number of links is N(N-1)/2
     Communication cost    Low: all pairs are directly connected
     Reliability           High

Partially Connected
     Basic cost            Lower than fully connected
     Communication cost    Higher than fully connected
     Reliability           Lower than fully connected
     It is a good idea to build it with at least two routes between each
     pair of nodes.

Hierarchy
     Basic cost            Low: O(N)
     Communication cost    Varies
     Reliability           Low
     Low levels in the hierarchy are very dependent on "ancestors" -- lack
     of redundancy of paths makes for lower reliability.

Star
     Basic cost            Low
     Communication cost    Low
     Reliability           Low
     Equivalent to a hierarchy with only two levels.

Ring
     Basic cost            Low: O(N)
     Communication cost    High: O(N)
     Reliability           Low
     Reliability can be improved by adding more links -- possibly by using
     a "smart hub" that can instantly bypass failed nodes.

Multi-Access Bus
     Basic cost            Low
     Communication cost    Low (if channel has very high capacity)
     Reliability           High (if channel is dependable)
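
The basic-cost entries above can be made concrete by counting links. The sketch below is illustrative only. The function name is made up, the star count assumes the hub is one of the N sites, and the hierarchy is assumed to be a tree.

```python
def link_count(topology, n):
    """Number of point-to-point links needed to connect n sites."""
    if n < 1:
        raise ValueError("need at least one site")
    counts = {
        "fully connected": n * (n - 1) // 2,  # one link per pair of sites
        "hierarchy": n - 1,                   # a tree: one link to each parent
        "star": n - 1,                        # one link from each site to the hub
        "ring": n if n >= 3 else n - 1,       # each site linked to the next one
    }
    return counts[topology]


if __name__ == "__main__":
    for name in ("fully connected", "hierarchy", "star", "ring"):
        print(f"{name:16} N=10: {link_count(name, 10):3}   "
              f"N=100: {link_count(name, 100):5}")
```

Going from 10 to 100 sites multiplies the fully connected link count by 110 (45 to 4950), while the O(N) topologies grow by only about a factor of ten.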
 Section 16.3 -- Network Types   
     
     -  16.3.1 Local-Area Networks  
          LANs provide a way for a set of small computers with individualized
	  applications software to share peripherals and files, and to
	  support communication among users. 
          To a great extent LANs evolved as a more cost-effective and
	  convenient alternative to mainframe computers.  
          LANs tend to have expensive, high-speed links.  
      -  16.3.2 Wide-Area Networks  
          WANs got started during the 1960s, mainly as an academic
	  enterprise for communication and sharing.  The U.S. Department of
	  Defense funded much of the early development, and furnished a
	  backbone network called the ARPANET.  The ARPANET was phased out,
	  except for the portion that supports military installations.  For a
	  while the National Science Foundation took up the responsibility of
	  furnishing a backbone for the academic community.  Nowadays the
	  Internet backbone is maintained by giant communication companies.
	  
          See
           
          http://www.navigators.com/internet_architecture.html#ISP_Backbone 
          
          and
           
          http://www.isoc.org/
          
         to learn all about the Internet. 
          The Internet is public.  Private WANs exist too. 
          WANs tend to have links with lower speeds than LANs.  It would be
	  too expensive to extend LAN technology into WANs. 
      
 Section 16.4 -- Communication   
     
     -  16.4.1 Naming and Name Resolution  
          
          -  Typically hosts have names -- for the benefit of human users.
	       
           -  Software performs "automatic" translation: hostname<-->network
	       numeric address. This is now done with a distributed database
	       -- DNS. 
           -  Numeric addresses are two-part -- one part for the host and
	       another part to identify a particular process on the host.
	       
           
      -  16.4.2 Routing Strategies  
          
          -  Fixed Routing -- hosts use static routing tables. 
           -  Virtual Routing -- hosts establish a route to be used for each
	       "session" or "connection." 
           -  Dynamic Routing -- A host chooses a (possibly new) route each
	       time it sends a message. This helps deal with congestion and
	       failure but allows messages to arrive out of order.  Routers
	       communicate with each other about the existence of routes.
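
The difference between fixed and dynamic routing can be sketched as follows. Fixed routing would compute a route once and keep it, while dynamic routing recomputes the route against the current set of working links each time. The breadth-first search, the function name, and the five-site network are assumptions chosen for illustration. Real routers exchange routing information using more elaborate protocols.

```python
from collections import deque


def shortest_route(links, src, dst):
    """Breadth-first search for a fewest-hops route; None if unreachable."""
    neighbors = {}
    for a, b in links:                      # links are bidirectional
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    previous = {src: None}                  # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            route = []
            while node is not None:         # walk back to the source
                route.append(node)
                node = previous[node]
            return route[::-1]
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    links = {("A", "B"), ("B", "D"), ("A", "C"), ("C", "E"), ("E", "D")}
    print(shortest_route(links, "A", "D"))      # route while all links work
    links.discard(("B", "D"))                   # link B-D goes down
    print(shortest_route(links, "A", "D"))      # recomputed route detours
```

With a static table, traffic from A to D would keep being sent toward the failed B-D link. Recomputing per message finds the detour through C and E.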
	       
           
      -  16.4.3 Packet Strategies   
          
          -  Messages are broken up into packets of equal or near-equal
	       size.  This facilitates the sharing of the network.  It works
	       like time-slicing.  
           -  Networks can use packets with connection-based protocols or
	       with connectionless protocols.  
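
A minimal sketch of packetizing, assuming each packet carries a sequence number so that the receiver can rebuild the message even when packets arrive out of order. The function names and the 8-byte payload size are made up for the example.

```python
def to_packets(message: bytes, payload_size: int):
    """Split a message into (sequence_number, payload) packets."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    return [
        (seq, message[start:start + payload_size])
        for seq, start in enumerate(range(0, len(message), payload_size))
    ]


def reassemble(packets):
    """Rebuild the message, whatever order the packets arrived in."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)


if __name__ == "__main__":
    message = b"The Network is The Computer."
    packets = to_packets(message, 8)
    print(packets)
    # Simulate out-of-order arrival by reversing the packet list.
    print(reassemble(reversed(packets)))
```

All packets except possibly the last have the same size, which is what lets the network interleave packets from different senders in a way that resembles time-slicing.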
           
      -  16.4.4 Connection Strategies   
          
          -  Circuit Switching -- A dedicated physical circuit is set up
	       for each connection.  Usually intermediate switches do most of
	       the work without human intervention.  (telephone paradigm) --
	       requires set-up and tear-down -- less expensive for longer
	       messages.  
           -  Message Switching -- Typically permanent links that are
	       assigned to a process for sending one message, and then
	       reassigned for another message transfer, and so on.  (mailbox
	       paradigm) 
           -  Packet Switching -- messages are divided up into packets.
	       Network software dispatches packets according to a post-office
	       type of paradigm.  
           
      -  16.4.5 Contention 
          
          
          -  CSMA/CD -- Carrier Sense Multiple Access with Collision
	       Detection -- Exponential back-off algorithm. (IEEE 802.3
	       standard) -- You need to limit the amount of traffic on each
	       segment to keep the number of collisions reasonably low.
	       Ethernet is most efficient when the network carries
	       substantially less traffic than its capacity.  
           -  Token Passing -- Pass the token around a ring.  The holder of
	       the token is allowed to send a message.  If the token is lost
	       the hosts can start again with an election algorithm.
	       Some IBM and HP/Apollo systems use token ring.  Token ring is
	       most efficient when the network operates near its full
	       capacity.  
           -  Message Slots -- slots move around a ring and a host may put a
	       message in a slot.  
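
The exponential back-off algorithm mentioned under CSMA/CD can be sketched as below. This follows the commonly described truncated binary exponential back-off: after the n-th consecutive collision a station waits a random number of slot times between 0 and 2^min(n,10) - 1, and gives up after 16 attempts. The function name and the use of None to mean "give up" are choices made for this sketch.

```python
import random

MAX_EXPONENT = 10   # the back-off window stops doubling after 10 collisions
MAX_ATTEMPTS = 16   # the frame is abandoned after 16 failed attempts


def backoff_slots(collisions, rng=random):
    """Slot times to wait after `collisions` consecutive collisions.

    Returns None when the sender should give up on the frame.
    """
    if collisions < 1:
        raise ValueError("collisions must be at least 1")
    if collisions >= MAX_ATTEMPTS:
        return None
    window = 2 ** min(collisions, MAX_EXPONENT)
    return rng.randrange(window)    # uniform over 0 .. window - 1


if __name__ == "__main__":
    rng = random.Random(2005)       # seeded only to make the demo repeatable
    for n in (1, 2, 3, 10, 12, 16):
        print(n, backoff_slots(n, rng))
```

Doubling the window after each collision spreads retransmissions out as congestion grows, which is why collisions stay manageable only while the segment is lightly loaded.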
           
      
  Section 16.5 -- Communication Protocols  
      Protocols do things like determine host addresses, establish
      connections, and assure reliability of communication. 
      Protocol "software" may actually be implemented in
      software/firmware/hardware.  
      Typically protocol software is designed in layers -- each layer on the
      local host communicates with its peer layer on the remote host.
      
      The International Organization for Standardization (ISO) model of
      network protocols, known as the OSI model, is none-too-practical but
      nevertheless is often described to illustrate the concept of layered
      protocols.  There are seven layers:
      
     
     -  Physical Layer -- concerns physical transmission of bits -- is
	  implemented in the network hardware.  
      -  Data-link Layer -- responsible for organizing data into frames and
	  performing best-effort at reliable delivery (checksums). 
      -  Network Layer -- the level at which routers work -- responsible for
	  making connections, addressing & routing packets, maintaining
	  routing information.  
      -  Transport Layer -- responsible for end-to-end transfer of messages,
	  division of messages into packets, reassembling messages, flow
	  control & effective use of bandwidth. 
      -  Session Layer -- responsible for implementing communication
	  sessions between processes.  
      -  Presentation Layer -- responsible for resolving differences in data
	  representation and communication styles.  
      -  Application Layer -- responsible for interacting directly with
	  users.  
      
     Each layer performs some function (and may add header or trailer
     information) as data moves up and down the protocol stack. 
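
The way headers accumulate on the way down the stack and are removed on the way up can be sketched with strings. This is a toy model: the four layer names and the bracketed "headers" are assumptions for illustration and do not correspond to any real header format.

```python
LAYERS = ["application", "transport", "network", "data-link"]  # top to bottom


def send_down(payload: str) -> str:
    """Each layer, from the top down, wraps the data in its own header."""
    for layer in LAYERS:
        payload = f"[{layer}]{payload}"
    return payload


def receive_up(frame: str) -> str:
    """Each layer, from the bottom up, strips the header its peer added."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


if __name__ == "__main__":
    frame = send_down("hello")
    print(frame)              # [data-link][network][transport][application]hello
    print(receive_up(frame))  # hello
```

The outermost header belongs to the lowest layer, so each layer on the receiving host sees exactly what its peer layer on the sending host produced.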
     TCP/IP has fewer layers than the OSI model -- in theory this makes it
     more difficult to implement, but it has less overhead. 
TCP/IP                    OSI
Application               Application, Presentation, Session
Transport (TCP, UDP)      Transport
Internet (IP)             Network
Network Interface         Data Link
Physical                  Physical
  Section 16.6 -- Robustness  
     The system must deal with failure of hosts, networks, and links.  It
     must deal with lost and corrupted messages. 
     
     -  16.6.1 Failure Detection  
          
          -  Often it will be difficult or impossible to tell whether
	       failures are due to a link or host going down, or due to
    	       congestion or some other cause. 
 
            -  Sites can trade "are you up" messages to determine if
	        communication is still working.  
           -  Senders of messages can demand acknowledgements 
               
           -  Messages may be resent when an appropriate timeout expires.
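
The "are you up" exchange and the timeout idea above can be sketched as a small bookkeeping class. The class and method names are hypothetical, and time is passed in explicitly so the example is deterministic. Note that the detector can only say a site is suspected: as pointed out above, silence may mean a failed host, a failed link, or just congestion.

```python
class FailureDetector:
    """Tracks when each site was last heard from and flags silent ones."""

    def __init__(self, timeout):
        self.timeout = timeout      # seconds of silence tolerated
        self.last_heard = {}        # site name -> time of last message

    def heard_from(self, site, now):
        """Record an 'I am up' reply or an acknowledgement from a site."""
        self.last_heard[site] = now

    def suspected(self, now):
        """Sites that have been silent for longer than the timeout."""
        return sorted(
            site for site, t in self.last_heard.items()
            if now - t > self.timeout
        )


if __name__ == "__main__":
    detector = FailureDetector(timeout=3.0)
    detector.heard_from("eos", now=0.0)
    detector.heard_from("altair", now=2.0)
    print(detector.suspected(now=4.0))   # ['eos']
    detector.heard_from("eos", now=4.5)  # a late reply clears the suspicion
    print(detector.suspected(now=5.0))   # []
```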
	       
           
      -  16.6.2 Reconfiguration  
          
          -  When links stop working, routers can tell other routers about
	       it so that dynamic routes will detour.  
           -  If a server becomes unreachable, clients may elect a new
	       server.  In a partitioned network the result may be two
	       servers performing conflicting actions. 
           
      -  16.6.3 Recovery from Failure  
          
          -  If sites periodically try links with test messages then sites
	       can learn when non-functional links once again become
	       operational.  Routers can propagate the information to other
	       routers so that all sites become aware and take it into
	       account in their routing algorithms. 
           -  A site that has been cut off from the network may need to
	       download routing information in order to "catch up."  
           
      
  Section 16.7 -- Design Issues   
     
     -  In a distributed operating system users access remote
	  resources in exactly the same way as local resources. 
      -  The user should be able to access the same computing environment
	  regardless of what host s/he logs into.  
      -  The system should be fault-tolerant -- able to continue
	  functioning when a subset of its resources fails -- hosts, links,
	  processes, peripherals, servers.  
      -  It is interesting to ask the degree to which future systems will
	  depend on RAID technology to ensure fault-tolerance versus how
	  dependent they will be on higher-level operating-system protocols.
	  
      -  Distributed systems should be scalable -- react
	  proportionally and not unstably to growth and/or increased load.
	  
      -  It may be essential that a distributed system be well stocked with
	  spare resources.  
      -  "The service demand from any component of the system should be
	  bounded by a constant that is independent of the number of nodes in
	  the system."  If not, there is a bound on how large the system can
	  grow without breaking down. 
      -  Autonomy and symmetry are important goals. (Clustering small
	  client-server groups that mostly communicate with each other may
	  approximate autonomy and symmetry sufficiently well.) 
      -  If a service requires blocking system calls, servers using
	  persistent sets of lightweight threads seem to be a good server
	  process model. 
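
The last point, a server built on a persistent set of lightweight threads, can be sketched with Python's standard thread pool. The class name, the handler, and the use of a short sleep as a stand-in for a blocking system call are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request):
    """Serve one request; the sleep stands in for a blocking system call."""
    time.sleep(0.05)
    return f"done: {request}"


class PooledServer:
    """Keeps one pool of worker threads alive across all requests."""

    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, request):
        """Hand the request to an idle worker; returns a Future."""
        return self.pool.submit(handle_request, request)

    def shutdown(self):
        self.pool.shutdown(wait=True)


if __name__ == "__main__":
    server = PooledServer(workers=4)
    futures = [server.submit(i) for i in range(8)]
    print([f.result() for f in futures])
    server.shutdown()
```

While one worker is blocked, the others keep serving, and because the threads persist the server avoids the cost of creating a new thread or process for every request.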
      
  Section 16.8 -- An Example: Networking  
     
     -  LANs typically use an Address Resolution Protocol (ARP) to
	  translate IP address <--> Medium Access Control (MAC) address.  
      
 Section 16.9 -- Summary