(Latest Revision: Tue Nov 22 01:38 PST 2016)
Chapter Eleven -- File-System Interface -- Lecture Notes
     -  The file system consists of files and a directory structure
          
 -  11.0 Objectives 
      
     -  Explain the function of file systems
     
 -  Describe the interfaces to file systems
     
 -  Discuss file-system design tradeoffs, including
          access methods, file sharing, file locking, and 
          directory structures
     
 -  Explore file-system protection
     
 
 -  11.1 File Concept 
     
     -  A file is a logical unit of storage - the smallest allotment
          of logical secondary storage - a named collection of 
          related information that is recorded on secondary storage
          - a clinking, clanking, clattering collection
          of caliginous junk, as it were.  
      -  11.1.1 File Attributes 
           
          -  Typical file attributes include name, 
               unique identifier, type, location, size, 
               protection, time, date, and user (owner) 
               identification. 
           -  File attributes are stored on secondary memory,
               somewhere in the file system directory structure.
               The details of how the information is stored vary
               from system to system. 
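            -  As a rough illustration, the sketch below uses Python's
                os.stat call to read back some of the attributes listed
                above for a scratch file. The helper name describe_file
                and the file name notes.txt are made up for this example,
                and the exact attributes available vary by system.

```python
import os
import stat
import tempfile


def file_type(mode):
    """Classify a file from the type bits of its mode word."""
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


def describe_file(path):
    """Return a few of the attributes the OS keeps for the file at path."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),              # kept in the directory entry
        "identifier": info.st_ino,                   # inode number on Unix-like systems
        "type": file_type(info.st_mode),
        "size": info.st_size,                        # in bytes
        "protection": stat.filemode(info.st_mode),   # e.g. '-rw-------'
        "owner": info.st_uid,
        "modified": info.st_mtime,                   # seconds since the epoch
    }


# Demonstration on a scratch file (hypothetical name).
with tempfile.TemporaryDirectory() as scratch:
    demo_path = os.path.join(scratch, "notes.txt")
    with open(demo_path, "wb") as f:
        f.write(b"hello")
    attrs = describe_file(demo_path)
    print(attrs)
```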
           
 
 
      -  11.1.2 File Operations 
           
          -  File operations are implemented with system calls
                
           -  Typical operations are create file, write to file, 
               read from file, reposition file pointer, delete file, 
               and truncate file. 
           -  Many systems require that files be opened before use.
               The open operation places directory information about 
               the file into an open file table data structure in 
               primary memory. That way, processes will be able to 
               make a lot of accesses to the file without needing to
               fetch directory information from the disk each time.
               
           -  In systems that allow multiple processes to have a file open
               simultaneously, it is customary to have two levels of 
               open file tables - a single system-wide table, and multiple
               per-process open file tables. Each per-process table holds information
               having to do with the particular process' use of the file,
               such as the current read and write positions of the process
               in the file. The entry for a file in the per-process table 
               contains a pointer to the entry for the file in the 
               system-wide table.  
               Process-independent things like location of the
               file on disk, access dates, file size, and file open
               count are contained in 
               the file's entry in the system-wide file table. 
           -  File-locking operations may be available.  Locks may be
               shared or exclusive, and they may be mandatory or 
               advisory.
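            -  The typical operations listed above map fairly directly
                onto system calls. The sketch below walks through them
                using the thin wrappers in Python's os module; the file
                name demo.dat is an assumption made for the example.

```python
import os
import tempfile

scratch = tempfile.mkdtemp()
path = os.path.join(scratch, "demo.dat")   # hypothetical file name

# Create and open: returns a small integer (file descriptor) that
# indexes this process's open file table.
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

os.write(fd, b"abcdefghij")        # write: file pointer advances to byte 10
os.lseek(fd, 3, os.SEEK_SET)       # reposition the file pointer to byte 3
middle = os.read(fd, 4)            # read 4 bytes starting at byte 3
os.ftruncate(fd, 5)                # truncate: keep only the first 5 bytes
size_after_truncate = os.fstat(fd).st_size

os.close(fd)                       # release the open file table entry
os.remove(path)                    # delete the file
exists_after_delete = os.path.exists(path)
os.rmdir(scratch)

print(middle, size_after_truncate, exists_after_delete)
```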
           
 
 
      -  11.1.3 File Types 
           
          -  There are various file types, such as text, binary, and
               executable.  Often filename extensions are used to indicate
               the type of a file.  Most file types are not fully supported
               by the operating system.  
           -  It is common for the OS to treat most files simply as an
               unstructured sequence of bytes. 
           
 
 
      -  11.1.4 File Structure 
           
          -  Each OS must fully support at least one executable file type, 
               so that the OS can load and execute programs.
           
 
 
      -  11.1.5 Internal File Structure 
           
          -  All basic disk I/O is performed in units of physical
               blocks, traditionally the 512-byte data sections
               of disk sectors. 
           -  Files are sequences of logical records that must be mapped
               to file blocks.  Applications may do the mapping, or the 
               OS may do it. 
           -  All files are allocated in whole numbers of physical blocks,
               and therefore all file systems suffer from internal 
               fragmentation. 
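            -  The arithmetic behind internal fragmentation can be
                sketched as follows, assuming the 512-byte block size
                mentioned above. The function names are chosen for this
                example only.

```python
def blocks_needed(file_size, block_size=512):
    """Whole number of blocks required to hold file_size bytes."""
    return -(-file_size // block_size)   # ceiling division


def internal_fragmentation(file_size, block_size=512):
    """Bytes allocated to the file but not used (the tail of the last block)."""
    return blocks_needed(file_size, block_size) * block_size - file_size


# A 1,949-byte file needs 4 blocks (2,048 bytes), leaving 99 bytes unused.
print(blocks_needed(1949), internal_fragmentation(1949))
```

                A file whose size is an exact multiple of the block size
                wastes nothing; every other file wastes part of its
                last block.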
            
 
      
 -  11.2 Access Methods 
     
     -  11.2.1 Sequential Access 
           
          -  Sequential access is the simplest and most 
               common file access method 
           -  The file is read as a sequence from start to finish -
               as a tape would be read. 
           -  Writing is similar: each new item is appended to the end. 
           -  Typical operations are "read next" and "write next" 
           -  A pointer to the current location is maintained 
           -  Repositioning of the pointer (seek) may be supported 
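            -  A toy in-memory model of sequential access is sketched
                below. The class SequentialFile and its method names are
                invented for illustration; the only repositioning it
                supports is a rewind to the start, as with a tape.

```python
class SequentialFile:
    """Toy model of a sequentially accessed file of records."""

    def __init__(self):
        self._records = []
        self._position = 0          # the current-location pointer

    def write_next(self, record):
        """Append a record at the end and leave the pointer after it."""
        self._records.append(record)
        self._position = len(self._records)

    def read_next(self):
        """Return the record at the pointer and advance, or None at end of file."""
        if self._position >= len(self._records):
            return None
        record = self._records[self._position]
        self._position += 1
        return record

    def rewind(self):
        """Reposition the pointer to the start of the file."""
        self._position = 0


tape = SequentialFile()
for item in ("first", "second", "third"):
    tape.write_next(item)
tape.rewind()
print(tape.read_next(), tape.read_next())
```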
           
 
 
      -  11.2.2 Direct Access 
           
          -  Direct access allows the user to treat the file as if it
               were an array of file blocks. 
           -  Typical operations are of the form read block #n or 
               write block #m 
           -  Block numbers are usually logical addresses that run
               from 0 contiguously to some upper limit. 
           -  Assume that the first byte in the file is numbered 0,
               that the file is a sequence of records of size L bytes,
               that the records are numbered starting at 0,
               and that we want to request the Nth record in the file.
               In that case we compute the starting byte number N*L
               and fetch the L bytes of the file starting with byte N*L.
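            -  The N*L computation above can be sketched with an
                in-memory file. The helper name read_record and the
                8-byte record layout are assumptions made for the example.

```python
import io


def read_record(f, n, record_size):
    """Fetch record number n (counting from 0) from a file of fixed-size records."""
    f.seek(n * record_size)              # starting byte number N*L
    data = f.read(record_size)           # the L bytes of the record
    if len(data) != record_size:
        raise IndexError("record %d is past the end of the file" % n)
    return data


# Four 8-byte records: b"rec00000", b"rec00001", ...
L = 8
records_file = io.BytesIO(b"".join(b"rec%05d" % i for i in range(4)))
print(read_record(records_file, 2, L))
```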
           
 
 
      -  11.2.3 Other Access Methods 
           
          -  Other access methods can be built on direct access, often
               using an index to look up file block numbers and 
               then using direct access. 
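            -  One possible shape for an indexed access method is
                sketched below: an in-memory index maps a key to a block
                number, and the block is then fetched by direct access.
                The 16-byte block size, the one-record-per-block layout,
                and the function names are all assumptions made to keep
                the example short.

```python
import io

BLOCK_SIZE = 16   # assumed block size for this sketch


def build_file(records):
    """Store one (key, payload) record per block; return (file, index)."""
    data_file = io.BytesIO()
    index = {}
    for block_number, (key, payload) in enumerate(records):
        index[key] = block_number
        # Pad (or cut) the payload so that it fills exactly one block.
        data_file.write(payload.ljust(BLOCK_SIZE, b"\0")[:BLOCK_SIZE])
    return data_file, index


def lookup(data_file, index, key):
    """Use the index to find the block number, then read that block directly."""
    block_number = index[key]
    data_file.seek(block_number * BLOCK_SIZE)
    return data_file.read(BLOCK_SIZE).rstrip(b"\0")


data_file, index = build_file([(b"smith", b"Smith,42"), (b"jones", b"Jones,17")])
print(lookup(data_file, index, b"jones"))
```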
           
 
      
 -  11.3 Directory and Disk Structure 
     
     -  An entity containing a file system may be referred to as a 
          volume.  A volume may be thought of as a 
          virtual disk. 
      -  A disk can contain one volume, or
          be partitioned and contain multiple volumes.   
      -   Some partitions may contain swap space, or raw disk space.
           
      -   A single volume may also be created as some sort of union
           of multiple disks and/or partitions. 
      -   A volume needs to have certain data structures for use in 
           implementing its file system, notably a 
           device directory (aka the directory),
           which contains such things as names, locations, sizes, and types
           of all files on the volume. 
 
      -  11.3.1 Storage Structure 
          
          -  There can be many file systems of differing types on a 
               computing system.  
           -  Solaris has ufs and zfs as general-purpose file systems
               
           -  Solaris special purpose file systems include tmpfs, 
               objfs, ctfs, lofs, and procfs 
          
 
      -  11.3.2 Directory Overview 
          
          -  The directory is basically a table for looking up
               information about files, using the name of the file
               as the lookup key. 
           -  The directory needs to support certain operations: 
               search for file, create file, delete file, list directory,
               rename file, and traverse file system.
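            -  Viewed as a lookup table keyed on file name, a directory
                might be modeled as in the sketch below. The class and
                method names are invented for the example, traversal of
                the file system is left out, and real directories are
                on-disk structures rather than Python dictionaries.

```python
class Directory:
    """Toy directory: a table from file name to information about the file."""

    def __init__(self):
        self._entries = {}

    def create(self, name, info):
        if name in self._entries:
            raise FileExistsError(name)
        self._entries[name] = info

    def search(self, name):
        """Return the information stored for name, or None if absent."""
        return self._entries.get(name)

    def delete(self, name):
        del self._entries[name]

    def rename(self, old_name, new_name):
        if new_name in self._entries:
            raise FileExistsError(new_name)
        self._entries[new_name] = self._entries.pop(old_name)

    def listing(self):
        return sorted(self._entries)


d = Directory()
d.create("a.txt", {"size": 10})
d.create("b.txt", {"size": 20})
d.rename("a.txt", "c.txt")
print(d.listing())
```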
          
 
      -  11.3.3 Single-Level Directory 
          
          -  In this scheme the directory works like a single list
               of entries.  Even if there are multiple users, no two
               files are allowed to have the same name.
          
 
      -  11.3.4 Two-Level Directory 
          
          -  In a two-level directory structure, there is a 
               master file directory that has multiple sub-directories.
               
           -  Each user on a computing system can be assigned his or her
               own 'home' directory.  Users can name files whatever 
               they like without fear of collisions with the filenames
               of other users. 
           -  If a user name and file name within the user's directory are
               specified, this "pathname" uniquely determines which file it is.
               
           -  The OS assumes that a filename without a user name refers to the 
               user's own directory, or to a special directory that contains
               the system files (e.g. programs that are user shell 
               commands) 
           -  The sequence of directories searched when a file is named is
               called a search path.
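            -  Name resolution in a two-level directory can be sketched
                as below. The "user/file" pathname syntax, the directory
                name "system", and the function name resolve are
                assumptions made for the example.

```python
def resolve(master_directory, name, current_user, search_path=("system",)):
    """Return (user, filename) for a name in a two-level directory.

    master_directory maps each user name to that user's directory, which
    in turn maps file names to file contents.
    """
    if "/" in name:
        # A full pathname: the user name is given explicitly.
        user, filename = name.split("/", 1)
        candidates = [user]
    else:
        # No user name: try the user's own directory, then the search path.
        filename = name
        candidates = [current_user, *search_path]
    for user in candidates:
        if filename in master_directory.get(user, {}):
            return user, filename
    raise FileNotFoundError(name)


master = {
    "alice": {"report": "alice's report"},
    "bob": {"report": "bob's report"},
    "system": {"ls": "directory lister"},
}
print(resolve(master, "report", "alice"), resolve(master, "ls", "alice"))
```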
          
 
      -  11.3.5 Tree-Structured Directories 
          
          -  The tree-structured directory is a generalization of the
               two-level directory structure that allows users to create
               their own tree of subdirectories, and to use this structure
               to group and organize their files. 
           -  The tree has a root, and every file or directory has a unique
               pathname that starts with the root. 
           -  Processes can typically "move around" in the tree, by 
               using a system call to specify which directory is their
               current working directory.  
           -  The accounting file (e.g. passwd file) of a user 
               typically designates which directory should initially
               be made the current working directory when the user 
               logs in. 
           -  Pathnames can be absolute or relative. 
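            -  The difference between absolute and relative pathnames
                can be sketched with Python's posixpath module, which
                manipulates Unix-style path strings without touching the
                disk. The function name resolve_path and the example
                paths are made up for illustration.

```python
import posixpath


def resolve_path(current_working_directory, pathname):
    """Turn a pathname into an absolute pathname that starts at the root."""
    if posixpath.isabs(pathname):
        # Absolute: already starts at the root.
        return posixpath.normpath(pathname)
    # Relative: interpreted starting from the current working directory.
    return posixpath.normpath(posixpath.join(current_working_directory, pathname))


print(resolve_path("/home/alice", "notes/ch11.txt"))
print(resolve_path("/home/alice", "../bob/todo"))
print(resolve_path("/home/alice", "/etc/passwd"))
```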
          
 
      -  11.3.6 Acyclic-Graph Directories 
          
          -  This structure allows directories to share a file 
               or subdirectory.  By definition, this is not possible 
               in a tree. 
           -  Shared files and subdirectories may be implemented 
               through the use of (symbolic) links
               [aka soft links].  A symbolic link
               may be thought of as a file that contains a path name.
               The directory entry of the symbolic link has
               a special bit value set that marks the file as
               a link rather than an ordinary file.  For example,
               if /x/y is a file that we wish to share, we can
               put a symbolic link in /z containing the pathname
               "/x/y" and name the symbolic link (file) r.  
               Then all references to /z/r 
               will access the same file as /x/y. 
           -  Going on with the previous example, the original 
               directory entry for /x/y is sometimes referred to 
               as a hard link.  It is just an ordinary
               directory entry, typically consisting of the name
               of the file (y, in this case) and the address on disk of the 
               directory information for the file.  
           -  Another way to implement the sharing of the file
               would be to use another hard link - an entry in the
               /z directory that contains the other name 
               (r, in this case), plus the same address 
               on disk of the directory information
               for the original file (known as /x/y).  This gives us 
               two separate directory entries that point to the
               same file on disk.
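            -  The /x/y example can be tried out with the sketch below,
                which builds the x and z directories under a scratch
                directory on a Unix-like system. The hard link is named
                r2 here (an assumption) only so that it can coexist with
                the symbolic link r.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "x"))
    os.makedirs(os.path.join(root, "z"))

    original = os.path.join(root, "x", "y")
    with open(original, "w") as f:
        f.write("shared data")

    # Symbolic (soft) link: a small file containing the pathname of x/y.
    soft = os.path.join(root, "z", "r")
    os.symlink(original, soft)

    # Hard link: a second directory entry that refers to the same file.
    hard = os.path.join(root, "z", "r2")
    os.link(original, hard)

    soft_is_link = os.path.islink(soft)      # marked as a link
    hard_is_link = os.path.islink(hard)      # an ordinary directory entry
    same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
    link_count = os.stat(original).st_nlink  # directory entries for the file
    with open(soft) as f:
        via_soft = f.read()
    with open(hard) as f:
        via_hard = f.read()

print(soft_is_link, hard_is_link, same_inode, link_count)
```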
          
 
      -  11.3.7 General Graph Directory 
          
      
 -  11.4 File-System Mounting 
     
     -   Typically there is a need for ways to unify a collection
           of volumes into a single logical file system.  Mounting
	   is a term that refers to the incorporation of
	   one file system (sub)structure into another. In the unix
	   operating system, there is one unified file system in the form of
	   a rooted acyclic graph.  Substructures are integrated into the
	   file system using the unix mount command. 
	   
	   Implementation of mount requires insertion
	   of link(s) into the directory structure, and usually the
	   OS has to make some sort of special notation so that it
	   treats that link  properly.  For example, if it is
	   a link to a separate file system, it should not be mistaken
	   for a link to a file or a directory.
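      -   One way to picture the effect of mounting is a table that
           records which volume is attached at which directory. The
           sketch below uses such a table and picks the longest matching
           mount point for a pathname. This is a simplified stand-in for
           the special directory links described above, not a
           description of how any particular kernel does it, and the
           volume names are made up.

```python
import posixpath


def locate(mount_table, pathname):
    """Return (volume, pathname within that volume) for an absolute pathname.

    mount_table maps mount points (absolute pathnames) to volume names.
    """
    path = posixpath.normpath(pathname)
    best = None
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the deepest mount point that covers the path.
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise FileNotFoundError(pathname)
    return mount_table[best], "/" + path[len(best):].lstrip("/")


mounts = {"/": "root_volume", "/home": "home_volume"}   # hypothetical volumes
print(locate(mounts, "/home/alice/f"))
print(locate(mounts, "/etc/passwd"))
```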
      
 -  11.5 File Sharing 
     
     -  There are potential advantages to allowing multiple processes
          and/or users to share files.  However there can be challenging
	  implementation issues such as how to handle concurrent writes to
          shared files.	  
      -  11.5.1 Multiple Users 
          
          -  When there are multiple users on a system, the OS has to
	       do something to prevent the users from doing harmful things
	       to each other's files, or misusing file contents.  This is
	       the problem of access control and protection.  
	   -  Oftentimes the OS keeps track of the owner
	       and group of each file and directory.
               The OS allows the owner of the file to control access
	       to the file.  A group is assigned to the file in order
	       to designate a set of users who are allowed to share
	       (usually limited) access to the file.
	       
	   -  There has to be a
	       data structure where the attributes
	       of each file are stored, such as owner, group, file
	       permissions, time and date of creation, size, and so forth.
	       
	       
           -  When a process attempts to access a file or directory,
	       the OS can check the attributes to determine whether to
	       allow the access. 
          
 
      -  11.5.2 Remote File Systems 
          
          -  Files can be shared using networks.  The two main
	       approaches are file transfer (used by ftp and the WWW)
	       and distributed file systems (DFS), such as the Network File
	       System (NFS). 
	   -  11.5.2.1 The Client-Server Model
               
               -  A DFS allows client computers to mount (via a network)
	            directories or file systems that reside on a
		    remote file server computer. 
	            
		    
                -  Care must be taken to assure that clients and servers
	            are properly authenticated lest unauthorized accesses
		    occur.
               
 
	       
           -  11.5.2.2 Distributed Information Systems 
               
	       -  Distributed information systems such as Microsoft's
	       Active Directory and the Lightweight
	       Directory Access Protocol (LDAP) of the Internet
	       Engineering Task Force (IETF) are used to create network
	       accounts that have uniform user names, user id numbers, and
	       passwords for all hosts in, say, an office network (or
	       computer science laboratory network). Such systems provide
	       secure user authentication. 
	       
 
	   -  11.5.2.3 Failure Modes 
               
               -  The rules of how most DFSs work say that if the network
	            'goes down' during the execution of a file operation,
		    the system should wait/delay until the network begins
		    to function properly again.    
	        -  NFS has a stateless protocol.  If the network
	            is interrupted or if a server crashes, service
	            will be restored eventually when the network or
		    the server comes back up. The client is
		    programmed to repeat its requests for file service
		    until it gets a response.  Each request contains all the
		    information the server needs.  Therefore, despite a
		    network or server crash, file service continues
		    normally, except that there is a delay.
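                -  The retry behavior can be simulated in a few lines.
                    The sketch below is not the NFS protocol itself: the
                    class FlakyServer, the request fields, and the function
                    read_with_retry are invented to show the idea that each
                    request is self-contained and is simply resent until a
                    response arrives.

```python
class FlakyServer:
    """Stand-in for a file server behind a network that is down for a while."""

    def __init__(self, files, failures_before_recovery):
        self._files = files
        self._failures_left = failures_before_recovery

    def handle(self, request):
        if self._failures_left > 0:
            self._failures_left -= 1
            raise ConnectionError("no response")
        # The server keeps no per-client state: everything it needs
        # (which file, where, how much) is in the request itself.
        data = self._files[request["file"]]
        start = request["offset"]
        return data[start:start + request["count"]]


def read_with_retry(server, file, offset, count):
    """Resend the same self-contained request until the server answers."""
    request = {"op": "read", "file": file, "offset": offset, "count": count}
    attempts = 0
    while True:
        attempts += 1
        try:
            return server.handle(request), attempts
        except ConnectionError:
            continue   # a real client would wait before resending


server = FlakyServer({"a": b"0123456789"}, failures_before_recovery=3)
result = read_with_retry(server, "a", offset=2, count=4)
print(result)
```

                    Because the request carries its own offset, resending
                    it after a crash gives the same answer as sending it
                    once; the client only experiences a delay.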
               
 
           
      -  11.5.3 Consistency Semantics 
          
	  -  11.5.3.1 Unix Semantics 
               
               -  Writes to an open file are visible immediately to
	            other users who have this file open.
		    
	        -  There is a mode of unix file sharing in which the file
	            pointer is shared while writing to the file, so when one
	            process writes a byte to the file, the write position
		    in the file moves down one byte for all the processes
		    that have the file open in that mode.
	            
		    
                -  The file semantics are that all the processes share
	            a single copy of the file.  When two or more processes
		    attempt to access the file concurrently, the contention
		    can cause delay.
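                -  Both behaviors can be observed on a Unix-like system
                    with the sketch below. It assumes that duplicating a
                    descriptor with os.dup is an acceptable stand-in for
                    the shared-pointer mode described above; the file name
                    shared.txt is made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "shared.txt")   # hypothetical file name

    writer = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    reader = os.open(path, os.O_RDONLY)   # separate open: its own file pointer
    shared = os.dup(writer)               # duplicate: shares writer's file pointer

    os.write(writer, b"abc")
    # The write is visible at once through the other open of the file.
    seen_by_reader = os.read(reader, 10)

    # Writing through the duplicate continues where writer left off,
    # because the two descriptors share one file pointer.
    os.write(shared, b"def")
    writer_offset = os.lseek(writer, 0, os.SEEK_CUR)
    rest_seen_by_reader = os.read(reader, 10)

    for fd in (writer, reader, shared):
        os.close(fd)

print(seen_by_reader, writer_offset, rest_seen_by_reader)
```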
	            
                
	   -  11.5.3.2 Session Semantics 
               
	       -  The session semantics used in the Andrew file system
	            (OpenAFS) are different from unix semantics.
	            
	        -  When a user X writes to an open file, that write is not
	            immediately visible to other users that have the file
		    open. 
		    
                -  The change made by X will not be seen by other users
	            until after X closes the file.  Even then, only
		    users who open the file after  X closes
		    it will see the change X made. 
	       
                -  Session semantics support the view that there may be
	            multiple copies of the file in existence concurrently,
		    and concurrent reads and/or writes can proceed
		    immediately without contention or delay.
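                -  A toy model of session semantics is sketched below.
                    It is not OpenAFS code: the classes SessionStore and
                    Session are invented for illustration, and writes
                    replace the whole file to keep the model small. Each
                    open takes a private copy, and changes are published
                    only when the writer closes the file.

```python
class SessionStore:
    """Holds the most recently closed (committed) version of each file."""

    def __init__(self):
        self.committed = {}

    def open(self, name):
        return Session(self, name)


class Session:
    """One user's open of a file, working on a private copy."""

    def __init__(self, store, name):
        self._store = store
        self._name = name
        self._copy = store.committed.get(name, b"")   # snapshot taken at open
        self._dirty = False

    def read(self):
        return self._copy

    def write(self, data):
        self._copy = data       # changes only this session's copy
        self._dirty = True

    def close(self):
        if self._dirty:         # publish changes only at close
            self._store.committed[self._name] = self._copy


store = SessionStore()
x = store.open("f")
y = store.open("f")             # y has the file open while x writes
x.write(b"new contents")
y_during = y.read()             # x's write is not visible to y
x.close()
y_after_close = y.read()        # still not visible: y opened before the close
z = store.open("f")             # opened after x closed
z_sees = z.read()
print(y_during, y_after_close, z_sees)
```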
		   
	       
                
	   -  11.5.3.3 Immutable Shared File Semantics 
               
               -  Under immutable shared file semantics, a file
	            cannot be changed at all once it is set up to be
		    shared by two or more processes.  
		    
                -  This is easy to implement, but of course it
	            just supports read-only files. 
                
           
      
 -  11.6 Protection 
     
    -   Administrators should protect the file system by doing backups on a
          regular schedule, and by keeping the physical location of the
	  computer secure.  
     -  11.6.1 Types of Access 
         
	 -   The types of file access that may be controlled include
               read access, write access, execute privilege, append rights, 
               delete rights, and the right to list the name and attributes
               of a file.
         
 
     -  11.6.2 Access Control 
         
	 -  One approach to controlling access to files would be to keep
              an access control list (ACL) with each file or directory.
              The ACL is a table data structure keyed on user id.  Given the 
              id of any user, one can look up the specific access rights 
              the user has on the file or directory object. Full ACLs
              can be long and vary in size from file to file, which
              makes them tedious to build and awkward to store. 
	  -   Many systems are designed with a condensed form of ACLs
               in which access rights are stored just for three entities:
               the owner of the file (or directory), the group, and everybody
               else.  
	  -   Some systems, like Solaris, use the (owner, group, others)
               approach by default, but also allow more detailed access 
               controls to be added to specific file objects. (The term file
               object is meant to include files, directories, and other
               special kinds of items that might exist in the file system.)
               
	  -   Read, write, and execute bits are commonly associated with
               each of the three classes: owner (user), group, and other.
               
           -  For a plain file, a set read bit means a member of the class
               has permission to read the file.  Similarly the write bit gives
               permission to write, and the execute bit gives permission
               to execute the file (presumably the file is a program, 
               script, or the like). 
          -   For directories, the read bit gives permission 
               to list the directory.  The write bit, together with
               the execute bit, allows creating, deleting, and renaming
               entries in the directory.  (Removing the directory itself
               requires these permissions on its parent directory.)
               The execute bit gives permission
               to cd to the directory, and to access the file objects in the 
               directory, subject to the permissions on the objects
               themselves. 
 
           -  Users who lack either write or execute permission
               on a directory cannot delete a file object in that 
               directory. 
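           -  The (owner, group, other) check can be sketched as below,
               using the usual Unix encoding of nine permission bits in an
               octal mode word. The function names are invented for the
               example, and the sketch ignores the superuser and any
               extended ACL entries.

```python
def may_access(mode, file_uid, file_gid, uid, gids, want):
    """Decide whether a user may access a file object.

    mode is the permission word (e.g. 0o640), want is a string made of
    'r', 'w', and 'x', and gids is the set of groups the user belongs to.
    Exactly one class applies: owner, else group, else other.
    """
    if uid == file_uid:
        shift = 6               # owner bits
    elif file_gid in gids:
        shift = 3               # group bits
    else:
        shift = 0               # other bits
    granted = (mode >> shift) & 0o7
    needed = 0
    for letter in set(want):
        needed |= {"r": 4, "w": 2, "x": 1}[letter]
    return granted & needed == needed


def may_delete_entry(dir_mode, dir_uid, dir_gid, uid, gids):
    """Deleting a file object needs write and execute permission on its directory."""
    return may_access(dir_mode, dir_uid, dir_gid, uid, gids, "wx")


# A file with mode 0o640 (rw- r-- ---), owned by user 100, group 20.
print(may_access(0o640, 100, 20, 100, {20}, "rw"))   # owner
print(may_access(0o640, 100, 20, 200, {20}, "w"))    # group member
```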
         
 
     -  11.6.3 Other Protection Approaches 
         
         -  One approach is to associate a password with each file or
              directory.