(Latest Revision: Sun Nov 24 13:34:16 PST 2002)
Chapters Eleven-Fourteen 
-- 
Chapter Eleven: File-System Interface 
Chapter Twelve: File-System Implementation 
Chapter Thirteen: I/O Systems 
Chapter Fourteen: Mass-Storage Structure 
-- 
There is material here that relates to chapters 11-14.  These notes were
made based on earlier editions of the course text(s).  I don't have notes
based on edition six yet. 
The OS implements the abstract concept of a file.  In this chapter, we
consider the various ways to map files onto devices.  We look at a variety
of directory structures. 
-   11.1 File Concept  
     A file is a NAMED collection of related information defined by its
     creator -- in general a SEQUENCE of bits, bytes, lines, or records.  A
     file should also be thought of as an ABSTRACT DATA TYPE. 
    Files have various attributes, like type, size, length.  Which, if any,
    of these attributes should the OS be concerned with?  Unix systems have
    opted to treat all files simply as a sequence of 8-bit bytes, thus
    providing maximum flexibility and minimal OS support of file types.
    
    In Unix, the logical file record is one byte.  The file system
    automatically PACKS and unpacks bytes into physical disk blocks (of say
    512 bytes).  Such packing is common to most operating systems.  Files
    may be viewed by us simply as a sequence of blocks (sectors).  The
    conversion from blocks to a sequence of logical records is a simple
    software problem. 
    All file systems suffer from internal fragmentation due to the packing
    of logical records into physical blocks. 
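
    The packing arithmetic can be sketched in a few lines.  This is only an
    illustration: the 512-byte block size is the example figure used above,
    and the function names and the 1300-byte file are made up for the
    example.

```python
BLOCK_SIZE = 512  # bytes per physical block (the example size from the notes)


def blocks_needed(file_size, block_size=BLOCK_SIZE):
    """Number of physical blocks required to hold file_size bytes."""
    return (file_size + block_size - 1) // block_size  # integer ceiling


def internal_fragmentation(file_size, block_size=BLOCK_SIZE):
    """Bytes allocated to the file but unused (the slack in its last block)."""
    return blocks_needed(file_size, block_size) * block_size - file_size


def locate_byte(offset, block_size=BLOCK_SIZE):
    """Map a logical byte offset to (relative block number, offset in block)."""
    return divmod(offset, block_size)


size = 1300  # hypothetical file size in bytes
print(blocks_needed(size))           # 3
print(internal_fragmentation(size))  # 236
print(locate_byte(1000))             # (1, 488)
```

    On average about half a block is wasted per file, so a larger block
    size means more internal fragmentation.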
 -   11.1.2 File Operations  
     A file is an ABSTRACT DATA TYPE.  To discuss files we must consider
     the operations that the data type makes available, and how they may be
     implemented.  Note:  the file operations have to be implemented as
     SYSTEM CALLS. 
 
     
     -  Creating a File 
          To create a file, space must be allocated for it, and an entry
	  for it must be made in the disk directory. 
	
      -  Writing a File 
          This operation is a simple write at the end of the file.  The
	  file name and the information to be written are input to this
	  operation.  The OS uses the name to find its entry in the disk
	  directory.  The directory needs a pointer to the next block to be
	  written, and this pointer needs to be updated. 
	
      -  Reading a File 
          Input to this operation will include the file name and the
	  address in memory where the next block of the file is to be
	  copied.  The directory is searched for the file's entry.  The
	  directory will need a pointer to the next block to be read, and
	  this pointer will need to be updated. 
      -  Rewind a File 
          Input to this operation will include the file name.  The
	  operation can update the current pointer so it points to the
	  first block of the file. 
      -  Delete a File 
          Input to this operation will include the file name.  The
	  operation should look up the file in the directory and deallocate
	  all its blocks.  
      
     So that each operation will not have to search for the file's entry in
     the disk directory, the OS may support an OPEN operation, which searches
     the disk directory for the named file, copies its directory entry into a
     memory-resident table of open files, and returns a pointer to the table
     entry for that file.  Subsequent file operations get the needed
     directory information from the table instead of the disk directory, by
     following the pointer.  This reduces the number of disk accesses
     required to do file operations. 
     A file may be CLOSED -- which results in its table entry being 
     removed. 
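
     The role of the open-file table can be sketched with a toy model.
     Everything here is hypothetical (the class name, the table layout,
     the use of a Python dict for the "disk directory"); the point is
     only that the directory is searched once, at open time.

```python
class ToyFileSystem:
    """Toy model: a 'disk directory' plus a memory-resident open-file table."""

    def __init__(self):
        self.disk_directory = {}     # file name -> list of blocks
        self.open_files = {}         # table index -> entry for an open file
        self.directory_searches = 0  # each search would cost disk accesses
        self._next_index = 0

    def create(self, name):
        self.disk_directory[name] = []  # make the directory entry

    def open(self, name):
        self.directory_searches += 1    # the only directory search needed
        blocks = self.disk_directory[name]
        index = self._next_index
        self._next_index += 1
        # A real system copies the directory entry; the toy shares the list.
        self.open_files[index] = {"name": name, "blocks": blocks, "pos": 0}
        return index

    def write(self, index, block):
        entry = self.open_files[index]
        entry["blocks"].append(block)        # simple write at end of file
        entry["pos"] = len(entry["blocks"])  # pointer moves past new data

    def read(self, index):
        entry = self.open_files[index]
        block = entry["blocks"][entry["pos"]]
        entry["pos"] += 1                    # advance to the next block
        return block

    def rewind(self, index):
        self.open_files[index]["pos"] = 0

    def close(self, index):
        del self.open_files[index]           # table entry is removed


fs = ToyFileSystem()
fs.create("notes")
fd = fs.open("notes")
fs.write(fd, b"block 0")
fs.write(fd, b"block 1")
fs.rewind(fd)
print(fs.read(fd), fs.read(fd))
fs.close(fd)
print(fs.directory_searches)  # 1
```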
 
 -   11.2 Access Methods  
     Choosing the right file access method for a particular application is
     important on systems that support more than one method.  Knowing what
     methods are available on a given system may be important when shopping
     for a platform for some specific application, or set of applications.
     (Imagine buying a system to support a large data base, and then finding
     out that all file accesses are sequential only.) 
     
     -   11.2.1  Sequential Access 
 
          In sequential access, the file (current) pointer is positioned at
	  the beginning of the file when it is opened.  After each read (if
          the file is open for reading), the pointer is advanced to the next
	  file block.  After each write (if the file is open for writing),
	  the pointer is advanced in the same way, past the material just
	  written, to the new end-of-file.  It may be possible to "skip"
	  forward or back n blocks (but not without reading the blocks in
	  between.)  This is based on the TAPE-MODEL of a file. 
      -   11.2.2  Direct Access  
          This allows the file to be treated more like an ARRAY of blocks
	  (though having DIRECT ACCESS does not mean the same thing as having
	  RANDOM ACCESS -- disk hardware does not support true random access
	  -- that's why it's NOT called "RAM" |-)) 
          The user is allowed to access blocks by number in any order
	  whatever, and it is not necessary to read any of the blocks located
	  BETWEEN two blocks that are accessed successively. 
          For direct access, the desired block number must be an input
	  parameter of the read and write operations, or perhaps the same
	  effect can be achieved by implementing an operation (called
	  POSITION or "SEEK") that moves the file pointer to a desired block,
	  after which a simple read or write can be done. 
          The user's view of a direct access file is usually a sequence of
	  blocks numbered from 0 to some max.  The OS usually translates
	  these RELATIVE BLOCK NUMBERS into absolute block numbers that
	  correspond to where the block is really physically located on the
	  disk. 
          It is easy to simulate sequential access on a direct access file,
	  but extremely inefficient to simulate direct access on a sequential
	  file. 
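
          The difference in cost can be sketched as follows.  The two
          classes are hypothetical stand-ins (an in-memory buffer plays
          the part of the device), and the count of blocks read stands
          in for I/O cost.

```python
import io

BLOCK_SIZE = 512  # assumed block size for the illustration


class DirectFile:
    """Direct access: the file pointer can be positioned on any block."""

    def __init__(self, data):
        self._f = io.BytesIO(data)
        self.blocks_read = 0

    def seek_block(self, n):
        self._f.seek(n * BLOCK_SIZE)  # the POSITION / "SEEK" operation

    def read_next(self):
        self.blocks_read += 1
        return self._f.read(BLOCK_SIZE)


class SequentialFile:
    """Tape model: the only operations are rewind and read-next."""

    def __init__(self, data):
        self._f = io.BytesIO(data)
        self.blocks_read = 0

    def rewind(self):
        self._f.seek(0)

    def read_next(self):
        self.blocks_read += 1
        return self._f.read(BLOCK_SIZE)


def read_block_via_sequential(f, n):
    """Simulate direct access on a sequential file: read past blocks 0..n-1."""
    f.rewind()
    for _ in range(n):
        f.read_next()
    return f.read_next()


def make_blocks(count):
    """Build test data in which block i is filled with the byte value i."""
    return b"".join(bytes([i]) * BLOCK_SIZE for i in range(count))


data = make_blocks(100)

direct = DirectFile(data)
direct.seek_block(90)
direct.read_next()
print(direct.blocks_read)  # 1

tape = SequentialFile(data)
read_block_via_sequential(tape, 90)
print(tape.blocks_read)    # 91
```

          Going the other way is trivial: calling read_next repeatedly
          on the DirectFile, starting from block 0, is sequential access.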
      -   11.2.3  Other Access Methods  
          Index Blocks into files can provide other forms of access.  For
	  example, a large ordered file of alphanumeric strings could be
	  indexed in such a way that the index block would contain one entry
	  for each alphanumeric character, pointing to the first block that
          contains a string starting with that character (i.e., a pointer to
	  the a's, the b's, and so on.)  This trick is commonly used to
	  provide faster searches for key fields in file records (less I/O
	  required.) 
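
          A minimal sketch of such an index, assuming a sorted file of
          non-empty strings packed a fixed number per block.  The block
          capacity, the function names, and the word list are all made
          up for the illustration.

```python
STRINGS_PER_BLOCK = 4  # hypothetical capacity of one file block


def pack_into_blocks(sorted_strings, per_block=STRINGS_PER_BLOCK):
    """Split an ordered list of strings into fixed-capacity blocks."""
    return [sorted_strings[i:i + per_block]
            for i in range(0, len(sorted_strings), per_block)]


def build_index(blocks):
    """Index block: first character -> first block holding such a string."""
    index = {}
    for block_no, block in enumerate(blocks):
        for s in block:
            index.setdefault(s[0], block_no)  # keep only the first block seen
    return index


def search(blocks, index, key):
    """Return (found, number of file blocks read)."""
    start = index.get(key[0])
    if start is None:
        return False, 0          # no string starts with this character
    reads = 0
    for block in blocks[start:]:
        reads += 1
        if key in block:
            return True, reads
        if block[-1] > key:      # file is ordered, so key cannot come later
            break
    return False, reads


words = sorted(["apple", "avocado", "banana", "blueberry", "cherry",
                "cranberry", "date", "fig", "grape", "kiwi", "lemon",
                "mango"])
blocks = pack_into_blocks(words)
index = build_index(blocks)
print(search(blocks, index, "lemon"))     # (True, 1)
print(search(blocks, index, "cucumber"))  # (False, 1)
```

          Without the index, finding "lemon" by scanning from the start
          of the file would read all three blocks.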
      
 -   11.3  Directory Structure  
     A directory structure provides a mechanism for organizing a large file
     system.  Many systems have two separate directory structures: the DEVICE
     DIRECTORY and the FILE DIRECTORIES.  The DEVICE DIRECTORY is stored on
     each physical device and describes all files on that device. 
     The device directory mainly concentrates on describing THE PHYSICAL
     PROPERTIES of each file: where it is, how long it is, how it is
     allocated, and so on. 
     THE FILE DIRECTORIES ARE A LOGICAL ORGANIZATION OF THE FILES ON ALL
     DEVICES.  The file directory concentrates on LOGICAL properties of each
     file: name, file type, owning user, accounting information, protection
     access codes, and so on. 
     Things that may be kept in a file directory entry: 
     
     -  file name
     -  file type
     -  location (pointer to device and address on device.)
     -  size
     -  current position (read or write position -- if file is open)
     -  protection
     -  usage count (number of processes that currently have this file open)
     -  time, date, and process identification (creation, modification, last use)
     
 
 
     Directory entries may contain from 16 to over 1000 bytes.  In a system
     with a large number of files, the size of the directory itself may be
     hundreds of thousands of bytes.  Thus the file directory may need to be
     stored on the device and brought into primary memory piecemeal, as
     needed. 
     Since the file directory is a data structure that has to be searched
     frequently, and has to have frequent insertions and deletions, tree
     implementations and hash table implementations are fairly common.
     
     The file directory is essentially a SYMBOL TABLE. 
     The operations that are to be performed on a directory: 
      
     -  Search 
          Find the pointer to the device directory when opening a file.
          Check for an attempt to create a duplicate file name.  Find an
          entry in order to list its properties. 
	
      -  Create File 
          New entry needs to be created.	 
      -  List Directory 
	
      -  Backup 
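
     A hash-table version of the directory-as-symbol-table idea might look
     like the sketch below.  The class and field names are hypothetical,
     only a few of the entry fields listed earlier are kept, and Backup is
     left out.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    name: str
    file_type: str
    location: int        # hypothetical: index into the device directory
    size: int = 0
    usage_count: int = 0  # processes that currently have the file open


class FileDirectory:
    """File directory kept as a hash table keyed by file name."""

    def __init__(self):
        self._entries = {}

    def search(self, name):
        """Return the entry for name, or None if there is no such file."""
        return self._entries.get(name)

    def create_file(self, name, file_type, location):
        """Insert a new entry, refusing duplicate file names."""
        if self.search(name) is not None:
            raise FileExistsError(name)
        entry = DirectoryEntry(name, file_type, location)
        self._entries[name] = entry
        return entry

    def delete_file(self, name):
        del self._entries[name]

    def list_directory(self):
        return sorted(self._entries)


d = FileDirectory()
d.create_file("notes.txt", "text", location=7)
d.create_file("a.out", "binary", location=12)
print(d.list_directory())              # ['a.out', 'notes.txt']
print(d.search("notes.txt").location)  # 7
```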
      
     
     -   11.3.1  Single-Level Directory  
          Simplest kind to implement -- inconvenient for users with large
	  numbers of files.  Hard to keep file names unique, hard to organize
	  groups of files. 
       -   11.3.2  Two-Level Directory  
           Give each user his own directory.  The user's directory is the
	   default.  All files referenced are assumed to be in the user's
	   directory unless specifically stated otherwise.  System may allow
           access to other users' files through the use of PATH NAMES.  Since
	   some files used must be kept in the master file directory (so they
	   can be shared), and since it is sometimes not practical to expect
	   users to give complete path names, the need arises to have a
	   SEARCH PATH (look in this list of places for files mentioned.)
	   Such a system is not much better than a single-level directory.
	   The user has to put all his files in one place, devise unique
	   names and work too hard to keep related files organized. 
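
          Name resolution with a default directory and a search path
          can be sketched as below.  The "owner/file" path syntax, the
          "system" directory holding shared files, and the function
          name are assumptions made for the illustration.

```python
def resolve(name, current_user, master_directory, search_path):
    """Return (owner, file name) for a file reference.

    master_directory maps each user to that user's file directory.
    A name of the form 'owner/file' is treated as an explicit path name;
    a bare name is looked up in the user's own directory first, then in
    each directory on the search path, in order.
    """
    if "/" in name:
        owner, filename = name.split("/", 1)
        candidates = [owner]
    else:
        filename = name
        candidates = [current_user] + list(search_path)
    for owner in candidates:
        if filename in master_directory.get(owner, {}):
            return owner, filename
    raise FileNotFoundError(name)


master = {
    "alice": {"notes": "..."},
    "bob": {"notes": "...", "todo": "..."},
    "system": {"cc": "..."},  # hypothetical home of shared files
}
print(resolve("notes", "alice", master, ["system"]))     # ('alice', 'notes')
print(resolve("cc", "alice", master, ["system"]))        # ('system', 'cc')
print(resolve("bob/todo", "alice", master, ["system"]))  # ('bob', 'todo')
```

          Note that alice and bob can each have a file called "notes"
          without conflict, which a single-level directory cannot allow.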
      -   11.3.3  Tree-Structured Directories  
 
          Like the Unix file system. 
      -   11.3.4  Acyclic Graph Directories  
          Allowing files to be contained in more than one directory
	  facilitates file sharing.  This introduces a version of the
	  ALIASING PROBLEM -- files may have many pathnames -- procedures
          that require traversal are complicated by the need to avoid processing
	  nodes more than once.  Deletion has to be managed carefully to
	  avoid dangling references.  
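
          Both problems can be sketched in a few lines.  This toy
          assumes one common approach, a link count kept with each file
          (similar in spirit to Unix hard links), and a visited set
          during traversal; the class and function names are made up.

```python
class File:
    def __init__(self, data):
        self.data = data
        self.link_count = 0  # number of directory entries naming this file


class Directory:
    def __init__(self):
        self.entries = {}    # name -> File or Directory

    def link(self, name, node):
        self.entries[name] = node
        if isinstance(node, File):
            node.link_count += 1

    def unlink(self, name):
        node = self.entries.pop(name)
        if isinstance(node, File):
            node.link_count -= 1
            if node.link_count == 0:
                node.data = None  # last name gone: reclaim the storage


def total_size(root):
    """Sum file sizes, counting a file once however many names it has."""
    seen = set()
    total = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue              # already processed via another path name
        seen.add(id(node))
        if isinstance(node, Directory):
            stack.extend(node.entries.values())
        else:
            total += len(node.data)
    return total


root, a, b = Directory(), Directory(), Directory()
root.link("a", a)
root.link("b", b)
shared = File(b"x" * 100)
a.link("report", shared)
b.link("report-alias", shared)  # the same file under a second path name
print(total_size(root))         # 100
a.unlink("report")
print(shared.data is not None)  # True
```

          Deleting the file's blocks on the first unlink would leave the
          entry in b dangling; the link count postpones deallocation
          until the last name is removed.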
      -   11.3.5  General Graph Directory  
      
 -   11.6  File Protection  
     RELIABILITY is generally assured by doing frequent backups to more
     permanent media. 
     PROTECTION is provided by controlling access.  For instance,
     restrictions may be placed on the following kinds of accesses: 
 
     
     -  Read
     -  Write
     -  Execute
     -  Append
     -  Delete
     
 
 
      
     -   Naming  
          Protection can be based on denying access to a file to those who
          cannot name it. 
      -   Passwords 
          Passwords can be put on files to restrict access.  Large numbers of
	  passwords then have to be remembered, or, if many files have the
	  same password, they ALL become accessible as soon as the password
	  is known.  Protection on individual files tends to be on an
	  all-or-nothing basis -- either you don't know the password, and you
	  can't do anything with the file, or you DO know it and you CAN do
	  ANYTHING with the file.  Quite often, you'd just as soon give only
	  limited capabilities to users of the file, and quite often they'd
	  much prefer that, because they don't wish to risk doing damage to
	  it.
      -   Access Lists 
          Each user has certain rights to the file that are enforced by the
	  OS.
      -   Access Groups 
          More compact to represent the users in groups (owner, group,
	  others).
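
          The owner/group/others scheme can be sketched with Unix-style
          mode bits.  This is an assumption made for the example: nine
          bits cover only read, write and execute, so the append and
          delete rights listed earlier are not modelled, and the
          function name is made up.

```python
READ, WRITE, EXECUTE = 4, 2, 1  # one 3-bit field per class of user


def allowed(user, user_groups, file_owner, file_group, mode, want):
    """Check a request against a 9-bit mode such as 0o640.

    The first class that matches the user (owner, then group, then
    others) decides which three bits apply.
    """
    if user == file_owner:
        bits = (mode >> 6) & 7
    elif file_group in user_groups:
        bits = (mode >> 3) & 7
    else:
        bits = mode & 7
    return (bits & want) == want


mode = 0o640  # owner: read+write, group: read, others: nothing
print(allowed("alice", {"staff"}, "alice", "staff", mode, READ | WRITE))  # True
print(allowed("bob", {"staff"}, "alice", "staff", mode, WRITE))           # False
print(allowed("carol", {"guests"}, "alice", "staff", mode, READ))         # False
```

          Nine bits per file replace a list with one entry per user,
          at the price of much coarser control.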
      
 -   Chapter 12:  Implementation Issues  
File System Levels:
5.  Application Programs
4.  Logical File System
    (uses the file directory to provide level 3 with the information it
    needs.  It translates from symbolic file names to pointers into the
    device directory.)
3.  File Organization Module
    (generates addresses for level 2 by using device directory information,
    and knowledge of file allocation method.)
2.  Basic File System
    (uses level 1 to read blocks using addressing in terms of drive,
    cylinder, surface, and sector.)
1.  Input/Output
    (device drivers and interrupt handlers to actually transfer information)
0.  Devices
Directory information can be cached to increase the speed of file operations,
but that introduces a cache-consistency problem.  (sync, sync, sync.)
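
The translation done by levels 4 through 2 can be sketched as three small
functions.  Everything specific here is an assumption made for the
illustration: contiguous allocation, the disk geometry, the table
contents, and the function names.

```python
SECTORS_PER_TRACK = 32  # hypothetical disk geometry
SURFACES = 4

# Device directory: physical properties (contiguous allocation assumed).
device_directory = [
    {"drive": 0, "start": 1000, "length": 50},
]
# File directory: symbolic name -> index into the device directory.
file_directory = {"notes.txt": 0}


def logical_file_system(name):
    """Level 4: symbolic file name -> device directory entry."""
    return device_directory[file_directory[name]]


def file_organization_module(entry, relative_block):
    """Level 3: relative block number -> absolute block number."""
    if not 0 <= relative_block < entry["length"]:
        raise IndexError("block is outside the file")
    return entry["start"] + relative_block  # contiguous allocation


def basic_file_system(drive, absolute_block):
    """Level 2: absolute block -> (drive, cylinder, surface, sector)."""
    cylinder, rest = divmod(absolute_block, SURFACES * SECTORS_PER_TRACK)
    surface, sector = divmod(rest, SECTORS_PER_TRACK)
    return drive, cylinder, surface, sector


def locate(name, relative_block):
    """Run one block request down through levels 4, 3 and 2."""
    entry = logical_file_system(name)
    absolute = file_organization_module(entry, relative_block)
    return basic_file_system(entry["drive"], absolute)


print(locate("notes.txt", 5))  # (0, 7, 3, 13)
```

With a linked or indexed allocation method only level 3 would change;
the levels above and below it would be untouched.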