7th ed. chapter 10

(Latest Revision: Wednesday, December 03, 2008)

Chapter Ten -- File-System Interface -- Lecture Notes

There is material here that relates to chapters 10-13.

10.0 Objectives
- Explain the function of filesystems
- Describe the interfaces to filesystems
- Discuss file-system design tradeoffs, including access methods, file sharing, file locking, and directory structures.
- Explore file-system protection
10.1 File Concept
- A file is a NAMED collection of related information defined by its creator -- in general a SEQUENCE of bits, bytes, lines, or records. A file should also be thought of as an ABSTRACT DATA TYPE.
- 10.1.1 -- File Attributes
  
  Files have various attributes, like name, unique id, type, location, size, owner, protection, creation date & time, and modification date & time.
  
  Typically there is some sort of directory structure on disk which contains the name and unique id of each file. The rest of the file attributes typically can be found in some way by using the unique id as a key.
  
- 10.1.2 -- File Operations
  
  A file is an ABSTRACT DATA TYPE. To discuss files we must consider the operations that the data type makes available, and how they may be implemented. Note: the file operations have to be implemented as SYSTEM CALLS.
  - Creating a File
    
    To create a file, space must be allocated for it, and an entry for it must be made in the disk directory.
  - Writing a File
    
    This operation is a simple write at the end of the file. The file name (or equivalent) and the information to be written are input to this operation. The OS uses the name to find its entry in the disk directory. The directory needs a pointer to the next block to be written, and this pointer needs to be updated.
  - Reading a File
    
    Input to this operation will include the file name and the address in memory where the next block of the file is to be copied. The directory is searched for the file's entry. The directory will need a pointer to the next block to be read, and this pointer will have to be updated.
  - Rewind a File
    
    Input to this operation will include the file name. The operation can update the current pointer so it points to the first block of the file.
  - Delete a File
    
    Input to this operation will include the file name. The operation should look up the file in the directory and deallocate all its blocks.

  So that each operation will not have to search for the file's entry in the disk directory, the OS may support an OPEN operation, which searches the disk directory for the named file, copies its directory entry into a memory-resident table of open files, and returns a pointer to the table entry for that file. Instead of searching the directory, subsequent file operations follow the pointer and get the needed directory information from the table. This saves much time by reducing the number of disk accesses required to do file operations.
  
  A file may be CLOSED -- which results in its table entry being removed.
  
  Because more than one process may open a file concurrently, typically there are two levels of "open-file table" - a global table and per-process table. The system-wide global table would contain process-independent information such as file size, location on disk, open count (the number of processes that have the file open), and latest modification time. An entry for a file in a per-process table would contain the file pointer value, information regarding the access rights of the process, and a pointer to the entry for the file in the global file table.
  
  Operating systems typically support advisory or mandatory file-locking. Shared locks, as well as exclusive locks, may be available.
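
The two-level open-file table described above can be sketched in a few lines. This is only an illustration under stated assumptions, not how any particular OS does it: the class names (`GlobalEntry`, `ProcessEntry`, `OpenFileTables`) are hypothetical, the "disk directory" is a plain dictionary of name to size, and a small integer stands in for the pointer that OPEN returns.

```python
import itertools
from dataclasses import dataclass


@dataclass
class GlobalEntry:
    """Process-independent information: one entry per open file."""
    name: str
    size: int
    open_count: int = 0


@dataclass
class ProcessEntry:
    """Per-process information: access mode, file pointer, link to global entry."""
    global_entry: GlobalEntry
    mode: str
    position: int = 0


class OpenFileTables:
    def __init__(self, directory):
        self.directory = directory   # name -> size; stands in for the disk directory
        self.global_table = {}       # name -> GlobalEntry
        self.per_process = {}        # pid -> {descriptor: ProcessEntry}

    def open(self, pid, name, mode="r"):
        entry = self.global_table.get(name)
        if entry is None:
            # Only the first open searches the (simulated) disk directory.
            entry = GlobalEntry(name, self.directory[name])
            self.global_table[name] = entry
        entry.open_count += 1
        table = self.per_process.setdefault(pid, {})
        # Use the lowest descriptor number not already taken by this process.
        fd = next(i for i in itertools.count() if i not in table)
        table[fd] = ProcessEntry(entry, mode)
        return fd

    def close(self, pid, fd):
        entry = self.per_process[pid].pop(fd).global_entry
        entry.open_count -= 1
        if entry.open_count == 0:
            # The last close removes the entry from the global table.
            del self.global_table[entry.name]


tables = OpenFileTables({"notes.txt": 4096})
fd_a = tables.open(pid=1, name="notes.txt")
fd_b = tables.open(pid=2, name="notes.txt", mode="rw")
tables.per_process[1][fd_a].position = 512   # process 1 moves only its own pointer
```

Both processes share one global entry (so the open count is 2), but each keeps its own file pointer. The global entry goes away only when the last process closes the file.
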
- 10.1.3 -- File Types
  
  ... may or may not be supported in various ways.
- 10.1.4 -- File Structure
  
  The OS must support at least an executable file type and structure. Each supported file structure/type requires additional operating system code.
- 10.1.5 -- Internal File Structure
  
  The file may be considered a sequence of blocks. Logical records are packed into blocks - either by the OS or by a user application.
  
  All file systems suffer from internal fragmentation due to the packing of logical records into physical blocks.
  
  Unix systems define a file simply as a stream of bytes.
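
A small worked calculation may make the packing and internal-fragmentation point concrete. The function `pack_records` and the sizes used are assumptions chosen for illustration. The sketch assumes fixed-size records that are never split across blocks.

```python
def pack_records(record_size, block_size, record_count):
    """Pack fixed-size records into blocks without splitting any record.

    Returns (records per block, blocks needed, bytes wasted).
    """
    if record_size > block_size:
        raise ValueError("record does not fit in one block")
    per_block = block_size // record_size
    blocks = -(-record_count // per_block)   # ceiling division
    wasted = blocks * block_size - record_count * record_size
    return per_block, blocks, wasted


# Ten 120-byte records in 512-byte blocks.
per_block, blocks, wasted = pack_records(record_size=120, block_size=512, record_count=10)

# A 1300-byte Unix-style byte stream: each "record" is one byte.
stream_result = pack_records(record_size=1, block_size=512, record_count=1300)
```

In the first case four records fit per block, three blocks are needed, and 336 of the 1536 allocated bytes hold no data. In the byte-stream case the only waste is the unused tail of the last block (236 bytes).
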
10.2 Access Methods

The OS may provide more than one way to access files.
- 10.2.1 Sequential Access
  
  In sequential access, the file (current) pointer is positioned at the beginning of the file when it is opened. After each read (if the file is open for reading), the pointer is advanced to the next file block. After each write (if the file is open for writing), the pointer is advanced in the same way, past the material just written, to the new end-of-file. It may be possible to "skip" forward or back n blocks (but not without reading the blocks in between.) This is based on the TAPE-MODEL of a file.
- 10.2.2 Direct Access
  
  This allows the file to be treated more like an ARRAY of blocks (though having DIRECT ACCESS does not mean the same thing as having RANDOM ACCESS -- disk hardware does not support true random access -- that's why it's NOT called "RAM" |-))
  
  The user is allowed to access blocks by number in any order whatever, and it is not necessary to read any of the blocks located BETWEEN two blocks that are accessed one after the other.
  
  For direct access, the desired block number must be an input parameter of the read and write operations, or perhaps the same effect can be achieved by implementing an operation (called POSITION or "SEEK") that moves the file pointer to a desired block, after which a simple read or write can be done.
  
  The user's view of a direct access file is usually a sequence of blocks numbered from 0 to some max. The OS usually translates these RELATIVE BLOCK NUMBERS into absolute block numbers that correspond to where the block is really physically located on the disk. This kind of translation differs little from the manner in which virtual primary memory addresses are translated into physical addresses.
  
  It is easy to simulate sequential access on a direct access file, but extremely inefficient to simulate direct access on a sequential file.
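
To illustrate why sequential access is easy to simulate on a direct access file, here is a minimal sketch. The class `DirectFile`, the tiny `BLOCK_SIZE`, and the in-memory `io.BytesIO` stream standing in for a disk file are all assumptions made for the example.

```python
import io

BLOCK_SIZE = 16   # hypothetical; tiny so the example is easy to follow


class DirectFile:
    def __init__(self, stream):
        self.stream = stream
        self.cp = 0   # current position, in blocks, used for sequential access

    def read_block(self, n):
        """Direct access: position (seek) to relative block n, then read it."""
        self.stream.seek(n * BLOCK_SIZE)
        return self.stream.read(BLOCK_SIZE)

    def read_next(self):
        """Sequential access simulated on top of direct access."""
        block = self.read_block(self.cp)
        self.cp += 1
        return block

    def rewind(self):
        self.cp = 0


# Four blocks filled with the letters A, B, C, and D.
data = b"".join(bytes([65 + i]) * BLOCK_SIZE for i in range(4))
f = DirectFile(io.BytesIO(data))

third = f.read_block(2)   # jump straight to block 2, skipping blocks 0 and 1
first = f.read_next()     # sequential reads still begin at block 0
second = f.read_next()
```

The sequential layer is nothing more than a counter (`cp`) on top of the direct read. Going the other way is the expensive direction: with only `read_next` and `rewind`, reaching block n means reading every block before it.
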
- 10.2.3 Other Access Methods
  
  Index blocks into files can provide other forms of access. For example, a large ordered file of alphanumeric strings could be indexed in such a way that the index block would contain one entry for each alphanumeric character, pointing to the first block that contains a string starting with that character (i.e., a pointer to the a's, the b's, and so on.) This trick is commonly used to provide faster searches for key fields in file records (less I/O required.)
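
The first-character index described above might look like the following sketch. The functions `build_index` and `lookup`, and the toy file of two strings per block, are assumptions for illustration. The sketch assumes the file is sorted and that no block is empty.

```python
def build_index(blocks):
    """Map each leading character to the number of the first block that
    contains a string starting with it. `blocks` is a sorted file, split
    into blocks (lists of strings)."""
    index = {}
    for block_number, block in enumerate(blocks):
        for s in block:
            index.setdefault(s[0], block_number)
    return index


def lookup(blocks, index, key):
    """Return (block number or None, number of blocks read)."""
    start = index.get(key[0])
    if start is None:
        return None, 0            # no string starts with this character
    reads = 0
    for block_number in range(start, len(blocks)):
        reads += 1
        block = blocks[block_number]
        if key in block:
            return block_number, reads
        if block[-1] > key:       # sorted file: the key cannot appear later
            break
    return None, reads


blocks = [["ant", "ape"], ["bat", "bee"], ["cat", "cow"], ["cub", "dog"]]
index = build_index(blocks)
found = lookup(blocks, index, "cub")
missing = lookup(blocks, index, "cab")
```

Finding "cub" takes two block reads (blocks 2 and 3) instead of the four a scan from the start would need. Ruling out "cab" takes one.
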
10.3 Directory Structure
- 10.3.1 Storage Structure
  
  One device may have many partitions. A partition can contain a file system, swap space, or unformatted (raw) disk space for the use of special applications.
  
  It's also possible to combine partitions into volumes.
- 10.3.2 Directory Overview
  
  There has to be some sort of device directory or volume table of contents for each filesystem. The device directory ("directory" for short) is basically a table which, for each file, contains such information as name, location, size, type, and so forth.
  
  The directory must support certain operations. It must be possible to search for a file's entry in the directory; create a file & give it an entry in the directory; delete a file and remove it from the directory; list the files in a directory; rename a file and make the appropriate changes in the directory; and traverse the filesystem using directory information.
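
Most of the directory operations listed above map naturally onto a table keyed by file name. The sketch below is one hypothetical way to model that: the class `Directory` and the attribute names passed to `create` are assumptions, and traversal is left out because it only becomes interesting once directories can contain other directories.

```python
class Directory:
    """A directory modeled as a table: file name -> attributes."""

    def __init__(self):
        self.entries = {}

    def create(self, name, **attrs):
        if name in self.entries:
            raise FileExistsError(name)   # names must be unique within a directory
        self.entries[name] = attrs

    def search(self, name):
        return self.entries.get(name)     # None if there is no such file

    def delete(self, name):
        del self.entries[name]

    def rename(self, old, new):
        if new in self.entries:
            raise FileExistsError(new)
        self.entries[new] = self.entries.pop(old)

    def list(self):
        return sorted(self.entries)


d = Directory()
d.create("ch10.txt", size=2048, location=17)
d.create("ch11.txt", size=4096, location=42)
d.rename("ch11.txt", "notes.txt")
listing = d.list()
```
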
- 10.3.3 Single-Level Directory
  
  Simplest kind to implement -- inconvenient for users with large numbers of files. Hard to keep file names unique, hard to organize groups of files.
- 10.3.4 Two-Level Directory
  
  Give each user his own directory.
  
  The user's directory is the default. All files referenced are assumed to be in the user's directory unless specifically stated otherwise. The system may allow access to other users' files through the use of PATH NAMES. Since some files used must be kept in the master file directory (so they can be shared), and since it is sometimes not practical to expect users to give complete path names, the need arises to have a SEARCH PATH (look in this list of places for files mentioned.) Such a system is not much better than a single-level directory. The user has to put all his files in one place, devise unique names and work too hard to keep related files organized.
- 10.3.5 Tree-Structured Directories
  
  Like the Unix file system.
  
  One creates directories to organize files. Directories are special files. There is a concept of a current working directory. An environment variable keeps track of it. There are absolute and relative pathnames. Care must be taken in the implementation of deletion of a directory.
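
The difference between absolute and relative pathnames can be shown with a short sketch. The function `resolve` and the example paths are hypothetical. `posixpath` is the Python standard-library module for Unix-style paths, and `normpath` works purely on the text of the path (it does not consult the file system or follow links).

```python
import posixpath


def resolve(cwd, path):
    """Return the absolute pathname for `path`.

    A relative pathname is interpreted starting from the current
    working directory `cwd`; an absolute pathname is used as given.
    """
    if not posixpath.isabs(path):
        path = posixpath.join(cwd, path)
    return posixpath.normpath(path)   # collapses "." and ".." components


relative = resolve("/home/alice/notes", "../bin/run.sh")
absolute = resolve("/home/alice", "/etc/passwd")
plain = resolve("/home/alice", "ch10.txt")
```
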
- 10.3.6 Acyclic-Graph Directories
  
  Allowing files to be contained in more than one directory facilitates file sharing. This introduces a version of the ALIASING PROBLEM -- files may have many pathnames -- procedures that require traversal are complicated by the need to avoid processing nodes more than once. Deletion has to be managed carefully to avoid dangling references.
  
  A "link" is used to put a file in a second directory. Links can be "hard" or "soft" - each has its advantages and disadvantages.
  
  Implementation of deletion is a challenge. If we store in the directory the number of remaining hard links to each file, then hard links are no longer an impediment to correct operation of deletion, but soft links can be left dangling.
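
The behavior of hard and soft links under deletion can be observed directly on a Unix-like system. The sketch below uses the standard-library calls `os.link`, `os.symlink`, and `os.stat`. The file names are made up, and it assumes a file system that supports both kinds of link (on some platforms creating a symbolic link needs extra privileges).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "report.txt")
    hard = os.path.join(d, "hard_link.txt")
    soft = os.path.join(d, "soft_link.txt")

    with open(original, "w") as f:
        f.write("shared data")

    os.link(original, hard)      # a second directory entry for the same file
    os.symlink(original, soft)   # a separate small file that stores a pathname
    links_before = os.stat(original).st_nlink

    os.remove(original)          # removes one name; the link count drops by one
    links_after = os.stat(hard).st_nlink
    with open(hard) as f:
        surviving = f.read()     # the data is still reachable through the hard link

    # The soft link itself still exists, but the pathname it stores no longer resolves.
    soft_is_dangling = os.path.islink(soft) and not os.path.exists(soft)
```

The link count goes from 2 to 1 when the original name is removed, so the file's blocks are not deallocated. The soft link is left dangling because nothing in the file's link count records its existence.
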
- 10.3.7 General Graph Directory
  
  Cycles are possible: e.g. directories A and B can contain each other. Incorrectly implemented traversals can result in infinite loops. "Unreachable" files and directories that should be deleted may have non-zero reference counts. "Garbage collection" procedures may be used to find such objects.
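
Both problems, looping traversals and unreachable objects with non-zero reference counts, show up in a small sketch. The directory graph is modeled as a dictionary from each node to the nodes it contains. The names `reachable` and `reference_counts` and the example graph are assumptions for illustration, and the "garbage collection" here is only the marking step (finding what is reachable from the root).

```python
def reachable(graph, root):
    """Return every node reachable from root.

    The visited set ensures each node is processed once, so the
    traversal terminates even when the graph contains cycles.
    """
    visited = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        stack.extend(graph.get(node, ()))
    return visited


def reference_counts(graph):
    """Count how many directory entries point at each node."""
    counts = {node: 0 for node in graph}
    for children in graph.values():
        for child in children:
            counts[child] = counts.get(child, 0) + 1
    return counts


graph = {
    "/": ["home"],
    "home": ["notes.txt"],
    "notes.txt": [],
    "A": ["B"],   # A and B contain each other and are not linked from the root
    "B": ["A"],
}

live = reachable(graph, "/")
garbage = set(graph) - live
counts = reference_counts(graph)
```

A and B each have a reference count of 1, so reference counting alone would never free them, yet neither can be reached from the root. Marking from the root identifies both as garbage.
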
10.4 File System Mounting
10.5 File Sharing
10.6 Protection
10.7 Summary