6th ed. chapters 11-14

(Latest Revision: Sun Nov 24 13:34:16 PST 2002)

Chapters Eleven-Fourteen

--

Chapter Eleven: File-System Interface
Chapter Twelve: File-System Implementation
Chapter Thirteen: I/O Systems
Chapter Fourteen: Mass-Storage Structure

--

There is material here that relates to chapters 11-14. These notes were made based on earlier editions of the course text(s). I don't have notes based on edition six yet.

11.1 File Concept

A file is a NAMED collection of related information defined by its creator -- in general a SEQUENCE of bits, bytes, lines, or records. A file should also be thought of as an ABSTRACT DATA TYPE.

Files have various attributes, such as name, type, and size. Which, if any, of these attributes should the OS be concerned with? Unix systems have opted to treat every file simply as a sequence of 8-bit bytes, thus providing maximum flexibility and minimal OS support for file types.

In Unix, the logical file record is one byte. The file system automatically PACKS bytes into physical disk blocks (of, say, 512 bytes) and unpacks them again. Such packing is common to most operating systems. We may therefore view a file simply as a sequence of blocks (sectors); the conversion from blocks to a sequence of logical records is a simple software problem.

All file systems suffer from internal fragmentation due to the packing of logical records into physical blocks.
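A minimal sketch of the arithmetic behind packing, assuming the 512-byte block size mentioned above and Unix-style one-byte logical records. The function names are hypothetical. It maps a logical byte offset to a (block, offset-in-block) pair and computes the internal fragmentation left in a file's last block.

```python
BLOCK_SIZE = 512  # assumed physical block size, as in the example above


def locate(byte_offset, block_size=BLOCK_SIZE):
    """Map a logical byte offset to (block number, offset within block)."""
    return divmod(byte_offset, block_size)


def blocks_needed(file_size, block_size=BLOCK_SIZE):
    """Whole blocks needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


def internal_fragmentation(file_size, block_size=BLOCK_SIZE):
    """Bytes allocated to the file but unused in its last block."""
    return blocks_needed(file_size, block_size) * block_size - file_size
```

For example, a 1949-byte file needs 4 blocks (2048 bytes), so 99 bytes of its last block are wasted.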
11.1.2 File Operations

A file is an ABSTRACT DATA TYPE. To discuss files we must consider the operations that the data type makes available, and how they may be implemented. Note: the file operations have to be implemented as SYSTEM CALLS.
- Creating a File
  
  To create a file, space must be allocated for it, and an entry for it must be made in the disk directory.
- Writing a File
  
  This operation is a simple write at the end of the file. The file name and the information to be written are input to this operation. The OS uses the name to find its entry in the disk directory. The directory needs a pointer to the next block to be written, and this pointer needs to be updated.
- Reading a File
  
  Input to this operation will include the file name and the address in memory where the next block of the file is to be copied. The directory is searched for the file's entry. The directory will need a pointer to the next block to be read, and this pointer will need to be updated.
- Rewind a File
  
  Input to this operation will include the file name. The operation resets the current pointer so that it points to the first block of the file.
- Delete a File
  
  Input to this operation will include the file name. The operation should look up the file in the directory and deallocate all its blocks.
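A minimal sketch of how these operations look on a POSIX system, using Python's os module, whose functions are thin wrappers over the Unix system calls of the same names (open, write, lseek, read, close, unlink). The function demo_file_operations and the file name notes.txt are hypothetical. Unix has no separate rewind call (lseek to offset 0 does the job), and read and write take a descriptor returned by open rather than a file name.

```python
import os
import tempfile


def demo_file_operations(directory):
    path = os.path.join(directory, "notes.txt")  # hypothetical file name
    # Create: O_CREAT makes a new directory entry; O_EXCL fails if the name exists.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o644)
    try:
        # Write: data goes at the current pointer, which then advances past it.
        os.write(fd, b"hello, ")
        os.write(fd, b"world")
        # Rewind: move the current pointer back to the start of the file.
        os.lseek(fd, 0, os.SEEK_SET)
        # Read: read from the current pointer (up to 100 bytes here).
        data = os.read(fd, 100)
    finally:
        os.close(fd)
    # Delete: remove the directory entry; the blocks are freed once no
    # other link or open descriptor refers to the file.
    os.unlink(path)
    return data, os.path.exists(path)
```

Called on an empty scratch directory (for instance one made with tempfile.TemporaryDirectory()), it returns (b"hello, world", False).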
So that each operation will not have to search for the file's entry in the disk directory, the OS may support an OPEN operation, which searches the disk directory for the named file, copies its directory entry into a memory-resident table of open files, and returns a pointer to the table entry for that file. Subsequent file operations get the needed directory information from the table instead of the disk directory, by following the pointer. This reduces the number of disk accesses required to do file operations.

A file may be CLOSED -- which results in its table entry being removed.
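A toy model of the open-file table, written as a hypothetical ToyFileSystem class. The "disk directory" is a dict, and directory_searches counts lookups that would cost disk accesses in a real system. Once open has copied the entry into the table, read, write, and rewind work only through the returned handle, so the directory is searched once per open rather than once per operation.

```python
class ToyFileSystem:
    """Hypothetical model: a disk directory plus a memory-resident open-file table."""

    def __init__(self):
        self.disk_directory = {}     # file name -> directory entry
        self.open_table = {}         # handle -> {"entry": ..., "pos": ...}
        self.next_handle = 0
        self.directory_searches = 0  # stands in for disk accesses

    def create(self, name):
        self.disk_directory[name] = {"blocks": []}

    def _search(self, name):
        self.directory_searches += 1
        return self.disk_directory[name]

    def open(self, name):
        # Search the directory once and copy the entry into the table.
        entry = self._search(name)
        handle = self.next_handle
        self.next_handle += 1
        self.open_table[handle] = {"entry": entry, "pos": 0}
        return handle

    def write(self, handle, block):
        # Simple write at the end of the file; pointer ends up past it.
        ofe = self.open_table[handle]
        ofe["entry"]["blocks"].append(block)
        ofe["pos"] = len(ofe["entry"]["blocks"])

    def read(self, handle):
        ofe = self.open_table[handle]
        block = ofe["entry"]["blocks"][ofe["pos"]]
        ofe["pos"] += 1
        return block

    def rewind(self, handle):
        self.open_table[handle]["pos"] = 0

    def close(self, handle):
        # Closing removes the table entry.
        del self.open_table[handle]
```

After create, open, two writes, a rewind, and two reads, directory_searches is still 1.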
11.2 Access Methods

Choosing the right file access method for a particular application is important on systems that support more than one method. Knowing what methods are available on a given system may be important when shopping for a platform for some specific application, or set of applications. (Imagine buying a system to support a large database, and then finding out that all file accesses are sequential only.)
- 11.2.1 Sequential Access
  
  In sequential access, the file (current) pointer is positioned at the beginning of the file when it is opened. After each read (if the file is open for reading), the pointer is advanced to the next file block. After each write (if the file is open for writing), the pointer is advanced in the same way, past the material just written, to the new end-of-file. It may be possible to "skip" forward or back n blocks, but not without reading the blocks in between. This is based on the TAPE-MODEL of a file.
- 11.2.2 Direct Access
  
  This allows the file to be treated more like an ARRAY of blocks. (Having DIRECT ACCESS is not the same thing as having RANDOM ACCESS, though. Disk hardware does not support true random access, because the time to reach a block depends on where the disk head currently is; that's why a disk is NOT called "RAM".)
  
  The user is allowed to access blocks by number in any order whatever, and it is not necessary to read any of the blocks located BETWEEN two blocks that are accessed successively.
  
  For direct access, the desired block number must be an input parameter of the read and write operations. Alternatively, the same effect can be achieved with an operation (called POSITION or SEEK) that moves the file pointer to a desired block, after which a simple read or write can be done.
  
  The user's view of a direct access file is usually a sequence of blocks numbered from 0 to some max. The OS usually translates these RELATIVE BLOCK NUMBERS into absolute block numbers that correspond to where the block is really physically located on the disk.
  
  It is easy to simulate sequential access on a direct access file, but extremely inefficient to simulate direct access on a sequential file.
- 11.2.3 Other Access Methods
  
  Index blocks can provide other forms of access. For example, a large ordered file of alphanumeric strings could be indexed so that the index block contains one entry for each alphanumeric character, pointing to the first block that contains a string starting with that character (i.e., a pointer to the a's, one to the b's, and so on). This trick is commonly used to speed up searches on key fields in file records, since less I/O is required.
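A toy sketch of the three access methods above. DirectAccessFile, build_index, and lookup are hypothetical names, and "disk" is a dict standing in for absolute disk blocks. It shows relative-to-absolute block translation, direct reads and SEEK, sequential reads simulated on top of direct access, and an index block keyed by first character over an ordered file.

```python
class DirectAccessFile:
    """Toy direct-access file. block_map translates relative block numbers
    (0..n-1, the user's view) into absolute disk block numbers."""

    def __init__(self, disk, block_map):
        self.disk = disk            # absolute block number -> block contents
        self.block_map = block_map  # relative block number -> absolute block number
        self.pos = 0                # current pointer starts at block 0 on open

    def read_block(self, n):
        # Direct access: fetch relative block n; blocks in between are not read.
        return self.disk[self.block_map[n]]

    def seek(self, n):
        # POSITION/SEEK: move the current pointer to relative block n.
        self.pos = n

    def read_next(self):
        # Sequential access simulated on a direct-access file:
        # read at the current pointer, then advance it.
        block = self.read_block(self.pos)
        self.pos += 1
        return block


def build_index(blocks):
    # Index block: first character -> first block holding a string that
    # starts with it. Assumes the strings across blocks are in sorted order.
    index = {}
    for block_no, block in enumerate(blocks):
        for s in block:
            index.setdefault(s[0], block_no)
    return index


def lookup(blocks, index, key):
    # Start scanning at the indexed block instead of block 0.
    start = index.get(key[0])
    if start is None:
        return None
    for block_no in range(start, len(blocks)):
        if key in blocks[block_no]:
            return block_no
        if blocks[block_no][-1] > key:
            return None  # sorted file: key would already have appeared
    return None
```

With blocks [["apple", "avocado"], ["banana", "blueberry"], ["cherry"]], a search for "blueberry" starts at block 1 and never reads block 0.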
11.3 Directory Structure

A directory structure provides a mechanism for organizing a large file system. Many systems have two separate directory structures: the DEVICE DIRECTORY and the FILE DIRECTORIES. The DEVICE DIRECTORY is stored on each physical device and describes all files on that device.

The device directory mainly concentrates on describing THE PHYSICAL PROPERTIES of each file: where it is, how long it is, how it is allocated, and so on.

THE FILE DIRECTORIES ARE A LOGICAL ORGANIZATION OF THE FILES ON ALL DEVICES. The file directory concentrates on LOGICAL properties of each file: name, file type, owning user, accounting information, protection access codes, and so on.

Things that may be kept in a file directory entry:
1. file name
2. file type
3. location (pointer to device and address on device.)
4. size
5. current position (read or write position -- if file is open)
6. protection
7. usage count (number of processes that currently have this file open)
8. time, date, and process identification (creation, modification, last use)
Directory entries may contain from 16 to over 1000 bytes. In a system with a large number of files, the size of the directory itself may be hundreds of thousands of bytes. Thus the file directory may need to be stored on the device and brought into primary memory piecemeal, as needed.

Since the file directory is a data structure that has to be searched frequently, and has to have frequent insertions and deletions, tree implementations and hash table implementations are fairly common.

The file directory is essentially a SYMBOL TABLE: it maps file names to directory entries.
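A minimal sketch of a directory as a symbol table, assuming a hash table (a Python dict) keyed by file name. DirectoryEntry holds only a subset of the fields listed above; its layout, like the class and method names, is hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    # Subset of the fields listed above; the layout is an assumption.
    name: str
    file_type: str
    location: tuple              # (device, address on device)
    size: int = 0
    protection: int = 0o644
    usage_count: int = 0         # processes that currently have the file open
    created: float = field(default_factory=time.time)


class Directory:
    """A directory as a symbol table: a hash table keyed by file name."""

    def __init__(self):
        self._entries = {}

    def search(self, name):
        # Returns the entry, or None if no file has that name.
        return self._entries.get(name)

    def create(self, entry):
        # Refuse duplicate names, as the Search operation requires.
        if entry.name in self._entries:
            raise FileExistsError(entry.name)
        self._entries[entry.name] = entry

    def delete(self, name):
        if self._entries.pop(name, None) is None:
            raise FileNotFoundError(name)

    def list_names(self):
        return sorted(self._entries)
```

A hash table gives near-constant-time search, insertion, and deletion; a tree would instead keep names in order, making sorted listings cheaper.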

The operations that are to be performed on a directory:
- Search
  
  Find the pointer into the device directory when a file is opened. Check for an attempt to create a duplicate file name. Find an entry so that its properties can be listed.
- Create File
  
  A new entry needs to be created (after a search confirms that the name is not already in use).
- List Directory
- Backup
- 11.3.1 Single-Level Directory
  
  Simplest kind to implement -- inconvenient for users with large numbers of files. Hard to keep file names unique, hard to organize groups of files.
- 11.3.2 Two-Level Directory
  
  Give each user his own directory. The user's directory is the default: all files referenced are assumed to be in the user's directory unless specifically stated otherwise. The system may allow access to other users' files through the use of PATH NAMES. Since some files must be kept in the master file directory (so they can be shared), and since it is sometimes not practical to expect users to give complete path names, the need arises for a SEARCH PATH (a list of places to look for the files mentioned). Such a system is not much better than a single-level directory. The user has to put all his files in one place, devise unique names, and work too hard to keep related files organized.
- 11.3.3 Tree-Structured Directories
  
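  Like the Unix file system: directories may contain subdirectories to any depth, every file has a unique path name starting from the root, and relative path names are resolved from the current directory.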
- 11.3.4 Acyclic Graph Directories
  
  Allowing files to be contained in more than one directory facilitates file sharing. This introduces a version of the ALIASING PROBLEM: files may have many path names, so procedures that require traversal are complicated by the need to avoid processing nodes more than once. Deletion has to be managed carefully to avoid dangling references.
- 11.3.5 General Graph Directory
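To make the tree-structured and acyclic-graph cases concrete, here is a toy sketch assuming each directory is a dict mapping names to subdirectories or File objects (File, resolve, and count_files are hypothetical names). resolve walks a path name from the root; count_files traverses the graph with a visited set, so a file reachable under two path names is counted once. The same visited set also keeps a traversal from looping forever if a general graph contains cycles.

```python
class File:
    """Hypothetical file object; identity (not contents) distinguishes files."""

    def __init__(self, data=b""):
        self.data = data


def resolve(root, path):
    """Walk a path name such as "/bob/link" down from the root directory."""
    node = root
    for component in path.strip("/").split("/"):
        if not component:
            continue  # "/" (or an empty component) stays at the current node
        if not isinstance(node, dict) or component not in node:
            raise FileNotFoundError(path)
        node = node[component]
    return node


def count_files(root):
    """Count distinct files, visiting each shared node only once."""
    seen = set()
    count = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # already reached through another path name (alias)
        seen.add(id(node))
        if isinstance(node, dict):
            stack.extend(node.values())
        else:
            count += 1
    return count
```

If alice's "report" and bob's "link" name the same File, count_files reports 2 files for a tree that also holds bob's "todo", not 3.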
11.6 File Protection

RELIABILITY is generally assured by doing frequent backups to more permanent media.

PROTECTION is provided by controlling access. For instance, restrictions may be placed on the following kinds of accesses:
- Read
- Write
- Execute
- Append
- Delete
- Naming
  
  Protection can be based on naming: a user who cannot name a file cannot access it.
- Passwords
  
  Passwords can be put on files to restrict access. Large numbers of passwords then have to be remembered, or, if many files have the same password, they ALL become accessible as soon as the password is known. Protection on individual files tends to be on an all-or-nothing basis -- either you don't know the password, and you can't do anything with the file, or you DO know it and you CAN do ANYTHING with the file. Quite often, you'd just as soon give only limited capabilities to users of the file, and quite often they'd much prefer that, because they don't wish to risk doing damage to it.
- Access Lists
  
  For each file, the OS keeps a list of users and the rights each user has to the file, and it enforces those rights.
- Access Groups
  
  It is more compact to classify users into three groups (owner, group, others) and store one set of rights for each group.
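A minimal sketch of an owner/group/others check using the permission-bit constants from Python's stat module. The function may_access and its parameters are hypothetical. It assumes Unix semantics, where the first matching class (owner, then group, then others) decides and the others are never consulted; it ignores the superuser and any per-file access lists.

```python
import stat

# Permission bits for each class, in (read, write, execute) order.
_BITS = {
    "owner": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
    "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
    "other": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
}
_OPS = {"r": 0, "w": 1, "x": 2}


def may_access(mode, file_uid, file_gid, uid, gids, op):
    """Decide whether user uid (member of groups gids) may do op ('r', 'w', 'x')."""
    if uid == file_uid:
        cls = "owner"
    elif file_gid in gids:
        cls = "group"
    else:
        cls = "other"
    return bool(mode & _BITS[cls][_OPS[op]])
```

With mode 0o640, the owner may read and write, group members may only read, and everyone else gets nothing: nine bits per file instead of a list naming every user.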
Chapter 12: Implementation Issues

File System Levels:

5. application programs

4. Logical File System

(uses the file directory to provide level 3 with the information it needs. It translates from symbolic file names to pointers into the device directory.)

3. File Organization Module

(generates addresses for level 2 by using device directory information, and knowledge of file allocation method.)

2. Basic File System

(uses level 1 to read blocks using addressing in terms of drive, cylinder, surface, and sector.)

1. Input/Output

(device drivers and interrupt handlers to actually transfer information)

0. devices
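A sketch of the translations done by levels 3 and 2, assuming contiguous allocation at level 3 and a hypothetical disk geometry (4 surfaces, 32 sectors per track) at level 2; the function names are hypothetical too. Sectors are numbered from 0 here for simplicity, although traditional cylinder/surface/sector addressing numbers sectors from 1.

```python
def file_org_module(start_block, length, relative_block):
    """Level 3 (assuming contiguous allocation): relative -> absolute block."""
    if not 0 <= relative_block < length:
        raise ValueError("block is outside the file")
    return start_block + relative_block


def to_chs(block, surfaces, sectors_per_track):
    """Level 2: absolute block number -> (cylinder, surface, sector)."""
    # One cylinder holds surfaces * sectors_per_track blocks.
    cylinder, rem = divmod(block, surfaces * sectors_per_track)
    surface, sector = divmod(rem, sectors_per_track)
    return cylinder, surface, sector
```

Relative block 10 of a 20-block file starting at absolute block 990 is block 1000, which to_chs(1000, 4, 32) places at cylinder 7, surface 3, sector 8.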

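Directory information can be cached to increase the speed of file operations, but that introduces a cache-consistency problem: until cached changes are written back, the copy on disk is out of date, and a crash can leave the file system inconsistent. (sync, sync, sync.)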
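A toy write-back cache illustrating the problem, with a dict standing in for the on-disk directory; DirectoryCache and its methods are hypothetical names. Updates change only the cached copy and mark the name dirty, so the "disk" stays stale until sync writes the dirty entries back.

```python
class DirectoryCache:
    """Hypothetical write-back cache in front of an on-disk directory (a dict)."""

    def __init__(self, disk):
        self.disk = disk    # stands in for the directory on disk
        self.cache = {}
        self.dirty = set()  # names whose cached entry differs from disk

    def lookup(self, name):
        if name not in self.cache:
            self.cache[name] = self.disk[name]  # miss: read from "disk"
        return self.cache[name]

    def update(self, name, entry):
        self.cache[name] = entry
        self.dirty.add(name)  # the disk copy is now stale

    def sync(self):
        # Write every dirty entry back, making disk and cache consistent.
        for name in self.dirty:
            self.disk[name] = self.cache[name]
        self.dirty.clear()
```

A crash between update and sync loses the change, which is why Unix users were taught to run sync before halting the machine.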
