(Latest Revision: Sun Sep 19, 2005)
Chapters Ten-Fifteen
--
Chapter Ten: File-System Interface
Chapter Eleven: File-System Implementation
Chapter Twelve: Mass-Storage Structure
Chapter Thirteen: I/O Systems
Chapter Fourteen: Protection
Chapter Fifteen: Security
--
There is material here that relates to chapters 10-13.
The OS implements the abstract concept of a file. In this chapter, we
consider the various ways to map files onto devices. We look at a variety
of directory structures.
- 11.1 File Concept
A file is a NAMED collection of related information defined by its
creator -- in general a SEQUENCE of bits, bytes, lines, or records. A
file should also be thought of as an ABSTRACT DATA TYPE.
Files have various attributes, like type, size, length. Which, if any,
of these attributes should the OS be concerned with? Unix systems have
opted to treat all files simply as a sequence of 8-bit bytes, thus
providing maximum flexibility and minimal OS support of file types.
In Unix, the logical file record is one byte. The file system
automatically PACKS and unpacks bytes into physical disk blocks (of say
512 bytes). Such packing is common to most operating systems. Files
may be viewed by us simply as a sequence of blocks (sectors). The
conversion from blocks to a sequence of logical records is a simple
software problem.
All file systems suffer from internal fragmentation due to the packing
of logical records into physical blocks.
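As a rough illustration of the packing arithmetic and of the space wasted in a file's last block, here is a small Python sketch. The 512-byte block size is the example figure used above; the function names are invented for this example and do not come from any particular OS.

```python
BLOCK_SIZE = 512  # bytes per physical block (the example figure from the notes)

def locate_byte(offset, block_size=BLOCK_SIZE):
    """Map a byte offset in a file to (relative block number, offset in block)."""
    return divmod(offset, block_size)

def blocks_needed(file_size, block_size=BLOCK_SIZE):
    """Number of whole blocks required to hold file_size bytes."""
    return (file_size + block_size - 1) // block_size

def internal_fragmentation(file_size, block_size=BLOCK_SIZE):
    """Unused bytes in the last block allocated to the file."""
    return blocks_needed(file_size, block_size) * block_size - file_size
```

For a 1000-byte file, byte 1000 - 1 = 999 lies in relative block 1, the file needs 2 blocks, and 24 bytes of the second block go unused.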
- 11.1.2 File Operations
A file is an ABSTRACT DATA TYPE. To discuss files we must consider
the operations that the data type makes available, and how they may be
implemented. Note: the file operations have to be implemented as
SYSTEM CALLS.
- Creating a File
To create a file, space must be allocated for it, and an entry
for it must be made in the disk directory.
- Writing a File
This operation is a simple write at the end of the file. The
file name and the information to be written are input to this
operation. The OS uses the name to find its entry in the disk
directory. The directory needs a pointer to the next block to be
written, and this pointer needs to be updated.
- Reading a File
Input to this operation will include the file name and the
address in memory where the next block of the file is to be
copied. The directory is searched for the file's entry. The
directory will need a pointer to the next block to be read, and
this pointer will need to be updated.
- Rewind a File
Input to this operation will include the file name. The
operation can update the current pointer so it points to the
first block of the file.
- Delete a File
Input to this operation will include the file name. The
operation should look up the file in the directory and deallocate
all its blocks.
So that each operation will not have to search for the file's entry in
the disk directory, the OS may support an OPEN operation, which searches
the disk directory for the named file, copies its directory entry into a
memory-resident table of open files, and returns a pointer to the table
entry for that file. Subsequent file operations get the needed
directory information from the table instead of the disk directory, by
following the pointer. This reduces the number of disk accesses
required to do file operations.
A file may be CLOSED -- which results in its table entry being
removed.
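The OPEN/CLOSE idea can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the class name, the dictionary standing in for the disk directory, and the counter that stands in for the cost of a directory search. A real OS would implement these operations as system calls.

```python
class FileSystem:
    """Toy model: a 'disk directory' plus a memory-resident open-file table."""

    def __init__(self):
        self.disk_directory = {}     # file name -> directory entry
        self.open_files = {}         # handle -> open-file table entry
        self.next_handle = 0
        self.directory_searches = 0  # stands in for disk accesses

    def create(self, name):
        # Make a directory entry; "blocks" models the space allocated to the file.
        self.disk_directory[name] = {"name": name, "blocks": []}

    def open(self, name):
        """Search the disk directory once and record the entry in the table."""
        self.directory_searches += 1
        entry = self.disk_directory[name]  # KeyError if there is no such file
        handle = self.next_handle
        self.next_handle += 1
        self.open_files[handle] = {"entry": entry, "position": 0}
        return handle

    def write(self, handle, block):
        # A simple write at the end of the file; no directory search needed.
        self.open_files[handle]["entry"]["blocks"].append(block)

    def read(self, handle):
        # Read the next block and advance the current pointer.
        slot = self.open_files[handle]
        block = slot["entry"]["blocks"][slot["position"]]
        slot["position"] += 1
        return block

    def rewind(self, handle):
        self.open_files[handle]["position"] = 0

    def close(self, handle):
        # Closing removes the table entry.
        del self.open_files[handle]
```

After one open, any number of reads, writes, and rewinds go through the handle, and directory_searches stays at 1.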
- 11.2 Access Methods
Choosing the right file access method for a particular application is
important on systems that support more than one method. Knowing what
methods are available on a given system may be important when shopping
for a platform for some specific application, or set of applications.
(Imagine buying a system to support a large data base, and then finding
out that all file accesses are sequential only.)
- 11.2.1 Sequential Access
In sequential access, the file (current) pointer is positioned at
the beginning of the file when it is opened. After each read (if
the file is open for reading), the pointer is advanced to the next
file block. After each write (if the file is open for writing),
the pointer is advanced in the same way, past the material just
written, to the new end-of-file. It may be possible to "skip"
forward or back n blocks (but not without reading the blocks in
between). This is based on the TAPE MODEL of a file.
- 11.2.2 Direct Access
This allows the file to be treated more like an ARRAY of blocks
(though having DIRECT ACCESS does not mean the same thing as having
RANDOM ACCESS -- disk hardware does not support true random access
-- that's why it's NOT called "RAM" |-))
The user is allowed to access blocks by number in any order
whatever, and it is not necessary to read any of the blocks located
BETWEEN two blocks that are accessed successively.
For direct access, the desired block number must be an input
parameter of the read and write operations, or perhaps the same
effect can be achieved by implementing an operation (called
POSITION or "SEEK") that moves the file pointer to a desired block,
after which a simple read or write can be done.
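The "position, then read" variant might look like the following Python sketch. An in-memory io.BytesIO object stands in for the file, and the 512-byte block size and the function name read_block are assumptions made for the example.

```python
import io

BLOCK_SIZE = 512  # assumed block size for this example

def read_block(f, n, block_size=BLOCK_SIZE):
    """Position the file pointer at relative block n, then do a simple read."""
    f.seek(n * block_size)
    return f.read(block_size)
```

Blocks can then be read in any order, for example block 2 followed by block 0, without touching block 1.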
The user's view of a direct access file is usually a sequence of
blocks numbered from 0 to some max. The OS usually translates
these RELATIVE BLOCK NUMBERS into absolute block numbers that
correspond to where the block is really physically located on the
disk.
It is easy to simulate sequential access on a direct access file,
but extremely inefficient to simulate direct access on a sequential
file.
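To see why the first direction is easy, here is a minimal Python sketch that layers sequential access on a direct access file by keeping a current-position counter. The class and method names are hypothetical, and a Python list stands in for the blocks on disk.

```python
class DirectAccessFile:
    """A file viewed as an array of blocks, numbered from 0."""

    def __init__(self, blocks):
        self.blocks = list(blocks)

    def read_block(self, n):
        return self.blocks[n]

    def write_block(self, n, data):
        self.blocks[n] = data


class SequentialView:
    """Sequential access simulated on top of a direct access file."""

    def __init__(self, direct_file):
        self.file = direct_file
        self.cp = 0  # current position, as a relative block number

    def read_next(self):
        # "Read next" is just "read block cp", then advance cp.
        block = self.file.read_block(self.cp)
        self.cp += 1
        return block

    def rewind(self):
        self.cp = 0
```

The reverse simulation has no such shortcut: reaching block n on a purely sequential file means reading (or skipping over) every block before it.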
- 11.2.3 Other Access Methods
Index Blocks into files can provide other forms of access. For
example, a large ordered file of alphanumeric strings could be
indexed in such a way that the index block would contain one entry
for each alphanumeric character, pointing to the first block that
contains a string starting with that character (i.e., a pointer to
the a's, the b's, and so on). This trick is commonly used to
provide faster searches for key fields in file records (less I/O
required).
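A minimal Python sketch of the first-character index follows. It assumes the file is a list of blocks, each block a list of non-empty strings, with the strings in sorted order across the whole file. The function names are made up for the example.

```python
def build_index(blocks):
    """Map each first character to the first block holding a string that starts with it."""
    index = {}
    for block_number, block in enumerate(blocks):
        for s in block:
            index.setdefault(s[0], block_number)
    return index

def find(blocks, index, key):
    """Return the number of the block containing key, or None if it is absent.

    Only blocks from the indexed starting point onward are examined.
    """
    start = index.get(key[0])
    if start is None:
        return None
    for block_number in range(start, len(blocks)):
        block = blocks[block_number]
        if key in block:
            return block_number
        if block and block[0][0] > key[0]:
            break  # every later string starts with a later character
    return None
```

A search for a string starting with "c" skips straight to the block the index names for "c" instead of scanning from block 0.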
- 11.3 Directory Structure
A directory structure provides a mechanism for organizing a large file
system. Many systems have two separate directory structures: the DEVICE
DIRECTORY and the FILE DIRECTORIES. The DEVICE DIRECTORY is stored on
each physical device and describes all files on that device.
The device directory mainly concentrates on describing THE PHYSICAL
PROPERTIES of each file: where it is, how long it is, how it is
allocated, and so on.
THE FILE DIRECTORIES ARE A LOGICAL ORGANIZATION OF THE FILES ON ALL
DEVICES. The file directory concentrates on LOGICAL properties of each
file: name, file type, owning user, accounting information, protection
access codes, and so on.
Things that may be kept in a file directory entry:
- file name
- file type
- location (pointer to device and address on device.)
- size
- current position (read or write position -- if file is open)
- protection
- usage count (number of processes that currently have this file open)
- time, date, and process identification (creation, modification, last use)
Directory entries may contain from 16 to over 1000 bytes. In a system
with a large number of files, the size of the directory itself may be
hundreds of thousands of bytes. Thus the file directory may need to be
stored on the device and brought into primary memory piecemeal, as
needed.
Since the file directory is a data structure that has to be searched
frequently, and has to have frequent insertions and deletions, tree
implementations and hash table implementations are fairly common.
The file directory is essentially a SYMBOL TABLE.
The operations that are to be performed on a directory:
- Search
Find the pointer to the device directory entry when opening a file.
Check for an attempt to create a duplicate file name. Find an entry in
order to list its properties.
- Create File
New entry needs to be created.
- List Directory
- Backup
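Viewed as a symbol table, a directory with the search, create, and list operations can be sketched in a few lines of Python. A dict (a hash table) holds the entries; the class and method names are assumptions for illustration, a delete method is included because deletions were mentioned earlier, and backup is not modeled.

```python
class Directory:
    """A file directory modeled as a symbol table keyed by file name."""

    def __init__(self):
        self.entries = {}  # file name -> directory entry (dict is a hash table)

    def search(self, name):
        """Return the entry for name, or None if there is no such file."""
        return self.entries.get(name)

    def create(self, name, **attributes):
        """Add a new entry, refusing to create a duplicate file name."""
        if name in self.entries:
            raise FileExistsError(name)
        entry = {"name": name, **attributes}
        self.entries[name] = entry
        return entry

    def delete(self, name):
        del self.entries[name]

    def list_names(self):
        """List the directory: all file names in sorted order."""
        return sorted(self.entries)
```

A tree-based implementation would keep the names ordered at all times, at the cost of slower lookups than hashing typically gives.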
- 11.3.1 Single-Level Directory
Simplest kind to implement -- inconvenient for users with large
numbers of files. Hard to keep file names unique, hard to organize
groups of files.
- 11.3.2 Two-Level Directory
Give each user his own directory. The user's directory is the
default. All files referenced are assumed to be in the user's
directory unless specifically stated otherwise. The system may allow
access to other users' files through the use of PATH NAMES. Since
some files used must be kept in the master file directory (so they
can be shared), and since it is sometimes not practical to expect
users to give complete path names, the need arises to have a
SEARCH PATH (look in this list of places for files mentioned).
Such a system is not much better than a single-level directory.
The user has to put all his files in one place, devise unique
names and work too hard to keep related files organized.
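Name resolution in a two-level directory with a search path might be sketched as follows in Python. The "owner/file" path-name syntax, the function name, and the dictionary layout are all assumptions made for this example.

```python
def resolve(name, user, user_directories, search_path):
    """Return (directory name, entry) for a file name, or None if not found.

    user_directories maps a directory name to a dict of file name -> entry.
    A name of the form "owner/file" is treated as an explicit path name;
    any other name is looked up in the user's own directory first and then
    in each directory on the search path, in order.
    """
    if "/" in name:
        owner, file_name = name.split("/", 1)
        candidates = [owner]
    else:
        file_name = name
        candidates = [user] + list(search_path)
    for directory in candidates:
        files = user_directories.get(directory, {})
        if file_name in files:
            return directory, files[file_name]
    return None
```

Note that a plain name never reaches another user's directory unless that directory is on the search path.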
- 11.3.3 Tree-Structured Directories
Like the Unix file system.
- 11.3.4 Acyclic Graph Directories
Allowing files to be contained in more than one directory
facilitates file sharing. This introduces a version of the
ALIASING PROBLEM -- files may have many pathnames. Procedures
that require traversal are complicated by the need to avoid processing
nodes more than once. Deletion has to be managed carefully to
avoid dangling references.
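Both concerns can be illustrated with a short Python sketch: a traversal that remembers which nodes it has already visited, and a per-file link count so that storage is freed only when the last directory entry goes away. Link counting is one common way to manage deletion, not the only one; the class and function names are hypothetical.

```python
class File:
    def __init__(self, name):
        self.name = name
        self.link_count = 0  # number of directory entries referring to this file


class Dir:
    def __init__(self):
        self.entries = {}  # entry name -> File or Dir

    def link(self, name, node):
        self.entries[name] = node
        if isinstance(node, File):
            node.link_count += 1

    def unlink(self, name):
        """Remove one entry; return True if the file's storage may now be freed."""
        node = self.entries.pop(name)
        if isinstance(node, File):
            node.link_count -= 1
            return node.link_count == 0
        return False


def all_files(root):
    """Collect every file once, even if it is reachable by several path names."""
    seen, result, stack = set(), [], [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # already processed via another path
        seen.add(id(node))
        if isinstance(node, Dir):
            stack.extend(node.entries.values())
        else:
            result.append(node)
    return result
```

Without the seen set, a file shared by two directories would be processed twice, for example backed up twice or counted twice in an accounting pass.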
- 11.3.5 General Graph Directory
- 11.6 File Protection
RELIABILITY is generally assured by doing frequent backups to more
permanent media.
PROTECTION is provided by controlling access. For instance,
restrictions may be placed on the following kinds of accesses:
- Read
- Write
- Execute
- Append
- Delete
- Naming
Protection can be based on denying access to a file to anyone who
cannot name it.
- Passwords
Passwords can be put on files to restrict access. Large numbers of
passwords then have to be remembered, or, if many files have the
same password, they ALL become accessible as soon as the password
is known. Protection on individual files tends to be on an
all-or-nothing basis -- either you don't know the password, and you
can't do anything with the file, or you DO know it and you CAN do
ANYTHING with the file. Quite often, you'd just as soon give only
limited capabilities to users of the file, and quite often they'd
much prefer that, because they don't wish to risk doing damage to
it.
- Access Lists
Each user has certain rights to the file that are enforced by the
OS.
- Access Groups
It is more compact to represent the users in groups (owner, group,
others) than to list the rights of every user individually.
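An owner/group/others check in the style of Unix can be sketched in Python as below. The dictionary layout, the access names, and the function name are assumptions for the example; a real system stores these rights as permission bits and enforces them inside the OS.

```python
def allowed(user, user_groups, file, access):
    """Decide whether user may perform access ("read", "write", ...) on file.

    file is a dict with "owner", "group", and "perms"; perms maps each of
    the classes "owner", "group", "others" to a set of permitted accesses.
    Exactly one class applies to a user: owner first, then group, then others.
    """
    if user == file["owner"]:
        user_class = "owner"
    elif file["group"] in user_groups:
        user_class = "group"
    else:
        user_class = "others"
    return access in file["perms"][user_class]
```

Three small sets describe the rights of every user on the system, which is the compactness gained over a full access list.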
- Chapter 12: Implementation Issues
File System Levels:
5. application programs
4. Logical File System
(uses the file directory to provide level 3 with the information it needs.
It translates from symbolic file names to pointers into the device
directory.)
3. File Organization Module
(generates addresses for level 2 by using device directory information, and
knowledge of file allocation method.)
2. Basic File System
(uses level 1 to read blocks using addressing in terms of drive, cylinder,
surface, and sector.)
1. Input/Output
(device drivers and interrupt handlers to actually transfer information)
0. devices
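As an illustration of the kind of addressing level 2 deals with, the Python sketch below converts a linear block number on one drive into a (cylinder, surface, sector) triple. It assumes an idealized fixed geometry with every track holding the same number of sectors and all three coordinates numbered from 0; the function name and the geometry figures in the usage note are made up.

```python
def block_to_chs(block, surfaces, sectors_per_track):
    """Convert a linear block number to (cylinder, surface, sector), all zero-based."""
    cylinder, rest = divmod(block, surfaces * sectors_per_track)
    surface, sector = divmod(rest, sectors_per_track)
    return cylinder, surface, sector
```

With 4 surfaces and 32 sectors per track, each cylinder holds 128 blocks, so block 128 is the first sector on the first surface of cylinder 1.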
Directory information can be cached to increase the speed of file operations,
but that introduces a cache-consistency problem. (sync, sync, sync.)