(Latest Revision: Wednesday, December 03, 2008)
Chapter Ten -- File-System Interface -- Lecture Notes
There is material here that relates to chapters 10-13.
The operating system is responsible for implementing files and providing a
directory structure.
- 10.0 Objectives
- Explain the function of filesystems
- Describe the interfaces to filesystems
- Discuss file-system design tradeoffs, including access methods,
file sharing, file locking, and directory structures.
- Explore file-system protection
- 10.1 File Concept
- A file is a NAMED collection of related information defined by its
creator -- in general a SEQUENCE of bits, bytes, lines, or records.
A file should also be thought of as an ABSTRACT DATA TYPE.
- 10.1.1 -- File Attributes
Files have various attributes, like name, unique id, type, location,
size, owner, protection, creation date & time, and modification date
& time.
Typically there is some sort of directory structure on disk
which contains the name and unique id of each file. The rest of the
file attributes typically can be found in some way by using the
unique id as a key.
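The two-step lookup described above (name to unique id, then unique id to the remaining attributes) can be sketched as follows. This is only an illustration: the class name, field names, and sample values are assumptions, not the layout of any particular file system.

```python
from dataclasses import dataclass


@dataclass
class FileAttributes:
    """Attributes stored apart from the directory, keyed by unique id."""
    file_type: str
    size: int
    owner: str
    location: int  # hypothetical: starting block number on disk


# Directory: file name -> unique id (hypothetical sample values).
directory = {"notes.txt": 17, "a.out": 42}

# Attribute table: unique id -> the rest of the attributes.
attribute_table = {
    17: FileAttributes("text", 2048, "alice", 300),
    42: FileAttributes("executable", 8192, "alice", 512),
}


def lookup(name):
    """Resolve a name to its attributes by using the unique id as a key."""
    file_id = directory[name]
    return attribute_table[file_id]
```

Renaming a file in this arrangement touches only the directory, since the attributes are keyed by the id rather than by the name.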
- 10.1.2 -- File Operations
A file is an ABSTRACT DATA TYPE.
To discuss files we must consider the operations that the data type
makes available, and how they may be implemented. Note that
the file operations have to be implemented as SYSTEM CALLS.
- Creating a File
To create a file, space must be allocated for it, and an entry
for it must be made in the disk directory.
- Writing a File
This operation is a simple write at the end of the file. The
file name (or equivalent) and the information to be written are
input to this operation. The OS uses the name to find its
entry in the disk directory. The directory needs a pointer to
the next block to be written, and this pointer needs to be
updated.
- Reading a File
Input to this operation will include the file name and the
address in memory where the next block of the file is to be
copied. The directory is searched for the file's entry. The
directory will need a pointer to the next block to be read, and
this pointer will have to be updated.
- Rewinding a File
Input to this operation will include the file name. The
operation can update the current pointer so it points to the
first block of the file.
- Deleting a File
Input to this operation will include the file name. The
operation should look up the file in the directory and
deallocate all its blocks.
So that each operation will not have to search for the file's entry in
the disk directory, the OS may support an OPEN operation, which
searches the disk
directory for the named file, copies its directory entry into a
memory-resident table of open files, and returns a pointer to the
table entry for that file. Instead of searching the directory,
subsequent file operations follow the pointer and get the needed
directory information from the table. This saves much time by
reducing the number of disk accesses required to do file operations.
A file may be CLOSED -- which results in its table entry being
removed.
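As one concrete illustration, Python's os module exposes thin wrappers over the corresponding system calls. In the sketch below the file descriptor returned by os.open plays the role of the pointer into the open-file table. The function name and the scratch file name are assumptions made for the example.

```python
import os
import tempfile


def demo_file_operations():
    """Create, write, rewind, read, close, and delete a scratch file."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "example.dat")  # hypothetical name

    # Create + open: the directory is searched (or an entry is made) once,
    # and later calls use the returned descriptor instead of the name.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        os.write(fd, b"hello, file system")  # write at the current pointer
        os.lseek(fd, 0, os.SEEK_SET)         # "rewind" to the first byte
        data = os.read(fd, 5)                # read advances the pointer
    finally:
        os.close(fd)                         # release the open-file entry
    os.unlink(path)                          # delete: remove the directory entry
    os.rmdir(directory)
    return data
```

Only os.open and os.unlink take the file name; every operation in between works through the descriptor.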
Because more than one process may open a file concurrently,
typically there are two levels of "open-file table" - a global table
and per-process table. The system-wide global table would contain
process-independent information such as file size, location on disk,
open count (the number of processes that have the file open), and
latest modification time. An entry for a file in a per-process
table would contain the file pointer value, information regarding
the access rights of the process, and a pointer to the entry for the
file in the global file table.
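A toy model of the two tables is sketched below. All names here (GlobalEntry, ProcessEntry, open_file, close_file) are hypothetical, and a real kernel keeps considerably more state than this.

```python
from dataclasses import dataclass


@dataclass
class GlobalEntry:
    """Process-independent information, shared by every opener."""
    size: int
    disk_location: int
    open_count: int = 0


@dataclass
class ProcessEntry:
    """Per-process state: access mode, file pointer, link to global entry."""
    global_entry: GlobalEntry
    mode: str
    file_pointer: int = 0


global_table = {}  # file name -> GlobalEntry


def open_file(name, mode, size=0, disk_location=0):
    """Reuse the global entry if the file is already open somewhere."""
    entry = global_table.setdefault(name, GlobalEntry(size, disk_location))
    entry.open_count += 1
    return ProcessEntry(entry, mode)


def close_file(name, handle):
    """Drop the global entry only when the last opener closes the file."""
    handle.global_entry.open_count -= 1
    if handle.global_entry.open_count == 0:
        del global_table[name]
```

Each opener gets its own file pointer, while the open count lives in the shared entry so the system knows when the entry can be discarded.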
Operating systems typically support advisory or mandatory
file-locking. Shared locks, as well as exclusive locks, may be
available.
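On Unix-like systems, advisory shared and exclusive locks can be tried out with flock, which Python exposes through the Unix-only fcntl module. In this sketch two independent opens of one temporary file stand in for two processes. The helper names are assumptions, and the behavior shown is that of flock-style locks on a local file system.

```python
import fcntl
import os
import tempfile


def try_lock(fd, mode):
    """Attempt a non-blocking flock(); return True if the lock was granted."""
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return True
    except OSError:
        return False


def demo_locks():
    """Return the outcomes of shared, shared, then exclusive lock attempts."""
    with tempfile.NamedTemporaryFile() as tmp:
        # Two separate opens give two separate open-file descriptions.
        fd_a = os.open(tmp.name, os.O_RDONLY)
        fd_b = os.open(tmp.name, os.O_RDONLY)
        try:
            first_shared = try_lock(fd_a, fcntl.LOCK_SH)
            second_shared = try_lock(fd_b, fcntl.LOCK_SH)
            fcntl.flock(fd_b, fcntl.LOCK_UN)
            # fd_a still holds its shared lock, so this should be refused.
            exclusive = try_lock(fd_b, fcntl.LOCK_EX)
        finally:
            os.close(fd_a)
            os.close(fd_b)
    return first_shared, second_shared, exclusive
```

Because these locks are advisory, a program that never asks for a lock can still read or write the file.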
- 10.1.3 -- File Types
... may or may not be supported in various ways.
- 10.1.4 -- File Structure
The OS must support at least an executable file type and structure.
Each supported file structure/type requires additional operating
system code.
- 10.1.5 -- Internal File Structure
The file may be considered a sequence of blocks. Logical records
are packed into blocks - either by the OS or by a user application.
All file systems suffer from internal fragmentation due to the
packing of logical records into physical blocks.
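The waste can be estimated with a little arithmetic. The sketch below assumes fixed-size records that never span a block boundary. The function name and the sample numbers are illustrative only.

```python
def packing_waste(block_size, record_size, record_count):
    """Return (blocks_needed, wasted_bytes) when records never span blocks."""
    records_per_block = block_size // record_size
    if records_per_block == 0:
        raise ValueError("a record does not fit in one block")
    # Ceiling division: a partly filled last block still takes a whole block.
    blocks_needed = -(-record_count // records_per_block)
    wasted = blocks_needed * block_size - record_count * record_size
    return blocks_needed, wasted
```

With 512-byte blocks, twelve 100-byte records need 3 blocks and waste 336 bytes. A 1,949-byte stream of bytes (record size 1) needs 4 blocks and wastes the last 99 bytes.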
Unix systems define a file simply as a stream of bytes.
- 10.2 Access Methods
The OS may provide more than one way to access files.
- 10.2.1 Sequential Access
In sequential access, the file (current) pointer is positioned at
the beginning of the file when it is opened.
After each read (if the file is open for reading), the pointer is
advanced to the next file block.
After each write (if the file is open for writing), the pointer is
advanced in the same way, past the material just written, to the new
end-of-file.
It may be possible to "skip" forward or back n blocks (but not
without reading the blocks in between.) This is based on the
TAPE-MODEL of a file.
- 10.2.2 Direct Access
This allows the file to be treated more like an ARRAY of blocks
(though having DIRECT ACCESS does not mean the same thing as having
RANDOM ACCESS -- disk hardware does not support true random access
-- that's why it's NOT called "RAM" |-))
The user is allowed to access blocks by number in any order
whatever, and it is not necessary to read any of the blocks located
BETWEEN two blocks that are accessed one after the other.
For direct access, the desired block number must be an input
parameter of the read and write operations, or perhaps the same
effect can be achieved by implementing an operation (called
POSITION or "SEEK") that moves the file pointer to a desired block,
after which a simple read or write can be done.
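The POSITION/SEEK approach can be sketched with an in-memory byte stream standing in for a file. BLOCK_SIZE, the helper names, and the sample contents are assumptions chosen to keep the example small.

```python
import io

BLOCK_SIZE = 4  # hypothetical and tiny, so the blocks are easy to see


def read_block(f, n):
    """Direct access: position the file pointer at block n, then read it."""
    f.seek(n * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)


def write_block(f, n, data):
    """Overwrite block n in place."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block")
    f.seek(n * BLOCK_SIZE)
    f.write(data)


# A four-block "file": block 0 is AAAA, block 1 is BBBB, and so on.
disk_file = io.BytesIO(b"AAAABBBBCCCCDDDD")
```

Blocks can be read in any order, for example block 2 and then block 0, without touching block 1.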
The user's view of a direct access file is usually a sequence of
blocks numbered from 0 to some max.
The OS usually translates these RELATIVE BLOCK NUMBERS into absolute
block numbers that correspond to where the block is really
physically located on the disk.
This kind of translation differs little from the manner in which
virtual primary memory addresses are translated into physical
addresses.
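One simple way to picture the translation is a per-file table of disk block numbers, much like a page table. The table contents and the function name below are hypothetical. Real file systems use various allocation schemes.

```python
def to_absolute(block_map, relative_block):
    """Map a file-relative block number to an absolute disk block number."""
    if not 0 <= relative_block < len(block_map):
        raise IndexError("relative block is outside the file")
    return block_map[relative_block]


# Hypothetical file whose four blocks are scattered across the disk.
block_map = [907, 12, 13, 440]
```

The user asks for relative block 3 and never needs to know that it lives in disk block 440.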
It is easy to simulate sequential access on a direct access file,
but extremely inefficient to simulate direct access on a sequential
file.
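Both directions of the simulation can be sketched in a few lines. The class and function names are assumptions, and a Python list stands in for the blocks of the file.

```python
class SequentialView:
    """Sequential access layered on top of a direct-access read function."""

    def __init__(self, read_block_fn, block_count):
        self.read_block_fn = read_block_fn
        self.block_count = block_count
        self.cp = 0  # current position, in blocks

    def read_next(self):
        """Read block cp, then advance cp; return None at end of file."""
        if self.cp >= self.block_count:
            return None
        block = self.read_block_fn(self.cp)
        self.cp += 1
        return block

    def rewind(self):
        self.cp = 0


def direct_read_on_tape(tape_blocks, n):
    """Direct access on a sequential file: rewind, then read up to block n.

    Returns (block, reads) to show how many blocks had to be read.
    """
    reads = 0
    block = None
    for i in range(n + 1):
        block = tape_blocks[i]
        reads += 1
    return block, reads


blocks = [b"a", b"b", b"c"]
```

The sequential view costs one direct read per block, while fetching block n from the tape-like file costs n + 1 reads.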
- 10.2.3 Other Access Methods
Index Blocks into files can provide other forms of access. For
example, a large ordered file of alphanumeric strings could be
indexed in such a way that the index block would contain one entry
for each alphanumeric character, pointing to the first block that
contains a string starting with that character (i.e., a pointer to
the a's, the b's, and so on.) This trick is commonly used to
provide faster searches for key fields in file records (less I/O
required.)
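A minimal sketch of such a first-character index follows, assuming a sorted file with a few strings per block. The data and the function names are made up for illustration.

```python
def build_index(blocks):
    """Map each first character to the first block holding such a string."""
    index = {}
    for block_number, block in enumerate(blocks):
        for s in block:
            index.setdefault(s[0], block_number)
    return index


def search(blocks, index, key):
    """Return (found, blocks_read), scanning only from the indexed block."""
    start = index.get(key[0])
    if start is None:
        return False, 0
    blocks_read = 0
    for block in blocks[start:]:
        blocks_read += 1
        if key in block:
            return True, blocks_read
        if block[-1] > key:  # file is sorted, so the key cannot come later
            break
    return False, blocks_read


# Hypothetical ordered file, two strings per block.
blocks = [["ant", "ape"], ["bat", "bee"], ["bug", "cat"], ["cow", "dog"]]
index = build_index(blocks)
```

Looking up "cow" starts at block 2 rather than block 0, so two blocks are read instead of four.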
- 10.3 Directory Structure
- 10.3.1 Storage Structure
One device may have many partitions. A partition can contain a file
system, swap space, or unformatted (raw) disk space for the use of
special applications.
It's also possible to combine partitions into volumes.
- 10.3.2 Directory Overview
There has to be some sort of device directory or volume table of
contents for each filesystem. The device directory ("directory" for
short) is basically a table which, for each file, contains such
information as name, location, size, type, and so forth.
The directory must support certain operations. It must be possible
to search for a file's entry in the directory; create a file & give
it an entry in the directory; delete a file and remove it from the
directory; list the files in a directory; rename a file and make the
appropriate changes in the directory; and traverse the filesystem
using directory information.
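Most of these operations map naturally onto a table keyed by file name. The class below is a toy sketch with hypothetical method and field names. Traversal is left out because it depends on how directories nest.

```python
class Directory:
    """Toy device directory: file name -> attribute record."""

    def __init__(self):
        self.entries = {}

    def search(self, name):
        """Return the entry for name, or None if there is no such file."""
        return self.entries.get(name)

    def create(self, name, location, size=0, file_type="data"):
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = {
            "location": location, "size": size, "type": file_type,
        }

    def delete(self, name):
        del self.entries[name]

    def list_files(self):
        return sorted(self.entries)

    def rename(self, old, new):
        """Move the entry to a new key; the attributes are unchanged."""
        if new in self.entries:
            raise FileExistsError(new)
        self.entries[new] = self.entries.pop(old)
```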
- 10.3.3 Single-Level Directory
Simplest kind to implement -- inconvenient for users with large
numbers of files. Hard to keep file names unique, hard to organize
groups of files.
- 10.3.4 Two-Level Directory
Give each user his own directory.
The user's directory is the default. All files referenced are
assumed to be in the user's directory unless specifically stated
otherwise. The system may allow access to other users' files through
the use of PATH NAMES.
Since some files must be kept in the master file directory (so they
can be shared), and since it is sometimes not practical to expect
users to give complete path names, the need arises to have a SEARCH
PATH (a list of places to look for the files mentioned.) Such a
system is not much better than a single-level directory.
The user has to put all his files in one place, devise unique names,
and work too hard to keep related files organized.
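Search-path resolution in a two-level scheme can be sketched as below. The user's own directory is tried first, then each directory on the search path in order. The directory names and contents are hypothetical.

```python
def resolve(name, user_directory, search_path, directories):
    """Return the name of the directory where `name` is found, or None."""
    for directory_name in [user_directory] + list(search_path):
        if name in directories.get(directory_name, set()):
            return directory_name
    return None


# Hypothetical two-level layout: one directory per user, plus a shared one.
directories = {
    "alice": {"report.txt"},
    "bob": {"report.txt"},
    "system": {"cc", "ls"},
}
```

Both alice and bob can own a file called report.txt without conflict, and either of them finds cc through the search path.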
- 10.3.5 Tree-Structured Directories
Like the Unix file system.
One creates directories to organize files. Directories are special
files. There is a concept of a current working directory, which the
OS tracks for each process (Unix shells typically also expose it in
an environment variable). There are absolute and
relative pathnames. Care must be taken in the implementation of
deletion of a directory.
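The relationship between relative and absolute pathnames can be shown with Python's posixpath module. The function name and the sample paths are assumptions. Note that normpath works purely on the text of the path and does not consult the file system, so it ignores links.

```python
import posixpath


def to_absolute_path(path, cwd):
    """Interpret `path` relative to the current working directory `cwd`."""
    if not posixpath.isabs(path):
        path = posixpath.join(cwd, path)
    # Collapse "." and ".." components lexically.
    return posixpath.normpath(path)
```

With a working directory of /home/alice, the relative name ../bob/a.out refers to /home/bob/a.out, while an absolute name such as /etc/passwd is unaffected by the working directory.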
- 10.3.6 Acyclic-Graph Directories
Allowing files to be contained in more than one directory
facilitates file sharing. This introduces a version of the ALIASING
PROBLEM -- files may have many pathnames -- procedures that require
traversal are complicated by the need to avoid processing nodes more
than once. Deletion has to be managed carefully to avoid dangling
references.
A "link" is used to put a file in a second directory. Links can be
"hard" or "soft" - each has its advantages and disadvantages.
Implementation of deletion is a challenge. If we store in the
directory the number of remaining hard links to each file, then hard
links are no longer an impediment to correct operation of deletion,
but soft links can be left dangling.
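The difference shows up clearly in a toy model. In the sketch below a hard link is a second name for the same file id, with a link count, and a soft link merely stores a path. The class and its methods are hypothetical, and chains or loops of soft links are not guarded against.

```python
class TinyFS:
    """Toy file system with link-counted files and path-based soft links."""

    def __init__(self):
        self.files = {}   # file id -> [hard_link_count, data]
        self.names = {}   # path -> ("hard", file_id) or ("soft", target_path)
        self.next_id = 1

    def create(self, path, data):
        file_id = self.next_id
        self.next_id += 1
        self.files[file_id] = [1, data]
        self.names[path] = ("hard", file_id)

    def hard_link(self, existing, new):
        kind, file_id = self.names[existing]
        if kind != "hard":
            raise ValueError("can only hard-link to a regular name here")
        self.files[file_id][0] += 1
        self.names[new] = ("hard", file_id)

    def soft_link(self, target, new):
        self.names[new] = ("soft", target)  # target is not checked or counted

    def delete(self, path):
        kind, ref = self.names.pop(path)
        if kind == "hard":
            self.files[ref][0] -= 1
            if self.files[ref][0] == 0:  # last hard link gone: free the file
                del self.files[ref]

    def read(self, path):
        kind, ref = self.names[path]     # KeyError if the name is missing
        if kind == "soft":
            return self.read(ref)        # KeyError here means "dangling"
        return self.files[ref][1]
```

Deleting the original name leaves the data reachable through the other hard link, but a soft link that named the deleted path can no longer be resolved.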
- 10.3.7 General Graph Directory
Cycles are possible: e.g. directories A and B can contain each
other. Incorrectly implemented traversals can result in infinite
loops. "Unreachable" files and directories that should be deleted
may have non-zero reference counts. "Garbage collection" procedures
may be used to find such objects.
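A traversal that remembers which nodes it has visited terminates even when cycles exist, and the same traversal can serve as the marking pass of a simple garbage collector. The graph below and the function names are hypothetical.

```python
def traverse(graph, root):
    """Visit every directory reachable from root exactly once."""
    visited = set()
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # already processed: this is what breaks the cycle
        visited.add(node)
        order.append(node)
        stack.extend(reversed(graph.get(node, [])))
    return order


def unreachable(graph, root):
    """Return the nodes that cannot be reached from root."""
    return sorted(set(graph) - set(traverse(graph, root)))


# A and B contain each other; X and Y do too, but nothing leads to them.
graph = {"root": ["A"], "A": ["B"], "B": ["A"], "X": ["Y"], "Y": ["X"]}
```

X and Y each still hold a reference to the other, so their reference counts never drop to zero, yet the sweep from the root shows that neither can be reached.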
- 10.4 File System Mounting
- 10.5 File Sharing
- 10.6 Protection
- 10.7 Summary