(Latest Revision: Sun May 14, 2023)
[2023/05/14: fleshed out the notes here and there]
[2020/05/08: fleshed out notes, partly with archives]
[2019/06/07: inserted figures and captions]
[2019/05/04: starting first full version for 10th edition]
Delve into the details of file systems and their implementation
Explore booting and file sharing
Describe remote file systems, using NFS as an example
15.1 File Systems
As we have seen, a computing system can have multiple
secondary storage devices, each of which
may be sliced up into multiple partitions.
A volume containing a file system can be within one partition,
or can span multiple partitions and/or devices.
There is a great variety of file-system types. Consider, for example,
these special Solaris file systems:
tmpfs: a temporary file system in primary memory
objfs: an interface to the kernel for debuggers
ctfs: a virtual file system that maintains contract information
for processes started at boot time
lofs: a loopback file system for accessing one file system
in place of another
procfs: a virtual file system with information on all current
processes
However, our text considers only general-purpose file systems, such as these
two Solaris file systems:
ufs
zfs
Figure 15.1: A typical storage device organization
15.2 File-System Mounting
Typically users want a unified logical file system that contains all
file-like objects. Mounting refers to the
incorporation of one file system (sub)structure into another.
In the unix operating system, there is one unified file system
name space, in the form of a rooted acyclic graph. There is a
unix mount command for integrating a substructure
into a file system.
Parameters to the mount command include the name of the device
containing the file system to be mounted, and the path to the directory
which is to be the mount point. The OS verifies that the device contains
a valid file system. (See the next section for more details.)
It is common to mount one file system over an empty directory
of another. However, unix file semantics allow any directory
to be used as a mount point.
The figure shows the effect of mounting a file system containing
directories "sue" and "jane" over directory "users" of another
file system. According to unix file semantics, all contents
of "users" would be obscured while the other file system is
mounted over it.
Figure 15.3: File system. (a) Existing system. (b) Unmounted volume.
Figure 15.4: Volume mounted at /users.
With other operating systems, such as macOS and Windows, mounting is
done in similar ways, although procedures and semantics vary. For
example, when macOS detects an attached device, it automatically
mounts its file system(s).
15.3 Partitions and Mounting
We call a partition raw if it doesn't contain a file system,
and cooked if it does contain a file system. Things like swap space
or a database may reside on a raw partition.
A bootable partition contains a bootstrap loader, which is a series of
blocks that is loaded into memory at boot time and executed.
The loader finds the kernel, usually on secondary memory, loads it, and
executes it.
Some bootstrap loaders offer a choice of which operating system to boot.
The root partition, which contains the OS, is mounted at boot time.
Other volumes may be mounted automatically at a later stage in the
boot process.
More volumes may be mounted automatically and/or manually after the
system is completely up.
As part of the mount process, the OS verifies that the device
contains a valid file system by asking the device driver to read
the device directory and verify it has a correct format. If there
is a problem, the OS may run a consistency checker.
The OS writes information about the mounted system in an internal
mount table.
On unix systems, there is a flag in the in-memory copy of the
inode of a directory to
indicate that a file system is mounted there. The inode points
to the mount table, which points to the volume control block
of the mounted file system (for example a superblock).
The OS is thus able "to traverse its directory structure,
switching seamlessly among file systems of varying types."
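The traversal described above can be sketched as a mount-table lookup during path resolution. This is an illustrative model only (the table and function names are hypothetical, not the kernel's actual structures): whenever a path component turns out to be a mount point, resolution continues at the root of the mounted volume, which is why the mounted file system obscures the directory beneath it.

```python
# Sketch of path resolution across mount points (hypothetical structures).
# Each mounted file system is recorded in a mount table keyed by its
# mount-point path; resolution switches to the mounted volume's root
# whenever a component is a mount point, so the underlying directory's
# contents are obscured while the mount is in place.

mount_table = {
    "/": {"users": {}},                  # root fs: /users is empty here
    "/users": {"sue": {}, "jane": {}},   # fs mounted over /users
}

def resolve(path):
    """Return the directory contents that path refers to, honoring mounts."""
    node = mount_table["/"]
    walked = ""
    for comp in [c for c in path.split("/") if c]:
        if comp not in node:
            raise FileNotFoundError(path)
        walked += "/" + comp
        # Crossing a mount point switches to the mounted file system.
        node = mount_table.get(walked, node[comp])
    return node
```

Here resolve("/users") yields the mounted volume's directories "sue" and "jane", matching Figure 15.4: the original (empty) contents of "users" are hidden while the mount is in effect.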
15.4 File Sharing
There are potential advantages to allowing multiple processes
and/or users to share files. However there can be challenging
implementation issues such as how to handle concurrent writes to
shared files.
15.4.1 Multiple Users
When there are multiple users on a system, the OS has to
do something to prevent the users from doing harmful things
to each other's files, or misusing file contents. This is
the problem of access control and protection (covered in
section 13.4).
Oftentimes the OS keeps track of the owner
and group of each file and directory.
The OS allows the owner of the file to control access
to the file. A group is assigned to the file in order
to designate a set of users who are allowed to share
(usually limited) access to the file.
There has to be a
data structure where the attributes
of each file are stored, such as owner, group, file
permissions, time and date of creation, size, and so forth.
When a process attempts to access a file, directory, or
other file system object, the OS can check for a match
between the ID of the process and the permission
attributes of the file object to determine whether to
allow the access.
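The check described above can be sketched for the classic unix owner/group/other permission bits (the function and parameter names are hypothetical; real kernels also handle supplementary groups and the superuser):

```python
def may_access(uid, gid, f_uid, f_gid, mode, want):
    """Decide access the unix way: use the owner bits if the uid
    matches, else the group bits if the gid matches, else the
    'other' bits.  want is a bitmask built from R=4, W=2, X=1."""
    if uid == f_uid:
        bits = (mode >> 6) & 0o7    # owner rwx bits
    elif gid == f_gid:
        bits = (mode >> 3) & 0o7    # group rwx bits
    else:
        bits = mode & 0o7           # other rwx bits
    return (bits & want) == want
```

For a file with mode 0o640 owned by uid 1000 / group 100, the owner may read and write, group members may only read, and everyone else is denied.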
This works well for file systems on permanently-attached storage,
but it can be challenging to keep IDs matched up with external
devices and remotely-mounted file systems.
15.5 Virtual File Systems
With a virtual file system design, we can provide support
for multiple types of file systems and integrate them into a
unified directory structure.
To accomplish this, most operating systems use object-oriented techniques.
Figure 15.5 below, based on unix, illustrates
at a high level,
a file-system interface that supports
generic open(), read(),
write(), and close() calls on file
descriptors.
Below that there is a
virtual file system (VFS) layer
that examines the object to which the descriptor
refers, and then
chooses the appropriate operation (method)
for implementing the generic operation requested. The VFS
can distinguish among local file objects belonging to
different file system types, and between
local and remote file objects.
Figure 15.5: Schematic view of a virtual file system.
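The object-oriented dispatch in the VFS layer can be sketched as follows. The classes and the descriptor table here are hypothetical teaching stand-ins, not actual kernel interfaces: each file-system type supplies its own implementation of the generic operations, and the VFS forwards a generic call to whichever method belongs to the object behind the descriptor.

```python
# Sketch of VFS-style dispatch (hypothetical classes).

class VNode:                       # the generic interface
    def read(self, n):
        raise NotImplementedError

class Ext4File(VNode):             # one local file-system type
    def __init__(self, data): self.data = data
    def read(self, n): return self.data[:n]

class NFSFile(VNode):              # a remote file-system type
    def __init__(self, data): self.data = data
    def read(self, n): return b"remote:" + self.data[:n]

open_files = {}                    # descriptor table: fd -> vnode

def vfs_read(fd, n):
    # The generic read() just forwards to the object's own method,
    # so callers never care which file system the object lives in.
    return open_files[fd].read(n)
```

A caller invoking vfs_read() on descriptors bound to an Ext4File and an NFSFile gets the appropriate behavior in each case without knowing the underlying type.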
15.6 Remote File Systems
Systems can share files using networks. The two main
approaches are file transfer (used by ftp and the WWW)
and distributed file systems (DFS), such as the Network File
System (NFS) and the Andrew File System (OpenAFS).
15.6.1 The Client-Server Model
A DFS allows client computers to mount (via a network)
directories or file systems that reside on a
remote file server computer.
Care must be taken to assure that clients and servers
are properly authenticated lest unauthorized accesses
occur.
IP spoofing can allow an unauthorized client to masquerade as an authorized one.
Clients may be authenticated using an encrypted key algorithm.
With NFS, errors and security failures can result when user IDs
do not match on client and server.
15.6.2 Distributed Information Systems
System administrators use distributed information
systems such as Microsoft's
Active Directory and the Lightweight
Directory Access Protocol (LDAP) of the Internet
Engineering Task Force (IETF) to create network
accounts that give each user the same user name, user id number, and
password for all hosts in, say, an office network (or
computer science laboratory network). Such systems provide
secure user authentication.
The Domain Name System (DNS) is a distributed information system
that provides host-name-to-network-address translation for the
Internet.
15.6.3 Failure Modes
The semantics of most DFSs specify that if the network
'goes down' during the execution of a file operation,
the system should wait/delay until the network begins
to function properly again. This allows appropriate
handling of common DFS failure scenarios.
NFS 3 has a stateless protocol. If the network
is interrupted or if a server crashes, service
will be restored eventually when the network or
the server comes back up. The client is
programmed to repeat its requests for file service
until it gets a response. Each request contains all the
information the server needs. Therefore, despite
a network or server crash, file service continues
normally, except for some delay. Servers assume
that client requests are legitimate.
NFS 3 is simple and resilient, but lacks security. For example,
it is possible that a server could accept a forged read or write
request. NFS 4
is a stateful version with improved security, performance,
and functionality. State information includes tracking which
files are open, which file systems are exported, and which
are remotely mounted by which clients.
15.7 Consistency Semantics
If multiple processes access a shared file concurrently,
and one of the processes makes a change to the file, when will
the change be observable by other users of the file? In general,
how do things work when there is concurrent file access?
Consistency semantics specify answers to such questions.
15.7.1 Unix Semantics
Writes to an open file are visible immediately to
other users who have this file open.
There is a mode of unix file sharing in which the file
pointer is shared while writing to the file, so when one
process writes a byte to the file, the write position
in the file moves down one byte for all the processes
that have the file open in that mode.
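Within a single process, duplicated descriptors show the same effect: both refer to one open file description and therefore share a single file offset, the shared "file pointer" described above. (Descriptors inherited across fork() behave the same way.) A minimal sketch:

```python
import os
import tempfile

# fd1 and fd2 share one open file description, hence one offset.
fd1, path = tempfile.mkstemp()
fd2 = os.dup(fd1)                    # fd2 shares fd1's file offset

os.write(fd1, b"abc")                # advances the shared offset by 3
pos = os.lseek(fd2, 0, os.SEEK_CUR)  # observed through fd2: also 3

os.close(fd1)
os.close(fd2)
os.remove(path)
```

After the write through fd1, the position read back through fd2 is 3, even though fd2 itself was never written to.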
The file semantics support the view that all the processes share
a single copy of the file. When two or more processes
attempt to access the file concurrently, the contention
can cause delay.
15.7.2 Session Semantics
The session semantics used in the Andrew file system
(OpenAFS) are different from unix semantics.
When a user X writes to an open file, that write is not
immediately visible to other users that have the file
open.
The change made by X will not be seen by other users
until after X closes the file. Even then, only
users who open the file after X closes
it will see the change X made.
Session semantics support the view that there may be
multiple temporary copies of the file in existence
concurrently, and (therefore) concurrent reads and/or
writes can proceed immediately without contention or delay.
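Session semantics can be sketched with a pair of hypothetical classes (illustrative only, not the OpenAFS implementation): each open takes a snapshot, writes go to that private copy, and the copy is published back only on close.

```python
# Sketch of session semantics (hypothetical classes): each open yields
# a private copy; writes touch only the copy and are published on
# close, so users with the file already open never see the changes.

class SessionFile:
    def __init__(self):
        self.master = b""            # the published contents

    def open(self):
        return Session(self, self.master)   # snapshot at open time

class Session:
    def __init__(self, f, snapshot):
        self.f = f
        self.buf = bytearray(snapshot)
    def read(self):
        return bytes(self.buf)
    def write(self, data):
        self.buf += data             # modifies only this session's copy
    def close(self):
        self.f.master = bytes(self.buf)     # publish on close
```

If X writes through one session while Y holds another session open, Y's reads are unaffected; only sessions opened after X closes see X's change.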
15.7.3 Immutable-Shared-Files Semantics
Under immutable shared file semantics, a file
cannot be changed at all once it is set up to be
shared by two or more processes. (Its name
cannot be reused either.)
This is a simple way to implement shared read-only files.
15.8 NFS
Here NFS 3 is described as an example of a distributed file system.
(General descriptions refer to the NFS 3 standard. Some of the
more detailed description refers to the Solaris NFS 3
implementation.)
15.8.1 Overview
An NFS server can export any of its file systems or directories
to a client host. The client is able to mount the exported file
system or directory over any local directory (mount point).
After the client system performs the mount operation,
users on the client can access the remote files
transparently (without needing to make any reference to
the network, the client, or the server) with normal file
accesses that utilize the pathname of the mount point.
If a collection of client machines mounts home directories from a
server via NFS, each user on each of the machines can access
their home directory by logging in to any of the client machines.
Figure 15.6 illustrates three independent file systems. Figure 15.7(a)
illustrates the effect of an NFS mount of the "shared" directory
of S1 onto the "local" directory of U. (Note that U:/usr/local
retains its name. In other words, it does not become U:/usr/shared.)
If afterwards, we mount the "dir2" directory of S2 over
U:/usr/local/dir1, then figure 15.7(b) illustrates the
resulting view the users have of filesystem U.
Figure 15.6: Three independent file systems
Figure 15.7: Mounting in NFS. (a) Mounts. (b) Cascading mounts.
In order to support a heterogeneous environment of different
machines in a network,
the implementation of NFS uses remote procedure call (RPC)
primitives built on top of an external data representation (XDR)
protocol.
15.8.2 The Mount Protocol
There is a separate protocol for mounting a remote
directory.
A client sends a mount request to a server, naming the remote
directory desired. A server maintains an export list specifying
which filesystems it is willing to export to which clients,
and in what access modes.
When a mount request is authorized, the server returns a file
handle to the client to be used for subsequent accesses. For
a unix system, the file handle would consist of a file-system
identifier and inode number, thus identifying the exact mounted
directory within the exported file system.
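On a POSIX system the two ingredients of such a handle are already exposed by stat(): st_dev identifies the file system and st_ino the inode within it. The sketch below packs them into a tuple; the field names are the POSIX ones, but the packing is illustrative, not the NFS wire format.

```python
import os

def make_file_handle(path):
    """Build a unix-style (file-system id, inode number) handle.
    Illustrative only: real NFS handles are opaque byte strings."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)
```

Calling make_file_handle(".") twice returns the same pair, as a handle must: it names the object itself, independent of any open-file state.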
15.8.3 The NFS Protocol
The NFS protocol provides RPCs for searching for a file within
a directory, reading directory entries, manipulating links and
directories, accessing file attributes, and reading & writing files.
The RPCs can only be invoked using a file handle returned by
a successful mount operation.
NFS (version 3) is essentially a stateless protocol.
The server does not keep track of what the client is doing
with the file; there is nothing like an open-file table. Files
are not opened or closed by the NFS protocol.
Each client request has to identify the file and
the byte offset being accessed.
An advantage of the statelessness of NFS is that
it is robust across server crashes. The client need only
repeat a request to a server that has rebooted.
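The retry idea can be sketched as follows. The server function and data here are hypothetical stand-ins for the real RPCs: because each request carries the file handle, offset, and count, it is self-contained and idempotent, so the client can simply resend it until the server answers.

```python
# Sketch of a stateless, idempotent read request (hypothetical server).
FILES = {("fs0", 42): b"hello world"}    # handle -> file contents

def nfs_read(handle, offset, count):
    """Self-contained request: no server-side open-file state needed,
    so replaying it after a server reboot returns the same data."""
    return FILES[handle][offset:offset + count]

def read_with_retry(handle, offset, count, attempts=3):
    # The client repeats the identical request until it gets a response.
    for _ in range(attempts):
        try:
            return nfs_read(handle, offset, count)
        except ConnectionError:          # e.g. server down or rebooting
            continue
    raise TimeoutError("server unreachable")
```

A crashed-and-rebooted server needs no recovery conversation with the client; the repeated request alone is enough to resume service.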
For protection of data and metadata, NFS is required to
flush writes synchronously to the server's secondary memory.
The server must wait until a write is complete before
sending the return value of a client's write request.
The NFS protocol does not provide for the locking required for
concurrent processes to make atomic writes to NFS-mounted files.
The OS may provide a locking mechanism outside the NFS protocol.
(Solaris does this.)
NFS is integrated into the virtual file system of Solaris, as
one of the layers below a VFS interface.
Figure 15.8: Schematic view of the NFS architecture
15.8.4 Path-Name Translation
When a path name is given as an argument of an NFS operation,
each component must be checked individually. Any of them
may be a mount point. Once a mount point is crossed, the
appropriate server must be consulted, via an NFS operation,
to look up the vnode of the directory or file.
To speed things up, clients cache the vnode information for
remote directory names.
15.8.5 Remote Operations
NFS utilizes client-side caching of inode information and
file blocks.
Clients check with servers to determine whether cached contents
remain valid.
NFS does not preserve the unix semantics that call for
writes to be visible immediately to all processes
that have a file open.
The semantics of NFS are not the session semantics
of the Andrew file system either.