(Latest Revision: Sun May 14, 2023)
[2023/05/14: fleshed out the notes here and there]
[2020/05/08: fleshed out notes, partly with archives]
[2019/06/07: inserted figures and captions]
[2019/05/04: starting first full version for 10th edition]
Delve into the details of file systems and their implementation
Explore booting and file sharing
Describe remote file systems, using NFS as an example
15.1 File Systems
As we have seen, a computing system can have multiple
secondary storage devices, each of which
may be sliced up into multiple partitions.
A volume containing a file system can be within one partition,
or can span multiple partitions and/or devices.
There is a great variety of file-system types. Consider, for example,
these special Solaris file systems:
tmpfs: a temporary file system in primary memory
objfs: an interface to the kernel for debuggers
ctfs: a virtual file system that maintains contract information
for processes started at boot time
lofs: a loopback file system for accessing one file system
in place of another
procfs: a virtual file system with information on all current
processes
However, our text considers only general-purpose file systems, such as these
two Solaris file systems:
ufs
zfs
Figure 15.1: A typical storage device organization
15.2 File-System Mounting
Typically users want a unified logical file system that contains all
file-like objects. Mounting refers to the
incorporation of one file system (sub)structure into another.
In the unix operating system, there is one unified file system
name space, in the form of a rooted acyclic graph. There is a
unix mount command for integrating a substructure
into a file system.
Parameters to the mount command include the name of the device
containing the file system to be mounted, and the path to the directory
which is to be the mount point. The OS verifies that the device contains
a valid file system. (See the next section for more details.)
It is common to mount one file system over an empty directory
of another. However, unix file semantics allow any directory
to be used as a mount point.
The figure shows the effect of mounting a file system containing
directories "sue" and "jane" over directory "users" of another
file system. According to unix file semantics, all contents
of "users" would be obscured while the other file system is
mounted over it.
Figure 15.3: File system. (a) Existing system. (b) Unmounted volume.
Figure 15.4: Volume mounted at /users.
With other operating systems, such as macOS and Windows, mounting is
done in similar ways, although procedures and semantics vary. For
example, when macOS detects an attached device, it automatically
mounts its file system(s).
15.3 Partitions and Mounting
We call a partition raw if it doesn't contain a file system,
and cooked if it does contain a file system. Things like swap space
or a database may reside on a raw partition.
A bootable partition contains a bootstrap loader, which is a series of
blocks that is loaded into memory at boot time and executed.
The loader finds the kernel, usually on secondary memory, loads it, and
executes it.
Some bootstrap loaders offer a choice of which operating system to boot.
The root partition, which contains the OS, is mounted at boot time.
Other volumes may be mounted automatically at a later stage in the
boot process.
More volumes may be mounted automatically and/or manually after the
system is completely up.
As part of the mount process, the OS verifies that the device
contains a valid file system by asking the device driver to read
the device directory and verify it has a correct format. If there
is a problem, the OS may run a consistency checker.
The OS writes information about the mounted system in an internal
mount table.
On unix systems, there is a flag in the in-memory copy of the
inode of a directory to
indicate that a file system is mounted there. The inode points
to the mount table, which points to the volume control block
of the mounted file system (for example a superblock).
The OS is thus able "to traverse its directory structure,
switching seamlessly among file systems of varying types."
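The traversal described above can be sketched as a mount-table lookup during path resolution. This is an illustrative model only (the table and function names are hypothetical, not the kernel's actual structures): whenever a path component turns out to be a mount point, resolution continues at the root of the mounted volume, which is why the mounted file system obscures the directory beneath it.

```python
# Sketch of path resolution across mount points (hypothetical structures).
# Each mounted file system is recorded in a mount table keyed by its
# mount-point path; resolution switches to the mounted volume's root
# whenever a component is a mount point, so the underlying directory's
# contents are obscured while the mount is in place.

mount_table = {
    "/": {"users": {}},                  # root fs: /users is empty here
    "/users": {"sue": {}, "jane": {}},   # fs mounted over /users
}

def resolve(path):
    """Return the directory contents that path refers to, honoring mounts."""
    node = mount_table["/"]
    walked = ""
    for comp in [c for c in path.split("/") if c]:
        if comp not in node:
            raise FileNotFoundError(path)
        walked += "/" + comp
        # Crossing a mount point switches to the mounted file system.
        node = mount_table.get(walked, node[comp])
    return node
```

Here resolve("/users") yields the mounted volume's directories "sue" and "jane", matching Figure 15.4: the original (empty) contents of "users" are hidden while the mount is in effect.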
15.4 File Sharing
There are potential advantages to allowing multiple processes
and/or users to share files. However there can be challenging
implementation issues such as how to handle concurrent writes to
shared files.
15.4.1 Multiple Users
When there are multiple users on a system, the OS has to
do something to prevent the users from doing harmful things
to each other's files, or misusing file contents. This is
the problem of access control and protection (covered in
section 13.4).
Oftentimes the OS keeps track of the owner
and group of each file and directory.
The OS allows the owner of the file to control access
to the file. A group is assigned to the file in order
to designate a set of users who are allowed to share
(usually limited) access to the file.
There has to be a
data structure where the attributes
of each file are stored, such as owner, group, file
permissions, time and date of creation, size, and so forth.
When a process attempts to access a file, directory, or
other file system object, the OS can check for a match
between the ID of the process and the permission
attributes of the file object to determine whether to
allow the access.
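The check described above can be sketched for the classic unix owner/group/other permission bits (the function and parameter names are hypothetical; real kernels also handle supplementary groups and the superuser):

```python
def may_access(uid, gid, f_uid, f_gid, mode, want):
    """Decide access the unix way: use the owner bits if the uid
    matches, else the group bits if the gid matches, else the
    'other' bits.  want is a bitmask built from R=4, W=2, X=1."""
    if uid == f_uid:
        bits = (mode >> 6) & 0o7    # owner rwx bits
    elif gid == f_gid:
        bits = (mode >> 3) & 0o7    # group rwx bits
    else:
        bits = mode & 0o7           # other rwx bits
    return (bits & want) == want
```

For a file with mode 0o640 owned by uid 1000 / group 100, the owner may read and write, group members may only read, and everyone else is denied.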
This works well for file systems on permanently-attached storage,
but it can be challenging to keep IDs matched up with external
devices and remotely-mounted file systems.
15.5 Virtual File Systems
With a virtual file system design, we can provide support
for multiple types of file systems and integrate them into a
unified directory structure.
To accomplish this, most operating systems use object-oriented techniques.
Figure 15.5 below, based on unix, illustrates
at a high level,
a file-system interface that supports
generic open(), read(),
write(), and close() calls on file
descriptors.
Below that there is a
virtual file system (VFS) layer
that examines the object to which the descriptor
refers, and then
chooses the appropriate operation (method)
for implementing the generic operation requested. The VFS
can distinguish among local file objects belonging to
different file system types, and between
local and remote file objects.
Figure 15.5: Schematic view of a virtual file system.
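The object-oriented dispatch in the VFS layer can be sketched as follows. The classes and the descriptor table here are hypothetical teaching stand-ins, not actual kernel interfaces: each file-system type supplies its own implementation of the generic operations, and the VFS forwards a generic call to whichever method belongs to the object behind the descriptor.

```python
# Sketch of VFS-style dispatch (hypothetical classes).

class VNode:                       # the generic interface
    def read(self, n):
        raise NotImplementedError

class Ext4File(VNode):             # one local file-system type
    def __init__(self, data): self.data = data
    def read(self, n): return self.data[:n]

class NFSFile(VNode):              # a remote file-system type
    def __init__(self, data): self.data = data
    def read(self, n): return b"remote:" + self.data[:n]

open_files = {}                    # descriptor table: fd -> vnode

def vfs_read(fd, n):
    # The generic read() just forwards to the object's own method,
    # so callers never care which file system the object lives in.
    return open_files[fd].read(n)
```

A caller invoking vfs_read() on descriptors bound to an Ext4File and an NFSFile gets the appropriate behavior in each case without knowing the underlying type.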
15.6 Remote File Systems
Systems can share files using networks. The two main
approaches are file transfer (used by ftp and the WWW)
and distributed file systems (DFS), such as the Network File
System (NFS) and the Andrew File System (OpenAFS).
15.6.1 The Client-Server Model
A DFS allows client computers to mount (via a network)
directories or file systems that reside on a
remote file server computer.
Care must be taken to assure that clients and servers
are properly authenticated lest unauthorized accesses
occur.
IP spoofing can allow an unauthorized client to masquerade as an authorized one.
Clients may be authenticated using an encrypted key algorithm.
With NFS, errors and security failures can result when user IDs
do not match on client and server.
15.6.2 Distributed Information Systems
System administrators use distributed information
systems such as Microsoft's
Active Directory and the Lightweight
Directory Access Protocol (LDAP) of the Internet
Engineering Task Force (IETF) to create network
accounts that give each user the same user name, user id number, and
password for all hosts in, say, an office network (or
computer science laboratory network). Such systems provide
secure user authentication.
The Domain Name System (DNS) is a distributed information system
that provides host-name-to-network-address translation for the
Internet.
15.6.3 Failure Modes
The semantics of most DFSs specify that if the network
'goes down' during the execution of a file operation,
the system should wait/delay until the network begins
to function properly again. This allows appropriate
handling of common DFS failure scenarios.
NFS 3 has a stateless protocol. If the network
is interrupted or if a server crashes, service
will be restored eventually when the network or
the server comes back up. The client is
programmed to repeat its requests for file service
until it gets a response. Each request contains all the
information the server needs. Therefore, despite
a network or server crash, file service continues
normally, except for some delay. Servers assume
that client requests are legitimate.
NFS 3 is simple and resilient, but lacks security. For example,
it is possible that a server could accept a forged read or write
request. NFS 4
is a stateful version with improved security, performance,
and functionality. State information includes tracking which
files are open, which file systems are exported, and which
are remotely mounted by which clients.
15.7 Consistency Semantics
If multiple processes access a shared file concurrently,
and one of the processes makes a change to the file, when will
the change be observable by other users of the file? In general,
how do things work when there is concurrent file access?
Consistency semantics specify answers to such questions.
15.7.1 Unix Semantics
Writes to an open file are visible immediately to
other users who have this file open.
There is a mode of unix file sharing in which the file
pointer is shared while writing to the file, so when one
process writes a byte to the file, the write position
in the file moves down one byte for all the processes
that have the file open in that mode.
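Within a single process, duplicated descriptors show the same effect: both refer to one open file description and therefore share a single file offset, the shared "file pointer" described above. (Descriptors inherited across fork() behave the same way.) A minimal sketch:

```python
import os
import tempfile

# fd1 and fd2 share one open file description, hence one offset.
fd1, path = tempfile.mkstemp()
fd2 = os.dup(fd1)                    # fd2 shares fd1's file offset

os.write(fd1, b"abc")                # advances the shared offset by 3
pos = os.lseek(fd2, 0, os.SEEK_CUR)  # observed through fd2: also 3

os.close(fd1)
os.close(fd2)
os.remove(path)
```

After the write through fd1, the position read back through fd2 is 3, even though fd2 itself was never written to.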
The file semantics support the view that all the processes share
a single copy of the file. When two or more processes
attempt to access the file concurrently, the contention
can cause delay.
15.7.2 Session Semantics
The session semantics used in the Andrew file system
(OpenAFS) are different from unix semantics.
When a user X writes to an open file, that write is not
immediately visible to other users that have the file
open.
The change made by X will not be seen by other users
until after X closes the file. Even then, only
users who open the file after X closes
it will see the change X made.
Session semantics support the view that there may be
multiple temporary copies of the file in existence
concurrently, and (therefore) concurrent reads and/or
writes can proceed immediately without contention or delay.
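Session semantics can be sketched with a pair of hypothetical classes (illustrative only, not the OpenAFS implementation): each open takes a snapshot, writes go to that private copy, and the copy is published back only on close.

```python
# Sketch of session semantics (hypothetical classes): each open yields
# a private copy; writes touch only the copy and are published on
# close, so users with the file already open never see the changes.

class SessionFile:
    def __init__(self):
        self.master = b""            # the published contents

    def open(self):
        return Session(self, self.master)   # snapshot at open time

class Session:
    def __init__(self, f, snapshot):
        self.f = f
        self.buf = bytearray(snapshot)
    def read(self):
        return bytes(self.buf)
    def write(self, data):
        self.buf += data             # modifies only this session's copy
    def close(self):
        self.f.master = bytes(self.buf)     # publish on close
```

If X writes through one session while Y holds another session open, Y's reads are unaffected; only sessions opened after X closes see X's change.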
15.7.3 Immutable-Shared-Files Semantics
Under immutable shared file semantics, a file
cannot be changed at all once it is set up to be
shared by two or more processes. (Its name
cannot be reused either.)
This is a simple way to implement shared read-only files.
15.8 NFS
Here NFS 3 is described as an example of a distributed file system.
(General descriptions refer to the NFS 3 standard. Some of the
more detailed description refers to the Solaris NFS 3
implementation.)
15.8.1 Overview
An NFS server can export any of its file systems or directories
to a client host. The client is able to mount the exported file
system or directory over any local directory (mount point).
After the client system performs the mount operation,
users on the client can access the remote files
transparently (without needing to make any reference to
the network, the client, or the server) with normal file
accesses that utilize the pathname of the mount point.
If a collection of client machines mounts home directories from a
server via NFS, each user on each of the machines can access
their home directory by logging in to any of the client machines.
Figure 15.6 illustrates three independent file systems. Figure 15.7(a)
illustrates the effect of an NFS mount of the "shared" directory
of S1 onto the "local" directory of U. (Note that U:/usr/local
retains its name. In other words, it does not become U:/usr/shared.)
If afterwards, we mount the "dir2" directory of S2 over
U:/usr/local/dir1, then figure 15.7(b) illustrates the
resulting view the users have of filesystem U.
Figure 15.6: Three independent file systems
Figure 15.7: Mounting in NFS. (a) Mounts. (b) Cascading mounts.
In order to support a heterogeneous environment of different
machines in a network,
the implementation of NFS uses remote procedure call (RPC)
primitives built on top of an external data representation (XDR)
protocol.
15.8.2 The Mount Protocol
There is a separate protocol for mounting a remote
directory.
A client sends a mount request to a server, naming the remote
directory desired. A server maintains an export list specifying
which filesystems it is willing to export to which clients,
and in what access modes.
When a mount request is authorized, the server returns a file
handle to the client to be used for subsequent accesses. For
a unix system, the file handle would consist of a file-system
identifier and inode number, thus identifying the exact mounted
directory within the exported file system.
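On a POSIX system the two ingredients of such a handle are already exposed by stat(): st_dev identifies the file system and st_ino the inode within it. The sketch below packs them into a tuple; the field names are the POSIX ones, but the packing is illustrative, not the NFS wire format.

```python
import os

def make_file_handle(path):
    """Build a unix-style (file-system id, inode number) handle.
    Illustrative only: real NFS handles are opaque byte strings."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)
```

Calling make_file_handle(".") twice returns the same pair, as a handle must: it names the object itself, independent of any open-file state.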
15.8.3 The NFS Protocol
The NFS protocol provides RPCs for searching for a file within
a directory, reading directory entries, manipulating links and
directories, accessing file attributes, and reading & writing files.
The RPCs can only be invoked using a file handle returned by
a successful mount operation.
NFS (version 3) is essentially a stateless protocol.
The server does not keep track of what the client is doing
with the file; there is nothing like an open-file table. Files
are not opened or closed by the NFS protocol.
Each client request has to identify the file and
the byte offset being accessed.
An advantage of the statelessness of NFS is that
it is robust across server crashes. The client need only
repeat a request to a server that has rebooted.
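The retry idea can be sketched as follows. The server function and data here are hypothetical stand-ins for the real RPCs: because each request carries the file handle, offset, and count, it is self-contained and idempotent, so the client can simply resend it until the server answers.

```python
# Sketch of a stateless, idempotent read request (hypothetical server).
FILES = {("fs0", 42): b"hello world"}    # handle -> file contents

def nfs_read(handle, offset, count):
    """Self-contained request: no server-side open-file state needed,
    so replaying it after a server reboot returns the same data."""
    return FILES[handle][offset:offset + count]

def read_with_retry(handle, offset, count, attempts=3):
    # The client repeats the identical request until it gets a response.
    for _ in range(attempts):
        try:
            return nfs_read(handle, offset, count)
        except ConnectionError:          # e.g. server down or rebooting
            continue
    raise TimeoutError("server unreachable")
```

A crashed-and-rebooted server needs no recovery conversation with the client; the repeated request alone is enough to resume service.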
For protection of data and metadata, NFS is required to
flush writes synchronously to the server's secondary memory.
The server must wait until a write is complete before
sending the return value of a client's write request.
The NFS protocol does not provide for the locking required for
concurrent processes to make atomic writes to NFS-mounted files.
The OS may provide a locking mechanism outside the NFS protocol.
(Solaris does this.)
NFS is integrated into the virtual file system of Solaris, as
one of the layers below a VFS interface.
Figure 15.8: Schematic view of the NFS architecture
15.8.4 Path-Name Translation
When a path name is given as an argument of an NFS operation,
each component must be checked individually. Any of them
may be a mount point. Once a mount point is crossed, the
appropriate server must be consulted, via an NFS operation,
to look up the vnode of the directory or file.
To speed things up, clients cache the vnode information for
remote directory names.
15.8.5 Remote Operations
NFS utilizes client-side caching of inode information and
file blocks.
Clients check with servers to determine whether cached contents
remain valid.
NFS does not preserve the unix semantics that call for
writes to be visible immediately to all processes
that have a file open.
The semantics of NFS are not the session semantics
of the Andrew file system either.