(Latest Revision: Sun May 06, 2018)
Chapter Eleven -- File-System Interface -- Lecture Notes
- The file system consists of files and a directory structure
- 11.0 Objectives
- Explain the function of file systems
- Describe the interfaces to file systems
- Discuss file-system design tradeoffs, including
access methods, file sharing, file locking, and
directory structures
- Explore file-system protection
- 11.1 File Concept
- A file is a logical unit of storage -
the smallest allotment of logical secondary storage
- a named collection of
related information that is recorded on secondary storage
- a clinking, clanking, clattering collection
of caliginous junk, as it were.
- 11.1.1 File Attributes
- Typical file attributes include name,
unique identifier, type, location, size,
protection, time, date, and user (owner)
identification.
- An OS stores file attributes on secondary storage,
somewhere in the file system directory structure.
The details of how the information is stored vary
from system to system.
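As a rough illustration, Python's os.stat() reports several of these attributes for a file on a POSIX-style system. The helper name file_attributes and the temporary file are just for the example. Note that stat does not expose the file's location on disk, and the inode number serves as a unique identifier only within one file system.

```python
import os
import stat
import tempfile
import time

def file_attributes(path):
    """Gather a few typical file attributes using os.stat() (hypothetical helper)."""
    st = os.stat(path)
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISREG(st.st_mode):
        kind = "regular"
    else:
        kind = "other"
    return {
        "name": os.path.basename(path),
        "identifier": st.st_ino,                  # inode number, unique within one file system
        "type": kind,
        "size": st.st_size,                       # in bytes
        "protection": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
        "modified": time.ctime(st.st_mtime),      # time and date of last modification
        "owner_uid": st.st_uid,                   # user (owner) identification
    }

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "notes.txt")
    with open(p, "w") as f:
        f.write("hello")
    attrs = file_attributes(p)

print(attrs["name"], attrs["type"], attrs["size"])  # notes.txt regular 5
```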
- 11.1.2 File Operations
- File operations are implemented with system calls.
- Typical operations are create file, write to file,
read from file, reposition file pointer, delete file,
and truncate file.
- Many systems require that files be opened before use.
The open operation places directory information about
the file into an open file table data structure in
primary memory. That way, processes can access the
file many times without fetching its directory
information from the disk each time.
- In systems that allow multiple processes to have a file open
simultaneously,
it is customary to have two levels of open file tables
- a single system-wide table, and multiple
per-process open file tables. Each per-process table holds information
having to do with the particular process's use of the file,
such as the current read and write positions of the process
in the file. The entry for a file in the per-process table
contains a pointer to the entry for the file in the
system-wide table.
Process-independent things like location of the
file on disk, access dates, file size, and file open
count are contained in
the file's entry in the system-wide file table.
- File-locking operations may be available: a system may offer
shared and/or exclusive locks, and its locks may be mandatory
and/or advisory.
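The operations listed above map fairly directly onto POSIX system calls, and Python's os module provides thin wrappers around them. The sketch below assumes a POSIX-like system and a throwaway temporary file. It creates, writes, repositions, reads, truncates, and deletes a file. It also opens the file a second time to show that each open() gets its own current position. That position is the kind of per-open information the open file tables keep track of.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.bin")

    # create + write: O_CREAT makes the file if it does not already exist
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    os.write(fd, b"abcdefgh")

    # reposition the file pointer (seek), then read from the new position
    os.lseek(fd, 2, os.SEEK_SET)
    middle = os.read(fd, 3)                       # b"cde"

    # a second, independent open of the same file has its own position
    fd2 = os.open(path, os.O_RDONLY)
    start = os.read(fd2, 3)                       # b"abc": starts at offset 0
    os.close(fd2)

    # truncate: keep the file but discard everything past byte 4
    os.ftruncate(fd, 4)
    size_after_truncate = os.fstat(fd).st_size    # 4
    os.close(fd)

    # delete: remove the directory entry (os.remove calls unlink on POSIX)
    os.remove(path)
    exists_after_delete = os.path.exists(path)    # False
```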
- 11.1.3 File Types
- There are various file types, such as text, binary,
executable. Often filename extensions are used to indicate
the type of a file. Most file types are not fully supported
by the operating system.
- It is common for the OS to treat most files simply as an
unstructured sequence of bytes.
- 11.1.4 File Structure
- Each OS must fully support at least one executable file type,
so that the OS can load and execute programs.
- 11.1.5 Internal File Structure
- All basic I/O functions are performed block by block, in
physical file blocks, traditionally the 512-byte data sections
of disk sectors (many newer disks use 4096-byte sectors).
- Files are sequences of logical records that must be mapped
to file blocks. Applications may do the mapping, or the
OS may do it.
- All files are allocated in whole numbers of physical blocks,
and therefore
all file systems suffer from internal fragmentation.
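Internal fragmentation is easy to quantify. A file of S bytes on a volume with B-byte blocks occupies ceil(S/B) blocks, and whatever is left over in the last block is wasted. Here is a minimal sketch that ignores metadata blocks and any small-file tricks a particular file system might use:

```python
def allocation(file_size, block_size=512):
    """Return (blocks allocated, bytes wasted in the last, partially filled block)."""
    blocks = -(-file_size // block_size)   # ceiling division without floats
    wasted = blocks * block_size - file_size
    return blocks, wasted

print(allocation(1949))        # (4, 99)
print(allocation(1949, 4096))  # (1, 2147)
```

Larger blocks mean fewer blocks to keep track of but more waste per file. If file sizes are spread evenly, roughly half a block per file is lost on average.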
- 11.2 Access Methods
- 11.2.1 Sequential Access
- Sequential access is the simplest and most
common file access method
- The file is read as a sequence from start to finish,
as a tape would be read.
- Writing is similar: each new item is appended to the end.
- Typical operations are "read next" and "write next"
- A pointer to the current location is maintained
- Repositioning of the pointer (seek) may be supported
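A sequential-access file can be modeled as a list of records plus a current-position pointer. The class below is hypothetical: it is a toy in-memory model, not any OS's interface. Its read_next and write_next correspond to the "read next" and "write next" operations, and seek is the optional repositioning.

```python
class SequentialFile:
    """Hypothetical tape-like file: a list of records plus a current-position pointer."""

    def __init__(self):
        self.records = []
        self.pos = 0                      # pointer to the current location

    def read_next(self):
        if self.pos >= len(self.records):
            return None                   # end of file
        rec = self.records[self.pos]
        self.pos += 1                     # advance past the record just read
        return rec

    def write_next(self, rec):
        self.records.append(rec)          # each new item is appended to the end
        self.pos = len(self.records)

    def rewind(self):
        self.pos = 0

    def seek(self, n):
        # optional repositioning, clamped to the valid range
        self.pos = max(0, min(n, len(self.records)))


f = SequentialFile()
for word in ["alpha", "beta", "gamma"]:
    f.write_next(word)
f.rewind()
print(f.read_next(), f.read_next())   # alpha beta
```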
- 11.2.2 Direct Access
- Direct access allows the user to access blocks in arbitrary order,
treating the file as if it were an array of file blocks.
- Typical operations are of the form read block #n or
write block #m
- Block numbers are usually logical addresses that run
from 0 contiguously to some upper limit.
- Assume that the first byte in the file is numbered 0,
that the file is a sequence of records of size L bytes,
that the records are numbered starting at 0,
and that we want to request the Nth record in the file.
In that case we compute the starting byte number N*L
and fetch the L bytes of the file starting with byte N*L.
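The N*L computation translates directly into a seek followed by a read. This sketch assumes fixed-length records of RECORD_SIZE = 8 bytes, an arbitrary choice. It uses an in-memory io.BytesIO object in place of a disk file; the same seek/read calls work on a real file opened in binary mode.

```python
import io

RECORD_SIZE = 8   # L: an assumed fixed record length, in bytes

def read_record(f, n, record_size=RECORD_SIZE):
    """Fetch record n (records numbered from 0) from a file of fixed-size records."""
    f.seek(n * record_size)          # starting byte is N*L
    data = f.read(record_size)       # fetch the L bytes from there
    if len(data) < record_size:
        raise IndexError(f"record {n} is past the end of the file")
    return data

def write_record(f, n, data, record_size=RECORD_SIZE):
    if len(data) != record_size:
        raise ValueError("record has the wrong length")
    f.seek(n * record_size)
    f.write(data)


f = io.BytesIO()
for i in range(4):
    write_record(f, i, f"rec{i}".ljust(RECORD_SIZE).encode())
print(read_record(f, 2))   # b'rec2    '
```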
- 11.2.3 Other Access Methods
- Other access methods can be built on direct access, often
using an index to look up file block numbers and
then using direct access.
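Here is a sketch of an index layered on top of direct access. It assumes, hypothetically, 16-byte fixed-length records whose first comma-separated field is a key, with records standing in for the file blocks mentioned above. The index maps each key to a record number, so a lookup is just a direct-access read.

```python
import io

RECORD_SIZE = 16   # assumed fixed record length, in bytes

def build_index(f, record_size=RECORD_SIZE):
    """Map each record's key (its first comma-separated field) to its record number."""
    index = {}
    f.seek(0)
    n = 0
    while True:
        rec = f.read(record_size)
        if len(rec) < record_size:
            break
        key = rec.split(b",", 1)[0].decode()
        index[key] = n
        n += 1
    return index

def lookup(f, index, key, record_size=RECORD_SIZE):
    n = index[key]                   # 1. consult the index for the record number
    f.seek(n * record_size)          # 2. then use direct access
    return f.read(record_size)


f = io.BytesIO()
for name, dept in [("smith", "cs"), ("jones", "math"), ("lee", "physics")]:
    f.write(f"{name},{dept}".ljust(RECORD_SIZE).encode())
idx = build_index(f)
print(lookup(f, idx, "jones").rstrip())   # b'jones,math'
```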
- 11.3 Directory and Disk Structure
- An entity containing a file system may be referred to as a
volume. A volume may be thought of as a
virtual disk.
- A disk can contain one volume, or
be partitioned and contain multiple volumes.
- Some partitions may hold swap space, or be left as raw disk space (with no file system).
- A single volume may also be created as some sort of union
of multiple disks and/or partitions.
- A volume needs certain data structures for implementing its
file system. Notably, it needs a device directory
(aka the directory), which contains such things as the names,
locations, sizes, and types of all files on the volume.
- 11.3.1 Storage Structure
- There can be many file systems of differing types on a
computing system.
- Solaris has ufs and zfs as general-purpose file systems
- Solaris special purpose file systems include tmpfs,
objfs, ctfs, lofs, and procfs
- 11.3.2 Directory Overview
- The directory is basically a table for looking up
information about files, using the name of the file
as the lookup key.
- The directory needs to support certain operations:
search for file, create file, delete file, list directory,
rename file, and traverse file system.
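A directory is essentially a table keyed by file name, so a Python dict captures the idea. The Directory class below is a hypothetical in-memory model of a single directory. It implements search, create, delete, list, and rename, and it refuses duplicate names within the directory. Traversal only makes sense once directories can contain other directories, so it is left out.

```python
class Directory:
    """Hypothetical in-memory directory: a table keyed by file name."""

    def __init__(self):
        self.entries = {}                      # file name -> attributes

    def search(self, name):
        return self.entries.get(name)          # None if there is no such file

    def create(self, name, **attrs):
        if name in self.entries:
            raise FileExistsError(name)        # names must be unique within a directory
        self.entries[name] = dict(attrs)

    def delete(self, name):
        del self.entries[name]                 # raises KeyError if absent

    def list_names(self):
        return sorted(self.entries)

    def rename(self, old, new):
        if new in self.entries:
            raise FileExistsError(new)
        self.entries[new] = self.entries.pop(old)


d = Directory()
d.create("a.txt", size=10)
d.create("b.txt", size=20)
d.rename("a.txt", "c.txt")
print(d.list_names())   # ['b.txt', 'c.txt']
```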
- 11.3.3 Single-Level Directory
- In this scheme the directory works like
a single list of entries.
Even if there are multiple users, no two
files are allowed to have the same name.
- 11.3.4 Two-Level Directory
- In a two-level directory structure,
there is a
master file directory that has multiple sub-directories.
- Each user on a computing system can be assigned his or her
own 'home' directory. Users can name files whatever
they like without fear of collisions with the filenames
of other users.
- If a user name and file name within the user's directory are
specified, this "pathname" uniquely determines which file it is.
- The OS assumes that a filename without a user name refers to the
user's own directory, or to a special directory that contains
the system files (e.g. programs that are user shell
commands)
- The sequence of directories searched when a file is named is
called a search path.
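Name resolution in a two-level scheme can be sketched as follows. The master file directory MFD, the 'user/file' pathname syntax, and the default search path (the user's own directory, then a 'system' directory) are all assumptions made for this example.

```python
# Hypothetical master file directory: user name -> that user's file directory
MFD = {
    "alice": {"notes.txt": "alice's notes", "ls": "alice's own ls"},
    "bob": {"notes.txt": "bob's notes"},
    "system": {"ls": "system ls program", "cat": "system cat program"},
}

def resolve(name, current_user, search_path=None):
    """Resolve 'user/file' directly, or search a list of directories for a bare file name."""
    if "/" in name:                       # user name + file name: a full pathname
        user, filename = name.split("/", 1)
        return MFD[user][filename]
    if search_path is None:
        search_path = [current_user, "system"]
    for user in search_path:              # try each directory on the search path in order
        directory = MFD.get(user, {})
        if name in directory:
            return directory[name]
    raise FileNotFoundError(name)


print(resolve("notes.txt", "bob"))        # bob's notes
print(resolve("cat", "bob"))              # system cat program
print(resolve("alice/notes.txt", "bob"))  # alice's notes
```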
- 11.3.5 Tree-Structured Directories
- The tree-structured directory is a generalization of the
two-level directory structure that allows users to create
their own tree of subdirectories, and to use this structure
to group and organize their files.
- The tree has a root, and every file or directory has a unique
pathname that starts with the root.
- Processes can typically "move around" in the tree, by
using a system call to specify which directory is their
current working directory.
- The accounting file (e.g. passwd file) of a user
typically designates which directory should initially
be made the current working directory when the user
logs in.
- Pathnames can be absolute, starting from the root, or relative,
starting from the current working directory.
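Converting a relative pathname into an absolute one amounts to prefixing the current working directory and then collapsing '.' and '..' components. This sketch uses Python's posixpath module and assumes unix-style '/' separators. Note that posixpath.normpath works purely on the text of the path, so it can give a different answer than the OS would when a '..' follows a symbolic link.

```python
import posixpath

def absolute(path, cwd):
    """Turn a possibly relative pathname into an absolute one, given the current working directory."""
    if not path.startswith("/"):          # relative: interpret it starting from the cwd
        path = posixpath.join(cwd, path)
    return posixpath.normpath(path)       # collapse '.', '..', and repeated '/'


print(absolute("/etc/passwd", "/home/alice"))            # /etc/passwd
print(absolute("projects/os/notes.txt", "/home/alice"))  # /home/alice/projects/os/notes.txt
print(absolute("../bob/todo", "/home/alice"))            # /home/bob/todo
```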
- 11.3.6 Acyclic-Graph Directories
- This structure allows directories to share a file
or subdirectory. By definition, this is not possible
in a tree.
- Shared files and subdirectories may be implemented
through the use of (symbolic) links
[aka soft links]. A symbolic link
may be thought of as a file that contains a path name.
The directory entry of the symbolic link has
a special bit value set that marks the file as
a link rather than an ordinary file. For example,
if /x/y is a file that we wish to share, we can
put a symbolic link in /z containing the pathname
"/x/y" and name the symbolic link (file) r.
Then all references to /z/r
will access the same file as /x/y.
- Going on with the previous example, the original
directory entry for /x/y is sometimes referred to
as a hard link. It is just an ordinary
directory entry, typically consisting of the name
of the file (y, in this case) and the address on disk of the
directory information for the file.
- Another way to implement the sharing of the file
would be to use another hard link - an entry in the
/z directory that contains the other name
(r, in this case), plus the same address
on disk of the directory information
for the original file (known as /x/y). This gives us
two separate directory entries that point to the
same file on disk.
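The /x/y example can be reproduced on a POSIX system with os.symlink and os.link, using a temporary directory as a stand-in for the root. A single directory cannot hold two entries with the same name, so the sketch names the symbolic link r and the hard link h (a hypothetical name), both in z. It assumes symbolic links are permitted, which may require extra privileges on Windows.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    x = os.path.join(root, "x")
    z = os.path.join(root, "z")
    os.mkdir(x)
    os.mkdir(z)
    original = os.path.join(x, "y")              # plays the role of /x/y
    with open(original, "w") as f:
        f.write("shared data")

    # symbolic link /z/r: a small file whose content is the path name of /x/y
    soft = os.path.join(z, "r")
    os.symlink(original, soft)
    soft_target = os.readlink(soft)              # the stored path name
    soft_is_link = os.path.islink(soft)          # True: marked as a link

    # hard link /z/h: a second directory entry for the same file
    hard = os.path.join(z, "h")
    os.link(original, hard)
    same_file = os.stat(hard).st_ino == os.stat(original).st_ino
    link_count = os.stat(original).st_nlink      # 2: two directory entries

    # removing /x/y breaks the symbolic link but not the hard link
    os.remove(original)
    soft_dangling = not os.path.exists(soft)     # exists() follows the link
    with open(hard) as f:
        hard_content = f.read()
```

Deleting /x/y leaves the symbolic link pointing at a name that no longer exists. The hard link still reaches the file, because the file's data is kept until its last hard link is removed.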
- 11.3.7 General Graph Directory
- 11.4 File-System Mounting
- Many operating systems provide a way to unify a collection
of volumes into a single logical file system. Mounting
is a term that refers to
the incorporation of
one file system (sub)structure into another. In the unix
operating system, there is one unified file system in the form of
a rooted acyclic graph. Substructures are integrated into the
file system using the unix mount command.
Implementation of
mount involves insertion
of link(s) into the directory structure. An external
filesystem is often mounted 'over' a link to an empty subdirectory.
Links to mounted file systems are specially marked,
so they won't be mistaken for something else, such as a link
to an ordinary subdirectory, or a link to a file.
- 11.5 File Sharing
- There are potential advantages to allowing multiple processes
and/or users to share files. However, there can be challenging
implementation issues, such as how to handle concurrent writes to
shared files.
- 11.5.1 Multiple Users
- When there are multiple users on a system, the OS has to
prevent the users from doing harmful things
to each other's files or misusing file contents. This is
the problem of access control and protection.
- Oftentimes the OS keeps track of the owner
and group of each file and directory.
The OS allows the owner of the file to control access
to the file. A group is assigned to the file in order
to designate a set of users who are allowed to share
(usually limited) access to the file.
- There has to be a data structure where the
attributes of each file are stored,
such as owner, group, file
permissions, time and date of creation, size, and so forth.
- When a process attempts to access a file or directory,
the OS can check the attributes to determine whether to
allow the access.
- 11.5.2 Remote File Systems
- A network of computers can share files.
The two main approaches are file transfer (used by ftp and the WWW)
and distributed file systems (DFS), such as the Network File
System (NFS).
- 11.5.2.1 The Client-Server Model
- On a shared network, a distributed file system (DFS)
server maintains file system(s) that remote clients can mount.
- Designers of a DFS must take care to ensure that clients and
servers authenticate each other properly, so that unauthorized
accesses don't happen.
- 11.5.2.2 Distributed Information Systems
- Distributed information systems such as Microsoft's
Active Directory and the Lightweight
Directory Access Protocol (LDAP) of the Internet
Engineering Task Force (IETF) are used
to create network
accounts that have uniform user names, user id numbers, and
passwords for all hosts in, say, an office network (or
computer science laboratory network). Such systems provide
secure user authentication.
- 11.5.2.3 Failure Modes
- Under the rules most DFSs follow, if the network
goes down during the execution of a file operation,
the system should wait until the network
functions properly again.
- NFS (in versions 2 and 3) has a stateless protocol; NFSv4
added some server-side state. If the network
is interrupted or if a server crashes, service
will be restored eventually when the network or
the server comes back up. The client is
programmed to repeat its requests for file service
until it gets a response. Each request contains all the
information the server needs. Therefore, despite a
network failure or server crash, file service continues
normally, except for the delay.
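The retry idea can be sketched with a simulated server. Everything here is hypothetical: FlakyServer is an in-memory stand-in that fails at random, the request dictionary is not the real NFS wire format, and the max_tries cap exists only so the example terminates. The server keeps no state about the client, so the client can simply resend an identical, self-contained request.

```python
import random

class FlakyServer:
    """Hypothetical stateless file server that sometimes fails to respond."""

    def __init__(self, files, failure_rate, seed=0):
        self.files = files                    # file handle -> contents (bytes)
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)

    def read(self, request):
        if self.rng.random() < self.failure_rate:
            raise TimeoutError("no response")  # network down or server crashed
        # the request carries everything needed: which file, where, and how much
        data = self.files[request["handle"]]
        start = request["offset"]
        return data[start:start + request["count"]]

def client_read(server, handle, offset, count, max_tries=100):
    """Resend the same request until the server answers; return (data, attempts)."""
    request = {"handle": handle, "offset": offset, "count": count}
    for attempt in range(1, max_tries + 1):
        try:
            return server.read(request), attempt
        except TimeoutError:
            continue                          # just resend the identical request
    raise TimeoutError("gave up")


server = FlakyServer({"fh1": b"hello, distributed world"}, failure_rate=0.5, seed=1)
data, attempts = client_read(server, "fh1", offset=7, count=11)
print(data)   # b'distributed'
```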
- 11.5.3 Consistency Semantics
- 11.5.3.1 Unix Semantics
- Writes to an open file are visible immediately to
other users who have this file open.
- There is a mode of unix file sharing in which the file
pointer is shared while writing to the file, so when one
process writes a byte to the file, the write position
in the file advances by one byte for all the processes
that have the file open in that mode.
- The file semantics are that all the processes share
a single copy of the file. When two or more processes
attempt to access the file concurrently, the
contention can cause delay.
- 11.5.3.2 Session Semantics
- The session semantics used in the Andrew file system
(OpenAFS) are different from unix semantics.
- When a user X writes to an open file, that write is not
immediately visible to other users that have the file
open.
- The change made by X will not be seen by other users
until after X closes the file. Even then, only
users who open the file after X closes
it will see the change X made.
- Session semantics support the view that there may be
multiple copies of the file in existence concurrently,
and concurrent reads and/or writes can proceed
immediately without contention or delay.
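Session semantics can be modeled by giving each open its own private copy and publishing that copy on close. This is a hypothetical toy, not the AFS implementation. In particular, the rule that a later close of a modified copy overwrites an earlier one is an assumption made to keep the example small.

```python
class SharedFile:
    """Hypothetical shared file with session semantics."""

    def __init__(self, contents=""):
        self.master = contents        # the version that new opens will see

    def open(self):
        return Session(self)


class Session:
    """One user's session: a private copy taken when the file is opened."""

    def __init__(self, shared):
        self.shared = shared
        self.copy = shared.master
        self.dirty = False

    def read(self):
        return self.copy

    def write(self, text):
        self.copy += text             # visible only within this session for now
        self.dirty = True

    def close(self):
        if self.dirty:                # publish changes; a later close overwrites an earlier one
            self.shared.master = self.copy


f = SharedFile("v1")
x = f.open()
y = f.open()
x.write("+x")
print(y.read())   # v1    (X has not closed yet)
x.close()
print(y.read())   # v1    (Y opened before X closed)
z = f.open()
print(z.read())   # v1+x  (Z opened after X closed)
```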
- 11.5.3.3 Immutable Shared File Semantics
- Immutable means unchangeable.
Under immutable shared file semantics, after a file
has been set up to be shared by two or more processes,
changes of any kind to the file are not allowed.
- It is easy to implement immutable shared files, but obviously
that limits access to read-only.
- 11.6 Protection
- Administrators should
protect the file system by doing backups on a regular schedule,
and by keeping the physical location of the
computer secure.
- 11.6.1 Types of Access
- The types of file access that may be controlled include
read access, write access, execute privilege, append rights,
delete rights, and the right to list the name and attributes
of a file.
- 11.6.2 Access Control
- One approach to controlling access to files would be to keep
an access control list (ACL) with each file or directory.
The ACL is a table data structure keyed on user id. Given the
id of any user, one can look up the specific access rights
the user has on the file or directory object. Such full ACLs
can be difficult to implement, since the list for each object
may be long and tedious to construct.
- Many systems are designed with a condensed form of ACLs
in which access rights are stored just for three entities:
the owner of the file (or directory), the group, and everybody
else.
- Some systems, like Solaris, use the (owner, group, others)
approach by default, but also allow more detailed access
controls to be added to specific file objects. (The term file
object is meant to include files, directories, and other
special kinds of items that might exist in the file system.)
- Read, write, and execute bits are commonly associated with
each of the three classes: owner (user), group, and other.
- For a plain file, a set read bit means a member of the class
has permission to read the file. Similarly the write bit gives
permission to write, and the execute bit gives permission
to execute the file (presumably the file is a program,
script, or the like).
- For directories, the read bit gives permission
to list the directory. The write bit allows creation
and deletion of entries in the directory. (Removing a
directory itself, which must be empty, is governed by the
write permission on its parent directory.) The execute bit
gives permission to cd to the directory, and to access the
file objects in the directory, subject to the permissions
on the objects themselves.
- Users who lack either write or execute permission
on a directory cannot delete a file object in that
directory.
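The owner/group/other check can be sketched with the permission constants from Python's stat module. The function may_access and its parameters are hypothetical. It follows the unix rule that only the most specific class applies: the owner bits for the owner, otherwise the group bits for group members, otherwise the other bits. It ignores the superuser and any additional ACL entries.

```python
import stat

BITS = {
    "owner": {"r": stat.S_IRUSR, "w": stat.S_IWUSR, "x": stat.S_IXUSR},
    "group": {"r": stat.S_IRGRP, "w": stat.S_IWGRP, "x": stat.S_IXGRP},
    "other": {"r": stat.S_IROTH, "w": stat.S_IWOTH, "x": stat.S_IXOTH},
}

def may_access(mode, file_uid, file_gid, uid, gids, want):
    """Check one kind of access ('r', 'w', or 'x') against owner/group/other bits.

    gids is the set of groups the requesting user belongs to.
    Only the single most specific class is consulted.
    """
    if uid == file_uid:
        cls = "owner"
    elif file_gid in gids:
        cls = "group"
    else:
        cls = "other"
    return bool(mode & BITS[cls][want])


# rwxr-x---: owner may do anything, group may read and execute, others get nothing
print(stat.filemode(stat.S_IFREG | 0o750))              # -rwxr-x---
print(may_access(0o750, 1000, 100, 1000, {100}, "w"))   # True  (owner)
print(may_access(0o750, 1000, 100, 1001, {100}, "w"))   # False (group lacks w)
print(may_access(0o750, 1000, 100, 1002, {200}, "r"))   # False (other)
```

Because only one class is consulted, a mode such as rw----r-- denies read access to group members even though everyone else may read.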
- 11.6.3 Other Protection Approaches
- One approach is to associate a password with a file or
directory.