(Latest Revision: Sep 21, 2014)
Chapter Four -- Threads -- Lecture Notes
-  4.0 Objectives
     
     -  Understand the notion of a thread
     
 -  Discuss various APIs for threads
     
 -  Explore implicit threading
     
 -  Examine multithreaded programming
     
 -  Cover operating system support for threads
     
 
 -  4.1 Overview
     
     -  What was the old-fashioned concept of a process?  The sequence of
	  executing instructions - the thread of execution - was one aspect of
	  a process.  However, a process also had a great many associated
	  resources: program counter, registers, run-time stack, primary
	  memory allocation, data section, text (code) section, open files,
	  allocated devices, and so forth.
          
     
 -  As the description above implies, processes were "heavyweight," and
          as demand grew for more and more processes in modern systems,
          designers began looking for ways to be more economical in the
          manner in which resources were used by processes.
     
 -  One early innovation was to organize processes so that more than one
          process could share the same program text.  There was also
	  experimentation with having processes share the same data section of
	  primary memory.
     
 -  These developments led designers to consider threads --
          "lightweight processes" which would share about as much process
	  context as possible.  
     
     
 -  Threads thus are a kind of process that helps save on resource
          utilization.  If we have a need or desire to use, say, 10 processes
          to solve a problem, we don't necessarily have to replicate 10 copies
          of each "process building block" - many of them may be
          sharable by all 10 of the processes.
     
 -  Each thread needs the exclusive use of some resources: e.g.
	  ID, program counter, register set and runtime stack.
     
 -  In client-server applications, it is often more efficient for 
          the server to create a new (lightweight) thread to provide 
          a service to a client, rather than to fork a whole new copy
          of the server process to do that.  For example RPC servers may
          be multithreaded.
     
 -  Most operating system kernels are multithreaded.
     
     
 -  4.1.2 Benefits of threads:
          
	  -   Responsiveness:  work is divided among threads and some
	       threads can continue working while others are blocked (e.g.
	       waiting for I/O to complete) [ Note this is a type of 
               concurrent processing that applies to a uniprocessor ]
	  
 -   Resource Sharing:  e.g. sharing of code and data -
	       better utilization of primary memory.
	  
 -   Economy: It typically takes much less time to create 
               and context-switch threads compared with a heavy-weight process.
	  
 -   Scalability:  On a multiprocessor, multiple threads 
               can do work on a problem in parallel.
	  
 
     
      -  Benefits #1 (responsiveness) and #4 (scalability) are achievable
          using heavy-weight processes, but at a higher cost in terms of
          time and resources.
     
 
 -  4.2 Multicore Programming
     
     -  Multicore chips place multiple CPUs (cores) on a single chip, and
          they are very common now.  Thus modern computers have the potential
          for true parallel processing, not merely the concurrency 
          that can be implemented on uni-processors.
     
 -  4.2.1 It's a challenge to write multi-threaded programs.  
          There are the questions of dividing the work, load balancing, 
          division of data, data dependency, testing, and debugging.
     
     
 -  4.2.2 Parallelism can be achieved through a combination of
          assigning different threads to operate on different parts of
          the data (data parallelism) and/or to perform different tasks,
          possibly on the same data (task parallelism).
     
 
 -  4.3 Multithreading Models
     
     
     -  There are kernel-level threads and user-level threads.  Kernel
          level threads are supported directly by the kernel - each is
          scheduled by the kernel.  Each is represented by a "thread control
          block."  Each kernel-level thread can wait in the run queue, a
          device queue, an event queue, and so forth.  Each is treated as a
          separate entity by the kernel.  Thus, for example, one kernel thread
          can wait for I/O while another uses the CPU.
     
     
 -  User level threads are not represented individually at the kernel
          level. A package of library functions implements - you might say
	  'simulates' - the threads outside the kernel.  For example the
	  effect of the library might be to multiplex several threads using
	  just one kernel thread to support them.
     
     
 -  There has to be a correspondence between user and kernel level
          threads. The mapping can be many-to-one, one-to-one, or
	  many-to-many.
     
 -  4.3.1 In the many-to-one model, many user-level threads are 
          supported by a single kernel-level thread.  Context switching 
          is extremely fast among these user-level threads, and the 
          model supports programmers who want to organize software as a group 
          of concurrent threads.  However, true parallelism is not possible 
          with this model, and all user threads are blocked if any one 
          of them makes a blocking system call.
     
 -  4.3.2 Use of the one-to-one model allows threads to block
          independently and operate in parallel on multiprocessors.  
          However, the creation of large numbers of threads may tax system
          resources, there is no assurance that the OS will schedule
          threads to operate in parallel optimally, and context switches 
          are generally slower.
     
 -  4.3.3 The many-to-many model allows creating a large 
          multiplicity of user-level threads that may switch context
          with great rapidity.  The application can have greater control 
          over the scheduling of the user-level threads.  Much of the 
          advantage of the one-to-one model remains: parallelism and 
          independent blocking.
     
 
 -  4.4 Thread Libraries
     
     -  Posix thread (pthread) implementation varies from system to system -
	  could be user-level or kernel-level.
     
 -  Win32 threads are kernel-level 
     
 -  The Java thread API is typically implemented using a native thread
	  package on the host system (e.g. Pthreads or Win32).
     
 -  In asynchronous threading, the parent creates 
          one or more child threads and then executes concurrently with them.
     
 -  In synchronous threading, the parent creates one or more
          child threads and waits for all the child threads to exit
	  before resuming execution.
     
 -  Students: study the example "Multithreaded C program using the
	  Pthreads API" carefully because the style of pthreads programming we
	  will do later is similar.
     
     
 -  Section 4.4 contains three examples in which a parent thread 
          creates a child thread to execute a function.  The parent 
          blocks until the child has exited, and then the parent 
          resumes execution.  In class exercises, we will work with programs
          that have multiple threads active simultaneously, including the 
          parent.
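 -  The create-then-join pattern described above can be sketched with
          Pthreads.  This is a minimal illustration, not the textbook's
          exact program; the names worker and run_once are made up for
          this sketch.  Compile and link with -pthread.

```c
#include <pthread.h>
#include <stdio.h>

/* Worker run by the child thread: doubles the int it is handed. */
static void *worker(void *arg) {
    int *n = arg;
    *n *= 2;
    return arg;             /* value retrieved later by pthread_join */
}

/* Synchronous threading: the parent creates one child thread, then
   blocks in pthread_join until the child exits. */
int run_once(int start) {
    pthread_t tid;
    int value = start;
    void *result = NULL;

    if (pthread_create(&tid, NULL, worker, &value) != 0)
        return -1;
    pthread_join(tid, &result);     /* parent waits for the child */
    return *(int *)result;
}
```

          Calling run_once(21) from main returns 42: the parent cannot
          see a half-finished result because pthread_join guarantees the
          child has exited first.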
     
     
 
 -  4.5 Implicit Threading 
     
     -  Implicit threading is a methodology for coping with some of the
          difficulties of programming multithreaded applications through the
	  use of such tools as compilers and run-time libraries.
     
 -  4.5.1 Thread Pools: Generally it is faster to use an existing
          thread to service a request, rather than create one and destroy it
	  after it performs the service.  Using a pool of threads also
	  builds in a limit on the number of threads a server can utilize
	  - protecting the system from too much thread proliferation.  The
	  server creates a number of threads at the time of process start up
	  and assigns threads from the pool to service clients. 
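 -  The thread-pool idea can be sketched as follows: a fixed set of
          worker threads created at startup, a bounded task queue guarded
          by a mutex and condition variable, and a submit call that hands
          work to an existing thread instead of creating a new one.  All
          names (pool_init, pool_submit, bump, etc.) are invented for this
          sketch; real pools add error handling and dynamic sizing.

```c
#include <pthread.h>
#include <stdbool.h>

#define POOL_SIZE  4     /* fixed thread count = built-in limit */
#define QUEUE_CAP 64

typedef struct { void (*fn)(void *); void *arg; } task_t;

static task_t queue[QUEUE_CAP];
static int q_head, q_tail, q_len;
static bool shutting_down;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_nonempty = PTHREAD_COND_INITIALIZER;
static pthread_t workers[POOL_SIZE];

/* Each pool thread loops forever: wait for a task, run it, repeat. */
static void *pool_worker(void *unused) {
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (q_len == 0 && !shutting_down)
            pthread_cond_wait(&q_nonempty, &q_lock);
        if (q_len == 0 && shutting_down) {   /* drained, told to stop */
            pthread_mutex_unlock(&q_lock);
            return NULL;
        }
        task_t t = queue[q_head];
        q_head = (q_head + 1) % QUEUE_CAP;
        q_len--;
        pthread_mutex_unlock(&q_lock);
        t.fn(t.arg);                         /* run outside the lock */
    }
}

/* Create the fixed set of worker threads once, at process startup. */
void pool_init(void) {
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_create(&workers[i], NULL, pool_worker, NULL);
}

/* Hand a task to an existing thread; refuse if the queue is full,
   which is exactly the proliferation limit mentioned above. */
int pool_submit(void (*fn)(void *), void *arg) {
    pthread_mutex_lock(&q_lock);
    if (q_len == QUEUE_CAP) {
        pthread_mutex_unlock(&q_lock);
        return -1;
    }
    queue[q_tail] = (task_t){ fn, arg };
    q_tail = (q_tail + 1) % QUEUE_CAP;
    q_len++;
    pthread_cond_signal(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
    return 0;
}

/* Let queued tasks finish, then join every worker. */
void pool_shutdown(void) {
    pthread_mutex_lock(&q_lock);
    shutting_down = true;
    pthread_cond_broadcast(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_join(workers[i], NULL);
}

/* Sample task for demonstration: atomically bump a shared counter. */
static pthread_mutex_t n_lock = PTHREAD_MUTEX_INITIALIZER;
static int n_done;
static void bump(void *unused) {
    (void)unused;
    pthread_mutex_lock(&n_lock); n_done++; pthread_mutex_unlock(&n_lock);
}
```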
     
 -  4.5.2 OpenMP: A programmer can insert labels in the code that
          identify  certain sections that should be executed by parallel
	  threads. The compiler responds to the labels by generating code
	  that creates threads that execute those sections of code
	  in parallel.
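 -  A tiny OpenMP sketch of the idea: the pragma below is the "label"
          that marks a loop for parallel execution, and the reduction
          clause tells the compiler each thread gets a private partial
          sum.  Compile with -fopenmp; without that flag the pragma is
          simply ignored and the loop runs serially, producing the same
          answer.

```c
#include <stddef.h>

/* Sum an array; the pragma asks the compiler to generate code that
   splits the loop's iterations among a team of threads. */
long parallel_sum(const int *a, size_t n) {
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++)
        total += a[i];
    return total;
}
```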
	  
     
 -  4.5.3 Grand Central Dispatch consists of extensions to C, an
          API, and a run-time library.  Like OpenMP, it provides
          parallel processing, although details of the implementation differ.
     
 -  4.5.4 Other Approaches include Threading Building Blocks (Intel),
          products from Microsoft, and the java.util.concurrent package.
     
 
 -  4.6 Threading Issues
     
     
     -  4.6.1 The fork() and exec() system calls
          When an application is multi-threaded, should the fork() 
          system call duplicate all threads, or just the calling thread?  
          Some APIs provide both options. 
     
      -  Implementations of exec() typically replace the entire process
          image of the calling thread.  Therefore, if the child created by a 
          fork() is going to call exec() immediately, there's no point 
          in having the fork() duplicate all the threads in the process.
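 -  The fork-then-exec pattern under discussion looks like this on a
          POSIX system (fork_then_exec is a made-up wrapper name for
          this sketch):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* fork() the caller, have the child immediately exec a new program,
   and return the child's exit status to the parent.  Because the
   child calls exec() right away, duplicating every thread of a
   multithreaded parent would be wasted work - the whole address
   space is about to be replaced. */
int fork_then_exec(const char *path, char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */
    if (pid == 0) {                 /* child: replace this process */
        execv(path, argv);
        _exit(127);                 /* reached only if exec failed */
    }
    int status;
    waitpid(pid, &status, 0);       /* parent waits for the child */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```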
     
     
 -  4.6.2 Signal Handling 
          Signals are a simple form of interprocess communication in some
          operating systems, primarily versions of Unix.  Signals
          behave something like interrupts but they are not 
          interrupts.
      -  The OS delivers and handles signals.  Delivering signals and
          handling (responding to) signals are routine 
          tasks the OS performs as opportunities arise.  Sometimes delivery 
          of a signal to a process (or thread) is required as part of the 
          OS performance of interrupt service, or a system call.
          The OS delivers signals to a process (thread) by setting a 
          bit in a context variable of the process (thread).  Just 
          before scheduling a process (thread) to execute, the OS checks 
          to see if any signals have been delivered to the process (thread)
          that have not been handled yet.  If so, the OS will cause the 
          signal to be handled properly.  Sometimes it does this by 
          executing code in kernel mode,  and sometimes it handles a 
          signal by jumping into the user process at the start address 
          of a special handler routine the process has for responding 
          to the signal.  The exact way of handling depends on the 
          nature of the signal.
     
 -  Multithreading complicates the problem of implementing signal
          delivery.  Should a signal be delivered to all the threads in a
	  process or just some?
          
     
 -  Often the handler for a signal should run only once.  A signal sent
	  to a process may be delivered only to the first thread that is not
	  blocking it.
     
 -  The OS may provide a function to send a signal to one particular
	  thread.
     
 -  4.6.3 Thread Cancellation
          Sometimes a thread starts work but it should be cancelled 
          before it finishes - for example if two threads are searching
          a database for a record and one of them finds it, the other 
          thread should be cancelled.
          Thread cancellation can be implemented in a manner similar to how 
          signals work.  In fact it may be implemented using signals.
          Since problems could be caused by instantly 
          cancelling a thread in a task that is in the midst of doing some
          work, the implementation of cancellation typically includes ways 
          for threads to defer their cancellation so that they have time 
          to 'clean up' first - for example to deallocate resources they 
          are holding, or to finish updating shared data.  
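 -  In Pthreads this 'clean up first' behavior is what deferred
          cancellation plus cleanup handlers provide.  A rough sketch
          (release_resources and cancel_demo are made-up names; a real
          handler would unlock mutexes or free buffers):

```c
#include <pthread.h>
#include <unistd.h>

static int cleaned_up;

/* Cleanup handler: runs when the thread is cancelled (or exits). */
static void release_resources(void *unused) {
    (void)unused;
    cleaned_up = 1;
}

static void *cancellable_worker(void *unused) {
    (void)unused;
    pthread_cleanup_push(release_resources, NULL);
    for (;;)
        sleep(1);               /* sleep() is a cancellation point */
    pthread_cleanup_pop(0);     /* pairs lexically with the push */
    return NULL;
}

/* Cancel the worker; with deferred cancellation (the default) it is
   cancelled at the next cancellation point, after which the cleanup
   handler runs - it is not killed instantly mid-update. */
int cancel_demo(void) {
    pthread_t tid;
    void *status;
    pthread_create(&tid, NULL, cancellable_worker, NULL);
    pthread_cancel(tid);
    pthread_join(tid, &status); /* status is PTHREAD_CANCELED */
    return cleaned_up && status == PTHREAD_CANCELED;
}
```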
          
      -  4.6.4 Thread Local Storage
          Typically threads have some need for thread-specific data 
          (thread local storage).  This
          is data not shared with other threads.  In Pthreads processing,
	  local variables can play this role, but local variables exist
          only within one function, so provision for thread local storage
          that is more 'global' may be needed.  Most thread APIs provide
          support for such thread local storage.
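 -  A small sketch of 'global' thread-local storage using the C11
          _Thread_local keyword (Pthreads alternatively offers
          pthread_key_create for the same purpose; tls_demo and touch
          are invented names):

```c
#include <pthread.h>

/* Each thread gets its own copy of this variable, so this "global"
   is actually private per thread. */
static _Thread_local int per_thread_counter;

static void *touch(void *out) {
    per_thread_counter += 1;        /* touches only this thread's copy */
    *(int *)out = per_thread_counter;
    return NULL;
}

/* Run two threads; each sees its own counter start at 0, so both
   report 1, and the main thread's copy is untouched at 0. */
int tls_demo(void) {
    pthread_t t1, t2;
    int a = 0, b = 0;
    pthread_create(&t1, NULL, touch, &a);
    pthread_create(&t2, NULL, touch, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return a == 1 && b == 1 && per_thread_counter == 0;
}
```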
      -  4.6.5 Scheduler Activations
          This section describes some rather arcane details of how
          the relationship between user-level threads and kernel-level
          threads may be implemented.  We won't cover this.
      
 -  4.7 Operating-System Examples
     
     -  4.7.1 Windows Threads
          Applications run as separate processes, which may have multiple 
          threads.  Per-thread resources include an ID number, register set, 
          user stack and kernel stack (for the use of the OS when executing 
          on behalf of the process, for example when executing a system
          call for the process), and private storage used by library code.
     
 -  4.7.2 Linux Threads
          Linux has a traditional fork() system call that creates an
          exact duplicate of the parent.  Linux also has a clone() 
          system call with flag parameters that determine what 
          resources will be shared between parent and child.
          
     
 
 -  4.8 Summary
     
     -  "A thread is a flow of control within a process."  There
          may be many threads within a single process.  
     
 -  Possible advantages of threading include responsiveness, resource
          sharing, economy, scalability, and efficiency.
     
 -  Programmers can manipulate user-level threads that are not
          visible to the kernel.  There are many-to-one, one-to-one, and 
          many-to-many models for mapping user-level threads to kernel-level
          threads.
     
 -  POSIX Pthreads, Windows threads, and Java threads are provided
          as libraries and APIs to support threading in most modern operating
          systems. Compilers and run-time libraries exist that provide
          implicit threading - which frees programmers from explicitly 
          writing code to create and manage threads.
     
 -  "Multithreaded programs introduce many challenges for programmers"