Lecture 9

Big Picture (review/preview)

   processes (threads) -- independent tasks

   communication (for now) -- shared variables

   synchronization -- critical sections; conditions

   multithreaded program -- threads take turns

   parallel program -- usually one thread/processor

      goal is speedup (see intro to Part 3 of text)
         T1 -- time for a SEQUENTIAL program on 1 processor
         Tp -- time for a parallel program on p processors
         speedup = T1 / Tp
            linear; sublinear; superlinear
            mention that superlinear speedup does happen, usually due to
            cache effects (each processor's share of the data fits in
            its cache, where the whole problem did not)
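
            a made-up example to fix the terms:  if T1 = 60 sec and
            Tp = 20 sec with p = 4 processors, then speedup = 60/20 = 3,
            which is sublinear (linear speedup on 4 processors would be 4)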

      impediments to speedup
         inherently sequential parts
         load imbalance
         synchronization overhead:  critical sections, delays, fork/join


Review of Two Major Parallel Programming Styles (from Chapter 1)

   iterative -- matrix multiplication   C = A x B, for n x n matrices

      suppose p << n and p is a factor of n  (p = number of processors)

      assign each process a strip of n/p rows of C (see p. 16)
      describe overheads
      this is an example of an EMBARRASSINGLY PARALLEL application
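
      a minimal Pthreads sketch of the strip idea (my illustration, not
      from the text; N, P, and the test matrices are made up, and P is
      assumed to divide N):

      #include <pthread.h>
      #include <stdio.h>

      #define N 8                 /* matrix size (assume P divides N) */
      #define P 4                 /* number of worker threads */

      double a[N][N], b[N][N], c[N][N];

      /* each worker computes a strip of N/P contiguous rows of C */
      void *worker(void *arg) {
          long w = (long) arg;                /* worker id: 0..P-1 */
          int first = w * (N/P);              /* first row of my strip */
          int last  = first + N/P;            /* one past my last row */
          for (int i = first; i < last; i++)
              for (int j = 0; j < N; j++) {
                  double sum = 0.0;
                  for (int k = 0; k < N; k++)
                      sum += a[i][k] * b[k][j];
                  c[i][j] = sum;
              }
          return NULL;
      }

      int main(void) {
          pthread_t t[P];
          for (int i = 0; i < N; i++)         /* made-up test data */
              for (int j = 0; j < N; j++) {
                  a[i][j] = 1.0;
                  b[i][j] = (i == j) ? 2.0 : 0.0;
              }
          for (long w = 0; w < P; w++)
              pthread_create(&t[w], NULL, worker, (void *) w);
          for (int w = 0; w < P; w++)
              pthread_join(t[w], NULL);
          printf("c[0][0] = %g\n", c[0][0]);  /* expect 2.0 */
          return 0;
      }

      no locks are needed because the strips are disjoint; the only
      synchronization is the implicit join at the end -- hence
      "embarrassingly parallel"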

   recursive -- adaptive quadrature

      parallel recursive calls
      describe overheads -- these are very large in general
         can limit recursion depth, but still a challenge
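
      a sketch of depth-limited parallel recursion in Pthreads (my
      illustration; the integrand f, EPSILON, and MAXDEPTH are made up,
      and the accuracy test is the usual trapezoid comparison):

      #include <pthread.h>
      #include <stdio.h>
      #include <math.h>

      #define EPSILON  1e-10      /* accuracy threshold */
      #define MAXDEPTH 3          /* fork new threads only this deep */

      double f(double x) { return x * x; }    /* sample integrand */

      typedef struct { double a, b, fa, fb, area; int depth; } Args;

      void *quad(void *p) {
          Args *g = p;
          double m = (g->a + g->b) / 2, fm = f(m);
          double larea = (g->fa + fm) * (m - g->a) / 2;
          double rarea = (fm + g->fb) * (g->b - m) / 2;
          if (fabs(larea + rarea - g->area) <= EPSILON) {
              g->area = larea + rarea;        /* accurate enough */
              return NULL;
          }
          Args l = { g->a, m, g->fa, fm, larea, g->depth + 1 };
          Args r = { m, g->b, fm, g->fb, rarea, g->depth + 1 };
          if (g->depth < MAXDEPTH) {          /* fork for the left half */
              pthread_t t;
              pthread_create(&t, NULL, quad, &l);
              quad(&r);                       /* right half here */
              pthread_join(t, NULL);
          } else {                            /* too deep: plain recursion */
              quad(&l);
              quad(&r);
          }
          g->area = l.area + r.area;
          return NULL;
      }

      int main(void) {
          double a = 0, b = 1;
          Args g = { a, b, f(a), f(b), (f(a) + f(b)) * (b - a) / 2, 0 };
          quad(&g);
          printf("integral ~= %.12f\n", g.area);   /* expect ~1/3 */
          return 0;
      }

      even with the depth limit, every parallel call costs a thread
      creation and a join, which is why the overheads are so large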


Bag of Tasks Paradigm (Section 3.6)

   P workers that share a bag of tasks
   useful for independent tasks and for implementing recursive parallelism
   provides load balancing pretty much automatically

   shared:  variables and locks for bag

   process Worker[w = 1 to P] {
     while (true) {
       get a task from the bag;
       do it, possibly generating new tasks and putting them in the bag;
     }
   }

   challenge:  detecting termination
     termination when the bag is empty AND all tasks are done
        an empty bag alone is not enough -- a worker still executing
        a task might put new tasks into the bag
        all tasks are done when all workers are waiting to get a new task
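
   one way to code the bag plus termination detection in Pthreads (a
   sketch; the "tasks" here are made-up toys: a task of size k > 0 puts
   two tasks of size k-1 back in the bag, a task of size 0 is a leaf):

   #include <pthread.h>
   #include <stdio.h>

   #define P 4                     /* number of workers */
   #define MAXTASKS 2048

   int bag[MAXTASKS];              /* the bag, used as a stack */
   int size = 0;                   /* tasks currently in the bag */
   int idle = 0;                   /* workers waiting for a task */
   int done = 0;                   /* set once termination is detected */
   long leaves = 0;                /* result: number of leaf tasks */
   pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
   pthread_cond_t  c = PTHREAD_COND_INITIALIZER;

   void *worker(void *arg) {
       while (1) {
           pthread_mutex_lock(&m);
           idle++;
           while (size == 0 && !done) {
               if (idle == P) {    /* bag empty AND everyone waiting, */
                   done = 1;       /* so no new task can ever appear  */
                   pthread_cond_broadcast(&c);
               } else
                   pthread_cond_wait(&c, &m);
           }
           if (done) { pthread_mutex_unlock(&m); return NULL; }
           int k = bag[--size];    /* get a task from the bag */
           idle--;
           pthread_mutex_unlock(&m);
           if (k > 0) {            /* do it: generate two smaller tasks */
               pthread_mutex_lock(&m);
               bag[size++] = k - 1;
               bag[size++] = k - 1;
               pthread_cond_broadcast(&c);
               pthread_mutex_unlock(&m);
           } else {                /* leaf task: just count it */
               pthread_mutex_lock(&m);
               leaves++;
               pthread_mutex_unlock(&m);
           }
       }
   }

   int main(void) {
       pthread_t t[P];
       bag[size++] = 10;           /* one initial task of size 10 */
       for (int i = 0; i < P; i++) pthread_create(&t[i], NULL, worker, NULL);
       for (int i = 0; i < P; i++) pthread_join(t[i], NULL);
       printf("leaves = %ld (expect 1024)\n", leaves);
       return 0;
   }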


Examples

   (1) matrix multiplication by rows -- Figure 3.20

       shared bag:  int nextRow = 0;

       get a task:  << row = nextRow; nextRow++; >>

           (implement the atomic action using a critical section
           solution, such as semaphores)

        worker code is simple -- no need to program strips or the like
           also easy to make the tasks larger or smaller
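
        a sketch of this worker in Pthreads, with the atomic action
        coded as a mutex rather than semaphores (test data made up):

        #include <pthread.h>
        #include <stdio.h>

        #define N 8                 /* matrix size */
        #define P 4                 /* number of workers */

        double a[N][N], b[N][N], c[N][N];
        int nextRow = 0;                        /* the shared bag */
        pthread_mutex_t bagLock = PTHREAD_MUTEX_INITIALIZER;

        void *worker(void *arg) {
            while (1) {
                pthread_mutex_lock(&bagLock);   /* << row = nextRow++; >> */
                int row = nextRow++;
                pthread_mutex_unlock(&bagLock);
                if (row >= N) return NULL;      /* bag is empty */
                for (int j = 0; j < N; j++) {   /* task: one row of C */
                    double sum = 0.0;
                    for (int k = 0; k < N; k++)
                        sum += a[row][k] * b[k][j];
                    c[row][j] = sum;
                }
            }
        }

        int main(void) {
            pthread_t t[P];
            for (int i = 0; i < N; i++)         /* made-up test data */
                for (int j = 0; j < N; j++) {
                    a[i][j] = 1.0;
                    b[i][j] = (i == j) ? 2.0 : 0.0;
                }
            for (int i = 0; i < P; i++)
                pthread_create(&t[i], NULL, worker, NULL);
            for (int i = 0; i < P; i++)
                pthread_join(t[i], NULL);
            printf("c[0][0] = %g\n", c[0][0]);  /* expect 2.0 */
            return 0;
        }

        termination is easy here because workers never add new tasks:
        a worker simply quits as soon as it draws a row number >= N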

   (2) adaptive quadrature -- Figure 3.21

       shared bag:  records of the form (a, b, f(a), f(b), area)

       get a task:  remove a record from the bag

       also produce new tasks (by adding them to the bag) whenever you
          would have done recursion in the parallel recursive program

        for efficiency, we would want to do "pruning" once there are
        enough tasks:  keep track of how many tasks exist, and stop
        generating new ones when there are enough -- at that point a
        worker just solves its task with the basic (sequential)
        recursive algorithm

        how many tasks are enough?  a common heuristic is 2-3 times the
        number of workers; that spreads out the load and keeps it
        reasonably well balanced
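
        putting the pieces together -- a sketch of the bag-of-tasks
        quadrature with this pruning heuristic (same termination scheme
        as the skeleton above; f and EPSILON are again made up):

        #include <pthread.h>
        #include <stdio.h>
        #include <math.h>

        #define P 4
        #define ENOUGH (3*P)        /* 2-3 tasks per worker heuristic */
        #define MAXTASKS 10000
        #define EPSILON 1e-10

        typedef struct { double a, b, fa, fb, area; } Task;

        Task bag[MAXTASKS];         /* the bag, used as a stack */
        int size = 0, idle = 0, done = 0;
        double total = 0.0;         /* the integral, accumulated */
        pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
        pthread_cond_t  c = PTHREAD_COND_INITIALIZER;

        double f(double x) { return x * x; }

        /* the basic recursive algorithm, for the pruned case */
        double quadseq(double a, double b, double fa, double fb,
                       double area) {
            double mid = (a + b) / 2, fm = f(mid);
            double la = (fa + fm) * (mid - a) / 2;
            double ra = (fm + fb) * (b - mid) / 2;
            if (fabs(la + ra - area) <= EPSILON) return la + ra;
            return quadseq(a, mid, fa, fm, la)
                 + quadseq(mid, b, fm, fb, ra);
        }

        void *worker(void *arg) {
            while (1) {
                pthread_mutex_lock(&m);
                idle++;
                while (size == 0 && !done) {
                    if (idle == P) { done = 1; pthread_cond_broadcast(&c); }
                    else pthread_cond_wait(&c, &m);
                }
                if (done) { pthread_mutex_unlock(&m); return NULL; }
                Task t = bag[--size];           /* get a task */
                idle--;
                int enough = (size >= ENOUGH);  /* prune from here on? */
                pthread_mutex_unlock(&m);

                double mid = (t.a + t.b) / 2, fm = f(mid);
                double la = (t.fa + fm) * (mid - t.a) / 2;
                double ra = (fm + t.fb) * (t.b - mid) / 2;
                pthread_mutex_lock(&m);
                if (fabs(la + ra - t.area) <= EPSILON)
                    total += la + ra;           /* accurate enough */
                else if (enough) {              /* plenty of tasks */
                    pthread_mutex_unlock(&m);
                    double r = quadseq(t.a, t.b, t.fa, t.fb, t.area);
                    pthread_mutex_lock(&m);
                    total += r;
                } else {                        /* add two new tasks */
                    bag[size++] = (Task){ t.a, mid, t.fa, fm, la };
                    bag[size++] = (Task){ mid, t.b, fm, t.fb, ra };
                    pthread_cond_broadcast(&c);
                }
                pthread_mutex_unlock(&m);
            }
        }

        int main(void) {
            pthread_t t[P];
            double a = 0, b = 1, fa = f(a), fb = f(b);
            bag[size++] = (Task){ a, b, fa, fb, (fa+fb)*(b-a)/2 };
            for (int i = 0; i < P; i++)
                pthread_create(&t[i], NULL, worker, NULL);
            for (int i = 0; i < P; i++)
                pthread_join(t[i], NULL);
            printf("integral of x^2 on [0,1] ~= %.12f\n", total);
            return 0;
        }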


Assign Homework 2

   get started; come with questions next time
   problem 3 is on barriers, which we will cover starting next time


Pthreads -- POSIX threads (Section 4.6)

   overview of what it is

   available on lots of machines

   handouts for sample programs -- go over transparencies

      simple.c, shows basic structure
      pc.busy.c, producer/consumer using busy waiting
      pc.sems.c, producer/consumer using semaphores
      clock.c, shows how to do timing with pc.sems.c

      (Note:  These programs work on our Sun/Solaris machines.)
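
   a minimal sketch of the basic structure along the lines of simple.c
   (a reconstruction for these notes, not the actual handout):

   #include <pthread.h>
   #include <stdio.h>

   #define P 4                             /* number of threads */

   void *worker(void *arg) {
       long id = (long) arg;               /* this thread's own id */
       printf("hello from thread %ld\n", id);
       return NULL;
   }

   int main(void) {
       pthread_t t[P];
       for (long i = 0; i < P; i++)        /* create ("fork") */
           pthread_create(&t[i], NULL, worker, (void *) i);
       for (int i = 0; i < P; i++)         /* wait ("join") */
           pthread_join(t[i], NULL);
       printf("all threads done\n");
       return 0;
   }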