Lecture 10

Review

   questions on Pthreads?  on Homework 2?

   bag of tasks -- the primes problem
      get task -- want to avoid contention for the bag
      results -- need sorted list; be careful about contention
                 there is also a subtle synchronization problem
                 (need to be sure there are no "holes" in the list of known primes)

      advice:  use arrays rather than lists
               think about load balancing; don't necessarily need just one bag
               make it work, then make it faster:  correctness then speed


Barrier Synchronization (Section 3.4)

   a BARRIER is a point that all processes must reach before any proceed
   very common in iterative parallelism

   examples:

      (1) "co inside while" style of parallelism
             while () {
               co ... oc   # "oc" is essentially a barrier, but an expensive one
             }

      (2)  initialize
           ...         # barrier often needed to ensure initialization done
           compute

      (3)  data parallel algorithms -- Section 3.6; we'll look at this next time

      (4) scientific computing -- Chapter 11; we'll look at this in Lecture 12
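
   Example (2) above can be sketched with the POSIX library barrier.
   This is a minimal sketch, not code from the course: N, data, sum,
   and run_init_then_compute are names assumed for illustration.  Each
   worker initializes one slot, waits at the barrier, and only then
   reads the slots written by the others.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define N 4                            /* number of workers (assumed value) */

static pthread_barrier_t bar;          /* POSIX library barrier */
static int data[N];                    /* each worker initializes one slot */
static int sum[N];                     /* each worker's computed total */

static void *worker(void *arg) {
    int id = (int)(intptr_t)arg;
    data[id] = id + 1;                 /* initialize my part */
    pthread_barrier_wait(&bar);        /* nobody computes until all slots are set */
    for (int i = 0; i < N; i++)        /* compute: read everyone's part */
        sum[id] += data[i];
    return NULL;
}

/* Hypothetical driver: returns 1 if every worker computed 1 + 2 + ... + N. */
int run_init_then_compute(void) {
    pthread_t t[N];
    int rc = pthread_barrier_init(&bar, NULL, N);
    assert(rc == 0);
    for (int i = 0; i < N; i++) {
        sum[i] = 0;
        rc = pthread_create(&t[i], NULL, worker, (void *)(intptr_t)i);
        assert(rc == 0);
    }
    for (int i = 0; i < N; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&bar);
    (void)rc;
    int ok = 1;
    for (int i = 0; i < N; i++)
        if (sum[i] != N * (N + 1) / 2) ok = 0;
    return ok;
}
```

   Without the barrier, a fast worker could read a slot that a slow
   worker has not yet written.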


Counter Barrier -- for n processes (Section 3.4.1)

   int count = 0;

   Barrier:  << count++ >>             # record arrival
             << await (count==n); >>   # wait for everyone to arrive

   implementation:  increment -- use FA or critical section
                    delay loop -- use spin loop
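
   The single-use barrier above might look as follows in C.  This is a
   sketch under assumptions: C11 atomic_fetch_add stands in for FA, and
   N, seen, and run_counter_barrier are hypothetical names.  The
   sched_yield call is an addition so the spin loop behaves when there
   are more threads than cores.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define N 4                            /* number of workers (assumed value) */

static atomic_int count;               /* shared arrival counter, starts at 0 */
static int seen[N];                    /* count as observed just after the barrier */

/* Single-use counter barrier. */
static void counter_barrier(void) {
    atomic_fetch_add(&count, 1);       /* << count++ >> via fetch-and-add */
    while (atomic_load(&count) != N)   /* << await (count==n); >> as a spin loop */
        sched_yield();                 /* let other threads run while we wait */
}

static void *worker(void *arg) {
    int id = (int)(intptr_t)arg;
    counter_barrier();
    seen[id] = atomic_load(&count);    /* must be N once we are past the barrier */
    return NULL;
}

/* Hypothetical driver: returns 1 if every worker left with count == N. */
int run_counter_barrier(void) {
    pthread_t t[N];
    atomic_store(&count, 0);
    for (int i = 0; i < N; i++) {
        int rc = pthread_create(&t[i], NULL, worker, (void *)(intptr_t)i);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N; i++)
        pthread_join(t[i], NULL);
    int ok = 1;
    for (int i = 0; i < N; i++)
        if (seen[i] != N) ok = 0;
    return ok;
}
```

   Note that count is never reset, so calling counter_barrier a second
   time from the same workers would not synchronize anything.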

   problems:
      (1) contention -- single shared counter
      (2) cannot be reused, but barriers are usually used inside loops
            why is this a problem?  how do we reset count?

   solving the reuse problem

      try counting up then counting down (called reverse sense)

         odd barriers:   << count++ >>
                         << await(count==n) >>

         even barriers:  << count-- >>
                         << await(count==0) >>

         this still doesn't work.  why?

      use TWO counters AND reverse their senses

         up1, up2, down1, down2, repeat

         why does this work?  key is to have at least 3 stages
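
   One reading of the "up1, up2, down1, down2" scheme is sketched below
   in C.  It is an interpretation of these notes, not code from the
   text: c[0] and c[1] are the two counters, each thread tracks its own
   stage (0..3), and ROUNDS, arrived, and violations exist only to
   check the barrier.  A counter is not touched again until every
   thread has passed the stage that uses the other counter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define N 4                            /* number of workers (assumed value) */
#define ROUNDS 40                      /* barrier episodes to run (assumed value) */

static atomic_int c[2];                /* the two counters */
static atomic_int arrived[ROUNDS];     /* test only: arrivals per round */
static atomic_int violations;          /* test only: early departures seen */

/* Reusable barrier; *stage cycles 0,1,2,3 = up1, up2, down1, down2. */
static void barrier(int *stage) {
    atomic_int *ctr = &c[*stage % 2];  /* stages 0,2 use c[0]; 1,3 use c[1] */
    int up = (*stage < 2);
    if (up) atomic_fetch_add(ctr, 1);  /* << count++ >> */
    else    atomic_fetch_sub(ctr, 1);  /* << count-- >> */
    int target = up ? N : 0;
    while (atomic_load(ctr) != target) /* << await(count==n) >> or (count==0) */
        sched_yield();
    *stage = (*stage + 1) % 4;
}

static void *worker(void *arg) {
    (void)arg;
    int stage = 0;
    for (int r = 0; r < ROUNDS; r++) {
        atomic_fetch_add(&arrived[r], 1);
        barrier(&stage);
        if (atomic_load(&arrived[r]) != N)   /* someone had not arrived yet */
            atomic_fetch_add(&violations, 1);
    }
    return NULL;
}

/* Hypothetical driver: returns 1 if no thread ever left a barrier early. */
int run_two_counter_barrier(void) {
    pthread_t t[N];
    atomic_store(&c[0], 0);
    atomic_store(&c[1], 0);
    atomic_store(&violations, 0);
    for (int r = 0; r < ROUNDS; r++)
        atomic_store(&arrived[r], 0);
    for (int i = 0; i < N; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&violations) == 0;
}
```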


Coordinator Barrier -- using flags (Section 3.4.2)

   idea:  distribute the single counter above  (a time/space tradeoff)

   diagram of interaction

                           Coordinator
                   
                Worker1                   WorkerN

          arrows from Workers to Coordinator and from Coordinator to Workers
          each arrow represents a signal from one process to another
          represent each signal by a flag variable

   shared variables:  int arrive[1:n] = ([n] 0);
                          go[1:n] = ([n] 0);

   the basic signaling scheme is then implemented as follows:

      Worker[i]:  arrive[i] = 1;           # announce arrival
                  << await(go[i]==1); >>   # wait for permission to go
                  ...                      # leave space for later (see below)


      Coordinator:
         # wait for all workers to arrive
         for [i = 1 to n] {
            << await(arrive[i]==1); >>  
            ...                         # leave space for later (see below)
         }
         # tell all workers they can go on
         for [i = 1 to n]
            go[i] = 1;

   what about the reset problem?
      solve by clearing flags at the ... points above
      be sure to follow the Flag Synchronization Principles (3.14).  why?
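
   A reusable coordinator barrier with the flags cleared at the "..."
   points can be sketched in C as follows.  This is a sketch under
   assumptions: indices run 0..N-1 rather than 1..n, C11 atomics stand
   in for the shared flags, and ROUNDS, arrived, and violations are
   hypothetical names used only for checking.  Each flag is cleared by
   the process that waited on it, and is not set again until it is
   known to be clear.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define N 4                            /* number of workers (assumed value) */
#define ROUNDS 40                      /* barrier episodes to run (assumed value) */

static atomic_int arrive[N], go[N];    /* the 2n flags */
static atomic_int arrived[ROUNDS];     /* test only: arrivals per round */
static atomic_int violations;          /* test only: early departures seen */

static void *worker(void *arg) {
    int i = (int)(intptr_t)arg;
    for (int r = 0; r < ROUNDS; r++) {
        atomic_fetch_add(&arrived[r], 1);
        atomic_store(&arrive[i], 1);             /* announce arrival */
        while (atomic_load(&go[i]) != 1)         /* wait for permission to go */
            sched_yield();
        atomic_store(&go[i], 0);                 /* waiter clears its own flag */
        if (atomic_load(&arrived[r]) != N)
            atomic_fetch_add(&violations, 1);
    }
    return NULL;
}

static void *coordinator(void *arg) {
    (void)arg;
    for (int r = 0; r < ROUNDS; r++) {
        for (int i = 0; i < N; i++) {            /* wait for all workers */
            while (atomic_load(&arrive[i]) != 1)
                sched_yield();
            atomic_store(&arrive[i], 0);         /* waiter clears the flag */
        }
        for (int i = 0; i < N; i++)              /* tell all workers to go on */
            atomic_store(&go[i], 1);
    }
    return NULL;
}

/* Hypothetical driver: returns 1 if no worker ever left a barrier early. */
int run_coordinator_barrier(void) {
    pthread_t w[N], coord;
    atomic_store(&violations, 0);
    for (int r = 0; r < ROUNDS; r++) atomic_store(&arrived[r], 0);
    for (int i = 0; i < N; i++) { atomic_store(&arrive[i], 0); atomic_store(&go[i], 0); }
    int rc = pthread_create(&coord, NULL, coordinator, NULL);
    assert(rc == 0);
    for (int i = 0; i < N; i++) {
        rc = pthread_create(&w[i], NULL, worker, (void *)(intptr_t)i);
        assert(rc == 0);
    }
    (void)rc;
    for (int i = 0; i < N; i++) pthread_join(w[i], NULL);
    pthread_join(coord, NULL);
    return atomic_load(&violations) == 0;
}
```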

   why 2n flags?  would n+1 be enough?  [no]
      can we make do with n flags?  [yes; reverse sense; avoids reset]

   what about contention?  [not a problem; separate flags; spin on cached values]

   what about total time in best case (all workers arrive at once)?
      time is O(n) because of the loops in the coordinator
      we also need a separate coordinator process (although one of
         the workers could serve as the coordinator)


Combining Tree Barriers (end of Section 3.4.2)

   briefly sketched the idea and the structure (see Figure 3.13)

   signaling is more complex than for a coordinator, but time is O(log n)

   [note:  a tree is often used in distributed programs, especially if
    the communications network lets messages be sent in parallel along
    different paths.]
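
   A combining tree barrier can be sketched in C as below.  The layout
   is an assumption made for illustration, not a transcription of
   Figure 3.13: workers 0..N-1 form an implicit binary tree (children
   of i are 2i+1 and 2i+2, worker 0 is the root).  Arrival signals flow
   up the tree and go signals flow back down, so the critical path is
   proportional to the tree height.  ROUNDS, arrived, and violations
   are hypothetical names used only for checking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define N 7                            /* number of workers (assumed value) */
#define ROUNDS 40                      /* barrier episodes to run (assumed value) */

static atomic_int arrive[N], go[N];    /* arrive[i]: i -> parent; go[i]: parent -> i */
static atomic_int arrived[ROUNDS];     /* test only: arrivals per round */
static atomic_int violations;          /* test only: early departures seen */

static void wait_and_clear(atomic_int *flag) {
    while (atomic_load(flag) != 1)     /* spin until the flag is set */
        sched_yield();
    atomic_store(flag, 0);             /* the waiter clears the flag */
}

static void tree_barrier(int i) {
    int kid[2] = { 2 * i + 1, 2 * i + 2 };
    for (int k = 0; k < 2; k++)        /* wait for my subtree to arrive */
        if (kid[k] < N) wait_and_clear(&arrive[kid[k]]);
    if (i != 0) {                      /* non-root: signal parent, await go */
        atomic_store(&arrive[i], 1);
        wait_and_clear(&go[i]);
    }
    for (int k = 0; k < 2; k++)        /* pass the go signal down the tree */
        if (kid[k] < N) atomic_store(&go[kid[k]], 1);
}

static void *worker(void *arg) {
    int i = (int)(intptr_t)arg;
    for (int r = 0; r < ROUNDS; r++) {
        atomic_fetch_add(&arrived[r], 1);
        tree_barrier(i);
        if (atomic_load(&arrived[r]) != N)
            atomic_fetch_add(&violations, 1);
    }
    return NULL;
}

/* Hypothetical driver: returns 1 if no worker ever left a barrier early. */
int run_tree_barrier(void) {
    pthread_t t[N];
    atomic_store(&violations, 0);
    for (int r = 0; r < ROUNDS; r++) atomic_store(&arrived[r], 0);
    for (int i = 0; i < N; i++) { atomic_store(&arrive[i], 0); atomic_store(&go[i], 0); }
    for (int i = 0; i < N; i++) {
        int rc = pthread_create(&t[i], NULL, worker, (void *)(intptr_t)i);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N; i++) pthread_join(t[i], NULL);
    return atomic_load(&violations) == 0;
}
```

   Unlike the coordinator version, no extra process is needed: the
   root worker plays the coordinator's role for its two children only.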