Lecture 11

Review of Barriers

   all processes must arrive before any leave
   applications to come
   flags:  one per edge in the "signaling graph"
   kinds of barriers so far:
      counter -- symmetric, but reset problem and O(n)
      coordinator -- simple, but asymmetric and O(n)
      tree -- O(log n), but asymmetric and harder to program
   today:  efficient, symmetric barriers


Symmetric Barriers (Sec 3.4.3)

   basic building block for two processes:

       Worker1  <-->  Worker2   # signal each other

   shared vars:  int ar[n] = ([n] 0);
                     go[n] = ([n] 0);      

   Worker[i]:   ...
                ar[i] = 1;
                << await(ar[j]==1); >>
                ...

   Worker[j]:   ...
                ar[j] = 1;
                << await(ar[i]==1); >>
                ...

   what about reset?
   remember the flag synchronization principles:  waiter clears; don't set until clear

   add code for each ... above:  await at the front to make sure "my" flag
   is clear before setting it; at the end, clear the partner's flag that I
   waited on.  [See (3.15) on p. 121 of the text.]
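
   The reset protocol above can be sketched with two Python threads. This is
   only an illustration, not the text's code: the names run_two_worker_barrier,
   spin_until, progress, and violations are made up here, and the busy-wait
   loop with time.sleep(0) stands in for << await(...); >>.

```python
import threading
import time

def run_two_worker_barrier(rounds=50):
    """Run two workers through `rounds` barrier episodes; return ordering violations."""
    arrive = [0, 0]      # arrive[i] == 1 means worker i has reached the barrier
    progress = [0, 0]    # last round each worker has started (for checking only)
    violations = []

    def spin_until(cond):
        # Busy-wait stand-in for an await statement; sleep(0) yields the CPU.
        while not cond():
            time.sleep(0)

    def worker(i):
        j = 1 - i
        for r in range(1, rounds + 1):
            progress[i] = r                      # "work" for round r
            spin_until(lambda: arrive[i] == 0)   # don't set until my flag is clear
            arrive[i] = 1                        # signal my arrival
            spin_until(lambda: arrive[j] == 1)   # wait for my partner
            arrive[j] = 0                        # waiter clears the flag it waited on
            if progress[j] < r:                  # partner must have reached round r
                violations.append((i, r, progress[j]))

    threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return violations
```

   An empty result means neither worker ever left the barrier before its
   partner had arrived in the same round.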

   performance in the best case (all arrive at the same time):
      both workers move through TOGETHER, setting, checking, and clearing flags


Butterfly Barrier -- log2(n) stages of two-process barriers

   idea is to replicate work:  each worker "barriers" with log n others

   interaction diagram -- see Figure 3.15

   reuse:  use multiple flags (arrays) or better yet, use stage counters
           as shown on p. 123
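
   One way to compute who pairs with whom at each stage is sketched below.
   It assumes 0-based worker numbers and n a power of 2; under that
   assumption the partner at stage s is i XOR 2**s. The function name
   butterfly_partners is hypothetical.

```python
def butterfly_partners(n):
    """Return one list per stage; entry i of stage s is worker i's partner."""
    if n < 1 or (n & (n - 1)) != 0:
        raise ValueError("this sketch assumes n is a power of 2")
    stages = []
    distance = 1                      # 2**s for stage s
    while distance < n:
        # Flipping one bit of i gives the partner at this distance.
        stages.append([i ^ distance for i in range(n)])
        distance *= 2
    return stages
```

   For n = 8 this gives three stages with partners at distance 1, 2, and 4,
   and the pairing is symmetric: if j is i's partner, then i is j's partner.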

   performance:  processes move through together
                 but watch out for FALSE SHARING
      caches use blocks (lines) that often contain more than one word
      arrays of flags will get packed together
      a write into one flag invalidates the entire cache line in every
         OTHER cache holding it, even if those processes spin on different flags
      the solution is to use padding (blank space between flags)
         another time/space tradeoff
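
   The padding idea can be illustrated as a memory layout using ctypes. This
   shows the layout only (Python code itself gains nothing from it), and it
   assumes a 64-byte cache line and a 4-byte flag; CACHE_LINE_BYTES,
   PaddedFlag, and flags_per_line are names invented for this sketch.

```python
import ctypes

CACHE_LINE_BYTES = 64  # assumed line size; real hardware varies

class PaddedFlag(ctypes.Structure):
    # One 4-byte flag followed by filler so consecutive flags are a full line apart.
    _fields_ = [
        ("flag", ctypes.c_int32),
        ("_pad", ctypes.c_char * (CACHE_LINE_BYTES - ctypes.sizeof(ctypes.c_int32))),
    ]

def flags_per_line(flag_type):
    """How many consecutive array elements of flag_type fit in one cache line."""
    return CACHE_LINE_BYTES // ctypes.sizeof(flag_type)
```

   With these assumptions, flags_per_line(ctypes.c_int32) is 16 (sixteen
   unpadded flags can share a line), while flags_per_line(PaddedFlag) is 1.
   The cost is space: an array of n padded flags takes 64*n bytes instead
   of 4*n.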


Dissemination Barrier

   a different way to connect the processes
   simpler to program and works for any value of n

   show connection diagram -- see Figure 3.16

   easiest to program if you set another process's flag and wait for
   your own flag

   again use incrementing stage counters to avoid reset problem
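
   A minimal sketch of this scheme with Python threads follows. It assumes
   0-based workers, ceil(log2 n) stages, and that in stage s worker i
   signals worker (i + 2**s) mod n. Each flag is a counter that is only
   ever incremented, so nothing needs resetting between uses. All names
   (run_dissemination_barrier, flags, progress, violations) are
   hypothetical, and the busy-wait loop stands in for an await statement.

```python
import threading
import time

def run_dissemination_barrier(n=5, rounds=10):
    """Run n workers through `rounds` barrier episodes; return ordering violations."""
    num_stages = 0
    while (1 << num_stages) < n:
        num_stages += 1                      # ceil(log2 n) stages
    # flags[s][i] counts how many times worker i has been signaled at stage s.
    flags = [[0] * n for _ in range(num_stages)]
    progress = [0] * n                       # last round started (for checking only)
    violations = []

    def barrier(i, episode):
        for s in range(num_stages):
            target = (i + (1 << s)) % n
            flags[s][target] += 1            # set ANOTHER process's flag (one writer each)
            while flags[s][i] < episode:     # wait for my OWN flag
                time.sleep(0)

    def worker(i):
        for r in range(1, rounds + 1):
            progress[i] = r                  # "work" for round r
            barrier(i, r)
            if min(progress) < r:            # everyone must have reached round r
                violations.append((i, r))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return violations
```

   The wait uses < rather than != because a fast neighbor may already have
   signaled for the next episode; counters that only grow make that harmless.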


Parallel Computing (again!)

   task parallelism -- processes run on own; execute asynchronously

   data parallelism -- processes do same thing (on different parts of data)
                       execute synchronously, in lock step

      languages (synchronous semantics):  HPF, ZPL, NESL (see Chapter 12)
      machines (synchronous execution):  Illiac, CM, MasPar
         [no commercial offerings today; special purpose]


Data Parallel Algorithms (Section 3.5)

   I gave a pretty brief (and rushed) introduction to this

   covered the parallel prefix program in Section 3.5.1

   first developed the synchronous algorithm on page 131,
   then presented the asynchronous equivalent (using barriers) in Figure 3.17
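
   The barrier-based version can be sketched in Python as below, in the
   spirit of the asynchronous program but not a copy of Figure 3.17. It
   uses the library's threading.Barrier rather than a hand-built barrier,
   one thread per element, and the hypothetical names parallel_prefix_sums,
   total, and old.

```python
import threading

def parallel_prefix_sums(a):
    """Return the list whose entry i is a[0] + ... + a[i], using one thread per element."""
    n = len(a)
    if n == 0:
        return []
    total = list(a)                  # total[i] ends up as the sum of a[0..i]
    old = [0] * n                    # snapshot of total from the previous step
    barrier = threading.Barrier(n)

    def worker(i):
        d = 1
        while d < n:
            old[i] = total[i]        # save my current value
            barrier.wait()           # all snapshots are written
            if i - d >= 0:
                total[i] = old[i - d] + total[i]
            barrier.wait()           # all updates are done before the next snapshot
            d *= 2                   # distance doubles each step

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

   For example, parallel_prefix_sums([1, 2, 3, 4, 5, 6]) returns
   [1, 3, 6, 10, 15, 21] after three doubling steps (d = 1, 2, 4). The two
   barriers per step replace the lock-step execution that the synchronous
   algorithm gets for free.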

   briefly mentioned list algorithms in Section 3.5.2;
   we'll see larger examples from Chapter 11 next time