Pad: yes. The other options do not specifically deal with the cache-line sharing problem (false sharing), since they control access to a single variable; however, they do overcome it.
When p0 writes to memory that is part of a cache line, the whole line has to be sent out to the shared cache. The other cores cannot use their cached copy of that line until the updated line is in the shared cache and they have read it back in, even if they only touch a different variable on the same line.
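A minimal sketch of the padding idea, assuming a 64-byte cache line (common, but check your hardware) and a simple array sum as the workload. The names `PaddedSum`, `kCacheLine` and `padded_sum` are made up for illustration. Each thread's partial sum sits on its own cache line, so a write by one thread does not touch the line another thread is using.

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Assumed cache line size; 64 bytes is typical but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// alignas pads the struct to a full line, so consecutive elements in
// an array are kCacheLine bytes apart.
struct alignas(kCacheLine) PaddedSum {
    double value = 0.0;
};

double padded_sum(const std::vector<double>& data) {
    int max_threads = 1;
#ifdef _OPENMP
    max_threads = omp_get_max_threads();
#endif
    // One padded slot per thread.
    std::vector<PaddedSum> partial(max_threads);
    const long n = static_cast<long>(data.size());

    #pragma omp parallel
    {
        int id = 0;  // stays 0 when compiled without OpenMP
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        #pragma omp for
        for (long i = 0; i < n; ++i) {
            partial[id].value += data[i];  // each thread writes only its own slot
        }
    }

    // Combine the per-thread results serially.
    double total = 0.0;
    for (const PaddedSum& p : partial) {
        total += p.value;
    }
    return total;
}
```

Build with OpenMP enabled (for example `-fopenmp` on GCC or Clang); without it the pragmas are ignored and the function runs on one thread.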
Sync solves the problem because the programmer explicitly tells the compiler that access to this memory must be controlled. In the original the compiler is not told, so it can do nothing about what happens, and what the hardware then has to do to keep the caches consistent ends up killing the performance.
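A rough sketch of the sync approach, again using an array sum as a stand-in workload (`critical_sum` is a hypothetical name). Each thread accumulates into a private local variable and only touches the shared variable once, inside a critical section.

```cpp
#include <vector>

double critical_sum(const std::vector<double>& data) {
    double total = 0.0;  // shared between threads
    const long n = static_cast<long>(data.size());

    #pragma omp parallel
    {
        double local = 0.0;  // private to each thread, nothing to share

        #pragma omp for
        for (long i = 0; i < n; ++i) {
            local += data[i];
        }

        // Only one thread at a time may update the shared total.
        #pragma omp critical
        total += local;
    }
    return total;
}
```

The critical section runs once per thread rather than once per iteration, so the cost of the synchronization stays small.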
A reduction will typically be implemented much like the sync version: each thread works on its own private copy and the copies are combined at the end. It is just a convenient feature that OpenMP gives the programmer for writing an operation that is very common.
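For comparison, a sketch of the same kind of sum written with the `reduction` clause (`reduction_sum` is a hypothetical name). OpenMP gives each thread a private copy of `total` and combines the copies when the loop finishes.

```cpp
#include <vector>

double reduction_sum(const std::vector<double>& data) {
    double total = 0.0;
    const long n = static_cast<long>(data.size());

    // Each thread gets its own private total; OpenMP adds them
    // together at the end of the loop.
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Note that the order in which the partial sums are combined is not fixed, so floating-point results can differ slightly from a serial run.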
You can do all of this in C++; what is available in C is available in C++.
have a look at the following:
https://www.youtube.com/watch?v=h58X-PaEGng