K 10
svn:author
V 6
marius
K 8
svn:date
V 27
2012-01-28T23:24:03.018638Z
K 7
svn:log
V 722
MFC: r225889, r228222

In total store which we use for running the kernel and all of the userland
atomic operations behave as if they were followed by a CPU memory barrier
so there's no need to include ones in the acquire variants of atomic(9) and
it's sufficient to just use include compiler memory barriers to satisfy
the requirements of atomic(9). Removing the CPU memory barriers results in
a small performance improvement, specifically this is sufficient to
compensate the performance loss seen in the worldstone benchmark seen when
using SCHED_ULE instead of SCHED_4BSD.
This change is inspired by Linux even more radically doing the equivalent
thing some time ago.
Thanks go to Peter Jeremy for additional testing.

END