10:30 Asymmetries in Multi-Core Systems – T Gross (ETHZ)

Future exascale systems will be based on multi-core processors, but even
today’s multi-core processors can be asymmetric and exhibit limitations
and bottlenecks that differ from those found on a symmetric
multiprocessor.  We investigate the performance of a cluster node based
on the Intel Xeon E5520 quad-core processor and note that, despite the
symmetry implied by the programming model, the available memory
bandwidth is not shared equally among all threads.  Consequently,
applications experience substantial performance variance and slow-downs
when tasks (threads) are mapped to cores in a naive manner.  An
operating system scheduler could mitigate these effects by taking the
structure of the memory system into account, but it needs accurate
information from the performance monitoring unit because the asymmetry
is not directly exposed in the processor’s instruction set manual.
Current performance monitoring units are quite inflexible and change
from one processor to the next, so higher levels of the software tool
chain are discouraged from using them.  The development of portable
performance monitoring units will be crucial if applications are to
exploit the performance potential of exascale systems.