Technology Overview

Performance matters

Today's complex computer architectures and their deep memory hierarchies are a poor match for most applications. Because of the wide gap between processor and memory speed, processors often spend more than half of their time waiting for data to arrive. This problem is expected to become worse with the introduction of multicore processors, for two reasons: decreased cache area per thread, and more concurrent threads contending for bandwidth. These effects are generally expected to remain the major bottlenecks for many years to come. By removing these bottlenecks you could improve the value of an application in many different ways.

Until now, however, performance analysis tools have forced developers to wade through a mass of data before they have any idea where the performance problems are. Even then, identifying the specific nature of a problem requires as much magic as engineering skill. Worst of all, developers may then spend a vast amount of time trying to identify and fix problems with no guarantee at the outset that the application will be significantly faster when they have finished.

Enter: Acumem SlowSpotter

Acumem SlowSpotter is the first of a new generation of performance analysis tools from Acumem. Where other performance tools dump a haystack of data in your front yard, Acumem SlowSpotter lifts the haystack to point out the needles, classifies them, and helps the programmer by explaining ways to remove the problems. Acumem SlowSpotter makes performance experts more productive, and it teaches programmers which programming techniques work well with the hardware. It offers a solid, detailed understanding of cache performance problems to allow quick resolution, while still enabling an architect to work globally and single out those modules that make others suffer in a representative execution environment.

What is in your cache?

It is well known that memory accesses take a disproportionately long time compared to arithmetic operations. In the time it takes to access memory, the CPU could easily finish many hundreds of other instructions. Caching techniques can often hide this latency by storing recently used data in much smaller, yet much faster, cache memories. However, this is only effective if the right data happens to be in the cache at the right time. The two largest enemies of good cache utilization and low bandwidth consumption are wasted space and lack of data reuse, also known as poor locality. Wasted space means that data the application does not need occupies precious cache space.
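
To make wasted space concrete, here is a minimal Python sketch that counts how many cache lines a loop pulls in and what fraction of the fetched bytes it actually reads. It is a toy model, not a description of how Acumem SlowSpotter works. The 64-byte line size, the 64-byte record, the 4-byte field and the helper names `lines_touched` and `utilization` are all assumptions chosen for illustration.

```python
LINE_SIZE = 64  # assumed cache line size in bytes


def lines_touched(addresses, width, line_size=LINE_SIZE):
    """Return the set of cache-line indices covered by accesses of `width` bytes."""
    lines = set()
    for addr in addresses:
        first = addr // line_size
        last = (addr + width - 1) // line_size
        lines.update(range(first, last + 1))
    return lines


def utilization(addresses, width, line_size=LINE_SIZE):
    """Fraction of fetched bytes that the program actually reads.

    Assumes the accesses do not overlap each other.
    """
    useful = len(addresses) * width
    fetched = len(lines_touched(addresses, width, line_size)) * line_size
    return useful / fetched


N = 1000      # number of records (hypothetical)
FIELD = 4     # bytes the loop really needs from each record
RECORD = 64   # hypothetical record size, mostly fields the loop never reads

# Layout A: array of 64-byte records; the loop reads one 4-byte field per record.
sparse = [i * RECORD for i in range(N)]

# Layout B: the same field stored contiguously in its own array.
packed = [i * FIELD for i in range(N)]

print(len(lines_touched(sparse, FIELD)), utilization(sparse, FIELD))
print(len(lines_touched(packed, FIELD)), utilization(packed, FIELD))
```

Under these assumptions, layout A fetches one full cache line for every 4 useful bytes, while layout B fetches about one line for every 16 fields. The unused fields in layout A are exactly the kind of data that occupies cache space and bandwidth without benefiting the application.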
Lack of locality implies that data is not reused enough while residing in the cache. While these two concepts are simple to understand at this level, their impact on an application is often less obvious. Assessing it comes down to reasoning about memory layout, access patterns and data sizes in relation to hardware properties, which is far removed from the abstractions programmers normally work with.
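
The interplay of memory layout, access pattern and data size can be sketched with a toy cache simulator. The Python example below is an illustration under stated assumptions, not a model of any real processor or of Acumem SlowSpotter: it assumes a fully associative LRU cache with 64-byte lines, and the matrix dimensions and the names `count_misses` and `address` are hypothetical.

```python
from collections import OrderedDict


def count_misses(addresses, cache_lines=64, line_size=64):
    """Count misses in a toy fully associative LRU cache (assumed model)."""
    cache = OrderedDict()  # keys are cache-line indices, ordered oldest to newest
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


# Hypothetical 256 x 256 matrix of 8-byte elements, stored row by row.
ROWS, COLS, ELEM = 256, 256, 8


def address(r, c):
    """Byte offset of element (r, c) in row-major storage."""
    return (r * COLS + c) * ELEM


# Same data, same amount of work, two different access patterns.
row_major = [address(r, c) for r in range(ROWS) for c in range(COLS)]
col_major = [address(r, c) for c in range(COLS) for r in range(ROWS)]

print(count_misses(row_major))                   # 64-line cache, row by row
print(count_misses(col_major))                   # 64-line cache, column by column
print(count_misses(col_major, cache_lines=256))  # larger cache, column by column
```

In this model the row-by-row traversal misses once per cache line (8192 misses for 65536 accesses), because the eight elements sharing a line are used back to back. The column-by-column traversal misses on every access with a 64-line cache, since each line is evicted before its neighbouring elements are needed. Raising the assumed cache to 256 lines brings the column traversal back to 8192 misses, which shows how the same access pattern can be harmless or costly depending on data size relative to the hardware.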





