The framework starts with analysis to identify the vectorisation and threading opportunities, and then looks at optimisation by making sure that the code uses sufficient precision, employs typed constants, and uses optimal configuration settings. Vectorisation is introduced where possible to make the most of the single instruction, multiple data (SIMD) features in the processor architecture. Profiling is carried out to check for thread synchronisation issues or inefficient memory access. Finally, the application is scaled from multicore to manycore using distributed memory rank parallelism.
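As a rough illustration of the optimisation and vectorisation steps, the sketch below shows a typed constant and two loops that a compiler can map onto SIMD instructions. The function names `saxpy` and `halve` are hypothetical and do not come from the article or from WRF. The `#pragma omp simd` hint is standard OpenMP and is simply ignored if OpenMP support is not enabled at compile time.

```c
#include <stddef.h>

/* Hypothetical helper: y[i] = a * x[i] + y[i] in single precision.
   'restrict' tells the compiler that x and y do not overlap, which
   removes one common obstacle to vectorisation. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* Hint that the iterations are independent and may run in SIMD lanes. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical helper: halve every element in place.
   The typed constant 0.5f keeps the arithmetic in single precision;
   writing 0.5 (a double) would convert each element to double and
   back, which costs time and halves the number of values per SIMD
   register. */
void halve(size_t n, float *restrict v)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        v[i] = 0.5f * v[i];
    }
}
```

Whether single precision is sufficient depends on the application, so this choice is an assumption that has to be checked against the accuracy the code actually needs.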
For profiling, Intel VTune Amplifier provides insight into CPU performance, threading performance, bandwidth utilisation, and caching effectiveness. Intel Advisor XE helps to identify memory access patterns that can affect vectorisation, as well as loops that have dependencies that prevent vectorisation. If you use both vectorisation and threading, you could speed your application up by as much as 175 times.
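To make the idea of a dependency that prevents vectorisation concrete, here is a minimal sketch contrasting two loops. Both functions are hypothetical examples and are not taken from the article. The first loop carries a dependency from one iteration to the next, while the second has independent iterations and unit-stride memory access.

```c
#include <stddef.h>

/* Loop-carried dependency: iteration i reads out[i - 1], which was
   written by iteration i - 1. The iterations cannot simply be executed
   side by side in SIMD lanes, so a compiler will typically leave this
   loop scalar unless the algorithm is restructured. */
void running_sum(size_t n, const float *in, float *out)
{
    if (n == 0) {
        return;
    }
    out[0] = in[0];
    for (size_t i = 1; i < n; ++i) {
        out[i] = out[i - 1] + in[i];
    }
}

/* Independent iterations: each c[i] depends only on a[i] and b[i], and
   all three arrays are walked with unit stride. 'restrict' states that
   the arrays do not overlap, so the loop is a good vectorisation
   candidate. */
void add_arrays(size_t n, const float *restrict a,
                const float *restrict b, float *restrict c)
{
    for (size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```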
The worked example for the Weather Research and Forecasting (WRF) model also uses the OpenMP API. OpenMP uses a master thread that forks a series of slave threads, which can run independently on multiple processors. In the weather model, different threads work on different aspects of weather forecasting.
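The fork-join pattern described above can be sketched in a few lines of C. This is an illustrative example only, not code from WRF: the function `mean_temperature` and the idea of averaging over grid cells are assumptions made for the sketch. At the `parallel for` directive the initial thread forks a team of threads, each thread sums its share of the iterations, and the partial sums are combined when the threads join at the end of the loop.

```c
#include <stddef.h>

/* Hypothetical example: average a field over n grid cells.
   Compile with OpenMP enabled (for example -fopenmp with GCC) to run
   the loop on several threads; without it the pragma is ignored and
   the loop runs serially with the same result for this data. */
double mean_temperature(const double *cells, long n)
{
    double total = 0.0;

    /* Fork: the iterations are divided among the threads in the team.
       reduction(+:total) gives each thread a private partial sum and
       adds the partial sums together at the join, which avoids a data
       race on 'total'. */
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += cells[i];
    }

    return n > 0 ? total / (double)n : 0.0;
}
```

The reduction clause is the design choice worth noting: updating a shared accumulator from every thread without it would be exactly the kind of synchronisation issue that the profiling step is meant to catch.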
Whatever industry you’re in, reading about the five-step framework will give you tips you can use to optimise your code. The article is part of Intel’s Modern Code initiative, which provides tools and tutorials to help you optimise the performance of your data centre applications.
For further information visit: https://software.intel.com/en-us/modern-code?06062016