Clay Breshears has written a great introductory article about the three levels of parallel programming: multithreading, distributed parallelism and vectorisation.
Multithreading, as used by CADEX, for example, is a good starting point. It is ideal where your code has independent computations that can run simultaneously and where your data fits in the memory of a single node. Distributed parallelism is a good fit when your data sets are too large to fit on a single machine and the computations can be split across machines, each working on a subset of the data. Vectorisation helps with compute-intensive workloads by applying the same operation to several pieces of data at the same time. You can also combine vectorisation with multithreading.
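As a rough illustration of the multithreading case, the sketch below splits a data set into subsets and runs the same independent computation on each subset in its own thread. It is not taken from the article: the function names (`sum_of_squares`, `split`, `parallel_sum_of_squares`) and the default of four workers are assumptions made for the example. Note too that in standard CPython, threads running pure Python arithmetic do not execute in parallel, so this shows the structure of the approach rather than a guaranteed speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """Independent computation on one subset of the data (hypothetical workload)."""
    return sum(x * x for x in chunk)


def split(data, parts):
    """Split data into at most `parts` contiguous chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, workers=4):
    """Run sum_of_squares on each chunk in a thread pool, then combine the results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is processed independently; no thread touches another's data.
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)
```

The same split-compute-combine pattern carries over to distributed parallelism, where each subset would be sent to a different machine instead of a different thread.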
There’s a quick reference guide to building multithreaded applications here.
To find tools and training to help with optimising your software for modern processor architectures, visit the Intel® Modern Code programme.