I have long believed that heterogeneous architectures are the main route to exascale computing by 2020. I am not alone in this! Any researcher or developer working with emerging many-core processors such as GPUs and the Cell B.E. knows that heterogeneous computing can deliver large performance improvements. Once you acquire the knowledge and skills to build accelerated parallel applications, you can develop for these systems relatively quickly.

What I find interesting, and what is happening sooner than I expected, is the series of demos that AMD and Intel are giving of their fused CPU and GPU architectures.

It is also worth recalling David Turek’s remark that the Cell processor will reappear in different forms, or perhaps inside different architectures!

Right now you can build a heterogeneous system by plugging a card into a PCI Express slot, where the card carries either a GPU or a Cell processor along with its own on-card memory. Over the last two to three years, most researchers have raced to show how much faster GPUs are than CPUs, assuming that all the data has already been copied into the GPU’s on-card global memory. However, if large systems are built using these architectures, the PCI Express link will be a serious bottleneck. CPUs can move data to or from the system’s main memory four to five times faster than GPUs can. Unless this bottleneck is addressed, GPUs will lose their niche as strong candidates for accelerating parallel applications in both HPC and end-user media applications.
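To make the bottleneck argument concrete, here is a back-of-the-envelope model of what happens to a kernel's speedup once the copy across the link is counted. Every number in it is an assumption I picked for illustration (a 20x kernel speedup, 2 GB of traffic, a 5 GB/s link versus one five times faster), not a measurement, and the function name `offload_speedup` is mine.

```python
def offload_speedup(cpu_time_s, kernel_speedup, data_bytes, link_bytes_per_s):
    """End-to-end speedup of an offloaded kernel, counting data transfer time.

    Simple model: total time = accelerated compute time + time to move the data.
    It ignores latency and any overlap of transfer with compute.
    """
    compute_s = cpu_time_s / kernel_speedup
    transfer_s = data_bytes / link_bytes_per_s
    return cpu_time_s / (compute_s + transfer_s)


# Illustrative assumptions only, not measured values.
cpu_time_s = 1.0          # time for the kernel on the CPU alone
kernel_speedup = 20.0     # speedup once the data is already in GPU memory
data_bytes = 2e9          # total bytes copied in and out
slow_link = 5e9           # assumed bytes/s across the expansion-card link
fast_link = 5 * slow_link # assumed link that is five times faster

print(f"slow link: {offload_speedup(cpu_time_s, kernel_speedup, data_bytes, slow_link):.1f}x")
print(f"fast link: {offload_speedup(cpu_time_s, kernel_speedup, data_bytes, fast_link):.1f}x")
```

With these made-up numbers the 20x kernel ends up at roughly 2.2x over the slow link and roughly 7.7x over the faster one, which is the gap that benchmarks hide when they start the clock after the data is already on the card.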

That’s why AMD and Intel are now moving quickly to market new processors that integrate both CPU and GPU cores on one chip. The communication bottleneck is greatly reduced. I think it will also be interesting to watch as the ratio between the number of CPU and GPU cores on the chip changes. As the number of CPU cores increases, a single program can offload parts of its execution to some of the GPU cores. This can work much like the OpenMP programming model: you annotate the parts that should execute in parallel. The on-chip GPU cores can take on this role because they now share cache with the CPU cores and have a fast interconnect to them. This will also accelerate the integration of the coarse-grained and fine-grained parallel programming models.
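As a rough sketch of what that annotation style might feel like, here is a Python analogy. It is not OpenMP and not any vendor's API: the decorator `offload` and the function `scale` are hypothetical names, and an ordinary thread pool stands in for the on-chip GPU cores. The point is only the shape of the code, where the programmer marks a region and the runtime splits the work.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

# Hypothetical stand-in for a pool of on-chip GPU cores.
_accelerator = ThreadPoolExecutor(max_workers=4)


def offload(chunks):
    """Annotation: split the input list into chunks and run them in parallel."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(data):
            # Ceiling division so every element lands in some chunk.
            size = max(1, -(-len(data) // chunks))
            parts = [data[i:i + size] for i in range(0, len(data), size)]
            # map() returns results in submission order, so output order is kept.
            results = _accelerator.map(fn, parts)
            return [x for part in results for x in part]
        return wrapper
    return decorate


@offload(chunks=4)
def scale(values):
    # The annotated region: a simple element-wise operation.
    return [2.0 * v for v in values]


doubled = scale(list(range(10)))
```

The body of `scale` stays plain sequential code, and only the one-line annotation changes how it runs. In CPython the threads will not actually speed up this arithmetic, so treat it purely as an illustration of the programming model.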

Have a quick look at AMD’s Fusion and Intel’s Many Integrated Core projects. I think AMD’s project is more promising and more interesting for research, since it is a completely heterogeneous architecture. Intel, however, is competing with a many-core architecture based on its x86 design. Intel’s architecture is innovative in the way its cores communicate and are interconnected with each other.

A quick update: I came across this video demonstrating AMD’s new APU: http://www.ustream.tv/recorded/7382763