Compute Unified Device Architecture

The processing of non-graphics tasks on GPUs has spurred the development of new programming models. The GPU (Graphics Processing Unit) is evolving into the GPGPU (General-Purpose Graphics Processing Unit). This means that the “all at the same time” technique of data-parallel execution can now be applied to real-time applications other than graphics. Various interfaces for high-performance, data-parallel computation exist, among them NVIDIA’s CUDA [NVI07b], AMD’s CTM [PSG06], Brook [BFH*04], and Sh [MQP02], along with their spin-offs PeakStream and RapidMind. All of them expose the intrinsic parallelism of GPUs to the user and provide the means to perform general-purpose computations. Of these interfaces, NVIDIA’s CUDA stands out in several respects, notably scalability, efficiency, and design simplicity.

Evolution and Need for CUDA:

Before going into the technical details of CUDA, let’s first look at how it came about. The late 1980s and early 1990s are often called the “Golden Age of Parallel Processing”. That was the era in which parallel computing attracted enormous interest. Fine-grained data parallelism was introduced, and the supercomputers of the time were built around it. The Connection Machine, MasPar, and Cray systems are some of the machines associated with this fine-grained approach. However, they were all very expensive and far beyond what most users could afford. Then came the era of GPUs. GPUs are massively multithreaded many-core chips consisting of hundreds of scalar processors. Tens of thousands of concurrent threads run in parallel on a GPU, and peak performance can reach about 1 TFLOPS. This massive multithreading is a boon to users, but programming for it is quite complex. Making life easier for parallel programmers is therefore a challenging task, one that NVIDIA addressed by introducing CUDA in 2007.


