ATI Radeon

Many graphics cards are on the market today. In the past, the choice of card or GPU mattered little because applications were not demanding. Today's cards are far more advanced and can deliver a much better graphical experience: they have evolved from the limited-color 2D graphics of the past to the highly detailed 3D graphics of the present.

ATI produces a series of cards called Radeon, which followed an earlier series called Rage. The Canadian company established itself in the market by providing some of the best-known graphics cards. ATI GPUs can be found in gaming consoles such as the Wii and the Xbox 360, and also in laptops.

It has therefore become necessary to know how to choose a graphics card. The factors by which a graphics card should be judged or selected are:

  • Intended use: simple tasks such as web surfing and document writing, or more demanding ones such as 3D animation and gaming.
  • Price: does the price of the card fit the user's budget? A high price does not necessarily mean the card is good.
  • Core clock: the speed at which the graphics processor on the card operates.
  • Stream processors: the stream processors are responsible for rendering, so a card with more of them can do more work in parallel. Stream processors are also known as shader cores or thread processors.
  • FLOPS: the number of floating-point operations per second. There are single-precision (32-bit) operations and double-precision (64-bit) operations.
  • Memory:
  1. Memory latency: the delay before the processor can access the memory. The problem is that in many cases the processor is faster than the memory.
  2. Bus width: the number of bits that can be transferred to or from the memory at once.
  3. Memory clock: the clock speed at which the card's memory operates.
  4. Type: DDR, DDR2, GDDR3, GDDR4, or GDDR5.
  5. Memory bandwidth: the amount of data the card can move to or from memory per second. It is the width of the memory bus multiplied by the memory clock speed (and by the number of transfers per clock for double-data-rate memory).
  • Power Consumption.
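
The memory bandwidth factor can be worked out with a few lines of Python. This is only a sketch: the function name, the `transfers_per_clock` parameter, and the example values (a 128-bit bus at 1000 MHz) are illustrative assumptions, not figures for any particular card.

```python
def memory_bandwidth_gb_per_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_per_clock is an assumption: 2 models double-data-rate
    memory, where data moves on both edges of the clock.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Illustrative values only: a 128-bit bus with a 1000 MHz memory clock.
example = memory_bandwidth_gb_per_s(bus_width_bits=128, memory_clock_mhz=1000)
print(f"{example:.1f} GB/s")
```

A wider bus or a faster memory clock raises the result proportionally, which is why both figures matter when comparing cards.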

ATI Radeon’s Evergreen series:

The Canadian company introduced this series in 2009.

Products            Code name   Examples
HD 5400             Cedar       Radeon HD 5450
HD 5500, HD 5600    Redwood     Radeon HD 5670
HD 5700             Juniper     Radeon HD 5770
HD 5800             Cypress     Radeon HD 5870
HD 5900             Hemlock     Radeon HD 5970

The architectures of the cards are related to each other, as the next figure shows.

[Figure: The Evergreen series architectures]

Because the architectures share the same building blocks, a description of a component in one of them applies to the others as well. The Hemlock architecture is closely related to the Cypress: a Hemlock can be said to contain two Cypress GPUs.

The architecture that will be addressed in the rest of the document is the Cypress architecture.

[Figure: The architecture of the Cypress]

The Cypress consists of:

  • Command processor: issues commands and hands them to the graphics engine, which translates them into simpler forms.
  • Graphics engine: converts polygons and meshes into a simpler form of data, namely pixels.
  • Ultra-threaded dispatch processor: maximizes utilization and efficiency by dividing the workload among the processing engines. For example, if each engine can only handle 4×4 pixel blocks and the frame is 16×16 pixels, the frame is divided into (16×16)/(4×4) = 16 blocks that are distributed across the engines.
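
The block division in the dispatch example can be sketched in Python. The function name and the representation of a block by its top-left corner are assumptions made for illustration; the real dispatch processor is hardware and works very differently in detail.

```python
def split_into_blocks(frame_width, frame_height, block_size):
    """Return the top-left (x, y) corner of every block_size x block_size block."""
    return [
        (x, y)
        for y in range(0, frame_height, block_size)
        for x in range(0, frame_width, block_size)
    ]


# The 16x16 frame and 4x4 blocks from the example in the text.
blocks = split_into_blocks(16, 16, 4)
print(len(blocks), "blocks, first:", blocks[0], "last:", blocks[-1])
```

Each entry in `blocks` is one unit of work that could be handed to a processing engine.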

[Figure: SIMD engines]

  • SIMD engines:

SIMD stands for "single instruction, multiple data". The term applies here because many components process multiple data items using the same operation.

Each SIMD engine contains 16 stream units and 4 texturing units. Each stream unit consists of five 32-bit stream processors. The Cypress has 20 SIMD engines.

The total number of stream processors is therefore 16×5×20 = 1600, and the total number of texturing units is 4×20 = 80.

  • Caches:

Each SIMD engine has its own 8 KB L1 cache, so the total size of the L1 cache is 8×20 = 160 KB. Each 8 KB cache stores data unique to its SIMD engine.

The L1 cache bandwidth is 1 TB/sec, while the bandwidth between L1 and L2 is 435 GB/sec.

  • Stream Core:

Each stream core has five processors in total: four of them provide single-precision computing, while the fifth is used for special functions such as sin, cos, tan, and exponential.

The stream processors are 32-bit. Together they can perform up to 2.7 teraflops in single precision, but for 64-bit (double-precision) operations the rate drops to 544 gigaflops.
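
The peak figures can be reproduced from the unit counts given earlier. This sketch rests on three assumptions that the text does not state: a core clock of 850 MHz, two floating-point operations per stream processor per clock (one multiply-add), and a double-precision rate of one fifth of the single-precision rate, which is inferred from the ratio of the two quoted figures.

```python
SIMD_ENGINES = 20
STREAM_UNITS_PER_ENGINE = 16
PROCESSORS_PER_STREAM_UNIT = 5

CORE_CLOCK_HZ = 850e6               # assumption: not stated in the text
FLOPS_PER_PROCESSOR_PER_CLOCK = 2   # assumption: one multiply-add counts as 2 operations
DOUBLE_PRECISION_DIVISOR = 5        # assumption: inferred from 544 GFLOPS vs. 2.7 TFLOPS

stream_processors = (
    SIMD_ENGINES * STREAM_UNITS_PER_ENGINE * PROCESSORS_PER_STREAM_UNIT
)
single_precision_flops = (
    stream_processors * FLOPS_PER_PROCESSOR_PER_CLOCK * CORE_CLOCK_HZ
)
double_precision_flops = single_precision_flops / DOUBLE_PRECISION_DIVISOR

print(f"stream processors: {stream_processors}")
print(f"single precision:  {single_precision_flops / 1e12:.2f} TFLOPS")
print(f"double precision:  {double_precision_flops / 1e9:.0f} GFLOPS")
```

Under these assumptions the single-precision peak comes out at 2.72 TFLOPS, which the text rounds to 2.7.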

[Figure: The stream core]

  • Memory:
  1. Bus width: 256-bit
  2. Size and type: 1 GB GDDR5
  3. Memory clock speed: 2400 MHz
  4. Bandwidth: 153.6 GB/sec
  • Price: the HD 5870 costs $410.

So the cost of one teraflop is 410/2.7 ≈ $151.9 for single-precision operations and 410/0.544 ≈ $753.7 for double-precision operations. As expected, double-precision performance is much more expensive.
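
The same cost calculation as a short Python sketch, using the price and peak rates quoted in this document. The function and variable names are illustrative.

```python
PRICE_USD = 410
SINGLE_PRECISION_TFLOPS = 2.7
DOUBLE_PRECISION_TFLOPS = 0.544


def cost_per_tflops(price_usd, tflops):
    """Dollars paid for each teraflop of peak performance."""
    return price_usd / tflops


single_cost = cost_per_tflops(PRICE_USD, SINGLE_PRECISION_TFLOPS)
double_cost = cost_per_tflops(PRICE_USD, DOUBLE_PRECISION_TFLOPS)
print(f"single precision: ${single_cost:.1f} per TFLOPS")
print(f"double precision: ${double_cost:.1f} per TFLOPS")
print(f"double precision costs {double_cost / single_cost:.1f}x more per TFLOPS")
```

The last line makes the comparison explicit: the ratio equals the ratio between the two peak rates.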

  • Power Consumption:

ATI aimed to reduce power consumption. The Cypress consumes 27 Watts when idle and at most 188 Watts in the worst case. The maximum is reached only when the user pushes the card to its highest performance, for example by overclocking. It does not occur for users who use the card only for simple tasks.

The table below shows the power consumed to provide one teraflop. The worst case is at maximum power.

                 Single precision       Double precision
Maximum power    188/2.7 = 69.6 W       188/0.544 = 345.6 W
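
Power efficiency is often quoted the other way round, as performance per watt. A minimal sketch using the maximum power and peak rates from this document follows; the function name is an assumption, and the result is a theoretical peak figure rather than a measurement.

```python
MAX_POWER_WATTS = 188
SINGLE_PRECISION_GFLOPS = 2700
DOUBLE_PRECISION_GFLOPS = 544


def gflops_per_watt(gflops, watts):
    """Peak gigaflops delivered for each watt consumed."""
    return gflops / watts


single_efficiency = gflops_per_watt(SINGLE_PRECISION_GFLOPS, MAX_POWER_WATTS)
double_efficiency = gflops_per_watt(DOUBLE_PRECISION_GFLOPS, MAX_POWER_WATTS)
print(f"single precision: {single_efficiency:.2f} GFLOPS per watt")
print(f"double precision: {double_efficiency:.2f} GFLOPS per watt")
```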

CrossFireX technology:

CrossFireX enables the user to combine up to four ATI Radeon cards on the same motherboard. The Evergreen series cards support CrossFireX.

[Figure: CrossFireX]

The technology must also be supported by the motherboard. The next figure illustrates the possible combinations of cards and which motherboards support them.

[Figure: Compatibility chart for the combination of the cards]

The technology operates using one of the following modes:

  • Scissors: a single frame is split between two cards. If the cards are identical, each card renders half of the screen. If the two cards are different, the frame is divided according to the capabilities of each card: the faster card renders a larger portion than the slower one, so that both cards finish at the same time.

  • SuperTiling: the frame is divided into small tiles arranged like a chessboard, and the tiles alternate between the cards. For example, if a frame is split into a 4×4 grid of 16 tiles, one card renders eight of them (the "white" squares) and the other card renders the remaining eight (the "black" squares).
  • Alternate Frame Rendering: while one card is processing the present frame, another card is processing the next (future) frame.
  • Super AA: AA stands for anti-aliasing. This mode uses the cards together to apply anti-aliasing, which increases image quality.
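
The way the first three modes divide work between cards can be sketched in Python. All function names here are hypothetical, card speeds are given as relative weights, and the sketch only shows which card gets which piece of work; it is not how the driver actually implements these modes.

```python
def scissor_split(frame_height, card_speeds):
    """Scissors: give each card a share of rows proportional to its speed."""
    total = sum(card_speeds)
    rows = [int(frame_height * speed / total) for speed in card_speeds]
    rows[-1] += frame_height - sum(rows)  # rounding remainder goes to the last card
    return rows


def supertile_owner(tile_x, tile_y, num_cards=2):
    """SuperTiling: chessboard pattern, neighbouring tiles go to different cards."""
    return (tile_x + tile_y) % num_cards


def afr_owner(frame_index, num_cards=2):
    """Alternate Frame Rendering: consecutive frames go to different cards."""
    return frame_index % num_cards


# A card twice as fast as the other renders two thirds of a 1080-row frame.
print(scissor_split(1080, [2, 1]))

# Tile ownership for a 4x4 grid of tiles, one row of the grid per line.
for tile_y in range(4):
    print([supertile_owner(tile_x, tile_y) for tile_x in range(4)])

# Card assigned to each of the first four frames.
print([afr_owner(frame) for frame in range(4)])
```

Scissors and SuperTiling split the work inside a single frame, while Alternate Frame Rendering splits it across frames.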

CrossFireX is very similar to NVIDIA SLI. The cards produced by the two companies are also highly similar in components and architecture, which makes it easy to compare them and decide which card to purchase.

The series after Evergreen is called Northern Islands and is expected to reach the market at the end of 2010 or in 2011. The major difference is that Northern Islands is expected to use a 32 nm fabrication process, while Evergreen uses 40 nm.

Northern Islands will exceed Evergreen in performance. However, with CrossFireX, using more than one Evergreen card together can help keep performance competitive and postpone an upgrade to a new card, though not for long.
