Personal Blog of Mohamed F. Ahmed

March 3, 2010

Posted by ismailsobhi under Uncategorized

ATI Radeon

Nowadays, there are a lot of graphics cards on the market. In the past it did not matter much which card or GPU you bought, because applications were not demanding. Today's cards are far more advanced and can provide a much better graphical experience: they have evolved from the low-color 2D graphics of the past to the highly detailed 3D graphics of the present.

ATI produces a series of cards called Radeon, which followed its earlier Rage series. The Canadian company established itself in the market by providing some of the best-known graphics cards. ATI GPUs can be found in game consoles such as the Wii and the Xbox 360, and also in laptops.

It has therefore become necessary to know how to choose a graphics card. A graphics card should be judged or selected on the following factors:

The use: simple tasks such as web surfing and document writing, or more demanding ones such as 3D animation and gaming.

The price: Is the price of the card suitable for the user's budget? A high price does not necessarily mean the card is good.

Core Clock: the core clock is the speed at which the graphics processor on the card operates.

Stream Processors: the stream processors are responsible for rendering, so a larger number of stream processors generally means more rendering power. Stream processors are also called shader cores or thread processors.

FLOPS: the number of floating-point operations per second. It is quoted separately for single-precision (32-bit) and double-precision (64-bit) operations.

Memory:

Memory latency: the delay before the processor can access the memory. The problem is that in many cases the processor is faster than the memory.
Bus width: the number of bits transferred in each memory access.
Memory clock: the frequency at which the memory operates.
Type: DDR, DDR2, GDDR3, GDDR4 or GDDR5.
Memory bandwidth: the amount of data the card can transfer to and from memory per second, i.e. the bus width multiplied by the effective memory clock.

Power Consumption.

ATI Radeon’s Evergreen series:

The Canadian company created this series in 2009.

Products	Code name	Examples
HD 5400	Cedar	Radeon HD 5450
HD 5500, HD 5600	Redwood	Radeon HD 5670
HD 5700	Juniper	Radeon HD 5770
HD 5800	Cypress	Radeon HD 5870
HD 5900	Hemlock	Radeon HD 5970

The architectures of these cards are closely related to each other, as can be seen in the next figure.

The Evergreen series architectures

Therefore, what is said about the components of one architecture applies to the others as well. Hemlock is closely related to Cypress; in fact, Hemlock can be described as two Cypress GPUs on one card.

The architecture that will be addressed in the rest of the document is the Cypress architecture.

The architecture of the Cypress

The Cypress consists of:

Command processor: issues commands and passes them to the graphics engine, which translates them into simpler forms.

Graphics engine: converts polygons and meshes into a simpler form of data, namely pixels.

Ultra-threaded dispatch processor: maximizes utilization and efficiency by dividing the workload among the processing engines. For example, if an engine can only handle 4×4 pixel blocks and the frame is 16×16 pixels, the frame is divided into (16×16)/(4×4) = 16 blocks, which are distributed among the engines.

SIMD engines
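The block division described for the ultra-threaded dispatch processor can be illustrated with a short sketch. It is only a model: the post does not say how the hardware assigns blocks, so the round-robin policy, the engine count of 4 and the function names below are hypothetical.

```python
def split_into_blocks(frame_w: int, frame_h: int, block: int) -> list[tuple[int, int]]:
    """Return the (x, y) origin of every block x block pixel tile in a frame.

    Assumes the frame size is a multiple of the block size,
    as in the 16x16 frame / 4x4 block example.
    """
    if frame_w % block or frame_h % block:
        raise ValueError("frame size must be a multiple of the block size")
    return [(x, y)
            for y in range(0, frame_h, block)
            for x in range(0, frame_w, block)]


def assign_round_robin(blocks: list[tuple[int, int]],
                       num_engines: int) -> dict[int, list[tuple[int, int]]]:
    """Hypothetical policy: hand the blocks to the engines in turn."""
    work: dict[int, list[tuple[int, int]]] = {e: [] for e in range(num_engines)}
    for i, blk in enumerate(blocks):
        work[i % num_engines].append(blk)
    return work


blocks = split_into_blocks(16, 16, 4)
print(len(blocks))  # 16, i.e. (16*16) // (4*4)
work = assign_round_robin(blocks, num_engines=4)  # 4 engines: a hypothetical count
print({e: len(b) for e, b in work.items()})  # {0: 4, 1: 4, 2: 4, 3: 4}
```

The point of splitting into equal blocks is that every engine receives the same amount of work, so none of them sits idle while another is still busy.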

SIMD engines:

SIMD stands for “Single instruction, multiple data”. The term applies here because many components process multiple data elements with the same operation.

Each SIMD engine contains 16 stream units and 4 texturing units. Each stream unit consists of five 32-bit stream processors, and there are 20 SIMD engines in the Cypress.

So the total number of stream processors is 16 × 5 × 20 = 1600, and the number of texturing units is 4 × 20 = 80.
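The counts above, together with the description of Hemlock as two Cypress GPUs, can be written out as a short calculation. The constant names are illustrative, and doubling for Hemlock assumes both of its GPUs have all 20 SIMD engines enabled.

```python
# Cypress resource counts as given in this post
SIMD_ENGINES = 20
STREAM_UNITS_PER_SIMD = 16        # stream units (stream cores) per SIMD engine
PROCESSORS_PER_STREAM_UNIT = 5    # 32-bit stream processors per stream unit
TEXTURE_UNITS_PER_SIMD = 4

stream_processors = SIMD_ENGINES * STREAM_UNITS_PER_SIMD * PROCESSORS_PER_STREAM_UNIT
texture_units = SIMD_ENGINES * TEXTURE_UNITS_PER_SIMD

# Hemlock (HD 5970) is described as two Cypress GPUs on one card.
hemlock_stream_processors = 2 * stream_processors

print(stream_processors, texture_units, hemlock_stream_processors)  # 1600 80 3200
```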

Caches:

Each SIMD engine has an 8 KB L1 cache, so the total L1 cache size is 8 × 20 = 160 KB. Each 8 KB cache stores the data used by its own SIMD engine.

The L1 cache bandwidth is 1 TB/sec, while the bandwidth between L1 and L2 is 435 GB/sec.
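The cache totals follow the same pattern. As an extra comparison built only from the figures above, the sketch also computes how much faster the L1 cache is than the L1-to-L2 path, taking 1 TB/sec as 1000 GB/sec (a decimal-unit assumption). The ratio shows why keeping data in L1 matters.

```python
SIMD_ENGINES = 20
L1_PER_SIMD_KB = 8            # one L1 cache per SIMD engine
L1_BANDWIDTH_GB_S = 1000      # "1 TB/sec", assuming 1 TB = 1000 GB
L1_TO_L2_BANDWIDTH_GB_S = 435

total_l1_kb = SIMD_ENGINES * L1_PER_SIMD_KB
l1_speedup = L1_BANDWIDTH_GB_S / L1_TO_L2_BANDWIDTH_GB_S

print(total_l1_kb)           # 160
print(round(l1_speedup, 1))  # 2.3
```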

Stream Core:

Each stream core has five processors in total: four of them perform ordinary single-precision operations, while the fifth handles special functions such as sin, cos, tan and exponential.

The stream processors are 32-bit and together deliver up to 2.7 teraflops in single precision; for 64-bit (double-precision) operations, throughput drops to 544 gigaflops.

The stream core
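The 2.7 teraflop and 544 gigaflop figures can be reproduced from the stream processor count. This is a sketch under stated assumptions: a core clock of 850 MHz (not given in this post), one multiply-add (counted as two operations) per stream processor per clock in single precision, and one 64-bit multiply-add per five-wide stream core per clock in double precision. Under these assumptions the results match the quoted peaks, with 2.7 teraflops being 2720 gigaflops rounded.

```python
STREAM_PROCESSORS = 1600   # from the SIMD engine section
CORE_CLOCK_GHZ = 0.85      # ASSUMPTION: HD 5870 stock core clock of 850 MHz
FLOPS_PER_FMA = 2          # a multiply-add counts as two floating-point operations

# Single precision: assume every stream processor issues one multiply-add per clock.
sp_gflops = STREAM_PROCESSORS * FLOPS_PER_FMA * CORE_CLOCK_GHZ

# Double precision: assume each five-wide stream core completes one
# 64-bit multiply-add per clock, i.e. one fifth of the single-precision rate.
stream_cores = STREAM_PROCESSORS // 5
dp_gflops = stream_cores * FLOPS_PER_FMA * CORE_CLOCK_GHZ

print(round(sp_gflops), round(dp_gflops))  # 2720 544
```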

Memory:

Bus width: 256-bit
Size and type: 1 GB GDDR5
Memory clock: 2400 MHz
Bandwidth: 153.6 GB/sec
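The bandwidth figure follows from the bus width and the memory clock. A minimal sketch, assuming the 2400 MHz figure is a double-data-rate clock, so that the memory performs 4800 million transfers per second; the function name is hypothetical.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, memory_clock_mhz: float,
                          transfers_per_clock: int = 2) -> float:
    """Peak memory bandwidth in GB/s, taking 1 GB = 10**9 bytes.

    bus_width_bits      -- width of the memory interface, e.g. 256
    memory_clock_mhz    -- memory clock as quoted on the spec sheet
    transfers_per_clock -- data transfers per quoted clock cycle; the
                           default of 2 ASSUMES a double-data-rate figure
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# HD 5870 figures from the list above: 256-bit bus, 2400 MHz GDDR5
print(round(memory_bandwidth_gb_s(256, 2400), 1))  # 153.6
```

With `transfers_per_clock=1` the same call gives 76.8 GB/sec, half the quoted figure, which is why it matters how a spec sheet states the memory clock.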

Price: the HD 5870 costs $410.

So the cost of one teraflop is 410/2.7 ≈ $151.9 for single-precision operations and 410/0.544 ≈ $753.7 for double-precision operations. As expected, double-precision throughput is much more expensive.
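A tiny helper makes the price-per-teraflop arithmetic reusable when comparing other cards. The price and peak figures are the ones quoted in this post; the function name is illustrative, and results are rounded to one decimal place.

```python
PRICE_USD = 410
SP_TFLOPS = 2.7     # single-precision peak
DP_TFLOPS = 0.544   # double-precision peak (544 GFLOPS)

def cost_per_tflops(price_usd: float, tflops: float) -> float:
    """Dollars paid for each teraflop of peak throughput."""
    return price_usd / tflops

print(f"single: ${cost_per_tflops(PRICE_USD, SP_TFLOPS):.1f} per TFLOPS")  # single: $151.9 per TFLOPS
print(f"double: ${cost_per_tflops(PRICE_USD, DP_TFLOPS):.1f} per TFLOPS")  # double: $753.7 per TFLOPS
```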

Power Consumption:

ATI aimed to reduce power consumption with this series. The Cypress draws 27 watts when idle and, in the worst case, a maximum of 188 watts. The maximum is reached when the card is pushed to its highest performance, for example under full load or when overclocked; users who only use the card for simple purposes will not reach it.

The power needed to deliver one teraflop is shown below, using the worst case (the maximum power of 188 watts).

Power	Single precision	Double precision
Maximum (188 W)	188/2.7 ≈ 69.6 W per TFLOPS	188/0.544 ≈ 345.6 W per TFLOPS
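The same division can be run in either direction: watts per teraflop (lower is better) or gigaflops per watt (higher is better), which is the more common way to compare efficiency across cards. The sketch uses only the 188 W maximum and the peak figures from this post; the function names are illustrative.

```python
MAX_POWER_W = 188

def watts_per_tflops(power_w: float, tflops: float) -> float:
    """Power drawn for each teraflop of peak throughput."""
    return power_w / tflops

def gflops_per_watt(tflops: float, power_w: float) -> float:
    """Peak gigaflops delivered for each watt drawn."""
    return tflops * 1000 / power_w

for label, tflops in (("single", 2.7), ("double", 0.544)):
    print(f"{label}: {watts_per_tflops(MAX_POWER_W, tflops):.1f} W per TFLOPS, "
          f"{gflops_per_watt(tflops, MAX_POWER_W):.1f} GFLOPS per W")
# single: 69.6 W per TFLOPS, 14.4 GFLOPS per W
# double: 345.6 W per TFLOPS, 2.9 GFLOPS per W
```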

CrossFireX technology:

CrossFireX lets the user install up to four ATI Radeon cards on the same motherboard. The Evergreen series cards support CrossFireX.

CrossFireX

The technology must also be supported by the motherboard. The next figure shows which combinations of cards are supported and which motherboards provide this technology.

Compatibility chart for the combination of the cards

The technology operates using one of the following modes:

Scissors: a single frame is processed by two cards. If the cards are identical, there is no problem: each card renders half of the screen. But what happens if the two cards are different?

The frame is then divided according to the capabilities of each card: the faster card renders a larger portion than the slower card, so that both cards finish at about the same time.

SuperTiling: the frame is divided into square tiles, like a chessboard. For example, if the frame is divided into a 4×4 grid of 16 tiles, one card renders the 8 tiles of one color and the other card renders the 8 tiles of the other color.

Alternate Frame Rendering (AFR): while one card renders the current frame, another card renders the next frame.

Super AA: AA stands for anti-aliasing. In this mode the cards work together to apply anti-aliasing, which increases image quality.
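The three ways of dividing work among cards (Scissors, SuperTiling and AFR) can be sketched in a few lines. This is a simplified model, not ATI's driver: the relative card speeds, the 1080-row frame and the function names are hypothetical, Scissors is modelled as horizontal strips, and the 32-pixel tile edge is the size cited in the comments on this post.

```python
from itertools import accumulate

# Scissors: split rows in proportion to each card's relative speed
def scissor_split(frame_height: int, speeds: list[float]) -> list[range]:
    """Return the row range each card renders.

    speeds are hypothetical relative throughputs; a card twice as fast
    gets twice as many rows, so all cards finish at about the same time.
    """
    total = sum(speeds)
    # cumulative row boundary after each card's slice
    bounds = [round(frame_height * c / total) for c in accumulate(speeds)]
    bounds[-1] = frame_height  # guard against floating-point drift
    starts = [0] + bounds[:-1]
    return [range(a, b) for a, b in zip(starts, bounds)]


# SuperTiling: a chessboard of square tiles shared by two cards
TILE = 32  # tile edge in pixels

def supertile_card(px: int, py: int, tile: int = TILE) -> int:
    """Return 0 or 1: which of two cards renders pixel (px, py)."""
    return (px // tile + py // tile) % 2


# Alternate Frame Rendering: whole frames take turns
def afr_card(frame_index: int, num_cards: int) -> int:
    """Return the card that renders a given frame."""
    return frame_index % num_cards


print(scissor_split(1080, [1.0, 1.0]))  # [range(0, 540), range(540, 1080)]
print(scissor_split(1080, [2.0, 1.0]))  # [range(0, 720), range(720, 1080)]
for ty in range(4):  # a 4x4 grid of tiles, printed as card numbers
    print(" ".join(str(supertile_card(tx * TILE, ty * TILE)) for tx in range(4)))
# 0 1 0 1
# 1 0 1 0
# 0 1 0 1
# 1 0 1 0
print([afr_card(f, 2) for f in range(6)])  # [0, 1, 0, 1, 0, 1]
```

Coloring tiles by the parity of (column + row) is what produces a true chessboard, so each card gets an equal share of tiles spread evenly over the screen.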

CrossFireX is very similar to NVIDIA's SLI. The cards produced by both companies are also very similar in components and architecture, which makes it easy to compare them and decide which card to purchase.

The series after Evergreen is called Northern Islands and is expected on the market at the end of 2010 or in 2011. The major difference is that Northern Islands is to use a 32 nm fabrication process, while Evergreen uses 40 nm.

Northern Islands will exceed Evergreen in performance, but with CrossFireX a user can combine more than one Evergreen card. This helps to keep up performance without upgrading to a new card, though not for long.

11 Responses to “ATI Radeon”

ahmedlabib Says:

March 3, 2010 at 10:32 pm
Hey Ismail, I have a couple of questions regarding the ATI Radeon you talked about.

1- In the picture you provided of the Hemlock, there are two Cypress cores connected to each other through an intermediate chip. What is that chip? What are its functions? Or, in general, how are the two Cypress cores connected to each other?

2- When using more than one ATI Radeon card in CrossFire, is it possible to output to more than one monitor for a multi-monitor installation?

ahmedattia Says:

March 4, 2010 at 10:59 am
Ismail,
I want to ask you something because I am confused: when more than one card is in CrossFire (as you said, it's possible), which card is responsible for the output of the inserted cards?

Thanks.

Reply
Mustapha Abdallah Says:

March 22, 2010 at 10:45 am
Dear Ismail,

Regarding the high/low card memory trade-off associated with choosing the appropriate ATI Radeon card.

Is it possible for the ATI card to share the system RAM with the CPU? This might bring two benefits: 1) as far as I know, system RAM is relatively cheaper than ATI's graphics memory; 2) the card's memory could expand and contract dynamically, making it possible to buy a somewhat cheaper ATI card with less memory.

So what is the impact of using system RAM on both latency and throughput, compared to having enough memory on the ATI card?

Thanks in Advance,
Best Regards,
Mustapha Abdallah (7-4478)

ismailsobhi Says:

April 5, 2010 at 7:25 am
Dear All,

-As for Ahmed Labib’s questions:

About the intermediate chip: as you know, the HD 5970 contains two HD 5870 GPUs, which are connected by a second-generation PCI-E switch made by PLX (PLX is the company). In effect, they are connected as if using CrossFireX.

As for connecting multiple monitors with CrossFireX: ATI developed a technology called “Eyefinity” that allows up to 6 monitors to be connected to the same card. At first this was not supported with CrossFireX, but it has since been fixed.

-As for Ahmed Atya’s questions:

Imagine this: there is an input (a frame) that I want to process. The processing is then done using one of the previously mentioned modes. For the output, there is an FPGA (Field-Programmable Gate Array) chip that combines the output of the two or more cards, and the result is then sent over the cable to be displayed on the screen.

-As for Mustapha Abdallah’s questions:

•GDDR5 is based on ordinary DDR3 memory, but graphics memory has different specifications from DDR3.

•Imagine this: the CPU and GPU share memory. If the GPU needs a lot of memory, CPU performance will decrease, and vice versa. They will compete for the memory, the performance of one of them will decrease, and so the overall performance will be affected.

•As a user, compare low-end cards that use shared memory, such as the Intel Graphics Media Accelerator, with high-end dedicated cards from ATI and NVIDIA.

•Bandwidth will decrease: to communicate with system RAM, the GPU needs longer routes than it does to reach the memory built into the card.

ismailsobhi Says:

April 5, 2010 at 7:29 am
This question was asked by Yehia Mowiena on the other blog:
Dear Ismail,

I have some inquiries

– What are the effects of overclocking on the performance and lifetime of GPUs? Does it have any real drawbacks on the GPU's efficiency later?

– What is the effect of shared and dedicated memory on memory latency? Does the higher cost of implementing dedicated memory give us an advantage over shared memory?

– In the “CrossfireX” section, under the “Scissors” technique, you stated that applying this technique with identical cards causes no problem. But what if one of the GPUs had more cache misses than the other? Couldn't that affect performance, especially since they are working on the same frame?

I will send you any further questions as soon as possible.

Thanks,

Yahya Mowiena

The answer was:

-About overclocking: if you are not careful, the lifespan of the GPU may be reduced, because it runs at a higher operating frequency, with increased voltage and heat. So novice users should consult someone experienced before overclocking.

-A shared-memory graphics card is built in and shares memory with the rest of the system. So instead of the memory serving graphics alone, it also holds a lot of other data being processed, and memory latency will be higher.

-As for the situation in your third question: it is unlikely to happen, but if it does and performance really decreases, you can switch to another of the four modes I mentioned in that section to get better performance, for example AFR. Then one card is responsible for the whole current frame and the other for the next frame.

Mohamed Zakaria Khalil Says:

May 1, 2010 at 3:29 pm
Dear Ismail

In CrossFireX, if the cards are different and use the SuperTiling technique, and one of them finishes first, will it get a larger portion of the next frame? That is, is the division dynamic? This matters especially because one of them might overheat, which leads to performance degradation. Or is the division static, according to a predefined evaluation scheme?

alaaosama Says:

May 6, 2010 at 12:37 am
Dear Ismail

Regarding motherboards, which motherboards support CrossFireX, as the figure is not clear? I also want to ask how Super AA works.

Thanks

ismailsobhi Says:

May 15, 2010 at 10:26 am
Hey Alaa,

The motherboards that support CrossFireX can be found on this site

As for SuperAA, to explain it you have to start from the beginning. We have an image, and as we know, the smallest element of an image is the pixel. Each pixel is a sample of an original image, and more samples typically give a more accurate representation of that original image.

From what I understand, SuperAA doubles the number of samples taken per pixel. Doubling the sampling rate gives a more accurate estimate of each pixel's color; again, more samples mean better quality.

Anti-aliasing is performed on each GPU, and then the results are blended together.
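The idea of each GPU taking its own samples and the results being blended can be sketched for a single pixel. This is an illustrative model only: the scene (a hard vertical edge), the sub-pixel sample patterns and the function name are hypothetical, and real hardware blends whole frames rather than calling a function per pixel.

```python
def super_aa_pixel(shade, x, y, sample_sets):
    """Average one pixel's samples taken by several GPUs.

    shade(sx, sy) returns the scene color (0.0 to 1.0) at a point;
    sample_sets holds one list of sub-pixel offsets per GPU.
    """
    samples = [shade(x + dx, y + dy)
               for offsets in sample_sets
               for dx, dy in offsets]
    return sum(samples) / len(samples)


# Hypothetical scene: white (1.0) left of a vertical edge at x = 0.2, black to the right,
# so the true coverage of pixel (0, 0) is 0.2.
def edge(sx, sy):
    return 1.0 if sx < 0.2 else 0.0

gpu0 = [(0.25, 0.25), (0.75, 0.75)]     # hypothetical sample pattern for GPU 0
gpu1 = [(0.125, 0.625), (0.625, 0.125)]  # a different pattern for GPU 1

print(super_aa_pixel(edge, 0, 0, [gpu0]))        # 0.0  (one GPU misses the edge)
print(super_aa_pixel(edge, 0, 0, [gpu0, gpu1]))  # 0.25 (closer to the true 0.2)
```

Because the two GPUs sample at different positions inside the pixel, the blended result uses twice as many samples as either card alone and lands closer to the true value.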

ismailsobhi Says:

May 15, 2010 at 11:50 am
Hello Zakria,

See this link for an animation of the modes:

http://ati.amd.com/technology/crossfire/demos/en/rendering.html

As for your question: in SuperTiling, the image is split into 32×32-pixel squares; half of the squares (the odd ones) are rendered by one graphics card and the other half (the even ones) by the other GPU. ATI claims that dividing the frame into two equal parts for the two cards in this way balances the load efficiently, so there is no performance degradation or overheating.

amasisbrauch Says:

May 15, 2010 at 4:11 pm
Hello Ismail,
let me again congratulate you on the completion of your presentation; it was very nice. Being an ATI user myself, I was interested to learn more about the technologies available in today's ATI graphics cards. I just have a couple of questions, which are relatively general:

In your opinion, what is the main advantage or disadvantage of ATI graphics cards compared to those designed by NVIDIA, in terms of power consumption, performance and perhaps cost?

Are there any compatibility issues known specifically to ATI graphics cards (which may not exist for NVIDIA cards)?

Best regards,
Amasis

ismailsobhi Says:

May 21, 2010 at 4:49 pm
Hello Amasis,

Thank you. I have become an ATI supporter since the presentation.

Well, in terms of power consumption and performance, let's relate the two. A card that consumes less power while providing the same performance as another card is better. There are even cards that perform better than others while consuming less power. For example, the HD 5670 consumes less power than the GT 240 but performs better.

http://www.behardware.com/art/imprimer/784/

However, some cards have both higher power consumption and higher performance. I am OK with that, as long as the performance is high.

It is clear that ATI leads when relating power consumption to performance, and this conclusion is not drawn from looking at one card only. ATI is also cheaper than NVIDIA.

As for compatibility, I think that with both you might sometimes encounter problems on Linux, but they fix them regularly.

Ismail
