Article information

Title: Ray tracing as multi GPGPU.
Authors: Asavei, Victor ; Moldoveanu, Florica ; Moldoveanu, Alin et al.
Journal: Annals of DAAAM & Proceedings
Print ISSN: 1726-9679
Year of publication: 2009
Issue: January
Language: English
Publisher: DAAAM International Vienna
Abstract: Compared to traditional rendering methods, Ray Tracing naturally produces effects such as realistic lighting, reflections and shadows.
Keywords: Graphics coprocessors; Graphics processing units; Multiprocessing; Ray tracing

Ray tracing as multi GPGPU.

Asavei, Victor ; Moldoveanu, Florica ; Moldoveanu, Alin et al.

1. INTRODUCTION

Compared to traditional rendering methods, Ray Tracing naturally produces effects such as realistic lighting, reflections and shadows.

However, achieving these effects comes with high computational requirements, which is why Ray Tracing has not yet been widely used in real-time mainstream computer graphics simulations.

The algorithm works by tracing a path using "view" rays from the eye position of the observer through each pixel of a virtual screen and computing the colour of that pixel based on what object is visible (is hit by the view ray) in that pixel.
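The construction of a view ray can be illustrated with a minimal Python sketch. The pinhole camera placed at the origin, the 60-degree field of view, the virtual screen at z = -1 and the name `primary_ray` are assumptions made only for this example; the paper does not specify the camera model.

```python
import math

def primary_ray(px, py, width=1024, height=768, fov_deg=60.0):
    """Return (origin, unit direction) of the view ray through pixel (px, py).

    Assumed camera: eye at the origin, looking down the negative z axis.
    """
    aspect = width / height
    half = math.tan(math.radians(fov_deg) / 2.0)
    # Map the pixel centre to screen coordinates; y points up.
    sx = (2.0 * (px + 0.5) / width - 1.0) * aspect * half
    sy = (1.0 - 2.0 * (py + 0.5) / height) * half
    d = (sx, sy, -1.0)  # point on the virtual screen at z = -1
    n = math.sqrt(sum(c * c for c in d))
    return (0.0, 0.0, 0.0), tuple(c / n for c in d)
```

One such ray is generated per pixel, so a 1024x768 image requires 786432 primary rays per frame before any secondary rays are counted.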

Each ray is tested for intersection with the objects of the scene. Once the nearest object hit by the ray is found, the algorithm determines the colour of the pixel from the incident light at the point of intersection and the material properties of the object.
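The nearest-hit search and the shading step can be sketched as follows. This is an illustration only: it assumes a scene made solely of spheres, a unit-length ray direction and a simple Lambert diffuse term, none of which is stated in the paper, and the function names are hypothetical.

```python
import math

def hit_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the sphere surface, or None.

    Only the nearer root is considered, so a ray starting inside
    the sphere is reported as a miss in this simplified sketch.
    """
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * e for d, e in zip(direction, oc))
    c = sum(e * e for e in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 1e-4 else None

def nearest_hit(origin, direction, spheres):
    """Test the ray against every sphere and keep the closest hit."""
    best = None
    for index, (center, radius) in enumerate(spheres):
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, index)
    return best  # (distance, sphere index) or None

def diffuse(normal, to_light, albedo):
    """Lambert term: albedo scaled by the cosine of the light angle."""
    return albedo * max(0.0, sum(n * l for n, l in zip(normal, to_light)))
```

Because every ray is tested against every object, the cost of this unoptimized search grows with the product of the ray count and the object count, which is where the high computational requirements come from.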

If the object hit is reflective or transparent, the primary ray is recast (it "bounces") according to the properties of the material, producing the so-called "secondary" rays. This process can continue recursively until the reflected or refracted rays no longer hit any object.
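The recursion can be sketched as below. The sketch covers reflection only (refraction is left out), represents colour as a single scalar, and takes the scene query as a hypothetical callback `intersect`; the bounce limit of 3 mirrors the limit used for the test scene in this paper, while everything else is an assumption made for illustration.

```python
def reflect(direction, normal):
    """Mirror a direction about a unit surface normal: r = d - 2 (d.n) n."""
    k = 2.0 * sum(d * n for d, n in zip(direction, normal))
    return tuple(d - k * n for d, n in zip(direction, normal))

def trace(origin, direction, intersect, depth=0, max_bounces=3):
    """Colour seen along a ray.

    `intersect(origin, direction)` is a hypothetical scene query that
    returns None on a miss, or (point, normal, local_colour, reflectivity)
    for the nearest hit.
    """
    hit = intersect(origin, direction)
    if hit is None:
        return 0.0  # background: the recursion ends when nothing is hit
    point, normal, local_colour, reflectivity = hit
    if reflectivity <= 0.0 or depth >= max_bounces:
        return local_colour  # non-reflective surface or bounce limit reached
    bounced = trace(point, reflect(direction, normal), intersect,
                    depth + 1, max_bounces)
    return (1.0 - reflectivity) * local_colour + reflectivity * bounced
```

The explicit depth limit matters in practice: two facing mirrors would otherwise make the recursion unbounded.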

In the last few years, computing hardware (CPU) and graphics hardware (Breitbart, 2008) have evolved spectacularly in terms of computing power and architectural design, switching from a "single-core" to the current "many-core" architecture (Fig. 1).

In our proposed solution we will use NVidia GPU hardware (two 8800 Ultra cards mounted in the same PC) and the NVidia CUDA toolkit to implement Ray Tracing as a GPGPU program.

We will use a 1024x768 scene with diffuse and specular lighting, reflections and shadows and we will also compare the results of our solution with a Single GPGPU implementation and CPU implementations (both single-core and quad-core).

2. CURRENT SOLUTIONS

Recently there has been an increased interest in researching Ray Tracing implementation on parallel architectures. On the CPU, Ray Tracing solutions have been researched / implemented on:

--Cell Processors

--Sun Sparcs

--Intel x86 multicore processors (Quake3 and Quake4 RayTraced)

The current GPU architecture has evolved more rapidly in terms of complexity and SIMD processing power (see Fig. 2), and this has resulted in increased interest in implementing Ray Tracing on the GPU rather than on the CPU.

For example, the NVidia GeForce GTX280 GPU contains 240 streaming processors with a peak rate of approximately 960 gigaflops, compared to a high-end CPU (Intel Core2Quad) that has 4 cores and a peak rate of approximately 96 gigaflops (Fatahalian & Houston, 2008).

The current GPU Ray Tracing solutions / implementations focus on the following topics:

--Optimizing the basic Ray Tracing algorithm by several methods: using k-d tree structures (Foley & Sugerman, 2005), using bounding volume hierarchies, etc.

--Running Ray Tracing on the GPU by programming the graphics pipeline; this usually requires writing shaders (vertex, geometry and fragment)

--Using hybrid approaches where parts of the algorithm are done on the graphics pipeline using shaders

In our solution we will run the Ray Tracing algorithm directly on the streaming processors of two GPUs as a GPGPU program, thus bypassing the standard usage of the graphics pipeline with shader programs.

[FIGURE 2 OMITTED]

3. OUR SOLUTION

Because our research focuses on implementing the method directly on the streaming processors and on seeing how well it scales across multiple GPUs, we decided to use a computationally heavy Ray Tracing algorithm with no further optimizations, and to compare the results with a single-GPU GPGPU program and with similar CPU implementations.

For the test scene we have used:

--A resolution of 1024x768

--Diffuse and specular lighting coming from 3 sources of light

--Shadow generation

--Reflections (a maximum of 3 secondary ray bounces)

For the CPU implementations (single core with 1 thread and quad core with 4 threads), an Intel Core2Quad Q6600 CPU was used.

For the GPGPU implementations (single and multi), two NVidia 8800 Ultra GPUs mounted in the same PC and the NVidia CUDA toolkit were used.

CUDA is a parallel programming model designed to overcome the challenge of developing GPGPU applications that scale with the increasing number of available cores.

CUDA uses three important abstractions: a hierarchy of thread groups, shared memory and barrier synchronization.

This makes it possible to partition a given problem into coarse sub-problems that can be processed independently in parallel and then into finer sub-problems that can be solved cooperatively in parallel.

This partitioning allows the threads to cooperate in solving each sub-problem and at the same time provides transparent scalability, because each sub-problem can be scheduled to run on any of the available processors.

Our solution for splitting the workload across the CUDA threads is:

--The initial data is made available in the global memory of the GPUs

--The set of rays is given by a matrix of pixels that has the size of the screen (Fig. 3)

--The matrix will be split into sub-matrices of a fixed size (8, 16 or 32)

--Each sub-matrix will be processed by a block of threads, and each thread will process one element of this sub-matrix (a ray)

--For the Multi GPGPU implementation, GPU0 will process the top half of the screen and GPU1 will process the bottom half of the screen
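The partitioning listed above can be sketched as plain index arithmetic. The sketch is written in Python rather than CUDA, treats the fixed size (8, 16 or 32) as the edge length of a square sub-matrix, and assumes the screen dimensions divide evenly; these are assumptions of the example, and the function names are hypothetical. In a CUDA kernel the same pixel coordinates would typically be derived from `blockIdx`, `blockDim` and `threadIdx`.

```python
def tiles_for_gpu(gpu, num_gpus=2, width=1024, height=768, tile=16):
    """List the (x0, y0) origins of the sub-matrices one GPU processes.

    The screen is cut into horizontal bands, one per GPU (GPU0 gets the
    top band), and each band into tile x tile sub-matrices, each of which
    would be handled by one block of threads.
    """
    band = height // num_gpus
    y_start = gpu * band
    return [(x0, y0)
            for y0 in range(y_start, y_start + band, tile)
            for x0 in range(0, width, tile)]

def pixel_of_thread(tile_origin, thread_x, thread_y):
    """Pixel (one ray) handled by thread (thread_x, thread_y) of a block."""
    x0, y0 = tile_origin
    return x0 + thread_x, y0 + thread_y
```

With two GPUs and 16x16 sub-matrices, each GPU receives 64 x 24 = 1536 blocks of 256 threads, which together cover its 1024x384 half of the screen.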

We tested all the implementations and obtained the results shown in Tab. 1.

The results show that Ray Tracing scales very well when run in parallel: the two-GPU implementation averages 16.980 fps, roughly 1.9 times the 8.967 fps of the single-GPU implementation. The algorithm is therefore suitable not only as a GPGPU program but also as a Multi GPGPU program.
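The scaling can be quantified directly from the average frame rates in Tab. 1. The short Python sketch below only performs that arithmetic; the dictionary keys and the `speedup` helper are names chosen for this example.

```python
# Average frame rates (fps) taken from Tab. 1.
average_fps = {
    "CPU single core, 1 thread": 2.083,
    "CPU quad core, 4 threads": 5.983,
    "1x 8800 Ultra, CUDA": 8.967,
    "2x 8800 Ultra, CUDA": 16.980,
}

def speedup(method, baseline):
    """Ratio of the average frame rates of two implementations."""
    return average_fps[method] / average_fps[baseline]

for name in average_fps:
    ratio = speedup(name, "CPU single core, 1 thread")
    print(f"{name}: {ratio:.2f}x over the single-core CPU")

# Fraction of ideal (2x) scaling achieved by adding the second GPU.
efficiency = speedup("2x 8800 Ultra, CUDA", "1x 8800 Ultra, CUDA") / 2
print(f"two-GPU parallel efficiency: {efficiency:.1%}")
```

Relative to the single-core CPU version, the averages correspond to speedups of about 2.87 (quad core), 4.30 (one GPU) and 8.15 (two GPUs); the second GPU alone contributes a factor of about 1.89, close to the ideal factor of 2.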

[FIGURE 3 OMITTED]

4. CONCLUSIONS AND FUTURE WORK

We foresee that Ray Tracing will be used more and more in real-time computer graphics applications, first through hybrid approaches and, as hardware continues to evolve, eventually replacing the current standard rendering techniques.

Also in terms of future scalability and Multi GPGPU usage we need to consider the following:

--Current workstation motherboards can support up to 4 GPUs, and this number might also increase in the future

--Current GPUs can contain up to 480 streaming processors (NVidia GTX295) and in the future this number will continue to increase

--Consequently, current workstations can include nearly 2000 programmable GPU streaming processors (4 x 480 = 1920), and the next generations might include up to hundreds of thousands of streaming processors

As future work we should research how to optimize the memory requirements of the CUDA threads, how to access the global memory pool more efficiently and how to minimize the synchronization overhead between GPUs.

We should also research whether it is possible to group several workstations in order to distribute the algorithm across several nodes on a fast network.

5. REFERENCES

Breitbart J. (2008). Case studies on GPU usage and data structure design. Dept. of Computer Science and Electrical Engineering, Universität Kassel

Che S. et al. (2008). A Performance Study of General Purpose Applications on Graphics Processors. Journal of Parallel and Distributed Computing, Volume 68, Issue 10, Pages 1370-1380

Fatahalian K. and Houston M. (2008). A closer look at GPUs. Communications of the ACM, ACM Press

Foley T. and Sugerman J. (2005). Kd-tree acceleration structures for a GPU Raytracer. Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, ACM Press, Pages 15-22

Seiler L. et al. (2008). Larrabee: A Many-Core x86 Architecture for Visual Computing. SIGGRAPH 2008

*** (2009) http://www.nvidia.com/object/cuda_develop.html - CUDA Programming Guide, Accessed on: 2009-03-01

*** (2009) http://www.intel.com/products/processor/core2quad/ - Intel Core2Quad Processor, Accessed on: 2009-03-15

*** (2009) http://www.q3rt.de, http://www.q4rt.de - Quake3 and Quake4 RayTraced, Accessed on: 2009-03-15

Tab. 1. Results consisting of frame rates (fps)

Method                                                        Min FPS   Max FPS   Average FPS

CPU Single Core, 1 Thread                                     1         3         2.083

CPU Quad Core, 4 Threads                                      5         7         5.983

NVidia 8800 Ultra, 128 Processors, GPGPU using CUDA           8         10        8.967

2x NVidia 8800 Ultra, 256 Processors, Multi-GPGPU using CUDA  16        18        16.980

Fig. 1. Current CPU and GPU architectures

Type   Processor                 Cores/Chip   ALUs/Core

GPU    AMD Radeon HD 4870        10           80
GPU    NVIDIA GeForce GTX 280    30           8

CPU    Intel Core2Quad           4            8
CPU    Cell                      8            4
CPU    Sun UltraSparc            8            1