Ray tracing as multi GPGPU.
Asavei, Victor ; Moldoveanu, Florica ; Moldoveanu, Alin et al.
1. INTRODUCTION
Compared to traditional rendering methods, Ray Tracing naturally
produces effects such as realistic lighting, reflections and shadows.
However, these effects come with high computational requirements, which
is why Ray Tracing has not yet been widely used in real-time mainstream
computer graphics simulations.
The algorithm works by tracing a path using "view" rays
from the eye position of the observer through each pixel of a virtual
screen and computing the colour of that pixel based on what object is
visible (is hit by the view ray) in that pixel.
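As a minimal sketch of how a view ray through a pixel could be built, the function below places the eye at the origin and the virtual screen one unit in front of it. The function name `primary_ray`, the 60 degree field of view and the convention that the camera looks down the negative z axis are assumptions made for illustration; the paper does not specify its camera model.

```python
import math

def primary_ray(px, py, width=1024, height=768, fov_deg=60.0):
    """Return (origin, unit direction) of the view ray through pixel (px, py)."""
    aspect = width / height
    half_h = math.tan(math.radians(fov_deg) / 2.0)   # half height of the screen at z = -1
    half_w = half_h * aspect
    # Map the pixel centre to the virtual screen; screen y grows downward.
    u = (2.0 * (px + 0.5) / width - 1.0) * half_w
    v = (1.0 - 2.0 * (py + 0.5) / height) * half_h
    d = (u, v, -1.0)                                  # assumed: camera looks down -z
    length = math.sqrt(sum(c * c for c in d))
    return (0.0, 0.0, 0.0), tuple(c / length for c in d)
```

Calling `primary_ray(0, 0)` gives the ray through the top-left pixel, which points left, up and into the scene.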
Each ray is tested for intersection with the objects of the scene.
After the nearest object that is hit is found, the algorithm
determines the colour of the pixel by using the incident light at the
point of intersection and the material properties of the object.
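The nearest-hit search can be illustrated with a brute-force loop over the scene objects. The sketch below assumes a scene made only of spheres, each given as a (center, radius) pair, and a ray direction of unit length; `hit_sphere` and `nearest_hit` are hypothetical names, and the paper does not state which primitives its scene contains.

```python
import math

def hit_sphere(origin, direction, center, radius):
    """Smallest positive t where origin + t*direction meets the sphere, or None."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * e for d, e in zip(direction, oc))     # half of the linear coefficient
    c = sum(e * e for e in oc) - radius * radius
    disc = b * b - c                                   # valid for a unit direction
    if disc < 0.0:
        return None                                    # the ray misses the sphere
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-6:                                   # ignore hits behind the origin
            return t
    return None

def nearest_hit(origin, direction, spheres):
    """Test the ray against every sphere and keep the closest intersection."""
    best = None
    for index, (center, radius) in enumerate(spheres):
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, index)
    return best                                        # (distance, sphere index) or None
```

Because every ray is tested against every object, the cost grows linearly with the scene size, which is the kind of unoptimized workload that acceleration structures such as k-d trees are meant to reduce.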
If the object hit is reflective or transparent, the primary ray
is recast (it "bounces") according to the properties of the material,
producing the so-called "secondary" rays. This process can continue
recursively until the reflected or refracted rays no longer hit any
object.
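The recursion can be sketched for the reflective case only. In the code below, `intersect` is a hypothetical callback standing in for the nearest-hit search, the linear blend between local and reflected colour is an assumed shading rule, and the default depth limit of 3 mirrors the maximum of 3 secondary bounces used for the test scene. Refraction is left out.

```python
def reflect(direction, normal):
    """Mirror a direction about a unit surface normal: r = d - 2 (d . n) n."""
    k = 2.0 * sum(d * n for d, n in zip(direction, normal))
    return tuple(d - k * n for d, n in zip(direction, normal))

def trace(origin, direction, intersect, depth=0, max_depth=3):
    """Return the colour seen along a ray, following reflections recursively.

    `intersect` is a hypothetical callback returning None on a miss, or a tuple
    (point, normal, local_colour, reflectivity) for the nearest hit.
    """
    hit = intersect(origin, direction)
    if hit is None:
        return (0.0, 0.0, 0.0)                 # background colour (assumed black)
    point, normal, local, reflectivity = hit
    if reflectivity <= 0.0 or depth >= max_depth:
        return local                           # stop: matte surface or depth limit
    bounced = trace(point, reflect(direction, normal), intersect,
                    depth + 1, max_depth)
    return tuple((1.0 - reflectivity) * l + reflectivity * b
                 for l, b in zip(local, bounced))
```

The explicit depth limit guarantees termination even when rays keep bouncing between reflective objects, which matters on hardware where the work per thread should stay bounded.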
In the last few years, computing hardware (CPU) and graphics
hardware (Breitbart, 2008) have evolved spectacularly in terms of
computing power and architectural design, moving from "single-core"
to the current "many-core" architectures (Fig. 1).
In our proposed solution we use NVidia GPU hardware (two 8800 Ultra
cards mounted in the same PC) and the NVidia CUDA toolkit to implement
Ray Tracing as a GPGPU program.
We will use a 1024x768 scene with diffuse and specular lighting,
reflections and shadows and we will also compare the results of our
solution with a Single GPGPU implementation and CPU implementations
(both single-core and quad-core).
2. CURRENT SOLUTIONS
Recently there has been increased interest in Ray Tracing
implementations on parallel architectures. On the CPU, Ray Tracing
solutions have been researched or implemented on:
--Cell Processors
--Sun Sparcs
--Intel x86 multicore processors (Quake3 and Quake4 RayTraced)
The current GPU architecture has evolved more rapidly in terms of
complexity and SIMD processing power (see Fig. 2), and this has resulted
in increased interest in implementing Ray Tracing on the GPU rather than
on the CPU.
For example, the NVidia GeForce GTX280 GPU contains 240 streaming
processors with a peak rate of approximately 960 gigaflops, compared
to a high-end CPU (Intel Core2Quad) that has 4 cores and a peak rate of
approximately 96 gigaflops (Fatahalian & Houston, 2008).
The current GPU Ray Tracing solutions and implementations focus on
the following topics:
--Optimizing the basic Ray Tracing algorithm by several methods:
using k-d tree structures (Foley & Sugerman, 2005), using bounding
volume hierarchies, etc.
--Running Ray Tracing on the GPU by programming the graphics
pipeline; this usually requires writing shaders (vertex, geometry and
fragment)
--Using hybrid approaches where parts of the algorithm are done on
the graphics pipeline using shaders
In our solution we will run the Ray Tracing algorithm directly on
the streaming processors of 2 GPUs as a GPGPU program thus bypassing the
standard usage of the graphics pipeline with shader programs.
[FIGURE 2 OMITTED]
3. OUR SOLUTION
Because our research focuses on implementing the method directly on
the streaming processors and on seeing how well it scales across
multiple GPUs, we decided to use a computationally heavy Ray Tracing
algorithm with no further optimizations and to compare the results with
a single GPGPU program and with similar CPU implementations.
For the test scene we have used:
--A resolution of 1024x768
--Diffuse and specular lighting coming from 3 sources of light
--Shadow generation
--Reflections (a maximum of 3 secondary ray bounces)
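To make the diffuse and specular terms concrete, the sketch below sums a Lambert diffuse term and a Phong specular term over a list of lights. The Phong model, the coefficients `kd`, `ks` and `shininess`, and the function name `shade` are assumptions chosen for illustration; the paper does not say which lighting model or material constants were used, and shadow tests are omitted here.

```python
def shade(normal, to_eye, lights, kd=0.7, ks=0.3, shininess=32):
    """Sum Lambert diffuse and Phong specular terms over all lights.

    `lights` is a list of (unit direction towards the light, intensity) pairs;
    `normal` and `to_eye` are unit vectors. Returns a scalar brightness.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    total = 0.0
    for to_light, intensity in lights:
        n_dot_l = dot(normal, to_light)
        if n_dot_l <= 0.0:
            continue                           # light is behind the surface
        # Mirror the light direction about the normal: r = 2 (n . l) n - l
        r = tuple(2.0 * n_dot_l * n - l for n, l in zip(normal, to_light))
        spec = max(dot(r, to_eye), 0.0) ** shininess
        total += intensity * (kd * n_dot_l + ks * spec)
    return total
```

With 3 light sources the loop body runs three times per shaded point, and a shadow ray per light would add three more intersection searches on top of that.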
For the CPU implementations an Intel Core2Quad Q6600 CPU was used,
and the Ray Tracing algorithm was run with 1 thread (Single Core) and
with 4 threads (Quad Core).
For the GPGPU implementations (single and multi), 2 NVidia 8800
Ultra GPUs mounted in the same PC and the NVidia CUDA toolkit were
used.
CUDA is a parallel programming model designed to overcome the
challenge of developing GPGPU applications that scale with the
increasing number of available cores.
CUDA uses three important abstractions: a hierarchy of thread
groups, shared memory and barrier synchronization.
This makes it possible to partition a given problem into coarse
sub-problems that can be processed independently in parallel and then
into finer sub-problems that can be solved cooperatively in parallel.
This partitioning allows the threads to cooperate in the solving of
each sub-problem and at the same time provides a transparent scalability
because each sub-problem can be scheduled to run on any of the available
processors. We split the workload across the CUDA threads as follows:
--The initial data is made available in the global memory of the GPUs
--The set of rays is given by a matrix of pixels that has the size
of the screen (Fig. 3)
--The matrix is split into sub-matrices of a fixed size (8, 16
or 32)
--Each sub-matrix is processed by a block of threads, and each
thread processes one element of this sub-matrix (a ray)
--For the Multi GPGPU implementation, GPU0 processes the top
half of the screen and GPU1 processes the bottom half of the screen
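The partitioning above can be sketched as plain index arithmetic, independent of any GPU API. The code assumes square tiles of `tile` x `tile` threads (the paper gives only the sizes 8, 16 or 32, not the block shape) and a screen height that divides evenly among the GPUs; `launch_layout` and `pixel_for_thread` are hypothetical helper names. In a CUDA kernel the same mapping is typically written as `blockIdx.x * blockDim.x + threadIdx.x` for each axis.

```python
def launch_layout(width=1024, height=768, tile=16, num_gpus=2):
    """Split the screen into horizontal bands (one per GPU) and each band into tiles."""
    band = height // num_gpus                  # rows per GPU; assumes even division
    blocks_x = (width + tile - 1) // tile      # ceiling division
    blocks_y = (band + tile - 1) // tile
    return [{"gpu": g, "first_row": g * band, "rows": band,
             "grid": (blocks_x, blocks_y), "block": (tile, tile)}
            for g in range(num_gpus)]

def pixel_for_thread(gpu, block_idx, thread_idx, tile=16, band=384):
    """Map (GPU, block index, thread index) to the screen pixel whose ray it traces."""
    x = block_idx[0] * tile + thread_idx[0]
    y = gpu * band + block_idx[1] * tile + thread_idx[1]
    return x, y
```

For the 1024x768 test scene with 16x16 tiles, each GPU would receive a band of 384 rows covered by a 64 x 24 grid of blocks.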
We have tested all the implementations; the results are listed in
Tab. 1.
The results show that Ray Tracing scales well when run in parallel:
the average frame rate rises from 8.967 fps on one GPU to 16.980 fps on
two (Tab. 1). It is therefore suitable for implementation not only as a
GPGPU program but also as a Multi GPGPU program.
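The scaling can be quantified directly from the average frame rates in Tab. 1. The short calculation below uses only those four published values; the dictionary keys and the `speedup` helper are names introduced here for illustration.

```python
# Average frame rates (fps) taken from Tab. 1.
avg_fps = {
    "cpu_1_thread": 2.083,
    "cpu_4_threads": 5.983,
    "gpu_x1": 8.967,
    "gpu_x2": 16.980,
}

def speedup(fast, slow):
    """Ratio of average frame rates between two implementations."""
    return avg_fps[fast] / avg_fps[slow]

multi_gpu_scaling = speedup("gpu_x2", "gpu_x1")       # about 1.89
multi_gpu_efficiency = multi_gpu_scaling / 2           # about 0.95 of ideal
gpu_vs_quad_core = speedup("gpu_x1", "cpu_4_threads")  # about 1.50
gpu_x2_vs_single_core = speedup("gpu_x2", "cpu_1_thread")  # about 8.15
```

Going from one GPU to two yields roughly 1.89 times the frame rate, that is, about 95% of the ideal factor of 2.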
[FIGURE 3 OMITTED]
4. CONCLUSIONS AND FUTURE WORK
We foresee that Ray Tracing will be used more and more in real-time
computer graphics applications, first through hybrid approaches and,
as hardware continues to evolve, eventually replacing the current
standard rendering techniques.
In terms of future scalability and Multi GPGPU usage, we also need
to consider the following:
--Current workstation motherboards can support up to 4 GPUs, and
this number might increase in the future
--Current GPUs can contain up to 480 streaming processors (NVidia
GTX295), and this number will continue to increase
--Consequently, current workstations can include nearly 2000
programmable GPU streaming processors (4 x 480 = 1920), and the next
generations might include up to hundreds of thousands of streaming
processors
As future work we should research how to optimize the memory
requirements of the CUDA threads, how to access the global memory pool
more efficiently and how to minimize the synchronization overhead
between GPUs.
We should also investigate whether several workstations can be
grouped in order to distribute the algorithm over several nodes on a
fast network.
5. REFERENCES
Breitbart J. (2008). Case studies on GPU usage and data structure
design. Dept. of Computer Science and Electrical Engineering,
Universität Kassel
Che S. et al. (2008). A Performance Study of General Purpose
Applications on Graphics Processors. Journal of Parallel and Distributed
Computing, Volume 68, Issue 10, Pages 1370-1380
Fatahalian K. and Houston M. (2008). A closer look at GPUs.
Communications of the ACM, ACM Press
Foley T. and Sugerman J. (2005). Kd-tree acceleration structures
for a GPU Raytracer. Proceedings of the ACM SIGGRAPH/EUROGRAPHICS
conference on Graphics hardware, ACM Press, Pages 15-22
Seiler L. et al. (2008). Larrabee: A Many-Core x86 Architecture
for Visual Computing. SIGGRAPH 2008
*** (2009) http://www.nvidia.com/object/cuda_develop.html - CUDA
Programming Guide, Accessed on: 2009-03-01
*** (2009) http://www.intel.com/products/processor/core2quad/ - Intel
Core2Quad Processor, Accessed on: 2009-03-15
*** (2009) http://www.q3rt.de, http://www.q4rt.de - Quake3 and
Quake4 RayTraced, Accessed on: 2009-03-15
Tab. 1. Frame rates (fps) obtained for each implementation
Method                                                      Min FPS  Max FPS  Average FPS
CPU Single Core, 1 Thread                                   1        3        2.083
CPU Quad Core, 4 Threads                                    5        7        5.983
NVidia 8800Ultra, 128 Processors, GPGPU using CUDA          8        10       8.967
2xNVidia 8800Ultra, 256 Processors, Multi-GPGPU using CUDA  16       18       16.980
Fig. 1. Current CPU and GPU architectures
Type  Processor               Cores/Chip  ALUs/Core
GPU   AMD Radeon HD 4870      10          80
GPU   NVIDIA GeForce GTX 280  30          8
CPU   Intel Core2Quad         4           8
CPU   Cell                    8           4
CPU   Sun UltraSparc          8           1