Highly scalable server for massive multi-player 3D virtual spaces based on multi-processor graphics cards.
Moldoveanu, Alin
1. INTRODUCTION
Massive multi-player online games and other types of virtual
spaces, like exhibitions, museums, virtual stores, etc., are already
attracting a huge number of users. The technology and its applications
are rapidly evolving and expanding. It is highly probable that 3D
virtual spaces will be the next generation of Internet interfaces,
replacing partially or totally the web browsers or instant messengers.
The architecture for such applications is client-server oriented,
with the server managing the whole content of the virtual world,
providing the connectivity and user interface for the clients
(Bogojevic, 2003).
An important problem that developers face is the amount of workload
such a server must handle. The work is characterized both by the number
of tasks, which depends directly on the number of online users, and by
the complexity of the tasks, which depends on the particularities of the
virtual world.
2. CURRENT SOLUTIONS
Due to the inherent real-time and fun-oriented aspects of virtual
worlds, which lead to complex clients and minimized communication,
scaling virtual worlds is quite different from scaling other types of
applications (Waldo, 2008).
Current solutions are traditionally designed around single-processor
machines, with very serious limitations on the number of online users a
computer can handle. To overcome this, clusters of servers are created,
each handling a distinct part of the virtual world (Nguyen, 2003;
Morillo, 2006). This solution is scalable to some degree but is very
costly in hardware and also involves a complicated software architecture
(Ferretti, 2005). This is the main reason why current high-end massive
multi-player applications (like MMORPGs) are extremely costly to develop
and also involve serious operating costs.
Almost all recently developed RAD toolkits and frameworks for creating
virtual worlds (www.multiverse.net; www.activeworlds.com;
www.openskies.net) still follow this traditional architectural approach.
Another solution, more of a workaround, is to impose limitations on
the content and interactions in the virtual world, thereby reducing as
much as possible the complexity of the tasks the server must handle.
Of course, in the end, this affects the quality and possibilities of
that virtual space.
3. OUR SOLUTION
In this paper we present a new architectural solution for a massive
multi-player server, one that uses the new multi-processor graphics
cards (GPUs) available on the market. Unlike previous solutions, the one
presented in this paper is inherently multiprocessor-based.
All NVIDIA GPU chipsets from the G8X series onwards, including the
GeForce, Quadro and Tesla lines, are targeted not only at the graphics
market but at the computing market as well (especially Tesla). Similar
solutions are offered by competitors such as ATI. Each such GPU includes
a large number of streaming processors (on the order of hundreds)
organized in a SIMD architecture that can be programmed for
general-purpose computing. Such a GPU can therefore simultaneously run
many threads, each accessing a different area of GPU memory and
following its own execution path, as in the sketch below. Many
applications can get a huge performance and scalability boost from this
architecture if their design can accommodate it, and we decided to try
it as a way to overcome the limitations of current massive multi-player
servers.
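To make the execution model concrete, here is a minimal CUDA sketch (our
illustration, not code from an existing server): many threads run the
same kernel, each updating its own element of GPU memory.

    // Each thread integrates the position of one object over a time step dt.
    __global__ void integrate(float* pos, const float* vel, int n, float dt)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique thread index
        if (i < n)
            pos[i] += vel[i] * dt;    // independent work on independent data
    }

    // Host side: launch enough 256-thread blocks to cover all n objects.
    // integrate<<<(n + 255) / 256, 256>>>(d_pos, d_vel, n, dt);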
The complete architecture of a massive multi-player server is quite
complex and beyond the scope of this article. We focus here on our
solution of using GPU multiprocessing in such a server.
Besides the original concept of using the new GPUs' multiprocessing
power for the massive multi-player server, the challenge is to find an
efficient way to distribute the workload over the streaming processors
included in the GPU. To obtain high scalability and speed, several
aspects must be considered:
* Distribution/coordination, which is mainly directed by the CPU,
must be extremely simple and fast.
* The granularity/complexity of individual tasks must be kept low
in order to obtain good client-server latency.
* The distribution of the workload over the streaming processors
must adapt in real time to the number of online users or the density of
users in a region of the virtual world.
* The tasks and their coordination benefit from, but are also in
some ways limited by, details of the GPU architecture; for example:
** fast shared memory;
** the bus bandwidth and latency between the CPU and the GPU may be
a bottleneck;
** recursive functions are not supported and must be converted to
loops (see the sketch after this list).
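As an illustration of the last point, the following sketch (hypothetical
names, simplified one-dimensional bounds) shows a recursive spatial-tree
query rewritten with an explicit stack so it can run as a CUDA device
function:

    struct Node { int left, right; float minX, maxX; }; // left < 0 marks a leaf

    __device__ bool overlaps(const Node& n, float x)
    {
        return n.minX <= x && x <= n.maxX;
    }

    // Counts the leaves whose bounds contain x; the explicit stack
    // replaces the recursion that the GPU does not support.
    __device__ int countHits(const Node* nodes, int root, float x)
    {
        int stack[64];              // fixed-depth stack instead of recursion
        int top = 0, hits = 0;
        stack[top++] = root;
        while (top > 0) {
            int i = stack[--top];
            if (!overlaps(nodes[i], x)) continue;
            if (nodes[i].left < 0) { ++hits; continue; }  // leaf node
            stack[top++] = nodes[i].left;                 // push children
            stack[top++] = nodes[i].right;
        }
        return hits;
    }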
We decided to move the computationally expensive tasks onto the GPU
streaming processors. Basically, these are the world-logic operations
such as collision detection, physics calculations, etc.
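As a minimal sketch of such a task (our simplification: objects reduced
to bounding spheres, names hypothetical), one thread can test one pair
of moving objects for collision:

    struct Sphere { float x, y, z, r; };  // bounding sphere of one object

    __global__ void collidePairs(const Sphere* objs, const int2* pairs,
                                 int numPairs, int* collided)
    {
        // Grid-stride loop: correct for any task count, whatever the
        // size of the block group a region has received.
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < numPairs;
             i += gridDim.x * blockDim.x) {
            Sphere a = objs[pairs[i].x];
            Sphere b = objs[pairs[i].y];
            float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
            float rr = a.r + b.r;
            collided[i] = (dx * dx + dy * dy + dz * dz <= rr * rr) ? 1 : 0;
        }
    }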
Task division:
* The work division rules rely on the fact that a user will mostly
interact with the environment, entities and other users in his proximity
or in a certain region.
* The world will be divided into regions (a minimal data-layout
sketch follows this list):
** regions result from a logical, server-side division of the
virtual space;
** a region contains the essential information about the
environment, objects and users within;
** a region receives a group of streaming processors from the GPU;
more or fewer streaming processors are dynamically assigned to a region
based on its computational needs, which depend mostly on the number of
users within;
** the division of the world into regions is mostly static, since
any re-division may require time-consuming memory transfers and must
therefore be avoided for real-time performance;
** when a user or object moves from one region to another, its
essential data is moved between regions;
** for runtime smoothness and to create the effect of a seamless
world, regions should partially overlap at their common boundaries, so
that users moving between such regions do so in a transparent manner.
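A minimal sketch of the per-region bookkeeping implied above (the field
names and capacities are our assumptions, not taken from the prototype):

    #define MAX_USERS_PER_REGION   256
    #define MAX_OBJECTS_PER_REGION 1024

    struct Region {
        // essential information about the users and objects within
        int   userIds[MAX_USERS_PER_REGION];
        int   numUsers;
        int   objectIds[MAX_OBJECTS_PER_REGION];
        int   numObjects;
        // group of streaming processors (expressed as CUDA blocks)
        // currently assigned to this region; adjusted dynamically
        // with the user count
        int   firstBlock;
        int   numBlocks;
        // overlap margin with neighbouring regions for seamless transitions
        float overlap;
    };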
* The server-side workflow is divided into frames. A frame is a
very short amount of time which, as a design choice, can be fixed or
variable (a host-side sketch of the frame loop follows this list).
During a frame:
** user input is taken from the input queues and passed to the
central GPU control code, which will pass it to the current user's
region;
** the logic for a region is responsible for creating the
computational tasks for that region, which are delivered to the
processors assigned to that region; for example, for collision
detection, a task is created for each pair of moving objects;
** the streaming processors will execute the individual tasks;
** once all the tasks finish (in the case of variable-length
frames) or the frame duration has elapsed (for the fixed-frame design),
results are placed into output queues.
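A minimal host-side sketch of one variable-length frame, reusing the
Region, Sphere and collidePairs sketches above (drainInputQueues,
buildTasks and pushResults are hypothetical helpers, assumed implemented
elsewhere in the server; device buffers are assumed sized for the
largest region, allocation omitted):

    #include <vector>
    #include <cuda_runtime.h>

    void drainInputQueues(std::vector<Region>& regions); // input -> regions
    std::vector<int2> buildTasks(const Region& r);       // e.g. moving pairs
    void pushResults(const Region& r, const int* hits, int n);

    void runFrame(std::vector<Region>& regions, const Sphere* d_objs,
                  int2* d_pairs, int* d_collided)
    {
        drainInputQueues(regions);
        std::vector<int> hits;
        for (const Region& r : regions) {
            std::vector<int2> pairs = buildTasks(r);  // tasks for this region
            int n = (int)pairs.size();
            if (n == 0) continue;
            cudaMemcpy(d_pairs, pairs.data(), n * sizeof(int2),
                       cudaMemcpyHostToDevice);
            // launch on the block group assigned to this region
            collidePairs<<<r.numBlocks, 256>>>(d_objs, d_pairs, n, d_collided);
            hits.resize(n);
            cudaMemcpy(hits.data(), d_collided, n * sizeof(int),
                       cudaMemcpyDeviceToHost);       // waits for the kernel
            pushResults(r, hits.data(), n);           // -> output queues
        }
    }

In a real server the regions would be processed concurrently (for
instance with CUDA streams); the sequential loop only keeps the sketch
short.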
The solution is adaptive, as the number of processors assigned to a
region can be changed dynamically, as needed. If more users move into a
region, that region will receive more processors, while regions that
users have left will lose some.
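A minimal sketch of such a reassignment (our illustration; rounding and
the one-block minimum would need clamping in a real server): block
groups are redistributed among regions in proportion to their user
counts.

    #include <vector>

    void rebalance(std::vector<Region>& regions, int totalBlocks)
    {
        int totalUsers = 0;
        for (const Region& r : regions) totalUsers += r.numUsers;
        int next = 0;
        for (Region& r : regions) {
            // proportional share, with at least one block per region
            int share = totalUsers > 0
                        ? (totalBlocks * r.numUsers) / totalUsers : 1;
            r.numBlocks  = share > 0 ? share : 1;
            r.firstBlock = next;
            next += r.numBlocks;
        }
    }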
We have tested the proposed solution by creating a server prototype
during a workshop at the Computer Graphics laboratory of University
POLITEHNICA of Bucharest. The architecture was not difficult to
implement using the CUDA toolkit (http://www.nvidia.com/cuda;
http://en.wikipedia.org/wiki/CUDA).
The test application involved complex physical interactions in a 3D
environment. The tests showed that system performance does not degrade
much as the number of users and objects increases, at least to the
extent we were able to test in the laboratory. The GPUs' streaming
processors were able to handle a huge amount of computation-intensive
work.
4. RESULTS AND CONSEQUENCES
Since all the computationally expensive tasks that can be
parallelized can potentially be moved in this way to the multiprocessor
GPUs, a single computer with such devices can handle a much larger
number of users, resulting in a huge reduction of costs.
The software architecture is much simpler than in the case of the
classical, multi-computer server-farm approach.
As far as scalability is concerned, we have to consider that:
* Current GPUs can contain up to 256 streaming processors (NVIDIA),
and future versions will certainly increase this number considerably.
* Current PC mainboards can support up to 4 GPUs; this number might
also increase in the future.
* Hence, a current PC can include up to 1024 programmable GPU
streaming processors, and coming generations may include thousands or
even millions.
Limitations can occur even with this architecture.
For some applications there might be computing-intensive tasks that
do not map well onto the GPUs' SIMD processing.
Even outside this case, a single PC might not be enough for a
server with a huge number of users. The full virtual-world coordination
and other logical processing might become expensive enough that a single
PC cannot handle it. The architecture can then be extended with the
classical multi-computer approach, but the number of computers needed
will still be significantly lower, probably by an order of magnitude,
because most of the computationally expensive tasks are delivered to the
GPUs' thousands of streaming processors.
If the proposed architectural solution proves successful, we may
see, in the near future, the server farms of tens or hundreds of
computers that run MMORPGs being replaced by significantly smaller
groups of computers with multiprocessor GPUs.
5. CONCLUSION AND FUTURE WORK
We must still investigate the maximum number of users that a single
computer with multiple multiprocessor GPUs can handle, for various types
of games and other 3D virtual spaces.
Slightly different solutions regarding the workload distribution
may be needed depending on the type of MMO game/application.
If the results are good enough, a RAD framework or toolkit for
MMO virtual 3D spaces will be developed based on the architecture
described here, aiming to allow developers to create, deploy and operate
custom 3D virtual spaces at very low cost.
6. REFERENCES
Bogojevic, S.; Kazemzadeh, M. (2003). The Architecture of Massive
Multiplayer Online Games, Available from:
http://graphics.cs.lth.se/theses/projects/mmogarch/, Accessed: 2008-07-01
Ferretti, S. (2005). Interactivity Maintenance for Event
Synchronization in Massive Multiplayer Online Games. Technical Report
UBLCS-2005-05, March 2005
Morillo, P.; Orduna, J.; Fernandez, M. (2006). Workload
Characterization in Multiplayer Online Games. ICCSA (1) 2006, pp 490-499
Ta Nguyen Binh Duong; Suiping Zhou (2003). A dynamic load sharing
algorithm for massively multiplayer online games. Proceedings of the
11th IEEE International Conference on Networks (ICON 2003), pp. 131-136
Waldo, J. (2008). Scaling in Games and Virtual Worlds.
Communications of the ACM, Vol. 51, No. 8, August 2008, pp. 38-44
Multiverse Platform Architecture. Available from:
http://update.multiverse.net/wiki/index.php, Accessed: 2008-08-21
ActiveWorlds, Available from: http://www.activeworlds.com/,
Accessed: 2008-08-21
Openskies MMOG SDK, Available from: www.openskies.net, Accessed:
2008-08-21