Highly scalable server for massive multi-player 3D virtual spaces based on multi-processor graphics cards.
Moldoveanu, Alin
1. INTRODUCTION
Massive multi-player online games and other types of virtual
spaces, like exhibitions, museums, virtual stores, etc., are already
attracting a huge number of users. The technology and its applications
are rapidly evolving and expanding. It is highly probable that 3D
virtual spaces will be the next generation of Internet interfaces,
replacing partially or totally the web browsers or instant messengers.
The architecture for such applications is client-server oriented,
with the server managing the whole content of the virtual world,
providing the connectivity and user interface for the clients
(Bogojevic, 2003).
An important problem that developers face is the amount of workload
such a server must handle. The work is characterized both by the number
of tasks, which depends directly on the number of online users, and by
the complexity of the tasks, which depends on the particularities of the
virtual world.
2. CURRENT SOLUTIONS
Due to the inherent real-time and fun-oriented aspects of virtual
worlds, which lead to complex clients and minimized communication,
scaling virtual worlds is quite different from scaling other types of
applications (Waldo, 2008).
Current solutions are traditionally designed around single-processor
machines, with very serious limitations on the number of online users a
computer can handle. To overcome this, clusters of servers are created,
each handling a distinct part of the virtual world (Nguyen, 2003;
Morillo, 2006). This solution is scalable to some degree but is very
costly in hardware and also involves a complicated software architecture
(Ferretti, 2005). This is the main reason why current high-end massive
multi-player applications (like MMORPGs) are extremely costly to develop
and also involve serious operating costs.
Almost all recently developed RAD toolkits and frameworks for creating
virtual worlds (www.multiverse.net; www.activeworlds.com;
www.openskies.net) still follow this traditional architectural approach.
Another solution, more of a workaround, is to impose limitations on
the content and interactions in the virtual world, thereby reducing as
much as possible the complexity of the tasks the server must handle.
Of course, in the end, this affects the quality and possibilities of
that virtual space.
3. OUR SOLUTION
In this paper we present a new architectural solution for a massive
multi-player server, one that uses the new multi-processor graphics
cards (GPUs) available on the market. Unlike previous solutions, the one
presented in this paper is inherently multiprocessor-based.
All NVIDIA GPU chipsets from the G8X series onwards, including the
GeForce, Quadro and Tesla lines, are targeted not only at the graphics
market but at the computing market as well (especially Tesla). Similar
solutions are offered by competitors such as ATI. Each such GPU includes
a large number of streaming processors (on the order of hundreds)
organized in a SIMD architecture that can be programmed for
general-purpose computing. Such a GPU can therefore simultaneously run
many threads, each accessing a different area of GPU memory and
following its own execution path, as in the sketch below. Many
applications can get a huge performance and scalability boost from this
architecture if their design can accommodate it, and we decided to try
it as a way to overcome the limitations of current massive multi-player
servers.
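To make the execution model concrete, here is a minimal CUDA sketch (our
illustration, not code from an existing server): many threads run the
same kernel, each updating its own element of GPU memory.

    // Each thread integrates the position of one object over a time step dt.
    __global__ void integrate(float* pos, const float* vel, int n, float dt)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique thread index
        if (i < n)
            pos[i] += vel[i] * dt;    // independent work on independent data
    }

    // Host side: launch enough 256-thread blocks to cover all n objects.
    // integrate<<<(n + 255) / 256, 256>>>(d_pos, d_vel, n, dt);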
The complete architecture of a massive multi-player server is quite
complex and beyond the scope of this article. We focus here on our
solution of using GPU multiprocessing in such a server.
Besides the original concept of using the new GPUs' multiprocessing
power for the massive multi-player server, the challenge is to find an
efficient way to distribute the workload over the streaming processors
included in the GPU. To obtain high scalability and speed, several
aspects must be considered:
* Distribution/coordination, which is mainly directed by the CPU,
must be extremely simple and fast.
* The granularity/complexity of individual tasks must be kept low
in order to obtain good client-server latency.
* The distribution of the workload over the streaming processors
must adapt in real time to the number of online users or the density of
users in a region of the virtual world.
* The tasks and their coordination benefit from, but are also in
some ways limited by, details of the GPU architecture; for example:
** fast shared memory;
** the bus bandwidth and latency between the CPU and the GPU may be
a bottleneck;
** recursive functions are not supported and must be converted to
loops (see the sketch after this list).
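As an illustration of the last point, the following sketch (hypothetical
names, simplified one-dimensional bounds) shows a recursive spatial-tree
query rewritten with an explicit stack so it can run as a CUDA device
function:

    struct Node { int left, right; float minX, maxX; }; // left < 0 marks a leaf

    __device__ bool overlaps(const Node& n, float x)
    {
        return n.minX <= x && x <= n.maxX;
    }

    // Counts the leaves whose bounds contain x; the explicit stack
    // replaces the recursion that the GPU does not support.
    __device__ int countHits(const Node* nodes, int root, float x)
    {
        int stack[64];              // fixed-depth stack instead of recursion
        int top = 0, hits = 0;
        stack[top++] = root;
        while (top > 0) {
            int i = stack[--top];
            if (!overlaps(nodes[i], x)) continue;
            if (nodes[i].left < 0) { ++hits; continue; }  // leaf node
            stack[top++] = nodes[i].left;                 // push children
            stack[top++] = nodes[i].right;
        }
        return hits;
    }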
We decided to move the computationally expensive tasks onto the GPU
streaming processors. Basically, these are the world-logic operations
such as collision detection, physics calculations, etc.
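As a minimal sketch of such a task (our simplification: objects reduced
to bounding spheres, names hypothetical), one thread can test one pair
of moving objects for collision:

    struct Sphere { float x, y, z, r; };  // bounding sphere of one object

    __global__ void collidePairs(const Sphere* objs, const int2* pairs,
                                 int numPairs, int* collided)
    {
        // Grid-stride loop: correct for any task count, whatever the
        // size of the block group a region has received.
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < numPairs;
             i += gridDim.x * blockDim.x) {
            Sphere a = objs[pairs[i].x];
            Sphere b = objs[pairs[i].y];
            float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
            float rr = a.r + b.r;
            collided[i] = (dx * dx + dy * dy + dz * dz <= rr * rr) ? 1 : 0;
        }
    }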
Task division:
* The work division rules rely on the fact that a user will mostly
interact with the environment, entities and other users in his proximity
or in a certain region.
* The world will be divided into regions (a minimal data-layout
sketch follows this list):
** regions result from a logical, server-side division of the
virtual space;
** a region contains the essential information about the
environment, objects and users within;
** a region receives a group of streaming processors from the GPU;
more or fewer streaming processors are dynamically assigned to a region
based on its computational needs, which depend mostly on the number of
users within;
** the division of the world into regions is mostly static, since
any re-division may require time-consuming memory transfers and must
therefore be avoided for real-time performance;
** when a user or object moves from one region to another, its
essential data is moved between regions;
** for runtime smoothness and to create the effect of a seamless
world, regions should partially overlap at their common boundaries, so
that users moving between such regions do so in a transparent manner.
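A minimal sketch of the per-region bookkeeping implied above (the field
names and capacities are our assumptions, not taken from the prototype):

    #define MAX_USERS_PER_REGION   256
    #define MAX_OBJECTS_PER_REGION 1024

    struct Region {
        // essential information about the users and objects within
        int   userIds[MAX_USERS_PER_REGION];
        int   numUsers;
        int   objectIds[MAX_OBJECTS_PER_REGION];
        int   numObjects;
        // group of streaming processors (expressed as CUDA blocks)
        // currently assigned to this region; adjusted dynamically
        // with the user count
        int   firstBlock;
        int   numBlocks;
        // overlap margin with neighbouring regions for seamless transitions
        float overlap;
    };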
* The server-side workflow is divided into frames. A frame is a
very short amount of time which, as a design choice, can be fixed or
variable (a host-side sketch of the frame loop follows this list).
During a frame:
** user input is taken from the input queues and passed to the
central GPU control code, which will pass it to the current user's
region;
** the logic for a region is responsible for creating the
computational tasks for that region, which are delivered to the
processors assigned to that region; for example, for collision
detection, a task is created for each pair of moving objects;
** the streaming processors will execute the individual tasks;
** once all the tasks finish (in the case of variable-length
frames) or the frame duration has elapsed (for the fixed-frame design),
results are placed into output queues.
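A minimal host-side sketch of one variable-length frame, reusing the
Region, Sphere and collidePairs sketches above (drainInputQueues,
buildTasks and pushResults are hypothetical helpers, assumed implemented
elsewhere in the server; device buffers are assumed sized for the
largest region, allocation omitted):

    #include <vector>
    #include <cuda_runtime.h>

    void drainInputQueues(std::vector<Region>& regions); // input -> regions
    std::vector<int2> buildTasks(const Region& r);       // e.g. moving pairs
    void pushResults(const Region& r, const int* hits, int n);

    void runFrame(std::vector<Region>& regions, const Sphere* d_objs,
                  int2* d_pairs, int* d_collided)
    {
        drainInputQueues(regions);
        std::vector<int> hits;
        for (const Region& r : regions) {
            std::vector<int2> pairs = buildTasks(r);  // tasks for this region
            int n = (int)pairs.size();
            if (n == 0) continue;
            cudaMemcpy(d_pairs, pairs.data(), n * sizeof(int2),
                       cudaMemcpyHostToDevice);
            // launch on the block group assigned to this region
            collidePairs<<<r.numBlocks, 256>>>(d_objs, d_pairs, n, d_collided);
            hits.resize(n);
            cudaMemcpy(hits.data(), d_collided, n * sizeof(int),
                       cudaMemcpyDeviceToHost);       // waits for the kernel
            pushResults(r, hits.data(), n);           // -> output queues
        }
    }

In a real server the regions would be processed concurrently (for
instance with CUDA streams); the sequential loop only keeps the sketch
short.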
The solution is adaptive, as the number of processors assigned to a
region can be changed dynamically, as needed. If more users move into a
region, that region will receive more processors, while regions that
users have left will lose some.
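A minimal sketch of such a reassignment (our illustration; rounding and
the one-block minimum would need clamping in a real server): block
groups are redistributed among regions in proportion to their user
counts.

    #include <vector>

    void rebalance(std::vector<Region>& regions, int totalBlocks)
    {
        int totalUsers = 0;
        for (const Region& r : regions) totalUsers += r.numUsers;
        int next = 0;
        for (Region& r : regions) {
            // proportional share, with at least one block per region
            int share = totalUsers > 0
                        ? (totalBlocks * r.numUsers) / totalUsers : 1;
            r.numBlocks  = share > 0 ? share : 1;
            r.firstBlock = next;
            next += r.numBlocks;
        }
    }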
We have tested the proposed solution by creating a server prototype
during a workshop at the Computer Graphics laboratory of University
POLITEHNICA of Bucharest. The architecture was not difficult to
implement using the CUDA toolkit (http://www.nvidia.com/cuda;
http://en.wikipedia.org/wiki/CUDA).
The test application involved complex physical interactions in a 3D
environment. The tests showed that system performance does not degrade
much as the number of users and objects increases, at least to the
extent we were able to test in the laboratory. The GPUs' streaming
processors were able to handle a huge amount of computation-intensive
work.
4. RESULTS AND CONSEQUENCES
Since all the computationally expensive tasks that can be
parallelized can potentially be moved in this way to the multiprocessor
GPUs, a single computer with such devices can handle a much larger
number of users, resulting in a huge reduction of costs.
The software architecture is much simpler than in the case of the
classical, multi-computer server-farm approach.
As far as scalability is concerned, we have to consider that:
* Current GPUs can contain up to 256 streaming processors (NVIDIA),
and future versions will certainly increase this number considerably.
* Current PC mainboards can support up to 4 GPUs; this number might
also increase in the future.
* Hence, a current PC can include up to 1024 programmable GPU
streaming processors, and coming generations may include thousands or
even millions.
Limitations can occur even with this architecture.
For some applications there might be computing-intensive tasks that
do not map well onto the GPUs' SIMD processing.
Even outside this case, a single PC might not be enough for a
server with a huge number of users. The full virtual-world coordination
and other logical processing might become expensive enough that a single
PC cannot handle it. The architecture can then be extended with the
classical multi-computer approach, but the number of computers needed
will still be significantly lower, probably by an order of magnitude,
because most of the computationally expensive tasks are delivered to the
GPUs' thousands of streaming processors.
If the proposed architectural solution proves successful, we may
see, in the near future, the server farms of tens or hundreds of
computers that run MMORPGs being replaced by significantly smaller
groups of computers with multiprocessor GPUs.
5. CONCLUSION AND FUTURE WORK
We must still investigate the maximum number of users that a single
computer with multiple multiprocessor GPUs can handle, for various types
of games and other 3D virtual spaces.
Slightly different solutions regarding the workload distribution
may be needed depending on the type of MMO game/application.
If the results are good enough, a RAD framework or toolkit for
MMO virtual 3D spaces will be developed based on the architecture
described here, aiming to allow developers to create, deploy and operate
custom 3D virtual spaces at very low cost.
6. REFERENCES
Bogojevic, S.; Kazemzadeh, M. (2003). The Architecture of Massive
Multiplayer Online Games, Available from:
http://graphics.cs.lth.se/theses/projects/mmogarch/, Accessed: 2008-07-01
Ferretti, S. (2005). Interactivity Maintenance for Event
Synchronization in Massive Multiplayer Online Games. Technical Report
UBLCS-2005-05, March 2005
Morillo, P.; Orduna, J.; Fernandez, M. (2006). Workload
Characterization in Multiplayer Online Games. ICCSA (1) 2006, pp 490-499
Ta Nguyen Binh Duong; Suiping Zhou (2003). A dynamic load sharing
algorithm for massively multiplayer online games. Proceedings of the
11th IEEE International Conference on Networks (ICON 2003), pp. 131-136
Waldo, J. (2008). Scaling in Games and Virtual Worlds.
Communications of the ACM, Vol. 51, No. 8, August 2008, pp. 38-44
Multiverse Platform Architecture. Available from:
http://update.multiverse.net/wiki/index.php, Accessed: 2008-08-21
ActiveWorlds, Available from: http://www.activeworlds.com/,
Accessed: 2008-08-21
Openskies MMOG SDK, Available from: www.openskies.net, Accessed:
2008-08-21