An architecture for distributed databases on workstations.
Boicea, Alexandru; Magdalina, Catalin; Ionescu, Diana-Cristina et al.
Abstract: Many recent research projects and reports on distributed
database systems (DDBS) deal with performance. Organizational aspects,
such as the distribution of control, or the management of data in an
enterprise, rarely influence the design of these DDBS. Furthermore,
architectural proposals that exploit the special properties of
workstations, which are largely independent but connected by a local
area network, are lacking. In this paper, a simple and flexible system
architecture is proposed. The proposal is tailored to the workstation
environment and covers both organizational and technical aspects of
data distribution, replication and integrity.
Key words: distributed databases, architecture, workstation,
performance
1. INTRODUCTION
The goal of this paper is the development of a simple but
powerful architecture for database services in a LAN environment with
workstations. The architecture emphasizes organizational aspects in
addition to the technical discussion (Darabant, 2009).
An attractive working environment for workstations includes
different services offered via LAN. Besides laser printers, mass storage
media and electronic mail, collections of structured data, i.e.
databases, should also be available to the user community. For such a
database service to be practically employed, it must be attractive to
users. This entails providing access to additional information from
others without significantly impeding operations on one's own data
which are made accessible to others.
2. SERVER ARCHITECTURE
Three main components can be distinguished in figure 1: SERDS is
the DBMS server, SNC implements database specific protocols on top of
the general LAN software on Lilith, called CIPON, and SAM is the main
program coordinating all the activities on the server. A short summary
of each component follows.
[FIGURE 1 OMITTED]
SERDS (server relational data system) is a multi user DBMS which
was developed starting from RDS. Similarly to RDS, SERDS offers an
interface which allows navigational operations on single tuples of a
relation. The fact that SERDS is used as a backend system only somewhat
simplified its architecture and interface. The main change regarding RDS
was the removal of the implicit dependency from the single user
situation, in order to allow SAM to control several concurrent
transactions. This included delegating the maintenance of current
positions in relations to the clients and implementing an
operation-oriented log instead of page-level logging.
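To illustrate the idea of an operation-oriented log, the following Python sketch records each tuple operation as a self-describing entry that can be undone individually, instead of logging page images. The record layout, field names and undo callback are assumptions made for illustration; they do not reflect the actual SERDS log format.

    # Sketch of an operation-oriented log; layout and names are illustrative.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class LogRecord:
        txn_id: int                     # transaction issuing the operation
        relation: str                   # relation the tuple belongs to
        op: str                         # "insert", "update" or "delete"
        key: tuple                      # primary key of the affected tuple
        before: Optional[dict] = None   # old attribute values (for undo)
        after: Optional[dict] = None    # new attribute values (for redo)

    class OperationLog:
        """Append-only log of tuple-level operations."""
        def __init__(self):
            self._records: list[LogRecord] = []

        def append(self, rec: LogRecord) -> None:
            self._records.append(rec)

        def undo(self, txn_id: int, apply) -> None:
            """Roll back one transaction by replaying inverse operations."""
            for rec in reversed(self._records):
                if rec.txn_id != txn_id:
                    continue
                if rec.op == "insert":
                    apply("delete", rec.relation, rec.key, None)
                elif rec.op == "delete":
                    apply("insert", rec.relation, rec.key, rec.before)
                else:  # update
                    apply("update", rec.relation, rec.key, rec.before)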
SNC (server network controller) implements a protocol which allows
the server to receive requests and transmit answers. SNC is based on
CIPON which offers a connection-oriented protocol to send and receive
fixed length messages.
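As an illustration of how variable-length requests and answers can be carried over a connection that transports only fixed-length messages, the following sketch prefixes each request with its length and pads it to whole frames. The frame size, the padding scheme and the use of a plain TCP socket are assumptions made for illustration; CIPON itself is not modelled.

    # Sketch of framing variable-length data over fixed-length messages.
    import socket

    FRAME_SIZE = 128  # assumed fixed message length

    def send_request(sock: socket.socket, payload: bytes) -> None:
        # Prefix the payload with its length, then pad to whole frames.
        data = len(payload).to_bytes(4, "big") + payload
        if len(data) % FRAME_SIZE:
            data += b"\x00" * (FRAME_SIZE - len(data) % FRAME_SIZE)
        for i in range(0, len(data), FRAME_SIZE):
            sock.sendall(data[i:i + FRAME_SIZE])

    def recv_answer(sock: socket.socket) -> bytes:
        first = _recv_frame(sock)
        length = int.from_bytes(first[:4], "big")
        data = first[4:]
        while len(data) < length:
            data += _recv_frame(sock)
        return data[:length]

    def _recv_frame(sock: socket.socket) -> bytes:
        buf = b""
        while len(buf) < FRAME_SIZE:
            chunk = sock.recv(FRAME_SIZE - len(buf))
            if not chunk:
                raise ConnectionError("connection closed mid-frame")
            buf += chunk
        return buf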
SAM (server action manager) is the main program on the server. As
such, its main task consists of scheduling the requests received from
SNC for execution by SERDS and of passing back the corresponding
answers (a sketch of this dispatch loop follows the list below). Three
special tasks are included in SAM but implemented in separate modules:
* authorization enforcement uses the user authorizations recorded in
the meta database to check whether incoming requests are legal;
* refresh control handles requests to refresh duplicates residing
on the server and delivers status information to clients, allowing them
to decide on the necessity and urgency of a refresh;
* concurrency control.
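The following sketch outlines a request-dispatch loop of the kind described above: requests arriving from the network layer are checked by the authorization filter, executed by the DBMS backend, and their answers are returned to the client. The request fields and the injected handlers are illustrative assumptions, not the actual SAM interfaces.

    # Sketch of a server dispatch loop; all names are illustrative.
    from queue import Queue
    from dataclasses import dataclass

    @dataclass
    class Request:
        client_id: int
        txn_id: int
        operation: str      # e.g. "read", "update", "refresh_status"
        payload: dict

    def serve(incoming: Queue, send_answer, authorize, execute) -> None:
        """Take requests from the network layer, filter them through the
        authorization check, hand them to the DBMS backend, return answers."""
        while True:
            req = incoming.get()
            if req is None:          # shutdown sentinel
                break
            if not authorize(req):
                send_answer(req.client_id, {"status": "denied"})
                continue
            result = execute(req)    # performed by the DBMS backend
            send_answer(req.client_id, {"status": "ok", "result": result})

Keeping the authorization and execution steps as injected handlers mirrors the modular structure described above.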
3. CLIENT ARCHITECTURE
[FIGURE 2 OMITTED]
The server interface component was split into two parts: CNC (client network controller) implements the protocols needed for the
exchange of requests and answers, but does not care about the content of
the messages transmitted and received. It is the client-side
counterpart of SNC on the server. ASK (Aranea server konnection)
translates database-oriented procedure calls into messages that can be
transmitted by CNC and unwraps the responses. It thus constitutes the
client-side counterpart of SAM. The interface offered by ASK must be
functionally identical to that of the client DBMS. To this end, ASK
has to fulfill some functions which were delegated from the server, such
as maintaining current positions in relations.
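A minimal sketch of such a client-side stub is given below: procedure calls on relations are wrapped into messages for the network layer, and the current position in each relation is kept on the client, as delegated from the server. The message format and method names are assumptions made for illustration.

    # Sketch of a client-side stub in the spirit of ASK; names are illustrative.
    import json

    class ServerStub:
        def __init__(self, send, recv):
            self._send = send          # callable taking bytes (network layer)
            self._recv = recv          # callable returning bytes
            self._positions: dict[str, int] = {}   # current position per relation

        def _call(self, message: dict) -> dict:
            self._send(json.dumps(message).encode())
            return json.loads(self._recv().decode())

        def next_tuple(self, relation: str):
            """Navigational access: fetch the tuple after the current position."""
            pos = self._positions.get(relation, 0)
            answer = self._call({"op": "read_at", "relation": relation, "pos": pos})
            if answer.get("tuple") is not None:
                self._positions[relation] = pos + 1
            return answer.get("tuple")

        def reset(self, relation: str) -> None:
            """Restart scanning the relation from the beginning."""
            self._positions[relation] = 0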
The decomposition and integration component is implemented by ARES.
Its main task consists of mapping the different data descriptions and
procedures of ASK and RDS onto a uniform interface. It was decided that
the data currency should be shown in this interface (but not the
physical location of the data). Thus, a user can specify whether he
wants to access an original or a duplicate, but he does not need to care
where the data are physically allocated (on the server or locally). To
support autonomous operation of user workstations, the default is always
the local copy of some data (if one exists at all), be it an original or
a duplicate. This allows running as many applications as possible even
if the server is not accessible. Another aspect of ARES is refresh
handling.
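The routing idea can be sketched as follows: the caller names the relation and the desired currency (original or duplicate) but not the physical location, and the default prefers a local copy so that work can continue when the server is unreachable. The catalog structure and class names are illustrative assumptions.

    # Sketch of currency-aware, location-transparent routing; names illustrative.
    class UniformAccess:
        def __init__(self, local_dbms, server_stub, catalog):
            self._local = local_dbms    # local backend (RDS-like)
            self._server = server_stub  # server stub (ASK-like)
            # catalog maps relation -> {"local": bool, "original_here": bool}
            self._catalog = catalog

        def open(self, relation: str, currency: str = "default"):
            info = self._catalog[relation]
            if currency == "original" and not info["original_here"]:
                return self._server      # the original lives on the server
            if currency == "duplicate" and not info["local"]:
                return self._server      # no local duplicate available
            if info["local"]:
                return self._local       # default: use the local copy
            return self._server          # fall back to the server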
Application programs (implementing predefined transactions to be
executed by users who only fill in some parameters) can be written by
using the ARES interface only. However, specifying complex queries with
single-tuple operations is quite intricate, which is why a relational
algebra interface component (April) is implemented. At least for
updates, however, direct access to ARES is still necessary. Finally,
Adidas (Aranea distributed database system) is an interactive
window-oriented user interface to April and ARES. It includes, in a
similar way as on the server, a graphical representation of the software
architecture indicating the component currently at work.
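To illustrate how relational-algebra operators can be built on top of single-tuple operations, the following sketch implements selection and projection over a simple navigational cursor. The cursor interface is an assumption introduced only for this example.

    # Sketch of relational-algebra operators over a tuple-at-a-time cursor.
    class ListCursor:
        """Tiny stand-in for a navigational interface over one relation."""
        def __init__(self, tuples):
            self._tuples, self._pos = list(tuples), 0
        def reset(self):
            self._pos = 0
        def next_tuple(self):
            if self._pos >= len(self._tuples):
                return None
            t = self._tuples[self._pos]
            self._pos += 1
            return t

    def select(cursor, predicate):
        """Selection: scan tuple by tuple, keep matching rows."""
        cursor.reset()
        out = []
        while (t := cursor.next_tuple()) is not None:
            if predicate(t):
                out.append(t)
        return out

    def project(tuples, attributes):
        """Projection (without duplicate elimination)."""
        return [{a: t[a] for a in attributes} for t in tuples]

    # Example: employees earning more than 5000, showing only their names.
    emp = ListCursor([{"name": "A", "salary": 4000}, {"name": "B", "salary": 6000}])
    print(project(select(emp, lambda t: t["salary"] > 5000), ["name"]))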
4. IMPORTANT CONCEPTS
After this general description of the software architecture, we now
show how certain important concepts are implemented (Auvray, 2008).
Refresh--A refresh always involves a user machine and the server.
The refresh instant depends on the presence of users in the system. It
is therefore natural to have the refresh process controlled by the
client. This is also consistent with the nature of the server, which
never becomes active by itself, but rather reacts to requests it
receives from clients.
A refresh takes place in two phases: first, server and client have
to agree that a refresh is necessary and that the time is right to do
it; second, if a refresh has to take place at all, the client starts a
special transaction during which it reads or writes the data fragment to
be refreshed, replacing the old data. The user is also given the
possibility of requesting an immediate refresh explicitly. In this case,
the first phase is skipped.
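A compact sketch of this client-driven, two-phase refresh is given below. The helper names, the version counters and the transaction interface are assumptions made for illustration; an explicit user request simply skips the first phase.

    # Sketch of a two-phase, client-controlled refresh; names are illustrative.
    def refresh(fragment: str, server, local, explicit: bool = False) -> bool:
        # Phase 1: agree with the server that a refresh is necessary and timely.
        # An explicit user request skips this phase entirely.
        if not explicit and not needs_refresh(fragment, server, local):
            return False
        # Phase 2: a special transaction replaces the whole local fragment.
        local.begin_transaction()
        try:
            new_data = server.read_fragment(fragment)     # refresh from server
            local.replace_fragment(fragment, new_data)    # old data is replaced
            local.commit()
            return True
        except Exception:
            local.abort()
            raise

    def needs_refresh(fragment, server, local) -> bool:
        # Placeholder for the phase-1 negotiation; a fuller sketch of the
        # decision follows in the discussion of the first phase below.
        return server.fragment_version(fragment) > local.fragment_version(fragment)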
The goal of the first phase is to determine whether a refresh has
to take place at this moment. To come to this decision, cooperation
between server and client is necessary. It is the client that initiates
the refresh, and it must also decide when to refresh. To this
end, it needs two kinds of information: first, the refresh necessity and
urgency conditions and second, the current state of server and client.
So, the client has to look up some information in the meta database and
to pass a message to the server, requesting the necessary information.
In the second phase, the update of some data fragment actually takes
place, with whole fragments being replaced by new data. When refreshing
from client to server, special attention must be paid to process effects
on the server; therefore, a special command must be given to the server.
In the other direction, regular read requests are issued to the server.
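The first-phase decision can be sketched as follows: the client reads the refresh necessity and urgency conditions from the meta database, obtains the current server state, and classifies the fragment accordingly. The policy fields and the age-based criterion are assumptions made for illustration.

    # Sketch of the phase-1 refresh decision; fields and criteria illustrative.
    import time
    from dataclasses import dataclass

    @dataclass
    class RefreshPolicy:
        max_age_seconds: float      # refresh necessity condition
        urgent_age_seconds: float   # refresh urgency condition

    def decide_refresh(fragment: str, meta_db, server, local) -> str:
        """Return 'none', 'due' or 'urgent' for the given fragment."""
        policy: RefreshPolicy = meta_db.refresh_policy(fragment)
        server_version = server.fragment_version(fragment)    # server state
        local_version, refreshed_at = local.fragment_state(fragment)
        if server_version == local_version:
            return "none"                                      # duplicate is current
        age = time.time() - refreshed_at
        if age >= policy.urgent_age_seconds:
            return "urgent"
        if age >= policy.max_age_seconds:
            return "due"
        return "none"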
Authorization--Authorization is enforced on two levels: first,
the run time data descriptions which are necessary to access server data
are only delivered to authorized users (or, more precisely, to their
clients). Thus, at the time when a connection between server and client
is established, authentication information is passed from client to
server, and in turn, some descriptions of data stored on the server are
returned. Because a server may not trust clients (they are running on a
user workstation and can thus be corrupted), a second step must always
be added: every data access request submitted to the server is filtered
by SAM through a component checking whether the request is legitimate or
not and blurring those attribute values of retrieved tuples which the
user is not allowed to see. The first step merely serves to reject some
of the unauthorized access attempts already on the user machine, thus
relieving the server.
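The second step can be sketched as a filter applied to every request and to every retrieved tuple, as shown below. The layout of the rights table and the value substituted for blurred attributes are assumptions made for illustration.

    # Sketch of server-side request filtering and attribute blurring.
    BLURRED = None   # value substituted for attributes the user may not see

    def filter_request(user: str, relation: str, rights: dict) -> bool:
        """First check: is the user allowed to touch this relation at all?"""
        return relation in rights.get(user, {})

    def blur_tuple(user: str, relation: str, tup: dict, rights: dict) -> dict:
        """Second check: hide attribute values the user may not see."""
        visible = rights.get(user, {}).get(relation, set())
        return {attr: (val if attr in visible else BLURRED)
                for attr, val in tup.items()}

    # Example rights: user "ana" may see only name and dept of EMP tuples.
    rights = {"ana": {"EMP": {"name", "dept"}}}
    print(blur_tuple("ana", "EMP", {"name": "Ion", "dept": "R&D", "salary": 900}, rights))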
The meta database had to be extended to cover authorization aspects
(just as for refresh parameters). This meta information is frequently
used at run time in order to determine user rights for certain
operations. Using a database for these lookups slows down server
operations significantly, so resident (main-memory) tables should be
used for authorization information (Boicea et al., 2010).
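A sketch of such resident tables is given below: the authorization meta information is loaded into memory once, and every run-time check becomes a simple lookup rather than a database operation. The loader interface is an assumption made for illustration.

    # Sketch of resident (in-memory) authorization tables; names illustrative.
    class ResidentAuthTable:
        def __init__(self, meta_db):
            self._rights = self._load(meta_db)

        @staticmethod
        def _load(meta_db):
            # Load once, e.g. at server start or when authorizations change.
            return {(row["user"], row["relation"]): set(row["attributes"])
                    for row in meta_db.read_authorizations()}

        def allowed_attributes(self, user: str, relation: str) -> set:
            # Every run-time check is answered from memory, not from the database.
            return self._rights.get((user, relation), set())

        def reload(self, meta_db) -> None:
            """Refresh the resident copy after authorizations are updated."""
            self._rights = self._load(meta_db)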
Concurrency Control--The optimistic control method allowing
extended use of duplicates within update transactions is not suited for
implementation in a DBMS offering navigational access to data, such as
SERDS. The reason is the difficulty of consistently describing the read
set for operations which are based on some current position in a
relation. A set-oriented interface between server and client is
necessary to implement this method (Huang & Chen, 2001).
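For illustration, the following sketch shows backward validation in an optimistic scheme with explicit read and write sets; the difficulty noted above is precisely that, with navigational access, the read set cannot be described this directly. All names are illustrative.

    # Sketch of optimistic backward validation with explicit read/write sets.
    from dataclasses import dataclass, field

    @dataclass
    class Transaction:
        txn_id: int
        start_tn: int                                 # commit count at start
        read_set: set = field(default_factory=set)    # keys of tuples read
        write_set: set = field(default_factory=set)   # keys of tuples written

    committed: list = []        # transactions in commit order

    def validate(txn: Transaction) -> bool:
        """Fail if a concurrently committed transaction wrote what txn read."""
        for other in committed[txn.start_tn:]:
            if txn.read_set & other.write_set:
                return False
        return True

    def commit(txn: Transaction) -> bool:
        if not validate(txn):
            return False        # abort and restart the transaction
        committed.append(txn)
        return True

With a set-oriented interface, the read set could be described by the submitted queries themselves rather than by a history of current positions, which is what would make such a scheme practical.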
5. CONCLUSIONS
A functional specialization between server and clients simplifies
the system by separating concerns and thus increasing the modularity of
the software. It is interesting that this seems to be a general trend
in distributed database systems: approaches that attempt to provide an
integrated view of the database by replicating functionality over all
participating nodes (the SDDS approaches) generally remain at the
research level.
Most systems which are heavily used, e.g. in banks or airlines, are
loose, heterogeneous federations of largely independent systems with
well defined, thin interfaces. The reason lies both on the technical,
system-oriented level and on the managerial, data-oriented level, and
in both cases it is simplicity. Functional specialization, together with well
defined interfaces and distributed control, can help to build modular
systems which can more easily be designed and managed.
The architecture which was described shows that the explicit
classification of data with regard to their ownership (private vs.
shared) and of different degrees of data currency (originals vs.
duplicates) can serve to match existing organizational methods and
structures. This not only helps to improve the acceptance of such
systems, but also facilitates the management and maintenance of
databases.
6. REFERENCES
Auvray, S. (2008). Just Another Distributed Database? Not Really,
Available from: http://www.infoq.com/news/2008/04/distributed-dbstrokedb
Accessed on: 2011-03-15
Boicea, A.; Badalau, C.; Petcu, R. & Nicula, A. (2010). Improving
Application Performance Through Database Caching at Application Tier
Level, Proceedings of the 21st International DAAAM Symposium, Croatia
Darabant, A. (2009). Proiectarea bazelor de date distribuite
[Distributed Database Design], Ed. Casa Cartii de Stiinta, ISBN
9789731336057, Cluj-Napoca, Romania
Huang, Y. & Chen, J. (2001). Fragment Allocation in Distributed
Database Design, Journal of Information Science and Engineering, Vol.
17, pp. 491-506