
Article Information

  • Title: Opening the Vault - information exchange - Technology Information
  • Author: Peter Fischer
  • Journal: Software Magazine
  • Publication Year: 2000
  • Issue: April 2000
  • Publisher: Rockport Custom Publishing, LLC

Opening the Vault - information exchange - Technology Information

Peter Fischer

Integration and XML provide the keys, but there are many different approaches to information integration. The challenge is to choose the best one for your needs.

The Internet has had a profound effect on the way we conduct business. The pervasiveness of the technology, as well as its availability and commodity pricing, has positioned the Web as the de facto mechanism for establishing trading relationships. To establish these new relationships with customers and suppliers, however, companies have to open up the information stored in their "information vaults." Information integration plus XML provide the keys.

Information integration environments provide an "extend the business" capability while supporting a "keep the business" approach. This allows new applications, most importantly e-business systems, to be deployed and to evolve independently of existing business processing systems. By extending the business, enterprises can establish new relationships throughout the supply chain, while retaining existing ones.

Information integration has always been a challenge. Recent developments in the Enterprise Application Integration (EAI) space and the rise of XML have renewed interest in this area.

An organization typically has multiple sources of information in disparate databases and processing systems. Information is exchanged using a variety of proprietary approaches, such as database extracts, file transfers, and batch updates. Businesses are required to support numerous formats to enable information exchange, both internally and with partners. The result is "information spaghetti." (See figure, p. 25.) Internally, business applications establish point-to-point information connections, leading to situations where a single application has to render information in multiple formats. What a mess!

Information integration controls this "information spaghetti" by "connecting the dots" into an integrated information environment. Information integration focuses on collecting and collating data that resides at different data points in an informational network. These data points can represent data that resides in a variety of data storage devices, including RDBMS, flat files, ISAM, and even e-mail.

Solutions and Approaches

Today there are many ways to solve the information integration problem. Information integration can occur at different levels in the computing environment. At the Web server level, information is integrated using a component that extends the Web server processing environment.

Integration at the enterprise server level entails the creation of a separate environment that provides integrated information assets to a requester. Note, however, that one of these requesters could be a Web server environment, so the line blurs a bit here. This is especially true when looking at current products in the marketplace.

As a company that specializes in information systems integration, Quantum Enterprise Solutions has coined a set of terms that functionally capture several of the approaches in use today. To put all these approaches in perspective, Quantum has developed the Information Integration Model(TM) (see figure, p. 28). This "stack" captures the relationship of the different approaches to information integration, along with enabling technologies. Using this stack, you can match your requirements to the appropriate products.

The challenge in information integration is that as you move up the value chain to high-value business intelligence enablement, the level of technology and the complexity required to integrate information increase exponentially.

Integrated Data Access Environments

Integrated data access (IDA) environments create an information network by providing standard access mechanisms to a wide array of back-end data sources. Information connectors, which are targeted to specific information assets, provide the "connectivity" into these data sources. These connectors tend to use native access methods to access the data source.

IDA environments provide the ability to create a single SQL query that spans multiple data sources. This query is then split into multiple SQL subqueries, each of which queries a single data source. Typically, these subqueries are targeted at a specific database engine, providing native access to the data.

The subqueries contain as much selection, filtering, joining, and sorting as required. The benefit to this approach is that SQL can be written to take advantage of the various optimization techniques for data access for a particular data source. This allows the environment to leverage the capabilities of each database system, while minimizing the amount of data returned and the time needed to return it.
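
To make the mechanics concrete, here is a minimal sketch, in plain JDBC, of the kind of work an IDA environment performs on the requester's behalf. The table names, columns, and the two-source "join" are invented for illustration; a real IDA product would generate and distribute the subqueries itself.

    import java.sql.*;
    import java.util.*;

    // Hypothetical illustration of splitting one logical query ("open-order
    // spend joined with poor credit ratings") into native subqueries against
    // two different databases, joining the filtered results in the
    // integration layer. Table and column names are invented.
    public class FederatedQuerySketch {

        public static void printRiskyCustomers(Connection orderDb, Connection creditDb)
                throws SQLException {
            // Subquery 1: selection, filtering, and aggregation are pushed
            // down to the order database.
            ResultSet spend = orderDb.createStatement().executeQuery(
                "SELECT customer_id, SUM(total) AS spend FROM orders " +
                "WHERE status = 'OPEN' GROUP BY customer_id");
            Map spendById = new HashMap();
            while (spend.next()) {
                spendById.put(spend.getString("customer_id"),
                              new Double(spend.getDouble("spend")));
            }

            // Subquery 2: only the rows and columns needed for the join are
            // returned from the credit database.
            ResultSet ratings = creditDb.createStatement().executeQuery(
                "SELECT customer_id, rating FROM credit_ratings WHERE rating < 3");

            // The cross-source "join" happens in the integration layer, so
            // only qualifying data ever crosses the network.
            while (ratings.next()) {
                String id = ratings.getString("customer_id");
                if (spendById.containsKey(id)) {
                    System.out.println(id + " spend=" + spendById.get(id)
                        + " rating=" + ratings.getInt("rating"));
                }
            }
        }
    }

Each subquery can be written in the dialect of its target engine, which is how the approach preserves native optimization while keeping network traffic to the minimum needed to answer the overall query.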

Advantages to this approach include optimization of queries targeted to a specific data source platform, as well as efficiency of data movement, as only the data that supports the query is returned across the network. Performance enhancements are gained because subqueries against the various data servers are processed simultaneously. Most importantly, operational data is updated in real time at the source, rather than in a central repository that must then distribute updates back to the appropriate data source. This ensures that the next access to the data sees the correct updates.

Data Integration Hub (DIH)

Data integration environments focus on replicating information stored in multiple back-end data sources into an integrated database of information. Typically, the data integration hub (DIH) maintains a bidirectional connection to the back-end sources, applying updates made to its own database back to the back-end data sources as well as receiving updates from them. Remember, these back-end data sources are still being accessed by legacy business processing systems.

The advantage to this approach is that the hub represents a single system image of the entire data model. DIHs are similar to data warehouses conceptually, but operationally different. Data warehouses are typically built to scrub data to support decision support systems and OLAP. Unlike data hubs, they do not provide access to online data, instead relying on batch feeds from the various subscribed back-end data sources to feed the warehouse.

At the heart of DIHs is an ETL (extract, transform, load) engine that takes in data from multiple sources, transforms it into a standard format, and loads it into an "integrated database." This database is typically a data mart that also contains metadata about the data. This metadata provides the "keys to the kingdom," as it maintains the relationships between the data elements in the data marts as well as the relationships back to the data sources.
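
The following is a rough sketch of one ETL pass, assuming hypothetical source and target tables: records are extracted from a legacy source, transformed at the record level into the standard format, and loaded into the integrated database along with a note of which system each record came from.

    import java.sql.*;

    // Hedged sketch of an ETL pass: extract from a source, apply a
    // record-level transformation to a standard format, load into the
    // integrated database (the "data mart"). Table and column names are
    // illustrative only.
    public class EtlPassSketch {

        public static void run(Connection source, Connection mart, String sourceName)
                throws SQLException {
            ResultSet in = source.createStatement().executeQuery(
                "SELECT cust_no, cust_nm, country_cd FROM legacy_customers");
            PreparedStatement out = mart.prepareStatement(
                "INSERT INTO customers (customer_id, name, country, source_system) " +
                "VALUES (?, ?, ?, ?)");
            while (in.next()) {
                // Transform: normalize the key and the country code into the
                // standard format used by the integrated database.
                String id      = "CUST-" + in.getString("cust_no").trim();
                String name    = in.getString("cust_nm").trim();
                String country = in.getString("country_cd").toUpperCase();

                // Load, recording which back-end source the record came from --
                // a small piece of the metadata the hub maintains.
                out.setString(1, id);
                out.setString(2, name);
                out.setString(3, country);
                out.setString(4, sourceName);
                out.executeUpdate();
            }
        }
    }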

Semantic transformations add value to the raw data in the data mart by attaching meaning, in the form of business logic, to entities in the data mart. They restore information that may potentially be lost during a record-level transformation. This is key, considering the value of data is not just in the data itself but also in how it is interpreted and used.

Information Portals

Information portals provide a window to the wide array of information stored across a company. Functionally, there are three types of information portals: reference, collaborative, and interactive. Reference portals provide the window to corporate information. Collaborative portals "activate" this information, providing workflow and management tools for creating, updating, publishing, and routing information documents.

Both reference and collaborative portals are valuable in their own right, but the true value of information integration comes with interactive portals. They fill in the information integration gaps left by reference and collaborative portals by allowing corporate data to be merged with documents and presented to an external party for additional input or response. Interactive portals merge data and documents that reside across disparate and heterogeneous environments into integrated information packets that represent a meaningful snapshot of corporate information.

Information portals are typically leveraged to enable e-commerce, including both B2B and B2C. The key to the success and broad applicability of information portal products, termed portal servers, is the use of XML as an open information exchange medium.

Information portals typically contain XML repositories that store the XML templates that define XML record structure as well as the rules for XML document transformation. These transformations are accomplished using XML engines, which take an XML template and execute the "rules" in the template to create an integrated XML data document by combining information from multiple back-end sources. Finally, information portals may also contain information target engines that allow corporate information to be enhanced with specific information to target a particular market.
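
As a rough illustration of the "integrated XML data document" idea, and not of any particular portal server's API, the sketch below merges values from two hypothetical back-end lookups into a single XML snapshot that a portal could deliver to a requester. The element names and lookup methods are invented; a real portal server would drive this from a template stored in its XML repository.

    // Hypothetical sketch: combine data from two back-end lookups into one
    // integrated XML document. Element names and lookups are invented.
    public class IntegratedXmlDocumentSketch {

        public static String buildCustomerSnapshot(String customerId) {
            // Pretend these come from two different back-end systems.
            String profile = lookupProfile(customerId);    // e.g., from an RDBMS
            String orders  = lookupOpenOrders(customerId); // e.g., from an order system

            StringBuffer xml = new StringBuffer();
            xml.append("<?xml version=\"1.0\"?>\n");
            xml.append("<customerSnapshot id=\"").append(customerId).append("\">\n");
            xml.append("  <profile>").append(profile).append("</profile>\n");
            xml.append("  <openOrders>").append(orders).append("</openOrders>\n");
            xml.append("</customerSnapshot>\n");
            return xml.toString();
        }

        // Stand-ins for the information connectors described in the article.
        private static String lookupProfile(String customerId)    { return "Acme Corp, Greenville SC"; }
        private static String lookupOpenOrders(String customerId) { return "2 orders pending"; }
    }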

Role of XML

XML has emerged as the technology du jour to solve our information integration woes. XML's true value is as an information exchange technology, not as an integration technology.

By separating information content from presentation and delivery, enterprises can achieve economies of scale for information exchange. XML provides the mechanism for capturing integrated informational assets from multiple data sources.

Business Process Distribution -- Getting the Information Across

At the top of the stack for Information Integration, the end result is an integrated dataset. Internally, a company that reaches this point has gained great value in being able to integrate its data assets and gain control over the scattered data points across the enterprise.

However, for most companies this is not the endgame. One of the major reasons for data integration is to provide a single image of data to an external party as an integral component of a B2B relationship. After all, a business transaction is more than just pushing information to a partner.

Providing information as XML documents is only part of the story when it comes to establishing meaningful B2B relationships. In addition to exchanging information via a standard self-describing format, there is the requirement for a delivery mechanism to move the information between companies.

The classical approach has been via some sort of electronic data interchange (EDI) connection over a value-added network (VAN). However, the expense, as well as the dependencies established between partners, does not mesh well with the loosely coupled nature of B2B.

How do we resolve this apparent conflict between the loosely coupled nature of B2B information exchange and the requirements for transactional behavior? The answer is the Java Message Service (JMS), a new Java technology that provides the foundation for B2B messaging. The combination of JMS with XML allows the enterprise to "extend the business" using open and standard technologies.

SonicMQ from Progress Software Corp., Bedford, Mass., is one of the first JMS implementations available in the marketplace. SonicMQ's JMS implementation supports a variety of communication mechanisms, including request/reply and publish/subscribe. SonicMQ's combination of Java, XML, and JMS makes it a very pervasive technology for enabling reliable, interactive, and transactional B2B relationships via the exchange of information in XML documents.
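
Here is a minimal sketch of the JMS-plus-XML pattern using only the standard javax.jms publish/subscribe interfaces; the JNDI names and topic are placeholders, and nothing in the code is specific to SonicMQ.

    import javax.jms.*;
    import javax.naming.InitialContext;

    // Hedged sketch: publish an XML business document to a JMS topic so that
    // trading partners receive it when it becomes available, rather than
    // having to pull it. The JNDI names ("TopicConnectionFactory",
    // "orders.status") are placeholders for provider-specific configuration.
    public class XmlTopicPublisherSketch {
        public static void main(String[] args) throws Exception {
            InitialContext jndi = new InitialContext();
            TopicConnectionFactory factory =
                (TopicConnectionFactory) jndi.lookup("TopicConnectionFactory");
            Topic topic = (Topic) jndi.lookup("orders.status");

            TopicConnection connection = factory.createTopicConnection();
            TopicSession session =
                connection.createTopicSession(false, Session.AUTO_ACKNOWLEDGE);
            TopicPublisher publisher = session.createPublisher(topic);

            // The payload is an XML document carried as a JMS text message.
            String xml = "<orderStatus id=\"12345\"><state>SHIPPED</state></orderStatus>";
            TextMessage message = session.createTextMessage(xml);
            publisher.publish(message);

            connection.close();
        }
    }

Because the document is published to a topic, any number of subscribed partners receive it as soon as it is available, which is exactly the push-rather-than-pull behavior described below.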

One of the first adopters of SonicMQ is ChanneLinx.com, Greenville, S.C., which specializes in integrating entities into supply chains. ChanneLinx leverages SonicMQ as the basis for its connectivity. SonicMQ provides the necessary quality of service that ChanneLinx requires for B2B XML messaging.

By using SonicMQ, ChanneLinx's products can create information channels in which B2B partners can establish highly interactive relationships by publishing information using a topic-based mechanism. A company does not have to "pull" from a partner. Instead, the information is sent when it becomes available, providing a much richer and more functional exchange mechanism between partners.

Integration Choices

As we have seen, there are many approaches to "skinning the information integration cat." The Information Integration Model captures a few points on that spectrum. More importantly, it provides a framework for helping to make the right decision in terms of requirements, technology, and product selection. The high-value information portals provide the biggest bang for the buck in terms of information integration and delivery, while integrated data environments and data integration hubs provide a solid and proven basis for information integration.

Peter Fischer is director of Technical Service for Quantum Enterprise Solutions Inc., a recognized leader in EAI solutions. He specializes in architecting, designing, building, and integrating large-scale distributed systems using application servers, middleware integration, and Java technologies. E-mail him at pfischer@qtrg.com.

Products, Products, Products

THERE IS NO LACK OF PRODUCTS in the information integration space. There are many different approaches to information integration, as evidenced by the Information Integration Model, and the challenge is to choose the best one for your needs.

* Metagon Technologies, Matthews, N.C. Metagon has a good story to tell in the information integration space. Its flagship offering, DQpowersuite, is a data integration suite that provides solutions for both data integration hubs and integrated data access environments.

DQbroker, an enterprise server integration product, is DQpowersuite's advanced integrated data access environment. It provides information connectors in the form of native access to all the major RDBMS products. It even provides ODBC access for those sources not natively supported.

At the heart of DQbroker is its distributed server environment. The architecture is not based on a single hub that contains all the intelligence. Instead, at each "data point" in the environment resides a DQbroker server, a special process that interacts with the specific data source on that host. A client can connect to any DQbroker server in the environment to submit a cross-platform, cross-data source query.

DQbroker's global metadata cache is the key to its intelligence as well as its performance. It allows every DQbroker server to know the current state of all the data in a distributed data domain. In addition, this cache makes the retrieval of database properties faster and accelerates the entire end-to-end query process, as DQbroker has all the information it needs to distribute subqueries. DQbroker comes with an easy-to-use GUI that makes the creation of distributed data views straightforward. This GUI allows the creation of joins between tables in different data sources and on different platforms.

DQtransform provides the basis for a data integration hub via its ETL capabilities. It layers on top of DQbroker by extracting and transforming any data visible via DQbroker and loading it into any target.

* Scriptics Corp., Mountain View, Calif. Scriptics Connect Server is a solid Web server level integration product. Scriptics Connect extends an Apache, Microsoft IIS, or Netscape Web server with a Tcl scripting engine. The Tcl/XML engine provides extensions for processing XML documents as well as for communicating with external business applications. Scriptics Connect includes interfaces for ActiveX, ODBC, Java, Oracle, and Unix terminal emulation.

B2B applications are defined as a collection of document handlers. Each document handler contains script that processes a particular XML document for a particular purpose, using a particular vocabulary to interpret the document. Each document handler is associated with a specific URL, and when an XML document is posted to that URL, it is passed to the scripting engine, which then finds the appropriate document handler and processes the document. As the document handler executes, it uses integration points to communicate with external applications outside of Scriptics Connect. In a sense, the document handler provides the integration workflow in a Web server environment by tying together all the pieces required for an information event.
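
Scriptics Connect document handlers are written in Tcl; purely to illustrate the dispatch idea in the same language as the other sketches in this article, the hypothetical fragment below maps a posted URL to the handler registered for it and hands the XML document over for processing. All names are invented and none of this reflects the Scriptics API.

    import java.util.*;

    // Conceptual sketch only -- Scriptics Connect handlers are actually Tcl
    // scripts. This illustrates dispatching a posted XML document to the
    // handler registered for its URL; names are invented.
    public class DocumentHandlerDispatchSketch {

        interface DocumentHandler {
            // parse the document, interpret it with a vocabulary, and call
            // integration points to reach external applications
            void handle(String xmlDocument);
        }

        private final Map handlersByUrl = new HashMap();

        public void register(String url, DocumentHandler handler) {
            handlersByUrl.put(url, handler);
        }

        // Called when an XML document is posted to a URL on the Web server.
        public void onPost(String url, String xmlDocument) {
            DocumentHandler handler = (DocumentHandler) handlersByUrl.get(url);
            if (handler == null) {
                throw new IllegalArgumentException("No document handler registered for " + url);
            }
            handler.handle(xmlDocument);
        }
    }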

* Evolutionary Technologies Inc. (ETI), Austin, Texas. ETI Extract Tool Suite is a code-generating software product that automates data transformation, data reorganization, and data integration throughout an enterprise's computing environment. Extract accesses data from a multitude of back-end sources, including Oracle, DB2, Sybase, Informix, IMS, and ERP systems such as SAP and PeopleSoft, as well as systems accessible via MQSeries. C or Cobol code is generated that runs on a wide spectrum of operating systems, including all the IBM operating systems, HP-UX, Solaris, and Tandem.

The Tool Suite generates end-to-end conversions for consolidating data. This includes the code, scripts, and utilities necessary to extract, transform, move, and load data between incompatible environments. In a sense, the code it produces provides "data transformation middleware." Extract contains a series of editors that are used to configure and control this generation. The Conversion Editor is used to map data values from one system to another as well as to specify the business rules that will be used to transform and organize the data.

At the heart of the Extract Tool Suite's functionality is the MetaStore, which is the central repository for information that is used for conversions. In addition, the MetaStore stores information about data conversions as they occur, thereby providing the center of intelligence for the environment. The Data System Libraries (DSLs) provide the information connector functionality for Extract. They contain the information necessary to generate programs that perform read, write, and transformation functions against a specific data source using native programming APIs.

* E.piphany, San Mateo, Calif. E.piphany E.4 Suite is targeted at integrating and organizing customer information. It is applicable for those companies that are looking to get control of their dispersed customer information assets in order to increase overall customer satisfaction and retention.

E.piphany E.4 combines many of the elements of the Information Integration Model, including data extraction, data mart creation and updates, and data analysis, into a single package. At the heart of E.piphany's data extraction functionality are extractors. These extractors are information connectors that are responsible for performing record-level transformations that convert data from multiple back-end data sources into E.piphany's uniform storage format.

* Sequoia Software Corp., Columbia, Md. XML Portal Server is an advanced information portal product that offers a highly scalable approach to information integration. Portal Server includes information connectors that allow users to access, retrieve, transform, and update information stored in a variety of legacy systems and data sources, including 3270, CORBA, database (ODBC), and ActiveX. These connectors are automated; they extract data from a data source and parse it into an XML document following user-defined mappings.

Portal Server "extends" XML by allowing rich data types to be associated with XML elements, overcoming one of the major weaknesses of XML. Information snapshots are captured in aggregate XML objects (AXOs), which are templates that define the XML record structure for the merged information. Information contained within an AXO may be as granular as required and can blend structured, as well as unstructured, information into a single unit of information distribution and interaction.

COPYRIGHT 2000 Wiesner Publications, Inc.
COPYRIGHT 2000 Gale Group
