Article information

Title: Monitoring the network to keep databases up and running
Author: Chalmers, Andrew
Journal: Enterprise Networks and Servers
Publication year: 2005
Volume: Jun 2005
Publisher: Publications & Communications, Inc.

Monitoring the network to keep databases up and running

Chalmers, Andrew

Yes, I know. You are a database administrator, not a network or security manager. Yet, as the SQL Slammer worm and other recent threats have so clearly demonstrated, databases don't operate in a vacuum. Network slowdowns or outages, viruses, hacks, hardware crashes, application failures or a thousand other problems affect database operations, if not directly, at least from the end users' perspective.

Network and systems management tools can solve many of these problems. Yet many companies choose to operate at risk, scared off by such software's well-earned reputation for being expensive to install and even more costly and time-consuming to maintain. Fortunately, lower-cost products are now extending network and systems management beyond the Fortune 500, making it affordable for many mid-sized and even small enterprises. In this article we take a look at two very different organizations and the problems they solved by using such software.

Early warning system

The typical network has more informants than the FBI, and you don't even have to pay them. Tucked away inside every piece of equipment, the firmware is busily tracking and recording every aspect of equipment performance.

Let's take hard drive failure as an example of a problem that could bring down a database. It is such a threat that companies go to great lengths to ensure they are covered if it ever does occur.

They spend money for RAID, real-time remote mirroring and Linux clusters to ensure hardware redundancy so they have a hot backup when a drive does go down. But even with these in place, no one feels confident enough to abandon their nightly backups. Any time or expense is well worth it considering the cost of failure.

Recognizing the need for a better system, drive manufacturers have developed early warning mechanisms to warn of impending disk problems. Most hard drives incorporate a piece of technology called Self-Monitoring, Analysis and Reporting Technology, or S.M.A.R.T., which is based on IBM's Predictive Failure Analysis technology. S.M.A.R.T. tracks up to 30 different disk performance measurements and compares them to parameters the manufacturer sets for that particular drive. Ideally, it will alert the administrator in time to back up and replace the drive.
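The comparison S.M.A.R.T. performs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute names and numbers are made up, and the rule used here (an attribute is flagged once its normalized value falls to or below the manufacturer's threshold) is the common convention rather than any one vendor's documented logic.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    name: str       # illustrative attribute name, not read from a real drive
    value: int      # normalized value reported by the drive (higher is healthier)
    threshold: int  # failure threshold set by the manufacturer


def failing_attributes(attrs):
    """Return the names of attributes at or below their threshold."""
    return [a.name for a in attrs if a.value <= a.threshold]


# Made-up readings for a single hypothetical drive.
readings = [
    SmartAttribute("Reallocated_Sector_Ct", 98, 36),
    SmartAttribute("Spin_Up_Time", 20, 21),
    SmartAttribute("Seek_Error_Rate", 60, 30),
]

print(failing_attributes(readings))
```

With these sample readings only the spin-up attribute is flagged, which is the point at which an administrator would want to back up and replace the drive.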

But S.M.A.R.T. is a passive warning system. It requires another piece of software to request and process the alert.

"When there is a failure coming, the S.M.A.R.T. drive passes that information to the RAID controller," said Paul Santeler, vice president of the management networking and high availability products group at Hewlett-Packard Co. in Palo Alto, Calif. "But RAID does its own analysis as well, monitoring hundreds or thousands of things on the drive itself to try to see as a whole what might cause failure."

But disk drive controllers are just one element in the entire network. Every server, every firewall, every switch, every router is also busy generating its own performance information and dutifully storing it in log files. A database mainframe, for example, can spit out 30,000 error messages a second when something goes down. Service providers send Forward Explicit Congestion Notifications (FECNs) and Backward Explicit Congestion Notifications (BECNs) alerting whenever a packet may not have made it over the WAN.

After a crash, the administrators can go out to each piece of equipment and look back through those logs to locate the point when errors started to crop up which later led to the crash. That does work but it is not very efficient. It is like taping a piece of paper over the gas gauge in a car, and only removing the paper after the car stops running to see if, perhaps, the tank is empty.

Of course, that example oversimplifies what happens in networks. A car's dashboard only displays a few performance indicators. Even small networks have thousands, and so those logs sit uninspected until there is a problem.

The solution, of course, lies in basic database technology - installing software to aggregate the information from all the syslogs and Windows event logs into a single file with an intelligent interface. Once the information is all assembled, then the administrator can keep tabs on all the events, establishing policies to automatically address routine problems that arise, while immediately alerting the appropriate personnel whenever critical thresholds are exceeded. That way, anomalies can be corrected before they cause trouble, but admins don't drown in log entry overload.
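As a rough sketch of that idea, the Python below merges time-ordered logs from two devices into one stream and flags any device that crosses an error-count threshold. The device names, log entries, tuple layout and the threshold of two errors are all hypothetical, chosen only to keep the example small; a real product would read syslog and Windows event log sources instead of in-memory lists.

```python
import heapq
from collections import Counter

# Hypothetical per-device logs, each already sorted by time.
# Entry layout (an assumption): (timestamp_seconds, device, severity, message)
router_log = [
    (10, "router1", "warning", "link flap"),
    (42, "router1", "error", "high utilization"),
]
db_log = [
    (12, "db1", "info", "checkpoint complete"),
    (40, "db1", "error", "disk read retry"),
    (41, "db1", "error", "disk read retry"),
]


def aggregate(*logs):
    """Merge several time-ordered logs into one time-ordered list."""
    return list(heapq.merge(*logs))


def devices_over_threshold(entries, severity="error", limit=2):
    """Return devices with at least `limit` entries of the given severity."""
    counts = Counter(dev for _, dev, sev, _ in entries if sev == severity)
    return sorted(dev for dev, n in counts.items() if n >= limit)


central_log = aggregate(router_log, db_log)
print(devices_over_threshold(central_log))
```

Only the device that reaches the threshold is reported, so the administrator sees one alert rather than every routine entry.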

Companies can purchase these types of log monitoring and alerting software as a standalone product or as part of a network management package. Central Maine Power (CMP) uses a feature called Logalot that comes with its network management package, WebNM from Somix Technologies Inc. in Sanford, Maine.

CMP, an Energy East Corp. unit headquartered in Augusta, provides electricity to 540,000 commercial, industrial and residential customers. Its 120 IT staff manage a network linking 20 sites via ATM and private fiber links. The 1,400 desktops primarily run Windows 2000, but the 150 servers run a mix of Windows, AIX and Linux, and there is a mainframe running OS/390. The databases run on both Windows servers and IBM RS/6000s running AIX.

In addition, they support Palm devices and proprietary handhelds used by the meter readers.

The company was using Spectrum network management software from Cabletron (acquired by Aprisma, which was acquired by Concord and most recently by Computer Associates), but it was too expensive and cumbersome for CMP's needs.

"For us, Spectrum meant one fulltime person just to keep the system running smoothly," networking specialist Philip Morneault said. "Analysis revealed that we were spending more time and money managing and maintaining it than it gave us valuable information."

WebNM, on the other hand, cost a fraction of what CMP was paying previously for management software, and was much easier to set up and maintain. Morneault reports that a Somix technician set up the software and trained CMP's staff in just four days.

Logalot, which CMP bought as part of a larger network management package, can also be purchased on its own for $695. It monitors system logs and Microsoft event logs for items such as failed applications or services, DHCP lease failures, printers out of paper or the IP address of a device trying to attack the firewall.

It brings the log items from all the monitored devices or services into a central log, where the administrators can view and manage them either individually or by setting policies. Most of the log entries are routine and should be ignored, but when one comes up that needs handling, the software will alert the proper person via page, e-mail, text message, voice message or other assigned method. The program also executes scripts to do such things as automatically resetting a device or a service if that is all that is needed to bring the system back online.
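The policy step can be pictured as a small lookup that maps each log message to an action. The sketch below is not how Logalot is implemented; the patterns, action names and sample messages are assumptions made for illustration, and the function only returns a decision rather than paging anyone or running a script.

```python
# Hypothetical policy table: (substring to look for, action to take).
# The first matching rule wins.
POLICIES = [
    ("printer out of paper", "ignore"),
    ("service stopped", "restart"),
    ("firewall attack", "alert"),
]


def decide(message, policies=POLICIES, default="ignore"):
    """Return the action for a log message; unmatched entries are ignored."""
    text = message.lower()
    for pattern, action in policies:
        if pattern in text:
            return action
    return default


for line in [
    "Spooler service stopped on PRINT01",
    "Firewall attack from 203.0.113.7",
    "DHCP lease renewed",
]:
    print(line, "->", decide(line))
```

Defaulting to "ignore" mirrors the observation that most entries are routine; a stricter site might prefer to alert on anything unrecognized.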

This simplifies network and system management for CMP since admins no longer have to go around from box to box trying to track down the source of a problem.

"Network management software is something to help you do your primary job," Morneault said. "Running software should not become your primary job."

Network management and database security

Having network and systems management software in place also helps to ensure database security, as Brigham Young University - Idaho (Rexburg, Idaho) found out in January 2003 when the SQL Slammer worm hit.

The university has a 3,000-node network of Cisco 6509 and 4006 switches, Windows, OS/400 and Linux servers, and Windows 2000/XP, Linux and Mac workstations. The backbone consists of gigabit Ethernet, with 10 or 100 Mb Ethernet to the desktops and some common areas served with 802.11b wireless access points. Other than Microsoft Exchange, the main enterprise apps such as admissions and finances are all homegrown. But there was no network management package in place until January 2003, when the university installed WebNM.

"We looked at various other network management platforms and software packages, but they have long implementation times and are very expensive," network manager Michael Rydalch said. A technician came out the week of Jan. 20-23 to install the software and configure it to monitor all the servers, core switches, Internet perimeter router and the firewall.

The timing was perfect. That weekend the SQL Slammer worm hit and infected one of the servers in BYU-I's demilitarized zone (DMZ), the buffer network an organization uses to host its own Internet services without risking unauthorized access to its private network. The DMZ sits between the Internet and the internal network's line of defense.

The university's response center alerted the network operations analyst on call that weekend that there were some problems, but it didn't know the source. By checking the graphs on WebNM, the BYU-I staff tracked it down to high utilization on one of the perimeter routers.

At that point, the server administrator came in and, after checking the e-mails and alarms, logged on to CNN and found out what was known about the worm at that time. Because it was hitting the databases, they used the network management tool's Ostivity inventory module to locate all systems that were running SQL Server 7.0/2000 or the Microsoft Data Engine (MSDE). They shut down the TCP and UDP ports at the perimeter and cleaned up all the servers that had been infected.
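The inventory lookup amounts to filtering a list of hosts by installed software. The Python below is a minimal sketch of that step, not the Ostivity module itself; the host names, the inventory layout and the software strings are invented for the example.

```python
# Software the staff needed to find, as named in the article.
VULNERABLE = {"SQL Server 7.0", "SQL Server 2000", "MSDE"}

# Hypothetical inventory records; a real tool would pull these from its database.
inventory = [
    {"host": "dmz-web1", "software": ["IIS 5.0", "MSDE"]},
    {"host": "fin-db1", "software": ["SQL Server 2000"]},
    {"host": "mail1", "software": ["Exchange 2000"]},
]


def hosts_running(records, targets):
    """Return hosts that have at least one of the target packages installed."""
    return [r["host"] for r in records if targets & set(r["software"])]


print(hosts_running(inventory, VULNERABLE))
```

The result is the list of machines to check and clean, which is what turns a box-by-box search into a single query.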

"One of our SQL DBAs said it would normally have taken us a day or two to solve the problem," Rydalch said. "With WebNM it took us about two hours."

Andrew Chalmers is a freelance writer. He can be contacted at enterprisenetworksandservers.com.