Log management is the missing security performance ingredient
Robb, Drew
Maintaining performance and security of IT assets requires a three-pronged approach. Two of them are broadly in use.
The first is up or down status of equipment, services and network connections. End users are one crude source of alerts in this area, but their diagnostic input is generally limited to "my computer doesn't work," and IT staff like to know about a problem ahead of time so they can start fixing it before the calls flood into the help desk. So they use network mapping and alerting software to track status and tell them which element has gone down.
The next step up is performance trending software, which alerts them to slowdowns and congestion, not just downtime, and guides them in configuring systems for better performance. It often also predicts when a problem will arise, so equipment can be replaced or capacity upgraded before user service levels are affected.
But while each of these is essential for managing operations, there is a third source of data that is often overlooked, not because it isn't valuable, but because until recently it has been too cumbersome and time consuming to access and analyze. This source is the Syslogs on Unix boxes and the Windows Event Logs on Windows machines. These provide a wealth of data, but the problem is picking the critical events out of all the routine activity. Nor is this limited to servers: switches, routers, firewalls, intrusion detection systems and other devices keep logs too.
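As a rough illustration of picking critical events out of routine ones, the sketch below relies on one convention of the traditional BSD syslog format: each message starts with a priority value in angle brackets, equal to the facility times eight plus the severity, where severity runs from 0 (emergency) to 7 (debug). The sample messages, host names and the cutoff of 3 are made up for the example, and Windows Event Logs use a different structure that this does not cover.

```python
import re

# Severity names in the traditional syslog order, 0 = most severe.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$")


def parse_priority(line):
    """Return (facility, severity, rest) for a line starting with <PRI>, else None."""
    match = PRI_RE.match(line)
    if not match:
        return None
    pri = int(match.group(1))
    return pri // 8, pri % 8, match.group(2)


def critical_only(lines, max_severity=3):
    """Keep lines whose severity is at or below max_severity (lower is more severe)."""
    kept = []
    for line in lines:
        parsed = parse_priority(line)
        if parsed is not None and parsed[1] <= max_severity:
            kept.append(line)
    return kept


# Hypothetical sample messages, invented for this sketch.
sample = [
    "<34>Oct 11 22:14:15 fw1 su: 'su root' failed for user on /dev/pts/8",
    "<14>Oct 11 22:14:16 sw2 link: port 7 up",
    "<11>Oct 11 22:14:17 rtr1 kernel: interface eth0 error",
]

for line in critical_only(sample):
    facility, severity, rest = parse_priority(line)
    print(SEVERITIES[severity], rest)
```

Filtering on severity alone is a blunt instrument, since devices do not always assign severities consistently, but it shows how a flood of routine messages can be cut down before a person reads any of them.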
"A client showed me a two-inch stack of paper and said 'this is last week's firewall log, can you tell me what some of this stuff means?'" says Hanover, N.H. security consultant Robert Hillery. "You need to get software that will winnow this down to a usable size for human consumption."
What the logs show
Each device or service typically keeps its own log. What exactly is being logged varies by the type of device or service and the vendor. Just taking a look at it from a security standpoint, some of the data contained in these logs includes:
* Application failures caused by viruses.
* Backup failures - In most cases this is not necessarily a security breach, but just files that were in use at the time of the backup. Nevertheless, administrators should be alerted to any major backup failures that would make it impossible to restore files after an attack.
* Password hacking;
* Log in authentication failures on a router, server or switch. A single failure is simply user error; but repeated errors indicate someone trying to break in;
* Trojan horse attacks;
* Stealth and port scans on firewalls;
* Security policy changes - Is it an authorized change? Is it a worm that shuts off the firewall or antivirus software? Is it a disgruntled employee creating a backdoor?
* Users accessing the Internet without antivirus protection or accessing unauthorized sites;
* Server restarts and shutdowns - Is it an application error or a virus?
* Denial of Service attacks; and
* Excessive errors on a switch or router.
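The distinction drawn above between a single failed login and a run of repeated failures lends itself to a simple check. The following is a minimal sketch, not how any particular product works: it assumes the log entries have already been reduced to (timestamp in seconds, source, succeeded) tuples sorted by time, and the threshold of five failures in 60 seconds is an arbitrary choice for illustration.

```python
from collections import defaultdict, deque


def find_repeated_failures(events, threshold=5, window_seconds=60):
    """Return the sources with at least `threshold` failed logins in any window.

    `events` is an iterable of (timestamp_seconds, source, succeeded) tuples,
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)  # source -> timestamps of recent failures
    flagged = set()
    for timestamp, source, succeeded in events:
        if succeeded:
            continue
        failures = recent[source]
        failures.append(timestamp)
        # Drop failures that have fallen out of the time window.
        while failures and timestamp - failures[0] > window_seconds:
            failures.popleft()
        if len(failures) >= threshold:
            flagged.add(source)
    return flagged


# Hypothetical data: one address fails five times in 40 seconds, another once.
events = [(t, "10.0.0.9", False) for t in range(0, 50, 10)]
events.append((12, "10.0.0.4", False))
events.sort()

print(find_repeated_failures(events))
```

With this sample data only 10.0.0.9 is flagged; the lone failure from 10.0.0.4 is treated as ordinary user error.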
These logs should be checked regularly and any errors corrected before they build up into a major problem. In practice, however, they are usually only used reactively. It is simply too time consuming to manually log onto each device or application and try to decipher its logs. When there is a problem, the administrators log onto the suspect device or application and make their way through the log entries to try to discover which one contains the key to resolving the difficulty.
"The big problem is finding the one message you need in a sea of messages," says Anthony Adams, Chairman and CEO of Egation Communications Inc., an ISP headquartered in Fremont, Calif. "Every day we get two or three thousand Syslog messages from our switches, routers and firewalls, but only two or three of them are important to catch."
Tracking down the culprit
To filter out and bring the essential log entries to the attention of the appropriate administrators, vendors have created event log management (ELM) software.
"These were originally developed to address the problem people were having with intrusion detection system software which create huge amounts of data and are very difficult to get useful information out of," says Gartner Inc. analyst Ant Allan. "They needed to separate out the genuine malicious events or suspicious events that needed manual follow up."
ELM software extracts the log entries from all the devices in the network and aggregates them into a common database for storage, analysis and management. The administrators can then use the ELM to establish rules for those log entries. As Adams explains, only a few of the thousands of messages are important, so the administrator would direct the ELM to simply archive the rest. But for those messages which do need follow up, the ELM will notify the appropriate staff.
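The archive-or-notify behavior described above can be sketched in a few lines. This is an illustration only, not the design of any ELM product: the `Rule` and `LogRouter` classes, the patterns, the contact names and the sample messages are all hypothetical, and a real system would write to a database and send pages or e-mail instead of appending to lists.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    pattern: str  # regular expression matched against the message text
    notify: str   # hypothetical contact responsible for follow up


@dataclass
class LogRouter:
    rules: list
    archive: list = field(default_factory=list)
    notifications: list = field(default_factory=list)

    def handle(self, device, message):
        """Archive every message; queue a notification for the first matching rule."""
        self.archive.append((device, message))
        for rule in self.rules:
            if re.search(rule.pattern, message, re.IGNORECASE):
                self.notifications.append((rule.notify, rule.name, device, message))
                break


router = LogRouter(rules=[
    Rule("auth-failure", r"authentication fail", "security-oncall"),
    Rule("port-scan", r"port scan", "firewall-admin"),
])

# Hypothetical messages from three devices.
router.handle("fw1", "Port scan detected from 203.0.113.7")
router.handle("sw2", "Interface Gi0/1 link up")
router.handle("rtr1", "Authentication failure for user admin")

print(len(router.archive), "archived,", len(router.notifications), "flagged for follow up")
```

Every message lands in the archive, which suits the retention side of the job, while only the messages that match a rule reach a person.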
In addition to security requirements, regulatory compliance is also driving companies to adopt ELMs.
"There is now an emphasis on capturing the logs and keeping them for the time period specified in the applicable regulation," says Allan. "But we should also be actively reviewing those logs to see if there is any suspicious behavior, and the last few years the tools that allow organizations to do that have been improving."
It was compliance with the Health Insurance Portability and Accountability Act (HIPAA) that led Occupational Health Research (OHR) to begin using log management software. Headquartered in Skowhegan, Maine, OHR created the first desktop software specifically designed for employee health providers. The software, SYSTOC 7, helps 800 hospitals, occupational health clinics and employee health programs with their patient scheduling, financial management, forms and reports generation, medical records management and electronic claims submission. Most of its customers purchase the software to run locally; however, OHR also acts as an ASP for the Ohio Employee Health Partnership, a managed care organization serving 8,400 employers.
OHR uses two products, Snort and Logalot, to help keep its systems secure. Snort is an open source network intrusion detection system (IDS) originally written by Martin Roesch that OHR runs on one of its Linux servers. It is available for free download from snort.org and monitors both Unix/Linux and Windows systems.
The other product, Logalot from Somix Technologies Inc. (Sanford, Maine), is an ELM which can be used for systems management and security. It aggregates items contained in all the Simple Network Management Protocol (SNMP) alerts, Syslogs and Windows Event Logs and stores them in a MySQL database. The administrator has a console to review these entries and to set thresholds and policies for what actions to take when the thresholds are crossed: page, phone, e-mail or text message the administrator responsible for that particular device or system. Administrators can also write scripts for routine actions such as rebooting a hung machine.
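To make the idea of a common database with thresholds more concrete, here is a small sketch. It does not reflect Logalot's actual schema or interface, which the article does not describe: the table layout, the sample rows and the threshold of three messages per hour are assumptions, and Python's built-in sqlite3 module stands in for MySQL so the example runs without a server.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the ELM's central store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log_entries (ts TEXT, device TEXT, source TEXT, message TEXT)"
)

# Hypothetical entries gathered from different devices and log types.
rows = [
    ("2005-06-01 09:05:00", "fw1", "syslog", "session table full"),
    ("2005-06-01 09:20:00", "fw1", "syslog", "session table full"),
    ("2005-06-01 09:45:00", "fw1", "syslog", "session table full"),
    ("2005-06-01 09:50:00", "srv3", "eventlog", "service restarted"),
    ("2005-06-01 10:10:00", "fw1", "syslog", "session table full"),
]
conn.executemany("INSERT INTO log_entries VALUES (?, ?, ?, ?)", rows)


def over_threshold(conn, threshold):
    """Return (device, message, hour, count) for messages seen at least `threshold` times in an hour."""
    query = """
        SELECT device, message, strftime('%Y-%m-%d %H', ts) AS hour_bucket, COUNT(*)
        FROM log_entries
        GROUP BY device, message, hour_bucket
        HAVING COUNT(*) >= ?
        ORDER BY hour_bucket, device
    """
    return conn.execute(query, (threshold,)).fetchall()


alerts = over_threshold(conn, threshold=3)
for device, message, hour, count in alerts:
    print(f"{device}: '{message}' seen {count} times in hour {hour}")
```

Once the entries sit in one table, a threshold becomes an ordinary grouped query, and the same per-hour counts could feed a chart of how often a message recurs over time.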
OHR purchased Logalot to ensure that both the software it supplies its clients as well as its internal systems meet HIPAA's standards covering data transaction formats and security. Those standards mandate that healthcare providers perform an information system activity review, including reviewing the logs to see who is accessing the network. The problem is that the logs are scattered among different network devices.
"Trying to bring all the logs together in one spot was my main concern," says Barry Gray, OHR's director of information services. "I use it for tracking IP address of anyone who accesses the network and the firewall logs come into it as well."
Although HIPAA compliance was the initial motivation for installing Logalot, Gray says that now that he has it in place, he also wants to use it to track down and address trouble spots in the network. Beyond simply archiving logs, Logalot has a graphical tool for trending the frequency of similar messages over time.
"I think a lot of network management people could use a way of pulling all this stuff together," he says. "Logalot seems to do a great job solving that issue."
Egation's Adams used the software for that same purpose. He reports that having an ELM in place has helped him to improve response time to clients. When Egation began hosting games such as Counter-Strike and Duke Nukem, it started seeing some performance degradation, which it traced to an excessive number of sessions on the firewalls.
Using the ELM rules, Adams could see that some attacks were hitting the firewalls, causing the slowdown. Using this information, Egation then tuned its servers and firewalls to ensure it was giving maximum bandwidth to paying customers and selectively blocking the attackers.
"Using an ELM allowed us to really fine tune the equipment and be very surgical in our problem resolution," says Adams. "As a result of the improved service, the company moved up into the top ten out of more than four thousand servers hosting these games."
Drew Robb is a free-lance writer who can be contacted at enterprisenetworksandservers.com.
Copyright Publications & Communications, Inc. Jun 2005