Article information

Title: Design challenges for distributed LAN analysis - local area networks, HP ProbeView software, HP LanProbe monitors - includes related article on poor network partitioning - Technical
Author: William W. Crandall
Journal: Hewlett-Packard Journal
Print ISSN: 0018-1153
Year of publication: 1992
Volume: Feb 1992
Publisher: Hewlett-Packard Co.

Design challenges for distributed LAN analysis - local area networks, HP ProbeView software, HP LanProbe monitors - includes related article on poor network partitioning - Technical

William W. Crandall

[NOTE: SOME FORMULAS HAVE BEEN OMITTED]

The design of a distributed local area network management system is primarily a problem of data reduction, data transmission, and data presentation. HP ProbeView software and LanProbe monitors continuously monitor the health of an Ethernet or IEEE 802.3 network to allow the diagnosis of complicated problems without dispatched equipment.

Intermittent faults that cripple Ethernet local area networks (LANs) often cannot be detected by traditional network problem-solving techniques. By the time a traditional tool like a cable tester or protocol analyzer is rolled out to monitor the network, the problem has vanished. Users suffer from unreliable LANs while network managers suffer the wrath of angry users over problems the managers are unable to diagnose.

One solution is distributed LAN analysis. A distributed LAN analysis tool taps into a network segment and continuously monitors it, tracking use, logging important events, detecting errors, and raising alarms over major problems. To minimize this tool's interaction with the network, it is independent of other nodes on the network. Using a network management console, network managers can retrieve information gathered by the distributed LAN analysis monitors that have been placed throughout the network. Because the monitors provide a history of network activity, the sources of transient faults can be discovered even after the network has returned to good health. The data gathered by the monitors can be analyzed to plan for the future of the network.

Designing a good distributed LAN monitoring system is difficult. With several megabytes of data flowing across a typical Ethernet segment every minute, keeping track of all of the data that appears on the network would be impossible. Thus the distributed monitor must be able to filter and store only important information and send it quickly and efficiently to the network management console. The console must be able to combine the reports of many distributed monitors into an expansive view of the network's health. Only then will the network manager be able to solve specific problems and plan for the network's future growth.

To demonstrate the utility of distributed LAN analysis, we will describe how a typical customer of Hewlett-Packard's 4990S LanProbe distributed LAN analysis system might use the system to find intermittent network faults. We will then discuss the challenges that were faced in designing the LanProbe system.

Case Study: Acme, Inc.

We begin our study of distributed LAN analysis by looking at a typical large Ethernet-based network run by a hypothetical firm: Acme, Inc. Acme's networking problems are a conglomeration of the problems seen on the networks of HP's medium-size and large customers. As shown in Fig. 1, Acme's network is made up of a variety of cabling technologies (thick and thin coaxial cable, twisted pair, and fiber optic cable) that are tied together by different interconnect devices (repeaters, bridges, routers, and gateways). The network is spread around the world in engineering, marketing, manufacturing, and sales offices.

The Acme network is managed with tools like protocol analyzers, cable test equipment, and local diagnostic and monitoring software. Most of these tools are reactive, that is, they are not used until someone calls in with a complaint. Only then, for example, is the protocol analyzer wheeled out of the closet, connected to a network segment in the next building, and the problem found and fixed. Uncovering the source of many problems requires physically breaking the network so that the strategy of "divide and conquer" can be used to pinpoint the exact location of the physical fault or errant node. This type of network management is costly. Users suffer the expense of frequent and unpredictable network downtime. The company pays the price of having the network management group dedicated solely to debugging the network rather than planning for its growth and improving its efficiency.

Acme decided to place an HP 4991A LanProbe on every segment in the network. The LanProbe is a passive monitor that listens to all of the traffic flowing past it on the network. It collects and analyzes this data and sends it to a central management console running the HP ProbeView software. HP 4990A ProbeView is a PC-based, Microsoft Windows 3.0 application that provides a number of tools for examining the data that ProbeView gets from LanProbe. These tools include network and segment maps, statistics, packet trace, event log, cable test, and alert manager. Each of these tools provides useful information. The network map is a logical picture of the layout of the Ethernet segments, repeaters, bridges, routers, and gateways that make up the network. Each segment on the network map can be examined in detail by opening the segment map. This window shows the name and type of each node attached to that segment.

The statistics tool is made up of four charts that provide different views of traffic on the network over time. The QuickView (Fig. 2), trends (Fig. 3), and packet size distribution charts show what has been happening on the segment as a whole, while the node traffic chart (Fig. 4) breaks down the traffic on a node-by-node basis.

The network manager can capture and analyze raw data flowing across the network with the packet trace tool. Important events and changes on a segment are recorded in the event log tool. The network manager can test the integrity of a coaxial cable with the cable test tool; errors like cable breaks and shorts are quickly found. Finally, the alert manager lists high-priority alert messages sent to ProbeView by LanProbes scattered throughout the network.

Each of these tools is most useful when used interactively and in combination with the other tools, as we will see below.

Finally, an HP 4992A NodeLocator was added to each coaxial segment. Working together, the NodeLocator and LanProbe accurately determine the locations of nodes on coaxial cables. This feature allows ProbeView's segment map to show nodes exactly where they are physically attached to the segment. This information lets the network manager quickly find the locations of bad nodes and physical network faults like cable breaks and shorts.

Finding an Intermittent Problem

While installing LanProbe and ProbeView, the Acme network manager spent time working with QuickView, part of the statistics tool (see Fig. 2). QuickView is a "network dashboard" that shows the current, average, and peak values for packets sent on the network, bytes on the network, broadcasts, errors, and so on. The network manager can set thresholds for each category. For example, the manager can tell LanProbe to alert ProbeView if the number of errors on the network is more than 20 in a one-second interval. An icon will ring and flash on the PC's screen, calling the network manager's attention to the event. With threshold alerts, problems come to the manager's attention as soon as they happen; the manager does not have to go out and pin them down. This is crucial if the network problems are transient.
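The threshold check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not LanProbe firmware: the `Threshold` class, the `check_thresholds` function, and the counter names are all hypothetical, and the counts are assumed to cover one sampling interval (one second in the example from the text).

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    name: str   # hypothetical counter name, e.g. "errors"
    limit: int  # largest count tolerated in one sampling interval


def check_thresholds(counts, thresholds):
    """Return one alert message for every counter that exceeds its limit."""
    alerts = []
    for t in thresholds:
        value = counts.get(t.name, 0)
        if value > t.limit:  # "more than 20", so exactly 20 raises no alert
            alerts.append(f"{t.name}: {value} exceeds threshold {t.limit}")
    return alerts


# One-second sample in which the error count passes the limit of 20.
sample = {"errors": 25, "broadcasts": 3}
alerts = check_thresholds(sample, [Threshold("errors", 20)])
```

The point of doing the comparison on the monitor side is that only the alert, not the raw counters, needs to cross the link to the console.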

Soon after the Acme network manager set the thresholds, an alert appeared. Opening the alert manager, the network manager found that the error threshold on the Yellow Segment had been exceeded. The manager connected to the offending segment and looked at the trends window (another part of the statistics tool) to see network activity over the past few minutes. As Fig. 3 shows, the error rate had already dropped back to its normal level. Switching to the daily trends window, the manager saw a disturbing trend: the error level jumped sharply in even intervals throughout the day. Anticipating that the problem would soon reappear, the manager configured node traffic (yet another component of the statistics tool) to gather data for the next several hours and then went out for a long lunch.

The node traffic data identified the source of the errors. As shown in Fig. 4, node traffic provides network statistics broken down on a node-by-node basis: packets sent and received, bytes, broadcasts, multicasts, errors sent, and utilization. The manager saw that node FFFFFF-FFFFFF was generating almost all of the errors. Looking in the ProbeView log, the manager read the log entry "Bad source address FFFFFF-FFFFFF seen on segment" and realized that this Ethernet address is valid only as a broadcast destination address. Suspecting that a node might have been misconfigured with the bad source address, the manager set up the trace tool to capture packets sent from address FFFFFF-FFFFFF, then started the trace and went to a network planning meeting. By the end of the meeting, trace had found many packets whose source address was the invalid address. As presented in Fig. 5, all of the bits in these packets were set to one, the CRC error-checking value was wrong, and the length of the packets was below the minimum length for an Ethernet packet (a runt).

With this new information, the Acme manager now suspected a cable problem or a piece of faulty hardware connected to the net. Since the problems were occurring on a coaxial segment, the manager invoked the cable test tool. A one-shot test uncovered no problems, so the manager set the cable test to run continuously, hoping to catch a transient fault. LanProbe's cable test is nondisruptive, unlike other TDR cable tests. The LanProbe can accurately find the location and nature of a coaxial cable problem and reports this information in the ProbeView log. The log will include a message like "Weak short 153 feet away from LanProbe" or "Cable open near LanProbe. (Connected? Missing terminator?)." When used with the NodeLocator, which accurately maps the physical locations of nodes on coaxial segments onto ProbeView's segment map, ProbeView can show exactly where the fault lies, such as "between Mike's workstation and the LaserJet III print server" (see Fig. 6). Traditional cable test techniques leave it up to the user to segment the cable manually to try to figure out where the problem is. It is a time-consuming, tedious procedure that can force network administrators to crawl into the rat's nest of wires stuffed under the floors and up the columns of modern office buildings.

LanProbe and ProbeView do away with the mess and provide quick answers.

Unfortunately, while the trends screen continued to show surges in error rates, the cable test continued to report no errors. To see if the problems experienced by the Yellow Segment were appearing on other parts of the network, the Acme manager connected to LanProbes on other segments that were attached to the Yellow Segment by a bridge. The manager used the same tools (trends, daily trends, node traffic, trace, and cable test) but did not see the same errors. Suspecting that the bridge might be corrupting packets that it was passing between segments, the manager replaced the bridge and the problem went away. The bridge manufacturer identified a faulty board as the source of the problems.

What the Scenario Shows

This scenario shows many things about distributed LAN analysis tools. The Acme network manager was able to diagnose a complicated problem easily without leaving the office. Solving the problem did not require dispatched equipment; in fact, without distributed equipment in place to monitor the health of the network continuously, the problem might not have been discovered until it had escalated into a serious, disruptive one. The value of an early warning system is clear.

Furthermore, since ProbeView uses a graphical user interface, many tools can be placed side-by-side on the central console. Thus, different views of the network can be compared simultaneously, permitting faster problem solving as correlations become readily apparent.

Design Issues

The Acme scenario demonstrates how powerful distributed LAN analysis systems can be. It also should be clear that they require careful design. On a typical Ethernet, several megabytes of data can flow across the wire every minute. Unlike a dispatched protocol analyzer, the user's console (ProbeView) is detached from a segment monitor (LanProbe) so the design is constrained by the speed of the link between the console and the monitor. It might seem that the best way to connect those two devices would be over the Ethernet itself. But, because a network manager will most want to connect to the segment monitor when the network is down, the console must be able to reach the monitor via an out-of-band serial line or modem connection during those times. Thus, while the network manager may choose to connect the console to the monitor over a fast 10-Mbit/s Ethernet connection or over a slow 1200-baud modem connection, depending on the network's topology and state, the product design must always assume the worst: a 1200-baud link.

Furthermore, since the console and monitor are not always connected and exchanging data, a distributed LAN analysis monitor must be able to collect data efficiently for long periods without talking to the console. The monitor must be able to alert the console of major problems such as physical network faults and excessively high utilization. The console must be usable without being connected to the monitor.

Thus, the design of a distributed network management system is primarily a problem of data reduction, transmission, and presentation. The network monitor must be able to digest large volumes of data from the network quickly and store only that which is important (data reduction). The monitor must be able to upload stored data to the console compactly and efficiently (data transmission). Finally, the console must be able to uncompress the data to present an expansive and useful view of the network to the user (data presentation). These design issues are illustrated in Fig. 7.

Cost is also a major design factor because the product is distributed and permanent. A network manager can afford a protocol analyzer like the HP 4972A LAN analyzer or HP 4981A network advisor because only a few are needed for even the largest of networks. A protocol analyzer can be rolled to the scene of a problem, while distributed monitors are most useful when they are in place ahead of time. To keep the price down to a point where customers will be able to afford a monitor on most or all of the tens or hundreds of Ethernet segments that one might find in a medium to large network, the design must be constrained by the need to keep the cost of implementation and manufacture low.

Data Reduction in the LanProbe

Consider the problem of node table management on the LanProbe. The LanProbe runs a multitasking, dual-processor, real-time operating system. As packets are pulled off the Ethernet, they are passed to various tasks, which generally correspond to the visible tasks on ProbeView (see Fig. 8).

One of these tasks is the node table manager. This task checks the source address of each packet to see if that node's address already exists in its node table. If not, it adds it. Since Ethernet addresses are six bytes long, there are potentially about 2.8 x 10^14 Ethernet addresses, many more than one could fit into a node table. Thus, the number of nodes that the LanProbe can know about must be a subset of that total. Even if the node table is very large, it still can fill up as stray or invalid source addresses are sent across the network and as network equipment is added and moved. Therefore the monitor needs a scheme that periodically thins out the node table to keep it from overflowing.

LanProbe's solution is node aging. The network manager sets a time limit (from 3 minutes to 30 days). If LanProbe does not see a packet from a node within that time period, the node is marked for deletion from the node table. By setting a long time limit, the manager makes sure that outdated nodes are pruned from the node table over time. By setting a short time limit, the manager can see if important nodes like a file server have stopped transmitting.

Every 30 minutes, LanProbe also marks for deletion any "stray nodes" in the table. Stray nodes are nodes from which LanProbe has seen fewer than four packets. This generally means that the source address was part of a garbage or stray packet. When the table is nearly full, the marked nodes will be deleted by a background garbage collection process. If the network is extremely big, the node table will lock when full and no new nodes will be added. Balancing HP's experience with large networks against LanProbe's memory constraints, the size of LanProbe's node table was set at about 2,300 nodes.
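The aging, stray-node, and locking rules can be put together in a small Python sketch. It is a simplified model under stated assumptions rather than the LanProbe implementation: the `NodeTable` class and its method names are hypothetical, garbage collection runs when an insert finds the table full instead of as a background task near the limit, and a marked node that transmits again is assumed to lose its mark.

```python
class NodeTable:
    """Hypothetical node table with aging, stray marking, and fill-and-lock."""

    def __init__(self, capacity=2300, age_limit=30 * 24 * 3600, stray_packets=4):
        self.capacity = capacity            # about 2,300 nodes in the article
        self.age_limit = age_limit          # seconds of silence before marking
        self.stray_packets = stray_packets  # fewer packets than this is "stray"
        self.nodes = {}                     # address -> [packet_count, last_seen]
        self.marked = set()                 # addresses awaiting deletion

    def see_packet(self, address, now):
        """Record a packet; return False if the table is locked to new nodes."""
        entry = self.nodes.get(address)
        if entry is None:
            if len(self.nodes) >= self.capacity:
                self.collect()              # try to free marked entries first
            if len(self.nodes) >= self.capacity:
                return False                # still full: new node is not tracked
            self.nodes[address] = [1, now]
            return True
        entry[0] += 1
        entry[1] = now
        self.marked.discard(address)        # assumption: activity clears the mark
        return True

    def mark_aged(self, now):
        """Mark nodes that have been silent longer than the age limit."""
        for address, (_, last_seen) in self.nodes.items():
            if now - last_seen > self.age_limit:
                self.marked.add(address)

    def mark_strays(self):
        """Mark nodes seen in fewer than `stray_packets` packets."""
        for address, (count, _) in self.nodes.items():
            if count < self.stray_packets:
                self.marked.add(address)

    def collect(self):
        """Delete every marked node."""
        for address in self.marked:
            self.nodes.pop(address, None)
        self.marked.clear()
```

In this model a caller would invoke `mark_strays()` on a 30-minute timer and `mark_aged(now)` on whatever schedule suits the configured time limit; the table only refuses new addresses when nothing marked is left to delete.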

LanProbe's node table management is a good example of data reduction in a distributed network monitor. It shows that the monitor needs to know what is important and how to store it neatly. The monitor must have good strategies for dealing with information overload. Furthermore, it must be able to identify what is most and least important so that it can know what can be thrown away safely.

But perfect data reduction is impossible. For example, instead of locking the node table when it fills, it might be better on some networks to toss out some old entries to make room for new ones. The choice is subjective. As LanProbe has been installed at more customer sites and on more varied network topologies, HP has changed some of its data reduction algorithms to fit a wider class of networks.

In an early implementation of the LanProbe firmware, when the node table filled, if the node aging scheme did not prune enough nodes from the table, the whole table would be flushed and rebuilt based on data seen later on the network. Since Ethernet segments on well-planned networks generally contain traffic from fewer than 2,000 nodes, it was difficult for HP engineers to test this "fill and flush" node table management algorithm on live networks. But one early customer had a very large network that was poorly partitioned. Thus LanProbe saw packets from more than 4,000 different source addresses. When the node table filled, the table was flushed, only to be rapidly refilled and flushed again. The LanProbe wasted time building up and tearing down the node table, and ProbeView was consumed trying to keep up with the volume of changes on the LanProbe.

Based on this customer's experience, new algorithms were explored. One alternative was to keep track of the time at which each node had last sent a packet on the network. When the node table was nearly full, the LanProbe would delete nodes in the reverse order of their last transmission: the nodes that had not sent data for the longest period of time would be deleted first to make room for newer nodes. This algorithm is similar to the least recently used (LRU) algorithm used by many operating system page table managers.
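For comparison, the least recently used alternative that was considered can be sketched with Python's `collections.OrderedDict`. The `LruNodeTable` class is hypothetical, and it evicts only when the table is completely full rather than "nearly full" as the text describes; it is meant to show the bookkeeping involved, not the code HP evaluated.

```python
from collections import OrderedDict


class LruNodeTable:
    """Hypothetical node table that evicts the node silent for the longest time."""

    def __init__(self, capacity):
        self.capacity = capacity
        # address -> packet count, ordered from least to most recent sender
        self.nodes = OrderedDict()

    def see_packet(self, address):
        """Record a packet; return the evicted address, or None if none."""
        if address in self.nodes:
            self.nodes[address] += 1
            self.nodes.move_to_end(address)  # now the most recent sender
            return None
        evicted = None
        if len(self.nodes) >= self.capacity:
            evicted, _ = self.nodes.popitem(last=False)  # drop the oldest sender
        self.nodes[address] = 1
        return evicted


table = LruNodeTable(capacity=2)
for address in ["a", "b", "a"]:
    table.see_packet(address)
victim = table.see_packet("c")  # "b" has been silent longest, so it is evicted
```

Every packet updates the ordering, which hints at the per-node memory and processing cost of tracking last-transmission times.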

But there were drawbacks to this approach. Keeping track of the time at which each node last transmitted took about 8K bytes of valuable LanProbe memory. Storing and sorting this data and purging nodes from the table was time-consuming. Furthermore, as in the "fill and flush" approach, if data from the nodes that were purged did reappear on the network, log entries indicating that a "new" node had been seen on the network would be generated. ProbeView would recognize these "new" nodes as ones that the LanProbe had seen before and would ignore them. But this meant that ProbeView and LanProbe would still be bogged down exchanging information that did not help the network manager.

Another alternative was to increase the size of the node table. While this change does not improve the "fill and flush" algorithm, it reduces the chance of the node table's filling in the first place. Unfortunately, cost constraints on the LanProbe kept this change from being implemented. Therefore, the alternative of locking the node table when it fills was implemented. The great disadvantage to this "fill and lock" approach is that new nodes transmitting on the network will not be tracked by the LanProbe after the node table has filled. But it avoids the memory and performance problems of the other options and keeps the LanProbe from being overwhelmed by processing "new" nodes and then communicating that information to ProbeView. HP's experience is that if data from more than 2,000 nodes is seen on a segment, the network should be repartitioned, using intelligent bridges and routers to replace dumb bridges and repeaters. Given this opinion, this node table management algorithm is a good one.

Data Transmission

As stated earlier, LanProbe and ProbeView can communicate on either a fast network connection or a slow serial or modem link. Having the out-of-band serial or modem connection is vital. Without it, the network manager cannot connect to the distributed monitors when the network is down. However, having a slow-speed alternative means that the distributed LAN analysis system must be designed to work well regardless of the speed of the connection. This requirement holds true especially when large amounts of data are being uploaded from the LanProbe to ProbeView for immediate display to the user. One tool that needs to display its information quickly is packet trace. With the packet trace tool, the network manager can tell ProbeView to have LanProbe capture all or a filtered set of the packets being sent across the network. For example, if a user calls to complain that a diskless workstation is not booting from a server, the network manager can trace all of the packets sent to or from that workstation, decode the packets, and diagnose the problem.

LanProbe can capture up to 600 packets or 250K bytes of data. Depending on the volume of traffic on the network and the types of packets the user has chosen to capture, the LanProbe trace buffer can fill quickly (in less than a second) or slowly (over hours or days). If the buffer fills slowly, packets can be uploaded to ProbeView as they are captured. The speed of the connection between LanProbe and ProbeView is unimportant.

However, if the buffer fills quickly, the connection speed is crucial. 250K bytes of data can be sent over a typical TCP/IP Ethernet file transfer connection in less than a minute. Over a 1200-baud connection, it could take 34 minutes or longer to upload the data to ProbeView, especially if other information is being exchanged between LanProbe and ProbeView. 34 minutes is too long to wait when the network manager wants to see the packets immediately.
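The 34-minute figure can be checked with simple arithmetic. The Python sketch below makes two assumptions that the article does not state: "250K bytes" is taken as 250,000 bytes, and each byte costs 10 bits on an asynchronous serial line (8 data bits plus start and stop bits), so a 1200-baud link carries about 120 bytes per second. The function name `upload_seconds` is hypothetical.

```python
def upload_seconds(num_bytes, bits_per_second, bits_per_byte=10):
    """Estimate transfer time, ignoring protocol overhead and retransmissions."""
    return num_bytes * bits_per_byte / bits_per_second


trace_buffer = 250 * 1000  # assumption: "250K bytes" read as 250,000 bytes
minutes = upload_seconds(trace_buffer, 1200) / 60  # roughly 34.7 minutes
```

Under these assumptions a full trace buffer needs a little under 35 minutes on the slow link, which agrees with the "34 minutes or longer" quoted above once other traffic on the link is taken into account.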

Thus the LanProbe system designers had to find a faster way to present the trace data to the user. The solution: as packets are captured, they are not uploaded sequentially in their entirety to ProbeView. Instead, the LanProbe first uploads the time the packet was seen, the total length of the packet, whether it was a valid packet or an error packet, and the first 138 bytes of the packet (which contain the Ethernet source and destination addresses, the packet type or length, depending on whether the packet is an Ethernet or IEEE 802.3 packet, and probably all of the protocol layer headers). ProbeView uses this information to fill in the header and the body of the "notecard" in the trace display (see Fig. 9).

When the LanProbe finishes sending the header information for each captured packet, it starts sending the rest of the data in the packets. It divides each packet into 138-byte chunks and, where there is more data in a packet to send, uploads the second 138-byte chunk of the first packet, then of the second packet, the third packet, and so on. It then goes back to the first packet and, if there is more data from the packet to upload, sends the third 138-byte chunk, and so on.

If new packets are captured while data chunks are being uploaded, header information for the new packets is sent to ProbeView before any more data chunks are sent. This technique works because it gives the network manager the most important information first. In trying to diagnose the diskless workstation's failure to boot, the network manager wants to know right away if the workstation is sending correctly addressed boot requests and if it is getting any responses. This information is included in the header data sent to ProbeView.
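The chunk ordering can be sketched as a Python generator. This is a minimal model under stated assumptions, not the LanProbe code: the function name `upload_order` is hypothetical, the trace buffer is treated as a fixed list of packet lengths, and the handling of packets that arrive in mid-upload or of urgent requests from the console is left out.

```python
CHUNK = 138  # bytes per chunk, as given in the article


def upload_order(packet_lengths, chunk=CHUNK):
    """Yield (packet_index, chunk_index, byte_count) in round-robin order.

    Pass 0 sends the first chunk of every packet (the header information),
    pass 1 sends the second chunk of every packet that has one, and so on.
    """
    chunk_index = 0
    sent_any = True
    while sent_any:
        sent_any = False
        for i, length in enumerate(packet_lengths):
            start = chunk_index * chunk
            if start < length:
                yield (i, chunk_index, min(chunk, length - start))
                sent_any = True
        chunk_index += 1


# Three captured packets of 300, 64, and 138 bytes.
order = list(upload_order([300, 64, 138]))
```

For this buffer the first chunks of all three packets go out before the 300-byte packet's second and third chunks, so every notecard has its header before any packet body is completed.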

Only when those questions are answered will the network manager want to start looking through the contents of the packet. Since the most revealing data (the headers for the transport and higher layers in the network protocol stack) are at the front of the packet, most network managers will want to look at the data in the first part of each packet before looking at the data in the subsequent part of any packet. Thus uploading the data in chunks makes sense.

Note the difference in the rate at which the network manager sees the header information using a straight upload versus the LanProbe algorithm. If ten 1500-byte packets and then one 64-byte packet were captured and uploaded sequentially in full over a 1200-baud link, the network manager would have to wait for more than two minutes to see anything about the eleventh packet. With the LanProbe algorithm, the manager waits only 12 seconds. Furthermore, if the manager scrolls directly to a packet whose header has not yet been uploaded, ProbeView will request that packet's header immediately from the LanProbe.
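Both waiting times can be reproduced with a short calculation. The sketch assumes 10 line bits per byte on the 1200-baud link (about 120 bytes per second) and ignores the small per-packet record of timestamp, length, and status; neither detail is given in the article, and `seconds_until_header` is a hypothetical helper.

```python
BYTES_PER_SECOND = 1200 / 10  # assumption: 8 data bits plus start and stop bits


def seconds_until_header(packet_lengths, target, chunk=None):
    """Seconds until the first chunk of packet `target` has fully arrived.

    With chunk=None every earlier packet is uploaded in full first (straight
    upload); otherwise only the first `chunk` bytes of each packet precede it.
    """
    sent = 0
    for length in packet_lengths[: target + 1]:
        sent += length if chunk is None else min(length, chunk)
    return sent / BYTES_PER_SECOND


packets = [1500] * 10 + [64]
straight = seconds_until_header(packets, 10)            # about 125.5 seconds
chunked = seconds_until_header(packets, 10, chunk=138)  # about 12.0 seconds
```

Under these assumptions the straight upload takes just over two minutes to reach the eleventh packet, while the header-first order gets there in about 12 seconds, consistent with the figures in the text.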

But there are times when the network manager would like to see the full contents of a packet immediately, rather than waiting for the upload of the entire trace buffer to complete. Accordingly, the system is designed so that if the user is in Hex/ASCII mode and scrolls through a particular notecard in the trace window, and if all of the data that the user wants to see in that packet has not been fully received yet, ProbeView will request the remaining data from the LanProbe. LanProbe will respond to the urgent request by uploading the data immediately.

The trace upload algorithm was implemented because the out-of-band serial or modem connection between LanProbe and ProbeView is too slow for a simpler "upload everything in the order it was received" approach to work. The design of the algorithm was dictated by the needs of network managers: information is sent in order of importance to the manager. The algorithm is flexible enough to recognize when the manager wants to see data that has not been sent yet and to send that data right away. Providing this sort of intelligence and efficiency is key to making data transmissions between distributed LAN monitors and the network management console work well.

Data Presentation

Presentation of the data gathered by the LanProbes is handled by ProbeView. ProbeView is a collection of Microsoft Windows 3.0 applications (see Fig. 10). When the network manager starts ProbeView, the ProbeView data link is also started; this application provides reliable network transport connections over UDP/IP and reliable serial link connections. The data link runs in the background and is invisible to the user. The alert manager, which displays warning messages that are generated when LanProbes detect serious errors, is started when the manager selects the alert manager tool in ProbeView or when ProbeView gets an alert from a LanProbe. The applications that make up ProbeView use a message-based interface to share information.

ProbeView has a modular design. Multiple applications on the PC share the common data link transport. The transport works over different media (Ethernet, serial, and modem connections) but is transparent to the higher-level applications. The transport supports a primary connection between ProbeView and one LanProbe, but it can also support several inbound alert connections from other LanProbes. Thus the manager can get alerts sent by LanProbes throughout the network even while looking at a single segment. The separation of the alert manager from ProbeView makes it easier and faster for the user to get and manage alerts. ProbeView's modular design means that it is easier to enhance, test, and support, which leads to a more reliable and useful product.

The network manager works with the ProbeView window shown on the screen. While graphical user interfaces are easy to use, most do not have ways of neatly coupling related applications that run in different windows. ProbeView solves this problem with its "tools" model. A tool is one or more windows that provide a particular feature (for example, the network map and segment map windows together form the map tool). When the manager selects a tool, ProbeView's menu changes to reflect the menu choices available for that tool. Some menu selections such as exit ProbeView, connect to LanProbe, help, and ProbeView options are common to all tools. The tools model is a convenient way of tying together different actions and displays; it saves memory on the PC and improves performance relative to having all of the tools run as stand-alone applications. It also means that several tools can be aligned on the screen so that information from many tools can be seen at once, permitting faster problem solving as correlations become immediately apparent.

All of the tools share a common database. This database is built from information supplied by the network manager and by the LanProbes to which ProbeView connects. Each time ProbeView connects to a LanProbe, the two systems identify changes in common parts of the database and synchronize those areas. The ability of ProbeView and LanProbe to synchronize quickly and accurately is an important part of good distributed network management. Without synchronization, the ProbeView tools might present incomplete or outdated information to the manager.

The ProbeView tools provide different views of common information in the database. For example, each node on a network is physically attached to only one segment. But if data sent by that node is passed to other segments and is heard by several LanProbes, that node will appear on the segment maps of several segments. (The LanProbes have no way of telling whether the node is connected to the local segment or if data from that node is being forwarded by a bridge, repeater, etc.) Thus ProbeView depends on the network manager to tell it where a node physically resides.

Therefore, the network manager is allowed to place a node on a segment with the map tool. Placing a node means it is physically attached to that one segment. ProbeView automatically removes that node from the other segments on which it used to be visible. By pulling this placement information out of the database, the node traffic window of the statistics tool can display which nodes have been placed on the current segment. This information is useful for identifying whether heavy users of a segment are on the current segment. If heavy users are not located on the current segment, there may be a need to reconfigure the network topology. This is all part of the data presentation done by ProbeView: from a common store of data, the ProbeView tools present multiple views and interpretations.
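The placement rule can be illustrated with a small Python model of a shared store that several tools read. Everything here is hypothetical (the `SegmentDatabase` class, its methods, and the segment names), and one behavior is an assumption not stated in the article: once a node has been placed, later sightings of it on other segments are ignored.

```python
class SegmentDatabase:
    """Hypothetical shared store of which nodes appear on which segments."""

    def __init__(self):
        self.visible = {}  # segment name -> set of node addresses heard there
        self.placed = {}   # node address -> segment chosen by the manager

    def heard(self, segment, node):
        """Record that the monitor on `segment` heard traffic from `node`."""
        home = self.placed.get(node)
        if home is None or home == segment:  # assumption: placement wins
            self.visible.setdefault(segment, set()).add(node)

    def place(self, node, segment):
        """Attach `node` to one segment and remove it from all the others."""
        self.placed[node] = segment
        for name, nodes in self.visible.items():
            if name != segment:
                nodes.discard(node)
        self.visible.setdefault(segment, set()).add(node)

    def placed_on(self, segment):
        """Nodes the manager has placed on `segment` (the node traffic view)."""
        return {n for n, s in self.placed.items() if s == segment}


db = SegmentDatabase()
db.heard("Yellow", "080009-123456")  # heard locally
db.heard("Blue", "080009-123456")    # same node, forwarded by a bridge
db.place("080009-123456", "Yellow")  # manager says where it really lives
```

After the call to `place`, the map view (`visible`) and the node traffic view (`placed_on`) are both answered from the same store, which is the sense in which the tools present different views of common data.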

Distributed LAN Analysis as a Planning Tool

Traditional network management tools are not good planning tools. Most Ethernet management tools are fault management tools and help with fault detection, isolation, and recovery and perhaps some performance monitoring. General network management needs to be more comprehensive to include not only fault management but planning, configuration, security, performance, and more. Distributed LAN analysis can fill one of these needs: planning. Because distributed network monitors provide historical data, it is easy to collect data over time and analyze past network performance to plan for the future.

This planning can include capacity growth, changes in network topology, and reliability improvements such as the identification and replacement of error-prone nodes, segments, and interconnects. But as always, the distributed LAN analysis system must know which data is important and how to store it compactly. One distributed network analysis product collects data in a central database. A customer reported that this database grew by four gigabytes per month! Even if enough disk space were available to store that volume of data, there would probably be too much of it to analyze well. This experience emphasizes the importance of good data reduction at the source.
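To put the four-gigabyte figure in perspective, the short calculation below converts it to a sustained logging rate. It assumes a 30-day month and decimal gigabytes; neither assumption comes from the customer report.

```python
# Assumptions: 1 GB = 10**9 bytes and a 30-day month.
BYTES_PER_MONTH = 4 * 10**9
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

bytes_per_day = BYTES_PER_MONTH / 30
bytes_per_second = BYTES_PER_MONTH / SECONDS_PER_MONTH

print(f"{bytes_per_day / 10**6:.0f} MB per day")
print(f"{bytes_per_second:.0f} bytes per second, sustained")
```

Under these assumptions the database grows by roughly 133 MB a day, which is only about 1.5 kilobytes per second. Even a modest continuous stream of unreduced data becomes unmanageable when it is stored centrally for months.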

ProbeView is designed to be a planning tool, as the following example illustrates. In laying out a large network, network managers use many strategies to improve network performance and reliability. For example, related nodes are clustered together: engineering nodes are located on the engineering segment, separated from the rest of the network by a router. By partitioning the network, users get faster access to local resources like servers and printers, network backbones are used less, and errors and crashes are localized. For example, if the marketing segment goes down, users on the other segments will not be affected.

Well-designed networks work fine at first, but over time, entropy sets in. For example, a new support group might move into space vacated by engineering. The marketing group might begin to run graphics-intensive demos on local workstations booted from engineering and support servers. Administrative functions might be consolidated so the volume of traffic and number of nodes on the administration segment increases sharply, along with wide area network (WAN) traffic. The benefits of partitioning the network are lost: local performance drops, the backbone becomes clogged, and network or node downtime on one segment will take down the whole company, not just one department.

Distributed LAN analysis can come to the rescue in such situations. The network manager already uses the distributed LAN analysis system to connect to various segments by hand to diagnose existing problems and get a current view of the network. But to plan for the future, the manager needs to build up a good store of information over time and analyze it offline.

The network manager can use ProbeView's AutoPolling feature to automate this data collection. AutoPolling can be configured to connect to each segment on the network in turn each day. After uploading new log entries and synchronizing the distributed database, ProbeView gathers and exports segment map, log, trends, daily trends, and node traffic data. This new data is appended to the end of the existing export files. These export files can be read into a database manager or into a spreadsheet like Microsoft Excel for further analysis. HP provides some Excel macros to solve some problems. For example, simple analysis can find configuration errors like duplicate IP addresses. More sophisticated, network-specific analysis can detect problems like:

A large volume of client/server traffic running across a backbone segment (solution: partition the network to isolate clients and servers onto the same segments).

Overburdened segments and devices like the administration segment and the WAN gateway (solution: repartition segments, add new devices, or increase their capacity).

Overly high levels of broadcast and multicast traffic, which can significantly slow network performance (solution: change configurations or software on nodes sending too many broadcasts or multicasts).
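Two of these checks, duplicate IP addresses and excessive broadcast or multicast traffic, can be sketched in a few lines of Python run against exported data. The actual layout of the ProbeView export files is not described here, so the CSV columns (`ip`, `mac`, `frames_sent`, `bcast_mcast_frames`), the sample rows, and the 20 percent threshold are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def duplicate_ips(rows):
    """Map each IP address claimed by more than one station to those stations."""
    macs_by_ip = defaultdict(set)
    for row in rows:
        macs_by_ip[row["ip"]].add(row["mac"])
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


def heavy_broadcasters(rows, threshold=0.2):
    """List stations whose broadcast/multicast share of sent frames exceeds threshold."""
    flagged = []
    for row in rows:
        sent = int(row["frames_sent"])
        bcast = int(row["bcast_mcast_frames"])
        if sent > 0 and bcast / sent > threshold:
            flagged.append(row["mac"])
    return flagged


# Made-up sample data in the assumed (hypothetical) export layout.
SAMPLE = """ip,mac,frames_sent,bcast_mcast_frames
15.1.2.3,08-00-09-00-00-01,1000,10
15.1.2.3,08-00-09-00-00-02,500,400
15.1.2.4,08-00-09-00-00-03,2000,100
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(duplicate_ips(rows))
print(heavy_broadcasters(rows))
```

With the sample rows, the first check reports that address 15.1.2.3 is claimed by two stations, and the second flags the station that sends 400 of its 500 frames as broadcasts or multicasts. Because AutoPolling appends to the export files each day, the same checks could be rerun over a growing history to spot trends.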

ProbeView provides network managers with the information they need to analyze and plan for the future of their networks. It does so in a way that cannot be matched by a dispatched protocol analyzer or performance tool that provides a short-term view of a small part of the network.

Conclusion

Distributed LAN analysis can solve network problems. By continually monitoring an Ethernet LAN and by compiling a record of network activity, the HP LanProbe distributed LAN analysis system allows network managers to solve intermittent faults and plan network growth. This system does so with distributed analysis monitors that watch network traffic, store important information and events, and compactly transmit this data to a network management console that presents a multifaceted, expansive view of network activity. LanProbe and ProbeView allow network managers to easily identify and fix network problems as well as analyze the network as an aid to planning its future.

LanProbe and ProbeView work because they have been designed to handle the problems of distributed LAN analysis: data reduction, transmission, and presentation. LanProbe continuously analyzes the volumes of data flowing across the network, capturing only the most vital parts of it. It has well-defined strategies for avoiding and handling information overflow. LanProbe compactly transmits its data to ProbeView using adaptive algorithms designed to work well over both fast and slow connections. ProbeView, taking advantage of its graphical user interface and common central database, gives the network manager many tools with which to solve network ailments and to plan for growth.

Acknowledgments

I would like to acknowledge and congratulate the whole team from Eon Systems and HP's Intelligent Networks Operation for their creativity and hard work in creating LanProbe and ProbeView. The design challenge for distributed LAN analysis is great but the success of the LanProbe system is testimony to how well this team met that challenge. In writing this article, I received help from many people. I would like to thank all of them, especially Mike Watters for his guidance, Joe Adler for his editorial criticism, and Gigi Chu for her historical perspective. Paul Taira took the excellent photographs that illustrate this piece.