NAPA to the Rescue
Robyn Peterson
The only good web page is a fast web page. With that assumption in mind, we can move on...
The most common complaint in any Webmaster's mailbox has got to be, "Why is the site so &#@#% slow?" The truth is, most websites out there are bandwidth hogs. At ExtremeTech, we'll admit it--we're not guilt-free in this area either. We all have something to learn.
As Web developers, we're all working so fast and trying so hard to be innovative, we're missing easy opportunities to optimize our sites for faster performance. On our high-speed connections at the office, we may not notice that we have a problem. But any user who dials in with a 56k modem (or slower) will bluntly inform us that we need to sign up our web site for a digital Weight Watchers program.
Prescribing a technical cure-all for slow websites is about as difficult as prescribing the right flu vaccine--it works in some cases, but in others, you're just out of luck. Design is always an issue. It's tough to balance a good design with fast performance--there are always trade-offs. However, if you truly want to optimize your Web site for the long run, you'll need to look not only at your design, but also at how your design and code are delivered to your end user. (Note that we covered many of these design issues in our Web Accessibility story written by John C. Fish.)
On the back end, if your servers aren't maximizing the available bandwidth or are averaging higher-than-normal latency, it truly will not matter how light your page is--your servers will deliver it at a snail's pace.
In this article, we'll look at the front end and the back end of your website, keying in on different ways you can optimize both of these areas. In so doing, we'll add a new open source application from Intel to our toolbox called NAPA (Network Applications Performance Analyzer), which we'll use to pry open the virtual black box that is your web site (and the associated network usage).
The bottom line: to fix a problem, you need to understand the problem. Dig in to find out how to use NAPA to realize the potential of your web site--and then use our tips to acquire the speed you need to survive.
NOTE: Even if you're not a Web Developer or a System Administrator, NAPA is a blast to play around with. So, stay tuned in and get clued in on how to use NAPA solely for informational purposes--you may learn something. (Hint: NAPA might help you decide which browser is the best/fastest for you by analyzing your browser's interaction with the sites you visit most frequently.)
Also note that the current version of NAPA runs on Windows 2000 and Windows XP only.
As technology advances, so too does complexity. Digging into the guts of your server software isn't as easy as it used to be. However, as Web developers, we need to know how well our servers are dishing up our blood, sweat, and tears (i.e. our web site content). As system architects or sys admins, we need to know what kind of latency we are seeing in connections, and we need to know how to measure IP and TCP flow and bandwidth utilization. Hey, the more we know, the better our sites become.
Lesson 1: the speed at which a web page is delivered to your audience is not a constant, and nine times out of ten, it's not optimal.
Although that's an obvious point, it needs to be made. It's the fact that becomes our starting point.
The speed of delivery can fluctuate due to a variety of conditions beyond our control, such as ISP latency, the user's connection speed, broadcast "noise", etc. In these cases, our hands are virtually tied. However, we do have control over how our pages are designed, and in most cases, how our servers dish them up.
We still haven't answered the most amorphous question of all: how do you analyze your web site? Well, there are quite a few firms out there that will analyze your web site for you, but you'll need big bucks. On the other hand, there are quite a few shareware tools that will open windows on distinct pieces of your puzzle, but it's difficult and troublesome to build your toolbox this way. Enter NAPA...
We encountered Intel's new NAPA tool at the Intel Developer Forum last week, and we were so impressed with its functionality that we wanted to share its capabilities with you. (Kudos to Intel Labs developers Jennie Yoder and Jim Chu, Network Performance Manager Frank Hady, and others who may have been involved.) NAPA is a free app and open source--you can download the NAPA executable and source code.
NAPA opens portals into the inner workings of your network. With NAPA, you can graphically see how the various elements of a Web page are sent over time, as well as how they're received. Both of these aspects will help paint a picture of how your web site works--and most importantly, how you can improve it. You can also see differences in how different browsers work with your website.
NAPA is an open source application written primarily in C++, and it allows you to garner all sorts of statistics about the network and its delivery of data. Specifically, the application allows a developer to monitor network usage as well as identify inefficiencies. But what does that really mean? The answer depends on what your role is at your Web site. We'll delve into that a little later. NAPA can run on any machine that's running Windows 2000 or Windows XP as the operating system, and can run at any connection speed. The driver needed by NAPA, however, has to hook up to a NIC miniport. In other words, NAPA will need a broadband connection with an Ethernet card.
Since NAPA is open source, you have the choice of either compiling it yourself, or using Intel's pre-compiled version. For this article, I used the pre-compiled version. If you're just starting out in this arena, I recommend following my lead.
NAPA is currently only at version 0.55, so it's brand new. Keep in mind that there are probably a few bugs floating around beneath its exterior.
Before we can work any magic, we need to get NAPA up and running. The installation sequence is fairly straightforward and the README file that accompanies the NAPA download is very complete. So, I won't go through every installation step, but rather I'll highlight the important milestones.
If you're familiar with installing protocols such as TCP/IP, then installing NAPA will be a snap. First off, NAPA requires that you install a packet sniffing driver called "Peakaboo". The driver is installed just as if it were any other network connection component.
Once the Peakaboo driver is installed, the rest of the methods NAPA uses will have access to the data required to run the analyses. In other words, NAPA's ready to roll.
From this point, all you need to do is launch Napa.exe. Then, start downloading a web page and the stats will roll in.
Once you launch NAPA, the user interface looks like a standard Windows control panel. Each of NAPA's tabs contains different slices of the data puzzle. Take a look at Figure 3 below to get an idea of the visual design.
The General Tab
From here you can decide just what your data looks like. When deciding which "mode" to use, consider the following: in "All Local mode," NAPA looks at all packets sent and received by the local machine. At the other end of the spectrum, in "Directed mode," NAPA only looks at packets received by the local machine, so it doesn't keep track of data that was sent.
To the right of the mode selection, you'll see NIC Adapter selection. Select the appropriate NIC Adapter from this menu.
Directly below the NIC Adapter selection field, you'll find a drop-down menu for Sample Period. Sample Period controls how often the GUI is updated. Intel states that the thread used to sample the data is pretty lightweight, so they recommend that you leave the sample period at one second. That's what we did.
Once you're comfortable, feel free to set up your alert threshold. This function lets you keep NAPA running in the background and have it alert you only when the network has violated one of your threshold requirements.
When I'm actively testing a network, I find it easiest to use the Always on Top feature (check box on left side). That way, you'll be able to watch the testing progress while clicking away on your network applications.
As NAPA is an open source application, you can download the code directly from Intel.
If you're comfortable with C++ and you feel like browsing the files (I did), you'll run across some very interesting code. Each of the files has been descriptively named, so mapping a function to a specific file shouldn't be a problem (e.g. GraphSeries.cpp, ServerTimeline.cpp, ReadPackets.cpp).
For the most part the code is well commented. However, there are a few files which have very minimal (if any) comments. From one file to the next, you could be on either side of the spectrum. Even if you're not a code junkie but you do feel like having a good read, some of the files read more like a Grisham novel than an Intel application. Here are a few selected and humorous tidbits from ParseHTTP.cpp:
// how are we going to handle when HTTP packets are in the middle of a packet?
// what if they go ACROSS packets? Eeeek!
Followed later by:
// this should be the right packet! I hope! Ack!
In case the suspense is killing you, don't worry--the coder figured it out a few lines below.
// wow, now the packet should be completely parsed. Cool!
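The worry in those comments, an HTTP message that starts in the middle of a packet or runs across several, is commonly handled by buffering the payloads for a connection and parsing only once a whole message has arrived. The Python sketch below illustrates that general idea; it is not NAPA's actual C++ code. The class name HttpReassembler and its feed method are hypothetical, and the sketch assumes payloads arrive in order and that any body is announced with a Content-Length header (no chunked encoding).

```python
class HttpReassembler:
    """Buffers TCP payloads for one connection and returns complete HTTP messages.

    Hypothetical illustration only: assumes in-order payloads and that any
    message body is declared with Content-Length (no chunked encoding).
    """

    def __init__(self):
        self._buffer = b""

    def feed(self, payload: bytes) -> list:
        """Add one packet's payload; return (start_line, headers, body) per finished message."""
        self._buffer += payload
        messages = []
        while True:
            header_end = self._buffer.find(b"\r\n\r\n")
            if header_end == -1:
                break  # headers still incomplete; wait for the next packet
            header_block = self._buffer[:header_end].decode("iso-8859-1")
            lines = header_block.split("\r\n")
            headers = {}
            for line in lines[1:]:
                name, _, value = line.partition(":")
                headers[name.strip().lower()] = value.strip()
            body_length = int(headers.get("content-length", "0"))
            message_end = header_end + 4 + body_length
            if len(self._buffer) < message_end:
                break  # the body continues in a later packet
            body = self._buffer[header_end + 4:message_end]
            messages.append((lines[0], headers, body))
            # Keep any leftover bytes: they are the start of the next message.
            self._buffer = self._buffer[message_end:]
        return messages
```

Feeding it a response cut in two places returns nothing until the final piece arrives, at which point the whole message comes back at once, along with any further message that began in the same packet.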
According to Intel, the original build environment for this code was Microsoft Visual Studio 6.0. Prior to building the application, you'll need to acquire the following additional libraries and header files from their original sources:
WINUSER.H – Part of Visual Studio or the Platform SDK
tvout.h – Part of Visual Studio or the Platform SDK
basetsd.h – Part of Visual Studio or the Platform SDK
IPHlpApi.Lib – Part of Visual Studio or the Platform SDK
IPExport.h – Part of Visual Studio or the Platform SDK
IPHlpApi.h – Part of Visual Studio or the Platform SDK
IPTypes.h – Part of Visual Studio or the Platform SDK
ntddndis.h – Part of Win2K DDK example
devioctl.h – Part of Win2K DDK example
zlib.lib – http://www.gzip.org/
zlib.h – http://www.gzip.org/
zconf.h – http://www.gzip.org/
For Web Developers, NAPA gives insight into how a web page is delivered. Primarily, what we're talking about here is a breakdown of HTTP data. NAPA not only shows a breakdown of data types sent over HTTP (e.g. % of images, text, etc.) but can also give you a breakdown of how the current design will affect download.
It's as Easy As Pie
The main tab that a web developer will be using is the HTTP tab shown below.
The pie chart breaks down the data types sent during the download. With the percentage breakdown, you can see how your design and page structure play out in the transmission of HTTP data.
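The arithmetic behind a pie chart like this is simple enough to reproduce by hand. The Python sketch below groups responses by the top-level part of their Content-Type and reports each group's share of the total bytes. The function name content_type_breakdown and the input format are assumptions made for illustration; NAPA's own categories may be drawn differently.

```python
from collections import defaultdict


def content_type_breakdown(responses):
    """Return {category: percent of total bytes} for (content_type, byte_count) pairs."""
    totals = defaultdict(int)
    for content_type, byte_count in responses:
        # "image/gif" -> "image"; drop parameters such as "; charset=..."
        category = content_type.split(";")[0].strip().split("/")[0].lower()
        totals[category] += byte_count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: round(100.0 * n / grand_total, 1) for cat, n in totals.items()}


# Made-up sample page: one HTML document and two images.
sample_page = [
    ("text/html; charset=utf-8", 3000),
    ("image/gif", 6000),
    ("image/jpeg", 1000),
]
print(content_type_breakdown(sample_page))
```

For the made-up sample page, text accounts for 30% of the bytes and images for 70%.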
Note: If you're just having fun, we suggest you set NAPA to be "Always on Top", keep your browser active directly behind--then troll the web hitting your favorite sites. NAPA can clue you in on exact HTTP data type breakdowns of any Website.
In this area, there are some big gains that can come from small actions. Start by grabbing the low-hanging fruit. Here are a couple of easy pickings:
Don't overuse small images--Just because they're light in terms of file size doesn't mean they're harmless. Each small image carries with it around 800 bytes of HTTP header information. That's a fair amount of overhead. But we're not done there. The browser has to request each image and, potentially, it could incur a wait time from the server. Make it easy on yourself: don't round every corner of every table. Just use the bare minimum you need to get the gist of the design across.
Should you use text compression?--Text compression is a mixed bag--sometimes it's good, sometimes it's bad. Every time you analyze a page, NAPA will tell you how much waste was produced by not compressing the text (keep in mind that HTTP headers are not compressed). As you will soon learn, NAPA makes some significant claims here. To see what I mean, look at the red, checked area on the pie chart in Figure 4 (uncompressed "plain text" is simply red). NAPA claims that 60% of the data bytes are wasted due to not compressing text. That seems like quite a high ceiling to me, but you'll have to make up your own mind.
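To put the small-image tip in numbers, here is a rough Python sketch that applies the 800-byte header figure quoted above to a list of image sizes. The function name header_overhead is hypothetical, and the 800-byte value is only the approximation used in this article; real header sizes vary by browser and server.

```python
HEADER_BYTES_PER_IMAGE = 800  # approximate figure quoted in the article; real headers vary


def header_overhead(image_sizes, header_bytes=HEADER_BYTES_PER_IMAGE):
    """Return (payload_bytes, header_bytes_total, header share of all bytes in percent)."""
    payload = sum(image_sizes)
    headers = header_bytes * len(image_sizes)
    total = payload + headers
    share = 100.0 * headers / total if total else 0.0
    return payload, headers, round(share, 1)


# Twenty 200-byte rounded-corner images.
print(header_overhead([200] * 20))  # (4000, 16000, 80.0)
```

In that example, the twenty tiny images carry 4,000 bytes of picture data and 16,000 bytes of headers, so four out of every five bytes on the wire are overhead.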
On Text Compression
When I perused a few sites on the Internet, I rarely found one that used text compression. To be complete, I even looked at the homepages of Intel and Microsoft--neither of them used text compression. Now, on the far end of the spectrum, 100% of the text that Slashdot.org was transmitting was compressed text (NAPA reported 0.0% wasted by not compressing text). Somebody needs to shake their hand on this one.
Text compression can have unusual results. You will certainly want to test this feature, if possible, prior to full-scale rollout. As always, avoid making any changes during prime hours. And be aware that text compression will put some level of additional load on your servers. Maybe one of Intel's goals with NAPA is to get you to upgrade your server CPUs – Nah :)
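One way to get a feel for both sides of the trade-off, bytes saved versus CPU time spent, is to compress a sample of your own page text offline before touching the server. The Python sketch below uses the standard zlib module, which implements the same DEFLATE algorithm that gzip-based HTTP compression relies on. The function name compression_report is hypothetical, and timings on a desktop machine are only a loose indication of what a loaded server will see.

```python
import time
import zlib


def compression_report(text: bytes, level: int = 6):
    """Return (original_bytes, compressed_bytes, percent_saved, seconds_spent)."""
    start = time.perf_counter()
    compressed = zlib.compress(text, level)
    elapsed = time.perf_counter() - start
    saved = 100.0 * (len(text) - len(compressed)) / len(text) if text else 0.0
    return len(text), len(compressed), round(saved, 1), elapsed


# Made-up, highly repetitive markup; substitute the HTML of a real page.
sample = b"<tr><td class='cell'>some table text</td></tr>\n" * 500
original, packed, saved, seconds = compression_report(sample)
print(f"{original} -> {packed} bytes, {saved}% saved in {seconds:.4f} s")
```

Repetitive markup like the sample compresses far better than typical pages do, so run it against your real HTML before drawing conclusions.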
NOTE: Text compression is done by a switch on the server. Once the switch is flipped, it's flipped for all websites delivered by that server. (For a step-by-step walkthrough, check out our recent article on Power-Tuning IIS 5.0.)
In a nutshell, NAPA opens a window directly into your network traffic. With NAPA you can monitor TCP and IP Flow, keep a close watch on HTTP data types and download timelines, as well as determine general statistics for traffic (such as packets/sec and average latency). The beauty of NAPA is that the application will visually display the timeline of browser requests, opening and closing of sockets, as well as the browser's corresponding use of bandwidth.
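For readers who want to see what sits behind general statistics such as packets/sec and average latency, here is a small Python sketch. It is an illustration under stated assumptions, not NAPA's method: the function name traffic_stats is hypothetical, and latency is assumed here to mean the gap between sending a request and seeing the first packet of its response.

```python
def traffic_stats(packet_times, request_response_pairs):
    """Summarise a capture.

    packet_times: timestamps (in seconds) of every packet seen.
    request_response_pairs: (request_time, first_response_time) tuples.
    Returns (packets_per_second, average_latency_seconds).
    """
    if len(packet_times) < 2:
        packets_per_second = 0.0
    else:
        duration = max(packet_times) - min(packet_times)
        packets_per_second = len(packet_times) / duration if duration > 0 else 0.0
    latencies = [response - request for request, response in request_response_pairs]
    average_latency = sum(latencies) / len(latencies) if latencies else 0.0
    return packets_per_second, average_latency


# Five packets over two seconds; two requests answered after 0.25 s and 0.75 s.
print(traffic_stats([0.0, 0.5, 1.0, 1.5, 2.0], [(0.0, 0.25), (1.0, 1.75)]))  # (2.5, 0.5)
```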
Graphing Your Way to a Better Life
The HTTP2 tab will be of most interest to you. From this view you can graph the timeline of HTTP data transmission. This allows you to see the sockets opening and closing, GET requests and their fulfillment, and browser bandwidth (see Figure 5 below).
Once you've mapped the timeline for a network connection, you'll need to get a handle on the data that's being represented. Look below at Figure 6 for an informative legend from a PowerPoint document we received from the Intel Developer Forum.
Once you have this data in hand, optimization of the back end becomes a much easier prospect (although nothing's truly easy in this realm). The key to implementing changes on the server side is moving slowly, and keeping an eye out for side effects.
Don't Be So Wasteful
Now that you've got some good information about the server, it's time to optimize. Here are a few general tips:
Keep HTTP headers small--What's the use of all that extra information?
Minimize the number of redirects--Delays, delays, delays... Try to avoid using redirects if possible. See Figure 7 below for a visual example of how a redirect can cause a three-second delay.
Know your server--You're the only one who knows how to balance the needs of your business with the needs of the developers. Every method of optimization for a server has its pros and cons. You'll be the only one who's able to judge what to implement and how to implement it.
Keep an eye on the browser--NAPA also provides a bar graph detailing the bandwidth used by the browser. If you're truly a techie, try to streamline your servers to maximize browser efficiency.
How Smart is a Browser?
If you're an end user or a developer and you're interested in finding out more about browsers, keep a close eye on the graph created on the HTTP2 tab.
You'll want to focus on the purple bar graph near the bottom of the HTTP2 graph--it's a visual display of the bandwidth used by the browser. A good browser should not sit idle when you want to download data--it should make full use of the bandwidth it has at its disposal. There are a couple of points to keep in mind when evaluating your browser or developing your application.
Look for sockets sitting idle--If sockets are idle, then the browser is not making full use of the bandwidth. If a transaction is complete, check to see if the socket is being efficiently re-used.
Look for unused sockets--Hey, why let any socket go unused? Use every socket that you can--work in parallel--too many serialized requests will slow everything down. I don't know about you, but if there's an open lane on the highway, I'll use it.
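The cost of serialized requests can be estimated with a toy model. The Python sketch below hands each request to whichever socket frees up first and reports when the last one finishes. The function name download_time is hypothetical, and the model is deliberately simple: it assumes each request takes a fixed time regardless of how many sockets are busy, which suits latency-bound requests but ignores the fact that parallel sockets share a single link.

```python
import heapq


def download_time(request_seconds, sockets):
    """Estimate total time when each request goes to the socket that frees up first."""
    if sockets < 1:
        raise ValueError("need at least one socket")
    free_at = [0.0] * sockets  # time at which each socket next becomes idle
    heapq.heapify(free_at)
    for seconds in request_seconds:
        start = heapq.heappop(free_at)  # earliest available socket
        heapq.heappush(free_at, start + seconds)
    return max(free_at)


requests = [1.0, 1.0, 1.0, 1.0]  # four requests of one second each
for n in (1, 2, 4):
    print(n, "socket(s):", download_time(requests, n), "seconds")
```

With four one-second requests, one socket needs four seconds, two sockets need two, and four sockets finish in one.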
As an end user, you may not be able to re-engineer your browser, but you certainly can make a logical and informed decision about which browser to use.
Every web site (servers & pages) out there can be tweaked to give better performance. Whether you're a Web developer, system administrator, system architect, network developer, or a jack of all trades, NAPA can help you open a portal into the heart of your network usage as well as answer the underlying question: what needs tweaking, and how do I tweak it?
As with any analysis application (especially one that's only at version 0.55), take each stat with a grain of salt. If you can, double-check any numbers with another application. If you make any modifications that stem from the data NAPA provides, be sure to run NAPA again and see how the change affects performance.
If you're an expert on your site and an expert in C++, start playing with NAPA's code. Make some modifications so that it can better measure what you need it to measure. Hey, it's open source--make the most out of the opportunities you've got.
All in all, just remember the best looking Web pages may not download the fastest--but the pages that users continually revisit do... and loyalty is the name of the game in this market.
Copyright © 2004 Ziff Davis Media Inc. All Rights Reserved. Originally appearing in ExtremeTech.