February 2008

February 28, 2008

HOWTO: Configure Apache to correctly serve Zip files to Internet Explorer


One of the more irritating problems to affect my Altos Research users recently was the inability to successfully download a Zip archive file from our servers using Internet Explorer. It didn't matter what version of IE they were using (IEv6 or IEv7) or which version of Windows (XP or Vista). Every time they downloaded the Zip and attempted to open it they would see an error message informing them that the file was corrupt or invalid.


The problem did not occur when using any other web browser - Firefox, Safari or Opera. Only Internet Explorer seemed to be corrupting the Zip files that were downloaded from our servers. Having only Linux workstations here in my home office prevented me from replicating the problem myself, but there were enough reports from reliable sources to indicate that this wasn't just a matter of a few misconfigured PCs or network proxies.


In the case of the Altos Research website, the Zip files we were serving are dynamically generated by our application. They are not just static files being served up by the web server. Each Zip file contains the customized reports for each of our customers. Because of that, my first guess was that we were doing something wrong with the generation of the Zip data or the HTTP headers that were being sent when the file was downloaded. To test this theory, I performed the following experiment:


I logged into the AR application as one of our customers using Firefox. I then downloaded the report Zip file to my local workstation. I verified that the file was not corrupt - I was able to open the Zip archive and extract the PDF files it contained. Next, I manually uploaded that file to the AR web server document root where it could be downloaded directly from Apache. I then asked several Windows/IE users to try and download/open that uploaded file.


In every case, the IE users were still unable to open the Zip file, even when it was not dynamically generated and served by our application. This told me that the problem was not being caused by the application code we use to generate the Zip archive. The problem must be due to something in our Apache server configuration.


A note about the Apache configuration: From the very start, I had configured Apache to use the mod_deflate plug-in for HTTP-level compression. This is almost always a good idea, as it decreases the bandwidth used and generally speeds up content delivery to the end-users. I knew from past problems that it was smart to restrict mod_deflate to only apply compression to certain content types (text/html, text/xml, for example) and exclude certain file types that already contain compressed data (image/jpeg, image/gif, application/pdf). There is no sense in having Apache compress already-compressed files, after all. So my initial mod_deflate configuration looked like this:

SetOutputFilter DEFLATE
DeflateFilterNote ratio
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/javascript text/css text/plain

I had assumed, but did not verify, that this configuration would result in only files of those mime-types being compressed using HTTP GZIP compression. I was wrong.


I discovered just how wrong I was by performing the most basic test: Downloading a Zip file from Apache while monitoring the traffic using Wireshark. Much to my surprise, the packet capture of the Zip download showed quite clearly that Apache was still using GZIP compression to transfer the data (abbreviated a bit):


GET /test.zip HTTP/1.1
Host: www.altosresearch.com
Accept-Encoding: gzip,deflate

HTTP/1.1 200 OK
Date: Thu, 28 Feb 2008 20:29:01 GMT
Server: Apache/2.0.48 (Fedora)
Content-Encoding: gzip
Content-Type: application/zip


What?!?! I thought that my mod_deflate configuration would force the plugin to ONLY compress output that was one of the mime-types listed in the 'AddOutputFilterByType' parameter. Apparently, that is not the case.
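A likely explanation, going by the mod_deflate documentation, is the first line of that configuration: 'SetOutputFilter DEFLATE' switches the filter on for every response in its scope, so the type list in 'AddOutputFilterByType' never gets the chance to restrict anything. The following is a minimal, untested sketch of a type-restricted variant. It assumes mod_deflate is loaded and that the directives live in the same scope as the original ones; it is not the configuration this post ends up using.

```apache
# Sketch only: assumes mod_deflate is loaded and that these directives sit
# in the same scope (server config, virtual host or <Location>) as before.

# Deliberately no "SetOutputFilter DEFLATE" here: that directive enables
# the filter for every response in its scope, whatever the content type.

# Optional: keep recording the compression ratio as a request note.
DeflateFilterNote ratio

# Compress only responses whose Content-Type is one of the listed types.
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/javascript text/css
```

With a variant like this, a response of type application/zip should pass through uncompressed, but that is worth confirming with a packet capture rather than assuming.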


So I started to dig a bit more and discovered that the mod_deflate plugin will make use of an environment variable named 'no-gzip'. If this variable is set when the HTTP request is made, mod_deflate will NOT compress the output data. There are some basic configuration examples of how to use this for static files served directly by Apache:

SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.pdf$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.zip$ no-gzip dont-vary

After implementing these configuration changes and reloading our Apache configuration, I was able to verify that files ending with a '.zip' suffix were not being GZIP compressed in transit:

GET /test.zip HTTP/1.1
Host: www.altosresearch.com
Accept-Encoding: gzip,deflate

HTTP/1.1 200 OK
Date: Thu, 28 Feb 2008 20:45:56 GMT
Server: Apache/2.0.48 (Fedora)
Last-Modified: Thu, 28 Feb 2008 00:58:06 GMT
Content-Type: application/zip


Notice that there is no 'Content-Encoding: gzip' header - that is the important part. By setting the 'no-gzip' variable for requests ending in '.zip', mod_deflate was prevented from compressing the output. I once again asked a few Internet Explorer users to download that same file and verify that it was not being corrupted. Success! Multiple testers, using combinations of Windows XP and Vista with IE6 and IE7, were all able to successfully download the Zip file and extract the contents.
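The same check can also be made on the server side, without a packet capture, by logging the note that 'DeflateFilterNote ratio' records. The sketch below follows the logging example in the mod_deflate documentation. It assumes mod_deflate and mod_log_config are loaded; the log path 'logs/deflate_log' and the format nickname 'deflate' are illustrative names, not settings from the Altos Research servers.

```apache
# Sketch only: assumes mod_deflate and mod_log_config are loaded.
# "logs/deflate_log" and the nickname "deflate" are illustrative names.

# Store the compression ratio (as a percentage) in a request note
# named "ratio".
DeflateFilterNote ratio

# Log the request line, the bytes sent, the ratio note and the browser.
LogFormat '"%r" %b (%{ratio}n) "%{User-agent}i"' deflate
CustomLog logs/deflate_log deflate
```

Requests that mod_deflate compressed should show a percentage in the parentheses. Requests it left alone, such as those carrying 'no-gzip', should show no ratio value, which Apache normally logs as '-'.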


In most cases, that would be the end of the story, but the Altos Research application makes things a bit trickier. Trickier because our application serves dynamically generated Zip archives, and the URL that our users click to download that archive looks like this instead:

http://www.altosresearch.com/altos/app?service=pdfzip

Because the 'Request_URI' portion of this URL is only '/altos/app', it cannot be used with the SetEnvIf directives to set the 'no-gzip' environment variable. Instead, I had to resort to a much more obfuscated mod_rewrite solution:

RewriteCond %{QUERY_STRING} ^service=pdfzip$
RewriteRule ^(.*)$ $1 [QSA,E=no-gzip:1,PT,L]

In the first line, I am using RewriteCond to ensure that the rule on line two will only be applied to requests where the query string (the stuff that comes after the '?' character) is exactly equal to 'service=pdfzip'. In line two, I am preserving the request string (not actually doing any rewriting) while specifying that the query string be appended (QSA) and that the environment variable 'no-gzip' be set with a value of '1'. In other words, I am telling Apache to set 'no-gzip=1' for any request whose query string is exactly 'service=pdfzip' and to leave everything else as-is.


The actual final rewrite rule is a bit more complicated because we need to prevent the GZIP compression of both dynamically generated Zip files and dynamically generated PDF files. The final version is:

RewriteCond %{QUERY_STRING} ^.*service=pdf$ [OR]
RewriteCond %{QUERY_STRING} ^service=pdfzip$
RewriteRule ^(.*)$ $1 [QSA,E=no-gzip:1,PT,L]

With these rules in place and our Apache configuration reloaded, all of our Internet Explorer users are now able to download uncorrupted Zip and PDF files.
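The rules above depend on where 'service=...' appears in the query string: the Zip condition only matches when it is the entire query string, and the PDF condition only when it comes last. If the application ever adds or reorders parameters, a looser match may be safer. The following is an untested sketch of such a variation, assuming mod_rewrite is loaded and that the parameter is always spelled 'service=pdf' or 'service=pdfzip'. It is not the rule set described above.

```apache
# Sketch only: assumes mod_rewrite is loaded and that the download
# parameter is always spelled "service=pdf" or "service=pdfzip".
RewriteEngine On

# Match service=pdf or service=pdfzip as a complete parameter anywhere
# in the query string: at the start or after "&", and followed by "&"
# or the end of the string.
RewriteCond %{QUERY_STRING} (^|&)service=pdf(zip)?(&|$)

# "-" means no substitution, so the URL and query string are left
# untouched; the rule exists only to set the no-gzip variable.
RewriteRule ^ - [E=no-gzip:1]
```

Because nothing is substituted, the QSA and PT flags are not needed here. Unlike the original rules, this version has no L flag, so any later rewrite rules would still be evaluated for these requests.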

February 25, 2008

Affordable health care vs. affordable health insurance



Using health insurance to pay for your yearly physical, eye exam or dentist visit is like using car insurance to pay for gas and tune-ups. Can you imagine the response you'd get from your State Farm Insurance rep if you asked "Does this auto insurance plan cover my gasoline and oil-changes too?"


Nobody on earth would even consider such a thing. The very idea is just silly. But if you apply the same reasoning to health insurance, you have to conclude that at some point, the American consumer has come to equate health insurance with health care. I don't know anyone who would confuse their car insurance plan for a car maintenance plan, but I know plenty of well-informed people who no longer see that distinction when it comes to their personal health expenditures.

Interesting, but so what?

I recently read through the Obama and McCain issue statements on health care, hoping to see for myself what the important differences are (and there are many). After reading a few paragraphs into each, the following theme emerged: The emphasis of both of their plans is to reduce the cost and increase the availability of health insurance. Although the costs of health care are discussed, they are not the primary focus of either plan.

That is when it struck me: Americans do not want affordable health insurance. They want affordable health maintenance. People do not buy home or auto insurance to cover routine maintenance of their homes and cars. We buy those types of insurance to protect ourselves against rare catastrophic events. In terms of health, the desire is for the average family to be able to afford the routine costs of health maintenance. In a less insane world, these families would purchase health insurance like they do auto insurance - to protect themselves financially from infrequent catastrophic events.

Tying these two ideas together: It seems to me that the driving force behind both candidates' health care plans is the desire to make insurance more affordable. At the same time, it looks like we Americans are using health insurance for all the wrong reasons (regularly scheduled maintenance vs. only catastrophic events). In the end, all we really want is for the typical family to be able to afford the costs of basic health maintenance. Unfortunately, neither candidate seems to be addressing that fundamental issue. By casting the argument in terms of health insurance and not health maintenance, they only reinforce the confusion.

February 21, 2008

527 PACs: The groups everyone loves to hate



In my inbox this morning was a campaign email from the Barack Obama for President 2008 campaign manager David Plouffe. The intent of the message was to warn Obama supporters of the formation of a pro-Clinton/anti-Obama 527 Group named the "American Leadership Project". In that message, the funding structure for this new group was portrayed as:

The so-called "American Leadership Project" will take unlimited contributions from individuals and is organized the same way as the infamous Swift Boat Veterans for Truth.

Seems innocuous enough until you consider the fact that a different 527 Group, which is subject to the same funding legalities as the Swift Boat Veterans for Truth, is the Obama-supporting MoveOn.org 527 Group. My question back to the Obama campaign would be this: If you are going to oppose 527 Groups (American Leadership Project, Swift Boat Veterans for Truth) because of their funding structure - why would you not also oppose MoveOn.org and their fundraising efforts on behalf of pro-Obama advocacy?

Seems a bit hypocritical to me.
