HOWTO: Configure Apache to correctly serve Zip files to Internet Explorer
One of the more irritating problems to affect my Altos Research users recently was the inability to successfully download a Zip archive file from our servers using Internet Explorer. It didn't matter which version of IE they were using (IE6 or IE7) or which version of Windows (XP or Vista). Every time they downloaded the Zip and attempted to open it, they would see an error message informing them that the file was corrupt or invalid.
The problem did not occur when using any other web browser - Firefox, Safari or Opera. Only Internet Explorer seemed to be corrupting the Zip files that were downloaded from our servers. Of course, having only Linux workstations here in my home office prevented me from replicating the problem myself, but there were enough reports from reliable sources to indicate that this wasn't just a matter of a few misconfigured PCs or network proxies.
In the case of the Altos Research website, the Zip files we were serving are dynamically generated by our application. They are not just static files being served up by the web server. Each Zip file contains the customized reports for each of our customers. Because of that, my first guess was that we were doing something wrong with the generation of the Zip data or the HTTP headers that were being sent when the file was downloaded. To test this theory, I performed the following experiment:
I logged into the AR application as one of our customers using Firefox. I then downloaded the report Zip file to my local workstation. I verified that the file was not corrupt - I was able to open the Zip archive and extract the PDF files it contained. Next, I manually uploaded that file to the AR web server document root where it could be downloaded directly from Apache. I then asked several Windows/IE users to try and download/open that uploaded file.
In every case, the IE users were still unable to open the Zip file, even when it was not dynamically generated and served by our application. This told me that the problem was not being caused by the application code we use to generate the Zip archive. The problem must be due to something in our Apache server configuration.
A note about the Apache configuration: From the very start, I had configured Apache to use the mod_deflate plug-in for HTTP-level compression. This is almost always a good idea, as it decreases the bandwidth used and generally speeds up content delivery to the end-users. I knew from past problems that it was smart to restrict mod_deflate to only apply compression to certain content types (text/html, text/xml, for example) and exclude certain file types that already contain compressed data (image/jpeg, image/gif, application/pdf). There is no sense in having Apache compress already-compressed files, after all. So my initial mod_deflate configuration looked like this:
SetOutputFilter DEFLATE
DeflateFilterNote ratio
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/javascript text/css text/plain
I had assumed, but did not verify, that this configuration would result in only files of those mime-types being compressed using HTTP GZIP compression. I was wrong.
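As a side check on the point that there is no sense in compressing already-compressed files, here is a minimal Python sketch. It builds a small Zip archive in memory from synthetic text and then gzips the whole archive, roughly what an HTTP-level GZIP pass over a Zip download amounts to. The function name `build_sample_zip`, the member name `report.txt`, and the generated text are all assumptions for illustration; none of this is the Altos Research code or data.

```python
import gzip
import io
import random
import zipfile


def build_sample_zip(seed=0):
    """Return (raw_text, zip_bytes) for a synthetic one-file Zip archive."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    # Hypothetical sample data: pseudo-random "words" standing in for a report.
    vocabulary = [
        "".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
        for _ in range(2000)
    ]
    text = " ".join(rng.choice(vocabulary) for _ in range(20000)).encode("ascii")

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("report.txt", text)
    return text, buffer.getvalue()


text, zip_bytes = build_sample_zip()
gzipped_zip = gzip.compress(zip_bytes)  # a second compression pass over the archive

print("raw text bytes:   ", len(text))
print("zip archive bytes:", len(zip_bytes))
print("gzipped zip bytes:", len(gzipped_zip))
```

The first pass (building the Zip) shrinks the text noticeably, while the second pass (gzipping the Zip) should leave the size essentially unchanged. The second pass costs CPU time on both ends and saves no meaningful bandwidth.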
I discovered just how wrong I was by performing the most basic test: downloading a Zip file from Apache while monitoring the traffic using Wireshark. Much to my surprise, the packet capture of the download showed quite clearly that Apache was still using GZIP compression to transfer the data (abbreviated a bit):
GET /test.zip HTTP/1.1
Host: www.altosresearch.com
Accept-Encoding: gzip,deflate

HTTP/1.1 200 OK
Date: Thu, 28 Feb 2008 20:29:01 GMT
Server: Apache/2.0.48 (Fedora)
Content-Encoding: gzip
Content-Type: application/zip
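What?!?! I thought that my mod_deflate configuration would force the plugin to ONLY compress output that was one of the mime-types listed in the 'AddOutputFilterByType' directive. Apparently, that is not the case. The likely culprit is the 'SetOutputFilter DEFLATE' line, which turns the filter on for every response in its scope regardless of content type; 'AddOutputFilterByType' only adds types to compress and never excludes any.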
So I started to dig a bit more and discovered that the mod_deflate plugin will make use of an environment variable named 'no-gzip'. If this variable is set for a request, mod_deflate will NOT compress the output data. There are some basic configuration examples of how to use this for static files served directly by Apache:
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.pdf$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.zip$ no-gzip dont-vary
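To see which request paths those three patterns would catch, here is a small Python sketch. It uses Python's `re` module as a stand-in for Apache's regular expression engine, which should behave the same way for patterns this simple. The helper name `sets_no_gzip` and the sample paths other than `/test.zip` are made up for illustration.

```python
import re

# The same three patterns as the SetEnvIfNoCase lines, matched case-insensitively.
NO_GZIP_PATTERNS = [
    re.compile(r"\.(?:gif|jpe?g|png)$", re.IGNORECASE),
    re.compile(r"\.pdf$", re.IGNORECASE),
    re.compile(r"\.zip$", re.IGNORECASE),
]


def sets_no_gzip(request_path):
    """Return True if any pattern matches the path (query string excluded)."""
    return any(pattern.search(request_path) for pattern in NO_GZIP_PATTERNS)


for path in ["/test.zip", "/images/logo.JPEG", "/reports/summary.PDF", "/index.html"]:
    print(path, "->", "no-gzip" if sets_no_gzip(path) else "compressed as usual")
```

Because the patterns are anchored with `$`, only the end of the path matters, so a path without one of these suffixes is left alone.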
After implementing these configuration changes and reloading our Apache configuration, I was able to verify that files ending with a '.zip' suffix were not being GZIP compressed in transit:
GET /test.zip HTTP/1.1
Host: www.altosresearch.com
Accept-Encoding: gzip,deflate

HTTP/1.1 200 OK
Date: Thu, 28 Feb 2008 20:45:56 GMT
Server: Apache/2.0.48 (Fedora)
Last-Modified: Thu, 28 Feb 2008 00:58:06 GMT
Content-Type: application/zip
Notice that there is no 'Content-Encoding: gzip' header - that is the important part. By setting the 'no-gzip' variable for requests ending in '.zip', mod_deflate was prevented from compressing the output. I once again asked a few Internet Explorer users to download that same file and verify that it was not being corrupted. Success! Multiple testers, using combinations of Windows XP and Vista with IE6 and IE7, were all able to successfully download the Zip file and extract the contents.
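If you would rather not fire up Wireshark for this check, the same test can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names `is_gzip_encoded` and `fetch_response_headers` are made up, and the two header sets are copied from the abbreviated captures above.

```python
import urllib.request


def is_gzip_encoded(headers):
    """Return True if the response headers declare gzip content encoding."""
    for name, value in headers.items():
        if name.lower() == "content-encoding":  # header names are case-insensitive
            encodings = [token.strip().lower() for token in value.split(",")]
            return "gzip" in encodings or "x-gzip" in encodings
    return False


def fetch_response_headers(url):
    """Request url while advertising gzip support, as the browsers did."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip,deflate"})
    with urllib.request.urlopen(request) as response:
        return dict(response.headers.items())


# Response headers from the two captures in this post.
before_fix = {"Content-Encoding": "gzip", "Content-Type": "application/zip"}
after_fix = {"Content-Type": "application/zip"}

print(is_gzip_encoded(before_fix))  # True
print(is_gzip_encoded(after_fix))   # False
```

Against a live server, something like `is_gzip_encoded(fetch_response_headers(url))` would perform the same check for a given download URL.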
In most cases, that would be the end of the story, but the Altos Research application makes things a bit trickier: it serves dynamically generated Zip archives, and the URL that our users click to download one of those archives looks like this instead:
http://www.altosresearch.com/altos/app?service=pdfzip
Because the 'Request_URI' portion of this URL is only '/altos/app' (the query string is not included), it cannot be used with the SetEnvIf directives to set the 'no-gzip' environment variable. Instead, I had to resort to a much more convoluted mod_rewrite solution:
RewriteCond %{QUERY_STRING} ^service=pdfzip$
RewriteRule ^(.*)$ $1 [QSA,E=no-gzip:1,PT,L]
In the first line, I am using RewriteCond to ensure that the rule on line two will only be applied to requests where the query string (the stuff that comes after the '?' character) is exactly equal to 'service=pdfzip'. In line two, I am preserving the request string (not actually doing any rewriting) while specifying that the query string be appended (QSA) and that the environment variable 'no-gzip' be set with a value of '1'. In other words, I am telling Apache to set 'no-gzip=1' for any request whose query string is exactly 'service=pdfzip' and to leave everything else as-is.
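The difference between the path and the query string is easy to see by splitting the download URL. The following Python sketch is only an illustration: it uses `urllib.parse.urlsplit` to separate the two parts, then applies the suffix pattern from the SetEnvIf approach to the path and the RewriteCond pattern to the query string, with Python's `re` module standing in for Apache's regex engine.

```python
import re
from urllib.parse import urlsplit

url = "http://www.altosresearch.com/altos/app?service=pdfzip"
parts = urlsplit(url)

print(parts.path)   # /altos/app
print(parts.query)  # service=pdfzip

# The '.zip' suffix pattern only sees the path, so it never matches.
suffix_match = re.search(r"\.zip$", parts.path, re.IGNORECASE)

# The RewriteCond pattern is tested against the query string instead.
query_match = re.search(r"^service=pdfzip$", parts.query)

print(suffix_match is not None)  # False
print(query_match is not None)   # True
```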
The actual final rewrite rule is a bit more complicated because we need to prevent the GZIP compression of both dynamically generated Zip files as well as dynamically generated PDF files. The final version is:
RewriteCond %{QUERY_STRING} ^.*service=pdf$ [or]
RewriteCond %{QUERY_STRING} ^service=pdfzip$
RewriteRule ^(.*)$ $1 [QSA,E=no-gzip:1,PT,L]
With these rules in place and our Apache configuration reloaded, all of our Internet Explorer users are now able to download uncorrupted Zip and PDF files.
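To get a feel for which query strings the two OR'ed conditions accept, here is a rough Python sketch that applies the same two patterns. The helper name `disables_gzip` and every sample query string other than 'service=pdf' and 'service=pdfzip' are hypothetical; Python's `re` module is standing in for mod_rewrite's pattern matching.

```python
import re

# The two RewriteCond patterns, combined with OR as in the final rule.
CONDITIONS = [
    re.compile(r"^.*service=pdf$"),
    re.compile(r"^service=pdfzip$"),
]


def disables_gzip(query_string):
    """Return True if either condition matches, i.e. no-gzip would be set."""
    return any(condition.search(query_string) for condition in CONDITIONS)


samples = [
    "service=pdf",
    "service=pdfzip",
    "page=2&service=pdf",
    "service=pdfzip&page=2",
    "service=html",
]
for query in samples:
    print(query, "->", disables_gzip(query))
```

Note the asymmetry: the leading '.*' lets the PDF condition match any query string that ends in 'service=pdf', while the Zip condition only matches when 'service=pdfzip' is the entire query string. If extra parameters ever get added to the Zip download URL, that pattern would need loosening too.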