Join Justin Yost for an in-depth discussion in this video, HTTP caching, part of PHP: Performance Optimization.
- [Instructor] HTTP caching is a technique for serving the same resource to our end users many times before it needs to be regenerated. There are two ways to think about the problem of HTTP caching. Sometimes we just need to cache and serve an image or a CSS file. Other times we need to cache an entire page, for example a blog post or a news article generated by a backend like WordPress.
This content essentially never changes. You can save the final HTML and serve it to everyone who visits that page until the content needs to change. If part of your website primarily serves the same static content to every visitor, such as a news site or a blog, a full caching layer that serves cached, static HTML is a huge performance win. Nothing is faster than serving an already-built HTML file.
The tools for this are called HTTP reverse proxies. Two popular ones are Varnish and HAProxy (HAProxy is primarily a load balancer, but it also includes a small object cache), and for this purpose they act in more or less the same manner. In a typical setup with Varnish, HAProxy, or any other reverse proxy, the cache layer sits in front of your web server, so every request from an outside client hits the cache layer before it reaches the web server. When the cache doesn't have a fresh copy of the document, it requests one from the web server, stores it locally, and then serves it to each client that asks for it.
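To make that request flow concrete, here is a toy sketch in Python. It assumes an in-memory dictionary as the cache store and a hypothetical `render_article` function standing in for WordPress and PHP; a real reverse proxy such as Varnish also handles headers, invalidation, and concurrency, none of which is modeled here.

```python
import time
from typing import Callable, Dict, Tuple


class FullPageCache:
    """Toy full-page cache sitting in front of a slow backend (illustration only)."""

    def __init__(self, backend: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backend = backend      # hypothetical stand-in for the PHP web server
        self.ttl = ttl_seconds      # how long a stored page is considered fresh
        self.clock = clock          # injectable so expiry can be tested
        self.store: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, html)
        self.backend_calls = 0      # how many requests actually reached the backend

    def get(self, path: str) -> str:
        now = self.clock()
        entry = self.store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]         # hit: served from cache, backend never sees it
        # Miss or expired entry: generate the page once and store it.
        self.backend_calls += 1
        html = self.backend(path)
        self.store[path] = (now + self.ttl, html)
        return html


def render_article(path: str) -> str:
    # Hypothetical: imagine database queries and PHP templating happening here.
    return f"<html><body>Article at {path}</body></html>"


cache = FullPageCache(render_article, ttl_seconds=300)
for _ in range(100_000):
    cache.get("/news/42")
print(cache.backend_calls)  # 1: only the first request reached the backend
```

The time-to-live is the design knob: a longer TTL means fewer trips to the backend but a longer wait before visitors see an updated page.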
Notice that whenever the cache can answer on its own, the web server is never involved; it never even sees the request for that HTML document. The first request might be a little slow, because it has to go through the web server, run all of its database queries and a bunch of PHP code, and build the HTML. But the next hundred thousand requests for that same page will be blazing fast. The second type of HTTP caching is easier to set up and doesn't require a separate tool.
This is HTTP browser caching, which is controlled mainly through two HTTP headers. The first is the ETag. An ETag is an identifier the web server generates for a resource, and it acts much like a hash of that resource's contents. On later requests, the browser sends the ETag back (in the If-None-Match header) to identify which copy it currently has cached. Here's an example where a request for the file matches the file's ETag.
The web server then responds with 304 Not Modified, telling the browser, "Okay, this hasn't changed, and you can hold on to your copy for another 120 seconds." So if you need this file again within the next two minutes, you don't have to ask for it; you already have the current copy. But wait: if we can specify how long the browser should cache something, why not always use that header? That's the Cache-Control header, and you're right: relying on ETags has mostly fallen out of favor, because checking one still requires another request for the server to process.
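A minimal Python sketch of that conditional exchange follows. It assumes the ETag is a quoted, truncated SHA-256 of the file contents (real servers use their own schemes) and reuses `max-age=120` from the example; `make_etag` and `respond` are hypothetical helper names, and weak validators (`W/"..."`) are deliberately not handled.

```python
import hashlib
from typing import Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]


def make_etag(body: bytes) -> str:
    # Assumption: a quoted, truncated SHA-256 of the content.
    # Web servers are free to generate ETags in other ways.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Response:
    """Answer a GET for `body`, honoring an If-None-Match request header."""
    etag = make_etag(body)
    # max-age=120 matches the "another 120 seconds" in the example above.
    headers = {"ETag": etag, "Cache-Control": "max-age=120"}
    if if_none_match is not None:
        # The header may list several tags separated by commas, or be "*".
        # Simplified: weak validators (W/"...") are not handled.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in tags or etag in tags:
            return 304, headers, b""   # Not Modified: browser reuses its copy
    return 200, headers, body


css = b"body { color: black; }"
status1, headers1, _ = respond(css, None)              # first visit
status2, _, body2 = respond(css, headers1["ETag"])     # revalidation
print(status1, status2, len(body2))  # 200 304 0
```

Even the 304 path is a full request and response cycle, which is the extra round trip that makes ETags alone less attractive than a Cache-Control lifetime.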
ETags also don't always work correctly when multiple servers serve the same file, because each server may generate a different ETag for identical content. For those reasons, the main header we use to control browser caching is the Cache-Control header. When a resource is requested, we return it with a Cache-Control header attached. This header has a few separate parts, so we'll take it slowly. First is max-age, the number of seconds the resource in question can be cached. For image files, for example, you might want the cache to expire after a month or so.
Or you might have an RSS feed that should only be cached for, say, up to an hour. The private directive tells intermediate caches not to store the resource. For instance, a corporate network might have its own caching layer built in; the private flag tells that corporate caching layer not to cache the response, so only your own browser will attempt to cache it. Cache-Control: no-store, on the other hand, tells every caching layer, "No touchy." No cache, including your own browser, will store the file, so the browser will always request a fresh copy.
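A small Python sketch that assembles a Cache-Control value from the three directives just described. The helper name `cache_control` is hypothetical, and treating a month as 30 days is an assumption for illustration.

```python
from typing import Optional

ONE_HOUR = 60 * 60
ONE_MONTH = 30 * 24 * ONE_HOUR  # assumption: a "month" of 30 days


def cache_control(max_age: Optional[int] = None, private: bool = False,
                  no_store: bool = False) -> str:
    """Build a Cache-Control header value from max-age, private, and no-store."""
    if no_store:
        # no-store: no cache at all, including the browser, may store the response.
        return "no-store"
    directives = []
    if private:
        # private: only the end user's browser may cache it, not shared proxies.
        directives.append("private")
    if max_age is not None:
        if max_age < 0:
            raise ValueError("max_age must be non-negative")
        directives.append(f"max-age={max_age}")  # lifetime in seconds
    return ", ".join(directives)


print(cache_control(max_age=ONE_MONTH))               # max-age=2592000
print(cache_control(max_age=ONE_HOUR, private=True))  # private, max-age=3600
print(cache_control(no_store=True))                   # no-store
```

Letting no-store win over everything else mirrors its meaning: once nothing may be stored, a lifetime is irrelevant.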
So what's the setup process for all the different web servers? For the full list, the open source HTML5 Boilerplate project maintains a set of server configurations that include caching setups for the most common web servers. Here, though, we'll give a quick example in Apache. First, we set the FileETag directive to None, which turns off ETags. Then we set ExpiresActive to On, which lets Apache's mod_expires generate the Expires and Cache-Control max-age headers. Then we can set a default cache lifetime with ExpiresDefault.
In this case, we're setting a default of one week. You can also set the cache lifetime for each file type with ExpiresByType. Here, we're caching CSS files for one year and JPEGs for one month. That's great, but once this is set up, what do you do about all the files already sitting in people's caches? Surely you're not supposed to deploy new CSS only once a year, and you shouldn't. There are two main approaches to this problem, and they're nearly identical.
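The per-type rules above can be modeled in a few lines of Python. This is not Apache's code, only a rough model of an "access plus N" rule, assuming a month is 30 days and a year is 365 days; `LIFETIMES` and `expiry_headers` are hypothetical names.

```python
from email.utils import formatdate
from typing import Dict

DAY = 24 * 60 * 60

# Mirrors the example: CSS for one year, JPEG for one month, one week otherwise.
# Assumption: a month is 30 days and a year is 365 days.
LIFETIMES: Dict[str, int] = {
    "text/css": 365 * DAY,
    "image/jpeg": 30 * DAY,
}
DEFAULT_LIFETIME = 7 * DAY


def expiry_headers(content_type: str, request_time: float) -> Dict[str, str]:
    """Rough model of an "access plus N" rule: lifetime counted from request time."""
    # Drop parameters such as "; charset=utf-8" before looking up the type.
    mime_type = content_type.split(";")[0].strip().lower()
    lifetime = LIFETIMES.get(mime_type, DEFAULT_LIFETIME)
    return {
        "Cache-Control": f"max-age={lifetime}",
        "Expires": formatdate(request_time + lifetime, usegmt=True),
    }


print(expiry_headers("text/css; charset=utf-8", 0)["Cache-Control"])  # max-age=31536000
print(expiry_headers("text/html", 0)["Expires"])  # Thu, 08 Jan 1970 00:00:00 GMT
```

Both headers carry the same lifetime: max-age as a relative number of seconds, Expires as an absolute date for older clients.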
The technique is called cache busting: you bust the cache by giving the resource a different URL whenever it changes. There are two ways to do it. One is to append a query string to the file's URL containing a version number, a timestamp of the last-modified date, or something similar. The other is to put that same value in the filename itself, between the name of the file and the extension. Both produce a unique URL that changes whenever the file in question changes.
The first approach has the advantage of being much simpler; in fact, many PHP frameworks can turn it on by default. However, it has a drawback: Squid, a popular proxy and caching layer, does not cache URLs that contain a query string in its default configuration. Either approach can be automated, however, with PHP frameworks or with build tools like Grunt and npm.
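Here is a Python sketch of both cache-busting styles. As a swapped-in technique, it derives the version from a short hash of the file contents rather than a version number or timestamp; `content_version`, `bust_with_query`, and `bust_with_filename` are hypothetical helper names.

```python
import hashlib
from pathlib import PurePosixPath


def content_version(data: bytes, length: int = 8) -> str:
    # Swapped-in technique: a short content hash instead of a version number
    # or timestamp; it changes whenever the file's bytes change.
    return hashlib.sha256(data).hexdigest()[:length]


def bust_with_query(url_path: str, version: str) -> str:
    # Approach 1: /css/style.css -> /css/style.css?v=<version>
    return f"{url_path}?v={version}"


def bust_with_filename(url_path: str, version: str) -> str:
    # Approach 2: /css/style.css -> /css/style.<version>.css
    path = PurePosixPath(url_path)
    return str(path.with_name(f"{path.stem}.{version}{path.suffix}"))


print(bust_with_query("/css/style.css", "1234"))     # /css/style.css?v=1234
print(bust_with_filename("/css/style.css", "1234"))  # /css/style.1234.css
print(bust_with_filename("/css/style.css", content_version(b"body { color: black; }")))
```

The filename variant only generates the URL. A build step still has to write the file under that name, or the server needs a rewrite rule that maps the versioned name back to the real file.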