Caching for dynamic content

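I made this snippet to show how to use the Last-Modified and ETag headers
to make a dynamically generated page cacheable. Used correctly, repeat visits
are answered with a short "304 Not Modified" response instead of the full
page, which speeds up your page loads and saves bandwidth.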

// Start output buffering, this will
// catch all content so that we can 
// do some calculations
ob_start();
 
 
// Some example HTML
print '<html>';
 
// put your content in here:
print '<h1>Example content</h1>';
 
print '<ul>';
for ($x = 0; $x < 10; $x++) {
    print "<li>List item $x</li>";
}
 
print '</ul>';
print '</html>';
 
// or include() something here
 
 
// Now save all the content from above into
// a variable
$PageContent = ob_get_contents();
 
// And clear the buffer, so the
// contents will not be submitted to 
// the client (we do that later manually)
ob_end_clean();
 
 
// Generate unique Hash-ID by using MD5
$HashID = md5($PageContent);
 
// Specify the time when the page has
// been changed. For example this date
// can come from the database or any
// file. Here we define a fixed date value:
$LastChangeTime = 1144055759;
 
// Define the proxy or cache expire time 
$ExpireTime = 3600; // seconds (= one hour)
 
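// Get request headers:
$headers = apache_request_headers();
// getallheaders() is an alias of this function. If neither is
// available on your server, read $_SERVER['HTTP_IF_MODIFIED_SINCE']
// and $_SERVER['HTTP_IF_NONE_MATCH'] instead.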
 
// Add the content type of your page
header('Content-Type: text/html');
 
// Content language
header('Content-language: en');
 
// Set cache/proxy information:
header('Cache-Control: max-age=' . $ExpireTime); // must-revalidate
header('Expires: '.gmdate('D, d M Y H:i:s', time()+$ExpireTime).' GMT');
 
// Set last modified (this helps search engines 
// and other web tools to determine if a page has
// been updated)
header('Last-Modified: '.gmdate('D, d M Y H:i:s', $LastChangeTime).' GMT');
 
// Send an "ETag" which represents the content
// (this helps browsers to determine if the page
// has been changed or if it can be loaded from
// the cache - this will speed up the page loading).
// The value has to be wrapped in double quotes.
header('ETag: "' . $HashID . '"');
 
 
// The browser "asks" us if the requested page has
// been changed and sends the last modified date it
// has in its internal cache. So now we can check
// if the submitted time equals our internal time value.
// If yes then the page did not get updated
 
$PageWasUpdated = !(isset($headers['If-Modified-Since']) and 
    strtotime($headers['If-Modified-Since']) == $LastChangeTime);
 
 
// The second possibility is that the browser sends us
// the last Hash-ID it has. If it does, we can determine
// if it has the latest version by comparing both IDs.
// (ereg() was removed in PHP 7; a plain substring
// search is enough to find the hash in the header.)

$DoIDsMatch = (isset($headers['If-None-Match']) and 
    strpos($headers['If-None-Match'], $HashID) !== false);
 
// Does one of the two ways apply?
if (!$PageWasUpdated or $DoIDsMatch){
 
    // Okay, the browser already has the
    // latest version of our page in its
    // cache. So just tell it that
    // the page was not modified and DON'T
    // send the content -> this saves bandwidth and
    // speeds up the loading for the visitor
 
    header('HTTP/1.1 304 Not Modified');
 
    // That's all, now close the connection:
    header('Connection: close');
 
    // The magical part: 
    // No content here ;-) 
    // Just the headers
 
}
else {
 
    // Okay, the browser does not have the
    // latest version or does not have any
    // version cached. So we have to send it
    // the full page.
 
    header('HTTP/1.1 200 OK');
 
    // Tell the browser the size of the content
    header('Content-Length: ' . strlen($PageContent));
 
    // Send the full content
    print $PageContent;
}
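
The two checks above can also be folded into one small function. What follows is a minimal sketch and not part of the original snippet: the name isNotModified() is made up, the headers are read from a $_SERVER-style array instead of apache_request_headers(), and it assumes the ETag is sent in double quotes. Unlike the snippet, it lets If-None-Match take precedence when the browser sends both headers (the HTTP specification asks for this), and it treats any If-Modified-Since date at or after the last change as fresh.

```php
<?php
/**
 * Decide whether a "304 Not Modified" can be sent (hypothetical helper).
 *
 * $server       an array shaped like $_SERVER
 * $etag         the quoted ETag the page would send, e.g. '"abc123"'
 * $lastModified Unix timestamp of the last change
 */
function isNotModified(array $server, string $etag, int $lastModified): bool
{
    // If the browser sent If-None-Match, that header alone decides.
    if (isset($server['HTTP_IF_NONE_MATCH'])) {
        return strpos($server['HTTP_IF_NONE_MATCH'], $etag) !== false;
    }

    // Otherwise fall back to the date comparison.
    if (isset($server['HTTP_IF_MODIFIED_SINCE'])) {
        $since = strtotime($server['HTTP_IF_MODIFIED_SINCE']);
        return $since !== false && $since >= $lastModified;
    }

    // No validators at all: the browser has nothing cached.
    return false;
}
```

In a page script this could be called as isNotModified($_SERVER, '"' . $HashID . '"', $LastChangeTime) right before choosing between the 304 and the 200 branch.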


Older comments:

Nizzy February 27, 2011 at 06:32
Absolutely, it's a good start. But it's a little misleading. For example, if your page content comes from the database, your code will STILL hit the database on each request. This overhead will cost you. This is why we need caching systems. You need to find a way to eliminate the database call.

So, what would be the ideal solution here? Creating a cached version of your content would be the best choice. However, you either need to run a cronjob to build this cache file, or you update it whenever users generate your content. Once you have a cache file, you can get $HashID from the filemtime() or stat() function.

For Peter's post: You will get error if you use preg_match();
Warning: preg_match(): Delimiter must not be alphanumeric or backslash....


Thank you
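
A rough sketch of the cache-file idea from this comment, under a few assumptions: the function name cachedPage(), its parameters and the returned array keys are all made up for illustration, and the cache is rebuilt on demand when it is older than $maxAge seconds rather than by a cronjob. The expensive work (for example the database query) lives in the $build callback and only runs when the cache file is missing or stale. The ETag is derived from the file's modification time and size, so the page body does not have to be hashed.

```php
<?php
/**
 * Return validators for a cached page, rebuilding the cache file only
 * when it is missing or older than $maxAge seconds (hypothetical helper).
 */
function cachedPage(string $cacheFile, int $maxAge, callable $build): array
{
    clearstatcache(true, $cacheFile);
    $fresh = is_file($cacheFile)
        && (time() - filemtime($cacheFile)) < $maxAge;

    if (!$fresh) {
        // Write to a temporary file first and rename it, so other
        // requests never read a half-written cache file.
        $tmp = $cacheFile . '.' . uniqid('', true) . '.tmp';
        file_put_contents($tmp, $build());
        rename($tmp, $cacheFile);
        clearstatcache(true, $cacheFile);
    }

    $mtime = filemtime($cacheFile);

    return [
        'file'         => $cacheFile,
        'lastModified' => $mtime,
        'etag'         => '"' . dechex($mtime) . '-' . dechex(filesize($cacheFile)) . '"',
    ];
}
```

With this in place the script can compare 'etag' and 'lastModified' against the request headers first and call readfile() on 'file' only when a full 200 response is needed.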

Peter Bowey December 27, 2008 at 11:55
Great code Jonas! I modified it for working with the php ob_start('ob_gzhandler') = gzip compression. Also changed ereg() to preg_match() = faster. Changed ETag method to include the standard "xxxx" rather than xxxx

<code>
<?php
ob_start(); // outer buffer, collects the compressed output
ob_start('ob_gzhandler'); // must register ob_gzhandler before session start
session_start();

// Do HTML
print '<html>';
// put your content in here:
print '<h1>Example content</h1>';
// or include() something here

// Now save all the content from above to a variable
ob_end_flush(); // flush the ob_gzhandler buffer
$compLength = ob_get_length(); // get the compressed length
$PageContent = ob_get_contents(); // save our page content
ob_end_clean(); // clear the outer buffer (show it later)
$HashID = '"' . md5($PageContent) . '"'; // Generate unique Hash-ID by using MD5
$LastChangeTime = filemtime($_SERVER['SCRIPT_FILENAME']); // script modification time
$ExpireTime = 3600; // cache expire = one hour
$headers = apache_request_headers(); // Get request headers
// you could also use getallheaders() or $_SERVER
header('Content-Type: text/html'); // content type
header('Content-language: en-au'); // Content language
header('Cache-Control: max-age=' . $ExpireTime); // must-revalidate
header('Expires: '.gmdate('D, d M Y H:i:s', time()+$ExpireTime).' GMT');
header('Last-Modified: '.gmdate('D, d M Y H:i:s', $LastChangeTime).' GMT');
header("ETag: $HashID"); // send our ETag
// is client's internal cache submit time = ours? If yes then the page did not get updated
$PageWasUpdated = !(isset($headers['If-Modified-Since']) and strtotime($headers['If-Modified-Since']) == $LastChangeTime);
// OR: if the browser sends us the last ETag it has (compare it)
// preg_match() needs pattern delimiters, so escape the ETag and wrap it in /.../
$DoIDsMatch = (isset($headers['If-None-Match']) and preg_match('/' . preg_quote($HashID, '/') . '/', $headers['If-None-Match']));
if (!$PageWasUpdated or $DoIDsMatch){
    header('HTTP/1.1 304 Not Modified'); // it matches
    header('Connection: close'); // close the connection (no content)
} else {
    header('HTTP/1.1 200 OK'); // no cache: send the page
    header('Content-Length: ' . $compLength);
    print $PageContent; // Send the full content
}
?>
</code>

The above is tested and live on my website http://www.pbcomp.com.au/

Cheers,
Peter Bowey
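
Once the ETag is sent in the quoted "xxxx" form, the browser echoes it back in If-None-Match, possibly as part of a comma-separated list, with a W/ prefix for weak tags, or as a single *. The following is a small sketch of a comparison that handles those cases without a regular expression. The function name etagMatches() is made up, and it assumes the ETag itself contains no comma, which holds for an MD5 hex string.

```php
<?php
/**
 * Check a quoted ETag against an If-None-Match header value
 * (hypothetical helper, weak comparison).
 */
function etagMatches(string $ifNoneMatch, string $etag): bool
{
    $ifNoneMatch = trim($ifNoneMatch);
    if ($ifNoneMatch === '*') {
        return true; // "*" matches any current version
    }

    // Weak comparison: ignore a leading W/ on either side.
    $strip = function (string $tag): string {
        $tag = trim($tag);
        return strncmp($tag, 'W/', 2) === 0 ? substr($tag, 2) : $tag;
    };

    foreach (explode(',', $ifNoneMatch) as $candidate) {
        if ($strip($candidate) === $strip($etag)) {
            return true;
        }
    }

    return false;
}
```

It could stand in for the preg_match() call, for example etagMatches($headers['If-None-Match'], $HashID).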

Jonas February 06, 2008 at 11:42
Thanks Werner!

I corrected the snippet :)

Werner Avenant February 04, 2008 at 17:48
Brilliant piece of code, really helped ease the load on our server, but there is one small bug:

The line with Header("Expires:
should have
time()+$expire

instead of

$last_change+$expires

Otherwise something that was last modified on 8 Jan 2003 will expire on 8 Jan 2003 instead of expiring at a later stage today
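
To make the difference concrete, here is a short sketch with the dates from this comment. The helper name expiresHeader() and the variable names are made up for the example. It builds the header once from the last-change time (the bug) and once from the current time (the fix).

```php
<?php
/** Build an Expires header that lies $ttl seconds after $from (Unix time). */
function expiresHeader(int $from, int $ttl): string
{
    return 'Expires: ' . gmdate('D, d M Y H:i:s', $from + $ttl) . ' GMT';
}

$lastChange = gmmktime(0, 0, 0, 1, 8, 2003);  // page last modified on 8 Jan 2003
$now        = gmmktime(12, 0, 0, 2, 4, 2008); // stands in for time() in this example

// The bug: counting from the last change gives a date far in the past,
// so the response is already expired when it is sent.
echo expiresHeader($lastChange, 3600), "\n";

// The fix: counting from the current time gives one hour from "now".
echo expiresHeader($now, 3600), "\n";
```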