Caching for dynamic content

I made this snippet to show how to use the Last-Modified and ETag headers
to optimize the caching of a website. If used correctly, browsers can reuse
their cached copy of a page instead of downloading it again, which speeds up
your page loads and saves bandwidth.

Snippet information

Author:
Jonas John

License:
Public Domain

Language:
PHP

Created:
04/04/2006

Updated:
02/06/2008



<?php

// Start output buffering: this catches all
// content so that we can do some calculations
// on it before sending it
ob_start();
 
 
// Some example HTML
print '<html>';
 
// put your content in here:
print '<h1>Example content</h1>';
 
print '<ul>';
for ($x=0; $x < 10; $x++)
    print "<li>List item $x</li>";
 
print '</ul>';
print '</html>';
 
// or include() something here
 
 
// Now save all the content from above into
// a variable
$PageContent = ob_get_contents();
 
// And clear the buffer, so the
// contents will not be submitted to 
// the client (we do that later manually)
ob_end_clean();
 
 
// Generate a Hash-ID from the content using MD5:
// the same content always yields the same ID
$HashID = md5($PageContent);
 
// Specify the time when the page was last
// changed (Unix timestamp). For example, this
// date can come from the database or any
// file. Here we define a fixed date value:
$LastChangeTime = 1144055759;
 
// Define the proxy or cache expire time 
$ExpireTime = 3600; // seconds (= one hour)
 
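// Get the request headers we need. $_SERVER works with
// every server API, while apache_request_headers() and
// getallheaders() are not available everywhere and keep
// the header-name case the client used:
$headers = array();
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']))
    $headers['If-Modified-Since'] = $_SERVER['HTTP_IF_MODIFIED_SINCE'];
if (isset($_SERVER['HTTP_IF_NONE_MATCH']))
    $headers['If-None-Match'] = $_SERVER['HTTP_IF_NONE_MATCH'];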
 
// Add the content type of your page
header('Content-Type: text/html');
 
// Content language
header('Content-language: en');
 
// Set cache/proxy information:
header('Cache-Control: max-age=' . $ExpireTime); // optionally add ", must-revalidate"
header('Expires: '.gmdate('D, d M Y H:i:s', time()+$ExpireTime).' GMT');
 
// Set last modified (this helps search engines 
// and other web tools to determine if a page has
// been updated)
header('Last-Modified: '.gmdate('D, d M Y H:i:s', $LastChangeTime).' GMT');
 
// Send an ETag that represents the content
// (this helps browsers to determine if the page
// has changed or if it can be loaded from the
// cache, which speeds up page loading). HTTP
// requires the tag to be a quoted string:
header('ETag: "' . $HashID . '"');
 
 
// The browser "asks" us whether the requested page has
// changed and sends the last modified date it has in
// its cache. If our last change is not newer than
// that date, the page did not get updated.

$PageWasUpdated = !(isset($headers['If-Modified-Since']) and 
    strtotime($headers['If-Modified-Since']) >= $LastChangeTime);
 
 
// The second possibility is that the browser sends us
// the last Hash-ID it has. If it does, we can determine
// whether it has the latest version by comparing both IDs
// (a plain substring search; ereg() no longer exists in PHP 7+).

$DoIDsMatch = (isset($headers['If-None-Match']) and 
    strpos($headers['If-None-Match'], $HashID) !== false);
 
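// Does one of the two ways apply? If the browser sent an
// ETag, it decides alone (RFC 7232); the date is only
// checked when no ETag was sent.
if (isset($headers['If-None-Match']) ? $DoIDsMatch : !$PageWasUpdated) {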
 
    // Okay, the browser already has the
    // latest version of our page in its
    // cache. So just tell it that
    // the page was not modified and DON'T
    // send the content: this saves bandwidth and
    // speeds up the loading for the visitor
 
    header('HTTP/1.1 304 Not Modified');
 
    // That's all; now close the connection:
    header('Connection: close');
 
    // The magical part: 
    // No content here ;-) 
    // Just the headers
 
}
else {
 
    // Okay, the browser does not have the
    // latest version or does not have any
    // version cached. So we have to send it
    // the full page.
 
    header('HTTP/1.1 200 OK');
 
    // Tell the browser the size of the content
    header('Content-Length: ' . strlen($PageContent));
 
    // Send the full content
    print $PageContent;
}
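The 304 decision above can be pulled out into a small function so it can be tested without a web server. This is a hedged sketch, not part of the original snippet: the name `is_not_modified()` and its parameters are hypothetical, it assumes the ETag is sent quoted, and it follows the RFC 7232 rule that If-None-Match, when present, takes precedence over If-Modified-Since.

```php
<?php
// Hypothetical helper (not part of the original snippet).
// Returns true when the request can be answered with "304 Not Modified".
//   $etag            current ETag, e.g. '"' . md5($PageContent) . '"'
//   $lastChangeTime  Unix timestamp of the last change
//   $ifNoneMatch     If-None-Match request header, or null if absent
//   $ifModifiedSince If-Modified-Since request header, or null if absent
function is_not_modified($etag, $lastChangeTime, $ifNoneMatch, $ifModifiedSince)
{
    // If the browser sent If-None-Match, decide on the ETag alone
    // and ignore If-Modified-Since.
    if ($ifNoneMatch !== null) {
        $tags = array_map('trim', explode(',', $ifNoneMatch));
        return in_array('*', $tags, true) || in_array($etag, $tags, true);
    }

    // Otherwise compare dates: unchanged if our last change is not
    // newer than the date the browser has in its cache.
    if ($ifModifiedSince !== null) {
        $since = strtotime($ifModifiedSince);
        return $since !== false && $lastChangeTime <= $since;
    }

    // No validators sent: the full page has to be delivered.
    return false;
}

// Possible use in the snippet, replacing the if-condition:
// if (is_not_modified('"' . $HashID . '"', $LastChangeTime,
//         $_SERVER['HTTP_IF_NONE_MATCH'] ?? null,
//         $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null)) { ... }
```

The snippet's fixed timestamp 1144055759 corresponds to Mon, 03 Apr 2006 09:15:59 GMT, so a browser sending exactly that If-Modified-Since date (and no ETag) would get a 304.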


Comments

Peter Bowey December 27, 2008 at 11:55
Great code, Jonas! I modified it to work with PHP's ob_start('ob_gzhandler') gzip compression. I also changed ereg() to preg_match() (faster) and changed the ETag to the standard quoted form "xxxx" rather than xxxx.

<code>
<?php
ob_start();               // outer buffer: holds the compressed page
ob_start('ob_gzhandler'); // must register ob_gzhandler before session start
session_start();

// Do HTML
print '<html>';
// put your content in here:
print '<h1>Example content</h1>';
// or include() something here

// Now save all the content from above to a variable
ob_end_flush(); // flush the ob_gzhandler buffer into the outer buffer
$compLength = ob_get_length(); // get the compressed length
$PageContent = ob_get_contents(); // save our page content
ob_end_clean(); // clear the outer buffer (show it later)
$HashID = '"' . md5($PageContent) . '"'; // Generate a quoted Hash-ID by using MD5
$LastChangeTime = filemtime($_SERVER['SCRIPT_FILENAME']); // script modification time
$ExpireTime = 3600; // cache expire = one hour
$headers = apache_request_headers(); // Get request headers
// you could also use getallheaders() or $_SERVER
header('Content-Type: text/html'); // content type
header('Content-language: en-au'); // Content language
header('Cache-Control: max-age=' . $ExpireTime); // must-revalidate
header('Expires: '.gmdate('D, d M Y H:i:s', time()+$ExpireTime).' GMT');
header('Last-Modified: '.gmdate('D, d M Y H:i:s', $LastChangeTime).' GMT');
header("ETag: $HashID"); // send our ETag
// does the client's cached time equal ours? If yes, the page did not get updated
$PageWasUpdated = !(isset($headers['If-Modified-Since']) and strtotime($headers['If-Modified-Since']) == $LastChangeTime);
// OR: if the browser sends us the last ETag it has, compare it
// (the quotes around $HashID act as the regex delimiters)
$DoIDsMatch = (isset($headers['If-None-Match']) and preg_match($HashID, $headers['If-None-Match']));
if (!$PageWasUpdated or $DoIDsMatch){
    header('HTTP/1.1 304 Not Modified'); // it matches
    header('Connection: close'); // close the connection (no content)
} else {
    header('HTTP/1.1 200 OK'); // not cached: send the page
    header('Content-Length: ' . $compLength);
    print $PageContent; // Send the full content
}
?>
</code>

The above is tested and live on my website http://www.pbcomp.com.au/

Cheers,
Peter Bowey
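Peter's quoted ETag matches what HTTP expects: an entity tag is a quoted string, optionally prefixed with W/ to mark it as weak, and If-None-Match may list several tags separated by commas, or just *. As an alternative to searching the header with a regex, here is a sketch that parses it. The helper names `make_etag()` and `etag_matches()` are hypothetical, and splitting on commas assumes no tag itself contains a comma (true for MD5 hex tags).

```php
<?php
// Hypothetical helpers (not from the snippet or the comment above).

// Build a strong, quoted ETag from the page content.
function make_etag($content)
{
    return '"' . md5($content) . '"';
}

// Weak comparison (used for If-None-Match): a "W/" prefix on either
// tag is ignored; the quoted values must then be identical.
// Assumes no tag contains a comma.
function etag_matches($etag, $ifNoneMatch)
{
    $strip = function ($tag) {
        $tag = trim($tag);
        return strncmp($tag, 'W/', 2) === 0 ? substr($tag, 2) : $tag;
    };

    if (trim($ifNoneMatch) === '*') {
        return true; // "*" matches any current version of the page
    }
    foreach (explode(',', $ifNoneMatch) as $tag) {
        if ($strip($tag) === $strip($etag)) {
            return true;
        }
    }
    return false;
}
```

For example, make_etag('hello') gives "5d41402abc4b2a76b9719d911017c592" (with the quotes), and a browser that echoes the tag back unquoted would not match it.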
Jonas February 06, 2008 at 11:42
Thanks Werner!

I corrected the snippet :)
Werner Avenant February 04, 2008 at 17:48
Brilliant piece of code; it really helped ease the load on our server. But there is one small bug:

The line with Header("Expires:
should have
time()+$expire

instead of

$last_change+$expires

Otherwise, something that was last modified on 8 Jan 2003 will expire on 8 Jan 2003 instead of at a later time today.
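Werner's fix comes down to using the current time, not the last change, as the base for the Expires date. A minimal sketch that isolates this, using a hypothetical `expires_header()` helper that takes the request time as a parameter so the result can be checked; in a page you would pass time().

```php
<?php
// Hypothetical helper: builds the Expires header counted from the
// time of the request ($now), not from the last modification.
function expires_header($now, $ttl)
{
    return 'Expires: ' . gmdate('D, d M Y H:i:s', $now + $ttl) . ' GMT';
}

// In a page: header(expires_header(time(), 3600));
```

With the buggy base, the snippet's fixed last change time (1144055759) plus one hour would yield "Expires: Mon, 03 Apr 2006 10:15:59 GMT", a date that is already in the past for any later request.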