Building a Site With Clean URLs

As an aside in my post about Cambrian House I posted some code for making pretty URLs. A few people (no, not CH) have asked for a little more info, so I’ve written up an explanation of that code.

PHP makes it very easy to create bad URLs like /member.php?id=8. Those are bad because some web spiders are reluctant to crawl URLs with query strings, some browsers and proxies won’t cache them, they expose that you use PHP (when the visitor should never need to know), and they’re just downright ugly and hard to remember. I’m going to present a way to build a PHP/Apache site with clean URLs.

Let’s look, line-by-line, at the contents of .htaccess. While writing this article I found a more elegant equivalent in the WordPress code, so I’ll present that here:

# Turn on the rewrite engine (mod_rewrite itself must already be loaded).
RewriteEngine On
# Rewrite URLs for the location starting at /
# Note this is URL location, not a path to your web root.
RewriteBase /
# If the request asks for a file or directory that doesn't exist
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# send the request to index.php.
RewriteRule . /index.php [L]

The hard part about this change is the shift in thinking. A URL isn’t just a path to a file you FTP’d to a web server; it’s a Uniform Resource Locator, an address for information. It doesn’t matter whether your site presents data taken from files, a database, or a random number generator: a web browser requests a URL and knows nothing about where the website gets the response from.

With that in mind, let’s look at how your PHP site can take apart the URL to route the request to the right PHP script. Create an index.php that looks like:

 
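<?php
// Split a request URI into path segments, e.g. "/member/8" becomes array("member", "8").
function url_parse($url) {
        // strip off the query string and fragment; compare against false
        // because strpos() returns 0 (which is falsy) for a match at position 0
        if (strpos($url, '?') !== false)
                $url = substr($url, 0, strpos($url, '?'));
        if (strpos($url, '#') !== false)
                $url = substr($url, 0, strpos($url, '#'));
 
        // remove leading slash and possible trailing slash
        if (substr($url, 0, 1) == '/')
                $url = substr($url, 1);
        if (substr($url, -1) == '/')
                $url = substr($url, 0, -1);
        if ($url == '/')
                $url = '';

        // "/" is now "", so explode() returns array("") and matches the "" action
        return explode('/', $url);
}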
 
$actions = Array(
    "" => "front_page.php",
    "mail" => "mail.php",
    "member" => "profile.php",
    "messageboard" => "boards.php",
);
 
$url = url_parse($_SERVER['REQUEST_URI']);
$action = array_shift($url);
if (array_key_exists($action, $actions)) {
    require($actions[$action]);
} else {
    require("404.php");
}

So index.php takes the first element off the array and uses it to require() the template or script that does the work with the rest of the $url array. Think of it like a switchboard operator: it sends requests where they need to go.
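
To make the splitting concrete, here is a rough sketch of a more compact variant built on PHP's own parse_url() and trim(). The name path_segments() is made up for this example, and it is not quite identical to url_parse() above: trim() removes every leading and trailing slash, not just one.

```php
<?php
// Hypothetical helper: a compact variant of url_parse() using built-in functions.
function path_segments(string $uri): array {
    // parse_url() drops the query string and fragment and returns only the path.
    $path = parse_url($uri, PHP_URL_PATH);
    if (!is_string($path)) {
        $path = ''; // malformed or empty URI: treat it as the front page
    }
    // trim() removes all leading and trailing slashes in one step.
    $path = trim($path, '/');
    // explode() on '' yields [''], which matches the "" (front page) action.
    return explode('/', $path);
}

$segments = path_segments('/member/8?tab=posts'); // ['member', '8']
$action = array_shift($segments);                 // 'member'; $segments is now ['8']
```

Either version leaves index.php with the same thing: an action name to look up and a shorter array to hand to the script that handles it.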

Your individual pages can array_shift() their arguments from the $url array. In the example above, profile.php would expect a username or ID number, and it can require("404.php"); if there’s nothing there or no user with that ID.
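
Here is a minimal sketch of what the top of profile.php might look like. The find_member() lookup and its one hard-coded member are placeholders I made up so the example runs on its own; a real site would query its database there.

```php
<?php
// Hypothetical stand-in for a database lookup.
function find_member(string $id): ?array {
    $members = ['8' => ['name' => 'alice']];
    return $members[$id] ?? null;
}

// Returns the member named by the remaining URL segments,
// or null when the page should fall through to the 404.
function member_from_url(array $url): ?array {
    $id = array_shift($url);          // next segment after "member"
    if ($id === null || $id === '') { // nothing after /member
        return null;
    }
    return find_member($id);
}

$url = ['8']; // stands in for the array left over from index.php
$member = member_from_url($url);
if ($member === null) {
    echo "404\n"; // on the real site: require("404.php");
} else {
    echo "Profile for " . $member['name'] . "\n";
}
```

Keeping the lookup in a function that returns null means the "nothing there" and "no such user" cases both end up at the same 404.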

More complex nested URLs (say, /messageboard/chat/new_thread) work much the same way index.php works: the base script examines $url and passes it on to the scripts it knows about. In the example above, boards.php can load the requested board or the pages used to create a new post or a new board.
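
As a sketch of that second level of routing, boards.php could carry its own small lookup table. The script names and the route_board() function below are invented for illustration, and the function returns the script name instead of calling require() so the example runs without those files.

```php
<?php
// Hypothetical sub-actions for a single board; the script names are placeholders.
$board_actions = [
    ''           => 'board_view.php',
    'new_thread' => 'board_new_thread.php',
];

// Decide which script should handle the segments left after "messageboard".
function route_board(array $url, array $board_actions): string {
    $board = array_shift($url);          // e.g. "chat"
    if ($board === null || $board === '') {
        return 'board_list.php';         // /messageboard with no board named
    }
    $action = array_shift($url) ?? '';   // nothing left means "view the board"
    return $board_actions[$action] ?? '404.php';
}

// /messageboard/chat/new_thread arrives here as ['chat', 'new_thread'].
echo route_board(['chat', 'new_thread'], $board_actions), "\n";
```

It is the same pattern as index.php one level down: shift off a segment, look it up, and fall back to the 404 page when nothing matches.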

In my next post, I’ll provide a clean URL solution for existing sites that can’t afford to redesign their PHP scripts.
