


I wrote about building a site with clean URLs, but that’s useless to you. No, you’ve got a creaking, hulking monster of a site that coughs up URLs like “render.php?action=list_mailbox&id=42189”, was built “to meet an accelerated schedule”, and eats summer interns whole.

This article tells you how to put clean, human-usable URLs on top of the site without even editing your underlying scripts. All these examples mention PHP, but it doesn’t matter what you coded the site in; you just have to be running Apache with mod_rewrite enabled and .htaccess overrides allowed, and have a little familiarity with regular expressions.

So we have two goals. First, requests for the new URL are internally rewritten to call the existing scripts without users ever knowing they exist. Second, requests for the old URLs get a 301 redirect to the new URLs so that search engines and good bookmarks immediately switch to the new URLs.

Let’s work through an example .htaccess file. We take apart the new URLs and map them internally to the old URLs:

RewriteEngine on

RewriteRule ^new/(.*)/(.*)$ /old.php?action=$1&id=$2 [L]
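
The (.*) groups will match almost anything, including extra slashes or an empty segment. A slightly stricter sketch, assuming actions are lowercase words and ids are numeric (the same assumption the redirect conditions further down make), could look like this:

```apache
RewriteEngine on

# Only /new/<lowercase action>/<numeric id> is rewritten; anything else
# (for example /new/list/42/extra or /new//42) falls through untouched.
RewriteRule ^new/([a-z]+)/([0-9]+)$ /old.php?action=$1&id=$2 [L]
```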

This works great, so we dive into the 301 redirects:

RewriteEngine On

RewriteRule ^new/(.*)/(.*)$ /old.php?action=$1&id=$2 [L]
RewriteCond %{QUERY_STRING} ^action=([a-z]+)&id=([0-9]+)
RewriteRule ^old\.php$ /new/%1/%2? [R=301,L]

Arrrgh! We test this and find a problem: every request for new gets 301 redirected right back to new. Apache rewrites new to old just fine, but in an .htaccess file it then sends the rewritten URL (old) back through the rules from the top, where it matches the redirect rule, so we’re stuck in an infinite loop of redirects. (The [L] option only ends the current pass through the rules; it doesn’t stop Apache from starting another pass on the rewritten URL.) We need to turn it up to 11 and tell Apache “No, really, stop rewriting URLs now” by setting a flag that records that it already rewrote the URL.

RewriteEngine on
# This rule just keeps DirectoryIndex working (so requests for / go to /index.php or whatever)
RewriteRule ^$ - [L]

# Rewrite the new URL internally and set the REWROTE flag to record that we did
RewriteRule ^new/(.*)/(.*)$ /old.php?action=$1&id=$2 [L,E=REWROTE:1]
# Only redirect the old URL if this request did not come from the rewrite above.
# After an internal redirect Apache exposes the flag as REDIRECT_REWROTE.
RewriteCond %{ENV:REDIRECT_REWROTE} !^1$
RewriteCond %{QUERY_STRING} ^action=([a-z]+)&id=([0-9]+)
RewriteRule ^old\.php$ /new/%1/%2? [R=301,L]

This set of rewrite rules accomplishes both our goals, and the new URLs are all we’ll ever see in the address bar. You can set up any number of rewritten URLs this way, editing each pair of rules for your particular GET variables and layout; there’s no need to repeat the setup lines at the top.
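
If you’d rather not track a flag at all, one commonly used alternative is to test %{THE_REQUEST}, which holds the request line exactly as the browser sent it and is not touched by internal rewrites. What follows is a sketch under the same assumptions as above (lowercase action, numeric id, parameters in that order), not a drop-in for every site:

```apache
RewriteEngine on

# Internally map the clean URL onto the existing script.
RewriteRule ^new/([a-z]+)/([0-9]+)$ /old.php?action=$1&id=$2 [L]

# THE_REQUEST looks like "GET /old.php?action=list&id=42 HTTP/1.1" and keeps
# that value even after the rule above rewrites the URL, so this condition
# only matches when the visitor actually asked for old.php.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /old\.php\?action=([a-z]+)&id=([0-9]+)
RewriteRule ^old\.php$ /new/%1/%2? [R=301,L]
```

Because the condition looks at the original request line, the internally rewritten request for old.php sails past it no matter how many passes Apache makes over the rules.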

Proudly displaying our shiny new URLs, we can send surgical teams into the site’s source code file by file, slowly and carefully replacing instances of the old URLs with the new ones. Once all the URLs are replaced, you can watch your server logs to see usage of the old URLs fall off. The way is now prepared for you to further beautify your site inside and out.


Comments

  1. how about making it simpler ?
    use this in .htaccess

    RewriteEngine On
    RewriteBase /
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . index.php [L]

    and put this in your index.php

    if (SITE_NICEURL == true && isset($_SERVER['REQUEST_URI'])) {
        $url = substr($_SERVER['REQUEST_URI'], 1);
        $urlParts = explode('/', $url);
        if ($urlParts[count($urlParts) - 1] == '') array_pop($urlParts);
        $urlPartsCount = count($urlParts);
        if ($urlPartsCount % 2 != 0) $urlPartsCount++;
        for ($i = 0; $i < $urlPartsCount; $i += 2) {
            $key = $urlParts[$i];
            $value = (isset($urlParts[$i + 1])) ? $urlParts[$i + 1] : NULL;
            $_GET[$key] = $value;
        }
    }

    this will parse your nice urls and return them into the normal GET structure
    lets say you have :
    ?page=1&section=2
    you can turn it to:
    /page/1/section/2

    while keeping your code intact and not having to make a rule for everything in your .htaccess file .. ;)

  2. That is indeed simpler, but my goal here was to put clean URLs onto a site without ever touching the PHP, ASP, Python, Ruby, whatever that the site runs in. I’ve seen code where any edit causes horrific problems, however innocuous it may first seem, so I wanted something transparent to the site.

    (To anyone who noticed, I edited Ahmad’s first comment to correctly highlight his PHP code and removed his two other tries fixing it. Just keeping things tidy.)

  3. yes it’s true that in most cases it might break your site if it wasn’t built with it from the start … however i don’t see a way to make it universally work on ALL your pages without editing the php code (or whatever other language you are using) without manually having to write down every single page and every single variable possibility … right ?

    however after giving it some thought it might be possible if you can do some editing on the server application (like apache) say like making a mod for apache that does all this automatically working with mod_rewrite …

    i would have dedicated a lot of time to do it .. sadly i don’t know how :P i am a programmer but i can’t do that (yet)

    btw: since this topic might attract a lot of people, and since this is only doable in apache, i have good news for IIS users !!! it’s called IIS-Rewrite !! here is the link:

    IISRewrite

    i never tried it, but it should work and do everything mod_rewrite does for apache :)

    (thanks peter for the editing and sorry for the multipost)
