Introduction to Apache mod_rewrite

Posted January 20th, 2010 in Search Engine Optimisation, Website Development

What is mod_rewrite?

Apache’s mod_rewrite module is a powerful rule-based rewriting engine that rewrites requested URLs on the fly, server-side. It is particularly useful for setting up redirects and SEO-friendly URLs, often with only a few lines of configuration.

Can I Use mod_rewrite?

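If you’re using the Apache web server then the chances are that you have the mod_rewrite module available to you, as it ships with the standard Apache distribution. Some installations do not load it by default, and on shared hosting your host decides whether .htaccess files may use it, so check with your server administrator if your rules appear to be ignored.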
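
If you are unsure whether the module is loaded, one common precaution is to wrap your rewrite directives in an IfModule block. The sketch below assumes the module is identified as mod_rewrite.c, which is the usual identifier; the RewriteEngine directive inside it is explained in the sections that follow.

```apache
# Apply the enclosed directives only if mod_rewrite is loaded
<IfModule mod_rewrite.c>
    RewriteEngine On
    # rewrite rules go here
</IfModule>
```

Be aware that this guard fails silently: if the module is missing the rules are simply skipped, rather than Apache reporting an error.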

Getting Started with mod_rewrite

Your rewriting rules will go in an .htaccess file, a hypertext access configuration file. This can be in the root of your website and apply to all lower directories, or you may have .htaccess files within particular directories if the URL rewriting rules will only apply at that location. We’re going to assume that all of your URL rewriting will be done from one .htaccess file in the root of your website.
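
For Apache to read rewrite directives from an .htaccess file at all, the main server configuration has to permit it. A minimal sketch of the relevant server-level setting follows; the directory path is a placeholder, and on shared hosting this part is usually controlled by your host rather than by you.

```apache
# Main server configuration (httpd.conf or a virtual host), not .htaccess
# "/var/www/example" is a hypothetical document root
<Directory "/var/www/example">
    # FileInfo is the override class that covers the mod_rewrite directives
    AllowOverride FileInfo
</Directory>
```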

Turn the Rewriting Engine On!

First we need to tell Apache that it should use the rewriting engine. We do this by adding the following line of text to our .htaccess file above any rewrite rules:

RewriteEngine On

Now we’re ready to start adding some URL rewriting rules with the RewriteRule directive!

URL Rewriting Rules

mod_rewrite accepts Perl compatible regular expressions. The most common mod_rewrite directives that you will use are RewriteRule and RewriteCond.

The RewriteRule directive is the part of the code that does the work in defining what will be rewritten. This directive can be used as many times as you want, with each one defining a single rewrite rule. The syntax for RewriteRule is as follows:

RewriteRule Pattern Substitution

The RewriteCond directive can be used alongside the RewriteRule directive to define a condition under which rewriting will take place. For example, you may have set up a rewrite rule but you only want that rule to be applied under certain circumstances. One or more RewriteCond rule conditions can precede a RewriteRule directive. The syntax for RewriteCond is as follows:

RewriteCond TestString CondPattern

Pattern and CondPattern are Perl compatible regular expressions, with some useful additions. Patterns can be prefixed with a ‘!’ character (exclamation mark) to specify a non-matching pattern. A working knowledge of regular expressions is particularly useful when setting up URL rewriting rules.
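
One of those additions is the file test -f, which is true when TestString is the path to an existing regular file. As a rough sketch of how it combines with the “!” prefix, the following sends a request to a script only when it does not correspond to a real file. The script name index.php is an assumption for illustration.

```apache
# Condition: the requested filename is NOT an existing regular file
RewriteCond %{REQUEST_FILENAME} !-f
# "^" matches any request; index.php is a hypothetical front script
RewriteRule ^ /index.php [L]
```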

RewriteRule

A simple example of RewriteRule used on its own without the need for any extra RewriteCond conditions would be rewriting seemingly static product pages onto a dynamic script. Dynamic URLs such as www.example.com/product.php?id=26 are not recommended for SEO, and a slightly better (static looking) example would be www.example.com/product-26.htm. This is easy to implement with the following rewriting rule:

RewriteRule ^product-([0-9]+)\.htm$ /product.php?id=$1 [L]

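First let’s look at the regular expression pattern for the URLs that we’re matching. The caret symbol (^) means “starts with”, so the requested path must start with the word “product” followed by a hyphen (“-”). In an .htaccess file the pattern is matched against the path without its leading slash, which is why the pattern does not begin with “/”.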

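Using parentheses in Pattern or in one of the CondPatterns creates a back-reference, so that the captured value can be reused in the Substitution and TestString strings. Groups captured by the RewriteRule Pattern are referenced as $1, $2 and so on, while groups captured by a CondPattern are referenced as %1, %2 and so on. We want to capture the product ID and pass it to our dynamic script, so the ID part of the URL is enclosed in parentheses.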

Square brackets define a character class, and a hyphen inside them indicates a range. In this case we’re looking for an ID number, so the range 0-9 is used. We expect 1 or more digits in the ID number, so the plus (“+”) symbol directly after the class means “match 1 or more characters from this range”.

We then want our static URL to finish with “.htm” in order to look like a standard HTML page. The full stop (“.”) character has a special meaning in regular expressions so it has to be escaped with the backslash (“\”) character so that it is treated as a literal full stop and not its special meaning.

The dollar symbol (“$”) means “ends”.

Taken as a whole our regular expression means “match a URL path that starts with ‘product-’, then has 1 or more digits 0 to 9, and finally ends with ‘.htm’”.
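
To make that concrete, here is the same rule annotated with a few sample request paths and how the pattern treats them. The sample paths are made up for illustration.

```apache
# Pattern: ^product-([0-9]+)\.htm$
#   product-26.htm      matches, and $1 is "26"
#   product-.htm        no match: at least one digit is required
#   product-26.html     no match: "$" requires the path to end at ".htm"
#   my-product-26.htm   no match: "^" anchors the pattern at the start
RewriteRule ^product-([0-9]+)\.htm$ /product.php?id=$1 [L]
```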

Now let’s look at the substitution string. This is much simpler than the regular expression pattern. We want to rewrite our static URLs into a dynamic script called product.php in the root of the website.

We created a back-reference referring to the product ID in the regular expression and this is added to the URL with $1, referring to the first back-reference created.
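
A pattern can contain more than one pair of parentheses, in which case the back-references are numbered from left to right. The sketch below is a hypothetical variation on our rule: the category segment and the category query parameter are assumptions and not part of the example site.

```apache
# e.g. shoes/product-26.htm -> /product.php?category=shoes&id=26
# $1 is the first group (category name), $2 is the second (product ID)
RewriteRule ^([a-z]+)/product-([0-9]+)\.htm$ /product.php?category=$1&id=$2 [L]
```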

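The [L] flag specifies that if this rule matches, Apache should stop processing the rule set here and not apply any later rewrite rules in this pass. This stops subsequent rules from matching this request and is also better for performance, as no further rules need to be tested by the server before rewriting takes place. (In an .htaccess file the rewritten URL is then processed again from the top, so make sure the rewritten URL does not itself unintentionally match one of your rules.)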
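
The effect of [L] is easiest to see when a later rule would also match the same request. In this sketch the second rule and its page.php script are hypothetical, added only to show the ordering.

```apache
# A request for product-26.htm matches here and processing stops
RewriteRule ^product-([0-9]+)\.htm$ /product.php?id=$1 [L]

# This more general rule would also match product-26.htm, but it is
# not reached in that pass because of the [L] flag above
RewriteRule ^(.*)\.htm$ /page.php?name=$1 [L]
```

Placing specific rules before general ones, each with [L], is a common way to keep a rule set predictable.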

RewriteCond and RewriteRule

A simple and common example of using a rewrite condition with a rewrite rule is redirecting to canonical URLs. Typically you’ll want any visitors arriving at http://example.com or http://example.com/whatever/ to be redirected to http://www.example.com or http://www.example.com/whatever/ (adding the “www.” prefix). This avoids duplicate content issues, where search engines see the same pages under two different hosts. This is easy to implement with the following rewrite condition and rewrite rule:

RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*) http://www.example.com/$1 [L,R=301]

This rewrite condition compares the CondPattern to the TestString and if it matches, the RewriteRule that follows will be applied. In this case the TestString is the website host (e.g. example.com or www.example.com etc…), specified by %{HTTP_HOST}. (Lots of other server variables can also be tested against.)

The CondPattern is another regular expression. As mentioned previously, the “!” prefix specifies a non-matching pattern. The expression itself is similar to our previous example: anchored by “^” and “$”, it matches only when the host is exactly “www.example.com”. Because we’re looking for a non-matching pattern, any HTTP host other than “www.example.com” (e.g. “example.com”) will trigger the rewrite rule.
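
When several RewriteCond lines precede a rule, all of them must be satisfied for the rule to apply. As a sketch of testing a second server variable, the following keeps the redirect away from one path; the /status path is a hypothetical example.

```apache
# Both conditions must hold: the host is not the canonical one...
RewriteCond %{HTTP_HOST} !^www\.example\.com$
# ...and the request path does not start with /status (hypothetical path)
RewriteCond %{REQUEST_URI} !^/status
RewriteRule ^(.*) http://www.example.com/$1 [L,R=301]
```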

The RewriteRule pattern here is grabbing anything after the domain part of the URL so that it can be added to the end of the correct version of the URL with the “www.”.

The pattern starts with a caret, as we have seen previously.

The next part is enclosed in parentheses so that a back-reference will be created and added to the correct URL.

The full stop means “any character” and the asterisk (“*”) means “zero or more of the preceding item”. Put together those two special characters mean “any number of any character”, which is anything! That will grab any directory / file name specified in the URL after the domain part.

The Substitution is just as simple as in the first example. The new URL will be http://www.example.com/ followed by whatever directory / file was identified by the back-reference.

The L flag specifies that this is the last rule, and R=301 specifies that the visitor should be sent an external redirect with the HTTP status code 301 (Moved Permanently).

This will set up 301 redirects for any request whose host is not exactly “www.example.com” and redirect the visitor to the same page on the correct host.
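
Two refinements are often added to this kind of redirect: a case-insensitive host comparison using the NC flag, and a guard so that requests arriving without a Host header are left alone. The following is a sketch of that variation rather than a required change.

```apache
# Host is not www.example.com, compared case-insensitively
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# Host header is not empty
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
```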

Putting It All Together

In a real .htaccess file, the examples above would be put together as follows (in .htaccess files lines beginning with the hash character (“#”) are comments and are not parsed):

# Turn on the URL rewrite engine
RewriteEngine On

# Rewrite any non-www requests to the correct www host
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*) http://www.example.com/$1 [L,R=301]

# Rewrite "static" HTML product pages onto the PHP script that processes them
RewriteRule ^product-([0-9]+)\.htm$ /product.php?id=$1 [L]

Conclusion

Those are two of the simplest but most useful uses for the Apache mod_rewrite module. As it is such a powerful tool, mod_rewrite can do much more complicated and interesting things, but that’s beyond the scope of this basic introduction.
