Clean URLs in Textpattern

    2 May 2004, early afternoon

Update: I've changed the URLs at my site a fair bit. I'll write more about my changes later. I have now managed to avoid using article IDs in any URLs. The changes below should still be valid, but I plan on writing up a more comprehensive guide to fixing URLs in Textpattern at some later date.

Update: You can grab my Textpattern install if you are getting tired of waiting for me to write up how I built my new clean URL scheme.

Update: As of version 4.0.1, Textpattern supports Clean URLs nicely.

--

The URLs at my site follow the format suggested by Mamash in one of his older posts in the Textpattern forum, with some slight variation. Basically, a URL (Uniform Resource Locator) at my site has the form http://www.domain.com/section/category/page-X/, where X is the page number. If the user is browsing by category, but not in a particular section, then the section becomes '-', so the URL would be something like http://www.domain.com/-/category/page-X/. The reason for using a dash rather than a word, like 'category', is that a word would suggest that 'category' is a section of the site, which it is not.

The page, the category, and the section can each be omitted. Here are some example URLs from my site:

http://www.funkaoshi.com/page-2/ -- the second page of the default section
http://www.funkaoshi.com/-/web-design/ -- the first page of the web design category
http://www.funkaoshi.com/immaculate/life/page-3/ -- the third page of the life category in the immaculate section
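
As a rough sketch of the scheme, here is a small PHP function that builds a path from a section, a category, and a page number. The name clean_url_path is hypothetical and not part of Textpattern. It assumes the names are already in their hyphenated URL form, and, as an assumption on my part, it treats page 1 as the page that gets omitted.

```php
<?php
// Hypothetical helper, not part of Textpattern: builds a path following the
// /section/category/page-X/ scheme. Assumes names are already URL-safe
// (words joined by hyphens) and that page 1 is left out of the URL.
function clean_url_path(?string $section, ?string $category, ?int $page): string
{
    $parts = [];
    if ($section !== null && $section !== '' && $section !== 'default') {
        $parts[] = $section;
    } elseif ($category !== null && $category !== '') {
        // A dash stands in for "no section" so the category
        // is not mistaken for a section.
        $parts[] = '-';
    }
    if ($category !== null && $category !== '') {
        $parts[] = $category;
    }
    if ($page !== null && $page > 1) {
        $parts[] = 'page-' . $page;
    }
    return '/' . ($parts ? implode('/', $parts) . '/' : '');
}

echo clean_url_path(null, null, 2), "\n";            // /page-2/
echo clean_url_path(null, 'web-design', null), "\n"; // /-/web-design/
echo clean_url_path('immaculate', 'life', 3), "\n";  // /immaculate/life/page-3/
```

The default section is treated like "no section", which is why the first example has no section segment at all.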

The trick to cleaner URLs is cleaning up the paging links and the links to articles in a particular category. Once these generated URLs are cleaned up, we use a .htaccess file to convert the clean URLs back into the messy URLs that Textpattern understands.

To print cleaner category lists, we need to edit the category list function in the taghandlers file. Change Line 387 from:

if($a) $out[] = tag(str_replace("& ","&amp; ", $a),'a','href="'.$pfr.'?c='.urlencode($a).'"');

To:

if($a) $out[] = tag(str_replace("& ","&amp; ", $a),'a','href="'.$pfr.'-/'.stripSpace($a).'/"');
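
The modified line relies on stripSpace() to turn a category name into its URL form. The sketch below assumes stripSpace() replaces runs of whitespace with hyphens, which is the form the rewrite rules later turn back into "word+word". It uses a hypothetical strip_space_sketch() in its place, and category_list_link() is likewise an illustrative name, not a Textpattern function.

```php
<?php
// Hypothetical stand-in for Textpattern's stripSpace(): assumed to turn
// runs of whitespace into single hyphens.
function strip_space_sketch(string $name): string
{
    return preg_replace('/\s+/', '-', trim($name));
}

// Builds one category-list entry the way the modified line does.
// $pfr stands for the site root, e.g. "/".
function category_list_link(string $pfr, string $category): string
{
    $label = str_replace('& ', '&amp; ', $category); // escape bare ampersands
    $href  = $pfr . '-/' . strip_space_sketch($category) . '/';
    return '<a href="' . $href . '">' . $label . '</a>';
}

echo category_list_link('/', 'web design'), "\n";
// <a href="/-/web-design/">web design</a>
```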

To print cleaner paging links, we need to modify the newer and older functions inside the taghandlers file. Change Lines 527--534 from:

'<a href="?pg='.($pg - 1),
($c) ? a.'c='.urlencode($c) : '',
($s && !$url_mode) ? a.'s='.urlencode($s) : '',
'"',
(empty($title)) ? '' : ' title="'.$title.'"',
'>',
$thing,
'</a>');

To:

'<a href="/'
, ( $s && ($s != 'default') ) ? (stripSpace($s).'/') : (($c) ? '-/' : '')
, ($c) ? stripSpace($c).'/' : ''
, 'page-'.($pg - 1).'/'
, '"'
, (empty($title)) ? '' : ' title="'.$title.'"'
, '>'
, $thing
,'</a>');

And change Lines 555--562 from:

'<a href="?pg='.($pg + 1),
($c) ? a.'c='.urlencode($c) : '',
($s && !$url_mode) ? a.'s='.urlencode($s) : '',
'"',
(empty($title)) ? '' : ' title="'.$title.'"',
'>',
$thing,
'</a>');

To:

'<a href="/'
, ( $s && ($s != 'default') ) ? (stripSpace($s).'/') : (($c) ? '-/' : '')
, ($c) ? stripSpace($c).'/' : ''
, 'page-'.($pg + 1).'/'
, '"'
, (empty($title)) ? '' : ' title="'.$title.'"'
, '>'
, $thing
,'</a>');
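
Both modified functions build the same kind of link and differ only in whether they subtract one from the page number (newer) or add one (older). The sketch below folds the two into one hypothetical paging_link() function with a $delta argument. It again uses a hypothetical strip_space_sketch(), assuming stripSpace() swaps whitespace for hyphens.

```php
<?php
// Hypothetical stand-in for Textpattern's stripSpace() (assumed behaviour).
function strip_space_sketch(string $name): string
{
    return preg_replace('/\s+/', '-', trim($name));
}

// Mirrors the href logic of the modified newer/older functions.
// $delta is -1 for the "newer" link and +1 for the "older" link.
function paging_link(string $s, string $c, int $pg, int $delta,
                     string $thing, string $title = ''): string
{
    return implode('', [
        '<a href="/',
        // Non-default section, else "-" when browsing a category.
        ($s && $s != 'default') ? strip_space_sketch($s) . '/' : ($c ? '-/' : ''),
        $c ? strip_space_sketch($c) . '/' : '',
        'page-' . ($pg + $delta) . '/',
        '"',
        empty($title) ? '' : ' title="' . $title . '"',
        '>',
        $thing,
        '</a>',
    ]);
}

echo paging_link('', 'web design', 2, 1, 'Older'), "\n";
// <a href="/-/web-design/page-3/">Older</a>
echo paging_link('immaculate', '', 3, -1, 'Newer'), "\n";
// <a href="/immaculate/page-2/">Newer</a>
```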

To print cleaner category links, change Line 704 from:

return '<a href="'.$pfr.'?c='.$thisarticle['category1'].'">'.

To:

return '<a href="'.$pfr.'-/'.stripSpace($thisarticle['category1']).'/">'.

And change Line 717 from:

return '<a href="'.$pfr.'?c='.$thisarticle['category2'].'">'.

To:

return '| <a href="'.$pfr.'-/'.stripSpace($thisarticle['category2']).'/">'.

Now for the mod_rewrite rules I use at this site. This is the one part of the solution I am not 100% sure of: my rules may not be perfect, and there may be better ways to do this. The rules are straightforward enough if you understand how regular expressions work; if not, you will just have to believe my voodoo magic. They handle category names of at most three words, written with hyphens in the URL (web-design, for example). If you have categories with more than three words in their names, you will need to add more category rules following the same pattern: add another -([A-Za-z0-9]+) group to the pattern and another +$N to the c= value, and renumber the page backreference to match.

Add these rules to your .htaccess file:

RewriteEngine on
RewriteBase /

#Check for existing file or directory
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]

# RSS and Atom feeds.
RewriteRule ^(rss)/?$   index.php?rss=1   [T=application/x-httpd-php,L]
RewriteRule ^(atom)/?$  index.php?atom=1  [T=application/x-httpd-php,L]

# paged index
RewriteRule ^(page-)([0-9]+)/?$ index.php?pg=$2 [T=application/x-httpd-php,L]

# individual article
RewriteRule ^([A-Za-z0-9_]+)/([0-9]+)/[^.]*/?$ index.php?id=$2 [T=application/x-httpd-php,L]

# section, no category
RewriteRule ^([A-Za-z0-9_]+)/?$ index.php?s=$1 [T=application/x-httpd-php,L]
RewriteRule ^([A-Za-z0-9_]+)/page-([0-9]+)/?$ index.php?s=$1&pg=$2 [T=application/x-httpd-php,L]

# category, no section.
RewriteRule ^-/([A-Za-z0-9]+)/?$ index.php?c=$1 [T=application/x-httpd-php,L]
RewriteRule ^-/([A-Za-z0-9]+)/page-([0-9]+)/?$ index.php?c=$1&pg=$2 [T=application/x-httpd-php,L]
RewriteRule ^-/([A-Za-z0-9]+)-([A-Za-z0-9]+)/?$ index.php?c=$1+$2 [T=application/x-httpd-php,L]
RewriteRule ^-/([A-Za-z0-9]+)-([A-Za-z0-9]+)/page-([0-9]+)/?$ index.php?c=$1+$2&pg=$3 [T=application/x-httpd-php,L]
RewriteRule ^-/([A-Za-z0-9]+)-([A-Za-z0-9]+)-([A-Za-z0-9]+)/?$ index.php?c=$1+$2+$3 [T=application/x-httpd-php,L]
RewriteRule ^-/([A-Za-z0-9]+)-([A-Za-z0-9]+)-([A-Za-z0-9]+)/page-([0-9]+)/?$ index.php?c=$1+$2+$3&pg=$4 [T=application/x-httpd-php,L]

#section and category
RewriteRule ^([A-Za-z0-9_]+)/([A-Za-z0-9]+)/?$ index.php?s=$1&c=$2 [T=application/x-httpd-php,L]
RewriteRule ^([A-Za-z0-9_]+)/([A-Za-z0-9]+)/page-([0-9]+)/?$ index.php?s=$1&c=$2&pg=$3 [T=application/x-httpd-php,L]
RewriteRule ^([A-Za-z0-9_]+)/([A-Za-z0-9]+)-([A-Za-z0-9]+)/?$ index.php?s=$1&c=$2+$3 [T=application/x-httpd-php,L]
RewriteRule ^([A-Za-z0-9_]+)/([A-Za-z0-9]+)-([A-Za-z0-9]+)/page-([0-9]+)/?$ index.php?s=$1&c=$2+$3&pg=$4 [T=application/x-httpd-php,L]
RewriteRule ^([A-Za-z0-9_]+)/([A-Za-z0-9]+)-([A-Za-z0-9]+)-([A-Za-z0-9]+)/?$ index.php?s=$1&c=$2+$3+$4 [T=application/x-httpd-php,L]
RewriteRule ^([A-Za-z0-9_]+)/([A-Za-z0-9]+)-([A-Za-z0-9]+)-([A-Za-z0-9]+)/page-([0-9]+)/?$ index.php?s=$1&c=$2+$3+$4&pg=$5 [T=application/x-httpd-php,L]
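
To check which query string a clean path should end up as, the rules can be mirrored in PHP. This is a hypothetical sketch, not something Textpattern or Apache provides. It covers only the paging, section, and category rules (not the file, feed, or article ID rules), expects paths without the leading slash as mod_rewrite sees them in a .htaccess file, and accepts categories with any number of hyphenated words because it uses str_replace() to turn the hyphens into plus signs.

```php
<?php
// Hypothetical emulation of the paging, section, and category rules.
// Paths have no leading slash. Returns null when no rule matches.
function rewrite_clean_path(string $path): ?string
{
    $word = '[A-Za-z0-9]+';
    $cat  = "($word(?:-$word)*)";   // one or more hyphenated words
    $sec  = '([A-Za-z0-9_]+)';
    $page = '(?:/page-([0-9]+))?';  // optional page suffix

    // Same order as the .htaccess file, so "section/page-2" is read as a
    // paged section rather than a two-word category.
    if (preg_match('#^page-([0-9]+)/?$#', $path, $m)) {
        return 'pg=' . $m[1];
    }
    if (preg_match("#^$sec$page/?$#", $path, $m)) {
        return 's=' . $m[1] . (isset($m[2]) ? '&pg=' . $m[2] : '');
    }
    if (preg_match("#^-/$cat$page/?$#", $path, $m)) {
        return 'c=' . str_replace('-', '+', $m[1])
            . (isset($m[2]) ? '&pg=' . $m[2] : '');
    }
    if (preg_match("#^$sec/$cat$page/?$#", $path, $m)) {
        return 's=' . $m[1] . '&c=' . str_replace('-', '+', $m[2])
            . (isset($m[3]) ? '&pg=' . $m[3] : '');
    }
    return null;
}

echo rewrite_clean_path('immaculate/life/page-3/'), "\n";
```

For example, immaculate/life/page-3/ maps to s=immaculate&c=life&pg=3, and -/web-design/ maps to c=web+design, the same results the .htaccess rules give.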

That should be all you need to do to have cleaner URLs. I'm waiting to see what Dean plans to do with the URL titles stored with articles. Once the next version of Textpattern is out, I will update these instructions.


Comments

  1. That’s pretty hardcore TXP editing, but well worth it.

    Maybe I’ll try it, eventually, although I’m in the midst of a redesign.

  2. I’m trying to get this damn message to display properly. Textile does a shit job of formatting code.

    I finished university a week ago, and am unemployed and bored. That’s why I work on this website all the time. Not much else to do.

  3. I play videogames and bake bread. I’m sure there’s a bread baking videogame that I could import from Japan if I looked long enough, then I’d just need a catheter.

  4. I am going to try this out. If it works, I will be the happiest man alive!

  5. Looks interesting.

    Could you format the mod into the phpBB MOD format?

    http://www.phpbb.com/phpBB/viewtopic.php?t=11549

    Or when you post your updated one. I’ll be back :)

    Douglas

  6. Update! Make it work for version 1.19! PLEASE!!!

  7. What I’ve posted here does work with 1.19. At least it should. I didn’t outline how I got rid of the ID numbers in the URLs. That’s a bit more involved. Well not a bit, a lot. I was really planning on writing up what is involved when the next (final) version of Textpattern is out.

  8. The website is sexy! I’m having such a b&*(&^ of a time with clean URLs. If you ever get a hankering to help out a fellow Torontonian I’d definitely love a comforting hand to go along with my txp clean url sorrows.

  9. Worked like a charm, thanks for your help on the forums ;)

    http://forum.textpattern.com/viewtopic.php?id=4955

  10. Everything works fine for me, except the main page. Going to the root of the domain or to index.php returns a “no page template specified for section” error. How do I get around this?

  11. There may be a problem with your rewrite rule. Perhaps you are setting the section variable s all the time. There could be other problems. I can only guess.

  12. Is there anything that I can do for the rewrite? All that I did was copy what you have here exactly.

  13. Well I fixed it. I just changed index.php to say
    <?php
    include "./textpattern/config.php";
    $s = "default";
    include $txpcfg['txpath']."/publish.php";
    textpattern();
    ?>

  14. oops sorry… disregard that last one, it makes every page go to the default

  15. Does anyone know of a way to get the rewrite to work on an IIS server? I need to know because I unfortunately have a site that has to stay on an IIS server.

  16. You may want to check out Zem’s rewrite plug-in, it does all the heavy lifting inside textpattern, instead of at the webserver. I don’t think Zem’s plugin accomplishes everything I have done at this site with hacks to the source code, but I think for most people it is a very easy route for cleaner URLs.

  17. Pretty article,
    but I have a small question:
    how can I install mod_rewrite on my personal server?

  18. I have absolutely no idea. You’d need to scour the web or ask a system administrator.

  19. how to do like this ?

    http://forum.textpattern.com/viewtopic.php?pid=59050#p59050

  20. How do I like what? I already have clean URLs, ones that work with dates as well in fact.

  21. Thank you very much
