PostNuke

Flexible Content Management System

News

Search engine friendly URLs revisited.

Contributed by I started to spell c on Apr 04, 2002 - 01:45 AM

Why common methods for making dynamic websites search engine friendly do not work for PostNuke


Search engine friendly URLs for dynamic websites are always a hot subject.


There are several nice solutions: PHPBuilder has dealt with this subject and the ForceType solution several times [1 2]. However, PostNuke can't benefit from Tim's solution, because all paths in PostNuke are relative. So, if you have a URL such as http://www.postnuke.com/article/1/, PostNuke will try to find the topic image in http://www.postnuke.com/article/images/topics/some_fancy_topic.gif, but the picture is actually located in http://www.postnuke.com/images/topics/some_fancy_topic.gif.


So, without a core rewrite that changes all relative paths to absolute ones, this method does not work for us.




Prerequisites for my proposed method


E. Soysal has covered this subject twice for Nuke-Sites [1 2].


A lot has changed in PostNuke and the way URLs are being built since then, so it might be a good idea to revisit this subject.


To use this method successfully, you need Apache as your web server with mod_rewrite enabled. If you don't know whether your host provides this, please ask your system administrator. If it doesn't, it might be an idea to collect hosts that have mod_rewrite enabled in the comments to this article. Don't forget: you can always talk to your server admins and maybe even convince them. :)




What we want to achieve


Did you ever try to tell your friends about a Web_Links directory on a PostNuke site? Chances are high that you gave up on it, because the URL was too long.


Did you look up your site on Google? How much of your site was really indexed by Google? If you are using 0.7+, it is very likely that only the main page was indexed.


So, our goal is to make your site more user friendly and search engine friendly at the same time, without touching the PostNuke core or slowing down your system's performance too much.


So, we want a URL like http://www.postnuke.com/Web_Links.html rather than http://www.postnuke.com/modules.php?op=modload&name=Web_Links&file=index.




Time to dive into the code


You may wonder how to achieve our goal without touching the core. Well, themes are not considered core files, so themes/yourtheme/theme.php is our candidate for a hack. :)


There are 4 steps that you need to take:




1) First, we start buffering your output. We do this by adding the following line at the beginning of your themeheader function, before anything is printed:


ob_start();




2) Now, go to the end of your themefooter function and add these lines:




$contents = ob_get_contents(); // store buffer in $contents
ob_end_clean(); // delete output buffer and stop buffering
echo replace_for_mod_rewrite($contents); // display modified buffer to screen
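To see how steps 1 and 2 fit together, here is a minimal stand-alone sketch. The names demo_themeheader(), demo_themefooter() and demo_rewrite() are made up for this illustration and are not part of PostNuke; demo_rewrite() only handles the print.php link and stands in for the real function from step 3.

```php
<?php
// Hypothetical stand-in for replace_for_mod_rewrite(); it rewrites print links only.
function demo_rewrite($html)
{
    return preg_replace("'(?<!/)print.php\?sid=([0-9]*)'", "print\\1.html", $html);
}

// Hypothetical stand-in for themeheader(); a real theme prints much more.
function demo_themeheader()
{
    ob_start(); // step 1: capture everything that is echoed from here on
    echo "<html><body>\n";
}

// Hypothetical stand-in for themefooter().
function demo_themefooter()
{
    echo "</body></html>\n";
    $contents = ob_get_contents(); // step 2: fetch the captured page
    ob_end_clean();                // discard the buffer and stop buffering
    echo demo_rewrite($contents);  // send the modified page to the browser
}

demo_themeheader();
echo '<a href="print.php?sid=7">Print</a>' . "\n"; // the sid value is made up
demo_themefooter();
```

Everything echoed between the two calls ends up in the buffer, so the link is rewritten even though the code that printed it was never touched.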




3) replace_for_mod_rewrite() needs to be explained as well:


Add the following function above your first function in your theme:





function replace_for_mod_rewrite(&$s)
{
    // Patterns are applied in the order listed, so the most specific ones come first.
    // Links in the page separate parameters with &amp;, so the patterns do the same.
    $in = array(
        "'(?<!/)modules.php\?op=modload&amp;name=News&amp;file=article&amp;sid=([0-9]*)&amp;mode=([a-zA-Z]*)&amp;order=([0-9]*)&amp;thold=([0-9]*)'",
        "'(?<!/)modules.php\?op=modload&amp;name=News&amp;file=index&amp;catid=&amp;topic=([1-9][0-9]*)&amp;allstories=1'",
        "'(?<!/)modules.php\?op=modload&amp;name=News&amp;file=index&amp;catid=&amp;topic=([1-9][0-9]*)'",
        "'(?<!/)modules.php\?op=modload&amp;name=Sections&amp;file=index&amp;req=listarticles&amp;secid=([1-9][0-9]*)'",
        "'(?<!/)modules.php\?op=modload&amp;name=Sections&amp;file=index&amp;req=viewarticle&amp;artid=([1-9][0-9]*)&amp;page=([1-9][0-9]*)'",
        "'(?<!/)modules.php\?op=modload&amp;name=Sections&amp;file=index&amp;req=printpage&amp;artid=([1-9][0-9]*)'",
        "'(?<!/)modules.php\?op=modload&amp;name=NS-Polls&amp;file=index&amp;req=results&amp;pollID=([0-9]*)&amp;mode=thread&amp;order=0&amp;thold=0'",
        // Generic module links, from four parameter pairs down to none
        "'(?<!/)modules.php\?op=modload&amp;name=([^&]*)&amp;file=index&amp;([a-zA-Z0-9_.;+&]*)=([a-zA-Z0-9_.;+&]*)&amp;([a-zA-Z0-9_.;+&]*)=([a-zA-Z0-9_.;+&]*)&amp;([a-zA-Z0-9_.;+&]*)=([a-zA-Z0-9_.;+&\[\] ]*)&amp;([a-zA-Z0-9_.;+&\[\] ]*)=([a-zA-Z0-9_.;+&\[\] ]*)'",
        "'(?<!/)modules.php\?op=modload&amp;name=([^&]*)&amp;file=index&amp;([a-zA-Z0-9_.&]*)=([a-zA-Z0-9_.&]*)&amp;([a-zA-Z0-9_.&]*)=([a-zA-Z0-9_.&]*)&amp;([a-zA-Z0-9_.&]*)=([a-zA-Z0-9_.&\[\] ]*)&amp;([a-zA-Z0-9_.&\[\] ]*)=([a-zA-Z0-9_.&\[\] ]*)'",
        "'(?<!/)modules.php\?op=modload&amp;name=([^&]*)&amp;file=index&amp;([a-zA-Z0-9]*)=([a-zA-Z0-9]*)&amp;([a-zA-Z0-9_.;+&]*)=([a-zA-Z0-9_.;+&]*)&amp;([a-zA-Z0-9_.;+&]*)=([a-zA-Z0-9_.;&+]*)'",
        "'(?<!/)modules.php\?op=modload&amp;name=([^&]*)&amp;file=index&amp;([a-zA-Z0-9]*)=([a-zA-Z0-9]*)&amp;([a-zA-Z0-9]*)=([a-zA-Z0-9]*)'",
        "'(?<!/)modules.php\?op=modload&amp;name=([^&]*)&amp;file=index&amp;([a-zA-Z0-9&]*)=([a-zA-Z0-9&]*)'",
        "'(?<!/)modules.php\?op=modload&amp;name=([^&]*)&amp;file=index'",
        "'(?<!/)modules.php\?op=modload&amp;name=([^&]*)&amp;file=([a-zA-Z0-9]*)'",
        "'(?<!/)print.php\?sid=([0-9]*)'"
    );

    // Replacements, in the same order as the patterns above.
    $out = array(
        "article\\1.html",
        "Topic\\1-all.html",
        "Topic\\1.html",
        "Sections\\1.html",
        "Sections-article\\1-page\\2.html",
        "Sections-print-article\\1.html",
        "NS-Polls-results-\\1.html",
        // The matching pattern captures nine groups, so only \\1 to \\9 are used here.
        "\\1-\\2-\\3-\\4-\\5-\\6-\\7-\\8-\\9.html",
        "\\1-\\2-\\3-\\4-\\5-\\6-\\7-\\8-\\9.html",
        "\\1-\\2-\\3-\\4-\\5-\\6-\\7.html",
        "\\1-\\2-\\3-\\4-\\5.html",
        "\\1-\\2-\\3.html",
        "\\1.html",
        "\\1-\\2.html",
        "print\\1.html"
    );

    $s = preg_replace($in, $out, $s);
    return $s;
}





That's a lot of code, but it covers almost all typical PostNuke links, and it rewrites the important article and Sections links.


In $in, we have an array of patterns that we want to replace. In $out, we have the array of replacements; each entry in $out belongs to the pattern at the same position in $in.
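When preg_replace() receives arrays, it applies the patterns one after another in array order, each on the result of the previous one. That is why the more specific patterns have to come before the more general ones. The following sketch illustrates this with the two topic patterns; the topic number 3 is made up, and the link is assumed to use &amp; between its parameters.

```php
<?php
// A topic link as it might appear in the page source (topic number is made up).
$link = 'modules.php?op=modload&amp;name=News&amp;file=index&amp;catid=&amp;topic=3&amp;allstories=1';

$specific = "'(?<!/)modules.php\?op=modload&amp;name=News&amp;file=index&amp;catid=&amp;topic=([1-9][0-9]*)&amp;allstories=1'";
$general  = "'(?<!/)modules.php\?op=modload&amp;name=News&amp;file=index&amp;catid=&amp;topic=([1-9][0-9]*)'";

// Specific pattern first: the whole link is consumed in one go.
$good = preg_replace(
    array($specific, $general),
    array("Topic\\1-all.html", "Topic\\1.html"),
    $link
);

// General pattern first: it matches only the front part and leaves a stray tail.
$bad = preg_replace(
    array($general, $specific),
    array("Topic\\1.html", "Topic\\1-all.html"),
    $link
);

echo $good, "\n"; // Topic3-all.html
echo $bad, "\n";  // Topic3.html&amp;allstories=1
```

So if you add patterns of your own, put them above any existing pattern that would also match the same link.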




So, I'll try to explain what some of these patterns do, so that you can start from there to do your own changes or extensions to this.





"'(?<!/)modules.php\?op=modload&amp;name=News&amp;file=article&amp;sid=([0-9]*)&amp;mode=([a-zA-Z]*)&amp;order=([0-9]*)&amp;thold=([0-9]*)'"





- (?<!/) is a negative lookbehind assertion that I happily give Jim McDonald credit for. It means that the pattern only matches if it is not immediately preceded by a slash. This way only links inside your PostNuke site are converted, while links to external sites are left alone.


- Notice that the question mark needs to be escaped so that it looks like this: modules.php\?op


- Another important thing to remember is that ampersands in links have to be written as &amp; (and not just &). The new API strictly follows and supports this already, but if you still find links in your site that aren't converted properly, it is most likely because old modules have used just &.


- sid=([0-9]*) matches sid= followed by zero or more digits (0 to 9). The parentheses capture these digits, so that they can be reused as \\1 in the replacement.


- mode=([a-zA-Z]*) matches mode= followed by zero or more letters.
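To try a single pattern on its own, a small test script like the following may help. It is only a sketch: the sid value 42 and the host example.org are made up, and the links are assumed to use &amp; between their parameters, as described above.

```php
<?php
// Two sample links: one internal (relative) and one pointing to an external site.
$query = 'modules.php?op=modload&amp;name=News&amp;file=article&amp;sid=42'
       . '&amp;mode=thread&amp;order=0&amp;thold=0';
$html  = '<a href="' . $query . '">Read more</a> '
       . '<a href="http://example.org/' . $query . '">Elsewhere</a>';

// The article pattern discussed above, with single quotes as regex delimiters.
$pattern = "'(?<!/)modules.php\?op=modload&amp;name=News&amp;file=article"
         . "&amp;sid=([0-9]*)&amp;mode=([a-zA-Z]*)&amp;order=([0-9]*)&amp;thold=([0-9]*)'";

// \\1 inserts the first captured group, here the digits after sid=.
$result = preg_replace($pattern, "article\\1.html", $html);

echo $result, "\n";
```

The first link comes out as article42.html. The second one is preceded by a slash, so the lookbehind rejects it and it stays as it was.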




So, we have managed to convert the links on the fly. What is still missing are the appropriate Apache directives.




4) Basically, all converted URLs need to be mapped back to the original ones by Apache with mod_rewrite. You only need a .htaccess file for this. For your convenience, just copy and paste the code below into a .htaccess file and upload it to the webroot of your PostNuke installation.








RewriteEngine On




#Articles


RewriteRule ^article([1-9][0-9]*).* modules.php?op=modload&name=News&file=article&sid=$1




#Topics


RewriteRule ^Topic([1-9][0-9]*)-all.* modules.php?op=modload&name=News&file=index&catid=&topic=$1&allstories=1


RewriteRule ^Topic([1-9][0-9]*).* modules.php?op=modload&name=News&file=index&catid=&topic=$1




#FAQ


RewriteRule ^FAQ([1-9][0-9]*)-([0-9]*).* modules.php?op=modload&name=FAQ&file=index&myfaq=yes&id_cat=$1




#Sections


RewriteRule ^Sections([1-9][0-9]*).* modules.php?op=modload&name=Sections&file=index&req=listarticles&secid=$1


RewriteRule ^Sections-article([1-9][0-9]*)-page([1-9][0-9]*).* modules.php?op=modload&name=Sections&file=index&req=viewarticle&artid=$1&page=$2


RewriteRule ^Sections-print-article([1-9][0-9]*).* modules.php?op=modload&name=Sections&file=index&req=printpage&artid=$1




#NS-type modules


RewriteRule ^NS-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*).* modules.php?op=modload&name=NS-$1&file=index&$2=$3&$4=$5


RewriteRule ^NS-Polls-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*).html modules.php?op=modload&name=NS-Polls&file=index&req=$1&pollID=$2


RewriteRule ^NS-Polls-([a-zA-Z0-9_]*).html modules.php?op=modload&name=NS-Polls&file=index&pollID=$1


RewriteRule ^NS-([a-zA-Z0-9_]*).html modules.php?op=modload&name=NS-$1&file=index




#General Stuff


RewriteRule ^([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*).* modules.php?op=modload&name=$1&file=index&$2=$3&$4=$5&$6=$7&$8=$9


RewriteRule ^([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*).* modules.php?op=modload&name=$1&file=index&$2=$3&$4=$5&$6=$7


RewriteRule ^([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*).* modules.php?op=modload&name=$1&file=index&$2=$3&$4=$5


RewriteRule ^([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*).* modules.php?op=modload&name=$1&file=index&$2=$3


RewriteRule ^([a-zA-Z0-9_]*)-([a-zA-Z0-9_]*).html modules.php?op=modload&name=$1&file=$2


RewriteRule ^([a-zA-Z0-9_]*).html modules.php?op=modload&name=$1&file=index
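If you would like to check what a rule does before uploading the file, you can roughly emulate it in PHP. The sketch below is only an approximation and not how Apache works internally: demo_resolve() is a made-up helper, it knows just four of the rules above, and it stops at the first rule that matches.

```php
<?php
// Hypothetical helper that mimics a few of the RewriteRule lines above.
function demo_resolve($path)
{
    // Pattern => substitution, copied from the .htaccess rules, in the same order.
    $rules = array(
        '^article([1-9][0-9]*).*'   => 'modules.php?op=modload&name=News&file=article&sid=$1',
        '^Topic([1-9][0-9]*)-all.*' => 'modules.php?op=modload&name=News&file=index&catid=&topic=$1&allstories=1',
        '^Topic([1-9][0-9]*).*'     => 'modules.php?op=modload&name=News&file=index&catid=&topic=$1',
        '^([a-zA-Z0-9_]*).html'     => 'modules.php?op=modload&name=$1&file=index',
    );

    foreach ($rules as $pattern => $target) {
        if (preg_match('#' . $pattern . '#', $path, $m)) {
            // Replace $1, $2, ... in the substitution with the captured groups.
            return preg_replace_callback('/\$([0-9])/', function ($ref) use ($m) {
                $n = (int) $ref[1];
                return isset($m[$n]) ? $m[$n] : '';
            }, $target);
        }
    }

    return $path; // no rule matched, so the path stays as it is
}

echo demo_resolve('article42.html'), "\n";
echo demo_resolve('Web_Links.html'), "\n";
```

Note that the article rule only restores sid. The mode, order and thold parameters were dropped when the link was shortened, so the article is shown with its default settings.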








Finally, you can see my proposed solution in action on the following (test) sites so far:


http://www.mountainwatersspa.com/


http://www.lobosoft.com/




I'm not claiming that my code is the best solution, maybe not even for this method, but it works.


So, there you have an example and some basic instructions. Now go and conquer Google et al., and don't forget to mention other hosting companies that support mod_rewrite or to make suggestions on how to improve this. :)