PostNuke

Flexible Content Management System

News

Using PostNuke and MySQL in Unicode (UTF-8)

Contributed by on Jun 06, 2007 - 08:28 PM

MySql and UTF-8



by: Bernd Plagge

www.choicenet.ne.jp



MySql introduced support for different character sets and collating rules in version 4.1. While this has many advantages databases sometimes need to be migrated and then we have to ensure that the correct character code is set for the new database. We are mainly using UTF-8 and hence you may have to adapt this information to your situation!



What is the problem?



MySql uses as default Latin1, swedish flavour.

How do we check this?



run mysql.

mysql> show variables;



How to check the character set for a particular database?



run mysql.

mysql> show create table <tablename>





The character code can be set in various places - so where is the best place for this?





We always use UTF-8

------------------------

In this case it makes sense to set the character code for the whole server to UTF-8. MySql actually sets the character code at different levels:

server

client

database connection

database



The character encoding for server, client and the connection work hand in hand to ensure that the data is interpreted correctly. It seems that MySql makes no effort to determine the correct character set but will convert data in accordance with it's configuration. This leads to problems for old databases if the database character code differes from the one configured.



The easiest way to configure MySql for UTF-8 is to put the configuration inot the main configuration file (/etc/mysql/my.cnf).

In my.cnf there are different file sections. So you can't put it just anywhere.



[mysqld]

character-set-server=utf8



[client]

default-character-set=utf8





Setting UTF-8 for one Database

--------------------------------------

You can check the default character code for a database either by dumping the data and then changing the schema file or you can do this online.

You use the 'alter table' statement to change the database using mysql.



run mysql

alter database <xvy>

default character set utf-8

default collotion_name utf8-general_ci;



note: you may omit the word 'default'.





It is also worth noting that e.g. mysqldump silently adds the default character code when exporting data!

This can causes problems if the character code defined and the database content are different. In such cases MySqldump will try to convert data to it's default character code!

However, you can disable that by adding the character code option when using MySqldump.

e.g. mysqldump --set-charset.



In the same way you may explicitely set the character set.



mysqlimport --default-character-set = utf8

mysqladmin --defautl-character-set = utf8 create <db>





Setting the character code at server start

-----------------------------------------------------

You can start the server with:

character-set-server

collation-server



The current values can be determined with the command



run MySql:

mysql> show variables;





Values may be changed with commands like:



set character_set_server = utf8;

set collation_server = utf8_unicode_ci





Setting the character code within Applications

----------------------------------------------------------

You want ensure that the correct character code is set when connecting to a particular database.



Connect to the database and issue the following SQL command:



mysql>SET NAMES utf8;





SET NAMES is equivalent to the 3 commands:

character_set_client

character_set_connection

character_set_results





How to run PostNuke in UTF-8

-----------------------------------

Several things have to fall into place for this to work correctly.



1) the database encoding needs to be set to UTF-8

2) the application language needs to be encoded in UTF-8

3) PostNuke needs to be told to use UTF-8



ad 1)

see above for details!



ad 2)

The character code for a given language is set in language/<lang>/global.php (PN 0.76x) or in language/<lang>/core.php (PN 0.8). Search and adjust the following 3 define strings:

define('_CHARSET','UTF-8');

define('_LOCALE','en_US');

define('_LOCALEWIN','eng');



Of course it is not sufficient just to change the _CHARSET to UTF-8. The text strings themselves need to be encoded in UTF-8. This can be done in a number of ways:

a) use 'recode' or 'iconv' on every language file

b) use the pnlwb (PN Language Workbench) to extract and convert all language files



ad 3)

Following the database initialization you need to issue a "set names to 'UTF-8'" command. As this is done in includes/pnAPI.php we need to patch that file.



PN 0.76x

----------



// load security functions.

include 'includes/pnSecurity.php';



// Load our language files

include 'includes/pnLang.php';

pnLangLoad();



//bplagge 2006-01-31 - charset fix for new MySql version

$dbconn =& pnDBGetConn(true);

$info=$dbconn->ServerInfo();

if ($dbconn->ErrorNo() != 0) {

echo "Error: "; echo $dbconn->ErrorNo();

}

// print_r($info);

$c = _CHARSET;

// Mysql uses non-standard name for UTF-8!

if ($c == 'UTF-8')

$c = 'UTF8';

$query = sprintf('SET NAMES \'%s\'', $c);

$dbconn->Execute($query);

if ($dbconn->ErrorNo()!=0) {

echo "Error: "; echo $dbconn->ErrorMsg();

echo "check pnAPI.php";

}

}

// end bplagge - mysql charset adjustment





PN 0.8

-------



if ($stages & PN_CORE_LANGS) {

// Load our language files

pnLangLoad();

}



# bp 2007-06-01

# need to "set names '<charset>' " to ensure correct data handling

$query = sprintf('SET NAMES \'%s\'', 'UTF8');

$result = DBUtil::executeSQL($query, -1, -1, true, true);



Please note that the MySQL character code names are NON STANDARD!



How to check?

----------------

I always found that phpMyAdmin is an excellent tool for MySQL databases. If the data entered in PostNuke is also correctly displayed in phpMyAdmin table browse mode I'm quite confident that everything is fine.









4639