Fixing double-encoded UTF-8 data in MySQL


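Double-encoded UTF-8 text (not to mention triple-, quadruple-encoded and so on) is a fairly common problem with MySQL. It often happens when the client connection's character set is Latin-1 while the application actually sends UTF-8. The server then treats each UTF-8 byte as a separate Latin-1 character and encodes it to UTF-8 again, so "é" ends up stored as "Ã©". Once the data is corrupt, however, the cause matters less than the fix.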

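Here is how to fix it in two steps with the mysqldump and mysql command-line clients. First dump the database over a Latin-1 connection, then load the dump back over a UTF-8 connection: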

mysqldump -h DB_HOST -u DB_USER --password=DB_PASSWORD --opt --quote-names \
    --skip-set-charset --default-character-set=latin1 DB_NAME > DB_NAME-dump.sql

mysql -h DB_HOST -u DB_USER --password=DB_PASSWORD \
    --default-character-set=utf8 DB_NAME < DB_NAME-dump.sql

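First replace DB_HOST, DB_USER, DB_PASSWORD and DB_NAME with the values for your database setup. You can also write just --password, without a value, to be prompted for it instead. Note that "-p DB_PASSWORD" with a space would not work, because the clients would take DB_PASSWORD as the database name.

The first command makes the server convert every text column to Latin-1 on the way out. This strips one layer of encoding and leaves the original UTF-8 bytes in the dump file. The --skip-set-charset option keeps a SET NAMES statement out of the dump, so the second command loads those bytes over a UTF-8 connection and stores them correctly. Each pass removes only one layer, so triple-encoded data needs the procedure run twice, and so on.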
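The same round trip can be reproduced outside MySQL with iconv. The sketch below is illustrative only. It assumes a UTF-8 terminal and script file, and an iconv that accepts the ISO-8859-1 and UTF-8 encoding names (the glibc and GNU libiconv versions do). The variable names are hypothetical. The script shows the corruption described above and how converting UTF-8 back to Latin-1 undoes one layer of it.

```shell
#!/bin/sh
# Hypothetical demo values; assumes this file and the terminal use UTF-8.
original='café'

# Simulate double encoding: read the UTF-8 bytes of "é" (C3 A9) as two
# Latin-1 characters ("Ã" and "©") and encode those to UTF-8 again.
double=$(printf '%s' "$original" | iconv -f ISO-8859-1 -t UTF-8)
printf 'double-encoded: %s\n' "$double"

# Undo one layer, as the Latin-1 dump does: converting UTF-8 to Latin-1
# turns "Ã©" back into the bytes C3 A9, i.e. the original UTF-8 "é".
fixed=$(printf '%s' "$double" | iconv -f UTF-8 -t ISO-8859-1)
printf 'fixed: %s\n' "$fixed"
```

On a UTF-8 terminal, the first line shows the familiar "cafÃ©" garbage and the second shows "café" again.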
