PHP DOMDocument UTF-8 Mangled

Jan 8

Recently, when using PHP’s built-in DOMDocument class, I noticed it was mangling some UTF-8 encoded Chinese characters.

After some time I realised that DOMDocument::loadHTML() treats its input as ISO-8859-1 unless the markup declares another encoding. Converting the non-ASCII characters to HTML entities before loading resolved the problem:

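// Convert non-ASCII characters to HTML entities so loadHTML cannot misread them.
$html = mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8');

$dom = new DOMDocument;
// @ suppresses the warnings libxml emits for malformed markup.
@$dom->loadHTML($html);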
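
On PHP 8.2 and later, the 'HTML-ENTITIES' conversion above raises a deprecation notice. Here is a minimal sketch of one alternative, assuming the mbstring extension is available. The helper name loadUtf8Html is hypothetical and not part of the original fix. It swaps in mb_encode_numericentity() to do the same job, and collects libxml's warnings rather than silencing them with @:

```php
<?php
// Hypothetical helper (an assumption, not from the original post): parse
// UTF-8 HTML without the deprecated 'HTML-ENTITIES' pseudo-encoding.
function loadUtf8Html(string $html): DOMDocument
{
    // Rewrite every code point from U+0080 upward as a numeric entity,
    // so the parser only ever sees plain ASCII bytes.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    $ascii = mb_encode_numericentity($html, $map, 'UTF-8');

    // Collect libxml warnings internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom;
}

$dom = loadUtf8Html('<p>中文</p>');
echo $dom->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;
```

As with the original approach, entities are not decoded inside script or style elements, so this is best suited to markup where the non-ASCII text lives in ordinary content.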


Source:
http://stackoverflow.com/questions/8218230/php-domdocument-loadhtml-not-encoding-utf-8-correctly
