Jan 8
Published by Kieran in Computer Science
Recently, when using PHP’s built-in DOMDocument class, I noticed it was mangling some UTF-8 encoded Chinese characters.
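After some time I realised that DOMDocument::loadHTML treats its input as ISO-8859-1 unless told otherwise (for example, by a charset declaration in the markup), so each byte of a multi-byte UTF-8 character is decoded as a separate Latin-1 character. Converting the non-ASCII characters to HTML entities before parsing resolved the problem: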
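$html = mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'); // non-ASCII characters become HTML entities
$dom = new DOMDocument;
@$dom->loadHTML($html); // @ suppresses libxml warnings about malformed markup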
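One caveat: since PHP 8.2, passing 'HTML-ENTITIES' to mb_convert_encoding() is deprecated and raises a deprecation notice. The sketch below applies the same idea (turn every non-ASCII character into a numeric entity so the input is pure ASCII and the assumed charset no longer matters) using mb_encode_numericentity() instead. It assumes the dom and mbstring extensions are enabled; the sample string is a hypothetical example, not the HTML from my original problem.

```php
<?php
// Sketch: assumes PHP with the dom and mbstring extensions enabled.
// The sample markup below is hypothetical, chosen only to include Chinese text.
$html = '<p>你好，世界</p>';

// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);

// Convert every code point from U+0080 upwards into a numeric entity (&#NNNN;).
// Map format is [start, end, offset, mask]; offset 0 and this mask keep the code point as-is.
// The result is pure ASCII, so loadHTML's ISO-8859-1 assumption cannot corrupt it.
$ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

$dom = new DOMDocument();
$dom->loadHTML($ascii);

// DOM strings are UTF-8 internally, so the entities decode back to the original characters.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo $text, "\n";
```

Running this should print the Chinese text unchanged, without the deprecation notice that the 'HTML-ENTITIES' route triggers on newer PHP versions.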
Source:
http://stackoverflow.com/questions/8218230/php-domdocument-loadhtml-not-encoding-utf-8-correctly