PHP DOMDocument UTF-8 Mangled

Jan 8
Recently, when using PHP’s built-in DOMDocument class, I noticed it was mangling some UTF-8 encoded Chinese characters.

After some time, I realised that DOMDocument::loadHTML() treats its input as ISO-8859-1 unless told otherwise, so each multi-byte UTF-8 character is misread as several Latin-1 characters. The following resolved the problem:

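// Turn non-ASCII characters into entities so the ISO-8859-1 default can't mangle them.
// (The 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2.)
$html = mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8');
$dom = new DOMDocument;
$dom->loadHTML($html);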
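Another way to "tell" loadHTML() that the input is UTF-8, without relying on the deprecated entity conversion, is to declare the charset in the markup itself. The sketch below assumes the HTML is a fragment with no charset declaration of its own and prepends a `<meta>` tag before parsing; the sample string is purely illustrative.

```php
<?php
// Illustrative UTF-8 input containing Chinese characters.
$html = '<p>你好，世界</p>';

$dom = new DOMDocument();
// Assumption: $html has no charset declaration of its own. The http-equiv
// meta tag tells libxml the bytes are UTF-8, so it does not fall back to
// its ISO-8859-1 default.
$dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);

// Read the paragraph text back; it should match the original characters.
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo $text, PHP_EOL;
```

Because the extra meta tag ends up in the parsed document's `<head>`, you may prefer to serialise only the nodes you need, e.g. with `$dom->saveHTML($node)`.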


