ext/dom and libxml2 charset and entity behaviors

In case you are unaware, as of PHP 5.1.0 there is a second argument, $options, to the DOMDocument->saveXML() method.

Currently this argument supports only one value, the constant LIBXML_NOEMPTYTAG. With it, empty elements are written as <tag></tag> instead of <tag />. This can make things easier if you need more predictable text to perform further changes on later.
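A minimal sketch of the option on its own, serializing the same node with and without the flag. The <root><empty /></root> document is a made-up example, not part of the experiment below.

```php
<?php
// Minimal sketch: the same node serialized with and without LIBXML_NOEMPTYTAG.
// The <root><empty /></root> document is a hypothetical example.
$dom = new DOMDocument();
$dom->loadXML('<root><empty /></root>');

// Default: empty elements are collapsed into the short form.
$collapsed = $dom->saveXML($dom->documentElement);

// With LIBXML_NOEMPTYTAG: empty elements get an explicit closing tag.
$expanded = $dom->saveXML($dom->documentElement, LIBXML_NOEMPTYTAG);

echo $collapsed, "\n"; // <root><empty/></root>
echo $expanded, "\n";  // <root><empty></empty></root>
```

Passing the element rather than the whole document keeps the XML prolog out of the output, so only the tag form differs between the two strings.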

However, while playing around with the option, I noticed that the size of my markup changed significantly (it's a large document). Further experimentation shows that the following six calls to DOMDocument->saveXML() produce different results:

Note: &#xA0; is the non-breaking space character (&nbsp; in HTML). ext/dom defaults to UTF-8.
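To see what the [nbsp char] placeholder in the outputs below stands for, a short sketch can dump the element-level output as hex. In UTF-8, U+00A0 is the two-byte sequence C2 A0, so those bytes should appear where the character reference used to be.

```php
<?php
// Sketch: inspect the raw bytes behind "[nbsp char]".
// U+00A0 (&#xA0;) is encoded in UTF-8 as the two bytes C2 A0.
$dom = new DOMDocument();
$dom->loadXML('<xml><test />&#xA0;</xml>');

// Serializing a node (not the whole document) writes the character itself,
// so the hex dump contains "c2a0" rather than the text "&#xA0;".
$serialized = $dom->saveXML($dom->documentElement);
echo bin2hex($serialized), "\n";
```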

<?php
// Call loadXML() on an instance; static calls are no longer supported in PHP 8.
$dom = new DOMDocument();
$dom->loadXML("<xml><test />&#xA0;</xml>");

echo $dom->saveXML();
/*
Default behavior, entities stay as entities, no encoding added to the XML prolog
<?xml version="1.0"?>
<xml><test/>&#xA0;</xml>
*/

echo $dom->saveXML($dom->documentElement);
/*
Entities are transformed to output charset, no XML prolog
<xml><test/>[nbsp char]</xml>
*/

echo $dom->saveXML($dom);
/*
Entities are transformed to output charset, encoding added to the XML prolog
<?xml version="1.0" encoding="UTF-8"?>
<xml><test/>[nbsp char]</xml>
*/

echo $dom->saveXML($dom->documentElement, LIBXML_NOEMPTYTAG);
/*
Entities are transformed to output charset, no XML prolog, tags expanded
<xml><test></test>[nbsp char]</xml>
*/

echo $dom->saveXML($dom, LIBXML_NOEMPTYTAG);
/*
Entities are transformed to output charset, encoding added to the XML prolog, tags expanded
<?xml version="1.0" encoding="UTF-8"?>
<xml><test></test>[nbsp char]</xml>
*/

echo $dom->saveXML(null, LIBXML_NOEMPTYTAG);
/*
Entities stay as entities, no encoding added to the XML prolog, tags expanded
<?xml version="1.0"?>
<xml><test></test>&#xA0;</xml>
*/
?>

Just something to keep in mind next time you’re fooling around with the DOM.

- Davey
