XmlDocument Documentation
The XmlDocument
class extends PHP’s native DOMDocument
with additional functionality for XML manipulation, transformation, and querying. This class is the core component for working with XML documents in the Derafu XML library.
- Key Features
- Basic Usage
- XPath Querying
- Array Conversion
- Working with XML Digital Signatures
- Handling Special Characters and Entities
- Advanced Canonicalization
- Error Handling
- Performance Considerations
Key Features
- Extended XML loading with character encoding handling.
- Simplified access to XML metadata (namespace, schema).
- Canonicalization support with encoding options.
- XPath query integration.
- Array conversion capabilities.
- Signature node handling for XML digital signatures.
Basic Usage
Creating a New Document
use Derafu\Xml\XmlDocument;
// Create with default version (1.0) and encoding (ISO-8859-1).
$document = new XmlDocument();
// Or specify version and encoding.
$document = new XmlDocument('1.0', 'UTF-8');
Loading XML Content
// Load from a string.
$xmlString = '<root><element>value</element></root>';
$document->loadXml($xmlString);
// The method handles encoding conversion automatically.
$utf8Xml = '<?xml version="1.0" encoding="UTF-8"?><root><element>Árbol</element></root>';
$document->loadXml($utf8Xml); // Will convert to ISO-8859-1 if needed.
Accessing Document Information
// Get the root element name.
$rootName = $document->getName(); // e.g., "root"
// Get the XML namespace (if any).
$namespace = $document->getNamespace(); // e.g., "http://example.com"
// Get the schema location (if any).
$schema = $document->getSchema(); // e.g., "schema.xsd"
Saving and Serializing
// Get the complete XML document with declaration.
$xmlString = $document->saveXml();
// Get only the XML content without the declaration.
$xmlContent = $document->getXml();
// Get the canonicalized (C14N) version.
$canonXml = $document->C14N();
// Get canonicalized version with ISO-8859-1 encoding.
$isoCanonXml = $document->C14NWithIso88591Encoding();
// Get flattened canonicalized version (whitespace removed between tags).
$flatXml = $document->C14NWithIso88591EncodingFlattened();
XPath Querying
The XmlDocument class has integrated XPath querying capabilities:
// Execute an XPath query and get result as string or array.
$result = $document->query('/root/element');
// Get DOMNodeList from an XPath query.
$nodes = $document->getNodes('/root/element');
// Use parameters in XPath queries.
$params = ['id' => '123'];
$result = $document->query('/root/element[@id=:id]', $params);
Array Conversion
// Convert the entire document to an array.
$array = $document->toArray();
// Access a specific part using dot notation.
$value = $document->get('root.element');
// With a default value if not found.
$value = $document->get('root.missing', 'default value');
Working with XML Digital Signatures
If the document contains an XML digital signature, you can extract it:
// Get the signature node as XML.
$signatureXml = $document->getSignatureNodeXml();
// Returns null if no signature is present.
if ($signatureXml !== null) {
// Process signature...
}
Handling Special Characters and Entities
The XmlDocument
class works with the XmlHelper
to properly handle special characters and entities:
// When saving XML, entities are handled correctly.
$document->loadXml('<root><element>Text with & < > " \'</element></root>');
$xml = $document->saveXml();
// Produces: <root><element>Text with & < > " '</element></root>
Advanced Canonicalization
Working with Specific Nodes
You can apply canonicalization to specific parts of the document:
// Canonicalize only a portion of the document.
$canonXml = $document->C14NWithIso88591Encoding('/root/section');
Differences Between Canonicalization Methods
The class offers several canonicalization methods:
- C14N(): Standard canonicalization, always outputs UTF-8.
- C14NWithIso88591Encoding(): Canonicalization with conversion to ISO-8859-1.
- C14NWithIso88591EncodingFlattened(): ISO-8859-1 canonicalization with whitespace removal.
These methods are particularly useful when working with digital signatures or when XML needs to be processed by systems with specific encoding requirements.
Error Handling
The loadXml()
method throws descriptive exceptions when it encounters errors:
use Derafu\Xml\Exception\XmlException;
try {
$document->loadXml($potentiallyInvalidXml);
} catch (XmlException $e) {
echo "Error loading XML: " . $e->getMessage();
$errors = $e->getErrors(); // Get detailed error information if available.
}
Performance Considerations
- When working with large XML documents, prefer using XPath queries to extract specific sections rather than converting the entire document to an array.
- The canonicalization methods perform more processing, so use them only when needed.
- The
getXml()
method is more efficient thansaveXml()
when you only need the XML content without the declaration.