Accessing document data

Basic traversal functions

The internal representation of the document is a tree, where each node has a list of child nodes (the order of children corresponds to their order in the XML representation), and element nodes additionally have an ordered list of attributes. Several functions are provided to let you get from one node in the tree to another. These functions roughly correspond to the internal representation, and thus are usually building blocks for other methods of traversal (e.g. XPath traversals are based on these functions).

xml_node xml_node::parent() const;
xml_node xml_node::first_child() const;
xml_node xml_node::last_child() const;
xml_node xml_node::next_sibling() const;
xml_node xml_node::previous_sibling() const;

xml_attribute xml_node::first_attribute() const;
xml_attribute xml_node::last_attribute() const;
xml_attribute xml_attribute::next_attribute() const;
xml_attribute xml_attribute::previous_attribute() const;

The parent function returns the node's parent; all non-null nodes except the document have a non-null parent. first_child and last_child return the first and last child of the node, respectively; note that only document nodes and element nodes can have a non-empty child node list. If the node has no children, both functions return null nodes. next_sibling and previous_sibling return the node that's immediately to the right/left of this node in the children list, respectively - for example, in <a/><b/><c/>, calling next_sibling for a handle that points to <b/> results in a handle pointing to <c/>, and calling previous_sibling results in a handle pointing to <a/>. If the node does not have a next/previous sibling (this happens if it is the last/first node in the list, respectively), the functions return null nodes. The first_attribute, last_attribute, next_attribute and previous_attribute functions behave similarly to the corresponding child node functions and allow you to iterate through the attribute list in the same way.

	Note
	Because of memory consumption reasons, attributes do not have a link to their parent nodes. Thus there is no `xml_attribute::parent()` function.

Calling any of the functions above on a null handle results in a null handle. For example, node.first_child().next_sibling() returns the second child of node, or a null handle if node is null, has no children at all, or has only one child node.
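The reason such chains are safe can be shown with a small standalone model. This is only a sketch of the null-handle idea: toy_record and toy_handle are hypothetical names invented for this illustration and are not part of pugixml.

```cpp
// Hypothetical storage record; stands in for whatever the library keeps internally.
struct toy_record
{
    toy_record* first_child = nullptr;
    toy_record* next_sibling = nullptr;
};

// Hypothetical handle type: a thin wrapper around a possibly-null pointer.
class toy_handle
{
public:
    explicit toy_handle(toy_record* record = nullptr) : record_(record) {}

    // A handle is "true" only if it points to a record.
    explicit operator bool() const { return record_ != nullptr; }

    // Each accessor checks for null first, so a null handle yields another null handle.
    toy_handle first_child() const { return toy_handle(record_ ? record_->first_child : nullptr); }
    toy_handle next_sibling() const { return toy_handle(record_ ? record_->next_sibling : nullptr); }

    bool operator==(const toy_handle& other) const { return record_ == other.record_; }

private:
    toy_record* record_;
};
```

Because every accessor returns a handle rather than a raw pointer, a chain of calls never dereferences a missing node; the result only needs to be checked once, at the end.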

With these functions, you can iterate through all child nodes and display all attributes like this (samples/traverse_base.cpp):

for (pugi::xml_node tool = tools.first_child(); tool; tool = tool.next_sibling())
{
    std::cout << "Tool:";

    for (pugi::xml_attribute attr = tool.first_attribute(); attr; attr = attr.next_attribute())
    {
        std::cout << " " << attr.name() << "=" << attr.value();
    }

    std::cout << std::endl;
}

Getting node data

Apart from structural information (parent, child nodes, attributes), nodes can have a name and a value, both of which are strings. Depending on the node type, the name or value may be absent: node_document nodes do not have a name or value; node_element and node_declaration nodes always have a name but never have a value; node_pcdata, node_cdata, node_comment and node_doctype nodes never have a name but always have a value (it may be empty though); node_pi nodes always have a name and a value (again, the value may be empty). In order to get a node's name or value, you can use the following functions:

const char_t* xml_node::name() const;
const char_t* xml_node::value() const;

In case node does not have a name or value or if the node handle is null, both functions return empty strings - they never return null pointers.

It is common to store data as the text contents of some node, e.g. <node><description>This is a node</description></node>. In this case, the <description> node does not have a value, but instead has a child of type node_pcdata with value "This is a node". pugixml provides several helper functions to parse such data:

const char_t* xml_node::child_value() const;
const char_t* xml_node::child_value(const char_t* name) const;
xml_text xml_node::text() const;

child_value() returns the value of the first child with type node_pcdata or node_cdata; child_value(name) is a simple wrapper for child(name).child_value(). For the above example, calling node.child_value("description") and description.child_value() will both produce string "This is a node". If there is no child with relevant type, or if the handle is null, child_value functions return empty string.

text() returns a special object that can be used for working with PCDATA contents in more complex cases than just retrieving the value; it is described in the Working with text contents section.

There is an example of using some of these functions at the end of the next section.

Getting attribute data

All attributes have name and value, both of which are strings (value may be empty). There are two corresponding accessors, like for xml_node:

const char_t* xml_attribute::name() const;
const char_t* xml_attribute::value() const;

In case the attribute handle is null, both functions return empty strings - they never return null pointers.

If you need a non-empty string when the attribute handle is null (for example, you need to get the option value from an XML attribute, but if it is not specified, you need it to default to "sorted" instead of ""), you can use the as_string accessor:

const char_t* xml_attribute::as_string(const char_t* def = "") const;

It returns def argument if the attribute handle is null. If you do not specify the argument, the function is equivalent to value().

In many cases attribute values have types that are not strings, e.g. an attribute may always contain values that should be treated as integers, despite the fact that they are represented as strings in XML. pugixml provides several accessors that convert the attribute value to some other type:

int xml_attribute::as_int(int def = 0) const;
unsigned int xml_attribute::as_uint(unsigned int def = 0) const;
double xml_attribute::as_double(double def = 0) const;
float xml_attribute::as_float(float def = 0) const;
bool xml_attribute::as_bool(bool def = false) const;
long long xml_attribute::as_llong(long long def = 0) const;
unsigned long long xml_attribute::as_ullong(unsigned long long def = 0) const;

as_int, as_uint, as_llong, as_ullong, as_double and as_float convert attribute values to numbers. If the attribute handle is null or the attribute value is empty, the def argument is returned (which is 0 by default). Otherwise, all leading whitespace characters are skipped, and the remaining string is parsed as an integer number in either decimal or hexadecimal form (applicable to as_int, as_uint, as_llong and as_ullong; hexadecimal format is used if the number has a 0x or 0X prefix) or as a floating point number in either decimal or scientific form (as_double or as_float). Any extra characters are silently discarded, e.g. as_int will return 1 for the string "1abc".

In case the input string contains a number that is out of the target numeric range, the result is undefined.
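The integer rules above (default for null or empty input, skipped leading whitespace, optional 0x/0X prefix, ignored trailing characters) can be mimicked with the standard library. The following is a minimal sketch, not pugixml's implementation: parse_int_sketch is a hypothetical helper, a null pointer stands in for a null attribute handle, and out-of-range input is not handled, just as the text leaves that case undefined.

```cpp
#include <cctype>
#include <cstdlib>

// Hypothetical stand-in for an as_int-style conversion (illustration only).
int parse_int_sketch(const char* value, int def = 0)
{
    // Null "handle" or empty value: return the default.
    if (!value || !*value) return def;

    // Skip leading whitespace.
    const char* p = value;
    while (std::isspace(static_cast<unsigned char>(*p))) ++p;

    // Look past an optional sign to detect a 0x / 0X prefix.
    const char* digits = p;
    if (*digits == '-' || *digits == '+') ++digits;
    int base = (digits[0] == '0' && (digits[1] == 'x' || digits[1] == 'X')) ? 16 : 10;

    // strtol stops at the first character that is not part of the number,
    // so trailing characters such as "abc" in "1abc" are ignored.
    return static_cast<int>(std::strtol(p, nullptr, base));
}
```

With this sketch, "1abc" yields 1, "0x1F" yields 31, and "010" yields 10 because only the 0x prefix changes the base.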

	Caution
	Number conversion functions depend on current C locale as set with `setlocale`, so may return unexpected results if the locale is different from `"C"`.
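One possible way to guard against this is to switch the numeric locale to "C" around the conversion and restore it afterwards. The sketch below uses std::strtod as a stand-in for any conversion that honours the C locale; parse_double_in_c_locale is a hypothetical helper, not a pugixml function. Note that setlocale affects the whole process, so this approach is an assumption that only fits code where no other thread relies on the locale at the same time.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Hypothetical helper: parse a floating point number with LC_NUMERIC forced to "C".
double parse_double_in_c_locale(const char* text)
{
    // Remember the current numeric locale; copy it, since later setlocale calls
    // may overwrite the returned buffer.
    const char* current = std::setlocale(LC_NUMERIC, nullptr);
    std::string saved = current ? current : "C";

    std::setlocale(LC_NUMERIC, "C");           // '.' is the decimal separator here
    double result = std::strtod(text, nullptr);
    std::setlocale(LC_NUMERIC, saved.c_str()); // restore the previous setting

    return result;
}
```

The same save/switch/restore pattern could be wrapped around a call to as_double or as_float if changing the locale for the whole program is not an option.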

as_bool converts attribute value to boolean as follows: if attribute handle is null, def argument is returned (which is false by default). If attribute value is empty, false is returned. Otherwise, true is returned if the first character is one of '1', 't', 'T', 'y', 'Y'. This means that strings like "true" and "yes" are recognized as true, while strings like "false" and "no" are recognized as false. For more complex matching you'll have to write your own function.
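The first-character rule is simple enough to restate as code. This is an illustrative stand-in that follows the description above, not pugixml's source; first_char_as_bool is a hypothetical name, and a null pointer plays the role of a null attribute handle.

```cpp
// Hypothetical restatement of the documented as_bool rule (illustration only).
bool first_char_as_bool(const char* value, bool def = false)
{
    // Null "handle": return the default.
    if (!value) return def;

    // For an empty string the first character is '\0', which matches nothing below.
    char first = value[0];

    return first == '1' || first == 't' || first == 'T' || first == 'y' || first == 'Y';
}
```

The sketch also makes the limits of the rule visible: only the first character is inspected, so a value such as "typo" or "10" counts as true, which is one reason to write a stricter function when the input is not trusted.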

	Note
	`as_llong` and `as_ullong` are only available if your platform has reliable support for the `long long` type, including string conversions.

This is an example of using these functions, along with node data retrieval ones (samples/traverse_base.cpp):

for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))
{
    std::cout << "Tool " << tool.attribute("Filename").value();
    std::cout << ": AllowRemote " << tool.attribute("AllowRemote").as_bool();
    std::cout << ", Timeout " << tool.attribute("Timeout").as_int();
    std::cout << ", Description '" << tool.child_value("Description") << "'\n";
}

Contents-based traversal functions

Since a lot of document traversal consists of finding the node/attribute with the correct name, there are special functions for that purpose:

xml_node xml_node::child(const char_t* name) const;
xml_attribute xml_node::attribute(const char_t* name) const;
xml_node xml_node::next_sibling(const char_t* name) const;
xml_node xml_node::previous_sibling(const char_t* name) const;

child and attribute return the first child/attribute with the specified name; next_sibling and previous_sibling return the first sibling in the corresponding direction with the specified name. All string comparisons are case-sensitive. In case the node handle is null or there is no node/attribute with the specified name, null handle is returned.

child and next_sibling functions can be used together to loop through all child nodes with the desired name like this:

for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))

Occasionally the needed node is specified not by the unique name but instead by the value of some attribute; for example, it is common to have node collections with each node having a unique id: <group><item id="1"/> <item id="2"/></group>. There are two functions for finding child nodes based on the attribute values:

xml_node xml_node::find_child_by_attribute(const char_t* name, const char_t* attr_name, const char_t* attr_value) const;
xml_node xml_node::find_child_by_attribute(const char_t* attr_name, const char_t* attr_value) const;

The three-argument function returns the first child node with the specified name which has an attribute with the specified name/value; the two-argument function skips the name test for the node, which can be useful for searching in heterogeneous collections. If the node handle is null or if no node is found, null handle is returned. All string comparisons are case-sensitive.

In all of the above functions, all arguments have to be valid strings; passing null pointers results in undefined behavior.

This is an example of using these functions (samples/traverse_base.cpp):

std::cout << "Tool for *.dae generation: " << tools.find_child_by_attribute("Tool", "OutputFileMasks", "*.dae").attribute("Filename").value() << "\n";

for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))
{
    std::cout << "Tool " << tool.attribute("Filename").value() << "\n";
}

Range-based for-loop support

If your C++ compiler supports range-based for-loop (this is a C++11 feature, at the time of writing it's supported by Microsoft Visual Studio 11 Beta, GCC 4.6 and Clang 3.0), you can use it to enumerate nodes/attributes. Additional helpers are provided to support this; note that they are also compatible with Boost Foreach, and possibly other pre-C++11 foreach facilities.

implementation-defined type xml_node::children() const;
implementation-defined type xml_node::children(const char_t* name) const;
implementation-defined type xml_node::attributes() const;

children function allows you to enumerate all child nodes; children function with name argument allows you to enumerate all child nodes with a specific name; attributes function allows you to enumerate all attributes of the node. Note that you can also use node object itself in a range-based for construct, which is equivalent to using children().

This is an example of using these functions (samples/traverse_rangefor.cpp):

for (pugi::xml_node tool: tools.children("Tool"))
{
    std::cout << "Tool:";

    for (pugi::xml_attribute attr: tool.attributes())
    {
        std::cout << " " << attr.name() << "=" << attr.value();
    }

    for (pugi::xml_node child: tool.children())
    {
        std::cout << ", child " << child.name();
    }

    std::cout << std::endl;
}

Traversing node/attribute lists via iterators

Child node lists and attribute lists are simply doubly linked lists; while you can use previous_sibling/next_sibling and other such functions for iteration, pugixml additionally provides node and attribute iterators, so that you can treat nodes as containers of other nodes or attributes:

class xml_node_iterator;
class xml_attribute_iterator;

typedef xml_node_iterator xml_node::iterator;
iterator xml_node::begin() const;
iterator xml_node::end() const;

typedef xml_attribute_iterator xml_node::attribute_iterator;
attribute_iterator xml_node::attributes_begin() const;
attribute_iterator xml_node::attributes_end() const;

begin and attributes_begin return iterators that point to the first node/attribute, respectively; end and attributes_end return past-the-end iterator for node/attribute list, respectively - this iterator can't be dereferenced, but decrementing it results in an iterator pointing to the last element in the list (except for empty lists, where decrementing past-the-end iterator results in undefined behavior). Past-the-end iterator is commonly used as a termination value for iteration loops (see sample below). If you want to get an iterator that points to an existing handle, you can construct the iterator with the handle as a single constructor argument, like so: xml_node_iterator(node). For xml_attribute_iterator, you'll have to provide both an attribute and its parent node.

begin and end return equal iterators if called on null node; such iterators can't be dereferenced. attributes_begin and attributes_end behave the same way. For correct iterator usage this means that child node/attribute collections of null nodes appear to be empty.

Both types of iterators have bidirectional iterator semantics (i.e. they can be incremented and decremented, but efficient random access is not supported) and support all usual iterator operations - comparison, dereference, etc. The iterators are invalidated if the node/attribute objects they're pointing to are removed from the tree; adding nodes/attributes does not invalidate any iterators.

Here is an example of using iterators for document traversal (samples/traverse_iter.cpp):

for (pugi::xml_node_iterator it = tools.begin(); it != tools.end(); ++it)
{
    std::cout << "Tool:";

    for (pugi::xml_attribute_iterator ait = it->attributes_begin(); ait != it->attributes_end(); ++ait)
    {
        std::cout << " " << ait->name() << "=" << ait->value();
    }

    std::cout << std::endl;
}

	Caution
	Node and attribute iterators are somewhere in the middle between const and non-const iterators. While the dereference operation yields a non-constant reference to the object, so that you can use it for tree modification operations, modifying this reference by assignment - i.e. passing iterators to a function like `std::sort` - will not give expected results, as assignment modifies the local handle that's stored in the iterator.

Recursive traversal with xml_tree_walker

The methods described above allow traversal of immediate children of some node; if you want to do a deep tree traversal, you'll have to do it via a recursive function or some equivalent method. However, pugixml provides a helper for depth-first traversal of a subtree. In order to use it, you have to implement xml_tree_walker interface and to call traverse function:

class xml_tree_walker
{
public:
    virtual bool begin(xml_node& node);
    virtual bool for_each(xml_node& node) = 0;
    virtual bool end(xml_node& node);

    int depth() const;
};

bool xml_node::traverse(xml_tree_walker& walker);

The traversal is launched by calling traverse function on traversal root and proceeds as follows:

First, begin function is called with traversal root as its argument.
Then, for_each function is called for all nodes in the traversal subtree in depth first order, excluding the traversal root. Node is passed as an argument.
Finally, end function is called with traversal root as its argument.

If begin, end or any of the for_each calls return false, the traversal is terminated and false is returned as the traversal result; otherwise, the traversal results in true. Note that you don't have to override begin or end functions; their default implementations return true.

You can get the node's depth relative to the traversal root at any point by calling depth function. It returns -1 if called from begin/end, and returns 0-based depth if called from for_each - depth is 0 for all children of the traversal root, 1 for all grandchildren and so on.
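The call order and the depth values described above can be modelled with a small standalone sketch. toy_node, toy_walker and recording_walker are hypothetical types written for this illustration only; in particular, traverse is placed on the walker here for brevity, whereas in pugixml it is a member of xml_node.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy tree node (illustration only).
struct toy_node
{
    std::string name;
    std::vector<std::unique_ptr<toy_node>> children;

    toy_node* add(const std::string& child_name)
    {
        children.push_back(std::unique_ptr<toy_node>(new toy_node));
        children.back()->name = child_name;
        return children.back().get();
    }
};

// Hypothetical walker that follows the begin / for_each / end protocol.
class toy_walker
{
public:
    virtual ~toy_walker() {}
    virtual bool begin(const toy_node&) { return true; }  // default: continue
    virtual bool for_each(const toy_node& node) = 0;
    virtual bool end(const toy_node&) { return true; }    // default: continue

    int depth() const { return depth_; }

    bool traverse(const toy_node& root)
    {
        depth_ = -1;                                // depth is -1 inside begin
        if (!begin(root)) return false;
        if (!visit_children(root, 0)) return false; // root itself is not visited
        depth_ = -1;                                // depth is -1 inside end
        return end(root);
    }

private:
    bool visit_children(const toy_node& parent, int child_depth)
    {
        for (const auto& child : parent.children)
        {
            depth_ = child_depth;
            if (!for_each(*child)) return false;    // false stops the traversal
            if (!visit_children(*child, child_depth + 1)) return false;
        }
        return true;
    }

    int depth_ = -1;
};

// Records "name@depth" for every callback so the order can be inspected.
struct recording_walker : toy_walker
{
    std::vector<std::string> log;

    bool begin(const toy_node&) override { log.push_back("begin@" + std::to_string(depth())); return true; }
    bool for_each(const toy_node& n) override { log.push_back(n.name + "@" + std::to_string(depth())); return true; }
    bool end(const toy_node&) override { log.push_back("end@" + std::to_string(depth())); return true; }
};
```

For a root with children a and c, where a has a child b, the recorded sequence is begin@-1, a@0, b@1, c@0, end@-1.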

This is an example of traversing tree hierarchy with xml_tree_walker (samples/traverse_walker.cpp):

struct simple_walker: pugi::xml_tree_walker
{
    virtual bool for_each(pugi::xml_node& node)
    {
        for (int i = 0; i < depth(); ++i) std::cout << "  "; // indentation

        std::cout << node_types[node.type()] << ": name='" << node.name() << "', value='" << node.value() << "'\n";

        return true; // continue traversal
    }
};

simple_walker walker;
doc.traverse(walker);

Searching for nodes/attributes with predicates

While there are existing functions for getting a node/attribute with known contents, they are often not sufficient even for simple queries. As an alternative to manual iteration through nodes/attributes until the needed one is found, you can make a predicate and call one of the find_ functions:

template <typename Predicate> xml_attribute xml_node::find_attribute(Predicate pred) const;
template <typename Predicate> xml_node xml_node::find_child(Predicate pred) const;
template <typename Predicate> xml_node xml_node::find_node(Predicate pred) const;

The predicate should be either a plain function or a function object which accepts one argument of type xml_attribute (for find_attribute) or xml_node (for find_child and find_node), and returns bool. The predicate is never called with null handle as an argument.

find_attribute function iterates through all attributes of the specified node, and returns the first attribute for which the predicate returned true. If the predicate returned false for all attributes or if there were no attributes (including the case where the node is null), null attribute is returned.

find_child function iterates through all child nodes of the specified node, and returns the first node for which the predicate returned true. If the predicate returned false for all nodes or if there were no child nodes (including the case where the node is null), null node is returned.

find_node function performs a depth-first traversal through the subtree of the specified node (excluding the node itself), and returns the first node for which the predicate returned true. If the predicate returned false for all nodes or if subtree was empty, null node is returned.

This is an example of using predicate-based functions (samples/traverse_predicate.cpp):

bool small_timeout(pugi::xml_node node)
{
    return node.attribute("Timeout").as_int() < 20;
}

struct allow_remote_predicate
{
    bool operator()(pugi::xml_attribute attr) const
    {
        return strcmp(attr.name(), "AllowRemote") == 0;
    }

    bool operator()(pugi::xml_node node) const
    {
        return node.attribute("AllowRemote").as_bool();
    }
};

// Find child via predicate (looks for direct children only)
std::cout << tools.find_child(allow_remote_predicate()).attribute("Filename").value() << std::endl;

// Find node via predicate (looks for all descendants in depth-first order)
std::cout << doc.find_node(allow_remote_predicate()).attribute("Filename").value() << std::endl;

// Find attribute via predicate
std::cout << tools.last_child().find_attribute(allow_remote_predicate()).value() << std::endl;

// We can use simple functions instead of function objects
std::cout << tools.find_child(small_timeout).attribute("Filename").value() << std::endl;

Working with text contents

It is common to store data as the text contents of some node, e.g. <node><description>This is a node</description></node>. In this case, the <description> node does not have a value, but instead has a child of type node_pcdata with value "This is a node". pugixml provides a special class, xml_text, to work with such data. Working with text objects to modify data is described in the documentation for modifying document data; this section describes the access interface of xml_text.

You can get the text object from a node by using text() method:

xml_text xml_node::text() const;

If the node has a type node_pcdata or node_cdata, then the node itself is used to return data; otherwise, a first child node of type node_pcdata or node_cdata is used.

You can check if the text object is bound to a valid PCDATA/CDATA node by using it as a boolean value, i.e. if (text) { ... } or if (!text) { ... }. Alternatively you can check it by using the empty() method:

bool xml_text::empty() const;

Given a text object, you can get the contents (i.e. the value of PCDATA/CDATA node) by using the following function:

const char_t* xml_text::get() const;

In case text object is empty, the function returns an empty string - it never returns a null pointer.

If you need a non-empty string when the text object is empty, or if the text contents are actually a number or a boolean stored as a string, you can use the following accessors:

const char_t* xml_text::as_string(const char_t* def = "") const;
int xml_text::as_int(int def = 0) const;
unsigned int xml_text::as_uint(unsigned int def = 0) const;
double xml_text::as_double(double def = 0) const;
float xml_text::as_float(float def = 0) const;
bool xml_text::as_bool(bool def = false) const;
long long xml_text::as_llong(long long def = 0) const;
unsigned long long xml_text::as_ullong(unsigned long long def = 0) const;

All of the above functions have the same semantics as the similar xml_attribute members: they return the default argument if the text object is empty, and they convert the text contents to the target type using the same rules and restrictions. Refer to the documentation for the attribute functions for details.

xml_text is essentially a helper class that operates on xml_node values. It is bound to a node of type node_pcdata or node_cdata. You can use the following function to retrieve this node:

xml_node xml_text::data() const;

Essentially, assuming text is an xml_text object, calling text.get() is equivalent to calling text.data().value().

This is an example of using xml_text object (samples/text.cpp):

std::cout << "Project name: " << project.child("name").text().get() << std::endl;
std::cout << "Project version: " << project.child("version").text().as_double() << std::endl;
std::cout << "Project visibility: " << (project.child("public").text().as_bool(/* def= */ true) ? "public" : "private") << std::endl;
std::cout << "Project description: " << project.child("description").text().get() << std::endl;

Miscellaneous functions

If you need to get the document root of some node, you can use the following function:

xml_node xml_node::root() const;

This function returns the node with type node_document, which is the root node of the document the node belongs to (unless the node is null, in which case null node is returned).

While pugixml supports complex XPath expressions, sometimes a simple path handling facility is needed. There are two functions, for getting node path and for converting path to a node:

string_t xml_node::path(char_t delimiter = '/') const;
xml_node xml_node::first_element_by_path(const char_t* path, char_t delimiter = '/') const;

Node paths consist of node names, separated with a delimiter (which is / by default); paths can also contain self (.) and parent (..) pseudo-names, so that this is a valid path: "../../foo/./bar". path returns the path to the node from the document root; first_element_by_path looks for a node represented by a given path. A path can be absolute (absolute paths start with the delimiter), in which case the rest of the path is treated as relative to the document root, or relative, in which case it is resolved starting from the given node. For example, in the following document: <a><b><c/></b></a>, node <c/> has path "a/b/c"; calling first_element_by_path for the document with path "a/b" results in node <b/>; calling first_element_by_path for node <a/> with path "../a/./b/../." results in node <a/>; calling first_element_by_path with path "/a" results in node <a/> for any node.

In case path component is ambiguous (if there are two nodes with given name), the first one is selected; paths are not guaranteed to uniquely identify nodes in a document. If any component of a path is not found, the result of first_element_by_path is null node; also first_element_by_path returns null node for null nodes, in which case the path does not matter. path returns an empty string for null nodes.
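The path rules can be traced with a small standalone model. This sketch follows the description in this section rather than pugixml's source: path_node and resolve_path are hypothetical names, a null pointer stands in for a null node, and treating empty components (such as those produced by doubled delimiters) as "self" is an assumption made here for simplicity.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy node; the document node is the one without a parent.
struct path_node
{
    std::string name;
    path_node* parent = nullptr;
    std::vector<std::unique_ptr<path_node>> children;

    path_node* add(const std::string& child_name)
    {
        children.push_back(std::unique_ptr<path_node>(new path_node));
        children.back()->name = child_name;
        children.back()->parent = this;
        return children.back().get();
    }
};

// Hypothetical resolver: returns nullptr wherever the text says "null node".
path_node* resolve_path(path_node* start, const std::string& path, char delimiter = '/')
{
    if (!start) return nullptr;

    path_node* cur = start;
    std::size_t pos = 0;

    // Absolute path: restart from the document root and skip the leading delimiter.
    if (!path.empty() && path[0] == delimiter)
    {
        while (cur->parent) cur = cur->parent;
        pos = 1;
    }

    while (cur && pos < path.size())
    {
        std::size_t end = path.find(delimiter, pos);
        if (end == std::string::npos) end = path.size();
        std::string part = path.substr(pos, end - pos);
        pos = end + 1;

        if (part.empty() || part == ".") continue;          // self
        if (part == "..") { cur = cur->parent; continue; }  // parent (null above the root)

        // Ordinary name: the first child with a matching name is selected.
        path_node* next = nullptr;
        for (const auto& child : cur->children)
            if (child->name == part) { next = child.get(); break; }
        cur = next;
    }

    return cur;
}
```

Applied to the document <a><b><c/></b></a> from the example above, the sketch resolves "a/b" from the document to <b/>, "../a/./b/../." from <a/> to <a/>, and "/a" from any node to <a/>.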

	Note
	`path` function returns the result as STL string, and thus is not available if PUGIXML_NO_STL is defined.

pugixml does not record row/column information for nodes upon parsing for efficiency reasons. However, if the node has not changed in a significant way since parsing (the name/value are not changed, and the node itself is the original one, i.e. it was not deleted from the tree and re-added later), it is possible to get the offset from the beginning of XML buffer:

ptrdiff_t xml_node::offset_debug() const;

If the offset is not available (this happens if the node is null, was not originally parsed from a stream, or has changed in a significant way), the function returns -1. Otherwise it returns the offset to node's data from the beginning of XML buffer in pugi::char_t units. For more information on parsing offsets, see parsing error handling documentation.