Introduction
This is a feature-complete XML library containing a validating parser as well as a modern C++ API for the data structures. It also supports serializing custom data structures.
The core of this library is a validating XML parser with full DTD processing. Built on top of it are an API for manipulating XML data in a DOM-like fashion and a serialization API. As a bonus there is also an XPath implementation, although it is limited to XPath 1.0.
This XML library was extracted from libzeep since having a separate and simple XML library is more convenient. The API is unfortunately no longer compatible since the goal was to be more standards compliant. E.g., mxml::element should now be a complete Sequence Container.
The DOM API
MXML uses a modern C++ way of accessing and manipulating data. Look at the following code to get an idea of how this works.
XML nodes
The class mxml::node is the base class for all classes in the DOM API. The class is not copy constructible, and subclasses use move semantics to offer a simple API while still being memory and performance efficient. Nodes can have siblings and a parent but no children.
The class mxml::element is the main class; it implements a full XML node with child nodes and attributes. The children are stored as a linked list, and the same goes for the attributes.
The class mxml::text contains the text between XML elements. The class mxml::cdata is derived from mxml::text. Other possible child nodes for an XML element are mxml::processing_instruction and mxml::comment.
XML elements also contain attributes, stored in the mxml::attribute class. Namespace information is stored in these attributes as well. Attributes support structured binding, so the following works:
mxml::attribute a("x", "1");
auto& [name, value] = a; // name == "x", value == "1"
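Structured binding on a class type relies on the tuple-like protocol of the language: specializations of std::tuple_size and std::tuple_element plus a get<N>() accessor. The following is a minimal sketch of that protocol for a hypothetical demo::attr class. It only illustrates the language mechanism; demo::attr is not part of MXML and this is not necessarily how mxml::attribute is implemented.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

namespace demo
{
// Hypothetical stand-in for an attribute class; not part of mxml.
class attr
{
  public:
    attr(std::string name, std::string value)
        : m_name(std::move(name))
        , m_value(std::move(value))
    {
    }

    // A member template get<N>() is what structured binding looks for
    // in tuple-like class types.
    template <std::size_t N>
    const std::string &get() const
    {
        static_assert(N < 2, "attr has exactly two parts");
        if constexpr (N == 0)
            return m_name;
        else
            return m_value;
    }

  private:
    std::string m_name;
    std::string m_value;
};
} // namespace demo

namespace std
{
// Opt in to the tuple-like protocol: two parts, both read-only strings.
template <>
struct tuple_size<demo::attr> : integral_constant<size_t, 2>
{
};

template <size_t N>
struct tuple_element<N, demo::attr>
{
    using type = const string;
};
} // namespace std
```

With these specializations in place, `demo::attr a("x", "1"); auto &[name, value] = a;` compiles, and name and value refer to the strings stored inside a.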
Input and output
The class mxml::document looks very much like an mxml::element but can only contain one child and no attributes. This class can load from and write to files.
streaming I/O
You can use std::iostream to read and write mxml::document objects. Reading is as simple as:
mxml::document doc;
std::cin >> doc;
Writing is just as simple. A warning though: round-trip fidelity is not guaranteed, for a few reasons. First of all, the default is to replace CDATA sections in a file with their content. If this is not the desired behaviour you can call mxml::document::set_preserve_cdata() with argument true.
Another issue is that text nodes containing only whitespace are present in documents read from disk, while they are absent by default in documents created on the fly. When writing XML using iostream you can ask for the document to be wrapped and indented, but if the document was read in, the result will contain extraneous whitespace.
Indentation is specified like this:
std::cout << std::setw(2) << doc;
That will indent with two spaces for each level.
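A stream inserter can pick up the value set with std::setw by reading the stream's field width. The sketch below shows this technique for a hypothetical miniature element type, demo::elem. The type and the helper functions are assumptions made for illustration only; they are not MXML's own code.

```cpp
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace demo
{
// Hypothetical miniature element: a name and child elements only.
struct elem
{
    std::string name;
    std::vector<elem> children;
};

// Write e and its children, indenting each nesting level by `indent` spaces.
inline void write(std::ostream &os, const elem &e, std::size_t indent, std::size_t level)
{
    const std::string pad(indent * level, ' ');
    if (e.children.empty())
        os << pad << '<' << e.name << "/>\n";
    else
    {
        os << pad << '<' << e.name << ">\n";
        for (const auto &child : e.children)
            write(os, child, indent, level + 1);
        os << pad << "</" << e.name << ">\n";
    }
}

// The field width set with std::setw is read as the indentation step and
// reset to zero, so that it does not pad the first string written.
inline std::ostream &operator<<(std::ostream &os, const elem &e)
{
    const auto indent = static_cast<std::size_t>(os.width(0));
    write(os, e, indent, 0);
    return os;
}

// Convenience helper for the usage example: render to a string.
inline std::string to_string(const elem &e, int indent)
{
    std::ostringstream os;
    os << std::setw(indent) << e;
    return os.str();
}
} // namespace demo
```

For an element a with a single empty child b, `demo::to_string(root, 2)` yields the child line indented by two spaces, while an indent of 0 yields no leading spaces at all.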
validation
By default, reading a document this way does not validate the XML against the DTD. If you do want to validate and process the DTD, you have to specify where to find this DTD and other external entities. You can either use mxml::document::set_base_dir() or you can specify an entity_loader using mxml::document::set_entity_loader().
As an example, take the following DTD file:
<!ELEMENT foo (bar)>
<!ELEMENT bar (#PCDATA)>
<!ENTITY hello "Hello, world!">
And an XML document containing:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE foo SYSTEM "sample.dtd">
<foo>
<bar>&hello;</bar>
</foo>
When we want to see the &hello; entity replaced with ‘Hello, world!’ as specified in the DTD, we need to provide a way to load this DTD. To do this, look at the following code. Of course, in this example a simple call to mxml::document::set_base_dir() would have been sufficient.
Serialization
An alternative way to read and write XML files is to use serialization. To do this, we first construct a structure called Person. We add a templated serialize function to this struct, just like in Boost.Serialization, and then we can read the file.
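The idea borrowed from Boost.Serialization is that a single templated serialize() function describes the members once, and the archive type decides what to do with them. The sketch below demonstrates that idea with a toy archive. The names demo::nvp, demo::make_nvp and demo::writer, as well as the firstname member, are hypothetical and not part of MXML; the real archives operate on XML elements rather than on plain strings.

```cpp
#include <sstream>
#include <string>

namespace demo
{
// Name/value pair: binds an element name to a reference to a member.
template <typename T>
struct nvp
{
    const char *name;
    T &value;
};

template <typename T>
nvp<T> make_nvp(const char *name, T &value)
{
    return { name, value };
}

// Toy output archive: writes every pair as <name>value</name>.
// Special characters are not escaped, this is only a sketch.
class writer
{
  public:
    template <typename T>
    writer &operator&(const nvp<T> &p)
    {
        m_os << '<' << p.name << '>' << p.value << "</" << p.name << '>';
        return *this;
    }

    std::string str() const { return m_os.str(); }

  private:
    std::ostringstream m_os;
};

struct Person
{
    std::string firstname;
    std::string lastname;

    // One description of the members, usable by any archive type.
    template <typename Archive>
    void serialize(Archive &ar, unsigned long /*version*/)
    {
        ar & make_nvp("firstname", firstname)
           & make_nvp("lastname", lastname);
    }
};

// Convenience helper for the usage example: serialize a copy to text.
inline std::string to_text(Person p)
{
    writer w;
    p.serialize(w, 0);
    return w.str();
}
} // namespace demo
```

A reading archive would implement the same operator& but assign to p.value instead of printing it, which is why one serialize() function can serve both directions.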
attributes
Suppose you want to serialize a value into an XML attribute instead of an element. In that case, replace mxml::make_element_nvp with mxml::make_attribute_nvp.
custom types
What happens during serialization is the deconstruction of structured data types into parts that can be converted into text strings. For this final conversion there are mxml::value_serializer helper classes. mxml::value_serializer is a template, and specializations for the default types are given in <mxml/serializer.ixx>. You can create your own specializations of this class for custom data types. Look at the one for std::chrono::system_clock::time_point for inspiration.
enums
For conversion of enums you can use the mxml::value_serializer specialization for enums:
enum class MyEnum { FOO, BAR };
mxml::value_serializer<MyEnum>::init({
{ MyEnum::FOO, "foo" },
{ MyEnum::BAR, "bar" }
});
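Conceptually, such an init() call registers a two-way table between enum values and their textual names. The following sketch shows one way to build such a table. The class demo::enum_names is hypothetical and is not MXML's implementation; it only illustrates what needs to be stored to convert in both directions.

```cpp
#include <initializer_list>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

namespace demo
{
// Hypothetical two-way lookup table between enum values and their names.
template <typename E>
class enum_names
{
  public:
    // Replace the table for E with the given value/name pairs.
    static void init(std::initializer_list<std::pair<E, std::string>> values)
    {
        table().assign(values.begin(), values.end());
    }

    static std::string to_string(E value)
    {
        for (const auto &[v, name] : table())
            if (v == value)
                return name;
        throw std::invalid_argument("enum value has no registered name");
    }

    static E from_string(const std::string &name)
    {
        for (const auto &[v, n] : table())
            if (n == name)
                return v;
        throw std::invalid_argument("unknown enum name: " + name);
    }

  private:
    // A function-local static avoids static initialization order issues.
    static std::vector<std::pair<E, std::string>> &table()
    {
        static std::vector<std::pair<E, std::string>> s_table;
        return s_table;
    }
};

enum class MyEnum { FOO, BAR };
} // namespace demo
```

Because the table is filled at run time, init() has to be called before the first value of that enum type is converted, which is why such calls are typically placed at the start of main().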
XPath 1.0
MXML comes with an [XPath 1.0](http://www.w3.org/TR/xpath/) implementation. You can use this to locate elements in a DOM tree easily. For a complete description of XPath, read the specification at http://www.w3.org/TR/xpath/ or an introduction such as https://www.w3schools.com/xml/xpath_intro.asp.
The way it works in MXML is that you call find() on an mxml::element object. It returns an mxml::element_set object, which is actually a std::list of mxml::element pointers to the elements that match the XPath expression passed as parameter to find(). The alternative method find_first() returns only the first matching element.
An example where we look for the first person in our test file with the lastname Jones:
mxml::element* jones = doc.child()->find_first("//person[lastname='Jones']");
variables
XPath expressions can reference variables. As an example, suppose you need to find nodes in a specific XML namespace but you do not want to find out which prefix is bound to this namespace. In that case you could do something like this:
Note
Please note that the evaluation of an XPath returns pointers to XML nodes. Of course these are only valid as long as you do not modify the document in which they are contained.
Real world example
A real world example is given in the clavichord-example application. It uses a subset of configuration data to calculate the ideal layout of strings for a clavichord.
Start by looking at the following two files. First the DTD, clavichord-stringing.dtd:
<!ELEMENT data (naam,omschrijving,stemming,snaren)>
<!ELEMENT naam (#PCDATA)>
<!ELEMENT omschrijving (#PCDATA)>
<!ELEMENT stemming (noot, noot, noot, noot, noot, noot, noot, noot, noot, noot, noot, noot)>
<!ATTLIST stemming a NMTOKEN #REQUIRED>
<!ELEMENT noot EMPTY>
<!ATTLIST noot id CDATA #REQUIRED>
<!ATTLIST noot f NMTOKEN #REQUIRED>
<!-- de verschillende stemmingen -->
<!ENTITY middentoon SYSTEM "middentoon.xml">
<!ENTITY werckmeister SYSTEM "werckmeister.xml">
<!ELEMENT snaren (gebonden?)>
<!ATTLIST snaren hoek NMTOKEN #REQUIRED>
<!ATTLIST snaren ideale-stress NMTOKEN #REQUIRED>
<!--
<!ATTLIST snaren breedte NMTOKEN #REQUIRED>
<!ATTLIST snaren marge NMTOKEN #REQUIRED>
<!ATTLIST snaren afstand-paar-van NMTOKEN #REQUIRED>
<!ATTLIST snaren afstand-paar-tot NMTOKEN #REQUIRED>
<!ATTLIST snaren afstand-volgende NMTOKEN #REQUIRED>
-->
<!ELEMENT gebonden (bind*)>
<!ATTLIST gebonden vanaf NMTOKEN #REQUIRED>
<!ATTLIST gebonden schema (zweeds|duits) #REQUIRED>
<!ELEMENT bind (#PCDATA)>
And werckmeister.xml, the external entity referenced by the DTD:
<?xml version="1.0" encoding="utf-8"?>
<!-- werkmeister III -->
<noot id="c" f="0.5986459813"/>
<noot id="c#" f="0.6306723111"/>
<noot id="d" f="0.6689290015"/>
<noot id="eb" f="0.7095063493"/>
<noot id="e" f="0.7499999996"/>
<noot id="f" f="0.7981946422"/>
<noot id="f#" f="0.8408964153"/>
<noot id="g" f="0.8949320181"/>
<noot id="g#" f="0.9460084662"/>
<noot id="a" f="1"/>
<noot id="bb" f="1.0642595234"/>
<noot id="b" f="1.1249999989"/>
These files define what needs to go into a configuration file for our program. A sample configuration file may then look like this:
<?xml version="1.1" encoding="UTF-8"?>
<!DOCTYPE data SYSTEM "clavichord-stringing.dtd">
<data>
<naam>Zweed klavechord, versie 2</naam>
<omschrijving>Een clavichord naar zweedse voorbeelden, werkmeister stemming</omschrijving>
<stemming a="466">&werckmeister;</stemming>
<snaren hoek="7.5" ideale-stress="70">
<gebonden vanaf="c" schema="zweeds"></gebonden>
</snaren>
</data>
And the application code to read this data looks like this:
#include "mxml.hpp"

#include <array>
#include <fstream>
#include <iostream>
#include <optional>
#include <string>

enum class BindingType
{
    Swedish,
    German
};

enum class NoteName
{
    C,
    C_sharp,
    D,
    E_flat,
    E,
    F,
    F_sharp,
    G,
    G_sharp,
    A,
    B_flat,
    B
};

struct Note
{
    NoteName name;
    float pitch;

    template <typename Archive>
    void serialize(Archive &ar, unsigned long)
    {
        // clang-format off
        ar & mxml::make_attribute_nvp("id", name)
           & mxml::make_attribute_nvp("f", pitch);
        // clang-format on
    }
};

struct Tuning
{
    float A_frequency;
    std::array<Note, 12> notes;

    template <typename Archive>
    void serialize(Archive &ar, unsigned long)
    {
        // clang-format off
        ar & mxml::make_attribute_nvp("a", A_frequency)
           & mxml::make_element_nvp("noot", notes);
        // clang-format on
    }
};

struct Binding
{
    BindingType type;
    std::string start;

    template <typename Archive>
    void serialize(Archive &ar, unsigned long)
    {
        // clang-format off
        ar & mxml::make_attribute_nvp("schema", type)
           & mxml::make_attribute_nvp("vanaf", start);
        // clang-format on
    }
};

struct Stringing
{
    float angle;
    float stress;
    std::optional<Binding> binding;

    template <typename Archive>
    void serialize(Archive &ar, unsigned long)
    {
        // clang-format off
        ar & mxml::make_attribute_nvp("hoek", angle)
           & mxml::make_attribute_nvp("ideale-stress", stress)
           & mxml::make_element_nvp("gebonden", binding);
        // clang-format on
    }
};

struct ClavichordSettings
{
    std::string name;
    std::string description;
    Tuning tuning;
    Stringing strings;

    template <typename Archive>
    void serialize(Archive &ar, unsigned long)
    {
        // clang-format off
        ar & mxml::make_element_nvp("naam", name)
           & mxml::make_element_nvp("omschrijving", description)
           & mxml::make_element_nvp("stemming", tuning)
           & mxml::make_element_nvp("snaren", strings);
        // clang-format on
    }
};

int main()
{
    // The names must match the values the DTD allows for the schema
    // attribute: (zweeds|duits)
    mxml::value_serializer<BindingType>::init({
        // clang-format off
        { BindingType::German, "duits" },
        { BindingType::Swedish, "zweeds" }
        // clang-format on
    });

    // The names must match the id attributes of the noot elements
    mxml::value_serializer<NoteName>::init({
        // clang-format off
        { NoteName::C, "c" },
        { NoteName::C_sharp, "c#" },
        { NoteName::D, "d" },
        { NoteName::E_flat, "eb" },
        { NoteName::E, "e" },
        { NoteName::F, "f" },
        { NoteName::F_sharp, "f#" },
        { NoteName::G, "g" },
        { NoteName::G_sharp, "g#" },
        { NoteName::A, "a" },
        { NoteName::B_flat, "bb" },
        { NoteName::B, "b" }
        // clang-format on
    });

    ClavichordSettings cs;

    try
    {
        mxml::document doc;
        doc.set_validating(true);

        std::ifstream f("clavichord-v2.xml");
        f >> doc;

        from_xml(doc, "data", cs);

        // And now do something useful with the data in cs
    }
    catch (const std::exception &ex)
    {
        std::cerr << ex.what() << '\n';
    }

    return 0;
}