Schematron - A language for validating XML

Schematron is a validation language that checks XML documents against business rules. It extends the validation provided by languages such as Document Type Definitions (DTD), W3C XML Schema, and RELAX NG, giving you the ability to check your XML documents for compliance with rules that can be difficult, if not impossible, to check with the other validation languages.
This book explains Schematron for both XML aficionados and less experienced readers. For those who are new to XML or have limited programming experience, it contains introductions to two important topics: XPath and XML namespaces.
It was published by XML Press in 2022.

XProc - 3.0 Programmer Reference

XProc 3.0 is a programming language for processing XML, JSON, and other documents in pipelines. XProc chains conversions and other steps, allowing for potentially complex processing. XProc is especially useful for applications, such as publishing, where content may come from multiple input sources, pass through multiple processing steps and result in multiple output streams. It is published by XML Press.

eXist - A NoSQL Document Database and Application Platform

Together with Adam Retter I wrote a book about the XML database and application platform eXist. It is published by O'Reilly and for sale in their webshop.

Articles on

The website published the following article I've written:

Whitepaper: Content Packaging with Cocoon

Content Packaging is the process in which content that belongs together is bundled into a zip file. A file describing the content (called a manifest) is added so the receiving side can understand what to do with it. For transporting educational content (e.g. e-learning courses) this is common practice, but it is also used elsewhere (e.g. Java .jar and .war files). This presentation derives from practical experience building a Content Packaging application, for educational content, using the open source Cocoon framework.
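As a rough illustration of the idea described above (content bundled into a zip file together with a manifest), here is a minimal sketch in Python using only the standard library. It is not the Cocoon-based application from the whitepaper. The manifest file name `manifest.xml`, its `<manifest>`/`<resource>` layout, and the function names `build_package` and `list_resources` are assumptions made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_package(files: dict[str, bytes]) -> bytes:
    """Bundle files into a zip archive together with a manifest listing them."""
    # Hypothetical manifest layout: a <manifest> root, one <resource> per file.
    manifest = ET.Element("manifest")
    for name in sorted(files):
        ET.SubElement(manifest, "resource", {"href": name})
    manifest_xml = ET.tostring(manifest, encoding="utf-8")

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        # The manifest travels inside the package, next to the content.
        package.writestr("manifest.xml", manifest_xml)
        for name, data in files.items():
            package.writestr(name, data)
    return buffer.getvalue()


def list_resources(package_bytes: bytes) -> list[str]:
    """Receiving side: read the manifest to learn what the package contains."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        manifest = ET.fromstring(package.read("manifest.xml"))
    return [resource.get("href") for resource in manifest.findall("resource")]
```

The receiving side never has to guess at the archive's contents: it opens the manifest first and works from the list of resources it finds there.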

Creating such a Content Package starts with thousands of XML source files and involves quite a lot of transforming, splitting and merging before the end zip file can be assembled. Cocoon proved itself to be a very good platform for operations like this.

Whitepaper: Medium neutral content production

It isn’t easy, producing educational content with XML technologies. At the beginning of the lifecycle are authors with hardly any IT affinity. In the middle we need to split their output into smaller sections and reassemble it into meaningful combinations. At the end we need high quality output for print and other media. Designing the technology for this is hard, putting it into practice even harder.

So, why is this so difficult? There are XML editors, content management systems and PDF generators by the dozen. Just put some of it together and off we go. Unfortunately, reality proves otherwise. Most XML technology on the market is geared towards relatively simple applications like websites or simple print publications. It all seems to assume that its users are IT proficient, know what an XML tag is and how to handle it.

Another problem is rooted in the holy grail of XML publishing: medium neutrality. High quality print output requires an enormous amount of detail in the XML sources. So what to do with designers who need fine-grained control over the placement of illustrations on the page? You have to add all this information somewhere, and before you know it, your content is no longer medium neutral but very print oriented. Yes, it is possible to create educational books with XML technology. No, it is definitely not yet the smooth production facility we want it to be.

This whitepaper was written as a result of our, sometimes unpleasant, experiences with XML publishing of educational material. We analyzed what happened and came up with some thoughts and ideas that we would like to share with you. Some experience with XML content production is necessary background for this whitepaper. In-depth knowledge of XML is not required.

This paper was written for a presentation at the XML Europe 2004 conference.

Whitepaper: Schemas for an XML standard

When companies need to communicate a lot to get work done, they will sooner or later start thinking about automating this. In most cases this is a complicated and time-consuming process. You need to agree about the functionality, format and content of the messages, the technical details for actually sending and receiving them, the legal side of it, and much, much more. Setting up a business-to-business (B2B) communication standard isn’t easy.

However, the reward can be enormous: Increased volumes of business transactions, decreased number of errors, etc. Sometimes the automation even opens up whole new markets.

Not so long ago, this kind of B2B communication was done using the EDI standards. For several reasons, with a few notable exceptions, EDI never really made it as an important standard. Nowadays, these kinds of processes are automated using XML technology. However, whatever technology is used to implement things, the groundwork will stay the same: Somebody must analyze the communication processes, create a common data model, design messages and message flows, etc. After this is done and agreed upon, the technical people come in and design the actual messages, the infrastructure and all the other necessary technical bits and pieces to make it work.

This is a case study about a small but significant step in an XML standard creation process: The analysts were ready, we knew what we wanted to do and there was a huge formal data and communication model. Now, from this, we needed to design the actual messages and create XML schemas. Maintenance was an important issue, because this was only version one and things would evolve.

Since the standard was quite large, handcrafting all message schemas was definitely not a good idea: It would have been next to impossible to get all the details, like field types and sizes, completely right. But inconsistencies between standard and actual messages were not acceptable…

This whitepaper explains how we automated the conversion from data/communication model into the actual XML message schemas. It involves an interesting mix of technologies, resulting in not only the schemas themselves but also a full set of documentation, all generated and therefore easily maintained.
