XML Data Validation

by Dinesh 2012-07-22 23:02:43

As I’ve shown you so far in this chapter, there are very strict rules for the basic structure and syntax of well-formed XML documents. There are also several formats within the boundaries of well-formed XML syntax that provide standardized ways of representing specific types of data.

For example, NewsML offers a standard format for packaging news information in XML. NewsML defines the names of the elements that contain the title, publication date, headline, article text, and other parts of a news item. NewsML also defines how these elements should be arranged, and which elements are optional. NewsML documents are well-formed XML, and they also conform to the NewsML specification.

The validity of an XML document is determined by a Document Type Definition (DTD) or an XML Schema. There are several formats for data validation to choose from, and a good listing of XML validation formats can be found at http://www.oasis-open.org/cover/schemas.html. However, the most common formats, and the ones officially sanctioned by the W3C, are the Document Type Definition (DTD) and the W3C Schema, which I will focus on in this chapter.

To validate an XML document, a parser compares it to the rules that are specified in a DTD or schema. A well-formed XML document that meets all of the requirements of one or more specifications is called a valid XML document.
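The difference between well-formed and valid can be sketched in a few lines of Python. This is only a toy illustration, not a validating parser: the standard library's ElementTree checks syntax but does not validate against a DTD or schema, so a small hand-written rule table stands in for one. The `RULES` table, the `classify` function, and the element names (`newsitem`, `title`, `date`, `body`) are all invented for this example and are not taken from NewsML.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set standing in for a DTD or schema: each listed element
# must contain exactly these child elements, in this order.
# The names are invented for illustration; they are not NewsML names.
RULES = {
    "newsitem": ["title", "date", "body"],
}


def classify(xml_text, rules=RULES):
    """Return 'not well-formed', 'well-formed but not valid', or 'valid'."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # A syntax error: the document fails before any rules are consulted.
        return "not well-formed"

    # Walk every element and compare its children with its rule, if it has one.
    # Simplification: elements without a rule are accepted, whereas a real DTD
    # would reject undeclared elements.
    for element in root.iter():
        expected = rules.get(element.tag)
        if expected is None:
            continue
        actual = [child.tag for child in element]
        if actual != expected:
            return "well-formed but not valid"
    return "valid"


good = "<newsitem><title>t</title><date>d</date><body>b</body></newsitem>"
missing_parts = "<newsitem><title>t</title></newsitem>"
broken_syntax = "<newsitem><title>t</newsitem>"

for sample in (good, missing_parts, broken_syntax):
    print(classify(sample))
```

The second sample parses cleanly but lacks two required children, so it is well-formed without being valid. The third never reaches the rule check because its tags do not match.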

For example, the NewsML specification is defined and managed by the International Press Telecommunications Council (IPTC). The IPTC has published a DTD that news providers can use to validate NewsML news items (Reuters and other news providers have NewsML-compatible news feeds). If a member of the press wants to produce NewsML-formatted news items, they can download the DTD from the IPTC website at http://www.iptc.org. Once the DTD is downloaded, XML developers can validate their NewsML output against the DTD using a validating parser.

A Very Simple XML Document with a Schema and DTD Reference:

[Listing: markup not preserved. Surviving element text: "This is level 1 of the nested" and "This is level 2 of the nested".]

It is not common to see both DTD and Schema references in a single document that verify the same structural rules, but the listing shows that you can combine Schema and DTD references in a single document. References to both a DTD and a schema can occur when an XML document is made up of two or more source documents. The DTD and schema references maintain all of the structure rules that were present in the original documents. Dual references can also be used when illegal XML characters are represented in an XML document by entity references. I’ll describe entity references in more detail later in this chapter.
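As a rough sketch of what a document carrying both references might look like, the Python snippet below embeds a small XML document with a DOCTYPE declaration and an `xsi:noNamespaceSchemaLocation` attribute, then reads both references back with the standard library's minidom. The file names `example.dtd` and `example.xsd` and the element names `rootelement`, `level1`, and `level2` are assumptions made for this example. Only the two text fragments come from the listing above. Note that minidom only reports the references; it neither fetches the files nor validates against them.

```python
from xml.dom import minidom

# Hypothetical document: the DTD is referenced by the DOCTYPE declaration,
# the schema by the xsi:noNamespaceSchemaLocation attribute on the root.
# File and element names are invented for illustration.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE rootelement SYSTEM "example.dtd">
<rootelement xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="example.xsd">
  <level1>This is level 1 of the nested
    <level2>This is level 2 of the nested</level2>
  </level1>
</rootelement>
"""

doc = minidom.parseString(DOCUMENT)

# The DTD reference is the system identifier of the DOCTYPE declaration.
dtd_ref = doc.doctype.systemId

# The schema reference is an ordinary attribute on the root element.
schema_ref = doc.documentElement.getAttribute("xsi:noNamespaceSchemaLocation")

print("DTD reference:   ", dtd_ref)
print("Schema reference:", schema_ref)
```

A validating parser would go one step further and load both files, checking the document against each set of rules.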
