| Document Type Definitions (DTDs) |
A DTD uses a formal grammar to specify the
structure and permissible values of XML documents. A well-formed
XML document merely conforms to the basic syntactic rules of XML.
With a DTD, we can create valid XML: XML that conforms to
the syntactic rules of XML as well as to the vocabulary we create.
Using a DTD has several benefits.
A DTD written in a formal and precise manner identifies
the vocabulary and contains all of its rules.
Parsers can also use the DTD to validate
an instance of the document. A simple declaration in the document
instance allows the parser to retrieve the DTD and compare
the document instance to the rules in the DTD.
The DTD must have a formal structure. Why
is a formal structure needed? Because it provides
a clear, precise set of syntactic rules that
capture everything permitted in the vocabulary. Without a DTD,
the rules of the vocabulary are encoded only in the source code.
The code enforces a certain structure, and when the structure
changes the code must also change. A formal DTD helps the designer
convey the intended information and helps the user understand
what the designer wants him to know.
The XML document is a snapshot of the data
structures in a program. Programs communicate with
one another by exchanging such documents. The DTD, on the other
hand, captures the information in the vocabulary by definition.
Everything learned in the design of the vocabulary must be in the DTD.
The DTD hence solves two problems at the
same time: we get something parsers can use, and we get documentation.
A well-formed document written to implicit
rules cannot be checked against those rules. We rely on the integrity
of the applications that create and consume the XML for the
integrity of the overall system. Errors in the code cannot
be caught. They could either cause the program to break or
let bad data pass through unnoticed. This is the reason that the
W3C specifies the behavior of a validating parser. If an XML
document refers to a DTD, a validating parser is required to
retrieve the DTD and ensure that the document conforms to the
grammar that the DTD describes.
To check for errors, simply use a DTD and a validating
parser. The parser will check for errors in the document syntax,
the vocabulary, and any specified values.
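As a minimal sketch of these three kinds of checks, consider the small document below. Only the name Catalog comes from this tutorial; the Book and Title elements and the format attribute are hypothetical names chosen for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE Catalog [
  <!-- Vocabulary: a Catalog holds one or more Book elements -->
  <!ELEMENT Catalog (Book+)>
  <!-- Each Book must contain exactly one Title -->
  <!ELEMENT Book (Title)>
  <!ELEMENT Title (#PCDATA)>
  <!-- Specified values: format must be one of the listed tokens -->
  <!ATTLIST Book format (hardcover | paperback) #REQUIRED>
]>
<Catalog>
  <Book format="paperback">
    <Title>XML Basics</Title>
  </Book>
</Catalog>
```

A validating parser should accept this instance. It should report a validity error if a Book omitted its Title (a vocabulary error) or carried format="audio" (a value error), and a well-formedness error if a closing tag were missing (a syntax error).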
After the parser has validated the document,
the document can be passed on to the application logic.
Validation does not protect the document from faulty application
logic, but it does filter out bad data. This is particularly
important in the case of Internet applications.
One cannot assume that the quality control
applied to every application, and to the code written for it,
is the same. A programming team working for one organization might
be implementing a public XML vocabulary for a particular business.
Its interpretation of the vocabulary may not be the same as
another team's. The same applies to testing. With a
DTD and a validating parser, however, we have an immediate and
effective check of the document's integrity. This check is only
as good as the DTD. With this in mind we shall now delve into the
principles needed to write effective DTDs.
| General Principles in writing DTDs |
XML documents, to simplify, consist of elements
and their attributes. There are some other items that we ought
to define, but documents rest mainly on these two concepts.
In addition, an element's content is defined in terms of other
elements or some basic concepts defined in the XML standard.
A DTD, therefore, must define all the elements in a document
and the relationships between them.
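The fragment below sketches what such definitions can look like. The element and attribute names are hypothetical and are not taken from any published Catalog vocabulary.

```xml
<!-- An element defined in terms of other elements:
     one Title, one or more Authors, and an optional Price, in that order -->
<!ELEMENT Book (Title, Author+, Price?)>

<!-- Elements whose content is a basic XML concept: parsed character data -->
<!ELEMENT Title  (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Price  (#PCDATA)>

<!-- Attributes of Book: one required, one with a default value -->
<!ATTLIST Book
  isbn     CDATA #REQUIRED
  language CDATA "en">
```

The content model on the first line is where the relationship between elements is expressed: the comma fixes the order, + means one or more, and ? means optional.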
DTDs are associated with documents. When
a validating parser reads the declaration that associates
a document with a DTD, it retrieves the
DTD and validates the document according to the rules provided
therein. We will now see how to tie DTDs to document instances.
XML provides the DOCTYPE declaration to connect
the DTD declarations to a document instance.
The DOCTYPE declaration must follow the
XML declaration and precede any elements in the document.
However, comments and processing instructions may appear between
the XML declaration and the DOCTYPE declaration.
The DOCTYPE declaration must contain the
keyword DOCTYPE followed by the name of the root element of
the document, followed by a construction that brings in the content declaration.
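A minimal sketch of this ordering follows. The processing instruction target app-hint is a hypothetical name, and the single declaration inside the DOCTYPE is there only to keep the example complete.

```xml
<?xml version="1.0"?>
<!-- A comment may sit between the XML declaration and the DOCTYPE declaration -->
<?app-hint cache="no"?>
<!DOCTYPE Catalog [
  <!ELEMENT Catalog EMPTY>
]>
<Catalog/>
```

The name after the DOCTYPE keyword, Catalog, matches the root element that follows the declaration.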
| Internal & External subsets |
<?xml version="1.0"?>
<!DOCTYPE Catalog ...>
<Catalog>
Before we look more closely at the DOCTYPE
declaration, consider the example above, which shows its position
in a document instance. These are the first three lines of
an XML document. The first line, the XML declaration, states
that this document conforms to the syntax of XML 1.0. The
second line declares that this document falls under the Catalog
vocabulary. This is done by specifying the name Catalog after
the DOCTYPE keyword. Consequently, the first element, also called
the root of the document, must be Catalog, or a validating parser
will return an error.
The ellipsis concealing the DOCTYPE declaration
is not very satisfying. Where are the declarations? There
are two ways to provide declarations. We can place an external
subset of declarations in a separate DTD file, include an
internal subset within the body of the DOCTYPE declaration, or do both.
When the external and internal subsets are
mixed, the internal subset may add declarations or
override declarations found in the external subset. Parsers
read the internal subset first, and the XML specification
gives the declarations found there priority.
There is one further variation to be considered
before we further discuss how to provide declarations. The
XML declaration can have a standalone attribute. The standalone
attribute is, however, seldom seen in practice. The following
lines show the declaration of the standalone attribute.
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE Catalog ...>
This attribute can have two values: yes and no. The values are
case-sensitive and must be written in lowercase.
If the value of the attribute is yes, then
there are no declarations external to the document instance
that would affect the information in the document passed to
the application using it. The presence of the attribute with
the value yes does not guarantee that the document has no
external dependencies of any type. It merely states that
the document has no external dependencies that, if left out
of the processing, would make the document erroneous as far
as the receiving application is concerned.
A value of no indicates that there are external
declarations that contain values that are necessary to properly
define the document content. The main use of the standalone
attribute is as a flag for parsers and other applications
to indicate whether they need to retrieve external content.
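The sketch below shows a document that can reasonably declare standalone="yes". The Book element and its format attribute are hypothetical names used only for illustration.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE Catalog [
  <!ELEMENT Catalog (Book*)>
  <!ELEMENT Book (#PCDATA)>
  <!-- The default value is declared inside the document itself,
       so no external declaration changes what the application sees -->
  <!ATTLIST Book format CDATA "paperback">
]>
<Catalog>
  <Book>XML Basics</Book>
</Catalog>
```

If the same ATTLIST declaration lived in an external DTD instead, the default value of format would come from outside the document, and the declaration would need to say standalone="no".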
The DOCTYPE declaration formally consists
of the keyword followed by the name of the document's root
element, in our example the name Catalog. This
is followed by an optional external identifier, which is in turn
followed by an optional block of markup declarations.
The external identifier locates the external
DTD (external subset).
The markup declaration block contains the
markup declarations (internal subset). We shall now discuss these in detail.
| Internal DTD Subset Declarations |
Declarations such as entity declarations can
be placed in the internal subset. This markup declaration
block is delimited within the DOCTYPE declaration using square
brackets ([ ]). A list of declarations is placed
within these brackets. An example is shown below.
<!DOCTYPE Catalog
[
internal subset declaration here
]>
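To make the placeholder concrete, here is a minimal sketch of an internal subset that declares two elements and one entity. The Publisher element, the entity name pub, and its replacement text are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE Catalog [
  <!ELEMENT Catalog (Publisher)>
  <!ELEMENT Publisher (#PCDATA)>
  <!-- An internal general entity; the reference &pub; is replaced by its text -->
  <!ENTITY pub "Universal Library Press">
]>
<Catalog>
  <Publisher>&pub;</Publisher>
</Catalog>
```

Because every declaration travels inside the document, a parser needs nothing else to expand the entity reference or validate the content.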
Internal DTDs are very useful. An internal DTD, however, adds
substantial size to the document. The declarations must
be transmitted with the document even if the consumer of the
document does not intend to validate it. Internal
DTDs are therefore best suited to simple vocabularies, particularly
when prototyping markup.
Sometimes, programmers might feel the need
to use both an internal and an external DTD. In such
cases, the internal DTD adds declarations. When
an internal DTD declares an entity or attribute that is also
declared in the external DTD, the internal declaration supersedes
the external one (an element type, by contrast, may be declared
only once). This permits some fine-tuning of declarations for a
particular document's needs, but care must be taken: the more we
override the external DTD, the more it loses relevance, which
is a sign of poor initial design.
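The following sketch combines both subsets. It assumes a hypothetical external file Catalog.dtd that declares the Catalog element and an entity named edition. The SYSTEM keyword that names the external file is covered in the paragraphs that follow.

```xml
<?xml version="1.0" standalone="no"?>
<!DOCTYPE Catalog SYSTEM "Catalog.dtd" [
  <!-- Override: Catalog.dtd is assumed to declare edition as well.
       The internal subset is read first, so this declaration is the one used. -->
  <!ENTITY edition "Second Edition">
  <!-- Addition: an entity the external DTD is assumed not to declare -->
  <!ENTITY reviewer "J. Smith">
]>
<Catalog>
  <!-- content elided -->
</Catalog>
```

Keeping such overrides to a handful of entities leaves the external DTD as the authoritative description of the vocabulary.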
An external DTD is more flexible in certain
respects. In this case, the DOCTYPE declaration consists of
the usual keyword and the root element name, followed by another
keyword denoting the source of the external DTD, which is
then followed by the location of that DTD.
The keyword can be either SYSTEM or PUBLIC.
If the keyword is SYSTEM, a URL directly
and explicitly locates the DTD, so the parser should be
able to find the DTD given the URL alone. What follows
SYSTEM, therefore, is a quoted URL naming the DTD file. The URLs
used to locate DTDs should not contain fragment identifiers, that
is, the character # followed by a name, as XML 1.0 indicates that
parsers may signal an error if the URL contains such an identifier.
<!DOCTYPE Catalog SYSTEM "http://myserver/Catalog.dtd">
<!DOCTYPE Catalog SYSTEM "http://www.universallibrary.org/Catalog.dtd">
The lines above show two examples of a DOCTYPE declaration
using the SYSTEM keyword. All the declarations needed to validate
the document containing the first DOCTYPE declaration will
be found in the file Catalog.dtd. In the second case, the
DTD file is found on a Web server that is operated by a hypothetical
universal library organization. In both cases, an element
declaration for the Catalog element is to be found within
the Catalog.dtd file.
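For illustration only, the contents of such a Catalog.dtd file might look like the sketch below. The Book, Title, and Author elements are hypothetical.

```xml
<!-- Catalog.dtd: a hypothetical external subset.
     The file holds declarations only; it has no DOCTYPE wrapper
     and no square brackets. -->
<!ELEMENT Catalog (Book*)>
<!ELEMENT Book (Title, Author)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
```

Any number of document instances can then share these declarations by pointing a SYSTEM identifier at the same file.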
The PUBLIC keyword is used for well-known
vocabularies. Going back to our Catalog example, let us suppose
considerable consensus has been built around the catalog DTD
in the publishing industry. In that case, an application parsing
a document from this vocabulary might employ its own strategy
for locating the DTD. If possible, the application might keep
a local copy. Using it would be preferable to making
a round trip to a Web server.
Using the PUBLIC keyword with a Uniform
Resource Identifier (URI) gives applications the opportunity
to locate the DTD using their own algorithms. The URI could
be a URL or simply a unique name.
If the URI universal/Book is well known
to the application processing documents of this type, the
application can go and find the DTD on its own. It might even
have a local copy of the DTD, or it might access a DTD maintained
on a local database server. Thus, the means of finding the DTD
is left primarily to the application
processing the DOCTYPE declaration.
The term "well known" is
relative. For this reason, XML 1.0 requires a PUBLIC declaration to
carry a system identifier after the public URI. If the application
or parser consuming the document cannot locate a DTD from
the URI provided with the PUBLIC keyword, it must use the
system identifier. The author of such a declaration gives
the receiving application a chance to find
the DTD based on the public URI. If that fails, which can
be expected of a general-purpose parser with no knowledge of
our publishing domain, the application is expected to
request the DTD from the Web server at www.universallibrary.org.
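A declaration of this kind can be sketched as follows, using the public URI universal/Book mentioned above. The exact path on www.universallibrary.org is an assumption carried over from the earlier SYSTEM example.

```xml
<?xml version="1.0"?>
<!-- The public identifier comes first; the system identifier is the fallback -->
<!DOCTYPE Catalog PUBLIC "universal/Book"
                         "http://www.universallibrary.org/Catalog.dtd">
<Catalog>
  <!-- content elided -->
</Catalog>
```

An application that recognizes universal/Book can use its own copy of the DTD. One that does not falls back to the URL in the second quoted string.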
Copyrights : Layout Galaxy All Rights Reserved
No part of this tutorial may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, electrostatic, magnetic tape, mechanical or otherwise, without prior permission in writing from Layout Galaxy.