Copyrights : Layout Galaxy All Rights Reserved
No part of this tutorial may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, electrostatic, magnetic tape, mechanical or otherwise, without prior permission in writing from Layout Galaxy.




  Document Type Definitions (DTDs)

A DTD uses a formal grammar to specify the structure and permissible values of XML documents. Well-formed XML merely conforms to the basic syntactic rules of XML. With a DTD we can create valid XML: XML that conforms both to the syntactic rules of XML and to the vocabulary we define.

Using a DTD brings several benefits. A DTD written in a formal and precise manner identifies the vocabulary, and the rules of that vocabulary are contained in the DTD itself.

Parsers can also use the DTD to validate a document instance. A simple declaration in the document instance allows the parser to retrieve the DTD and compare the instance against the rules in the DTD.

The DTD must have a formal structure. Why is a formal structure needed? Because it gives us a clear, precise set of syntactic rules that captures everything permitted in the vocabulary. Without a DTD, the rules of the vocabulary are encoded only in the source code of the applications: the code enforces a certain structure, and when the structure changes the code must change as well. A formal DTD lets the designer convey exactly the information intended, and lets the users of the vocabulary understand what the designer wants them to know.

An XML document is a snapshot of the data structures in a program, and programs communicate with one another by exchanging such documents. The DTD, on the other hand, captures the information in the vocabulary by definition. Everything learnt during the design of the vocabulary must be in the DTD.

The DTD hence solves two problems at the same time: we get something we can use, and we get documentation.

  Document validation

A well-formed document written to implicit rules cannot be checked for errors. We rely on the integrity of the applications that create and consume the XML for the integrity of the overall system. Errors in their code cannot be caught; they could cause the program to break or to produce bad data. This is the reason that the W3C specifies the behavior of a validating parser. If an XML document refers to a DTD, a validating parser is required to retrieve the DTD and ensure that the document conforms to the grammar that the DTD describes.

To check for errors, simply use DTDs and a validating parser. The parser will check the document's syntax, its vocabulary, and any specified values.
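
The gap between well-formed and valid can be seen with a parser that does not validate. The following is a minimal sketch using Python's standard xml.parsers.expat module, which checks well-formedness only. The Catalog, Book and Magazine names and the helper is_well_formed are made up for illustration. The first document breaks its own DTD, yet it is accepted; only a validating parser would report that error.

```python
import xml.parsers.expat as expat

# Hypothetical document: the DTD allows only Book children,
# but the instance contains a Magazine element.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE Catalog [
  <!ELEMENT Catalog (Book+)>
  <!ELEMENT Book (#PCDATA)>
]>
<Catalog><Magazine>Not allowed by the DTD</Magazine></Catalog>"""

# Not even well-formed: the Book element is never closed.
MALFORMED = "<Catalog><Book>Unclosed</Catalog>"


def is_well_formed(text):
    """Return True if expat (a non-validating parser) accepts the text."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(text, True)
        return True
    except expat.ExpatError:
        return False


print(is_well_formed(INVALID_BUT_WELL_FORMED))  # accepted although it is not valid
print(is_well_formed(MALFORMED))                # rejected: mismatched tags
```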

After the parser has validated the document, the document can be passed on to the application logic. Validation does not protect the document from faulty application logic, but it does filter out bad data before that logic sees it. This is particularly important in the case of Internet applications.

One cannot assume that every application producing or consuming a vocabulary is written with the same quality control. A programming team working for one organization might be implementing a public XML vocabulary for a particular business, and its interpretation of the vocabulary may not be the same as another team's. The same applies to testing. With a DTD and a validating parser, however, we have an immediate and effective check of the document's integrity. This check depends on the DTD. With this in mind, we shall now delve into the principles needed to write effective DTDs.

  General Principles in writing DTDs

Put simply, XML documents consist of elements and their attributes. There are some other items that we ought to define, but documents rest on these two main concepts. In addition, an element's content is defined in terms of other elements or of some basic concepts defined in the XML standard. A DTD, therefore, must define all the elements in a document and the relationships between those elements.
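
As a rough illustration of a DTD defining elements, their relationships and their attributes, the sketch below embeds a small DTD in a document and asks Python's standard xml.parsers.expat module to report each declaration it reads. The vocabulary (Catalog, Book, Title, isbn) and the helper list_declarations are assumptions made for this example, not part of the tutorial's own DTD.

```python
import xml.parsers.expat as expat

# Hypothetical vocabulary: a Catalog holds Books; each Book has a Title
# and a required isbn attribute.
DOC = """<?xml version="1.0"?>
<!DOCTYPE Catalog [
  <!ELEMENT Catalog (Book+)>
  <!ELEMENT Book (Title)>
  <!ELEMENT Title (#PCDATA)>
  <!ATTLIST Book isbn CDATA #REQUIRED>
]>
<Catalog><Book isbn="0-000-00000-0"><Title>Example</Title></Book></Catalog>"""


def list_declarations(text):
    """Collect the element and attribute declarations the parser reads."""
    elements, attributes = [], []
    parser = expat.ParserCreate()
    # Called once per <!ELEMENT ...> with the element name and content model.
    parser.ElementDeclHandler = lambda name, model: elements.append(name)

    # Called once per attribute declared in an <!ATTLIST ...>.
    def on_attlist(elname, attname, att_type, default, required):
        attributes.append((elname, attname, att_type, bool(required)))

    parser.AttlistDeclHandler = on_attlist
    parser.Parse(text, True)
    return elements, attributes


elements, attributes = list_declarations(DOC)
print(elements)
print(attributes)
```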

DTDs are associated with documents. When a validating parser reads the declaration that associates a document with a DTD, it retrieves the DTD and validates the document according to the rules provided therein. We will now see how to tie DTDs to document instances.

XML provides the DOCTYPE declaration to connect the DTD declarations to a document instance.

The DOCTYPE declaration must follow the XML declaration and precede any elements in the document. However, comments and processing instructions may appear between the XML declaration and the DOCTYPE declaration.

The DOCTYPE declaration must contain the keyword DOCTYPE followed by the name of the root element of the document, followed by a construction that brings in the content declaration.

  Internal & External subsets

<?xml version="1.0"?>
<!DOCTYPE Catalog…>
<Catalog>…

Before we look more closely at the DOCTYPE declaration, consider its position in a document instance. Shown above are the first three lines of an XML document. The XML declaration on the first line states that this document conforms to the syntax of XML 1.0. The second line declares that the document falls under the Catalog vocabulary; this is done by giving the name Catalog after the DOCTYPE keyword. The first element, also called the root of the document, must therefore be Catalog (XML names are case-sensitive), or a validating parser will report an error.
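
The placement rule can be checked with a short sketch. It uses Python's standard xml.parsers.expat module, which is non-validating: it reports the DOCTYPE name and the root element so that an application can compare them, but it does not enforce the match itself. The helper names and the Library element are hypothetical.

```python
import xml.parsers.expat as expat

GOOD = """<?xml version="1.0"?>
<!-- comments may sit between the XML declaration and the DOCTYPE -->
<!DOCTYPE Catalog [ <!ELEMENT Catalog (#PCDATA)> ]>
<Catalog>...</Catalog>"""

# Root element name differs from the name given in the DOCTYPE.
MISMATCH = "<!DOCTYPE Catalog [ <!ELEMENT Catalog (#PCDATA)> ]><Library/>"

# DOCTYPE placed after the root element: not well-formed.
MISPLACED = """<?xml version="1.0"?>
<Catalog>...</Catalog>
<!DOCTYPE Catalog [ <!ELEMENT Catalog (#PCDATA)> ]>"""


def doctype_and_root(text):
    """Return (name given in DOCTYPE, name of the first element)."""
    names = {}
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        names["doctype"] = name

    def on_start(name, attrs):
        names.setdefault("root", name)  # keep only the first element seen

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(text, True)
    return names.get("doctype"), names.get("root")


def parses(text):
    try:
        doctype_and_root(text)
        return True
    except expat.ExpatError:
        return False


print(doctype_and_root(GOOD))
print(doctype_and_root(MISMATCH))
print(parses(MISPLACED))
```

A validating parser would reject the second document outright; with a non-validating parser the comparison is left to the application.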

The ellipsis concealing the DOCTYPE declaration is not very satisfying. Where are the declarations? There are two ways to provide them: as an external subset of declarations in a separate DTD file, as an internal subset within the body of the DOCTYPE declaration, or as both.

When external and internal subsets are mixed, the internal DTD may add declarations or override declarations found in the external DTD. The XML specification has parsers read the internal subset first, and for entity and attribute declarations the first declaration read is the binding one, so the declarations in the internal subset take priority. An element type, by contrast, may be declared only once.

  Standalone attribute

There is one further variation to consider before we discuss how to provide declarations. The XML declaration can have a standalone attribute. The standalone attribute is, however, seldom seen in practice. The example below shows the declaration of the standalone attribute.

<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE Catalog…

This attribute can have two values, yes and no, both written in lowercase.

If the value of the attribute is yes, there are no declarations external to the document instance that would affect the information passed from the document to the application using it. The value yes does not guarantee that the document has no external dependencies of any kind. It merely states that the document has no external dependencies whose omission from processing would make the document erroneous as far as the receiving application is concerned.

A value of no indicates that there are external declarations containing values that are needed to define the document content properly. The main use of the standalone attribute is as a flag that tells parsers and other applications whether they need to retrieve external content.
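
How a parser exposes this flag can be sketched with Python's standard xml.parsers.expat module, whose XmlDeclHandler receives 1 for yes, 0 for no and -1 when the attribute is absent. XML 1.0 spells the two values in lowercase, so an uppercase value is rejected as not well-formed. The helper standalone_flag and the tiny Catalog document are assumptions made for illustration.

```python
import xml.parsers.expat as expat

# Hypothetical document body shared by all the variants below.
BODY = "<!DOCTYPE Catalog [ <!ELEMENT Catalog EMPTY> ]><Catalog/>"


def standalone_flag(text):
    """Return expat's view of the standalone attribute:
    1 for "yes", 0 for "no", -1 when the attribute is absent."""
    seen = []
    parser = expat.ParserCreate()

    def on_xml_decl(version, encoding, standalone):
        seen.append(standalone)

    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(text, True)
    return seen[0]


print(standalone_flag('<?xml version="1.0" standalone="yes"?>' + BODY))
print(standalone_flag('<?xml version="1.0" standalone="no"?>' + BODY))
print(standalone_flag('<?xml version="1.0"?>' + BODY))

try:
    standalone_flag('<?xml version="1.0" standalone="YES"?>' + BODY)
except expat.ExpatError as err:
    print("rejected:", err)
```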

  DOCTYPE Declaration

The DOCTYPE declaration formally consists of the keyword DOCTYPE followed by the name of the document's root element, in our example Catalog. This is followed by an optional external identifier, which is in turn followed by an optional block of markup declarations.

The external identifier locates the external DTD (external subset).

The markup declaration block actually contains markup declarations (internal subset). We shall now discuss these in detail.

  Internal DTD Subset Declarations

Declarations such as entity declarations can be placed in the internal subset. This markup declaration block is delimited within the DOCTYPE declaration by square brackets ([…]), and the declarations are listed between these brackets. An example is shown below.

<!DOCTYPE Catalog […internal subset declaration here…]>

Internal DTDs are useful, but an internal DTD can add substantially to the size of the document. The declarations must be transmitted with the document even if the consumer of the document does not intend to validate it. Internal DTDs are most useful for simple vocabularies, particularly when prototyping markup.

Sometimes programmers need to use both an internal and an external DTD. In such cases the internal DTD adds declarations. When an internal DTD declares an item that is also declared in the external DTD, the internal DTD supersedes the external DTD. This permits some fine-tuning of declarations for a particular document's needs, but care must be taken: the more we override the external DTD, the more it starts to lose relevance, which is a sign of poor initial design.
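
The add-and-override behaviour can be sketched with Python's standard xml.parsers.expat module. expat does not read an external subset unless asked, so the sketch enables parameter entity parsing and supplies the external DTD from an in-memory string that stands in for the file Catalog.dtd. The entity names publisher and series, their values and the helper expand are hypothetical.

```python
import xml.parsers.expat as expat

# Stand-in for the contents of Catalog.dtd (the external subset), held in
# memory so the sketch needs no file or network access.
EXTERNAL_DTD = """<!ENTITY publisher "External Press">
<!ENTITY series "Classics">"""

DOC = """<?xml version="1.0"?>
<!DOCTYPE Catalog SYSTEM "Catalog.dtd" [
  <!ENTITY publisher "Internal Press">
]>
<Catalog>&publisher; / &series;</Catalog>"""


def expand(text, external_dtd):
    """Parse text, loading external_dtd as its external subset, and
    return the character data with entity references expanded."""
    chunks = []
    parser = expat.ParserCreate()
    # Ask expat to process the external subset; it does not by default.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def load_external(context, base, system_id, public_id):
        # A real application would fetch system_id here.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(external_dtd, True)
        return 1  # tell expat the entity was handled

    parser.ExternalEntityRefHandler = load_external
    parser.CharacterDataHandler = chunks.append
    parser.Parse(text, True)
    return "".join(chunks)


print(expand(DOC, EXTERNAL_DTD))
```

The internal declaration of publisher is read first and therefore wins, while series, which only the external DTD declares, is simply added.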

  External DTDs

An external DTD is more flexible in certain respects. In this case, the DOCTYPE declaration consists of the usual keyword and the root element name, followed by another keyword denoting the source of the external DTD, which is then followed by the location of that DTD.

The keyword can either be SYSTEM or PUBLIC.

If the keyword is SYSTEM, a URL directly and explicitly locates the DTD, so the parser should be able to find the DTD given the URL alone. What follows SYSTEM, therefore, is a URL naming the DTD file. URLs used to locate DTDs should not contain fragment identifiers, that is, the character # followed by a name, because XML 1.0 allows parsers to signal an error if the URL contains such an identifier.

<!DOCTYPE Catalog SYSTEM "http://myserver/Catalog.dtd">

<!DOCTYPE Catalog SYSTEM "http://www.universallibrary.org/Catalog.dtd">

The two lines above show DOCTYPE declarations that use the SYSTEM keyword. All the declarations needed to validate a document containing the first DOCTYPE declaration will be found in the file Catalog.dtd. In the second case, the DTD file is found on a Web server operated by a hypothetical universal library organization. In both cases, an element declaration for the Catalog element is to be found within the Catalog.dtd file.
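
What a parser extracts from such a declaration can be sketched with Python's standard xml.parsers.expat module. By default expat only reports the identifiers and does not fetch the DTD, so no network access takes place. The helper read_doctype is a made-up name for this example.

```python
import xml.parsers.expat as expat

DOC = '<!DOCTYPE Catalog SYSTEM "http://myserver/Catalog.dtd"><Catalog/>'


def read_doctype(text):
    """Return (root name, system id, public id, has internal subset),
    or None when the document has no DOCTYPE declaration."""
    found = []
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append((name, system_id, public_id, bool(has_internal_subset)))

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(text, True)
    return found[0] if found else None


# The public id is None because the SYSTEM keyword carries only a URL.
print(read_doctype(DOC))
```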

The PUBLIC keyword is used for well-known vocabularies. Going back to our Catalog example, let us suppose that considerable consensus has been built around the catalog DTD in the publishing industry. In that case, an application parsing a document from this vocabulary might employ some strategy of its own for locating the DTD. The application might, for instance, have a local copy, and using it would be preferable to making a round trip to a Web server.

When the PUBLIC keyword is used with a Uniform Resource Identifier (URI), applications are given the opportunity to locate the DTD using their own algorithms. The URI could be a URL or simply a unique name.

If the URI universal/Book is well known to the application processing documents of this type, the application can find the DTD on its own. It might have a local copy of the DTD, or it might access a DTD maintained on a local database server. The means of finding the DTD is thus left primarily to the application that processes the DOCTYPE declaration.

The term "well known" is, of course, relative. For that reason a PUBLIC declaration in XML 1.0 carries both a public URI and a system identifier. If the application or parser consuming the document cannot locate a DTD from the URI provided with the PUBLIC keyword, it must use the system identifier. By supplying both, the author of the document gives the receiving application a chance to find the DTD based on the public URI. If that fails, which can be expected from a general-purpose parser with no knowledge of our publishing domain, the application would be expected to request the DTD from the Web server at www.universallibrary.org.
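
The lookup strategy just described might be sketched as follows. The DOCTYPE line is an assumed reconstruction that combines the public URI universal/Book with the universallibrary.org address used earlier. The local catalog, its file path and the helper names are hypothetical, and parsing relies on Python's standard xml.parsers.expat module.

```python
import xml.parsers.expat as expat

# Assumed form of a PUBLIC declaration: public URI first, system identifier second.
DOC = ('<!DOCTYPE Catalog PUBLIC "universal/Book" '
       '"http://www.universallibrary.org/Catalog.dtd"><Catalog/>')


def read_identifiers(text):
    """Return (public id, system id) from the DOCTYPE declaration."""
    ids = {}
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        ids["public"], ids["system"] = public_id, system_id

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(text, True)
    return ids.get("public"), ids.get("system")


def locate_dtd(public_id, system_id, local_catalog):
    """Prefer a DTD known locally by its public id; otherwise fall back
    to the system identifier."""
    if public_id in local_catalog:
        return ("local", local_catalog[public_id])
    return ("system", system_id)


public_id, system_id = read_identifiers(DOC)
# An application from the publishing domain knows the public id...
print(locate_dtd(public_id, system_id, {"universal/Book": "dtds/Catalog.dtd"}))
# ...while a general-purpose parser does not and uses the URL instead.
print(locate_dtd(public_id, system_id, {}))
```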





17, Vadsarvala Nivas, 65-A, J. Nehru Road, Mulund (W), Mumbai - 400 080 INDIA
Tel : 91-22-21645588, 91-22-21640585 Fax : 91-22-21641545
Email : ionline@vsnl.com
© Image Online 2001-2003