



Copyrights : Layout Galaxy All Rights Reserved
No part of this tutorial may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, electrostatic, magnetic tape, mechanical or otherwise, without prior permission in writing from Layout Galaxy.





  Well-Formed Documents

Data objects, also called documents, that conform to the syntax rules of the XML specification are called well-formed XML documents. Such documents describe their own structure and are also known as standalone XML documents.

They do not depend on external declarations, and their attribute values receive no special processing or default values.

A well-formed XML document contains one or more elements, each delimited by a start tag and an end tag. Exactly one element, the document element, contains all the other elements in the document. The elements form a hierarchical tree, so nested elements stand in a parent-child relationship to one another.

To summarize, data objects are well-formed documents if:

The syntax conforms to the XML specification,
The elements form a simple hierarchical tree with a single root node, and
There are no external references to entities.

An XML parser that encounters a construct that is not well-formed must report it to the application as a "fatal" error. This strict approach to error handling follows from the compact design of XML and from the intention that XML be used for much more than document display.
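
The conditions and the fatal-error behavior described above can be tried out with a small sketch. It assumes Python's standard xml.etree.ElementTree module as the non-validating parser; the helper name is_well_formed and the sample documents are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser stops at the first well-formedness violation
        # and reports it instead of trying to recover.
        return False
    return True


# A single document element that contains all the other elements.
good = "<order><item>pen</item><item>ink</item></order>"
# Start and end tags overlap, so the elements do not form a tree.
overlapping = "<order><item>pen</order></item>"
# Two top-level elements, so there is no single root node.
two_roots = "<order/><order/>"

for sample in (good, overlapping, two_roots):
    print(is_well_formed(sample), sample)
```

Note that the parser does not attempt to repair the overlapping tags; the application only learns that parsing failed.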

  Parsers

The W3C Recommendation also describes the behavior of the parser (also called the XML processor), which forms the lower tier of the XML architecture. This behavior is defined with the objective of easing the burden on the applications that handle XML data.

There are two types of parsers: non-validating and validating.

A non-validating parser merely ensures that a data object is well-formed XML.

A validating parser additionally uses a DTD to check the validity of a well-formed data object's form and content. Some parsers support both modes, with configuration switches that determine whether the document is validated.

Several of the required behaviors relieve the application of low-level detail. For example, the character sequences that mark the end of a line of text are operating-system specific. The XML application need not be concerned with this, because the parser normalizes all of them to a single line-feed character. Whitespace is another area where parsers are constrained: unlike in HTML or SGML, all whitespace must be passed from the document to the application. General entity references are expanded by the parser as defined in the internal or external DTD subset.
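
The three behaviors just described (line-end normalization, whitespace being passed through, and general entity expansion) can be observed with a short sketch. It assumes Python's standard xml.etree.ElementTree module; the element name note and the entity vendor are made up for the example.

```python
import xml.etree.ElementTree as ET

# A carriage return/line-feed pair in the document reaches the
# application as a single line-feed character.
crlf = ET.fromstring("<note>line one\r\nline two</note>")

# Leading, trailing and repeated spaces in character data are passed
# to the application unchanged.
padded = ET.fromstring("<note>  padded  text  </note>")

# A general entity declared in the internal DTD subset is expanded
# by the parser before the application sees the text.
entity_doc = (
    '<!DOCTYPE note [<!ENTITY vendor "Example Corp">]>'
    "<note>Sold by &vendor;</note>"
)
expanded = ET.fromstring(entity_doc)

print(repr(crlf.text))
print(repr(padded.text))
print(repr(expanded.text))
```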

  XML Processing - Attribute Values

XML parsers are required to normalize attribute values (AttValue) before passing them to the XML application.

The table shows how parsers handle characters and references.

Reference | Handling
Character Reference | Append the referenced character to the AttValue.
Entity Reference | Expand the replacement text of that entity, appending it to the AttValue.
Whitespace Characters | Replace any single whitespace character with the space character and append that space to the AttValue. A carriage return/line-feed pair that is part of an external parsed entity, or of the literal entity value of an internal parsed entity, is replaced by one space only.
Other Characters | Append the character to the AttValue.

  XML Processing

The AttValue is then processed further by removing any leading or trailing spaces and converting runs of multiple spaces into single spaces. This second step is skipped for attributes of type CDATA: an attribute declared as CDATA in the DTD keeps its spaces, and a non-validating parser that has read no declaration for an attribute treats it as CDATA as well.

There are two approaches to implementing an XML parser: event-driven parsers and tree-based parsers.
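
The two normalization steps can be sketched in plain Python. This is an illustration of the rules as stated above, not the code of any real parser: the function normalize_att_value, its parameters, and the entity co are hypothetical, and undeclared entities simply raise KeyError.

```python
import re

# Matches a hexadecimal or decimal character reference (&#xA; or &#10;)
# or a named entity reference (&name;).
_REF = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([A-Za-z_:][\w.\-:]*);")


def normalize_att_value(raw, entities=None, is_cdata=True):
    """Normalize a raw attribute value (sketch of the rules in the table)."""
    entities = entities or {}
    out = []
    i = 0
    while i < len(raw):
        m = _REF.match(raw, i)
        if m and m.group(1) is not None:      # hexadecimal character reference
            out.append(chr(int(m.group(1), 16)))
            i = m.end()
        elif m and m.group(2) is not None:    # decimal character reference
            out.append(chr(int(m.group(2))))
            i = m.end()
        elif m:                               # entity reference: expand its text
            out.append(normalize_att_value(entities[m.group(3)], entities))
            i = m.end()
        elif raw.startswith("\r\n", i):       # CR/LF pair becomes one space
            out.append(" ")
            i += 2
        elif raw[i] in " \t\r\n":             # any other whitespace character
            out.append(" ")
            i += 1
        else:                                 # every other character is kept
            out.append(raw[i])
            i += 1
    value = "".join(out)
    if not is_cdata:
        # Second step, for non-CDATA attribute types only: trim the ends
        # and collapse runs of spaces.
        value = re.sub(" +", " ", value).strip(" ")
    return value


print(repr(normalize_att_value("a\tb\r\nc")))
print(repr(normalize_att_value("  one   two ", is_cdata=False)))
print(repr(normalize_att_value("&co; Ltd", {"co": "Example\tCorp"})))
```

A line feed written as the character reference &#10; is appended as a line feed rather than a space, because the whitespace rule applies only to literal whitespace characters.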

  Event-Driven Parsers

In the event-driven approach, a model familiar to programmers of modern GUIs and operating systems, the parser executes a call-back to the application for each class of XML data: elements (with their attributes), character data, processing instructions, notations, and comments.

How the data is handled depends entirely on the application, because the data is delivered only through the call-backs. The parser does not maintain the element tree structure, or any of the data, after it has been parsed.
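
A minimal sketch of the call-back model follows, assuming Python's standard xml.sax module as the event-driven parser. The class name EventLogger and the sample order document are hypothetical. Because the parser keeps nothing after each call-back, the handler itself records whatever it wants to keep.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record every call-back the parser makes, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Attributes arrive together with the element's start tag.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Character data may arrive in several chunks; skip blank ones.
        if content.strip():
            self.events.append(("text", content))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target, data))


handler = EventLogger()
xml.sax.parseString(
    b'<order id="42"><?audit on?><item>pen</item></order>', handler
)

for event in handler.events:
    print(event)
```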

  Tree-based Parsers

One of the most widely used structures in software engineering is the simple hierarchical tree.

In the tree-based approach, a well-formed document is represented as a tree, so common and mature algorithms can be used to traverse the nodes of an XML document.

This approach conforms to the Document Object Model (DOM) as specified by the W3C. The DOM is a platform- and language-neutral interface that allows manipulation of tree-structured documents.
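
As a sketch of the tree-based approach, the example below assumes Python's standard xml.dom.minidom module, a lightweight DOM implementation. The helper walk and the sample order document are hypothetical. The whole document is held in memory as a tree, so an ordinary recursive traversal can visit it and the application can modify it afterwards.

```python
from xml.dom import minidom


def walk(node, depth=0):
    """Yield (depth, tag name) for every element, in document order."""
    if node.nodeType == node.ELEMENT_NODE:
        yield depth, node.tagName
    for child in node.childNodes:
        yield from walk(child, depth + 1)


doc = minidom.parseString("<order><item>pen</item><item>ink</item></order>")
root = doc.documentElement

# Traverse the tree with a plain depth-first walk.
outline = list(walk(root))
for depth, name in outline:
    print("  " * depth + name)

# The tree stays available after parsing, so it can be changed in place.
root.setAttribute("status", "open")
first_item_text = root.getElementsByTagName("item")[0].firstChild.data
```

Compared with the call-back model, this costs memory for the whole tree but lets the application revisit any node at any time.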

MSXML, a Java-based XML parser, was developed by Microsoft. XML support was later included as part of Internet Explorer 5, with a different parser.







17, Vadsarvala Nivas, 65-A, J. Nehru Road, Mulund (W), Mumbai - 400 080 INDIA
Tel : 91-22-25795588, 91-22-25780444 Fax : 91-22-25793397
Email : ionline@vsnl.com
© Image Online 2001-2003