Data objects (also called documents) that conform to the XML syntax
specification are called well-formed XML documents. Such a document
describes its own structure. A well-formed document that does not
depend on external declarations is also known as a standalone XML
document; in it, attribute values receive no special processing
or default values.
A well-formed XML document contains one or more elements, each
delimited by a start tag and an end tag. Exactly one element, the
document element, contains all the other elements in the document.
The elements form a hierarchical tree, so the relationship between
them is a parent-child relationship.
To summarize, data objects are well-formed documents if:
- The syntax conforms to the XML specification,
- The elements form a simple hierarchical tree with a single root node, and
- There are no references to external entities.
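As a minimal sketch of these conditions, the Python snippet below parses a small document with the standard library's xml.etree.ElementTree module and lists the parent-child pairs in its tree. The document and its element names (library, book, title) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one document element ("library")
# contains every other element.
DOC = "<library><book><title>XML Basics</title></book><book/></library>"

# fromstring() raises ET.ParseError if DOC is not well-formed.
root = ET.fromstring(DOC)

# Record each (parent, child) pair to make the hierarchy explicit.
pairs = [(parent.tag, child.tag) for parent in root.iter() for child in parent]

print(root.tag)
print(pairs)
```

Every element other than the document element appears exactly once as a child, which is what makes the structure a tree.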
An XML parser that encounters a construct that is not well-formed
must report it to the application as a "fatal" error.
This strict approach to error handling follows from the compact
design of XML and from the intention that XML be used for
much more than document display.
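The fatal-error behavior can be observed with a short Python sketch. It assumes the standard library's xml.etree.ElementTree module, which reports a well-formedness violation by raising ParseError; the broken input is a made-up example with overlapping tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the <b> and <i> tags overlap, so the text is not well-formed.
BROKEN = "<p><b>bold <i>both</b> italic</i></p>"

outcome = "parsed"
error_position = None
try:
    ET.fromstring(BROKEN)
except ET.ParseError as err:
    # The parser stops at the first violation and reports where it occurred.
    outcome = "fatal error"
    error_position = err.position  # (line, column)

print(outcome, error_position)
```

No partial tree is returned to the caller; the application has to decide what to do with the rejected input.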
The W3C Recommendation also describes the behavior of the parser
(the XML processor), which forms the lower tier of the XML
architecture. This behavior is defined with the objective of
easing the burden on the applications that handle the XML data.
There are two types of parsers: non-validating and validating.
A non-validating parser merely ensures that a data object is
well-formed XML.
A validating parser additionally uses a DTD to check the validity
of a well-formed data object's form and content. Some parsers
support both modes, with configuration switches that determine
whether the document is validated.
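The difference can be illustrated with a short Python sketch. It relies on the fact that the parsers in Python's standard library are non-validating; the note, to and from elements are invented for the example. The document below is well-formed but violates its own DTD, and the parser accepts it anyway.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the DTD says <note> must contain exactly one <to>,
# but the body contains <from> instead. The document is therefore invalid,
# yet still well-formed.
DOC = """<!DOCTYPE note [
  <!ELEMENT note (to)>
  <!ELEMENT to (#PCDATA)>
]>
<note><from>Ann</from></note>"""

# No error is raised: only well-formedness is checked.
root = ET.fromstring(DOC)
children = [child.tag for child in root]

print(root.tag, children)
```

A validating parser, which this sketch does not use, would be expected to report the content-model violation.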
Line endings are one example of how the parser eases the
application's handling of XML data. The character sequences
that mark the end of a line are operating-system specific.
Nevertheless, the XML application need not be concerned about
this, as the parser normalizes every line ending to a single
line-feed character. Whitespace is another area where parsers
are constrained: unlike HTML or SGML, all whitespace must be
passed from the document to the application. General entity
references are expanded by the parser as defined in the internal
or external DTD subset.
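The three behaviors just described can be checked with a small Python sketch using the standard library's xml.etree.ElementTree module. The element name t and the entity name greeting are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Line endings: CR LF and a lone CR both reach the application as LF.
lines = ET.fromstring(b"<t>one\r\ntwo\rthree</t>").text

# Whitespace inside content is passed through unchanged.
padded = ET.fromstring("<t>  spaced   out  </t>").text

# A general entity declared in the internal DTD subset is expanded.
expanded = ET.fromstring(
    '<!DOCTYPE t [<!ENTITY greeting "hello">]><t>&greeting; world</t>'
).text

print(repr(lines))
print(repr(padded))
print(repr(expanded))
```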
XML Processing - Attribute Values
XML parsers are required to normalize the
attribute values (AttValue) before passing them to the XML
application.
The table shows how the parsers handle the
characters and references.
| Reference | Handling |
| --- | --- |
| Character reference | Append the referenced character to the AttValue. |
| Entity reference | Expand the replacement text of that entity and append it to the AttValue. |
| Whitespace characters | Replace any carriage-return/line-feed pair that is part of an external parsed entity or of the literal entity value of an internal parsed entity, or any single whitespace character, with a space character, and append that space to the AttValue. |
| Other characters | Append the character to the AttValue. |
For attributes declared in the DTD with a type other than CDATA,
the AttValue is then processed further by removing any leading or
trailing spaces and collapsing runs of multiple spaces into single
spaces. This extra step is skipped for attributes declared as
CDATA, and a non-validating parser treats any attribute for which
it has read no declaration as CDATA.
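The normalization rules can be sketched in Python with the standard library's xml.etree.ElementTree module. The element a and attribute x are hypothetical. The same raw value is parsed twice: once with no declaration, so it is treated as CDATA, and once declared as NMTOKENS in the internal DTD subset.

```python
import xml.etree.ElementTree as ET

# The raw value holds a leading space, a character reference to a space
# (&#32;), a newline followed by a space, and a trailing tab.
RAW_ATTR = 'x=" one&#32;two\n three\t"'

# No declaration is read, so the attribute is treated as CDATA: each
# whitespace character becomes a space and nothing is trimmed.
cdata_value = ET.fromstring("<a " + RAW_ATTR + "/>").get("x")

# Declared as NMTOKENS: leading and trailing spaces are removed and runs
# of spaces collapse into one.
DTD = "<!DOCTYPE a [<!ATTLIST a x NMTOKENS #IMPLIED>]>"
token_value = ET.fromstring(DTD + "<a " + RAW_ATTR + "/>").get("x")

print(repr(cdata_value))
print(repr(token_value))
```

Comparing the two printed values shows the effect of the extra trimming step on attributes that are not of type CDATA.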
There are two approaches to implementing an XML parser:
event-driven parsers and tree-based parsers.
In the event-driven approach to XML processing, a model familiar
to programmers of modern GUIs and operating systems, the parser
executes a call-back to the application for each class of XML
data: elements with their attributes, character data, processing
instructions, notations, and comments.
How the data is handled depends on the application, since the
data is delivered only through the call-backs. The XML parser
does not maintain the element tree structure, or any of the
data, after it has been parsed.
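A minimal sketch of the call-back model, assuming Python's standard xml.sax module as the event-driven parser. The EventLogger class and the sample order document are invented for illustration; the handler simply records each call-back in the order it arrives.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records every call-back it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Attributes arrive together with the element's start tag.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Character data may be delivered in more than one chunk.
        self.events.append(("text", content))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target, data))


DOC = b'<order id="7"><?print now?><item>pen</item></order>'

logger = EventLogger()
xml.sax.parseString(DOC, logger)

for event in logger.events:
    print(event)
```

The parser keeps nothing once a call-back returns, so any structure the application needs, such as the events list here, must be built by the handler itself. Comment call-backs are not covered in this sketch.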
One of the most widely used structures in software
engineering is the simple hierarchical tree.
In the tree-based approach, a well-formed document is
represented as a tree, so common and mature algorithms can
be used to traverse the nodes of an XML document.
This approach conforms to the Document Object Model (DOM)
as specified by the W3C. The DOM is a platform- and
language-neutral interface that allows manipulation of
tree-structured documents.
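As a sketch of the tree-based approach, the snippet below uses Python's standard xml.dom.minidom module, a lightweight DOM implementation, to load a document into memory and traverse it depth first. The walk helper and the sample document are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document.
DOC = "<library><book><title>XML Basics</title></book><book/></library>"


def walk(node, depth=0):
    """Depth-first traversal yielding (depth, tag name) for each element."""
    if node.nodeType == node.ELEMENT_NODE:
        yield depth, node.tagName
        depth += 1
    for child in node.childNodes:
        yield from walk(child, depth)


dom = minidom.parseString(DOC)
outline = list(walk(dom.documentElement))

for depth, name in outline:
    print("  " * depth + name)
```

Unlike the event-driven model, the whole tree stays in memory after parsing, so it can be traversed repeatedly or modified in place.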
MSXML, a Java-based XML parser, was developed by
Microsoft. XML support was later included as a part of
Internet Explorer 5, with a different parser.
Copyrights : Layout Galaxy All Rights Reserved
No part of this tutorial may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, electrostatic, magnetic tape, mechanical or otherwise, without prior permission in writing from Layout Galaxy.