One of XML's best features is its ability
to provide structure to a document. Every XML document has
both a logical and a physical structure.
The logical structure is like a template
that specifies the elements to be included in a document and
the order in which they have to be included.
The physical structure contains the actual
data used in a document.
Logical Structure refers to the organization
of the different parts of a document. It indicates how a document
is built, as opposed to what a document contains.
The first structural part of an XML document
is an optional prolog. The prolog is the base for
the logical structure of an XML document.
The Prolog consists of two basic components,
the XML Declaration and the Document Type Declaration. These
two components are also optional.
The XML Declaration identifies the version
of the XML specification to which the document conforms. Although
the XML declaration is an optional element, we should always
include it in the XML document.
The code snippet here gives an example of
a basic XML declaration. The keywords xml and version must be written in lowercase letters.
<?xml version="1.0"?>
An XML declaration can also contain an encoding
declaration and a stand-alone document declaration.
The encoding declaration identifies the character-encoding
scheme, such as UTF-8 or EUC-JP. Different encoding schemes
map characters to bytes in different ways and suit different
languages. For example, UTF-8, the default scheme, can represent
every Unicode character and stores the basic characters used
in English in a single byte each.
The stand-alone document declaration identifies
whether any markup declarations exist that are external to
the document. The stand-alone document declaration has the
value yes or no, written in lowercase.
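As a rough illustration of the three parts of the XML declaration, the following Python sketch reads them back with the expat parser from the standard library. The sample document and the helper name read_xml_declaration are assumptions made for this example and are not part of the tutorial's Book files.

```python
import xml.parsers.expat


def read_xml_declaration(data):
    """Return the version, encoding and standalone values of the XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 for "yes", 0 for "no", -1 if omitted.
        found["version"] = version
        found["encoding"] = encoding
        found["standalone"] = standalone

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(data, True)
    return found


# Hypothetical sample document, used only for this sketch.
SAMPLE = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Book/>'
declaration = read_xml_declaration(SAMPLE)
print(declaration)
```

If the stand-alone declaration is left out, the same helper reports -1 for it, which is how this parser signals that the clause was omitted.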
| Document Type Declaration |
The Document Type Declaration consists of
markup code that indicates the grammar rules, or Document
Type Definition (DTD), for the particular class of the document.
The document type declaration can also point to an external
file that contains all or part of the DTD.
<?xml version="1.0"?>
<!DOCTYPE Book SYSTEM "Book.dtd">
The code snippet here tells the XML processor
that the document is of the class Book and conforms to
the rules defined in the DTD file named "Book.dtd".
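A short Python sketch can show what a processor learns from the document type declaration. It uses xml.dom.minidom from the standard library, which records the declared name and system identifier but, as far as this example is concerned, does not fetch or apply Book.dtd itself. The empty Book element is an assumption added so that the sample is a complete document.

```python
from xml.dom import minidom

# Prolog from the tutorial plus an empty root element so the sample parses.
BOOK = '<?xml version="1.0"?>\n<!DOCTYPE Book SYSTEM "Book.dtd">\n<Book/>'

document = minidom.parseString(BOOK)
doctype = document.doctype

# The declared class of the document and the location of its DTD.
print(doctype.name, doctype.systemId)
```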
The second structural element in an XML
document is the document element, where the actual content
lies. Each XML document must have only one root element, and
all other elements must be completely enclosed in that element.
The document element contains all the data in an XML document.
This element can comprise any number of nested sub elements
and external entities.
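The single-root rule can be tried out with a parser. The sketch below uses xml.etree.ElementTree from the Python standard library; the second sample string with two Book elements is a deliberately broken, made-up input.

```python
import xml.etree.ElementTree as ET

BOOK = """<Book>
  <Bookname>Paradise Lost</Bookname>
  <Authorname>John Milton</Authorname>
</Book>"""

# The document element encloses every other element.
root = ET.fromstring(BOOK)
child_tags = [child.tag for child in root]
print(root.tag, child_tags)

# A second top-level element is rejected by the parser.
try:
    ET.fromstring("<Book/><Book/>")
    two_roots_accepted = True
except ET.ParseError:
    two_roots_accepted = False
print(two_roots_accepted)
```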
<?xml version="1.0"?>
<!DOCTYPE Book SYSTEM "Book.dtd">
<Book>
<Bookname>Paradise Lost</Bookname>
<Authorname>John Milton</Authorname>
</Book>
The code snippet given here shows a Book element
that follows the rules in Book.dtd. Element tags can also include
one or more optional or mandatory attributes that give further
information about the elements they delimit. Attributes can
only be specified in the start tag.
<element.type.name attribute.name="attribute value">
The code snippet here gives the syntax for
specifying an attribute.
In direct contrast to SGML and HTML, in
which multiple declarations are considered as errors, XML
deals with multiple declarations of attributes in a unique
manner. If a DTD declares an element's attributes once with one
set of attributes and then again with a different set,
the two sets of attributes are merged. If the same attribute
is declared more than once for a particular element, the first
declaration is the only one that counts, and any later
declarations are ignored.
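The merging rule can be observed through attribute defaults. The following Python sketch assumes the expat-based parser behind xml.etree.ElementTree, which applies defaults declared in the internal DTD subset. The attribute names lang and format and their values are hypothetical and do not come from Book.dtd.

```python
import xml.etree.ElementTree as ET

# Two attribute-list declarations for the same element type.
# "lang" is declared twice with different defaults (hypothetical names).
DOC = """<!DOCTYPE Book [
  <!ATTLIST Book lang CDATA "en">
  <!ATTLIST Book format CDATA "print"
                 lang   CDATA "fr">
]>
<Book/>"""

root = ET.fromstring(DOC)

# Both declarations contribute attributes; for "lang" the first one wins.
print(root.attrib)
```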
| Physical Structure in XML |
The physical structure of an XML document
is composed of all the content used in that document.
The storage units, called Entities, can
be part of the document or external to the document. Each
entity is identified by a unique name and contains its own
contents, from a single character inside the document to a
large file that exists outside the document.
In terms of the logical structure of an
XML document, entities are declared in the prolog and referenced
in the document element. An entity reference directs the processor to
retrieve the content of the entity, as declared in the entity
declaration, and use it in the document.
Entities in an XML document can be classified
in the following ways. An entity may be either parsed or unparsed,
it may be one of the predefined entities, and it may
be either an external or an internal entity.
| Parsed and Unparsed Entities |
An entity can be either parsed or unparsed.
A parsed entity, also called a text entity, contains text
data that becomes part of the XML document once that data
is processed.
An unparsed entity is a container whose
contents may or may not be text. If the content is text,
it does not have to be parsable XML.
A parsed entity is intended to be read by
the XML processor, which will extract the content. After the
content is extracted, a parsed entity's content appears as
part of the document at the location of the entity reference.
For example, in our Book document, a publisher information
entity may be declared as in the following code snippet. Whenever
this entity is referenced in the document, the reference
will be replaced by the entity's content. So we need to change
it in only one place, the declaration, and the change will be
reflected wherever the entity is used in the document.
<!ENTITY Publisher1 "McGrawHill Publishing Company.">
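To see the replacement happen, the Publisher1 declaration can be placed in an internal DTD subset and referenced from the document. This is a minimal Python sketch using xml.etree.ElementTree; the Publisher and Imprint element names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The entity is declared once in the prolog and referenced twice below.
DOC = """<!DOCTYPE Book [
  <!ENTITY Publisher1 "McGrawHill Publishing Company.">
]>
<Book>
  <Publisher>&Publisher1;</Publisher>
  <Imprint>&Publisher1;</Imprint>
</Book>"""

root = ET.fromstring(DOC)

# Each reference has been replaced by the declared content.
publisher = root.findtext("Publisher")
imprint = root.findtext("Imprint")
print(publisher)
print(imprint)
```

Editing the text inside the declaration changes both elements, since each one only holds a reference.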
An unparsed entity is sometimes referred
to as a binary entity because its content is often a binary
file (such as an image) that is not directly interpreted by
the XML processor. An unparsed entity requires a notation.
A notation identifies the format, or type, of the resource to
which the entity refers. The following code snippet shows
the declaration of a notation named GIF, which an unparsed entity can then name.
<!NOTATION GIF SYSTEM "/Utils/Gifview.exe">
In XML, certain characters are used specifically
for marking up the document. For example, in the tags of an
element, the angle brackets (< >) and forward slash
(/) are interpreted as markup and not as actual character data.
The characters that are reserved for markup
cannot be used directly as content. If we want these characters
to be displayed as data, they must be escaped. To escape a
character, we must use an entity reference to insert the character into
a document. So, if the text <bookname> is to be entered
in the document, we should use the following sequence.
&lt;bookname&gt;
The table shown here gives a list of all
the predefined entities.
| Entity Reference | Character |
| &lt; | < (opening angle bracket) |
| &gt; | > (closing angle bracket) |
| &amp; | & (ampersand) |
| &apos; | ' (apostrophe) |
| &quot; | " (double quotation mark) |
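Escaping with the predefined entities can be sketched in Python with xml.sax.saxutils.escape from the standard library. By default that function handles the ampersand and the two angle brackets; the quotation marks are passed in as extra replacements here. The sample text and the Note element are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical text that contains every kind of reserved character.
raw = 'Read <bookname> & "enjoy"'

# Escape &, < and > plus the two quotation marks.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# A parser turns the entity references back into the original characters.
note = ET.fromstring("<Note>" + escaped + "</Note>")
print(note.text == raw)
```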
| Internal and External Entity |
An Internal entity is one in which no separate
physical storage exists. The content of the entity is provided
in its declaration as shown in the following piece of code.
<!ENTITY Publisher1 "McGrawHill Publishing Company.">
An External entity refers to a storage unit
in its declaration by using a system or public identifier.
The system identifier provides a pointer to a location at
which the entity content can be found, such as a URI (Uniform
Resource Identifier). The following code snippet declares an
external entity whose content is the file book1.gif. The NDATA
keyword marks the entity as unparsed and names its notation, GIF.
<!ENTITY FirstImg SYSTEM "www.books.com/images/book1.gif" NDATA GIF>
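The sketch below shows how a processor can report an unparsed external entity and its notation to the application without reading the image itself. It uses the expat parser from the Python standard library and places the tutorial's two declarations in an internal DTD subset; the empty Book element and the handler names are assumptions for this example.

```python
import xml.parsers.expat

DOC = b"""<!DOCTYPE Book [
  <!NOTATION GIF SYSTEM "/Utils/Gifview.exe">
  <!ENTITY FirstImg SYSTEM "www.books.com/images/book1.gif" NDATA GIF>
]>
<Book/>"""

notations = {}
unparsed_entities = {}


def on_notation(name, base, system_id, public_id):
    # Record where the notation says the format is handled.
    notations[name] = system_id


def on_unparsed_entity(name, base, system_id, public_id, notation_name):
    # Record the entity's location and the notation it names.
    unparsed_entities[name] = (system_id, notation_name)


parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.UnparsedEntityDeclHandler = on_unparsed_entity
parser.Parse(DOC, True)

print(notations)
print(unparsed_entities)
```

The parser only passes the identifiers along. Deciding what to do with a GIF file is left to the application.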
In HTML code, an element usually contains
an opening tag and an optional closing tag. XML, unlike HTML,
requires a closing tag for every element.
HTML is based on a predefined structure
that allows processors to assume where certain tags should
be located in a document. Since a paragraph in HTML cannot
be nested inside another paragraph, the processor can read
an opening paragraph tag and assume that it also marks the
end of the preceding paragraph. Such minimization techniques
are not allowed in XML.
Although XML requires the usage of a closing
tag, it supports a shortcut for empty elements called the
empty-element tag. The empty-element tag effectively combines
the opening and closing tags for an element containing no
content. It uses a special format: <TAGNAME/>.
In this format, a forward slash follows the tag name, a form
that HTML does not support.
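The two rules above can be checked with a small Python sketch based on xml.etree.ElementTree. The Cover and p element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# An empty-element tag and an explicit open/close pair describe the same element.
short_form = ET.fromstring("<Book><Cover/></Book>")
long_form = ET.fromstring("<Book><Cover></Cover></Book>")
same_document = ET.tostring(short_form) == ET.tostring(long_form)
print(same_document)

# HTML-style omission of closing tags is an error in XML.
try:
    ET.fromstring("<Book><p>one<p>two</Book>")
    omitted_tags_accepted = True
except ET.ParseError:
    omitted_tags_accepted = False
print(omitted_tags_accepted)
```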
Attributes provide a method of associating
values to an element without making the attributes a part
of the content of that element.
<PRICE CURRENCY="USD">315.00</PRICE>
The code snippet here gives an example. Here
we can see that a currency attribute can be added to the price
element of the book document instead of adding a separate
currency element to the document.
The attribute in XML is used in the same
way as an HTML attribute, but we can define our own attribute
names. One important point is that the value of the attribute
must be within single or double quotes.
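As a brief sketch of how an application might read the PRICE element, the following Python code uses xml.etree.ElementTree. It also tries the quoting rule with one single-quoted and one unquoted variant, both of which are made-up inputs.

```python
import xml.etree.ElementTree as ET

price = ET.fromstring('<PRICE CURRENCY="USD">315.00</PRICE>')

# The attribute is kept apart from the element's content.
currency = price.get("CURRENCY")
amount = float(price.text)
print(currency, amount)

# Single quotes are accepted as well.
single_quoted = ET.fromstring("<PRICE CURRENCY='USD'>315.00</PRICE>")

# An unquoted attribute value is rejected.
try:
    ET.fromstring("<PRICE CURRENCY=USD>315.00</PRICE>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
print(unquoted_accepted)
```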
The DTD (Document Type Definition) specified
in the prolog outlines all the rules for the document.
A valid document must obey the rules specified
in the DTD. A valid document also obeys all the validity constraints
identified in the XML specification.
A validating processor must understand the validity
constraints of the XML specification and check the document
for possible violations. If the processor finds any errors,
it must report them to the XML application. The processor
must also read the DTD, validate the document against it,
and again report any violations to the XML application.
Because all the above-mentioned processing and
checking can take time and because validation might not always
be necessary, XML supports the concept of a well-formed document.
A document is described as well-formed if
it meets the well-formedness constraints of the XML recommendation.
Principally, this means it must have a single root element
and all the other elements must be correctly nested. If a
document is well formed, it can be correctly parsed by a computer program.
Well-formedness can reduce the amount of
work a client has to do. For example, if the server has already
validated a document, it is not necessary to burden the client
with validating the document again. As a result, well-formedness
can save download time because the client does not need to
download the DTD, and it can save processing time as the DTD
need not be processed again.
In many cases, authoring a DTD or validating
a document is unnecessary. For example, someone in a small
company might want to use XML to provide structure to a departmental
web site, but all the features that validation provides are
not needed for the site.
According to XML specification, a well-formed
document must meet the following criteria:
A well-formed document must match the definition
of a document. The definition of a document is that it should
contain one or more elements. It contains exactly one root
element, also called the Document element, and all other elements
are properly nested.
All of the parsed entities referenced in
the document are well formed. Since parsed entities become
part of the document once the XML processor parses them, they
must satisfy the well-formedness constraints, for the document
to be considered well-formed.
A well-formed document must observe the
constraints for a well-formed document as defined by the XML
specification.
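A well-formedness check without any DTD validation can be sketched in a few lines of Python. The helper name is_well_formed is an assumption for this example, and it relies on the non-validating parser in xml.etree.ElementTree, so it says nothing about whether a document is valid.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Properly nested elements inside one root element.
nested = "<Book><Bookname>Paradise Lost</Bookname></Book>"
# The closing tags are in the wrong order.
misnested = "<Book><Bookname>Paradise Lost</Book></Bookname>"
# A document must contain at least one element.
empty = ""

for sample in (nested, misnested, empty):
    print(is_well_formed(sample))
```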
Copyrights : Layout Galaxy All Rights Reserved
No part of this tutorial may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, electrostatic, magnetic tape, mechanical or otherwise, without prior permission in writing from Layout Galaxy.