
Copyrights : Layout Galaxy All Rights Reserved
No part of this tutorial may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, electrostatic, magnetic tape, mechanical or otherwise, without prior permission in writing from Layout Galaxy.





  XML structure

One of XML's best features is its ability to provide structure to a document. Every XML document has both a logical and a physical structure.

The logical structure is like a template that identifies the elements to be included in a document and the order in which they must appear.

The physical structure contains the actual data used in a document.

  Logical structure

Logical Structure refers to the organization of the different parts of a document. It indicates how a document is built, as opposed to what a document contains.

The first structural part of an XML document is an optional prolog. The prolog is the base for the logical structure of an XML document.

The Prolog consists of two basic components, the XML Declaration and the Document Type Declaration. These two components are also optional.

  XML Declaration

The XML Declaration identifies the version of the XML specification to which the document conforms. Although the XML declaration is optional, we should always include it in the XML document.

The code snippet here gives an example of a basic XML declaration. The keywords xml and version in this line must be written in lowercase letters.

<?xml version="1.0"?>

An XML declaration can also contain an encoding declaration and a standalone document declaration.

The encoding declaration identifies the character-encoding scheme, such as UTF-8 or EUC-JP. Different encoding schemes represent characters as bytes in different ways and may cover different sets of characters. For example, UTF-8, which an XML processor assumes when no encoding is declared and the document has no byte order mark, can represent every Unicode character, whereas EUC-JP is intended for Japanese text.

The standalone document declaration indicates whether any markup declarations that are external to the document affect its content. It takes the value yes or no, written in lowercase.
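
As a minimal sketch, a declaration that uses all three parts might look like the following. The parts must appear in this order: version, then encoding, then standalone. The value standalone="yes" is an assumption that fits this small document, which relies on no external declarations.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Book>
  <Bookname>Paradise Lost</Bookname>
</Book>
```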

  Document Type Declaration

The Document Type Declaration consists of markup code that indicates the grammar rules, or Document Type Definition (DTD), for the particular class of the document. The document type declaration can also point to an external file that contains all or part of the DTD.

<?xml version="1.0"?>
<!DOCTYPE Book SYSTEM "Book.dtd">

The code snippet here tells the XML processor that the document is of the class Book and conforms to the rules defined in the DTD file named "Book.dtd".

The second structural element in an XML document is the document element, where the actual content lies. Each XML document must have only one root element, and all other elements must be completely enclosed in that element. The document element contains all the data in an XML document. This element can comprise any number of nested sub elements and external entities.

<?xml version="1.0"?>
<!DOCTYPE Book SYSTEM "Book.dtd">

<Book>
<Bookname>Paradise Lost</Bookname>
<Authorname>John Milton</Authorname>
</Book>

The code snippet given here shows a Book document element that follows the rules in Book.dtd. Element tags can also include one or more optional or mandatory attributes that give further information about the elements they delimit. Attributes can be specified only in the start tag (or in an empty-element tag), never in the end tag.

<element.type.name attribute.name="attribute value">

The code snippet here gives the syntax for specifying an attribute.

In direct contrast to SGML and HTML, in which multiple declarations are considered errors, XML deals with multiple attribute declarations for the same element in a unique manner. If a DTD declares the attributes of an element once with one set of attributes and then again with a different set, the two sets are merged. If the same attribute is declared more than once for an element, the first declaration of that attribute is the only one that counts, and any later declarations of it are ignored.
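
The merging rule can be sketched with two attribute-list declarations in an internal DTD subset. The attribute names Language and Format and their default values are hypothetical and are not part of the tutorial's Book.dtd.

```xml
<?xml version="1.0"?>
<!DOCTYPE Book [
  <!ELEMENT Book (#PCDATA)>
  <!-- First attribute-list declaration for Book -->
  <!ATTLIST Book Language CDATA "English">
  <!-- Second declaration: Format is merged in; the repeated
       Language definition is ignored because the first one counts -->
  <!ATTLIST Book Format   CDATA "Hardcover"
                 Language CDATA "French">
]>
<Book>Paradise Lost</Book>
```

A processor that reads these declarations should treat the Book element as having the defaults Language="English" and Format="Hardcover".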

  Physical Structure in XML

The physical structure of an XML document is composed of all the content used in that document.

The storage units, called Entities, can be part of the document or external to the document. Each entity is identified by a unique name and contains its own contents, from a single character inside the document to a large file that exists outside the document.

In terms of the logical structure of an XML document, entities are declared in the prolog and referenced in the document element. An entity reference directs the processor to retrieve the content of the entity, as declared in the entity declaration, and use it in the document.

Entities in an XML document can be classified in the following ways: an entity may be parsed or unparsed, it may be one of the predefined entities, and it may be internal or external.

  Parsed and Unparsed Entities

An entity can be either parsed or unparsed. A parsed entity, also called a text entity, contains text data that becomes part of the XML document once that data is processed.

An unparsed entity is a container whose contents may or may not be text. If the content is text, it need not be XML, and the XML processor does not parse it.

A parsed entity is intended to be read by the XML processor, which will extract the content. After the content is extracted, a parsed entity's content appears as part of the document at the location of the entity reference. For example, in our Book document, a publisher information entity may be declared as in the following code snippet. Whenever this entity is referenced in the document, the reference will be replaced by the entity's content. So, we need to change the text in only one place, the declaration, and the change will be reflected wherever the entity is used in the document.

<!ENTITY Publisher1 "McGrawHill Publishing Company.">
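
To show the replacement in context, the sketch below declares the Publisher1 entity in an internal DTD subset and references it as &Publisher1; in the content. The Publisher element is a hypothetical addition to the Book example.

```xml
<?xml version="1.0"?>
<!DOCTYPE Book [
  <!ENTITY Publisher1 "McGrawHill Publishing Company.">
]>
<Book>
  <Bookname>Paradise Lost</Bookname>
  <Publisher>&Publisher1;</Publisher>
</Book>
```

After parsing, the content of the Publisher element is the text McGrawHill Publishing Company.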

An unparsed entity is sometimes referred to as a binary entity because its content is often a binary file (such as an image) that is not directly interpreted by the XML processor. An unparsed entity requires a notation. A notation identifies the format, or type, of the resource to which the entity refers. The following code snippet shows the declaration of a notation named GIF, which an unparsed entity can then name in its own declaration.

<!NOTATION GIF SYSTEM "/Utils/Gifview.exe">

  Predefined Entities

In XML, certain characters are used specifically for marking up the document. For example, in the element <Bookname>Paradise Lost</Bookname>, the angle brackets (< >) and forward slash (/) are interpreted as markup and not as actual character data.

The characters that are reserved for markup cannot be used directly as content. If we want these characters to appear as data, they must be escaped. To escape a character, we use an entity reference to insert the character into the document. So, if the text <BOOKNAME> is to be entered in the document, we should use the following sequence.

&lt;BOOKNAME&gt;

The table shown here gives a list of all the predefined entities. Each entity reference ends with a semicolon.

Entity Reference    Character
&lt;                < (opening angle bracket)
&gt;                > (closing angle bracket)
&amp;               & (ampersand)
&apos;              ' (apostrophe)
&quot;              " (double quotation mark)
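
As an illustration, the following sketch uses predefined entities in both an attribute value and element content. The Note element and its Title attribute are hypothetical names chosen for this example.

```xml
<?xml version="1.0"?>
<Note Title="Milton's &quot;Paradise Lost&quot;">
  Use &lt;Bookname&gt; for the title &amp; &lt;Authorname&gt; for the author.
</Note>
```

The apostrophe can appear literally in the Title value because the value is delimited by double quotes; the double quotes inside the value must be written as &quot;.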

  Internal and External Entity

An internal entity is one for which no separate physical storage exists. The content of the entity is provided in its declaration, as shown in the following piece of code.

<!ENTITY Publisher1 "McGrawHill Publishing Company.">

An external entity refers to a storage unit in its declaration by using a system or public identifier. The system identifier provides a pointer to a location at which the entity content can be found, such as a URI (Uniform Resource Identifier). The following code snippet declares an external entity whose content is the file book1.gif. The NDATA keyword marks the entity as unparsed and associates it with the GIF notation, so the XML processor does not parse the file itself but passes its location and notation on to the application.

<!ENTITY FirstImg SYSTEM "www.books.com/images/book1.gif" NDATA GIF>
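
An unparsed entity cannot be referenced with the &name; form in content; it is named as the value of an attribute declared with the type ENTITY (or ENTITIES). The sketch below puts the notation, the entity, and such an attribute together. The Cover element and its Image attribute are hypothetical, as are the element declarations, since the contents of Book.dtd are not shown in this tutorial.

```xml
<?xml version="1.0"?>
<!DOCTYPE Book [
  <!ELEMENT Book (Bookname, Cover)>
  <!ELEMENT Bookname (#PCDATA)>
  <!ELEMENT Cover EMPTY>
  <!ATTLIST Cover Image ENTITY #REQUIRED>
  <!NOTATION GIF SYSTEM "/Utils/Gifview.exe">
  <!ENTITY FirstImg SYSTEM "www.books.com/images/book1.gif" NDATA GIF>
]>
<Book>
  <Bookname>Paradise Lost</Bookname>
  <Cover Image="FirstImg"/>
</Book>
```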

  XML General Syntax

In HTML, an element usually consists of an opening tag and a closing tag, and for many elements the closing tag is optional. XML, unlike HTML, requires a closing tag for every element.

HTML is based on a predefined structure that allows processors to assume where certain tags should be located in a document. Since a paragraph in HTML cannot be nested inside another paragraph, the processor can read an opening paragraph tag and assume that it also marks the end of the preceding paragraph. Such minimization techniques are not allowed in XML.

Although XML requires the use of a closing tag, it supports a shortcut for empty elements called the empty-element tag. The empty-element tag effectively combines the opening and closing tags for an element containing no content. It uses a special format: <TAGNAME/>. In this format, the forward slash follows the tag name, a form that is not supported in HTML.
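
A short sketch of both forms follows. The Illustration element is a hypothetical name used only to show an element with no content.

```xml
<?xml version="1.0"?>
<Book>
  <Bookname>Paradise Lost</Bookname>
  <!-- Both forms below represent an element with no content -->
  <Illustration/>
  <Illustration></Illustration>
</Book>
```

Leaving out the closing </Bookname> tag, as HTML would allow for a paragraph, would make this document unreadable to an XML processor.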

  Attributes

Attributes provide a method of associating values to an element without making the attributes a part of the content of that element.

<PRICE CURRENCY="USD">315.00</PRICE>

The code snippet here gives an example. We can see that a currency attribute is added to the price element of the book document instead of adding a separate currency element to the document.

The attribute in XML is used in the same way as an HTML attribute, but we can define our own attribute names. One important point is that the value of the attribute must be within single or double quotes.
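
The quoting rule can be sketched as follows. The Format and Edition attributes are hypothetical names; either kind of quote may be used, as long as the opening and closing quotes of a value match.

```xml
<?xml version="1.0"?>
<Book Format="Hardcover" Edition='2'>
  <Bookname>Paradise Lost</Bookname>
  <PRICE CURRENCY="USD">315.00</PRICE>
</Book>
```

Writing Edition=2 without quotes, which HTML tolerates, is an error in XML.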

  Valid Documents

The DTD (Document Type Definition) specified in the prolog outlines all the rules for the document.

A valid document must obey the rules specified in the DTD. A valid document also obeys all the validity constraints identified in the XML specification.

The processor must understand the validity constraints of the XML specification and check the document for possible violations. If the processor finds any errors, it must report them to the XML application. The processor must also read the DTD, validate the document against it, and again report any violations to the XML application.
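
As a sketch of a valid document, the example below carries its DTD in the internal subset so that the rules and the content can be read together. The tutorial does not show the contents of Book.dtd, so these element declarations are an assumption about what such a DTD might contain.

```xml
<?xml version="1.0"?>
<!DOCTYPE Book [
  <!ELEMENT Book (Bookname, Authorname)>
  <!ELEMENT Bookname (#PCDATA)>
  <!ELEMENT Authorname (#PCDATA)>
]>
<Book>
  <Bookname>Paradise Lost</Bookname>
  <Authorname>John Milton</Authorname>
</Book>
```

Under these assumed rules, swapping the order of Bookname and Authorname, or omitting either one, would make the document invalid.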

Because all of this processing and checking can take time, and because validation might not always be necessary, XML supports the concept of a well-formed document.

  Well-Formed Documents

A document is described as well-formed if it meets the well-formedness constraints of the XML recommendation. Principally, this means it must have a single root element and all the other elements must be correctly nested. If a document is well-formed, it can be correctly parsed by a computer program.

Well-formedness can reduce the amount of work a client has to do. For example, if the server has already validated a document, it is not necessary to burden the client with validating the document again. As a result, well-formedness can save download time because the client does not need to download the DTD, and it can save processing time as the DTD need not be processed again.

In many cases, authoring a DTD or validating a document is unnecessary. For example, someone in a small company might want to use XML to provide structure to a departmental web site, but all the features that validation provides are not needed for the site.

According to XML specification, a well-formed document must meet the following criteria:

A well-formed document must match the definition of a document. By definition, a document contains one or more elements. Exactly one of these is the root element, also called the document element, and all other elements are properly nested inside it.

All of the parsed entities referenced in the document are well formed. Since parsed entities become part of the document once the XML processor parses them, they must satisfy the well-formedness constraints, for the document to be considered well-formed.

A well-formed document must observe all the well-formedness constraints defined in the XML specification.
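
The criteria above can be illustrated with a document that has no DTD at all, so it can only be checked for well-formedness. The comments are added for explanation.

```xml
<?xml version="1.0"?>
<!-- Well-formed: one root element, every start tag closed, proper nesting -->
<Book>
  <Bookname>Paradise Lost</Bookname>
  <Authorname>John Milton</Authorname>
</Book>
<!-- Not well-formed if written as:
     <Book><Bookname>Paradise Lost</Book></Bookname>
     because the Bookname and Book elements overlap instead of nesting -->
```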







17, Vadsarvala Nivas, 65-A, J. Nehru Road, Mulund (W), Mumbai - 400 080 INDIA
Tel : 91-22-25795588, 91-22-25780444 Fax : 91-22-25793397
Email : ionline@vsnl.com
© Image Online 2001-2003