| Basic Markup Declarations |
The content of an XML document is defined
in terms of four kinds of markup declarations used in the DTD.
| DTD Construct | Meaning |
| ELEMENT | Declaration of an XML element type. |
| ATTLIST | Declaration of the attributes that may be assigned to a specific element type and the permissible values of those attributes. |
| ENTITY | Declaration of reusable content. |
| NOTATION | Format declaration for external content not meant to be parsed and the external application that handles the content. |
The keywords associated with these declarations
and their meanings are shown in the table. The first two
declarations deal with the information found in an XML document,
namely elements and attributes. The last two
could be considered supporting players. Entities, in particular,
are designed to make an XML vocabulary designer's life easier.
They normally consist of content that recurs often enough in the DTD or
document to warrant a special declaration. Notations
deal with content other than XML. A notation declares
a particular class of data and associates it with an external
program, which becomes the handler for that class of data.
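To make the four kinds of declaration concrete, here is a minimal sketch of a DTD fragment that uses each of them once. All names in it (photo, caption, source, company, gif, logo, image-viewer) are hypothetical and chosen only for illustration.

```xml
<!-- ELEMENT: declares the element type "photo" (hypothetical name) -->
<!ELEMENT photo EMPTY>

<!-- ATTLIST: declares the attributes that "photo" may carry -->
<!ATTLIST photo
  caption CDATA  #IMPLIED
  source  ENTITY #REQUIRED>

<!-- ENTITY: declares reusable content under the name "company" -->
<!ENTITY company "Example Company">

<!-- NOTATION: names a non-XML data format and the program that handles it -->
<!NOTATION gif SYSTEM "image-viewer">

<!-- An unparsed entity ties an external file to the declared notation -->
<!ENTITY logo SYSTEM "logo.gif" NDATA gif>
```

With these declarations, a document could contain an element such as photo with its source attribute set to logo, and the handling of the GIF data would be left to the external program named in the notation.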
| Formal DTD Structure - Entities |
XML provides a facility for declaring chunks
of content and referencing them as many times as we
like, wherever they are needed, saving space and sparing document
authors a lot of typing. By declaring an entity
in the DTD, we define a name and the content it refers
to. When needed, we refer to the content by name, using a particular
syntax that marks the name as an entity reference.
An entity used within the content of a document
is called a general entity.
A parsed entity contains XML text. The value of the entity is
known as the replacement text. In contrast, an unparsed entity
need not even be text, and if it is text, it need not be XML.
If the replacement content is not XML, there is no point in
running a parser over it. A parsed entity, on the other hand,
is XML that is pasted into the document content, so it must
be passed through the parser.
XML reserves some characters, such as the
angle brackets, for its own use.
In addition, some characters are unprintable.
XML therefore provides some predefined entities so that authors
can use these characters in their documents without conflict.
Hence, in the text content of an element, for example, such
characters can be referred to by name rather than typed literally,
so the document processor does not confuse them with markup at parse time.
Any character can be referred to by a numeric
reference. This is done by writing the characters &# followed
immediately by the numeric value of the character and a semicolon.
So, for example, the greater-than symbol could be written as &#62;.
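As a small illustration of numeric references, the fragment below writes the greater-than symbol in two ways. The element name comparison is invented for the example, and the hexadecimal form (the characters &#x followed by hexadecimal digits) is part of XML although it is not discussed above.

```xml
<!-- "comparison" is a hypothetical element name -->
<!-- Decimal 62 and hexadecimal 3E both identify the greater-than symbol -->
<comparison>5 &#62; 3 and 7 &#x3E; 4</comparison>
```

After parsing, the text content of the element reads as 5 > 3 and 7 > 4.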
Some characters are so prevalent in XML
that predefined entities are provided for them, as shown in the table.
| Character | Entity Reference |
| < | &lt; |
| > | &gt; |
| & | &amp; |
| ' (apostrophe) | &apos; |
| " (quotation mark) | &quot; |
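The predefined entities can be used in element content and in attribute values alike. The following one-line sketch uses all five; the element name rule and the attribute name label are hypothetical.

```xml
<!-- "rule" and "label" are hypothetical names used only for illustration -->
<rule label="Tom &amp; Jerry&apos;s &quot;law&quot;">if a &lt; b then b &gt; a</rule>
```

A parser reports the attribute value as Tom & Jerry's "law" and the element text as if a < b then b > a, without mistaking any of these characters for markup.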
A general entity declaration allows us to associate a piece of parsed
text with a name by which we shall refer to the
text. The entity is declared with the keyword ENTITY, a name
and a replacement value. An example of the declaration and its usage is shown below.
<!ENTITY copyright "© Image Online, 2001-2003">
&copyright;
With this declaration in place, we can plug
in the copyright text anywhere in a document's content when
we need it simply by referring to the name "copyright".
Of course, the parser needs to be told when we are making
an entity reference so that it will not confuse the entity
name with ordinary text. To signal this intent, we delimit the
name with an ampersand in front of the name and a semicolon
following, as in &copyright;. There cannot be any whitespace between the name and its delimiters.
Note that the ampersand character
is reserved for this role in XML. If we need to use an ampersand
for something else in a document, we must use the predefined
entity for the character, &amp;.
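Putting the pieces together, a complete document that declares the copyright entity in its internal DTD subset and then refers to it might look like the sketch below. The element names page, title and footer are assumptions made for the example and do not come from the tutorial.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE page [
  <!-- Hypothetical element types for the example -->
  <!ELEMENT page (title, footer)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT footer (#PCDATA)>
  <!-- Internal general entity: name plus replacement text -->
  <!ENTITY copyright "© Image Online, 2001-2003">
]>
<page>
  <!-- A literal ampersand must be written with the predefined entity -->
  <title>Terms &amp; Conditions</title>
  <!-- The parser substitutes the replacement text for this reference -->
  <footer>&copyright;</footer>
</page>
```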
<!ENTITY Entity1 SYSTEM "http://www.vvco.com/boilerplate/copyrighttext.txt">
General entities also have an external form,
where the replacement text is kept in an external file. The
declaration takes the form shown in the figure. The keyword
SYSTEM indicates an external source and is followed by the URL of the file.
Lastly, entities must not contain references
to themselves, directly or indirectly.
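As a sketch of how the external form is used, the document below declares Entity1 and refers to it in its content. The element name report and the entity names first and second are hypothetical. Keep in mind that a non-validating parser is not obliged to fetch external entities, so the result may depend on the parser in use.

```xml
<!DOCTYPE report [
  <!-- "report" is a hypothetical element type -->
  <!ELEMENT report (#PCDATA)>
  <!-- External parsed entity: the replacement text lives in the file at this URL -->
  <!ENTITY Entity1 SYSTEM "http://www.vvco.com/boilerplate/copyrighttext.txt">
  <!-- A pair such as the one below would break the rule against recursion,
       because expanding "first" requires "second" and expanding "second"
       requires "first" again:
       <!ENTITY first "&second;">
       <!ENTITY second "&first;">
  -->
]>
<report>&Entity1;</report>
```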
Parsed entities that are used solely within
the DTD are called parameter entities.
Parameter entities allow the user to easily
reference or change commonly used constructs in the DTD by
keeping them in one place.
This is easier than changing a construct
everywhere it appears in the DTD, although the entity itself must
still be edited whenever the construct is extended.
In the declaration of the parameter entity peopleParameters,
the keyword CDATA refers to character data.
The replacement text is part of an attribute list declaration
containing three common attributes. It is processed as if
it had been written directly into the DTD. Whenever this set of attributes
turns up in the DTD, we can simply refer to the entity peopleParameters.
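The declaration described here is not reproduced in the text. Based on the three attributes (age, weight and height) that the tutorial lists for this entity, it could plausibly be written as in the following sketch; treat it as an assumed reconstruction rather than the original figure.

```xml
<!-- Assumed reconstruction of the peopleParameters declaration -->
<!-- The percent sign after ENTITY marks this as a parameter entity -->
<!ENTITY % peopleParameters
  "age    CDATA #IMPLIED
   weight CDATA #IMPLIED
   height CDATA #REQUIRED">
```

Note that XML allows a parameter entity reference inside another markup declaration only in the external DTD subset, so a reference to this entity from within an ATTLIST declaration belongs in an external DTD file rather than in a document's internal subset.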
All parameter entities must be declared
before they are referred to in the DTD.
This means that a parameter entity declared
in the external subset of the DTD cannot be referred to in
the internal subset: the parser reads the internal subset first,
so the reference would be seen before the declaration.
A parameter entity reference consists of
the name delimited by a percent sign in front of the name
and a semicolon following. There cannot be any whitespace
between the delimiters and the name.
Thus the reference for the peopleParameters example
would be as shown:
<!ATTLIST InsuredPerson
  %peopleParameters;
  carrier CDATA #REQUIRED>
Here the InsuredPerson element is declared to have four attributes:
carrier, which is explicitly declared, and the other three,
namely age, weight and height, which appear in the parameter
entity and are declared when the parser substitutes the replacement
text for the entity reference.
The above example is thus equivalent to the following:
<!ATTLIST InsuredPerson
  age CDATA #IMPLIED
  weight CDATA #IMPLIED
  height CDATA #REQUIRED
  carrier CDATA #REQUIRED>
All the rules for well-formed documents
apply to parameter entities. The document must be well-formed
after the replacement text has been substituted for the entity reference.
Just as in the case of general entities, parameter
entities can also have replacement text that resides in an external file.
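A minimal sketch of an external parameter entity follows. The entity name sharedAttributes and the file name shared-attributes.ent are hypothetical; the file is assumed to contain complete markup declarations that several DTDs want to share.

```xml
<!-- "sharedAttributes" and the file name are hypothetical -->
<!-- SYSTEM points at the external file that holds the replacement text -->
<!ENTITY % sharedAttributes SYSTEM "shared-attributes.ent">

<!-- Pull the declarations stored in that file into the DTD at this point -->
%sharedAttributes;
```

Because the reference stands between declarations rather than inside one, this pattern is allowed in both the internal and the external subset.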
| Formal DTD Structure - Elements |
Elements are the heart and soul of XML.
Element types are declared in DTDs using
the ELEMENT declaration. In addition to the keyword, the declaration provides
a name for the declared type and a content specification.
Element type names are subject to the restrictions
that apply to names throughout XML. Names may use letters,
digits and the punctuation marks colon, underscore, hyphen and
period. However, names may not begin with a digit, hyphen or period. They may
only begin with a letter, underscore or colon.
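The naming rules can be illustrated with a few element declarations. All of the names below are invented for the example. Names containing a colon are legal but are left out here because the colon is conventionally reserved for namespace prefixes.

```xml
<!-- Allowed: begins with a letter -->
<!ELEMENT person (#PCDATA)>
<!-- Allowed: begins with an underscore -->
<!ELEMENT _draft (#PCDATA)>
<!-- Allowed: digits, hyphen and period may follow the first character -->
<!ELEMENT order-item.v2 (#PCDATA)>
<!-- Not allowed as a name: 2ndAddress (begins with a digit) -->
<!-- Not allowed as a name: .hidden (begins with a period) -->
```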
The element content can be classified into
four categories, namely empty, element, mixed and any.
An empty element contains neither text nor child
elements. It may, however, have attributes.
The empty element is denoted by the keyword EMPTY.
Element content is the condition where the
element contains child elements but no text.
Mixed content, as the name implies, is a mix
of elements and parsed character data (#PCDATA).
Element and mixed are the two types where
we can use structure to express meaning. Both
are indicated with a content model.
If we wish to leave the content of an element
wide open to any content that does not violate XML well-formedness
rules, we declare it using the keyword ANY.
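The four content categories can be compared side by side in a short DTD sketch. The element names are hypothetical and serve only to show the shape of each declaration.

```xml
<!-- Empty: no text and no child elements (attributes are still possible) -->
<!ELEMENT linebreak EMPTY>

<!-- Element content: child elements only, listed in a content model -->
<!ELEMENT address (street, city)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>

<!-- Mixed content: text interleaved with the listed child elements -->
<!ELEMENT paragraph (#PCDATA | emphasis)*>
<!ELEMENT emphasis (#PCDATA)>

<!-- Any: whatever well-formed content the author chooses to put here -->
<!ELEMENT extension ANY>
```

In the mixed-content declaration, #PCDATA must come first and the group must end with an asterisk, which is why the order and number of emphasis elements inside a paragraph cannot be constrained.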
Copyrights : Layout Galaxy All Rights Reserved
No part of this tutorial may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, electrostatic, magnetic tape, mechanical or otherwise, without prior permission in writing from Layout Galaxy.