While discussing data flow models, we saw
that there are two kinds of data in the system: data stores
and message flows. XML is useful for both kinds of data, but
the design considerations are rather different for each: one is
XML for messages, the other XML for persistent data.
Using XML for messages poses
fewer design problems than using it for persistent data.
This is mainly because each message is fairly
self-contained, and the question of what to include in a message
usually falls out naturally from the process model. The term
message is used here in a very general sense; it might, for example,
be an EDI-style message sent between organizations to represent a transaction.
There are some general rules that apply
to all XML messages, whatever their precise role might be.
The design must reflect the information
and not the intended use. This is because the use of the information
may change over time, whereas the information content is more
likely to remain stable. This applies particularly to presentation details.
The design must foresee change. The design
of XML itself helps in this area by avoiding
traditional drawbacks such as fixed-size fields and fixed
column ordering. But the document designer also has the responsibility
to structure information in a way that foresees change.
Instead of inventing a message, it is better
to use a standard message if one exists. An increasing
range of standardized messages is available, for example,
from the BizTalk initiative (http://www.biztalk.org).
The data encoding should be as close to
the natural encoding of the data as can be achieved within performance constraints.
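The first two rules can be illustrated with a short Python sketch using the standard library's xml.etree.ElementTree module. The message vocabulary (order, customer, item, giftWrap) is hypothetical and invented purely for illustration. The message carries information only, with no presentation details, and the reader looks elements up by name, so a later version that adds or reorders elements does not break it.

```python
import xml.etree.ElementTree as ET

# Hypothetical order message: element names are invented for illustration.
# It records what the order is, not how it should be displayed.
MESSAGE_V1 = """
<order number="1001">
  <customer>Acme Ltd</customer>
  <item sku="A-17" quantity="3"/>
</order>
"""

# A later version adds an element and changes the order of the children.
MESSAGE_V2 = """
<order number="1002">
  <item sku="B-02" quantity="1"/>
  <giftWrap>yes</giftWrap>
  <customer>Bolt plc</customer>
</order>
"""


def read_order(xml_text):
    """Extract the fields this application needs, ignoring anything else."""
    root = ET.fromstring(xml_text)
    return {
        "number": root.get("number"),
        # findtext locates the child by name, so its position does not matter.
        "customer": root.findtext("customer"),
        "items": [
            (item.get("sku"), int(item.get("quantity")))
            for item in root.findall("item")
        ],
    }


print(read_order(MESSAGE_V1))
print(read_order(MESSAGE_V2))
```

Because the reader never depends on element position or on the absence of unknown elements, both versions of the message are handled by the same code.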
The dynamic information model determines
the design of messages. By contrast, for persistent data,
it is the static model that is important.
The first thing to decide is
the size of each document. The most difficult part of the
design is deciding what the granularity of the data should be
and what needs to go into each document. There are some applications
where it makes sense to have a single XML document run into
gigabytes of data, but in such a case it will be necessary to
parse the whole document, which might take hours. At the opposite
extreme, having a very large number of small documents is usually not ideal either.
When XML documents are used for persistent data,
finding information will always be a two-part operation: first
find the right document, then find the information of interest within it.
To locate the right document there are four options.
First, use the directory structure in the
operating system to locate the documents.
Second, link the documents to each other,
as in a traditional web site where documents are always
found by following links, but typically in a more structured manner.
Third, index the documents from a relational
database. In this case, we have the choice of holding the
XML documents in files referenced from the database, or holding
them in the database itself.
Fourth, index the documents using a free-text
search engine. A number of search engines offer native support for XML.
Another option would be to use a so-called
XML server. An XML server holds the XML data not in
raw unparsed form but in the form of a persistent DOM; that
is, it stores the nodes of the Document Object Model as objects
in an object database.
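The two-part lookup can be sketched for the first option, the directory structure. This is a minimal Python illustration, not a recommended storage design: the layout customers/<id>.xml and the function names store_customer and find_city are assumptions made for the example.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def store_customer(root_dir, customer_id, name, city):
    """Write one customer per file; the file path acts as the index."""
    customer = ET.Element("customer", id=customer_id)
    ET.SubElement(customer, "name").text = name
    ET.SubElement(customer, "city").text = city
    # Hypothetical layout: <root_dir>/customers/<id>.xml
    path = Path(root_dir) / "customers" / f"{customer_id}.xml"
    path.parent.mkdir(parents=True, exist_ok=True)
    ET.ElementTree(customer).write(str(path), encoding="utf-8", xml_declaration=True)
    return path


def find_city(root_dir, customer_id):
    # Part 1: locate the right document from the directory structure.
    path = Path(root_dir) / "customers" / f"{customer_id}.xml"
    # Part 2: parse only that document and pick out the fact we want.
    return ET.parse(str(path)).getroot().findtext("city")


with tempfile.TemporaryDirectory() as tmp:
    store_customer(tmp, "c042", "Acme Ltd", "Leeds")
    store_customer(tmp, "c043", "Bolt plc", "York")
    print(find_city(tmp, "c043"))
```

Only the one small document that is needed gets parsed, which is the practical argument for choosing a moderate document granularity.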
| Mapping the Information Model to XML |
This section deals with how to map the
different parts of the information model to an XML document
structure. We begin with the representation of object
types. Generally, an object type in the information model
will translate into an element type in the XML structure. We can
use the name of the object type as the element name, or an abbreviation of it.
Most people use short names for their elements,
not to save space, but because the XML seems more
readable that way, perhaps because the tags distract less
from the content. The advantage of using a specific element type
for each object type is that the DTD can define precisely which attributes
and child elements are associated with that element.
Nested elements in the XML document structure
can be used to represent some of the relationships in the model.
The obvious ones to represent this way are the "contains" relationships.
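As a minimal sketch of this mapping, the following Python fragment turns a small in-memory model into nested elements. The model (an order that contains lines) and the element names are hypothetical, chosen only to show a "contains" relationship becoming element nesting.

```python
import xml.etree.ElementTree as ET

# Hypothetical model: an order contains order lines.
order = {
    "number": "1001",
    "lines": [
        {"product": "Widget", "quantity": 3},
        {"product": "Gadget", "quantity": 1},
    ],
}


def order_to_xml(order):
    """Map the 'order contains lines' relationship onto element nesting."""
    order_el = ET.Element("order", number=order["number"])
    for line in order["lines"]:
        # Each contained object becomes a child element of its container.
        line_el = ET.SubElement(order_el, "line")
        ET.SubElement(line_el, "product").text = line["product"]
        ET.SubElement(line_el, "quantity").text = str(line["quantity"])
    return order_el


order_el = order_to_xml(order)
print(ET.tostring(order_el, encoding="unicode"))
```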
There are several ways to represent a link
from one element to another in XML. We can use ID and IDREF attributes,
or XPointer references, which are analogous to the HREF attribute in
HTML. We can also use application-defined primary keys and
foreign keys in XML documents.
All three approaches have their own
merits. The main advantage of using ID and IDREF is that the validation
is done by a validating XML parser.
XPointer references are much more flexible
than ID and IDREF, but they are not yet fully standardized.
The option of handling relationships through
primary and foreign keys is a perfectly viable approach, but
the XML parser does not give any help in this matter.
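The application-defined key approach might look like the following Python sketch. The attribute names key and authorRef are assumptions made for the example. Python's xml.etree.ElementTree does not validate against a DTD, so the application builds the key index and detects dangling references itself, which is exactly the work a validating parser would otherwise do for ID and IDREF.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "key" plays the role of a primary key and
# "authorRef" the role of a foreign key. The second book's reference dangles.
DOC = """
<library>
  <author key="a1"><name>Jane Roe</name></author>
  <author key="a2"><name>John Doe</name></author>
  <book title="XML Basics" authorRef="a2"/>
  <book title="Data Models" authorRef="a9"/>
</library>
"""

root = ET.fromstring(DOC)

# Build an index from primary key to element.
authors = {a.get("key"): a for a in root.findall("author")}


def author_name(book):
    """Follow the foreign key; return None if it points nowhere."""
    target = authors.get(book.get("authorRef"))
    return None if target is None else target.findtext("name")


for book in root.findall("book"):
    print(book.get("title"), "->", author_name(book))

# The parser accepted the document; finding bad references is up to us.
dangling = [b.get("title") for b in root.findall("book") if author_name(b) is None]
print("dangling references:", dangling)
```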
When we have identified a property in the
information model, a dilemma arises: should we represent it
in the XML document using an XML attribute or using a nested
element? There are no firm rules, and we are free
to choose either. The table gives the pros and cons of each approach.
| | Advantages | Disadvantages |
|---|---|---|
| XML attributes | DTD can constrain the values; useful when there is a small set of allowed values, such as "yes" or "no". | Simple string values. No support for metadata (or attributes of attributes). |
| | DTD can define a default value. | Unordered. |
| | ID and IDREF validation. | |
| | Lower space overhead (makes a difference when sending gigabytes of data over the network). | |
| | Whitespace normalization available for certain data types, which saves the application some parsing effort. | |
| | Easier to process through the DOM and SAX interfaces. | |
| Child elements | Support arbitrarily complex values and repeating values. | Slightly higher space usage. More complex programming. |
| | Ordered. | |
| | Support "attributes of attributes". | |
| | Extensible when the data model changes. | |
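A few of the trade-offs in the table can be seen in a short Python sketch. The product and colour names are hypothetical. The attribute form is read in a single call but holds only one simple string, while the child-element form can repeat, keeps its order, and can carry attributes of its own.

```python
import xml.etree.ElementTree as ET

# The same property ("colour") encoded two ways; names are hypothetical.
AS_ATTRIBUTE = '<product colour="red"/>'
AS_ELEMENT = """
<product>
  <colour source="catalogue">red</colour>
  <colour source="supplier">crimson</colour>
</product>
"""

attr_product = ET.fromstring(AS_ATTRIBUTE)
elem_product = ET.fromstring(AS_ELEMENT)

# Attribute: one simple string value, read in a single call.
print(attr_product.get("colour"))

# Child elements: may repeat, keep their order, and carry their own attributes.
colours = [(c.get("source"), c.text) for c in elem_product.findall("colour")]
print(colours)
```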
Whether we represent the properties of an object
using elements or attributes, we have to decide
how to encode their values.
Some of the common situations encountered
are quantities such as height, width and weight; yes/no values;
dates and times; property names; and binary data.
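One possible set of encoding conventions for four of these situations is sketched below in Python. The conventions are assumptions chosen for illustration, not rules taken from this tutorial: the unit of a quantity goes in an attribute, yes/no values use the literal strings "yes" and "no", dates use the ISO 8601 form YYYY-MM-DD, and binary data is base64-encoded. The parcel vocabulary is likewise hypothetical.

```python
import base64
import xml.etree.ElementTree as ET
from datetime import date

# Encode: each value is written as text following the assumed conventions.
parcel = ET.Element("parcel", fragile="yes")                # yes/no value
ET.SubElement(parcel, "weight", unit="kg").text = "2.5"     # quantity with unit
ET.SubElement(parcel, "shipped").text = date(2001, 3, 14).isoformat()  # date
ET.SubElement(parcel, "label").text = base64.b64encode(b"\x89PNG").decode("ascii")  # binary

xml_text = ET.tostring(parcel, encoding="unicode")
print(xml_text)

# Decode: the receiving application reverses each convention.
parsed = ET.fromstring(xml_text)
fragile = parsed.get("fragile") == "yes"
weight = (float(parsed.findtext("weight")), parsed.find("weight").get("unit"))
shipped = date.fromisoformat(parsed.findtext("shipped"))
label = base64.b64decode(parsed.findtext("label"))
```

Whatever conventions are chosen, sender and recipient must agree on them, since the XML itself only carries strings.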
| Schema Languages and Notations |
The concept of a schema has been present in
both the database and the document worlds for a long time.
The formal role of a schema is to define the set of all possible
valid documents, or in other words to define what constraints,
beyond those of XML itself, the documents must meet in order to be meaningful.
The first purpose of a schema is to define the
difference between a valid document and an invalid one.
The second purpose of a schema is to document
the interpretation and usage of the constructs provided, so
that the sender and the recipient share a common understanding
of the meaning of the message.
As a constraint language, DTDs are very
limited. They provide some control over which of the elements
can be nested within each other but say nothing about the
text contained within elements.
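To make the limitation concrete: a DTD can declare that a quantity element contains character data, but it cannot require that data to be a positive integer. A check of that kind has to live in application code or in a richer schema language. The following Python sketch shows the application-code route; the element names and the function invalid_quantities are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# A DTD could say <!ELEMENT quantity (#PCDATA)> but nothing more about the
# text, so this sketch enforces "positive integer" in code instead.
POSITIVE_INTEGER = re.compile(r"[1-9][0-9]*")


def invalid_quantities(xml_text):
    """Return the text of every <quantity> that is not a positive integer."""
    bad = []
    for quantity in ET.fromstring(xml_text).iter("quantity"):
        text = (quantity.text or "").strip()
        if not POSITIVE_INTEGER.fullmatch(text):
            bad.append(text)
    return bad


ORDER = """
<order>
  <line><product>Widget</product><quantity>3</quantity></line>
  <line><product>Gadget</product><quantity>lots</quantity></line>
  <line><product>Gizmo</product><quantity>0</quantity></line>
</order>
"""

print(invalid_quantities(ORDER))
```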
Copyrights : Layout Galaxy All Rights Reserved
No part of this tutorial may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, electrostatic, magnetic tape, mechanical or otherwise, without prior permission in writing from Layout Galaxy.