XML fundamentals
Definition
XML = Extensible Markup Language
W3C standard for document markup
- structural and semantic language / document type (describes structure and meaning, not presentation)
static documents that don’t do anything:
not a programming language or network protocol
not a database, but XML documents can be stored in databases
- plain text
portable data, human-readable and machine-readable
- application-specific, extensible
no fixed set of tags
new tags can be defined to suit different needs
- parsing
a parser reads the content out of the document
the document must be well-formed (syntactically correct), otherwise the parser rejects it
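The well-formedness rule can be tried out with a small sketch. It assumes Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser stops at the first syntax error it finds.
        return False


good = "<course><title>XML fundamentals</title></course>"
bad = "<course><title>XML fundamentals</course>"  # <title> is never closed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

Unlike a typical HTML parser, an XML parser does not try to repair the broken document.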
HTML
XML ≠ HTML
- presentation language
- fixed set of tags with predefined meanings
- not extensible
only used for web pages
Fundamentals
Elements and Tags
element = tags + content
- content can be empty, consist of text, elements or be mixed
- tags are case-sensitive
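A minimal sketch of the content kinds and of case-sensitivity, assuming Python's standard xml.etree.ElementTree; the element names (course, empty, title, note, b) are invented for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<course>"
    "<empty/>"                                # empty content
    "<title>XML</title>"                      # text content
    "<note>mixed <b>content</b> here</note>"  # mixed content
    "</course>"                               # <course> itself has element content
)

empty = doc.find("empty")
title = doc.find("title")
note = doc.find("note")

print(empty.text)                    # None
print(title.text)                    # XML
print(repr(note.text))               # 'mixed '
print(note[0].tag, note[0].text)     # b content
print(repr(note[0].tail))            # ' here'

# Tag names are case-sensitive: <Title> is not closed by </title>.
try:
    ET.fromstring("<Title>XML</title>")
    case_mismatch_accepted = True
except ET.ParseError:
    case_mismatch_accepted = False

print(case_mismatch_accepted)        # False
print(doc.find("Title"))             # None, only <title> exists
```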
Attributes
<tagName attributeName="attributeValue">
start tags can have multiple attributes
- attribute order is not significant
- attribute names must be unique within an element
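Both attribute rules can be checked with a short sketch, again assuming Python's standard xml.etree.ElementTree; the course element and its id and lang attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Same attributes, different order: the parsed attribute sets are equal.
a = ET.fromstring('<course id="c1" lang="en"/>')
b = ET.fromstring('<course lang="en" id="c1"/>')
same_attributes = a.attrib == b.attrib
print(same_attributes)  # True

# Repeating an attribute name in one tag is a well-formedness error.
try:
    ET.fromstring('<course id="c1" id="c2"/>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

print(duplicate_accepted)  # False
```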
Allowed names
XML names = element names, attribute names and names of other constructs
- letters, digits, underscore, hyphen and period (non-English letters are allowed)
- must start with a letter or underscore
- the colon (:) is allowed but reserved for namespaces
- must not start with xml (in any casing), which is reserved
- no size limit
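The naming rules can be approximated with a regular expression. This is a simplified sketch, not the exact character classes of the XML specification: it relies on Python's Unicode-aware \w, leaves the colon out because of its namespace role, and the function name is_plain_xml_name is invented for the example.

```python
import re

# First character: a letter or underscore (\w minus digits).
# Remaining characters: letters, digits, underscore, period or hyphen.
_NAME = re.compile(r"[^\W\d][\w.\-]*")


def is_plain_xml_name(name: str) -> bool:
    """Approximate check of the naming rules (colon deliberately excluded)."""
    if _NAME.fullmatch(name) is None:
        return False
    # Names starting with "xml" in any casing are reserved.
    return not name.lower().startswith("xml")


for candidate in ["course", "_id", "first-name.v2", "prüfung",
                  "2nd", "a b", "XmlData"]:
    print(candidate, is_plain_xml_name(candidate))
```

The first four candidates pass; 2nd starts with a digit, a b contains a space, and XmlData uses the reserved prefix.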
Entity references
content must not contain a literal < or & (but can use entity references instead)
(new entities can be defined in addition to the predefined ones)
Mandatory:
&lt; for <
&amp; for &
Optional:
&gt; for >
&quot; for "
&apos; for '
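A short sketch of escaping and unescaping the five predefined references, assuming Python's standard xml.sax.saxutils and xml.etree.ElementTree modules; the sample strings and the code and q element names are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b && b > c"

# escape() replaces &, < and > by default.
escaped = escape(raw)
print(escaped)  # if a &lt; b &amp;&amp; b &gt; c

# The parser turns the references back into the original characters.
elem = ET.fromstring(f"<code>{escaped}</code>")
print(elem.text == raw)  # True

# Quotes are only escaped when asked for via the extra mapping.
quoted = escape('say "hi"', {'"': "&quot;", "'": "&apos;"})
print(quoted)  # say &quot;hi&quot;
print(ET.fromstring("<q>it&apos;s</q>").text)  # it's

# A literal < in content is rejected.
try:
    ET.fromstring("<code>a < b</code>")
    literal_accepted = True
except ET.ParseError:
    literal_accepted = False
print(literal_accepted)  # False
```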
Comments
<!-- comment -->
not an element
a comment must not contain -- (a double hyphen)
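A small sketch of how a parser sees comments, assuming Python's standard xml.dom.minidom (which reports syntax errors as xml.parsers.expat.ExpatError); the sample document is invented.

```python
from xml.dom import Node, minidom
from xml.parsers.expat import ExpatError

doc = minidom.parseString("<course><!-- draft --><title>XML</title></course>")
children = doc.documentElement.childNodes

# The comment is a node of its own kind, not an element.
comment = children[0]
print(comment.nodeType == Node.COMMENT_NODE)      # True
print(children[1].nodeType == Node.ELEMENT_NODE)  # True
print(repr(comment.data))                         # ' draft '

# A double hyphen inside a comment is not well-formed.
try:
    minidom.parseString("<course><!-- a -- b --></course>")
    double_hyphen_accepted = True
except ExpatError:
    double_hyphen_accepted = False
print(double_hyphen_accepted)                     # False
```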
Processing Instructions
<?target instruction?>
e.g. <?xml-stylesheet href="course.css" type="text/css"?>
not an element
used to pass information to applications
the target is the XML name of the application or an identifier
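The split into target and instruction can be inspected with a short sketch, assuming Python's standard xml.dom.minidom; the course root element is added only to make the sample a complete document.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml-stylesheet href="course.css" type="text/css"?><course/>'
)

# The processing instruction sits before the root element and is its own node kind.
pi = doc.firstChild
print(pi.nodeType == Node.PROCESSING_INSTRUCTION_NODE)  # True
print(pi.target)  # xml-stylesheet
print(pi.data)    # href="course.css" type="text/css"

# The root element is still <course>.
print(doc.documentElement.tagName)  # course
```

The parser does not interpret the instruction text; pseudo-attributes such as href are left for the receiving application to read.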
XML Declaration
must be the first thing in the document (if present)
not an element, not a processing instruction
a document should begin with an XML declaration (but it is optional)
e.g. <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
version: XML version used in the document
encoding: character encoding used in the document - default is UTF-8
standalone: whether the document is self-contained (yes) or may rely on external declarations (no) - default is no
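A sketch of reading the three declaration fields back, assuming Python's standard xml.dom.minidom, which exposes them as version, encoding and standalone on the parsed document; the course element and its text are made up.

```python
from xml.dom import minidom

text = (
    '<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>'
    "<course>café</course>"
)
# The bytes must really be in the declared encoding.
data = text.encode("iso-8859-1")

doc = minidom.parseString(data)
print(doc.version)     # 1.0
print(doc.encoding)    # ISO-8859-1
print(doc.standalone)  # True

# The parser used the declared encoding to decode the content.
content = doc.documentElement.firstChild.data
print(content)         # café
```

If the bytes were in a different encoding than the one declared, the text would be decoded wrongly or the parse would fail.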