Chilton::CISD::W3C UK News

Issue 3: March 1998

XML 1.0

The long-awaited XML 1.0 Specification was released in February. XML was created and developed by the W3C XML Working Group, which includes key industry players such as Adobe, ArborText, DataChannel, Inso, Hewlett-Packard, Isogen, Microsoft, NCSA, Netscape, SoftQuad, Sun, Texcel, Vignette, and Fuji Xerox, together with experts in structured documents and electronic publishing.

XML 1.0 is a subset of SGML (Standard Generalised Markup Language, ISO 8879) for use on the Web. It retains SGML's basic features but in a form that is much easier to implement and understand. XML can be processed by existing commercial SGML tools and by a rapidly growing number of free XML-specific tools.

XML is primarily aimed at the large-scale Web content providers for industry-specific mark-up, vendor-neutral data exchange, media-independent publishing, etc. It can also be used in metadata applications. XML is fully internationalised for both European and Asian languages, with all conforming processors required to support the Unicode character set. The language is designed for the quickest possible client-side processing consistent with its primary purpose as an electronic publishing and data interchange format.

The XML 1.0 specification is available at: http://www.w3.org/TR/REC-xml

Key Features

Optimised for Use on the Internet

XML is carefully designed to avoid the requirement for delivery of multiple document components when one will do. All external addressing in the XML domain is via standard Web addresses.

Built on Experience with SGML

XML, while much simpler than SGML and optimised for network applications, remains fully compatible with it, so the substantial base of SGML tools and experience can be reused.

Easy to Process

Programs to process XML are easy to write. Within a few days of the first public draft, freeware implementations arrived on the Internet. The number of implementations is now well into double figures, and is rapidly growing.

Solid Base for Internationalisation

XML avoids the pitfalls of insufficient attention to internationalisation, and of being so general as to impair interoperability. This is made possible by leveraging the use of the Unicode (ISO 10646) standard for internationalised character sets.

A General-purpose Tool

While optimised for network delivery, the design of XML includes many features designed to support authoring, indexing, and other types of application. XML's general applicability is demonstrated by the first wave of applications which concentrate on structured machine-to-machine data interchange, and generalised metadata; none of these applications were particular design targets for the working group.

Designed to Support Automation

Unlike any other Internet data format, the specification of XML includes a precise and rigorous set of rules for error and exception handling. This ensures that XML data will normally be well-formed, and, when errors occur, common fallback procedures can be established.
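As a rough illustration of this behaviour, the Python sketch below uses the standard library's xml.etree.ElementTree parser, which stops with a ParseError at the first well-formedness error instead of guessing what the author meant. The function name check_well_formed and the sample strings are assumptions made for this illustration; they do not come from the XML specification.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (ok, message) for a string that claims to be XML.

    Hypothetical helper for illustration: it reports the first
    well-formedness error found by the parser, with its position.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        line, column = error.position  # where the parser gave up
        return False, f"line {line}, column {column}: {error}"
    return True, "well-formed"


# A correctly nested document, and one with a missing end tag.
for sample in ("<a><b></b></a>", "<a><b></a>"):
    ok, message = check_well_formed(sample)
    print(sample, "->", message)
```

A receiving program can use the result to reject a damaged document outright, which is the kind of common fallback procedure the rules make possible.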

A Simple Example

The main characteristic of XML is user-defined tags. A simple example is:

<?xml version="1.0"?>
<exam>
<question>Who was the last King of England?</question>
<answer>George VI</answer>
<question>How many queens were named Elizabeth?</question>
<answer>Two</answer>
</exam>

Elements are the most common form of mark-up. They are delimited by angle brackets and define the content they enclose. Three elements (question, answer and exam) are used in the above example.
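To show how little code is needed to read such a document, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module. The function name read_exam and the variable EXAM_XML are assumptions made for this illustration, and the sketch assumes that questions and answers appear in matching order.

```python
import xml.etree.ElementTree as ET

# The exam document from the example, held in a string for convenience.
EXAM_XML = """<?xml version="1.0"?>
<exam>
<question>Who was the last King of England?</question>
<answer>George VI</answer>
<question>How many queens were named Elizabeth?</question>
<answer>Two</answer>
</exam>"""


def read_exam(xml_text):
    """Return a list of (question, answer) pairs from an exam document."""
    root = ET.fromstring(xml_text)  # root is the <exam> element
    questions = [q.text.strip() for q in root.findall("question")]
    answers = [a.text.strip() for a in root.findall("answer")]
    # Pair each question with the answer in the same position.
    return list(zip(questions, answers))


for question, answer in read_exam(EXAM_XML):
    print(question, "->", answer)
```

The program needs no knowledge of the tag names beyond the three the two parties have agreed on.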

A document like the exam example could be used for transmitting the answers to an exam paper from one site to another. If the two parties involved have agreed the format, there is no need to specify a formal Document Type Definition (DTD).

However, to formalise the format of the exchange, the DTD for the above would be something like:

<!ELEMENT exam (question, answer)+ >
<!ELEMENT question (#PCDATA)>
<!ELEMENT answer (#PCDATA)>

This states that an exam consists of one or more questions, each followed by an answer, and that every question and answer contains PCDATA (Parsed Character Data).

Libwww

Libwww, W3C's general-purpose Web API, provides a sample implementation of HTTP and other Internet protocols and serves as a testbed for protocol experiments within W3C. It is freely available. See: www.w3.org/Library/Distribution.html.

The recent Release 5.1j is a "second generation" HTTP/1.1 implementation that uses persistent connections, pipelining, smart output buffering, and persistent caching. This was the version of the library used to test the performance of HTTP/1.1, CSS1 and PNG. See: http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html for the impressive results.
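Persistent connections, one of the HTTP/1.1 features mentioned above, can be sketched with Python's standard library rather than with Libwww itself. The example is only an illustration: the handler class EchoPathHandler, the function fetch_over_one_connection and the throwaway local server are all assumptions made for this sketch, and it does not show pipelining, which Python's http.client does not offer.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: replies with the path that was requested."""

    protocol_version = "HTTP/1.1"  # lets the server keep the connection open

    def do_GET(self):
        body = self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_over_one_connection(paths):
    """Fetch every path over a single connection to a local server.

    Returns the response bodies and the number of distinct local ports
    used; one port means the TCP connection was reused throughout.
    """
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPathHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        connection = http.client.HTTPConnection("127.0.0.1", port)
        bodies = []
        local_ports = set()
        for path in paths:
            connection.request("GET", path)
            response = connection.getresponse()
            bodies.append(response.read().decode("ascii"))
            local_ports.add(connection.sock.getsockname()[1])
        connection.close()
        return bodies, len(local_ports)
    finally:
        server.shutdown()
        server.server_close()


bodies, connections_used = fetch_over_one_connection(["/a", "/b", "/c"])
print(bodies, "fetched over", connections_used, "connection(s)")
```

With HTTP/1.0 each of the three requests would normally need its own TCP connection; reusing one connection avoids the repeated set-up cost.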

Libwww is a general code base that can be used as a basis for building a large variety of World Wide Web applications. Its main purpose is to provide services to transmit data objects rendered in many different media types either to or from a remote server using the most common Internet access methods or the local file system. It provides plain C reference implementations of those specifications and is especially designed to be used on a large set of different platforms. Version 3.1 supports more than 20 Unix flavours, VMS and Windows NT, and work is under way to extend the set of platforms.

W3C-LA Technical Workshop Series

The Web of the Future

As part of the Esprit W3C-LA Leveraging Action, a one-day technical workshop is planned at RAL on Monday 27 April 1998. This workshop has been designed to highlight the new tools and techniques that will make up the "Web of the Future".

Topics to be covered include HTTP 1.1, XML, CSS, RDF, PICS, P3P, CGM and SMIL.

Further details can be obtained from the W3C Office at RAL (w3c-ral@inf.rl.ac.uk).

Events in Europe

Events coming up in Europe in the next few weeks:

19-25 March CeBIT '98, Hanover
W3C-LA Awareness Symposia, March-April 1998:
- 30 March: Stockholm
- 1 April: Bonn
- 2 April: Utrecht

On 26 March, the IEE is hosting a lecture by Tim Berners-Lee entitled 'Whither the World-Wide Web?'. Tim will receive an Honorary Fellowship and the Lord Lloyd of Kilgerran Prize at the event. Contact: informatics@iee.org.uk

New Members

Seven new W3C members joined in February. The number of members has now reached 242, with a regional breakdown of:

Region	Full	Affiliate
Americas	31	111
Europe	32	35
Asia-Oceania	15	18

The new members are:

Lotus, now part of IBM, rejoined after W3C relaxed its membership rules. Previously, subsidiaries of large companies could not join W3C in their own right. A subsidiary can now join at the appropriate level, so, for example, ICL is eligible to join W3C despite being part of Fujitsu.
S.W.I.F.T. adds to the banking community that has joined W3C recently. It provides financial processing and communication services to the banking community in 164 countries.
CNGroup is the spin-off from CommerceNet, which has the mission of defining the next generation of Internet Commerce and has a strong interest in XML/EDI activities.
Microstar Software Ltd is a Canadian company concerned with corporate knowledge management using XML.
Enigma is a Massachusetts-based electronic publishing organisation whose INSIGHT software turns large document collections into intelligent electronic publications.
Allaire is another Massachusetts-based company, concentrating on web development tools.
Intelink Management Office is a US government organisation concerned with information dissemination.
The National Center for Biotechnology Information is part of the US National Library of Medicine, with a mission of putting relevant information online.

WWW7

The annual World-Wide Web Conference in Brisbane, Australia is now only a few weeks away (14-18 April). W3C will be running a track throughout the Conference describing many of the W3C activities. W3C will also make a major contribution to Developer's Day on the last day of the Conference. This is the best time of the year to visit Brisbane!