Chilton::CISD::W3C UK News

Issue 38: February 2001

The XML Information Set

The W3C is known for its major initiatives that extend the capabilities of the Web into new application domains, such as WebTV, the Semantic Web, and Web Services. Equally important, however, is the work of maintaining and elaborating the core standards that hold the Web together. One example is the XML Core Working Group, which develops the technical infrastructure supporting the core XML language, ensuring that XML can be used smoothly and interoperably within the Consortium's other activities and beyond.

One task of this working group has been the development of the XML Information Set, which is now undergoing its second Last Call as a Working Draft. Comments on this draft must be submitted by 23 February 2001, so it seems timely to review the purpose and content of this work.

We know that an XML document can be represented in more than one way. The familiar representations are a text file or a printed document (its "serialized form", in the jargon), but an XML document can also be represented as a tree of nodes in a computer program using the DOM, as the stream of events reported through the SAX API, or even as relational database tables. What W3C want to ensure is that, whatever the representation, the same items of information can be found in it.
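One way to see what the Infoset is for is to read the same document through two of these representations. The Python sketch below assumes only the standard library's xml.dom.minidom and xml.sax modules: it records element names, attributes and character data from a DOM tree and from a SAX event stream, then checks that the two agree. The sample document and the record format are invented for illustration, and only a handful of infoset properties are compared.

```python
import xml.dom.minidom
import xml.sax

# A small, invented sample document.
DOC = b'<?xml version="1.0"?><greeting lang="en">Hello <b>World</b>!</greeting>'


def merge_text(events):
    """Join adjacent text records; SAX may deliver character data in several chunks."""
    merged = []
    for ev in events:
        if ev[0] == "text" and merged and merged[-1][0] == "text":
            merged[-1] = ("text", merged[-1][1] + ev[1])
        else:
            merged.append(ev)
    return merged


def from_dom(data):
    """Walk a DOM tree in document order, recording elements, attributes and text."""
    events = []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                attrs = tuple(sorted(child.attributes.items()))
                events.append(("start", child.tagName, attrs))
                walk(child)
                events.append(("end", child.tagName))
            elif child.nodeType == child.TEXT_NODE:
                events.append(("text", child.data))

    walk(xml.dom.minidom.parseString(data))
    return merge_text(events)


class Recorder(xml.sax.ContentHandler):
    """SAX handler that records the same kind of events as from_dom."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, tuple(sorted(attrs.items()))))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


def from_sax(data):
    handler = Recorder()
    xml.sax.parseString(data, handler)
    return merge_text(handler.events)


# Two very different representations, the same items of information.
assert from_dom(DOC) == from_sax(DOC)
```

Attributes are sorted before comparison because attribute order carries no information in XML.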

The XML Information Set defines:

an abstract data set called the XML Information Set (Infoset). Its purpose is to provide a consistent set of definitions for use in other specifications that need to refer to the information in a well-formed XML document.

In other words, the XML Information Set provides an abstract description of the information items you would expect to find in any representation of an XML document, and of the properties those items should have. Other specifications, whether produced by the W3C or elsewhere, are therefore expected to conform to it: if a specification for a new API or a new XML-based tool uses its own representation or model of XML documents, it should state how it treats each information item and its properties. In this way a common vocabulary can be used across all XML applications.

The Information Items

The XML Information Set defines 17 types of information item that can be found in a well-formed XML document:

The Document Information Item (the only mandatory item)
Element Information Items
Attribute Information Items
Processing Instruction Information Items
Unexpanded Entity Reference Information Items
Character Information Items
Comment Information Items
The Document Type Declaration Information Item
Internal Entity Information Items
External Entity Information Items
Unparsed Entity Information Items
Notation Information Items
Entity Start Marker Information Items
Entity End Marker Information Items
Namespace Information Items
CDATA Start Marker Information Items
CDATA End Marker Information Items

The information items include the familiar components of an XML document, such as elements, attributes, processing instructions and entities, and there is also an item for namespaces. However, because the XML Information Set requires only that a document be well-formed, it does not assume any validation. It may use a DTD for external entity declarations, but there are no items for DTD components such as element declarations.

Each information item has a set of properties describing the information you would reasonably expect to find from the item. For example, the Element information item has the following properties:

namespace name: The namespace name, if any, of the element type.
local name: The local part of the element-type name.
prefix: The namespace prefix part of the element-type name.
children: An ordered list of child information items, in document order.
attributes: An unordered set of attribute information items, one for each of the attributes (specified or defaulted from the DTD) of this element.
namespace attributes: An unordered set of attribute information items, one for each of the namespaces declared either in the start-tag of this element or provided in the DTD for this element type.
in-scope namespaces: An unordered set of namespace information items, one for each of the namespaces in effect for this element.
base URI: The base URI of the element, as computed by the method of XML Base.
parent: The document or element information item which contains this information item in its children property.

Thus each well-formed XML document has a corresponding information set of items. Included in this set of items will be exactly one Document Information Item, representing the whole document.
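The element properties listed above map naturally onto a record type. The Python sketch below shows one hypothetical way to hold them; the class and field names are our own, not part of the specification. To keep it short, it represents character items as one-character strings, uses None in place of the document item as the root element's parent, and simplifies the namespace attributes property to a prefix-to-namespace dictionary. Its in_scope_namespaces method derives the in-scope namespaces property by applying declarations from the outermost element inwards, with the xml prefix always bound.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

XML_NS = "http://www.w3.org/XML/1998/namespace"


@dataclass
class AttributeItem:
    namespace_name: Optional[str]
    local_name: str
    prefix: Optional[str]
    normalized_value: str


@dataclass
class ElementItem:
    namespace_name: Optional[str]
    local_name: str
    prefix: Optional[str]
    # Ordered, in document order; character items are one-character strings here.
    children: list = field(default_factory=list)
    # Unordered in the Infoset; a list is used only for convenience.
    attributes: List[AttributeItem] = field(default_factory=list)
    # Simplified "namespace attributes": prefix (None = default) -> namespace name.
    declared_namespaces: Dict[Optional[str], str] = field(default_factory=dict)
    base_uri: Optional[str] = None
    # None stands in for the document information item.
    parent: Optional["ElementItem"] = field(default=None, repr=False, compare=False)

    def in_scope_namespaces(self) -> Dict[Optional[str], str]:
        """Prefix -> namespace name for every namespace in effect on this element."""
        chain = []
        item = self
        while item is not None:
            chain.append(item)
            item = item.parent
        scope = {"xml": XML_NS}  # the xml prefix is always bound
        for ancestor in reversed(chain):  # outermost first, so inner declarations win
            scope.update(ancestor.declared_namespaces)
        # An empty namespace name (xmlns="") undeclares the default namespace.
        return {p: ns for p, ns in scope.items() if ns}


# Hypothetical tree for
# <p:list xmlns:p="http://example.org/p"><p:item xmlns="http://example.org/d">Hi</p:item></p:list>
root = ElementItem("http://example.org/p", "list", "p",
                   declared_namespaces={"p": "http://example.org/p"})
item = ElementItem("http://example.org/p", "item", "p",
                   declared_namespaces={None: "http://example.org/d"}, parent=root)
root.children.append(item)
item.children.extend("Hi")  # two character information items
```

Here item.in_scope_namespaces() contains three entries (xml, p and the default namespace), while root sees only xml and p.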

An Example

As an illustrative example, consider the following minimal XML Document.

<?xml version="1.0"?>
<msg:message doc:date="19990421"
                xmlns:doc="http://www.doc.example/namespaces/doc"
                xmlns:msg="http://www.message.example/"
>Hello World!</msg:message>

The information set for this XML document contains the following information items:

A document information item.
An external entity information item for the document entity.
Five internal entity information items for the built-in entities amp, lt, gt, quot and apos.
An element information item with namespace name property "http://www.message.example/", local name property "message", and prefix property "msg".
An attribute information item with namespace name property "http://www.doc.example/namespaces/doc", local name property "date", prefix property "doc", and normalized value property "19990421".
Two namespace information items for the http://www.doc.example/namespaces/doc and http://www.message.example/ namespaces, and an implicit namespace information item for the XML namespace, http://www.w3.org/XML/1998/namespace.
Additionally, two attribute information items for the namespace attributes.
Twelve character information items for the character data.

Each information item will have the appropriate properties, especially with regard to the parent-child relation: the element item will be a child of the document item, the character items will be children of the element item, and the attribute items will belong to the element item's attributes and namespace attributes properties.

Note that this abstract view of the document need bear no relation to how the document is actually represented. For example, it would be legitimate to represent the twelve characters as one string, and it would be superfluous to store the five built-in entities with every document. The Information Set presents a conceptual view of the information that does not vary with the representation.
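The item list above can be checked against a real parser. The following is a minimal sketch assuming Python's standard xml.dom.minidom, which performs namespace processing when parsing. It loads the example document and reads back the element's namespace name, local name and prefix, the date attribute, the two namespace declarations and the character data. As the note says, a concrete representation need not mirror the infoset item by item: the DOM hands over the twelve character items as a single string, and the always-bound xml prefix does not appear as an attribute.

```python
import xml.dom.minidom

# Namespace the DOM assigns to xmlns and xmlns:prefix attributes.
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

EXAMPLE = """<?xml version="1.0"?>
<msg:message doc:date="19990421"
                xmlns:doc="http://www.doc.example/namespaces/doc"
                xmlns:msg="http://www.message.example/"
>Hello World!</msg:message>"""

doc = xml.dom.minidom.parseString(EXAMPLE)
message = doc.documentElement

# Element information item: namespace name, local name, prefix.
print(message.namespaceURI, message.localName, message.prefix)

# The doc:date attribute information item.
date = message.getAttributeNodeNS("http://www.doc.example/namespaces/doc", "date")
print(date.localName, date.prefix, date.value)

# Namespace declarations appear as ordinary attributes in the xmlns namespace.
declared = {}
for i in range(message.attributes.length):
    attr = message.attributes.item(i)
    if attr.namespaceURI == XMLNS_NS:
        declared[attr.localName] = attr.value
print(declared)

# The twelve character information items arrive as one Text node.
text = message.firstChild.data
print(repr(text), len(text))
```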

Conclusion

The XML Information Set activity plays a vital role in providing the unifying infrastructure that underpins the integrity of other W3C activities. Without such a common vocabulary to refer to, there is a danger that the many and various activities which build upon the basic structure of XML may diverge in the way they treat these components, and the Web may cease to operate as a universal medium.

W3C Launches Semantic Web Activity

W3C has announced the launch of the Semantic Web Activity. The Semantic Web is a vision: the idea of data on the Web defined and linked in a way that it can be used by machines for automation, integration and reuse. The Web can reach its full potential only if it becomes a place where data can be shared and processed by automated tools as well as by people. Learn more in the Semantic Web Activity statement.

Background

The Web was designed with the goal that it should be useful not only for human-to-human communication but also as a medium in which machines could participate and help. One of the obstacles so far has been the absence of accompanying machine-readable data on the Web that would allow robots and other automated tools to interpret the information it contains.

The W3C's work on the Resource Description Framework (RDF), as an application of XML, has provided a common foundation that many communities are using. RDF is used to put data into the Web in a form that can be processed by machines with less prior arrangement through the use of a common data model and machine-interpretable data schemas. Applications originally targeted at one area can be repurposed and incorporate data collected and published for other objectives. Designers of applications from bibliographic metadata to newsfeed content summaries to Web sitemaps to business-to-business e-commerce have begun to realize the potential of this common Web-based framework.
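To make the idea of a common data model concrete, here is a deliberately simplified Python sketch that treats RDF statements as (subject, predicate, object) triples of strings. The Dublin Core property URIs are real; the resource URI and the sitemap property are invented for illustration, and real RDF adds details (literals versus resources, blank nodes, the RDF/XML syntax) that the sketch ignores.

```python
from typing import Set, Tuple

# An RDF-style statement: (subject, predicate, object).
Triple = Tuple[str, str, str]

DC_TITLE = "http://purl.org/dc/elements/1.1/title"   # Dublin Core title
DC_DATE = "http://purl.org/dc/elements/1.1/date"     # Dublin Core date
IN_SECTION = "http://example.org/sitemap#inSection"  # hypothetical sitemap property

ISSUE = "http://example.org/news/issue38"  # hypothetical resource URI

# Published by a bibliographic application.
bibliographic: Set[Triple] = {
    (ISSUE, DC_TITLE, "W3C UK News"),
    (ISSUE, DC_DATE, "2001-02"),
}

# Published independently by a sitemap application about the same resource.
sitemap: Set[Triple] = {
    (ISSUE, IN_SECTION, "http://example.org/news/"),
}


def objects(graph: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Because both sources share one data model, combining them is just set union,
# and a sitemap tool can reuse the bibliographic title without prior arrangement.
merged = bibliographic | sitemap
```

For example, objects(merged, ISSUE, DC_TITLE) gives the title published by the bibliographic source, even though the query runs over the combined data.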

The Semantic Web approach proposes languages for expressing information and the relationships between information. Initially these languages provide the means for humans to encode meaning in relatively abstract ways that facilitate other machine processing with human intervention. Over time, these languages will accommodate additional formal systems techniques for verification of logical consistency and for reasoning.

The Cambridge Communique describes a view of a layered data model architecture based on the XML foundation. Generic data models, schema languages, and query languages such as those of RDF and application-specific data models can share and build on this XML foundation. The Semantic Web advanced development will demonstrate the potential of this layered architectural approach by targeting specific applications.

In a Semantic Web Roadmap, W3C previously outlined one architectural vision for a partitioning of facilities that will lead to more automation in machine processing of data on the Web. This Activity has specific tasks that will permit the W3C Members to work together with each other and with interested groups outside of the W3C Membership to build the tools that will create the Semantic Web.

W3C Workshop on Web Services Announced

W3C has organized a workshop on Web services to bring together the community interested in XML-based Web service solutions and in the standardization of Web service components. This area may become one of W3C's major activities in the coming years, and may go to the heart of some companies' operations.

The workshop will be held in San Jose, California (USA) on 11-12 April 2001. Workshop registration is open until 1 April 2001; participation limitations and requirements are indicated on the workshop description page. The deadline for W3C Member position papers that are to be included in the workshop program is 12 March 2001.

Scope of the workshop

From the Web's early days, its technologies have been used to provide an interface to distributed services (e.g., HTML forms calling CGI scripts). The advent of XML has accelerated this development and has sparked the emergence of numerous XML-based environments that enable Web services. These environments are starting to encompass the classical components of distributed application environments, such as protocol conventions, security mechanisms, mechanisms to ensure reliable delivery and provide transaction functionality, interface description languages, and marshalling mechanisms, all adapted to the special needs of the Web environment and the requirements of XML.

W3C has recently started to address some of these techniques in the XML Protocol Activity, and in the XML Protocol Working Group. Since the start of this work, Members have expressed interest in expanding the scope to also cover other aspects of an XML-based distributed application environment, such as Web service descriptions. The purpose of the Web services workshop is to gather the community interested in XML-based Web service solutions and standardization of components thereof, which includes both solution providers and users of this technology. The goal of the workshop is to advise the W3C about which further actions (Activity Proposals, Working Groups, etc.) should be taken with regard to Web services.

Topics likely to be discussed at this workshop include, but are not limited to:

Reliable messaging
Security
Privacy of business data
Transactions
Interface definition languages
Discovery of Web service applications
Web service descriptions
Message and protocol semantics
Development environments for Web services
Other components of Web services not yet addressed by the XML Protocol Activity

Workshop on Quality Assurance at W3C Announced

Registration is open through 28 March for the Workshop on Quality Assurance at W3C, to be held in Washington, D.C., USA, on 3-4 April 2001. Participants can share their understanding of Web QA tools and of conformance activities at W3C, and discuss a potential new W3C QA Activity. Position papers should be submitted to the Workshop Chairs by 16 March.

Background

Universality and interoperability are core to W3C's goals and operating principles. For specifications developed at W3C to permit full interoperability and access for all, it is very important that the quality of their implementations be given as much attention as their development.

In 2000, the W3C Team suggested taking a new lead in improving the quality of implementation for W3C technologies and received strong support from the membership. A new Conformance and Quality Assurance Activity is under consideration and as a first step W3C have started gathering and formalizing existing QA efforts for the various languages and protocols they develop (see the QA matrix under development).

As the complexity of W3C specifications and their interdependencies increases, QA will become even more important to ensuring their acceptance and deployment in the market. The past experiences of HTML, CSS or more recently SMIL (all implemented with various degrees of conformance by vendors) are strong incentives to start this activity with due diligence.

A workshop is the natural W3C way of gathering interest and establishing a charter for a new activity, so W3C have decided to hold one in partnership with NIST, a leader in the development of conformance tests, in particular with W3C technologies.

Workshop Goal

The main objective of the workshop is to have W3C, its membership and the Web community involved in QA at large to share their understanding of the state of affairs for Web QA tools, technical and business practices and conformance activities at W3C or related to W3C specifications.

Furthermore, as the start of a new W3C activity is planned, one of the goals is to get feedback on the best course of action within W3C that would improve the quality of W3C specifications' implementation in the field over time (i.e. what will be in the charter of this activity). To that effect, a DRAFT Activity Proposal will be circulated prior to and discussed during the workshop.

Scope

Besides giving shape to this new potential W3C QA activity, there are several areas of interest related to Quality Assurance and Conformance of W3C technologies that W3C would like to hear about at the workshop:

experience in the validation of Web content and documents (e.g. is this CSS page valid?)
online conformance testing of user agents (e.g. is this multimedia player correctly implementing SMIL 1.0?)
quality of W3C specifications themselves (with respect to conformance statements, tutorials, etc.)
conformance testing methodology (e.g. test design and components of a test suite)
certification/labelling of content, products or services
common framework/harness for running tests
coordination with W3C Working Groups developing specifications
IPR and funding model

Position papers focused on general software or business QA practices (unrelated to W3C specifications or to the items above) are not in scope for this workshop.

Should you have questions regarding the workshop, please feel free to contact Daniel Dardailler or Karl Dubost.

WWW10 News

Three announcements:

The Preliminary Program Information of the WWW10 conference is now available online.
Registration for the WWW10 Conference can now be completed securely online. To register, please go to the WWW10 Conference Registration page.

Important dates reminder

Event	Submission deadline
Culture Track Proposal	January 31, 2001
Posters	February 5, 2001
CFP: Workshop on Web Engineering	February 15, 2001
Vendors Track Proposal	February 26, 2001

W3C Membership

The number of Members has risen to 503 (12th February 2001). New Members this month are:

Finnet Group: Private telecom companies have been operating in Finland since the 1880s, when business people and communities founded a large number of local telephone companies; by the 1930s there were over 800 private telephone companies in Finland. The Finnet Association was established in 1921. It is a lobbying and cooperation organisation for the 49 telecom operators and other companies involved in telecommunications in Finland. Finnet Focus Ltd was founded in 1998 and is wholly owned by the Finnet Association. Its fields of activity are training, communications and information services.
fusionOne, Inc.: fusionOne is a pioneer in the development of Internet Sync: next-generation software and services that make information access seamless and simple across multiple communications and computing devices. With fusionOne, users enter information just once in any device with assurance that the information will automatically be updated in their other devices, including cell phones, PCs, and handhelds.
Office of the E-Envoy, London, UK: The Office of the e-Envoy is leading the UK Government's strategy for the information age: the drive to get the UK online.
Harmonia, Inc.: Harmonia pioneers a new, simpler way to help companies make their applications accessible from all types of computing devices: a universal language for all devices called the User Interface Markup Language.
Ikimbo: Founded in October 1999 by Internet entrepreneur Jamey Harvey and seasoned e-business strategist Eric Wimer, Ikimbo, Inc. provides communication and collaboration software and services for enterprises that need more effective business coordination methods.
The Point Group, Lake Zurich, IL, USA: In the digital economy, achieving industry leadership and competitive advantage demands an entirely fresh approach to business integration. The Point Group leads this bold initiative by combining the strategies necessary to excel in the networked business world with the knowledge and reach needed to take on leading-edge challenges worldwide. At our core is a quality methodology, passion, and focus that helps clients build visionary business models, online communities, systems, and continuous improvement processes.
Presearch Incorporated, Fairfax, VA, USA: An innovative, high-technology services and products company (Program Management, Simulation and Modelling, Studies and Analyses, Information and Data Systems, Systems and Software Engineering) serving a diversified client base.
SnowShore Networks, Inc.: building innovative enhanced voice services infrastructure products for next-generation optical IP networks.
Supply Solution Inc.: SupplySolution delivers application services that allow industry partners to cut costs and collaborate more closely on fulfilling direct material requirements. Its i-Supply Service enables a customer to provide suppliers with real-time visibility into its production planning system, via the Internet.
Toyohashi University of Technology, Toyohashi, Japan: Since being established in 1976, TUT has been striving to equip students with the knowledge and skills to confront challenging work of the present and the future. To help prepare students to meet the challenges successfully, TUT has developed an innovative approach to training which emphasizes international exchanges, cooperation with industry, and interdisciplinary education and research.
WebEx Communications, Inc.: WebEx (Nasdaq: WEBX) is the leader in real-time communications infrastructure for Web meetings. Their interactive multimedia communication services meeting-enable the websites of their customers and partners, including corporations, communications service providers, on-line service providers, Web-application vendors and on-line marketplaces. WebEx's services enable end-users to share content and applications spontaneously in a seamless environment with integrated audio, voice and video.