CISD and DCI: Literature: W3C UK News (1998-2006)

Issue 38: February 2001

The XML Information Set

The W3C is best known for its major initiatives to extend the Web into new application domains, such as WebTV, the Semantic Web, and Web Services. Equally important, however, is the work of maintaining and elaborating the core standards that hold the Web together. One example is the XML Core Working Group, which develops the technical infrastructure supporting the core XML language and ensures that it can be used smoothly and interoperably within the other activities of the Consortium and beyond.

One task of this working group has been the development of the XML Information Set, which is now undergoing a second Working Draft Last Call. Comments on this draft need to be submitted by 23 February 2001, so it seems timely to review the purpose and content of this work.

An XML "document" can be represented in more than one way. Familiar examples are a text file or a printed document (its "serialized form" in the jargon), but an XML document can also be represented as a structure of nodes in a computer program using the DOM. It could take other forms too, such as the stream of events raised by the SAX API, or a set of relational database tables. What W3C want to ensure is that, whatever the representation of the document, the same items of information can be found in it.
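The point about different representations carrying the same information can be sketched in Python, using the standard library's DOM and SAX implementations. This is only an illustration: the document `DOC` and the helper names `via_dom`, `via_sax` and `Collector` are assumptions made here, not part of any W3C specification, and the sketch handles only a single-element document.

```python
import xml.dom.minidom
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical one-element document, used only for illustration.
DOC = '<note priority="high">Call back</note>'


def via_dom(text):
    """Read the element name, attributes and character data from a DOM tree."""
    root = xml.dom.minidom.parseString(text).documentElement
    attrs = dict(root.attributes.items())
    chars = "".join(
        node.data for node in root.childNodes if node.nodeType == node.TEXT_NODE
    )
    return root.tagName, attrs, chars


class Collector(ContentHandler):
    """Record the same three pieces of information from a SAX event stream."""

    def __init__(self):
        super().__init__()
        self.name = None
        self.attrs = {}
        self.chars = []

    def startElement(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs.items())

    def characters(self, content):
        # SAX may deliver character data in several chunks.
        self.chars.append(content)


def via_sax(text):
    handler = Collector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.name, handler.attrs, "".join(handler.chars)


# Two very different representations, one set of information.
print(via_dom(DOC) == via_sax(DOC))
```

The tree and the event stream look nothing alike to a programmer, yet both expose the element, its attribute and its characters, which is the kind of agreement the Information Set is meant to pin down.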

The XML Information Set specification defines an abstract data set called the XML Information Set (Infoset). Its purpose is to provide a consistent set of definitions for use in other specifications that need to refer to the information in a well-formed XML document.

The XML Information Set therefore provides an abstract description of the information items you would expect to find in any representation of an XML document, and of the properties those items should have. It is something that other specifications, whether produced by the W3C or not, should conform to. If a specification for a new XML-based API or tool uses a representation or model of XML documents, it should state how each information item and its properties are treated. In this way, a common vocabulary can be used across all XML applications.

The Information Items

The XML Information Set defines 17 separate information items which can be found in a well-formed XML document:

  1. The Document Information Item (the only mandatory item)
  2. Element Information Items
  3. Attribute Information Items
  4. Processing Instruction Information Items
  5. Unexpanded Entity Reference Information Items
  6. Character Information Items
  7. Comment Information Items
  8. The Document Type Declaration Information Item
  9. Internal Entity Information Items
  10. External Entity Information Items
  11. Unparsed Entity Information Items
  12. Notation Information Items
  13. Entity Start Marker Information Items
  14. Entity End Marker Information Items
  15. Namespace Information Items
  16. CDATA Start Marker Information Items
  17. CDATA End Marker Information Items

The information items include the familiar components of an XML document, such as elements, attributes, processing instructions and entities. There is also an item for namespaces. However, the XML Information Set applies to any well-formed document: although a document may use a DTD for external entity declarations, the Information Set does not assume any validation, and so there are no items for DTD components such as element declarations.

Each information item has a set of properties describing the information you would reasonably expect to obtain from the item. For example, the Element information item has the following properties:

  1. namespace name: The namespace name, if any, of the element type.
  2. local name: The local part of the element-type name.
  3. prefix: The namespace prefix part of the element-type name.
  4. children: An ordered list of child information items, in document order.
  5. attributes: An unordered set of attribute information items, one for each of the attributes (specified or defaulted from the DTD) of this element.
  6. namespace attributes: An unordered set of attribute information items, one for each of the namespaces declared either in the start-tag of this element or provided in the DTD for this element type.
  7. in-scope namespaces: An unordered set of namespace information items, one for each of the namespaces in effect for this element.
  8. base URI: The base URI of the element, as computed by the method of XML Base.
  9. parent: The document or element information item which contains this information item in its children property.
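As a rough sketch of how an existing API might state its treatment of these properties, the following Python maps a `xml.dom.minidom` element onto most of the properties listed above. The document, its namespace URI and the function name `element_properties` are hypothetical, chosen for illustration; the in-scope namespaces and base URI properties are omitted because minidom does not expose them directly.

```python
import xml.dom
import xml.dom.minidom

# Hypothetical document; the namespace URI is a placeholder.
DOC = (
    '<msg:message xmlns:msg="http://example.org/msg" status="draft">'
    "Hi<msg:body/></msg:message>"
)


def is_namespace_declaration(attr):
    """minidom places xmlns declarations in the reserved xmlns namespace."""
    return attr.namespaceURI == xml.dom.XMLNS_NAMESPACE


def element_properties(el):
    """Map a minidom Element onto some Infoset element properties."""
    attrs = [el.attributes.item(i) for i in range(el.attributes.length)]
    return {
        "namespace name": el.namespaceURI,
        "local name": el.localName,
        "prefix": el.prefix,
        # Children are kept in document order; names stand in for the items.
        "children": [child.nodeName for child in el.childNodes],
        "attributes": {
            a.name: a.value for a in attrs if not is_namespace_declaration(a)
        },
        "namespace attributes": {
            a.name: a.value for a in attrs if is_namespace_declaration(a)
        },
        "parent": el.parentNode.nodeName,
    }


root = xml.dom.minidom.parseString(DOC).documentElement
props = element_properties(root)
for name, value in props.items():
    print(f"{name}: {value!r}")
```

Note that the DOM keeps namespace declarations in the same attribute collection as ordinary attributes, whereas the Information Set separates them into two properties; the sketch makes that difference explicit by filtering on the namespace of each attribute.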

Thus each well-formed XML document has a corresponding information set of items. Included in this set of items will be exactly one Document Information Item, representing the whole document.

An Example

As an illustrative example, consider the following minimal XML Document.

The information set for this XML document contains the following information items:

Each information item will have the appropriate properties, especially with regard to the parent-child relation, so items such as the element item will be children of the document item, and the attribute and character items will be children of the element item.

Note that this abstract view of the document may bear no relation to how the document is actually represented. For example, it would be legitimate to represent the twelve characters as one string, and it would be superfluous to store the five built-in entities with every document. The Information Set presents a conceptual view of the information which does not vary with representation.
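The distinction between the conceptual items and their storage can be sketched in Python. The one-element document below is hypothetical (an assumption for illustration, chosen to have twelve characters of content), and `character_items` is an invented helper name. The DOM stores the content as a single string, while the character information items are derived from it on demand.

```python
import xml.dom.minidom

# Hypothetical document with twelve characters of content.
DOC = "<greeting>Hello world!</greeting>"

root = xml.dom.minidom.parseString(DOC).documentElement

# Storage view: the DOM holds the content as one text node, one string.
text_nodes = [n for n in root.childNodes if n.nodeType == n.TEXT_NODE]


def character_items(element):
    """Yield one (character code, parent name) pair per character of content."""
    for node in element.childNodes:
        if node.nodeType == node.TEXT_NODE:
            for ch in node.data:
                yield ord(ch), element.nodeName


items = list(character_items(root))
print(len(text_nodes), len(items))
```

Nothing is lost by storing one string: each conceptual character item, with its character code and parent, can still be recovered from the representation.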

Conclusion

The XML Information Set activity plays a vital role in providing the unifying infrastructure that underpins the integrity of other W3C activities. Without such a common vocabulary to refer to, there is a danger that the many and various activities which build upon the basic structure of XML may diverge in the way they treat these components, and the Web may cease to operate as a universal medium.


W3C Launches Semantic Web Activity

W3C has announced the launch of the Semantic Web Activity. The Semantic Web is a vision: the idea of data on the Web defined and linked in a way that it can be used by machines for automation, integration and reuse. The Web can reach its full potential only if it becomes a place where data can be shared and processed by automated tools as well as by people. Learn more in the Semantic Web Activity statement.

Background

The Web was designed with the goal that it should be useful not only for human-to-human communication but also that machines would be able to participate and help. One of the current obstacles has been the absence of accompanying data on the Web that would allow robots and other automated tools to interpret the information present there.

The W3C's work on the Resource Description Framework (RDF), as an application of XML, has provided a common foundation that many communities are using. RDF is used to put data into the Web in a form that can be processed by machines with less prior arrangement through the use of a common data model and machine-interpretable data schemas. Applications originally targeted at one area can be repurposed and incorporate data collected and published for other objectives. Designers of applications from bibliographic metadata to newsfeed content summaries to Web sitemaps to business-to-business e-commerce have begun to realize the potential of this common Web-based framework.

The Semantic Web approach proposes languages for expressing information and the relationships between information. Initially these languages provide the means for humans to encode meaning in relatively abstract ways that facilitate other machine processing with human intervention. Over time, these languages will accommodate additional formal systems techniques for verification of logical consistency and for reasoning.

The Cambridge Communique describes a view of a layered data model architecture based on the XML foundation. Generic data models, schema languages, and query languages such as those of RDF and application-specific data models can share and build on this XML foundation. The Semantic Web advanced development will demonstrate the potential of this layered architectural approach by targeting specific applications.

In a Semantic Web Roadmap, W3C previously outlined one architectural vision for a partitioning of facilities that will lead to more automation in machine processing of data on the Web. This Activity has specific tasks that will permit the W3C Members to work together with each other and with interested groups outside of the W3C Membership to build the tools that will create the Semantic Web.

W3C Workshop on Web Services Announced

W3C has organized a workshop on Web services to bring together the community interested in XML-based Web service solutions, and the standardization of Web service components. This activity may become one of the major activities of W3C in the coming years and may go to the heart of operations of some companies.

The workshop will be held in San Jose, California (USA) on 11-12 April 2001. Workshop registration is open until 1 April 2001; participation limitations and requirements are indicated on the workshop description page. The deadline for W3C Member position papers that are to be included in the workshop program is 12 March 2001.

Scope of the workshop

From their early days, Web technologies have been used to provide an interface to distributed services (e.g., HTML forms calling CGI scripts). The advent of XML has accelerated this development, and has sparked the emergence of numerous XML-based environments that enable Web services. These environments are starting to encompass the classical components of distributed application environments such as protocol conventions, security mechanisms, mechanisms to ensure reliable delivery and provide transaction functionality, interface description languages, and marshalling mechanisms, all of which are adapted to the special needs of the Web environment and the requirements of XML.

W3C has recently started to address some of these techniques in the XML Protocol Activity, and in the XML Protocol Working Group. Since the start of this work, Members have expressed interest in expanding the scope to also cover other aspects of an XML-based distributed application environment, such as Web service descriptions. The purpose of the Web services workshop is to gather the community interested in XML-based Web service solutions and standardization of components thereof, which includes both solution providers and users of this technology. The goal of the workshop is to advise the W3C about which further actions (Activity Proposals, Working Groups, etc.) should be taken with regard to Web services.

Topics likely to be discussed at this workshop include, but are not limited to:

Workshop on Quality Assurance at W3C Announced

Registration is open through 28 March for the Workshop on Quality Assurance at W3C to be held in Washington, D.C. USA, on 3-4 April 2001. Participants can share their understanding of Web QA tools, conformance activities at W3C, and discuss a potential new W3C QA Activity. Position papers should be submitted to the Workshop Chairs by 16 March.

Background

Universality and Interoperability are core to W3C's goals and operating principles. In order for specifications developed at W3C to permit full interoperability and access to all, it is very important that the quality of implementation of these standards be given as much attention as their development.

In 2000, the W3C Team suggested taking a new lead in improving the quality of implementation for W3C technologies and received strong support from the membership. A new Conformance and Quality Assurance Activity is under consideration and as a first step W3C have started gathering and formalizing existing QA efforts for the various languages and protocols they develop (see the QA matrix under development).

As the complexity of W3C specifications and their interdependencies increases, QA will become even more important to ensuring their acceptance and deployment in the market. The past experiences of HTML, CSS or more recently SMIL (all implemented with various degrees of conformance by vendors) are strong incentives to start this activity with due diligence.

A workshop is the natural W3C way of gathering interest and establishing a charter for a new activity, so W3C have decided to hold one in partnership with NIST, a leader in the development of conformance tests, in particular for W3C technologies.

Workshop Goal

The main objective of the workshop is for W3C, its membership and the wider Web community involved in QA to share their understanding of the state of affairs for Web QA tools, technical and business practices, and conformance activities at W3C or related to W3C specifications.

Furthermore, as the start of a new W3C activity is planned, one of the goals is to get feedback on the best course of action within W3C that would improve the quality of W3C specifications' implementation in the field over time (i.e. what will be in the charter of this activity). To that effect, a DRAFT Activity Proposal will be circulated prior to and discussed during the workshop.

Scope

Besides giving shape to this new potential W3C QA activity, there are several areas of interest related to Quality Assurance and Conformance of W3C technologies that W3C would like to hear about at the workshop:

Position papers focused on general software or business QA practices (unrelated to W3C specifications or to the items above) are not in scope for this workshop.

Should you have questions regarding the workshop, please feel free to contact Daniel Dardailler or Karl Dubost.

WWW10 News

Three announcements:

W3C Membership

The number of Members has risen to 503 (12th February 2001). New Members this month are:

© Chilton Computing and UKRI Science and Technology Facilities Council webmaster@chilton-computing.org.uk
Our thanks to UKRI Science and Technology Facilities Council for hosting this site