In September 2004 GlobalReach estimated that only 35% of web users are native English speakers; 35.7% use other European languages, while 32.3% use Asian languages. Non-English users will not have browsers that default to an English character set or to the English language, and they may use operating systems that do not support the proprietary character set used to develop a web page. The editorial in the April 2002 Newsletter addressed the issue of stating the character set used on a web page. This editorial addresses the language actually used.
Once the character set is stated, the web page also needs to state the natural language in which it is written. If the natural language is not stated, translation tools may not be able to translate the text automatically; search engines may not be able to filter the page correctly; CSS2 may not be able to render the page as intended; browsers may not be able to select the appropriate font; and the page may not be usable by accessibility aids such as text-to-speech readers. In that last case the owners, authors and hosts could all be liable for prosecution in the UK under the Disability Discrimination Act, and in other countries under their own legislation. The Disability Rights Commission is willing to support test cases brought by individual disabled people, so this legislation should not be dismissed lightly.
Character encoding does not enable unambiguous identification of a natural language. The language attribute unambiguously specifies the 'natural language' of web page content. It should always be used to indicate the primary language of the web page (in the main page container element). If the language changes within the main page container element, the change should also be declared on the sub-container element concerned, e.g. span, div, td or p.
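The way a language declared on the main container is inherited, and then overridden on a sub-container, can be sketched with Python's standard html.parser module. This is a minimal illustration rather than a definitive implementation: the names LanguageRuns, language_runs and SAMPLE are invented for the example, the list of void elements is deliberately short, and an empty lang value is simply treated as "inherit".

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
# (Deliberately short list; an assumption for this sketch.)
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class LanguageRuns(HTMLParser):
    """Collect (language, text) pairs, inheriting lang from enclosing elements."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one (tag, effective language) entry per open element
        self.runs = []   # (effective language, text) in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        inherited = self.stack[-1][1] if self.stack else None
        # A non-empty lang attribute overrides the inherited value.
        self.stack.append((tag, dict(attrs).get("lang") or inherited))

    def handle_endtag(self, tag):
        # Pop back to the matching open element (tolerates unclosed children).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            lang = self.stack[-1][1] if self.stack else None
            self.runs.append((lang, text))


def language_runs(markup):
    parser = LanguageRuns()
    parser.feed(markup)
    parser.close()
    return parser.runs


# The page is British English; one word inside the paragraph is French.
SAMPLE = (
    '<html lang="en-GB"><body>'
    '<p>The French for cat is <span lang="fr">chat</span>.</p>'
    '</body></html>'
)

for lang, text in language_runs(SAMPLE):
    print(lang, repr(text))
```

The span carries its own lang value, so only the word inside it is reported as French; the text before and after it falls back to the language declared on the html element.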
In XML the special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document as shown below. The values of the attribute are language identifiers as defined by IETF RFC 3066.
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
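Identifiers such as en-GB and en-US follow the RFC 3066 syntax: a primary subtag of one to eight letters, followed by any number of hyphen-separated subtags of one to eight letters or digits. A rough way to catch malformed values before they reach a page is sketched below. The name is_well_formed_tag is invented for this example, and the check covers syntax only; it does not confirm that a tag is actually registered.

```python
import re

# RFC 3066 syntax: a primary subtag of 1-8 letters, then zero or more
# hyphen-separated subtags of 1-8 letters or digits.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_well_formed_tag(tag):
    """Return True if tag matches the RFC 3066 syntax (not a registry lookup)."""
    return RFC3066_TAG.fullmatch(tag) is not None


for candidate in ("en-GB", "fr-CA", "en_GB", "en-"):
    print(candidate, is_well_formed_tag(candidate))
```

A common slip this catches is the underscore form (en_GB) borrowed from operating system locale names, which is not valid as a language identifier.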
For HTML 4, language codes are specified by adding the lang attribute to the html tag as shown below for a document in Canadian French.
<html lang="fr-CA">
When serving XHTML as text/html, you should use both the lang attribute and the xml:lang attribute in the html element. The xml:lang attribute is the standard way to identify language information in XML. The following shows how you would mark up the previous example for XHTML 1.0 served as text/html.
<html lang="fr-CA" xml:lang="fr-CA" xmlns="http://www.w3.org/1999/xhtml">
The xml:lang attribute is not actually useful for handling the file as HTML, but takes over from the lang attribute any time you treat the document as XML for, say, scripting or validation.
If you are serving XHTML 1.0 pages as XML (i.e. using a MIME type such as application/xhtml+xml) or serving pages as XHTML 1.1, you do not need the lang attribute, since that attribute is part of the HTML language. The xml:lang attribute alone will suffice.
<html xml:lang="fr-CA" xmlns="http://www.w3.org/1999/xhtml">
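The two XHTML cases can be turned into a small check on the html element. The sketch below assumes the caller already knows how the page is served and passes that in as served_as_xml; the names RootLanguage and root_language_problems are invented for this example, and html.parser is used here only as a lenient way to read the start tag, not as an XML validator.

```python
from html.parser import HTMLParser


class RootLanguage(HTMLParser):
    """Record the attributes of the first html start tag."""

    def __init__(self):
        super().__init__()
        self.root_attrs = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.root_attrs is None:
            self.root_attrs = dict(attrs)


def root_language_problems(markup, served_as_xml):
    """Return a list of problems with the language attributes on <html>."""
    parser = RootLanguage()
    parser.feed(markup)
    parser.close()
    attrs = parser.root_attrs
    if attrs is None:
        return ["no html element found"]

    problems = []
    xml_lang = attrs.get("xml:lang")
    lang = attrs.get("lang")
    if not xml_lang:
        problems.append("xml:lang is missing")
    if not served_as_xml:
        # Served as text/html: lang is needed too, and the two should agree.
        if not lang:
            problems.append("lang is missing")
        elif xml_lang and lang.lower() != xml_lang.lower():
            problems.append("lang and xml:lang differ")
    return problems


XHTML_AS_HTML = '<html lang="fr-CA" xml:lang="fr-CA" xmlns="http://www.w3.org/1999/xhtml">'
XHTML_AS_XML = '<html xml:lang="fr-CA" xmlns="http://www.w3.org/1999/xhtml">'

print(root_language_problems(XHTML_AS_HTML, served_as_xml=False))
print(root_language_problems(XHTML_AS_XML, served_as_xml=True))
print(root_language_problems(XHTML_AS_XML, served_as_xml=False))
```

Comparing the two values case-insensitively is a design choice for this sketch, on the basis that language tags are not case-sensitive.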
Few authors write web pages by hand; most of us rely on editors and other development tools. You should still check that the code these tools produce states the character set and the language of the page. If it does not, you need to decide whether to risk the consequences described above, change your tools, or edit the pages manually.
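One way to carry out that check on a tool's output is a short script that reports which declarations a page contains. The sketch below is an assumption-laden illustration, not a complete audit: the names DeclarationAudit and audit are invented for this example, it looks only at the markup itself, and it therefore cannot see a character set declared in the HTTP Content-Type header.

```python
from html.parser import HTMLParser


class DeclarationAudit(HTMLParser):
    """Find the language on <html> and the charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.language = None
        self.charset = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and self.language is None:
            self.language = attrs.get("lang") or attrs.get("xml:lang")
        elif tag == "meta" and self.charset is None:
            if attrs.get("charset"):
                # Short form: <meta charset="...">
                self.charset = attrs["charset"]
            elif (attrs.get("http-equiv") or "").lower() == "content-type":
                # Long form: content="text/html; charset=..."
                for part in (attrs.get("content") or "").split(";"):
                    name, _, value = part.strip().partition("=")
                    if name.strip().lower() == "charset" and value.strip():
                        self.charset = value.strip()


def audit(markup):
    """Return the declared language and charset, or None where one is absent."""
    parser = DeclarationAudit()
    parser.feed(markup)
    parser.close()
    return {"language": parser.language, "charset": parser.charset}


PAGE = (
    '<html lang="fr-CA"><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'
    '</head><body></body></html>'
)

print(audit(PAGE))
print(audit("<html><head></head><body></body></html>"))
```

A None in either position marks a page that needs a change of tool settings or a manual edit.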
Further guidance on setting the language of web pages is available from W3C in the tutorial Using language information in XHTML, HTML and CSS.
2004-11-30: This year, the World Wide Web Consortium celebrates its tenth anniversary - ten years of its mission to lead the Web to its full potential. On 1 December, W3C Members, Team, invited speakers, and international media gathered in Boston, USA to reflect on the progress of the Web, W3C's central role in its growth, and the risks and opportunities facing the Web during W3C's second decade. "This special anniversary brings the opportunity to acknowledge the impact of the Web and the W3C's stewardship role," said Tim Berners-Lee, W3C Director. "I hope it will also inspire ever more collaboration, creativity, and understanding across the globe." Sign the greeting card, read the press release and read more about the W3C Tenth Anniversary Celebration.
2004-12-09: In a proclamation issued 1 December, Massachusetts Governor Mitt Romney has declared December 2004 to be World Wide Web Consortium Month. Read by COO Steve Bratt at the W3C Tenth Anniversary Celebration, the proclamation cites W3C for "its good work and concern for the diverse users of the Web" and says W3C "earned their respect, trust and support." See the official document and read the full text.
2004-11-18: SFC Open Research Forum (ORF) (in Japanese) is an annual open house event of the Keio Research Institute of Shonan Fujisawa Campus (SFC), Keio University, Japan. At ORF 2004, W3C/Keio organized a talk session, "W3C Forum in ORF," on 24 November. Tatsuya Hagino chaired, and Masayasu Ishikawa, Martin Dürst, Yoshio Fukushige and Kazuhiro Kitagawa gave talks on Web technologies such as Compound Document Formats, Internationalization, the Semantic Web and Social Information Filtering. The event is open to interested companies and the general public.
Browse W3C in the Press. A selection of articles since the last Newsletter:
Browse upcoming W3C appearances and events.
Please welcome: