N1854

ISO/IEC JTC1/SC18/WG8

Document Processing and Related Communication -

Document Description and Processing Languages

TITLE: Extended Naming Rules External Syntax
SOURCE: Rick Jelliffe
PROJECT:
PROJECT EDITOR:
STATUS: WG8 approved statement
ACTION: For adoption as national standards by national bodies as appropriate
Summary of major points: This document gives the syntax recommended by WG8 for external syntax declarations, to support the use of SGML names in non-Latin scripts. Though the syntax given is extra-standard, an SGML document can validly refer to a syntax declaration that uses a non-standard syntax by means of a public identifier.
DATE: 24 May 1996
DISTRIBUTION: WG8 and Liaisons
REFER TO: WG8 N1861
REPLY TO: Dr. James D. Mason
(ISO/IEC JTC1/SC18/WG8 Convenor)
Oak Ridge National Laboratory
Information Management Services
Bldg. 2506, M.S. 6302, P.O. Box 2008
Oak Ridge, TN 37831-6302 U.S.A.
Telephone: +1 423 574-6973
Facsimile: +1 423 574-6983
Network: masonjd@ornl.gov
http://www.ornl.gov/sgml/wg8/wg8home.htm
ftp://ftp.ornl.€

Extended Naming Rules Recommendation

This document describes a recommended extension of SGML known as the "Extended Naming Rules". The extension should be used only in SGML documents for which the normal naming rules are unsuitable (usually because of the size of the natural language character set). An SGML system need not support these Extended Naming Rules in order to be a conforming SGML system.

This recommendation is phrased in terms of revisions to be made to the body of the International Standard ISO 8879:1986. However, these revisions apply only in an entity that is referred to by public identifier in the Syntax parameter of the SGML Declaration. The variant syntax for SGML syntax declarations defined here has the public identifier 'ISO/IEC JTC1/SC18/WG8 N1854//NOTATION Extended Naming Rules//EN'.

Extended Naming Rules

For many languages, the distinction made in production [189] between uppercase and lowercase is not relevant. It is therefore necessary to modify clause 13.4.5 to allow both for an extended character set and for character sets that do not have different cases. The changes required, in the order of their occurrence in 13.4.5, are:

  1. Replace production [189] with:
    [189] naming rules =
     "NAMING", ps+,
     "LCNMSTRT", (ps+, extended naming value)+,
     "UCNMSTRT", (ps+, extended naming value)+,
     ("NAMESTRT", (ps+, extended naming value)+)?,
     "LCNMCHAR", (ps+, extended naming value)+,
     "UCNMCHAR", (ps+, extended naming value)+,
     ("NAMECHAR", (ps+, extended naming value)+)?,
     "NAMECASE", ps+,
     "GENERAL", ps+, ("NO" | "YES"), ps+,
     "ENTITY", ps+, ("NO" | "YES")
  2. In the "where" list, change each occurrence of the phrase "in the literals (if any)" to "identified by the extended naming value (if any)".
  3. Add two new keywords to the "where" list:
    NAMESTRT
    means that each character identified by the extended naming value (if any) has the same effect as a character appearing in both UCNMSTRT and LCNMSTRT.
    NAMECHAR
    means that each character identified by the extended naming value (if any) has the same effect as a character appearing in both UCNMCHAR and LCNMCHAR.
  4. At the end of the clause, add:

    [189.1] extended naming value = parameter literal | character number | character range

    A character number may be used to specify a character that is defined in the syntax-reference character set but is not permitted in an SGML declaration.

    [189.2] character range = character number, ps*, "-", ps*, character number

    Specifying a character range is equivalent to specifying every character number from (and including) the character number that starts the range to (and including) the character number that ends the range.
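The expansion rule for productions [189.1] and [189.2] can be illustrated with a minimal Python sketch. It is not part of the recommendation. The function name `expand_naming_values` is hypothetical, `ps` is simplified to plain whitespace (SGML comments are not handled), a range whose end is below its start is rejected as an assumption of this sketch, and characters inside a parameter literal are mapped to numbers with Python's `ord`, which assumes the syntax-reference character set agrees with Unicode code points.

```python
import re

# One token: a quoted parameter literal, a decimal character number,
# or the hyphen that joins the two ends of a character range.
_TOKEN = re.compile(r"""\s*(?:(?P<lit>"[^"]*"|'[^']*')|(?P<num>\d+)|(?P<dash>-))""")
_CHAR_REF = re.compile(r"&#(\d+);?")


def _literal_numbers(body):
    """Character numbers named by a literal body, resolving &#N; references."""
    numbers = set()
    pos = 0
    for ref in _CHAR_REF.finditer(body):
        numbers.update(ord(ch) for ch in body[pos:ref.start()])
        numbers.add(int(ref.group(1)))
        pos = ref.end()
    numbers.update(ord(ch) for ch in body[pos:])
    return numbers


def expand_naming_values(text):
    """Return the set of character numbers identified by a sequence of
    extended naming values (literals, character numbers, character ranges)."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()

    result = set()
    i = 0
    while i < len(tokens):
        kind, value = tokens[i]
        if kind == "lit":
            result |= _literal_numbers(value[1:-1])  # strip the quotes
            i += 1
        elif kind == "num":
            start = int(value)
            if i + 1 < len(tokens) and tokens[i + 1][0] == "dash":
                if i + 2 >= len(tokens) or tokens[i + 2][0] != "num":
                    raise ValueError("character range has no end number")
                end = int(tokens[i + 2][1])
                if end < start:
                    # Assumption of this sketch: reversed ranges are errors.
                    raise ValueError("character range ends before it starts")
                result.update(range(start, end + 1))  # both ends included
                i += 3
            else:
                result.add(start)
                i += 1
        else:
            raise ValueError("character range has no start number")
    return result
```

For instance, `expand_naming_values("12353-12435 12540")` yields the 83 numbers from 12353 to 12435 inclusive plus 12540, which in ISO/IEC 10646 correspond to the hiragana letters and the prolonged sound mark.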

Background (removed from original header section)

Note:
This document specifies the syntax recommended by WG8 for external syntax declarations, to support the use of SGML names in non-Latin scripts. Though the syntax given is extra-standard, an SGML document can validly refer to a syntax declaration that uses a non-standard syntax by means of a public identifier.

The following statements occur in clause 0.2 of ISO 8879:1986:

  1. There must be no national language bias.

    The characters used for names can be augmented by any special national characters.

This is contradicted by the restriction, in production [189] of the current specification, that name characters can be specified only by a single parameter literal whose length may not exceed 240 characters. Characters outside the ISO 646 character set have to be specified using numeric character references, and at six characters per reference (for example, "&#200;") no more than 40 additional name characters can be specified. This is clearly insufficient to support most languages, especially those with large character sets such as Japanese, Chinese and Korean.
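The limit of 40 follows from simple arithmetic, sketched below in Python. The function name `max_numeric_refs` and its parameters are hypothetical. The sketch assumes every reference is written in full as "&#", the decimal digits, and a closing ";".

```python
def max_numeric_refs(literal_length=240, digits=3):
    """How many numeric character references fit in one parameter literal.

    Assumes each reference is written as "&#" + digits + ";".
    """
    ref_length = len("&#") + digits + len(";")
    return literal_length // ref_length


# Three-digit character numbers (128 to 999) take six characters each.
print(max_numeric_refs())          # 40
# Five-digit character numbers, as needed for most ideographs, take eight.
print(max_numeric_refs(digits=5))  # 30
```

Under the same assumption, five-digit character numbers reduce the limit to 30 per literal, so the restriction weighs most heavily on scripts with large character repertoires.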