ISO/IEC JTC1/WG4 N1981

ISO/IEC JTC1/WG4

Document Description Languages

TITLE: ISO 8879 TC 3
(C) Copyright 1998 International Organization for Standardization
SOURCE: JTC1/WG4 plenary
PROJECT: JTC1.18.15.1
PROJECT EDITOR: Charles F. Goldfarb
STATUS: Approved by WG4
ACTION: For ballot by JTC1
SUMMARY OF MAJOR POINTS: This document corrects errors and clarifies text in Annex K of ISO 8879 and informative Annex L that were observed during implementation.
DATE: 15 May 1998
DISTRIBUTION: WG4 members
REFER TO: ISO 8879
SUPERSEDES: WG4 N1959, N1963
REPLY TO: Dr. James D. Mason
(ISO/IEC JTC1/WG4 Convenor)
Lockheed Martin Energy Systems
Information Technology Services
1060 Commerce Park
Oak Ridge, TN 37830 U.S.A.
Telephone: +1 423 574-6973
Facsimile: +1 423 574-6983
Network: masonjd@ornl.gov
http://www.ornl.gov/sgml/wg4/
ftp://ftp.ornl.gov/pub/sgml/wg4/

ISO 8879 TC3

Renumber original notes 4 through 12 as 3 through 11.

Renumber original notes 20 through 21 as 22 through 23.

K.1 Conformance

Replace second paragraph of K.1 with:

This annex is organized as a set of replacement and new syntax productions, and is phrased in terms of modifications to be made to the body of this International Standard, though the numbers of the affected clauses are not necessarily cited. However, these modifications are applicable only when conforming to this annex.

Renumber original notes 22 through 23 as 25 through 26.

K.2 Definitions

Replace title and text of K.2.2 with:

K.2.2 Definitions related to validity assertions

K.2.2.1 fully-tagged document instance:

A document instance in which a start-tag with a generic identifier, and an end-tag, are present for every element, and the attribute name is present in every attribute specification in the start-tag.

Note 1: An SGML declaration requires document instances to be fully-tagged if it specifies OMITTAG NO and SHORTTAG STARTTAG EMPTY NO and ATTRIB OMITNAME NO. A system should offer means, such as a parameter to the invocation of processing, to request validation of whether an instance is fully-tagged even when the SGML declaration does not require it to be.
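
The condition in Note 1 can be sketched as a simple check. This is a minimal illustration, not part of the Standard; the class and function names are hypothetical, and the three fields stand for the SGML declaration values that Note 1 names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinimizationSettings:
    """Hypothetical holder for the three SGML declaration values in Note 1."""
    omittag: bool          # True for OMITTAG YES
    starttag_empty: bool   # True for SHORTTAG STARTTAG EMPTY YES
    attrib_omitname: bool  # True for SHORTTAG ATTRIB OMITNAME YES


def requires_fully_tagged(settings: MinimizationSettings) -> bool:
    """Return True when all three features are NO, as stated in Note 1."""
    return not (
        settings.omittag
        or settings.starttag_empty
        or settings.attrib_omitname
    )


# Values as they appear in the SGML declaration for XML in Annex L.
xml_settings = MinimizationSettings(
    omittag=False, starttag_empty=False, attrib_omitname=False
)
print(requires_fully_tagged(xml_settings))
```

A system offering the validation recommended in Note 1 would need a separate check of the instance itself; the sketch only covers what the SGML declaration requires.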

K.2.2.2 fully-declared document instance:

A document instance for which, for every markup declaration required for it by the body of this International Standard, either:

  1. the declaration is explicitly present in the instance's associated document type declaration; or
  2. the DTD properties expressed by that declaration are recognized from DTD data entities.

The document type declaration itself could be an implied declaration, as provided in clause K.4.9.

Note 2: An SGML declaration requires document instances to be fully-declared if it specifies IMPLYDEF ATTLIST NO and ELEMENT NO ENTITY NO NOTATION NO. A system should offer means, such as a parameter to the invocation of processing, to request validation of whether an instance is fully-declared even when the SGML declaration does not require it to be.

K.2.2.3 type-valid document instance:

A document instance that conforms to such markup declarations as are required for it by the body of this International Standard that are present in its associated document type declaration.

K.2.5 Definitions related to DTD notations

Add the following clause:

K.2.5 Definitions related to DTD notations

K.2.5.1 DTD notation:

A data notation that is capable of expressing DTD properties.

Note 1: This International Standard provides markup declarations as the means of expressing DTD properties. It also allows the use of DTD notations, but does not define any.

Note 2: A DTD notation can also be capable of expressing other information. For example, it could express constraints on a document's data content or structure that cannot be expressed by markup declarations but could be validated by an application. However, it is not an SGML error if the document fails to conform to such additional constraints.

Note 3: DTD notations are used in DTD data entities.

K.2.5.2 DTD properties:

Classes and properties that are specifiable by markup declarations. They are defined precisely by the following grove plan:

<!DOCTYPE grovplan PUBLIC "ISO/IEC 10744:1997//DTD Grove
Plan//EN">
<grovplan propset=SGMLProp id=dtdprops>
<title>DTD Properties Grove Plan</title>
<desc>
Classes and properties that are specifiable by DTD declarations. They
are needed to parse and validate a document instance.
</desc>
<inclmod>
prlgabs0 prlgabs1 dtgabs rankabs srabs subdcabs
fpiabs arcabs fsiabs dafeabs gadcabs pelement
</inclmod>
<inclclas>
sgmldoc doctpdcl
</inclclas>
<omitprop classes="sgmldoc">
appinfo epilog
</omitprop>
</grovplan>

K.2.5.3 Parameter entity (4.2.2.5)

An entity that either can be referenced from a markup declaration parameter or that is the document type declaration external subset entity.

K.2.5.4 DTD data entity:

An external parameter entity whose declaration includes a notation name.

K.3 SGML declaration

K.3.2 Version literal

Change the title of K.3.2 to "SGML declaration body".

Replace production 171.1 with:

[171.1] SGML declaration body = ps*,
minimum literal, ps+,
document character set, ps+,
capacity set, ps+,
concrete syntax scope, ps+,
concrete syntax, ps+,
feature use, ps+,
application-specific information,
(ps+, added requirements)?

K.3.5 Markup minimization features

Replace the text of K.3.5 with:

[196] markup minimization features =
"MINIMIZE", ps+,
"DATATAG", ps+, ("NO"|"YES"), ps+,
"OMITTAG", ps+, ("NO"|"YES"), ps+,
"RANK", ps+, ("NO"|"YES"), ps+,
"SHORTTAG", ps+, ("NO"|"YES"|
(start-tag options, ps+, end-tag options, ps+, attribute options)),
(ps+, empty element ending rules,
ps+, implied default declarations)?

Note ??: Use of an enabled markup minimization feature may be affected by the operation of other provisions of this International Standard, including other markup minimization features.

K.3.5.1 SHORTTAG start-tag options

[196.1] start-tag options =
"STARTTAG", ps+,
"EMPTY", ps+, ("NO"|"YES"), ps+,
"UNCLOSED", ps+, ("NO"|"YES"), ps+,
"NETENABL", ps+, ("NO"|"ALL"|"IMMEDNET")
where
EMPTY YES enables empty start-tags.
UNCLOSED YES enables unclosed start-tags.
NETENABL ALL permits NET-enabling start-tags and NULL end-tags; and
IMMEDNET restricts them to elements with empty syntactic content that
  must be ended by a NET, so it cannot be specified if EMPTYNRM NO is
  specified.

Note 12: An element with empty syntactic content need not have been declared EMPTY.
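
The restriction on IMMEDNET can be sketched as a consistency check between two parameters. The function name and the form of the messages are hypothetical; only the permitted keywords and the IMMEDNET/EMPTYNRM rule come from the text of this annex.

```python
from typing import List


def check_starttag_options(netenabl: str, emptynrm: str) -> List[str]:
    """Return a list of problems found in a NETENABL / EMPTYNRM pair."""
    problems = []
    if netenabl not in ("NO", "ALL", "IMMEDNET"):
        problems.append(
            f"NETENABL must be NO, ALL or IMMEDNET, got {netenabl!r}"
        )
    if emptynrm not in ("NO", "YES"):
        problems.append(f"EMPTYNRM must be NO or YES, got {emptynrm!r}")
    # IMMEDNET relies on a NET ending the element, so it cannot be
    # combined with EMPTYNRM NO.
    if netenabl == "IMMEDNET" and emptynrm == "NO":
        problems.append(
            "NETENABL IMMEDNET cannot be specified with EMPTYNRM NO"
        )
    return problems


print(check_starttag_options("IMMEDNET", "YES"))
print(check_starttag_options("IMMEDNET", "NO"))
```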

K.3.5.2 SHORTTAG end-tag options

[196.2] end-tag options =
"ENDTAG", ps+,
"EMPTY", ps+, ("NO"|"YES"), ps+,
"UNCLOSED", ps+, ("NO"|"YES")
where
EMPTY YES enables empty end-tags.
UNCLOSED YES enables unclosed end-tags.

K.3.5.3 SHORTTAG attribute options

[196.3] attribute options =
"ATTRIB", ps+,
"DEFAULT", ps+, ("NO"|"YES"), ps+,
"OMITNAME", ps+, ("NO"|"YES"), ps+,
"VALUE", ps+, ("NO"|"YES")
where
DEFAULT YES enables attribute value defaulting (7.9.1.1).
OMITNAME YES allows attribute names and vi to be omitted for
  unique NMTOKEN values (7.9.1.2).
VALUE YES allows some attribute values to be specified without
  delimiters, rather than as literals (7.9.3.1).

Note 13: DEFAULT NO does not bar default values from attribute definition list declarations.

K.3.6 Empty element ending rules

Replace text of K.3.6 with:

[196.4] empty element ending rules =
"EMPTYNRM", ps+, ("NO"|"YES")
where
EMPTYNRM YES applies normal rules for the presence of end-tags, including
  markup minimization rules, to elements of a type declared EMPTY, or
  that are forced to be EMPTY by an explicit content reference
  attribute (7.3.).

Note 14: Specifying EMPTYNRM YES applies to mandatorily empty elements the same rules about the presence of end-tags that apply to other kinds of elements.

K.3.7 Implicit definitions

Replace title and text of K.3.7 with:

K.3.7 Implied default declarations

[196.5] implied default declarations =
"IMPLYDEF", ps+,
"ATTLIST", ps+, ("NO"|"YES"), ps+,
"DOCTYPE", ps+, ("NO"|"YES"), ps+,
"ELEMENT", ps+, ("NO"|"YES"), ps+,
"ENTITY", ps+, ("NO"|"YES"), ps+,
"NOTATION", ps+, ("NO"|"YES")
where
DOCTYPE YES means that an implied document type declaration includes
  an implied declaration for an external subset entity (see K.4.9); and
YES for the other parameters allows information of the specified type
  to be used in the document instance without an explicit declaration.
  A declaration is implied, as follows:
ATTLIST YES means an undeclared attribute is declared as: CDATA #IMPLIED
ELEMENT YES means an undeclared element type is declared as: - - ANY
ENTITY YES means an undeclared general entity is declared as: SYSTEM
NOTATION YES means an undeclared notation is declared as: SYSTEM

Note 15: IMPLYDEF DOCTYPE YES implies only a declaration for an external subset entity. Other declarations may be implied if needed, and if permitted by the applicable IMPLYDEF parameters.
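
The implied declarations listed above can be sketched as a lookup. Writing each one out as explicit declaration text is an assumption made purely for display (see Note 16, which cautions that an implied declaration is not necessarily the same as an explicit one). The names `IMPLIED_TEMPLATES` and `implied_declaration` are hypothetical.

```python
from typing import Dict, Optional

# Declaration text corresponding to each IMPLYDEF parameter, following
# the "is declared as" wording of K.3.7.
IMPLIED_TEMPLATES = {
    "ATTLIST": "<!ATTLIST {owner} {name} CDATA #IMPLIED>",
    "ELEMENT": "<!ELEMENT {name} - - ANY>",
    "ENTITY": "<!ENTITY {name} SYSTEM>",
    "NOTATION": "<!NOTATION {name} SYSTEM>",
}


def implied_declaration(kind: str, name: str, implydef: Dict[str, str],
                        owner: str = "") -> Optional[str]:
    """Return display text for an implied declaration, or None when the
    corresponding IMPLYDEF parameter is not YES."""
    if implydef.get(kind) != "YES":
        return None
    return IMPLIED_TEMPLATES[kind].format(name=name, owner=owner)


# IMPLYDEF values from the SGML declaration for well-formed XML (Annex L).
xml_implydef = {"ATTLIST": "YES", "DOCTYPE": "NO", "ELEMENT": "YES",
                "ENTITY": "NO", "NOTATION": "YES"}
print(implied_declaration("ELEMENT", "para", xml_implydef))
print(implied_declaration("ATTLIST", "id", xml_implydef, owner="para"))
print(implied_declaration("ENTITY", "chap1", xml_implydef))
```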

When IMPLYDEF ENTITY YES is specified, #DEFAULT cannot be specified as a general entity name.

Implicitly declared definitions occur in the grove in the order in which it is necessary to imply their declarations. No attribute assignment node is created for an attribute unless the attribute is specified.

Note 16: An implied declaration is not necessarily the same as an explicit declaration for the same object that was ignored during parsing (Note 29). Nor does it constrain any explicit declaration that might be created for that object as a result of processing (for example, when generating an explicit DTD for a document that has none).

K.3.8 Other features

Replace text of K.3.8 with:

[198] other features =
"OTHER", ps+,
"CONCUR", ps+, ("NO"|("YES", ps+, number)), ps+,
"SUBDOC", ps+, ("NO"|("YES", ps+, number)), ps+,
"FORMAL", ps+, ("NO"|"YES"),
(urn feature, keeprsre feature, validity feature, entities feature)?

K.3.8.1 Uniform Resource Names

[198.1] urn feature =
ps+, "URN", ps+, ("NO"|"YES")
where
URN YES means public identifiers are interpreted according to
  Internet Engineering Task Force (IETF) RFC 2141, which governs
  Uniform Resource Names.

If both URN and FORMAL are YES, public identifiers are interpreted either as formal public identifiers or as URNs.
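
How a system decides between the two interpretations is not specified here. The following is a rough sketch of one possible heuristic, assuming that a URN is recognized by the "urn:" prefix and namespace identifier of RFC 2141, and a formal public identifier by the "//" separator that follows its owner identifier. The pattern is deliberately simplified (it does not validate the namespace-specific string), and the function name and return values are hypothetical.

```python
import re

# Simplified RFC 2141 shape: "urn:" NID ":" NSS, where the NID is 1 to 32
# letters, digits or hyphens and does not begin with a hyphen.
URN_PATTERN = re.compile(
    r"^urn:[A-Za-z0-9][A-Za-z0-9-]{0,31}:\S+$", re.IGNORECASE
)


def classify_public_identifier(pubid: str, urn: str = "NO",
                               formal: str = "NO") -> str:
    """Return "urn", "fpi" or "other" for a public identifier, given the
    URN and FORMAL settings of the SGML declaration."""
    if urn == "YES" and URN_PATTERN.match(pubid):
        return "urn"
    if formal == "YES" and "//" in pubid:
        return "fpi"
    return "other"


print(classify_public_identifier("urn:isbn:0451450523", "YES", "YES"))
print(classify_public_identifier(
    "ISO 8879:1986//NOTATION Extensible Markup Language (XML) 1.0//EN",
    "YES", "YES"))
```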

K.3.8.2 White space in content

[198.2] keeprsre feature =
ps+, "KEEPRSRE", ps+, ("YES"|"NO")
where
KEEPRSRE YES means clause 7.6.1 does not apply.

Note 17: If KEEPRSRE YES is specified, all white space in mixed content is included in the grove as datachar nodes.

Note 18: This option does not affect delimited strings, such as attribute value literals, which have their own rules for normalizing white space (and which, in any case, do not occur in content).

K.3.8.3 Assertions

[198.3] validity feature = ps+,
"VALIDITY", ps+, ("NOASSERT"|"TYPE")
where
NOASSERT makes no validity assertion.
TYPE asserts that document instances are type-valid.
If the parameter is omitted, TYPE is assumed.

[198.4] entities feature = ps+,
"ENTITIES", ps+, ("NOASSERT"|
                 ("REF", ps+, ("NONE"|"INTERNAL"|"ANY"), ps+,
                 "INTEGRAL", ps+, ("NO"|"YES")))
where
NOASSERT makes no validity assertion.
REF asserts the document either has unconstrained entity references (ANY),
    is external-reference-free (INTERNAL), or is reference-free (NONE).
INTEGRAL YES asserts document instances are integrally-stored.
If the parameter is omitted, NOASSERT is assumed.

It is a reportable markup error if a document is less constrained than is asserted.

Note 19: For example, if an otherwise conforming document incorrectly asserts that it is integrally-stored, the document is non-conforming. Had the assertion not been made, the document would have been conforming.
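
The rule that a document must not be less constrained than asserted can be sketched for the ENTITIES REF parameter by ranking the three keywords. The sketch assumes that the counts passed in already follow this annex's definitions of reference-free and external-reference-free; the function names are hypothetical.

```python
# Ordering from most constrained to least constrained.
REF_RANK = {"NONE": 0, "INTERNAL": 1, "ANY": 2}


def observed_ref_level(internal_refs: int, external_refs: int) -> str:
    """Return the tightest REF keyword that the document satisfies."""
    if external_refs > 0:
        return "ANY"
    if internal_refs > 0:
        return "INTERNAL"
    return "NONE"


def ref_assertion_violated(asserted: str, internal_refs: int,
                           external_refs: int) -> bool:
    """True when the document is less constrained than asserted, which
    is a reportable markup error."""
    observed = observed_ref_level(internal_refs, external_refs)
    return REF_RANK[observed] > REF_RANK[asserted]


print(ref_assertion_violated("NONE", internal_refs=1, external_refs=0))
print(ref_assertion_violated("INTERNAL", internal_refs=3, external_refs=0))
```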

Note 20: A system should offer means, such as a parameter to the invocation of processing, to check whether a particular VALIDITY or ENTITIES assertion could correctly be made for a document, even when the SGML declaration does not make that assertion.

Note 21: To satisfy the classical requirement for SGML conformance, a document instance must be fully-declared using only markup declarations, as well as type-valid.

K.4 General

K.4.4 Attribute definitions

Replace last sentence with:

However, a definition associated with ALL can be overridden by subsequent attribute declarations for specific element types or notations (including declarations specified with IMPLICIT), except when the attribute has already been specified. It is therefore an error to specify a data attribute that was declared with ALL and then to attempt to redeclare that attribute.

K.4.4.2 Data specification

Delete trailing comma from production [145.1].

Add note at end of K.4.4.2:

Note 24: An application may wish to verify that the value of a data specification attribute is meaningful in light of the specified notation name and data attributes, but it is not an SGML error if this is not the case.

K.4.5 Implied document type name

Replace text of K.4.5 with:

[111] document type name =
(generic identifier | (rni, "IMPLIED"))
where
IMPLIED means the document element can have any valid element type
name.

It is a reportable markup error if IMPLIED is specified and the start-tag of the document element is omitted or does not contain a generic identifier.

IMPLIED cannot be specified if either LINK EXPLICIT YES or CONCUR YES is specified in the SGML declaration.

K.4.8.3 SGML subdocument entity

Change production number [2] to [3].

K.4.9 Parsing without respect to DTD declarations

Replace title and text of K.4.9 with:

K.4.9 Omitted prolog (7.1)

If LINK EXPLICIT NO and CONCUR NO are specified in the SGML declaration and either or both of IMPLYDEF DOCTYPE YES and IMPLYDEF ELEMENT YES are specified, the SGML document entity need not contain a prolog. In such cases, a prolog will be implied consisting solely of a single implied document type declaration, as follows:

Note 27: When both the document type name and the external subset entity are implied in a document type declaration, a system may be able to locate an appropriate external subset by considering the storage identifier of the SGML document entity and/or the generic identifier of the document element.

If permitted by the implied default declarations parameter of the SGML declaration, a document type declaration (whether explicit or implied) may lack declarations for element types, attributes, notations, and/or general entities. Declarations are implied for them as provided in K.3.7.

Note 28: Care should be taken to ensure that markup minimization in the instance does not obscure the element structure and cause the parser to misunderstand the intended implied declarations.

Note 29: If some or all of an instance's associated document type declaration should be unavailable or for other reasons ignored during parsing, it is recommended that the constructed grove be the same as if implied declarations had replaced the ignored ones.
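
The condition under which the prolog may be omitted can be sketched as a predicate over four SGML declaration settings. The function name is hypothetical, and each boolean argument is True when the corresponding parameter is YES.

```python
def prolog_may_be_omitted(link_explicit: bool, concur: bool,
                          implydef_doctype: bool,
                          implydef_element: bool) -> bool:
    """Return True when K.4.9 allows the SGML document entity to lack a
    prolog: LINK EXPLICIT NO and CONCUR NO, together with IMPLYDEF
    DOCTYPE YES, IMPLYDEF ELEMENT YES, or both."""
    if link_explicit or concur:
        return False
    return implydef_doctype or implydef_element


# Settings from the Annex L declaration for well-formed XML:
# LINK EXPLICIT NO, CONCUR NO, IMPLYDEF DOCTYPE NO, IMPLYDEF ELEMENT YES.
print(prolog_may_be_omitted(False, False, False, True))
```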

K.4.10 DTD data entities

Add new clause:

K.4.10 DTD data entities

K.4.10.1 DTD notations

To the extent that an SGML system can recognize properties expressed by DTD notations, it shall treat them identically to DTD properties that are expressed by markup declarations.

It is a reportable markup error if a DTD data entity is referenced, its representation of required DTD properties is not understood by the system, and it is not permissible to imply default declarations for them.

Note 4: Therefore, when implied default declarations are permitted, the use of a DTD notation that cannot be understood has the same effect as omitting the equivalent markup declarations.

K.4.10.2 External parameter entity declaration (10.5.5)

The declaration of an external parameter entity can include an entity type that specifies the name of a data notation.

K.4.10.3 External subset entity

Production 110 is replaced with:

[110] document type declaration =
mdo, "DOCTYPE", ps+, document type name,
(ps+, external identifier,
(ps+, ("CDATA"|"NDATA"|"SDATA"), ps+, notation name)?)?,
(ps+, dso, document type declaration subset, dsc)?,
ps*, mdc

The notation name must be declared within the internal subset of the document type declaration.

Note 5: If data attributes are to be specified for a DTD data entity, an external parameter entity must be used rather than an external subset entity.
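
The shape of production [110] can be sketched as a small builder that assembles a document type declaration from its parts. This is an illustration only: the function name, the system identifier "book.rules" and the notation name "rules" are hypothetical, single spaces stand in for ps+, and the check that the notation is declared is reduced to requiring that an internal subset be supplied.

```python
from typing import Optional


def build_doctype(name: str,
                  external_id: Optional[str] = None,
                  notation: Optional[str] = None,
                  entity_type: str = "NDATA",
                  internal_subset: Optional[str] = None) -> str:
    """Assemble a document type declaration following production [110]."""
    if entity_type not in ("CDATA", "NDATA", "SDATA"):
        raise ValueError("entity type must be CDATA, NDATA or SDATA")
    if notation is not None and external_id is None:
        raise ValueError("a notation name requires an external identifier")
    if notation is not None and internal_subset is None:
        raise ValueError("the notation must be declared in the internal subset")
    parts = ["<!DOCTYPE", name]
    if external_id is not None:
        parts.append(external_id)
        if notation is not None:
            parts.extend([entity_type, notation])
    if internal_subset is not None:
        parts.append("[" + internal_subset + "]")
    return " ".join(parts) + ">"


print(build_doctype("book", 'SYSTEM "book.rules"', "rules",
                    internal_subset="<!NOTATION rules SYSTEM>"))
```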

K.4.11 Content model (11.2.4, 11.2.5)

Add new clause:

K.4.11 Content model (11.2.4, 11.2.5)

[129] primitive content token =
( rni , ("PCDATA" | "ALL" | "IMPLICIT")) | element token | data tag group
where
ALL and IMPLICIT have the same meaning as in K.4.4 and are equivalent
to optional repeatable OR groups.

#ALL and #IMPLICIT are allowed in the name groups in [139] and [140]. They have the same meaning as in K.4.4.

Annex L (informative): Additional Requirements for XML

Replace title and introductory text of Annex L with:

Annex L (informative): Added Requirements for XML

This annex illustrates the relationship of SGML declarations to "added requirements" by means of a real-world example, XML. It is not intended as a specification for XML.

L.1 Application summary

Replace first paragraph of L.1 with:

The Extensible Markup Language (XML) is the core subset of SGML functionality developed by the World Wide Web Consortium (W3C) for exchanging SGML documents over the World Wide Web. The current specification for XML can be found at the W3C web site at "http://www.w3.org/TR/" under the title "Extensible Markup Language (XML)".

Replace the one occurrence of "both type-valid" with "fully-declared using only markup declarations, type-valid,".

L.2 SGML Declaration for XML

Replace text of L.2 with:

XML documents implicitly contain an SGML declaration, which differs slightly depending on whether the document is "valid". XML documents that are well-formed but not valid implicitly contain the following SGML declaration.

<!SGML -- SGML Declaration for XML --
     "ISO 8879:1986 (WWW)"
CHARSET
  BASESET
    "ISO Registration Number 176//CHARSET
    ISO/IEC 10646-1:1993 UCS-4 with implementation
    level 3//ESC 2/5 2/15 4/6"
  DESCSET
    0       9       UNUSED
    9       2       9
    11      2       UNUSED
    13      1       13
    14      18      UNUSED
    32      95      32
    127     1       UNUSED
    128     32      UNUSED
    160     55136   160
    55296   2048    UNUSED  -- surrogates --
    57344   8190    57344
    65534   2       UNUSED  -- FFFE and FFFF --
    65536   1048576 65536
CAPACITY NONE

SCOPE DOCUMENT

SYNTAX
  SHUNCHAR NONE
  BASESET "ISO Registration Number 176//CHARSET
          ISO/IEC 10646-1:1993 UCS-4 with implementation
          level 3//ESC 2/5 2/15 4/6"
  DESCSET
    0 1114112 0
  FUNCTION
    RE    13
    RS    10
    SPACE 32
    TAB   SEPCHAR 9
  NAMING
    LCNMSTRT ""
    UCNMSTRT ""
    NAMESTRT
      58 95 192-214 216-246 248-305 308-318 321-328
      330-382 384-451 461-496 500-501 506-535 592-680
      699-705 902 904-906 908 910-929 931-974 976-982
      986 988 990 992 994-1011 1025-1036 1038-1103
      1105-1116 1118-1153 1168-1220 1223-1224
      1227-1228 1232-1259 1262-1269 1272-1273
      1329-1366 1369 1377-1414 1488-1514 1520-1522
      1569-1594 1601-1610 1649-1719 1722-1726
      1728-1742 1744-1747 1749 1765-1766 2309-2361
      2365 2392-2401 2437-2444 2447-2448 2451-2472
      2474-2480 2482 2486-2489 2524-2525 2527-2529
      2544-2545 2565-2570 2575-2576 2579-2600
      2602-2608 2610-2611 2613-2614 2616-2617
      2649-2652 2654 2674-2676 2693-2699 2701
      2703-2705 2707-2728 2730-2736 2738-2739
      2741-2745 2749 2784 2821-2828 2831-2832
      2835-2856 2858-2864 2866-2867 2870-2873 2877
      2908-2909 2911-2913 2949-2954 2958-2960
      2962-2965 2969-2970 2972 2974-2975 2979-2980
      2984-2986 2990-2997 2999-3001 3077-3084
      3086-3088 3090-3112 3114-3123 3125-3129
      3168-3169 3205-3212 3214-3216 3218-3240
      3242-3251 3253-3257 3294 3296-3297 3333-3340
      3342-3344 3346-3368 3370-3385 3424-3425
      3585-3630 3632 3634-3635 3648-3653 3713-3714
      3716 3719-3720 3722 3725 3732-3735 3737-3743
      3745-3747 3749 3751 3754-3755 3757-3758 3760
      3762-3763 3773 3776-3780 3904-3911 3913-3945
      4256-4293 4304-4342 4352 4354-4355 4357-4359
      4361 4363-4364 4366-4370 4412 4414 4416 4428
      4430 4432 4436-4437 4441 4447-4449 4451 4453
      4455 4457 4461-4462 4466-4467 4469 4510 4520
      4523 4526-4527 4535-4536 4538 4540-4546 4587
      4592 4601 7680-7835 7840-7929 7936-7957
      7960-7965 7968-8005 8008-8013 8016-8023 8025
      8027 8029 8031-8061 8064-8116 8118-8124 8126
      8130-8132 8134-8140 8144-8147 8150-8155
      8160-8172 8178-8180 8182-8188 8486 8490-8491
      8494 8576-8578 12295 12321-12329 12353-12436
      12449-12538 12549-12588 19968-40869 44032-55203
    LCNMCHAR ""
    UCNMCHAR ""
    NAMECHAR
      45-46 183 720-721 768-837 864-865 903 1155-1158
      1425-1441 1443-1465 1467-1469 1471 1473-1474
      1476 1600 1611-1618 1632-1641 1648 1750-1764
      1767-1768 1770-1773 1776-1785 2305-2307 2364
      2366-2381 2385-2388 2402-2403 2406-2415
      2433-2435 2492 2494-2500 2503-2504 2507-2509
      2519 2530-2531 2534-2543 2562 2620 2622-2626
      2631-2632 2635-2637 2662-2673 2689-2691 2748
      2750-2757 2759-2761 2763-2765 2790-2799
      2817-2819 2876 2878-2883 2887-2888 2891-2893
      2902-2903 2918-2927 2946-2947 3006-3010
      3014-3016 3018-3021 3031 3047-3055 3073-3075
      3134-3140 3142-3144 3146-3149 3157-3158
      3174-3183 3202-3203 3262-3268 3270-3272
      3274-3277 3285-3286 3302-3311 3330-3331
      3390-3395 3398-3400 3402-3405 3415 3430-3439
      3633 3636-3642 3654-3662 3664-3673 3761
      3764-3769 3771-3772 3782 3784-3789 3792-3801
      3864-3865 3872-3881 3893 3895 3897 3902-3903
      3953-3972 3974-3979 3984-3989 3991 3993-4013
      4017-4023 4025 8400-8412 8417 12293 12330-12335
      12337-12341 12441-12442 12445-12446 12540-12542
  NAMECASE GENERAL NO  ENTITY NO
  DELIM
      GENERAL  SGMLREF
      HCRO     "&#38;#x"
               -- Ampersand followed by "#x" (without quotes) --
      NESTC    "/"
      NET      ">"
      PIC      "?>"
      SHORTREF NONE
  NAMES     SGMLREF
  QUANTITY  NONE
  ENTITIES  "amp" 38 "lt" 60 "gt" 62 "quot" 34 "apos" 39

FEATURES
  MINIMIZE
    DATATAG NO
    OMITTAG NO
    RANK    NO
    SHORTTAG
      STARTTAG EMPTY    NO  UNCLOSED NO  NETENABL IMMEDNET
      ENDTAG   EMPTY    NO  UNCLOSED NO
      ATTRIB   DEFAULT  YES OMITNAME NO  VALUE    NO
    EMPTYNRM YES
    IMPLYDEF ATTLIST  YES DOCTYPE  NO  ELEMENT  YES
             ENTITY   NO  NOTATION YES
  LINK  SIMPLE   NO  IMPLICIT NO  EXPLICIT NO
  OTHER CONCUR   NO  SUBDOC   NO  FORMAL   NO  URN  NO
        KEEPRSRE YES VALIDITY NOASSERT ENTITIES REF ANY  INTEGRAL YES

APPINFO NONE

SEEALSO "ISO 8879:1986//NOTATION
         Extensible Markup Language (XML) 1.0//EN"

>
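
The DESCSET of the document character set above can be checked mechanically. The sketch below copies the thirteen rows from the declaration and confirms that they are contiguous and end at 1114112, the number of positions described by the syntax-reference character set. The function names are hypothetical, and `None` stands for UNUSED.

```python
# (described set start, number of characters, base set start or None).
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 55136, 160),
    (55296, 2048, None),   # surrogates
    (57344, 8190, 57344),
    (65534, 2, None),      # FFFE and FFFF
    (65536, 1048576, 65536),
]


def is_contiguous(descset, total):
    """True when each row starts where the previous one ended and the
    rows together cover positions 0 up to, but not including, total."""
    expected_start = 0
    for start, count, _base in descset:
        if start != expected_start:
            return False
        expected_start = start + count
    return expected_start == total


def used_positions(descset):
    """Number of positions that are mapped to the base set."""
    return sum(count for _start, count, base in descset if base is not None)


print(is_contiguous(DESCSET, 1114112))
print(used_positions(DESCSET))
```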

The SGML declaration for valid XML documents differs in the following parameters.

FEATURES
  MINIMIZE
    IMPLYDEF ATTLIST  NO  ELEMENT  NO  NOTATION NO
  OTHER VALIDITY TYPE
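
The relationship between the two declarations can be sketched by applying the differing parameters to the well-formed values and then testing the condition of K.2.2.2, Note 2. The dictionary layout and function names are hypothetical; the parameter values are those given in this annex.

```python
# Parameters from the declaration for well-formed XML documents.
WELL_FORMED = {
    "IMPLYDEF": {"ATTLIST": "YES", "DOCTYPE": "NO", "ELEMENT": "YES",
                 "ENTITY": "NO", "NOTATION": "YES"},
    "VALIDITY": "NOASSERT",
}

# Parameters that differ for valid XML documents.
VALID_OVERRIDES = {
    "IMPLYDEF": {"ATTLIST": "NO", "ELEMENT": "NO", "NOTATION": "NO"},
    "VALIDITY": "TYPE",
}


def apply_overrides(base, overrides):
    """Return a copy of base with the overriding parameters applied."""
    merged = {key: (dict(value) if isinstance(value, dict) else value)
              for key, value in base.items()}
    for key, value in overrides.items():
        if isinstance(value, dict):
            merged.setdefault(key, {}).update(value)
        else:
            merged[key] = value
    return merged


def requires_fully_declared(implydef):
    """Condition of K.2.2.2, Note 2: the four parameters are all NO."""
    names = ("ATTLIST", "ELEMENT", "ENTITY", "NOTATION")
    return all(implydef[name] == "NO" for name in names)


valid = apply_overrides(WELL_FORMED, VALID_OVERRIDES)
print(valid["VALIDITY"], requires_fully_declared(valid["IMPLYDEF"]))
```

Under these values the declaration for valid XML documents requires instances to be fully-declared, while the declaration for documents that are only well-formed does not.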