Copyright © 1992, 1997 International Organization for Standardization. All rights reserved.

This electronic document is for use during development and review of International Standards. Official printed copies of International Standards can be purchased from the ISO and the national standards organization of your country.


A.2 Lexical Type Definition Requirements (LTDR)

A.2.3 HyTime lexical model notation (HyLex)


This clause specifies the HyTime lexical model notation used in this International Standard and defines some useful instances of it.

A.2.3.1 Syntax

The syntax of a HyLex model is formally defined as that of an SGML content model, as specified in ISO 8879, conforming fully to the concrete syntax of lexical type sets, with the following differences:

  1. The HyLex model must be a model group, without inclusion or exclusion exceptions, but the grouping delimiters can be omitted if there is no occurrence indicator for the entire model group, or if the entire model group consists of a single lexical type name.

    NOTE 385 Grouping delimiters are mandatory for subordinate model groups.

  2. An alternate form of subordinate model group, opened by DSO instead of GRPO, and closed by DSC instead of GRPC, defines a "match token model". When HyLex is used to search data for a lexical pattern, those portions of the data that satisfy match token models are called "match tokens", and are returned as the result of the search. When HyLex is used to prescribe the lexical type of a piece of data, a match token model has no effect other than that of a subordinate model group.

  3. Neither the HyLex model group nor its subordinate model groups can contain an AND connector.

  4. A content token can be a literal as defined in ISO 8879 or the name of a previously declared lexical type.

  5. Any content token or model group can be preceded by a reserved name indicator and the keyword "NOT". A token or model group so qualified will match any character string except those that it would otherwise match.

    NOTE 386 HyLex does not provide a way to limit the number of characters matched by a #NOT-qualified model.

  6. Any content token or model group not preceded by #NOT can be followed either by an occurrence indicator or by a reserved name indicator and one or more of the following keywords:

    ORDER

    This keyword and the following lexicographic ordering name specify a lexicographic ordering to be applied to data before matching it against the preceding content token or model group.

    CHECK

    This keyword and the following additional lexical constraint name specify an additional lexical constraint to apply to the data matching the preceding content token or model group. Failing an additional lexical constraint has no effect on the matching process.

    NOTE 387 While a content token or model group may not be directly followed by both an occurrence indicator and qualifiers, if a qualified token or model group is placed within a subordinate model group, the new subordinate model group can itself be followed by an occurrence indicator.
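
Some of the restrictions above can be checked mechanically. The following is a minimal sketch, not a conforming parser: it assumes the reference concrete syntax delimiters ("&" for AND, "[" and "]" for DSO and DSC, "?", "*" and "+" as occurrence indicators), and the names `tokenize` and `check_restrictions` are hypothetical. It flags an AND connector (rule 3), a simple #NOT-qualified token followed by an occurrence indicator (rule 6), and a token directly followed by both an occurrence indicator and qualifiers (NOTE 387). #NOT-qualified model groups are not analysed.

```python
import re

# Token classes, assuming reference concrete syntax delimiters.
TOKEN_RE = re.compile(r"""
    (?P<literal>"[^"]*"|'[^']*')
  | (?P<keyword>\#(?:NOT|ORDER|CHECK)\b)
  | (?P<name>[A-Za-z][A-Za-z0-9.\-]*)
  | (?P<delim>[()\[\],|&?*+])
  | (?P<space>\s+)
""", re.VERBOSE)

OCCURRENCE = {"?", "*", "+"}
QUALIFIERS = {"#ORDER", "#CHECK"}


def tokenize(model):
    """Split a HyLex model string into (kind, text) pairs, dropping spaces."""
    pos, out = 0, []
    while pos < len(model):
        m = TOKEN_RE.match(model, pos)
        if m is None:
            raise ValueError(f"unexpected character {model[pos]!r} at offset {pos}")
        if m.lastgroup != "space":
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out


def check_restrictions(model):
    """Return a list of messages for the violations this sketch can detect."""
    toks = tokenize(model)
    problems = []
    for i, (kind, text) in enumerate(toks):
        nxt = toks[i + 1] if i + 1 < len(toks) else ("", "")
        nxt2 = toks[i + 2] if i + 2 < len(toks) else ("", "")
        if kind == "delim" and text == "&":
            problems.append("AND connector is not allowed (rule 3)")
        elif kind == "delim" and text in OCCURRENCE and nxt[1] in QUALIFIERS:
            problems.append("occurrence indicator directly followed by a qualifier")
        elif (kind == "keyword" and nxt[0] in ("name", "literal")
              and nxt2[0] == "delim" and nxt2[1] in OCCURRENCE):
            problems.append(f"{text} item directly followed by an occurrence indicator")
    return problems
```

For instance, `check_restrictions("NAME+ #CHECK ID")` reports one problem, while the form suggested by NOTE 387, `"((NAME #CHECK ID)+,NUMBER)"`, reports none.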

A.2.3.2 Normalized HyLex models

Normalized HyLex models employ a form of markup minimization intended for use when the lexical type to be defined is essentially a list of whitespace-separated tokens. A "normalized" model can be converted to an equivalent unnormalized model by performing the following steps:

  1. Insert "#ORDER SGMLCASE" as a qualifier after each literal content token.

  2. Place each content token of the normalized model in its own match token model, including #ORDER and #CHECK qualifiers, but excluding occurrence indicators.

    NOTE 388 For example, the normalized model "(NAME #CHECK ID,NUMBER+)" becomes "([NAME #CHECK ID],[NUMBER]+)".

  3. Insert an "s+" between each pair of sequential subordinate model groups.

    NOTE 389 For example, the normalized model "(NMTOKEN+,('#ANY',NUMBER)*)" becomes "([NMTOKEN]+,s+,(['#ANY' #ORDER SGMLCASE],s+,[NUMBER])*)".

  4. Replace subordinate model groups followed by PLUS or REP occurrence indicators as follows:

    1. (submodel)+ becomes ((submodel),(s+,(submodel))*)

    2. [submodel]* becomes ([submodel]?,(s+,[submodel])*)

      Note that if the original model was a match token model, the corresponding subordinate model groups in the replacement model are also match token models.

  5. If the HyLex model (i.e. the top level model) is an OR group, turn it into a subordinate model group of a new sequential HyLex model.

  6. Insert an "s*" at the beginning and end of the HyLex model.

NOTE 390 In the conventional comments that define lexical types for attributes and data content in this International Standard, "Lextype" signifies a HyLex model with the normalization attribute specified as "norm", while "Ulextype" signifies an unnormalized HyLex model with the normalization attribute specified as "unorm" (see 5 Notation).
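
The six conversion steps can be sketched as a small recursive-descent translator. This is an illustration under stated assumptions rather than a definitive implementation: it assumes the reference concrete syntax delimiters, handles only names, literals, #ORDER and #CHECK qualifiers on tokens, and groups (no #NOT, no qualifiers on groups), and treats a model written without grouping delimiters as a top-level sequence. All function names are hypothetical.

```python
import re

TOKEN_RE = re.compile(r"""\s*(?:
    (?P<lit>"[^"]*"|'[^']*')                        # literal content token
  | (?P<qual>\#(?:ORDER|CHECK)\s+[A-Za-z][\w.\-]*)  # qualifier plus its name
  | (?P<name>[A-Za-z][\w.\-]*)                      # lexical type name
  | (?P<punct>[(),|?*+])                            # delimiters
)""", re.VERBOSE)


def tokenize(model):
    model, pos, toks = model.strip(), 0, []
    while pos < len(model):
        m = TOKEN_RE.match(model, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        toks.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return toks


def parse_members(toks, i):
    """Parse members joined by a connector until a group close or the end."""
    members, conn = [], ","
    while True:
        node, i = parse_member(toks, i)
        members.append(node)
        if i < len(toks) and toks[i] in (("punct", ","), ("punct", "|")):
            conn = toks[i][1]
            i += 1
        else:
            return members, conn, i


def parse_member(toks, i):
    if i >= len(toks):
        raise ValueError("unexpected end of model")
    kind, text = toks[i]
    if (kind, text) == ("punct", "("):
        members, conn, i = parse_members(toks, i + 1)
        if i >= len(toks) or toks[i] != ("punct", ")"):
            raise ValueError("missing group close")
        node = ["grp", conn, members, ""]
        i += 1
    elif kind in ("lit", "name"):
        # Step 1: a literal gets "#ORDER SGMLCASE" as a qualifier.
        body = text + (" #ORDER SGMLCASE" if kind == "lit" else "")
        i += 1
        while i < len(toks) and toks[i][0] == "qual":
            body += " " + toks[i][1]
            i += 1
        node = ["tok", body, ""]
    else:
        raise ValueError(f"unexpected {text!r}")
    if i < len(toks) and toks[i][0] == "punct" and toks[i][1] in "?*+":
        node[-1] = toks[i][1]  # occurrence indicator stays outside the token
        i += 1
    return node, i


def expand(core, occ):
    """Step 4: rewrite PLUS and REP so repetitions are separated by s+."""
    if occ == "+":
        return f"({core},(s+,{core})*)"
    if occ == "*":
        return f"({core}?,(s+,{core})*)"
    return core + occ


def render(node):
    if node[0] == "tok":
        return expand(f"[{node[1]}]", node[2])   # step 2: match token model
    _, conn, members, occ = node
    sep = ",s+," if conn == "," else "|"         # step 3: s+ in sequences only
    return expand("(" + sep.join(render(m) for m in members) + ")", occ)


def to_unnormalized(model):
    toks = tokenize(model)
    members, conn, i = parse_members(toks, 0)
    if i != len(toks):
        raise ValueError("trailing input after model")
    top = ["grp", conn, members, ""]
    if len(members) == 1 and members[0][0] == "grp" and members[0][3] == "":
        top = members[0]       # model was written with grouping delimiters
    body = render(top)
    if top[1] == ",":
        body = body[1:-1]      # splice the sequence into the new top level
    # Step 5 (an OR group stays wrapped) and step 6 (s* at both ends).
    return f"(s*,{body},s*)"
```

Under these assumptions, `to_unnormalized("(NAME #CHECK ID,NUMBER+)")` yields `(s*,[NAME #CHECK ID],s+,([NUMBER],(s+,[NUMBER])*),s*)`, which carries the result shown in NOTE 388 through steps 3 to 6.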

A.2.3.3 Intrinsic lexical types

HyLex is an SGML-aware lexical modeling language. As such, it defines a set of intrinsic SGML lexical types that are automatically available in any HyLex expression.
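
To give a feel for how some of the intrinsic types declared in this subclause behave, the sketch below approximates NAME, NUMBER, NMTOKEN and NUTOKEN with Python regular expressions. It is only an approximation: the character classes, the NAMELEN value of 8, and upper-casing as the effect of "#ORDER GENERAL" are assumptions taken from the SGML reference concrete syntax, whereas the declarations themselves are relative to the concrete syntax of the referencing document. The names `MODELS` and `classify` are hypothetical. The length check is reported separately from the match result, reflecting the statement that failing an additional lexical constraint has no effect on the matching process.

```python
import re

# Assumed reference concrete syntax values (not stated in this clause).
DIGIT = "0-9"          # Digit
LETTERS = "A-Za-z"     # LCLetter and UCLetter
NAME_EXTRA = r".\-"    # LCNMCHAR and UCNMCHAR
NAMELEN = 8            # reference quantity set

NMSTRT = f"[{LETTERS}]"                      # LCNMSTRT/UCNMSTRT assumed empty
NMCHAR = f"[{LETTERS}{DIGIT}{NAME_EXTRA}]"

MODELS = {
    "NAME": f"{NMSTRT}{NMCHAR}*",      # "csname", i.e. "nmstrt,nmchar*"
    "NUMBER": f"[{DIGIT}]+",           # "Digit+"
    "NMTOKEN": f"{NMCHAR}+",           # "nmchar+"
    "NUTOKEN": f"[{DIGIT}]{NMCHAR}*",  # "Digit,nmchar*"
}


def classify(lextype, data):
    """Return (matched, within_namelen, ordered_data) for one token."""
    # "#ORDER GENERAL": apply the ordering before matching (assumed upper-casing).
    ordered = data.upper()
    matched = re.fullmatch(MODELS[lextype], ordered) is not None
    # "#CHECK NAMELEN": evaluated independently of the match itself.
    within_namelen = len(ordered) <= NAMELEN
    return matched, within_namelen, ordered
```

For example, `classify("NAME", "introduction")` matches the model but exceeds the assumed name length, so it returns `(True, False, "INTRODUCTION")`.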

<!--
This file is identified by the following public identifier:

"ISO/IEC 10744:1997//NONSGML LTDR LEXTYPES SGML Lexical Types//EN"

Unless otherwise specified, all non-model lexical types and
lexicographic orderings are relative to the declared concrete syntax
of the document from which they are referenced.
-->

                <!-- HyTime Lexical Model Notation -->
<!NOTATION HyLex
   PUBLIC "ISO/IEC 10744:1997//NOTATION
           HyTime lexical model notation (HyLex)//EN"
>
<!ATTLIST #NOTATION HyLex
   norm           -- Normalization --
      (norm|unorm)
      norm
>

                <!-- SGML lexicographic orderings -->

<!-- Note: For case-related ordering, the case rules that apply are
     the case rules of the document in which the lexicographic
     ordering (or lexical type that uses the lexicographic ordering)
     is used. -->

<!LEXORD
   SGMLCASE       -- SGML namecase substitution --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXORD Namecase substitution//EN"
>
<!LEXORD
   GENERAL        -- SGML general namecase substitution --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXORD
           General namecase substitution//EN"
>
<!LEXORD
   ENTITY         -- SGML entity namecase substitution --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXORD
           Entity namecase substitution//EN"
>
<!LEXORD
   RCSGENER       -- SGML reference concrete syntax general namecase
                     substitution --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXORD
           Reference concrete syntax general namecase
           substitution//EN"
>

                  <!-- SGML lexical constraints -->

<!LEXCON
   NAMELEN        -- SGML name length constraint --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXCON QUANTITY Name length//EN"
>
<!LEXCON
   PENTLEN        -- SGML parameter entity name length constraint --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXCON
           QUANTITY Parameter entity name length//EN"
>
<!LEXCON
   DTDORLPD       -- SGML DTD or LPD name --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXCON DTD or LPD name//EN"
>
<!LEXCON
   NOTATION       -- SGML Notation name --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXCON Notation name//EN"
>
<!LEXCON
   PARMENT        -- SGML Parameter entity name --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXCON Parameter entity name//EN"
>
<!LEXCON
   ENTITY         -- SGML General entity name --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXCON General entity name//EN"
>
<!LEXCON
   GI             -- SGML Generic identifier --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXCON Generic Identifier//EN"
>
<!LEXCON
   ID             -- SGML Unique identifier --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXCON Unique Identifier//EN"
>
<!LEXCON
   ATTNAME        -- SGML Attribute name --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXCON Attribute name//EN"
>
<!LEXCON
   compname       -- Property set component name --

   SPEC
   PUBLIC "ISO/IEC 10744:1997//NOTATION LEXCON
           Property Set Component Name//EN"
>

                     <!-- SGML Lexical Types -->

<!LEXTYPE
   char           -- Character --

   SPEC
   PUBLIC "ISO/IEC 10744:1997//NOTATION LEXTYPE Character//EN"
>

               <!-- SGML abstract character classes -->

<!LEXTYPE
   Digit          -- SGML digit --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE CLASS Digits (Digit)//EN"
>
<!LEXTYPE
   LCLetter       -- SGML lower-case letter --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Lower-case letters (LCLetter)//EN"
>
<!LEXTYPE
   Special        -- SGML special character --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Special minimum data characters (Special)//EN"
>
<!LEXTYPE
   UCLetter       -- SGML upper-case letter --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Upper-case letters (UCLetter)//EN"
>

               <!-- SGML concrete character classes -->

<!LEXTYPE
   NONSGML        -- Non-SGML characters --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Non-SGML characters (NONSGML)//EN"
>
<!LEXTYPE
   DATACHAR       -- SGML dedicated data characters --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Dedicated data characters (DATACHAR)//EN"
>
<!LEXTYPE
   DELMCHAR       -- SGML delimiter characters --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Delimiter characters (DELMCHAR)//EN"
>
<!LEXTYPE
   FUNCHAR        -- SGML inert function characters --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Inert function characters (FUNCHAR)//EN"
>
<!LEXTYPE
   LCNMCHAR       -- SGML lower-case name characters --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Lower-case name characters (LCNMCHAR)//EN"
>
<!LEXTYPE
   LCNMSTRT       -- SGML lower-case name start characters --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Lower-case name start characters (LCNMSTRT)//EN"
>
<!LEXTYPE
   MSICHAR        -- SGML markup-scan-in-characters --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Markup-scan-in-characters (MSICHAR)//EN"
>
<!LEXTYPE
   MSOCHAR        -- SGML markup-scan-out-characters --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Markup-scan-out-characters (MSOCHAR)//EN"
>
<!LEXTYPE
   MSSCHAR        -- SGML markup-scan-suppress characters --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Markup-scan-suppress characters (MSSCHAR)//EN"
>
<!LEXTYPE
   RE             -- SGML record end character --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Record end character (RE)//EN"
>
<!LEXTYPE
   RS             -- SGML record start character --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Record start character (RS)//EN"
>
<!LEXTYPE
   SEPCHAR        -- SGML separator characters --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Separator characters (SEPCHAR)//EN"
>
<!LEXTYPE
   SPACE          -- SGML space character --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Space character (SPACE)//EN"
>
<!LEXTYPE
   UCNMCHAR       -- SGML upper-case name characters --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Upper-case name characters (UCNMCHAR)//EN"
>
<!LEXTYPE
   UCNMSTRT       -- SGML upper-case name start characters --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           CLASS Upper-case name start characters (UCNMSTRT)//EN"
>

                       <!-- SGML delimiters -->

<!LEXTYPE
   AND            -- SGML and connector --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER And connector (AND)//EN"
>
<!LEXTYPE
   COM            -- SGML comment start or end --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Comment start or end (COM)//EN"
>
<!LEXTYPE
   CRO            -- SGML character reference open --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Character reference open (CRO)//EN"
>
<!LEXTYPE
   DSC            -- SGML declaration subset close --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Declaration subset close (DSC)//EN"
>
<!LEXTYPE
   DSO            -- SGML declaration subset open --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Declaration subset open (DSO)//EN"
>
<!LEXTYPE
   DTGC           -- SGML data tag group close --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Data tag group close (DTGC)//EN"
>
<!LEXTYPE
   DTGO           -- SGML data tag group open --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Data tag group open (DTGO)//EN"
>
<!LEXTYPE
   ERO            -- SGML entity reference open --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Entity reference open (ERO)//EN"
>
<!LEXTYPE
   ETAGO          -- SGML end-tag open --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER End-tag open (ETAGO)//EN"
>
<!LEXTYPE
   GRPC           -- SGML group close --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Group close (GRPC)//EN"
>
<!LEXTYPE
   GRPO           -- SGML group open --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Group open (GRPO)//EN"
>
<!LEXTYPE
   LIT            -- SGML literal start or end --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Literal start or end (LIT)//EN"
>
<!LEXTYPE
   LITA           -- SGML literal start or end (alternative) --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Literal start or end (alternative) (LITA)//EN"
>
<!LEXTYPE
   MDC            -- SGML markup declaration close --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Markup declaration close (MDC)//EN"
>
<!LEXTYPE
   MDO            -- SGML markup declaration open --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Markup declaration open (MDO)//EN"
>
<!LEXTYPE
   MINUS          -- SGML exclusion --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Exclusion (MINUS)//EN"
>
<!LEXTYPE
   MSC            -- SGML marked section close --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Marked section close (MSC)//EN"
>
<!LEXTYPE
   NET            -- SGML null end-tag --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Null end-tag (NET)//EN"
>
<!LEXTYPE
   OPT            -- SGML optional occurrence indicator --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Optional occurrence indicator (OPT)//EN"
>
<!LEXTYPE
   OR             -- SGML or connector --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Or connector (OR)//EN"
>
<!LEXTYPE
   PERO           -- SGML parameter entity reference open --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Parameter entity reference open (PERO)//EN"
>
<!LEXTYPE
   PIC            -- SGML processing instruction close --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Processing instruction close (PIC)//EN"
>
<!LEXTYPE
   PIO            -- SGML processing instruction open --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Processing instruction open (PIO)//EN"
>
<!LEXTYPE
   PLUS           -- SGML required and repeatable; inclusion --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Required and repeatable; inclusion (PLUS)//EN"
>
<!LEXTYPE
   REFC           -- SGML reference close --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Reference close (REFC)//EN"
>
<!LEXTYPE
   REP            -- SGML optional and repeatable --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Optional and repeatable (REP)//EN"
>
<!LEXTYPE
   RNI            -- SGML reserved name indicator --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Reserved name indicator (RNI)//EN"
>
<!LEXTYPE
   SEQ            -- SGML sequence connector --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Sequence connector (SEQ)//EN"
>
<!LEXTYPE
   SHORTREF       -- SGML short reference --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Short reference (SHORTREF)//EN"
>
<!LEXTYPE
   STAGO          -- SGML start-tag open --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Start-tag open (STAGO)//EN"
>
<!LEXTYPE
   TAGC           -- SGML tag close --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Tag close (TAGC)//EN"
>
<!LEXTYPE
   VI             -- SGML value indicator --

   SPEC
   PUBLIC "ISO 8879:1986//NOTATION LEXTYPE
           DELIMITER Value indicator (VI)//EN"
>

                 <!-- SGML modeled lexical types -->

<!LEXTYPE
   s              -- SGML "S" separator --

   "RE|RS|SEPCHAR|SPACE"
   HyLex [unorm]
>
<!LEXTYPE
   mindata        -- SGML minimum data --

   "(Digit|LCLetter|RE|RS|SPACE|Special|UCLetter)+"
   HyLex [unorm]
>
<!LEXTYPE
   nmchar         -- SGML name character --

   "Digit|LCNMCHAR|UCNMCHAR|nmstrt"
   HyLex [unorm]
>
<!LEXTYPE
   nmstrt         -- SGML name start character --

   "LCLetter|LCNMSTRT|UCLetter|UCNMSTRT"
   HyLex [unorm]
>
<!LEXTYPE
   csname         -- Case-sensitive name --

   "nmstrt,nmchar*"
   HyLex [unorm]
>
<!LEXTYPE
   NAME           -- SGML name --

   #ORDER GENERAL
   #CHECK NAMELEN

   "csname"
   HyLex [unorm]
>
<!LEXTYPE
   NAMES          -- SGML names --

   "NAME+"
   HyLex
>
<!LEXTYPE
   NUMBER         -- SGML number --

   #ORDER GENERAL
   #CHECK NAMELEN

   "Digit+"
   HyLex [unorm]
>
<!LEXTYPE
   NUMBERS        -- SGML numbers --

   "NUMBER+"
   HyLex
>
<!LEXTYPE
   NMTOKEN        -- SGML name token --

   #ORDER GENERAL
   #CHECK NAMELEN

   "nmchar+"
   HyLex [unorm]
>
<!LEXTYPE
   NMTOKENS       -- SGML name tokens --

   "NMTOKEN+"
   HyLex
>
<!LEXTYPE
   NUTOKEN        -- SGML number token --

   #ORDER GENERAL
   #CHECK NAMELEN

   "Digit,nmchar*"
   HyLex [unorm]
>
<!LEXTYPE
   NUTOKENS       -- SGML number tokens --

   "NUTOKEN+"
   HyLex
>

                <!-- SGML namespace lexical types -->

<!LEXTYPE
   ATTNAME        -- SGML attribute name --

   #CHECK ATTNAME

   "NAME"
   HyLex
>
<!LEXTYPE
   DTDORLPD       -- SGML document type or link type --

   #CHECK DTDORLPD

   "NAME"
   HyLex
>
<!LEXTYPE
   ENTITY         -- SGML entity name --

   #ORDER ENTITY
   #CHECK NAMELEN
   #CHECK ENTITY

   "nmstrt,nmchar*"
   HyLex [unorm]
>
<!LEXTYPE
   ENTITIES       -- SGML entity names --

   "ENTITY+"
   HyLex
>
<!LEXTYPE
   GI             -- SGML generic identifier --

   #CHECK GI

   "NAME"
   HyLex
>
<!LEXTYPE
   IDREF          -- SGML unique identifier reference --

   #CHECK ID

   "NAME"
   HyLex
>
<!LEXTYPE
   IDREFS         -- SGML unique identifier references --

   "IDREF+"
   HyLex
>
<!LEXTYPE
   NOTATION       -- SGML notation name --

   #CHECK NOTATION

   "NAME"
   HyLex
>
<!LEXTYPE
   PARMENT        -- SGML parameter entity name --

   #ORDER ENTITY
   #CHECK PENTLEN
   #CHECK PARMENT

   "nmstrt,nmchar*"
   HyLex [unorm]
>
<!LEXTYPE
   PENTITY        -- SGML parameter entity name prefixed by PERO --

   "PERO,PARMENT"
   HyLex [unorm]
>
<!LEXTYPE
   compname       -- Property set component name --

   #CHECK compname

   "NAME"
   HyLex [unorm]
>
<!LEXTYPE
   cnmlist        -- Property set component names --

   "compname+"
   HyLex
>

                  <!-- Other SGML lexical types -->

<!LEXTYPE
   fsi            -- Formal System Identifier --

   SPEC
   PUBLIC "ISO/IEC 10744:1997//NOTATION LEXTYPE
           Formal System Identifier//EN"
>
<!LEXTYPE
   literal        -- SGML literal --

   "(LIT,[#NOT LIT],LIT)|(LITA,[#NOT LITA],LITA)"
   HyLex [unorm]
>
<!LEXTYPE
   attspecs       -- Attribute specifications --

   '(NAME,"=",(NMTOKEN|literal))*'
   HyLex
>
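
The `literal` declaration above combines match token models with #NOT. As a rough analogy, a match token model behaves like a capturing group in a regular expression: when searching, the captured portions are what is returned. The sketch below assumes the reference concrete syntax delimiters (LIT is the double quote, LITA the apostrophe) and reads "#NOT LIT" as "a run of characters, none of which is LIT"; rule 5 of the syntax could be read more broadly, so this is one interpretation only. The function names are hypothetical.

```python
import re

# "(LIT,[#NOT LIT],LIT)|(LITA,[#NOT LITA],LITA)" under the stated assumptions.
# Each capturing group stands in for one match token model.
LITERAL_RE = re.compile('"([^"]*)"' + "|" + "'([^']*)'")


def find_literal_tokens(data):
    """Search data for literals and return their match tokens."""
    tokens = []
    for m in LITERAL_RE.finditer(data):
        # Exactly one alternative matched; its group holds the match token.
        tokens.append(m.group(1) if m.group(1) is not None else m.group(2))
    return tokens


def is_literal(data):
    """Prescriptive use: the match token model acts as an ordinary group."""
    return LITERAL_RE.fullmatch(data) is not None
```

The two functions mirror the two uses described for match token models: `find_literal_tokens` returns only the text between the delimiters, while `is_literal` merely tests whether a whole string conforms to the type.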


HTML generated from the original SGML source using a DSSSL style specification and the SGML output back-end of the JADE DSSSL engine.