Copyright © 1992, 1997 International Organization for Standardization. All rights reserved.

This electronic document is for use during development and review of International Standards. Official printed copies of International Standards can be purchased from the ISO and the national standards organization of your country.



A.6 Formal System Identifier Definition Requirements (FSIDR)

A.6.5 Storage manager attribute definitions



The designer of a storage object specification can optionally associate an attribute definition list declaration with the notation declaration used to identify the storage manager processor. The attributes are used to specify parameters to the storage access, in addition to the storage object identifier.

A starter set of standardized storage manager attributes is defined in these FSIDR requirements.

NOTE 506 A storage manager might choose to include information represented by one or more of these attributes in its SOI syntax, in which case the attribute should not be defined for that SM.

NOTE 507 As with all SGML notations, it is possible to declare "data attributes" that can be passed as parameters to the notation handler. For auxiliary process notations, however, only the default value of the attributes can be specified; there is no way to specify attributes for individual uses of the notation. Therefore, if alternative sets of parameters are needed for a particular auxiliary process, a different notation should be declared for each set, but with the same external identifier. The several names will then invoke the identical auxiliary process, but with varying parameter sets.

A.6.5.1 Record-related attributes

The attribute form record-processing attributes (records) consists of attributes that control how the entity data is interpreted as records or lines.

The attribute record boundary indicator (records) identifies the character or characters that are interpreted as a record boundary in an SGML entity.

The keyword "ASIS" means that no attempt will be made to interpret the input as consisting of records. This keyword is used either because the entity body is to be read as blocks instead of characters (e.g. for data entities), or because the storage object already contains the record boundary characters required by the concrete syntax.

If it is known that one of the four line terminator conventions ("LF", "CR", "CRLF" and "LFCR") is used, it can be specified directly. Otherwise, "FIND" can be specified and the first of the four found in the storage object (if any) will be treated as the record boundary indicator for the rest of that storage object (unless the system uses RMS). If none is found, the effect is the same as specifying "ASIS".

The keyword "RMS" stands for "Record Management System", wherein the storage manager recognizes record boundaries by means other than a character sequence (typically, it knows the length of each record).

When records are recognized in a storage object, a record start is inserted at the beginning of each record, and a record end at the end of each record. If there is a partial record (a record that does not end with the line terminator) at the end of the entity, then a record start will be inserted before it but no record end will be inserted after it.

NOTE 508 The literal SM can be used to insert a trailing record end, if desired.
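
The record handling described above can be sketched in Python. This is an illustration under stated assumptions, not part of the standard: the names find_boundary and split_records are hypothetical, "FIND" is assumed to select the terminator that occurs earliest in the data (preferring a two-character terminator over its one-character prefix at the same position), and record start and record end are modelled as a flag per record rather than as inserted characters. "RMS" is not modelled because its boundaries are not character sequences.

```python
# Keyword -> terminator octets for the four line terminator conventions.
TERMINATORS = {"lf": b"\n", "cr": b"\r", "crlf": b"\r\n", "lfcr": b"\n\r"}


def find_boundary(data: bytes) -> str:
    """Return the keyword of the earliest terminator in data, or "asis"."""
    best = None  # (position, -length, keyword); the smallest tuple wins
    for keyword, seq in TERMINATORS.items():
        pos = data.find(seq)
        if pos != -1:
            candidate = (pos, -len(seq), keyword)
            if best is None or candidate < best:
                best = candidate
    return best[2] if best else "asis"


def split_records(data: bytes, records: str = "find"):
    """Return (record, has_record_end) pairs.

    With "asis" no records are recognized: the data comes back as a
    single unit with no record end.
    """
    if records == "find":
        records = find_boundary(data)
    if records == "asis":
        return [(data, False)]
    if records not in TERMINATORS:
        raise ValueError(f"record convention not modelled here: {records}")
    parts = data.split(TERMINATORS[records])
    result = [(part, True) for part in parts[:-1]]
    if parts[-1]:
        # Partial record at the end: it starts, but gets no record end.
        result.append((parts[-1], False))
    return result
```

For example, split_records(b"a\r\nb\r\nc") yields two complete records followed by the partial record b"c", which has no record end.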

The attribute record tracking (tracking) specifies whether the entity manager must include record count information in messages.

NOTE 509 Some implementations could improve performance by not tracking records, particularly in very large storage objects.

                <!-- Record-processing Attributes -->
<!attlist #NOTATION
-- records --     -- Record-processing attributes --
                  -- Clause: A.6.5.1 --
   #ALL

   records        -- Record boundary recognition --
      (asis|crlf|cr|find|lfcr|lf|rms)
      #IMPLIED    -- Default: find, except "asis" for NDATA entities
                     and those whose FSIs are in storage objects with
                     records=asis. --

   tracking       -- Record boundary tracking in messages --
                  -- Constraint: SGML entities only --
      (track|notrack)
      track
>

A.6.5.2 Encoding-related attributes

The attribute form entity encoding specification (encoding) consists of attributes that specify the encoding method used for the entity.

When an entity is stored, the characters of the entity must be converted from their internal representation to a representation as a sequence of storage octets. Since the internal representation of characters is system-dependent, this conversion is not specified directly. Instead a mapping from characters to octet sequences, known as an "encoding", is specified. The storage manager determines the conversion algorithm using the specified encoding together with its inherent knowledge of the internal representation used for characters.

The encoding can be specified in one of two ways:

  1. It may be specified completely using the encoding attribute. When so specified, the encoding is independent of the document character set of the document in which the entity is used.

  2. It may be specified partially by the attribute bit combination transformation format (bctf), which identifies an algorithm for mapping from fixed-size bit combinations to the octet sequences of a storage object. In this case the encoding is the combination of

    1. the mapping from characters to fixed-size bit combinations determined by the document character set, with

    2. the mapping from fixed-size bit combinations to octet sequences specified by the BCTF.

When an entity is read, the inverse of the encoding mapping is used.

NOTE 510 When a storage object is in a container, its "octets" are actually the characters of the container entity. In such cases, the encoding or bctf attributes are normally specified for either the container or the contained object, but not both.

                <!-- Entity Encoding Specification -->
<!attlist #NOTATION
-- encoding --    -- Entity encoding specification --
                  -- Clause: A.6.5.2 --
   #ALL

   encoding       -- Encoding --
                  -- Constraint: at most one of encoding and bctf may
                     be specified --
                  -- Constraint: SGML, CDATA, SDATA entities only --
      NAME        -- Constraint: registered value notation --
      #IMPLIED    -- Default: same unless bctf is specified --

   bctf           -- Bit combination transformation format --
                  -- Constraint: SGML, CDATA, SDATA entities only --
                  -- Constraint: at most one of encoding and bctf may
                     be specified --
      NAME        -- Constraint: registered value notation --
      #IMPLIED    -- Default: storage object does not have
                     document-character-set-dependent character set --
>

A.6.5.2.1 Encoding notations

A starter set of notations for encodings is defined in this International Standard. The notations are:

ucs-2

This encoding is the two-octet BMP form of coded representation of the Universal Multiple-Octet Coded Character Set defined in ISO/IEC 10646.

ucs-4

This encoding is the four-octet canonical form of coded representation of the Universal Multiple-Octet Coded Character Set defined in ISO/IEC 10646.

utf-8

This encoding is the UCS Transformation Format 8 defined in Annex P to PDAM 1 of ISO/IEC 10646-1:1993. This encodes a character in the repertoire of ISO/IEC 10646 using between 1 and 6 octets.

utf-16

This encoding is the UCS Transformation Format 16 defined in Annex Q of Amendment 1 to ISO/IEC 10646-1:1993.

utf-7

This encoding is the UCS Transformation Format 7 encoding defined by IETF RFC 1642. This can be used to encode characters in the Basic Multilingual Plane of ISO/IEC 10646.

unicode

This represents each character in the Basic Multilingual Plane of ISO/IEC 10646 by two octets. The bytes representing the entire storage object may be preceded by a pair of bytes representing the byte order mark character (0xFEFF). The bytes representing each bit combination are in the system byte order, unless the byte order mark character is present, in which case the order of its bytes determines the byte order. When the storage object is read, any byte order mark character is discarded.

euc-jp

This encoding is the Extended UNIX Code Packed Format for the Japanese registered Internet character set.

sjis

This is the Shift JIS registered Internet character set.

is8859-N

where N can be any single digit other than 0. This encodes a character from ISO 8859-N with a single octet.

same

The encoding of this storage object is the same as the encoding of the storage object in which the SOS of this storage object is specified.
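
The byte order rule given above for the unicode notation can be sketched in Python. This is a minimal illustration, not a normative decoder: the function name decode_unicode_notation is hypothetical, and sys.byteorder stands in for "the system byte order".

```python
import sys


def decode_unicode_notation(octets: bytes) -> str:
    """Decode two-octet bit combinations, honouring a leading byte order mark."""
    if len(octets) % 2:
        raise ValueError("expected an even number of octets")
    order = sys.byteorder  # "little" or "big": the system byte order
    start = 0
    if octets[:2] == b"\xfe\xff":    # 0xFEFF stored most significant octet first
        order, start = "big", 2
    elif octets[:2] == b"\xff\xfe":  # 0xFEFF stored least significant octet first
        order, start = "little", 2
    # The byte order mark itself is skipped, so it is discarded on reading.
    return "".join(
        chr(int.from_bytes(octets[i:i + 2], order))
        for i in range(start, len(octets), 2)
    )
```

With a byte order mark present, the result does not depend on the machine: both b"\xfe\xff\x00A" and b"\xff\xfeA\x00" decode to "A".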

                     <!-- Encoding Notations -->
            <!-- THIS IS A NON-MANDATORY STARTER SET. -->
<!notation
   UCS-2          -- UCS-2 encoding --
                  -- This encoding is the two-octet BMP form of coded
                     representation of the Universal Multiple-Octet
                     Coded Character Set defined in ISO/IEC 10646. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           UCS-2 Encoding//EN"
>
<!notation
   UCS-4          -- UCS-4 encoding --
                  -- This encoding is the four-octet canonical form of
                     coded representation of the Universal
                     Multiple-Octet Coded Character Set defined in
                     ISO/IEC 10646. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           UCS-4 Encoding//EN"
>
<!notation
   UTF-8          -- UTF-8 encoding --
                  -- This encoding is the UCS Transformation Format 8
                     defined in Annex P to PDAM 1 of ISO/IEC
                     10646-1:1993.  This encodes a character in the
                     repertoire of ISO/IEC 10646 using between 1 and 6
                     octets. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           UTF-8 Encoding//EN"
>
<!notation
   UTF-16         -- UTF-16 encoding --
                  -- This encoding is the UCS Transformation Format 16
                     defined in Annex Q of Amendment 1 to ISO/IEC
                     10646-1:1993. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           UTF-16 Encoding//EN"
>
<!notation
   UTF-7          -- UTF-7 encoding --
                  -- This encoding is the UCS Transformation Format 7
                     encoding defined by IETF RFC 1642.  This can be
                     used to encode characters in the Basic
                     Multilingual Plane of ISO/IEC 10646. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           UTF-7 Encoding//EN"
>
<!notation
   UNICODE        -- UNICODE encoding --
                  -- This represents each character in the Basic
                     Multilingual Plane of ISO/IEC 10646 by two
                     octets. The bytes representing the entire storage
                     object may be preceded by a pair of bytes
                     representing the byte order mark character
                     (0xFEFF). The bytes representing each bit
                     combination are in the system byte order, unless
                     the byte order mark character is present, in
                     which case the order of its bytes determines the
                     byte order. When the storage object is read, any
                     byte order mark character is discarded. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           UNICODE Encoding//EN"
>
<!notation
   EUC-JP         -- EUC-JP encoding --
                  -- This encoding is the Extended UNIX Code Packed
                     Format for the Japanese registered Internet
                     character set. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           EUC-JP Encoding//EN"
>
<!notation
   SJIS           -- SJIS encoding --
                  -- This is the Shift JIS registered Internet
                     character set. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           SJIS Encoding//EN"
>
<!notation
   IS8859-1       -- ISO8859-1 encoding --
                  -- This encodes a character from ISO 8859-1 with a
                     single octet. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           ISO8859-1 Encoding//EN"
>
<!notation
   IS8859-2       -- ISO8859-2 encoding --
                  -- This encodes a character from ISO 8859-2 with a
                     single octet. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           ISO8859-2 Encoding//EN"
>
<!notation
   IS8859-3       -- ISO8859-3 encoding --
                  -- This encodes a character from ISO 8859-3 with a
                     single octet. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           ISO8859-3 Encoding//EN"
>
<!notation
   IS8859-4       -- ISO8859-4 encoding --
                  -- This encodes a character from ISO 8859-4 with a
                     single octet. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           ISO8859-4 Encoding//EN"
>
<!notation
   IS8859-5       -- ISO8859-5 encoding --
                  -- This encodes a character from ISO 8859-5 with a
                     single octet. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           ISO8859-5 Encoding//EN"
>
<!notation
   IS8859-6       -- ISO8859-6 encoding --
                  -- This encodes a character from ISO 8859-6 with a
                     single octet. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           ISO8859-6 Encoding//EN"
>
<!notation
   IS8859-7       -- ISO8859-7 encoding --
                  -- This encodes a character from ISO 8859-7 with a
                     single octet. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           ISO8859-7 Encoding//EN"
>
<!notation
   IS8859-8       -- ISO8859-8 encoding --
                  -- This encodes a character from ISO 8859-8 with a
                     single octet. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           ISO8859-8 Encoding//EN"
>
<!notation
   IS8859-9       -- ISO8859-9 encoding --
                  -- This encodes a character from ISO 8859-9 with a
                     single octet. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           ISO8859-9 Encoding//EN"
>
<!notation
   SAME           -- Same encoding --
                  -- The encoding of this storage object is the same
                     as the encoding of the storage object in which
                     the SOS of this storage object is specified. --

   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
           No change in encoding//EN"
>

A.6.5.2.2 BCTF algorithm notations

A starter set of notations for BCTF algorithms is defined in this International Standard. The notations are:

identity

Each bit combination is represented by a single octet; this BCTF can be used only for storage objects all of whose bit combinations have a value not exceeding 255.

fixed-2

Each bit combination is represented by exactly 2 octets, with the more significant octet first; this BCTF can be used only for storage objects all of whose bit combinations have a value not exceeding 65,535.

fixed-3

Each bit combination is represented by exactly 3 octets, with a more significant octet preceding any less significant octets; this BCTF can be used only for storage objects all of whose bit combinations have a value not exceeding 16,777,215.

fixed-4

Each bit combination is represented by exactly 4 octets, with a more significant octet preceding any less significant octets; this BCTF can be used only for storage objects all of whose bit combinations have a value not exceeding 4,294,967,295.
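
The four starter-set BCTFs differ only in the number of octets per bit combination, so they can be sketched with one pair of Python functions. The names bctf_encode and bctf_decode and the width parameter (1 for identity, 2 to 4 for fixed-2 to fixed-4) are hypothetical and serve only to illustrate the definitions above. The mapping from characters to bit combinations, which the document character set determines, is outside this sketch.

```python
def bctf_encode(bit_combinations, width: int) -> bytes:
    """Map bit combinations to octets, most significant octet first.

    width 1 corresponds to identity, widths 2..4 to fixed-2..fixed-4.
    """
    if width not in (1, 2, 3, 4):
        raise ValueError("width must be 1, 2, 3 or 4")
    limit = 256 ** width - 1  # 255, 65535, 16777215 or 4294967295
    out = bytearray()
    for value in bit_combinations:
        if not 0 <= value <= limit:
            raise ValueError(f"bit combination {value} exceeds {limit}")
        out += value.to_bytes(width, "big")
    return bytes(out)


def bctf_decode(octets: bytes, width: int):
    """Inverse mapping, used when the entity is read."""
    if width not in (1, 2, 3, 4):
        raise ValueError("width must be 1, 2, 3 or 4")
    if len(octets) % width:
        raise ValueError("octet count is not a multiple of the width")
    return [
        int.from_bytes(octets[i:i + width], "big")
        for i in range(0, len(octets), width)
    ]
```

For instance, bctf_encode([0x41, 0x3042], 2) produces the four octets 00 41 30 42, and bctf_decode recovers the original bit combinations.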

                  <!-- BCTF Algorithm Notations -->
            <!-- THIS IS A NON-MANDATORY STARTER SET. -->
<!NOTATION identity
   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR BCTF
           IDENTITY BCTF Algorithm//EN"
>
<!NOTATION fixed-2
   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR BCTF
           FIXED-2  BCTF Algorithm//EN"
>
<!NOTATION fixed-3
   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR BCTF
           FIXED-3  BCTF Algorithm//EN"
>
<!NOTATION fixed-4
   PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR BCTF
           FIXED-4  BCTF Algorithm//EN"
>

A.6.5.3 Common storage manager attributes

The attribute form common storage manager attributes (smcommon) consists of attributes that control common aspects of storage object access and SOS interpretation.

The attribute occupied extents (extents) is a marklist notation dimlist that specifies the extents of the storage object occupied by the entity. Multiple dimension specifications can be specified if the entity is segmented and distributed in several locations within the storage object.

NOTE 511 For example, this technique can be used to interleave the bodies of entities that are accessed concurrently.
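
How an extents value selects octets can be sketched in Python. The marker semantics used here are an assumption that this clause does not state: a positive first marker counts from the start of the object (1 is the first octet), a negative one counts from the end (-1 is the last octet), a positive second marker is a count of octets, and a negative second marker is the position of the last octet counted from the end. The function name occupied_octets is hypothetical.

```python
def occupied_octets(storage: bytes, extents: str = "1 -1") -> bytes:
    """Concatenate the octets selected by each (marker, marker) pair.

    ASSUMPTION: marker semantics as described in the lead-in text;
    they are not defined in this clause.
    """
    markers = [int(token) for token in extents.split()]
    if not markers or len(markers) % 2 or 0 in markers:
        raise ValueError("extents must be non-zero markers in pairs")
    size = len(storage)
    out = bytearray()
    for first, second in zip(markers[0::2], markers[1::2]):
        # Resolve both markers to 1-based positions from the start.
        start = first if first > 0 else size + first + 1
        end = start + second - 1 if second > 0 else size + second + 1
        if not 1 <= start <= end <= size:
            raise ValueError(f"extent {first} {second} is outside the object")
        out += storage[start - 1:end]
    return bytes(out)
```

Under these assumptions the default "1 -1" selects the entire object, and "1 3 6 -1" applied to b"abcdefgh" selects b"abc" followed by b"fgh", as for an entity stored in two segments.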

The attribute zap end-of-file (zapeof) specifies whether the trailing octet sequence of a storage object should be excluded from the entity when it represents an end-of-file.

NOTE 512 This functionality differs from that of the extents attribute: with zapeof, the user does not have to know whether the trailing octets of a particular storage object are in fact the end-of-file control (e.g. the control-Z octet in DOS). Zapeof should normally be specified for DOS (this is the behavior of the standard C libraries with respect to text files).
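
A minimal Python sketch of zapeof follows. It assumes the DOS convention mentioned in the note, in which a single trailing control-Z octet (hexadecimal 1A) marks end-of-file; the name apply_zapeof and the eof_marker parameter are hypothetical.

```python
DOS_EOF = b"\x1a"  # control-Z, the DOS end-of-file control (assumed convention)


def apply_zapeof(octets: bytes, zapeof: bool = True,
                 eof_marker: bytes = DOS_EOF) -> bytes:
    """Drop a trailing end-of-file marker, if zapeof is in effect."""
    if zapeof and eof_marker and octets.endswith(eof_marker):
        return octets[:-len(eof_marker)]
    # No marker at the very end (or nozapeof): the octets pass through unchanged.
    return octets
```

The caller does not need to know in advance whether the marker is present: apply_zapeof(b"text\x1a") and apply_zapeof(b"text") both return b"text".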

The attribute storage manager character reference delimiter (smcrd) is a single character that delimits a reference to a character in the inherent character set of the storage manager (which may be different from the character set of the document in which the FSI occurs). The smcrd is recognized only in an SOI or attribute value, and only if followed by a decimal digit. The smcrd character, together with the following decimal digits and an optional semicolon, is replaced by the character with the specified number in the inherent character set of the storage manager.
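
The smcrd replacement rule can be sketched in Python as follows. The function name expand_smcrd is hypothetical, and the charset parameter stands in for the storage manager's inherent character set; its default, the built-in chr, assumes that this character set is numbered like ISO/IEC 10646, which need not be true of a real storage manager.

```python
import re


def expand_smcrd(value: str, smcrd: str, charset=chr) -> str:
    """Replace smcrd + decimal digits + optional semicolon by one character.

    charset maps a character number to a character in the storage
    manager's inherent character set (ASSUMPTION: chr by default).
    """
    if len(smcrd) != 1:
        raise ValueError("smcrd must be a single character")
    # The delimiter is recognized only when a decimal digit follows it.
    pattern = re.compile(re.escape(smcrd) + r"([0-9]+);?")
    return pattern.sub(lambda match: charset(int(match.group(1))), value)
```

With "^" as the delimiter, expand_smcrd("my^32;file^46sgm", "^") gives "my file.sgm". The semicolon is needed only where a literal digit follows the reference, as in "x^65;1".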

The attribute compression information (compress) can be used to identify a compression/decompression process, declared as a notation.

The attribute encryption information (encrypt) can be used to identify an encryption/decryption process, declared as a notation.

NOTE 513 It may be considered poor security practice to include this information, or even to indicate that the storage object is encrypted.

The attribute integrity information (seal) can be used to identify a verification process, declared as a notation. One registered value for this attribute is provided in the starter set, md5, the standard form of sealing used for transmission of documents over the Internet.
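
A verification process for the md5 value might compute the digest as in the Python sketch below, assuming the Content-MD5 convention of base64-encoding the 16-octet MD5 digest. The names content_md5 and verify_seal are hypothetical, and this clause does not specify how the expected value reaches the storage manager, so it appears here as a plain parameter.

```python
import base64
import hashlib


def content_md5(octets: bytes) -> str:
    """Return the base64 form of the MD5 digest of the storage octets."""
    digest = hashlib.md5(octets).digest()  # 16 octets
    return base64.b64encode(digest).decode("ascii")


def verify_seal(octets: bytes, expected: str) -> bool:
    """Check the storage octets against an expected digest (hypothetical input)."""
    return content_md5(octets) == expected
```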

The attribute base for relative SOI (SOIbase) allows an alternative relative base to be specified. Its value must be an SOI for the SM of the specified storage object. SOIbase can be declared for any storage manager that allows a relative SOI.

NOTE 514 Using an SOIbase attribute may not be equivalent to prepending a string to the SOI for a storage manager that does searching. For example, with an FSI

   <osfile soibase=entities>foo.sgm</osfile>

a storage manager might look first for a file entities/foo.sgm and then, if that did not exist, for a file pubtext/foo.sgm, whereas with

   <osfile>entities/foo.sgm</osfile>

it might look first for a file entities/foo.sgm but then for a file pubtext/entities/foo.sgm.
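
The difference described in NOTE 514 can be reproduced with a small Python sketch of one imaginary searching storage manager. The function name candidate_paths and the search directory list are hypothetical. The sketch assumes that the SOIbase applies only to the first lookup and that each search directory is then tried with the SOI as written.

```python
import posixpath


def candidate_paths(soi: str, search_dirs, soibase=None):
    """List the files a hypothetical searching storage manager would try."""
    # First attempt: the SOI relative to the base, if a base is given.
    first = posixpath.join(soibase, soi) if soibase else soi
    # Fallbacks: the SOI as written, relative to each search directory.
    fallbacks = [posixpath.join(directory, soi) for directory in search_dirs]
    return [first] + fallbacks
```

With search_dirs set to ["pubtext"], the SOI foo.sgm with soibase entities gives entities/foo.sgm and then pubtext/foo.sgm, whereas the SOI entities/foo.sgm with no base gives entities/foo.sgm and then pubtext/entities/foo.sgm.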

              <!-- Common Storage Manager Attributes -->
<!attlist #NOTATION
-- smcommon --    -- Common storage manager attributes --
                  -- Clause: A.6.5.3 --
   #ALL

   extents        -- Dimensions of occupied extents of object --
                  -- Constraint: applies to storage octets; quantum is
                     an octet --
      CDATA       -- Lextype: (marker,marker)+ --
                  -- Constraint: interpreted as HyTime dimlist --
      "1 -1"      -- Default: entire object --

   zapeof         -- Zap end-of-file --
      (zapeof|nozapeof)
      zapeof

   smcrd          -- Storage manager character reference delimiter --
      CDATA       -- Lextype: char --
      #IMPLIED    -- Default: none --

   compress       -- Compression information --
      CDATA       -- Constraint: registered value notation --
      #IMPLIED    -- Default: none --

   encrypt        -- Encryption information --
      CDATA       -- Constraint: registered value notation --
      #IMPLIED    -- Default: none --

   seal           -- Integrity information --
      CDATA       -- Constraint: registered value notation --
      #IMPLIED    -- Default: none --

   SOIbase        -- Base for relative SOI --
      CDATA       -- Constraint: It is an SOI for this SM --
      #IMPLIED    -- Default: current storage object if it has the
                     same SM, else none --
>
<!notation md5
   PUBLIC "-//IETF/RFC1544//NOTATION FSIDR SEAL
           Content-MD5 Header Field//EN"
>


