Copyright © 1992, 1997 International Organization for Standardization. All rights reserved. This electronic document is for use during development and review of International Standards. Official printed copies of International Standards can be purchased from the ISO and the national standards organization of your country.
A.6 Formal System Identifier Definition Requirements (FSIDR)
The designer of a storage object specification can optionally associate an attribute definition list declaration with the notation declaration used to identify the storage manager processor. The attributes are used to specify parameters to the storage access, in addition to the storage object identifier.
A starter set of standardized storage manager attributes is defined in these FSIDR requirements.
NOTE 506 Some storage managers might choose to include information represented by one or more of these attributes in their SOI syntax, in which case the attribute should not be defined for that SM.
NOTE 507 As with all SGML notations, it is possible to declare "data attributes" that can be passed as parameters to the notation handler. For auxiliary process notations, however, only the default value of the attributes can be specified; there is no way to specify attributes for individual uses of the notation. Therefore, if alternative sets of parameters are needed for a particular auxiliary process, a different notation should be declared for each set, but with the same external identifier. The several names will then invoke the identical auxiliary process, but with varying parameter sets.
The attribute form record-processing attributes (records) consists of attributes that control how the entity data is interpreted as records or lines.
The attribute record boundary indicator (records) identifies the character or characters that are interpreted as a record boundary in an SGML entity.
The keyword "ASIS" means that no attempt will be made to interpret the input as consisting of records. This keyword is used either because the entity body is to be read as blocks instead of characters (e.g. for data entities), or because the storage object already contains the record boundary characters required by the concrete syntax.
If it is known that one of the four line terminator conventions ("LF", "CR", "CRLF" and "LFCR") is used, it can be specified directly. Otherwise, "FIND" can be specified and the first of the four found in the storage object (if any) will be treated as the record boundary indicator for the rest of that storage object (unless the system uses RMS). If none is found, the effect is equivalent to specifying "ASIS".
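The "FIND" behavior might be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes that "first found" means the terminator at the earliest CR or LF octet, with a two-character convention preferred when the next octet completes one.

```python
from typing import Optional


def find_record_boundary(data: bytes) -> Optional[bytes]:
    """Return the first line terminator in data, or None (treated as ASIS).

    Assumption: the earliest CR or LF decides; if the following octet
    completes CRLF or LFCR, the two-octet terminator is chosen.
    """
    for i, octet in enumerate(data):
        if octet in (0x0A, 0x0D):
            pair = data[i:i + 2]
            if pair in (b"\r\n", b"\n\r"):
                return pair
            return data[i:i + 1]
    return None
```

A storage object that mixes conventions would still be read with whichever terminator occurs first, as the paragraph above describes.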
The keyword "RMS" stands for "Record Management System", wherein the storage manager recognizes record boundaries by means other than a character sequence (typically, it knows the length of each record).
When records are recognized in a storage object, a record start is inserted at the beginning of each record, and a record end at the end of each record. If there is a partial record (a record that doesn't end with the line terminator) at the end of the entity, then a record start will be inserted before it but no record end will be inserted after it.
NOTE 508 The literal SM can be used to insert a trailing record end, if desired.
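The insertion of record starts and record ends can be illustrated with a short sketch. The function name and the default values for the two markers are assumptions (character 10 for record start and character 13 for record end, as in the SGML reference concrete syntax); a real entity manager would use whatever its concrete syntax declares.

```python
def mark_records(data: bytes, terminator: bytes,
                 record_start: bytes = b"\n",
                 record_end: bytes = b"\r") -> bytes:
    """Replace each line terminator by record start/record end markers.

    A trailing partial record gets a record start but no record end.
    """
    out = bytearray()
    pos = 0
    while pos < len(data):
        end = data.find(terminator, pos)
        out += record_start
        if end == -1:
            # Partial record at the end of the entity: no record end.
            out += data[pos:]
            break
        out += data[pos:end] + record_end
        pos = end + len(terminator)
    return bytes(out)
```

With visible markers for readability, `mark_records(b"one\r\ntwo\r\nthree", b"\r\n", b"<", b">")` yields `b"<one><two><three"`: the last record is started but not ended.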
The attribute record tracking (tracking) specifies whether the entity manager must include record count information in messages.
NOTE 509 Some implementations could improve performance by not tracking records, particularly in very large storage objects.
<!-- Record-processing Attributes -->
<!attlist #NOTATION
-- records -- -- Record-processing attributes --
-- Clause: A.6.5.1 --
#ALL
records -- Record boundary recognition --
(asis|crlf|cr|find|lfcr|lf|rms)
#IMPLIED -- Default: find, except "asis" for NDATA entities
and those whose FSIs are in storage objects with
records=asis. --
tracking -- Record boundary tracking in messages --
-- Constraint: SGML entities only --
(track|notrack)
track
>
The attribute form entity encoding specification (encoding) consists of attributes that specify the encoding method used for the entity.
When an entity is stored, the characters of the entity must be converted from their internal representation to a representation as a sequence of storage octets. Since the internal representation of characters is system-dependent, this conversion is not specified directly. Instead a mapping from characters to octet sequences, known as an "encoding", is specified. The storage manager determines the conversion algorithm using the specified encoding together with its inherent knowledge of the internal representation used for characters.
The encoding can be specified in one of two ways:
It may be specified completely using the encoding attribute. When so specified, the encoding is independent of the document character set of the document in which the entity is used.
It may be specified partially by the attribute bit combination transformation format (bctf), which identifies an algorithm for mapping from fixed-size bit combinations to the octet sequences of a storage object. In this case the encoding is the combination of
the mapping from characters to fixed-size bit combinations determined by the document character set, with
the mapping from fixed-size bit combinations to octet sequences specified by the BCTF.
When an entity is read, the inverse of the encoding mapping is used.
NOTE 510 When a storage object is in a container its "octets" are actually the characters of the container entity. In such cases, the encoding or bctf attributes are normally specified for either the container or the contained object, but not both.
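The second way of specifying an encoding, as a composition of two mappings, might be pictured as follows. The character set table and all names here are hypothetical; the octet mapping used is the "more significant octet first" two-octet form described for the starter-set BCTFs.

```python
# Hypothetical document character set: character -> bit combination.
DOC_CHARSET = {"A": 65, "é": 233, "€": 8364}


def two_octet_bctf(bit_combination: int) -> bytes:
    """Map one bit combination to 2 octets, more significant octet first."""
    return bit_combination.to_bytes(2, "big")


def encode(text: str) -> bytes:
    """Encoding = (character -> bit combination) then (bit combination -> octets)."""
    return b"".join(two_octet_bctf(DOC_CHARSET[ch]) for ch in text)


def decode(octets: bytes) -> str:
    """Reading an entity applies the inverse of the encoding mapping."""
    inverse = {value: ch for ch, value in DOC_CHARSET.items()}
    return "".join(
        inverse[int.from_bytes(octets[i:i + 2], "big")]
        for i in range(0, len(octets), 2)
    )
```

Changing the document character set changes the result of `encode` even though the BCTF is unchanged, which is why an encoding specified through bctf depends on the document in which the entity is used.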
<!-- Entity Encoding Specification -->
<!attlist #NOTATION
-- encoding -- -- Entity encoding specification --
-- Clause: A.6.5.2 --
#ALL
encoding -- Encoding --
-- Constraint: at most one of encoding and bctf may
be specified --
-- Constraint: SGML, CDATA, SDATA entities only --
NAME -- Constraint: registered value notation --
#IMPLIED -- Default: same unless bctf is specified --
bctf -- Bit combination transformation format --
-- Constraint: SGML, CDATA, SDATA entities only --
-- Constraint: at most one of encoding and bctf may
be specified --
NAME -- Constraint: registered value notation --
#IMPLIED -- Default: storage object does not have
document-character-set-dependent character set --
>
A starter set of notations for encodings is defined in this International Standard. The notations are:
UCS-2: This encoding is the two-octet BMP form of coded representation of the Universal Multiple-Octet Coded Character Set defined in ISO/IEC 10646.
UCS-4: This encoding is the four-octet canonical form of coded representation of the Universal Multiple-Octet Coded Character Set defined in ISO/IEC 10646.
UTF-8: This encoding is the UCS Transformation Format 8 defined in Annex P of Amendment 1 to PDAM 1 of ISO/IEC 10646-1:1993. This encodes a character in the repertoire of ISO/IEC 10646 using between 1 and 6 octets.
UTF-16: This encoding is the UCS Transformation Format 16 defined in Annex Q of Amendment 1 to ISO/IEC 10646-1:1993.
UTF-7: This encoding is the UCS Transformation Format 7 encoding defined by IETF RFC 1642. This can be used to encode characters in the Basic Multilingual Plane of ISO/IEC 10646.
UNICODE: This represents each character in the Basic Multilingual Plane of ISO/IEC 10646 by two octets. The bytes representing the entire storage object may be preceded by a pair of bytes representing the byte order mark character (0xFEFF). The bytes representing each bit combination are in the system byte order, unless the byte order mark character is present, in which case the order of its bytes determines the byte order. When the storage object is read, any byte order mark character is discarded.
EUC-JP: This encoding is the Extended UNIX Code Packed Format for the Japanese registered Internet character set.
SJIS: This is the Shift JIS registered Internet character set.
IS8859-N, where N can be any single digit other than 0: This encodes a character from ISO 8859-N with a single octet.
SAME: The encoding of this storage object is the same as the encoding of the storage object in which the SOS of this storage object is specified.
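The byte-order rule given for the UNICODE notation could be sketched like this. The function name is hypothetical, and the sketch assumes that only a byte order mark at the very start of the storage object is examined and discarded.

```python
import sys
from typing import List


def decode_unicode_notation(octets: bytes,
                            system_order: str = sys.byteorder) -> List[int]:
    """Return the two-octet bit combinations of a UNICODE-encoded object.

    A leading byte order mark (0xFEFF) selects the byte order and is
    discarded; without one, the system byte order is assumed.
    """
    if len(octets) % 2:
        raise ValueError("UNICODE encoding requires an even number of octets")
    order, start = system_order, 0
    if octets[:2] == b"\xfe\xff":
        order, start = "big", 2
    elif octets[:2] == b"\xff\xfe":
        order, start = "little", 2
    return [int.from_bytes(octets[i:i + 2], order)
            for i in range(start, len(octets), 2)]
```

The same two characters stored on machines of different byte order decode identically as long as the byte order mark is present.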
<!-- Encoding Notations -->
<!-- THIS IS A NON-MANDATORY STARTER SET. -->
<!notation
UCS-2 -- UCS-2 encoding --
-- This encoding is the two-octet BMP form of coded
representation of the Universal Multiple-Octet
Coded Character Set defined in ISO/IEC 10646. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
UCS-2 Encoding//EN"
>
<!notation
UCS-4 -- UCS-4 encoding --
-- This encoding is the four-octet canonical form of
coded representation of the Universal
Multiple-Octet Coded Character Set defined in
ISO/IEC 10646. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
UCS-4 Encoding//EN"
>
<!notation
UTF-8 -- UTF-8 encoding --
-- This encoding is the UCS Transformation Format 8
defined in Annex P to PDAM 1 of ISO/IEC
10646-1:1993. This encodes a character in the
repertoire of ISO/IEC 10646 using between 1 and 6
octets. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
UTF-8 Encoding//EN"
>
<!notation
UTF-16 -- UTF-16 encoding --
-- This encoding is the UCS Transformation Format 16
defined in Annex Q of Amendment 1 to ISO/IEC
10646-1:1993. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
UTF-16 Encoding//EN"
>
<!notation
UTF-7 -- UTF-7 encoding --
-- This encoding is the UCS Transformation Format 7
encoding defined by IETF RFC 1642. This can be
used to encode characters in the Basic
Multilingual Plane of ISO/IEC 10646. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
UTF-7 Encoding//EN"
>
<!notation
UNICODE -- UNICODE encoding --
-- This represents each character in the Basic
Multilingual Plane of ISO/IEC 10646 by two
octets. The bytes representing the entire storage
object may be preceded by a pair of bytes
representing the byte order mark character
(0xFEFF). The bytes representing each bit
combination are in the system byte order, unless
the byte order mark character is present, in
which case the order of its bytes determines the
byte order. When the storage object is read, any
byte order mark character is discarded. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
UNICODE Encoding//EN"
>
<!notation
EUC-JP -- EUC-JP encoding --
-- This encoding is the Extended UNIX Code Packed
Format for the Japanese registered Internet
character set. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
EUC-JP Encoding//EN"
>
<!notation
SJIS -- SJIS encoding --
-- This is the Shift JIS registered Internet
character set. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
SJIS Encoding//EN"
>
<!notation
IS8859-1 -- ISO8859-1 encoding --
-- This encodes a character from ISO 8859-1 with a
single octet. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
ISO8859-1 Encoding//EN"
>
<!notation
IS8859-2 -- ISO8859-2 encoding --
-- This encodes a character from ISO 8859-2 with a
single octet. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
ISO8859-2 Encoding//EN"
>
<!notation
IS8859-3 -- ISO8859-3 encoding --
-- This encodes a character from ISO 8859-3 with a
single octet. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
ISO8859-3 Encoding//EN"
>
<!notation
IS8859-4 -- ISO8859-4 encoding --
-- This encodes a character from ISO 8859-4 with a
single octet. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
ISO8859-4 Encoding//EN"
>
<!notation
IS8859-5 -- ISO8859-5 encoding --
-- This encodes a character from ISO 8859-5 with a
single octet. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
ISO8859-5 Encoding//EN"
>
<!notation
IS8859-6 -- ISO8859-6 encoding --
-- This encodes a character from ISO 8859-6 with a
single octet. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
ISO8859-6 Encoding//EN"
>
<!notation
IS8859-7 -- ISO8859-7 encoding --
-- This encodes a character from ISO 8859-7 with a
single octet. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
ISO8859-7 Encoding//EN"
>
<!notation
IS8859-8 -- ISO8859-8 encoding --
-- This encodes a character from ISO 8859-8 with a
single octet. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
ISO8859-8 Encoding//EN"
>
<!notation
IS8859-9 -- ISO8859-9 encoding --
-- This encodes a character from ISO 8859-9 with a
single octet. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
ISO8859-9 Encoding//EN"
>
<!notation
SAME -- Same encoding --
-- The encoding of this storage object is the same
as the encoding of the storage object in which
the SOS of this storage object is specified. --
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR ENCODING
No change in encoding//EN"
>
A starter set of notations for BCTF algorithms is defined in this International Standard. The notations are:
identity: Each bit combination is represented by a single octet; this BCTF can be used only for storage objects all of whose bit combinations have a value not exceeding 255.
fixed-2: Each bit combination is represented by exactly 2 octets, with the more significant octet first; this BCTF can be used only for storage objects all of whose bit combinations have a value not exceeding 65,535.
fixed-3: Each bit combination is represented by exactly 3 octets, with a more significant octet preceding any less significant octets; this BCTF can be used only for storage objects all of whose bit combinations have a value not exceeding 16,777,215.
fixed-4: Each bit combination is represented by exactly 4 octets, with a more significant octet preceding any less significant octets; this BCTF can be used only for storage objects all of whose bit combinations have a value not exceeding 4,294,967,295.
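The four starter-set BCTFs differ only in octet width, so they can be sketched with one pair of functions. The function and table names are hypothetical; the widths and the "more significant octet first" order come from the descriptions above.

```python
from typing import Iterable, List

# Octets per bit combination for each starter-set BCTF.
BCTF_WIDTHS = {"identity": 1, "fixed-2": 2, "fixed-3": 3, "fixed-4": 4}


def bctf_encode(bit_combinations: Iterable[int], bctf: str) -> bytes:
    """Write each bit combination as a fixed number of octets, big-endian.

    int.to_bytes raises OverflowError when a value does not fit the width,
    which mirrors the stated upper limit for each BCTF.
    """
    width = BCTF_WIDTHS[bctf]
    return b"".join(value.to_bytes(width, "big") for value in bit_combinations)


def bctf_decode(octets: bytes, bctf: str) -> List[int]:
    """Inverse mapping: split the octets into fixed-width bit combinations."""
    width = BCTF_WIDTHS[bctf]
    if len(octets) % width:
        raise ValueError("octet count is not a multiple of the BCTF width")
    return [int.from_bytes(octets[i:i + width], "big")
            for i in range(0, len(octets), width)]
```

For instance, `bctf_encode([0x41, 0x20AC], "fixed-2")` produces the four octets 00 41 20 AC, while the same list cannot be written with identity because 0x20AC exceeds 255.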
<!-- BCTF Algorithm Notations -->
<!-- THIS IS A NON-MANDATORY STARTER SET. -->
<!NOTATION identity
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR BCTF
IDENTITY BCTF Algorithm//EN"
>
<!NOTATION fixed-2
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR BCTF
FIXED-2 BCTF Algorithm//EN"
>
<!NOTATION fixed-3
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR BCTF
FIXED-3 BCTF Algorithm//EN"
>
<!NOTATION fixed-4
PUBLIC "ISO/IEC 10744:1997//NOTATION FSIDR BCTF
FIXED-4 BCTF Algorithm//EN"
>
The attribute form common storage manager attributes (smcommon) consists of attributes that control common aspects of storage object access and SOS interpretation.
The attribute occupied extents (extents) is a marklist notation dimlist that specifies the extents of the storage object occupied by the entity. Multiple dimension specifications can be specified if the entity is segmented and distributed in several locations within the storage object.
NOTE 511 For example, this technique can be used to interleave the bodies of entities that are accessed concurrently.
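How an extents value selects octets might be sketched as below. This is a simplified reading and not a full HyTime dimlist implementation: it assumes that each pair of markers is (first, second), that a positive first marker counts from the start and a negative one from the end, and that a positive second marker is a count of octets while a negative one locates the last octet counting from the end. The function name is hypothetical.

```python
def resolve_extents(octets: bytes, extents: str = "1 -1") -> bytes:
    """Concatenate the octet ranges named by pairs of markers.

    Assumed marker semantics (simplified): positions are 1-based;
    negative markers count backward from the end of the object.
    """
    markers = [int(token) for token in extents.split()]
    if len(markers) % 2 or 0 in markers:
        raise ValueError("extents needs pairs of non-zero markers")
    size = len(octets)
    entity = bytearray()
    for first, second in zip(markers[0::2], markers[1::2]):
        start = first - 1 if first > 0 else size + first
        end = start + second if second > 0 else size + second + 1
        entity += octets[start:end]
    return bytes(entity)
```

Under these assumptions the default "1 -1" selects the entire object, and "1 2 5 2" would assemble an entity from two separate segments of the same storage object.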
The attribute zap end-of-file (zapeof) specifies whether the trailing octet sequence of a storage object should be excluded from the entity when it represents an end-of-file.
NOTE 512 This is different functionality from the extents attribute because with zapeof the user doesn't have to know whether the trailing octets of a particular storage object were in fact the end-of-file control (e.g. the control-Z octet in DOS). Zapeof should normally be specified for DOS (this is the behavior of the standard C libraries with respect to text files).
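A minimal sketch of zapeof, assuming the end-of-file control is the single control-Z octet (0x1A) mentioned for DOS; other systems may use a different sequence, and the function name is hypothetical.

```python
DOS_EOF = b"\x1a"  # control-Z, assumed end-of-file control


def zap_eof(octets: bytes, eof: bytes = DOS_EOF) -> bytes:
    """Drop the trailing end-of-file control if, and only if, it is present."""
    if eof and octets.endswith(eof):
        return octets[:-len(eof)]
    return octets
```

The caller does not need to know in advance whether the control is there: an object without it is returned unchanged.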
The attribute storage manager character reference delimiter (smcrd) is a single character that delimits a reference to a character in the inherent character set of the storage manager (which may be different from the character set of the document in which the FSI occurs). The smcrd is recognized only in an SOI or attribute value, and only if followed by a decimal digit. The smcrd character, together with following decimal digits and an optional semicolon, is replaced by the character with the specified number in the inherent character set of the storage manager.
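The replacement rule for smcrd might be sketched as follows. The function name is hypothetical, and the sketch assumes for illustration that the inherent character set of the storage manager is the one Python's `chr` uses.

```python
import re


def expand_smcrd(value: str, smcrd: str) -> str:
    """Replace smcrd + decimal digits + optional ';' by the numbered character.

    The delimiter is left alone unless a decimal digit follows it.
    """
    if len(smcrd) != 1:
        raise ValueError("smcrd must be a single character")
    pattern = re.compile(re.escape(smcrd) + r"([0-9]+);?")
    return pattern.sub(lambda match: chr(int(match.group(1))), value)
```

With "^" as the delimiter, `expand_smcrd("my^32;file.sgm", "^")` gives `"my file.sgm"`, while `"a^b"` is returned unchanged because no digit follows the delimiter.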
The attribute compression information (compress) can be used to identify a compression/decompression process, declared as a notation.
The attribute encryption information (encrypt) can be used to identify an encryption/decryption process, declared as a notation.
NOTE 513 It may be considered poor security practice to include this information, or even to indicate that the storage object is encrypted.
The attribute integrity information (seal) can be used to identify a verification process, declared as a notation. One registered value for this attribute is provided in the starter set, md5, the standard form of sealing used for transmission of documents over the Internet.
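A verification process for the md5 seal could look like the sketch below. It assumes the seal value is carried in the form of the Content-MD5 header field of RFC 1544, that is, the base64 encoding of the 128-bit MD5 digest; the function names are hypothetical.

```python
import base64
import hashlib


def content_md5(octets: bytes) -> str:
    """Compute a Content-MD5 style seal: base64 of the MD5 digest."""
    return base64.b64encode(hashlib.md5(octets).digest()).decode("ascii")


def verify_seal(octets: bytes, seal: str) -> bool:
    """Check that the storage object matches the seal supplied for it."""
    return content_md5(octets) == seal.strip()
```

A storage manager would compute the seal over the octets it actually retrieved and reject the storage object when `verify_seal` returns False.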
The attribute base for relative SOI (SOIbase) allows an alternative relative base to be specified. Its value must be an SOI for the SM of the specified storage object. SOIbase can be declared for any storage manager that allows a relative SOI.
NOTE 514 Using an SOIbase attribute may not be equivalent to prepending a string to the SOI for a storage manager that does searching. For example, with an FSI
<osfile soibase=entities>foo.sgm</osfile>
a storage manager might look first for a file entities/foo.sgm and then, if that did not exist, for a file pubtext/foo.sgm, whereas with
<osfile>entities/foo.sgm</osfile>
it might look first for a file entities/foo.sgm but then for a file pubtext/entities/foo.sgm.
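The difference described in NOTE 514 can be reproduced with a small sketch of a searching storage manager. The search directory "pubtext" comes from the note; the function name and the rule that SOIbase affects only the first place searched are assumptions made to match the example.

```python
import posixpath
from typing import List, Optional, Sequence


def candidate_paths(soi: str, soibase: Optional[str] = None,
                    search_dirs: Sequence[str] = ("pubtext",)) -> List[str]:
    """List the files a searching storage manager might try, in order.

    Assumption: SOIbase replaces only the default base of the first
    attempt; the remaining search directories are applied to the SOI itself.
    """
    first = posixpath.join(soibase, soi) if soibase else soi
    return [first] + [posixpath.join(directory, soi) for directory in search_dirs]
```

Both FSIs in the note try entities/foo.sgm first, but the fallback differs: pubtext/foo.sgm with the soibase attribute, pubtext/entities/foo.sgm with the prefixed SOI.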
<!-- Common Storage Manager Attributes -->
<!attlist #NOTATION
-- smcommon -- -- Common storage manager attributes --
-- Clause: A.6.5.3 --
#ALL
extents -- Dimensions of occupied extents of object --
-- Constraint: applies to storage octets; quantum is
an octet --
CDATA -- Lextype: (marker,marker)+ --
-- Constraint: interpreted as HyTime dimlist --
"1 -1" -- Default: entire object --
zapeof -- Zap end-of-file --
(zapeof|nozapeof)
zapeof
smcrd -- Storage manager character reference delimiter --
CDATA -- Lextype: char --
#IMPLIED -- Default: none --
compress -- Compression information --
CDATA -- Constraint: registered value notation --
#IMPLIED -- Default: none --
encrypt -- Encryption information --
CDATA -- Constraint: registered value notation --
#IMPLIED -- Default: none --
seal -- Integrity information --
CDATA -- Constraint: registered value notation --
#IMPLIED -- Default: none --
SOIbase -- Base for relative SOI --
CDATA -- Constraint: It is an SOI for this SM --
#IMPLIED -- Default: current storage object if it has the
same SM, else none --
>
<!notation md5
PUBLIC "-//IETF/RFC1544//NOTATION FSIDR SEAL
Content-MD5 Header Field//EN"
>
HTML generated from the original SGML source using a DSSSL style specification and the SGML output back-end of the JADE DSSSL engine.