Parsing Routines

The following parsing-related routines of dtd.pl are defined:

DTDread_dtd
DTDread_mapfile
DTDset_comment_callback
DTDset_pi_callback
DTDset_verbosity


Routine Descriptions

DTDread_dtd

&'DTDread_dtd(FILEHANDLE);

DTDread_dtd parses the SGML DTD read from FILEHANDLE and stops when it reaches the end of the file. External entity references are also parsed, provided an entity-to-filename mapping exists for them (see DTDread_mapfile).

DTDread_dtd makes the following assumptions when parsing a DTD:

       
After DTDread_dtd finishes, the following associative arrays are filled. All of them belong to package dtd, so code in any other package must qualify their names (for example, %dtd'ElemCont):

%ParEntity
Keys: Non-external parameter entities.
Values: Replacement value.
%PubParEntity
Keys: External public parameter entities (PUBLIC).
Values: Entity identifier, if defined.
%SysParEntity
Keys: External system parameter entities (SYSTEM).
Values: Entity identifier, if defined.
%GenEntity
Keys: Regular general entities.
Values: Entity value.
%StartTagEntity
Keys: STARTTAG general entities.
Values: Entity value.
%EndTagEntity
Keys: ENDTAG general entities.
Values: Entity value.
%MSEntity
Keys: MS general entities.
Values: Entity value.
%MDEntity
Keys: MD general entities.
Values: Entity value.
%PIEntity
Keys: PI general entities.
Values: Entity value.
%CDataEntity
Keys: CDATA general entities.
Values: Entity value.
%SDataEntity
Keys: SDATA general entities.
Values: Entity value.
%ElemCont
Keys: Element names.
Values: Base content model from each element's declaration.
%ElemInc
Keys: Element names.
Values: Inclusion set declarations.
%ElemExc
Keys: Element names.
Values: Exclusion set declarations.
%ElemTag
Keys: Element names.
Values: Omitted tag minimization.
%Attribute
Keys: Element names.
Values: Attributes for elements. To access the data stored in %Attribute, it is best to use DTDget_elem_attr.

%PubNotation
Keys: PUBLIC Notation names.
Values: Notation identifier.
%SysNotation
Keys: SYSTEM Notation names.
Values: Notation identifier.
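Because the arrays live in package dtd, code in another package has to qualify their names. The sketch below fills two of them by hand as a stand-in for a real DTDread_dtd call (the element names and the exact format of the values are hypothetical) and then reads them from package main. It uses Perl 5's `::` package separator, which is equivalent to the `'` separator in the calling examples here.

```perl
use strict;
use warnings;

# Stand-in data: in real use, DTDread_dtd fills these arrays.
# The element names and values below are hypothetical samples.
package dtd;
our %ElemCont = (
    'memo' => '(to, from, body)',
    'to'   => '(#PCDATA)',
);
our %ElemTag = (
    'memo' => '- -',
    'to'   => '- O',
);

package main;

# Build one summary line per element, sorted by element name.
# Names are qualified with "dtd::" because the arrays live in package dtd.
sub element_summary {
    my @lines;
    foreach my $elem (sort keys %dtd::ElemCont) {
        my $tags = exists $dtd::ElemTag{$elem} ? $dtd::ElemTag{$elem} : '';
        push @lines, sprintf('%s [%s] %s', $elem, $tags, $dtd::ElemCont{$elem});
    }
    return @lines;
}

print "$_\n" foreach element_summary();
```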
       
All entities are expanded in the data stored in the %ElemCont, %ElemInc, %ElemExc, %ElemTag, and %Attribute arrays.
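This document does not show dtd.pl's own expansion code, but the idea can be sketched for parameter entities. The sketch assumes the usual SGML %name; reference syntax, where the semicolon is optional, and uses a hypothetical table in place of %ParEntity. It illustrates the technique and is not the library's implementation.

```perl
use strict;
use warnings;

# Hypothetical parameter entity table standing in for %dtd::ParEntity.
my %par_entity = (
    'inline' => '#PCDATA | em | %phrase;',
    'phrase' => 'code | kbd',
);

# Replace each %name; reference (the ';' is optional) with its value from
# $table. Unknown names are left as written. Passes repeat until nothing
# changes, so entities defined in terms of other entities are fully
# expanded; $max_depth stops runaway recursive definitions.
sub expand_par_entities {
    my ($text, $table, $max_depth) = @_;
    $max_depth = 20 unless defined $max_depth;
    for (1 .. $max_depth) {
        my $changed = 0;
        $text =~ s{%([A-Za-z][A-Za-z0-9.\-]*)(;?)}{
            exists $table->{$1} ? do { $changed = 1; $table->{$1} } : "%$1$2"
        }ge;
        return $text unless $changed;
    }
    warn "parameter entity expansion stopped after $max_depth passes\n";
    return $text;
}

print expand_par_entities('(%inline;)*', \%par_entity), "\n";
```

A single `s///g` pass does not rescan text it has just inserted, so the sketch loops. In the example, the first pass yields `%phrase;`, and the second pass resolves it.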

       
When locating external parameter entity files, DTDread_dtd uses the environment variable P_SGML_PATH, a colon-separated list of directories that tells DTDread_dtd where to find external entities. By default, DTDread_dtd looks in the current working directory or in the subdirectory named ents.

If DTDread_dtd cannot resolve an external entity reference, it issues a warning and continues parsing the DTD.
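The lookup can be sketched as a search through the P_SGML_PATH directories. The function name find_sgml_file is hypothetical. The precedence between P_SGML_PATH and the two default locations is also an assumption, because the text above does not say whether the defaults are still searched when P_SGML_PATH is set.

```perl
use strict;
use warnings;

# Locate $file using the colon-separated directory list in P_SGML_PATH,
# falling back to the current directory and then ./ents. Using the
# defaults only when P_SGML_PATH is unset or empty is an assumption.
sub find_sgml_file {
    my ($file) = @_;
    if ($file =~ m{^/}) {                # absolute names are used as is
        return -f $file ? $file : undef;
    }
    my $path = $ENV{'P_SGML_PATH'};
    my @dirs = (defined $path && length $path) ? split(/:/, $path) : ('.', 'ents');
    foreach my $dir (@dirs) {
        $dir = '.' if $dir eq '';        # treat an empty component as "."
        my $candidate = "$dir/$file";
        return $candidate if -f $candidate;
    }
    return undef;
}

# Mirror the documented behaviour: warn on failure and keep going.
my $found = find_sgml_file('iso-pub.ent');
warn "Unable to locate iso-pub.ent\n" unless defined $found;
```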

       
Current status of DTDread_dtd:

       
The performance of DTDread_dtd is not the best. DTDread_dtd makes frequent use of Perl's getc function. If SGML did not have such screwy grammar rules, I could easily have avoided getc (Perl needs better character I/O). I haven't bothered trying to optimize DTDread_dtd's performance. So far it works, and I do not feel like mucking with it.

DTDread_dtd is meant to process DTDs stored in separate files. If the file being parsed also contains a document instance, God only knows what will happen.

       

DTDread_mapfile

&'DTDread_mapfile($filename);

DTDread_mapfile parses the entity map file specified by $filename.

DTDread_mapfile uses the environment variable P_SGML_PATH, as described in section DTDread_dtd, to locate $filename. This way, the map file can be kept in the same location as the entity files.

DTDread_mapfile makes the following assumptions when parsing $filename:

Example of an entity map file:

# DTDread_mapfile will ignore lines beginning with a `#' character.

#####################
# ISO entity files
#
ISO 8879-1986//ENTITIES General Technical//EN iso-tech.ent
ISO 8879-1986//ENTITIES Publishing//EN iso-pub.ent
ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN iso-num.ent
ISO 8879-1986//ENTITIES Greek Letters//EN iso-grk1.ent
ISO 8879-1986//ENTITIES Diacritical Marks//EN iso-dia.ent
ISO 8879-1986//ENTITIES Added Latin 1//EN iso-lat1.ent
ISO 8879-1986//ENTITIES Greek Symbols//EN iso-grk3.ent
ISO 8879-1986//ENTITIES Added Latin 2//EN ISOlat2
ISO 8879-1986//ENTITIES Added Math Symbols: Ordinary//EN ISOamso

#####################
# ArborText entity file
#
-//ArborText//ELEMENTS Math Equation Structures//EN ati-math.elm

#####################
# A sample SYSTEM entity
#
MyGraphics my_graphics.ent

# end of map file


       
If DTDread_mapfile cannot access $filename, it will issue a warning to that effect.
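A reader for this format can be sketched in a few lines. The sketch skips `#` lines and blank lines, then takes the last word on each remaining line as the file name and the rest as the identifier. These rules are inferred from the example and are not taken from dtd.pl, and read_entity_map is a hypothetical name.

```perl
use strict;
use warnings;

# Read an entity map file into a hash of identifier => file name.
# Format assumed from the example: lines beginning with '#' and blank
# lines are skipped; on every other line the last whitespace-separated
# word is the file name and everything before it is the public
# identifier or entity name.
sub read_entity_map {
    my ($filename) = @_;
    my %map;
    open(my $fh, '<', $filename) or do {
        warn "Unable to open entity map file $filename: $!\n";
        return %map;
    };
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ || $line =~ /^\s*$/;
        $map{$1} = $2 if $line =~ /^\s*(.*\S)\s+(\S+)\s*$/;
    }
    close($fh);
    return %map;
}
```

Splitting on the last word, rather than the first, matters here because public identifiers such as `ISO 8879-1986//ENTITIES Publishing//EN` contain spaces.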

       

DTDset_comment_callback

&'DTDset_comment_callback($callback);

DTDset_comment_callback sets the function, $callback, to be called when a comment declaration is read during DTDread_dtd. $callback is called as follows:

&$callback(*comment_text);

*comment_text is a typeglob (Perl 4's way of passing a variable by reference) for the string containing all the text within the SGML comment declaration, excluding the open and close delimiters.
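The callback side of this convention can be sketched as follows. A typeglob argument is usually received with `local(*name) = @_;`, which makes $name an alias for the caller's variable. The sketch simulates the call DTDread_dtd would make; save_comment and @comments are hypothetical names. Perl 4 had no code references, so a Perl 4 caller would pass the callback's name as a string. Whether DTDset_comment_callback also accepts a code reference is not stated here.

```perl
use strict;
use warnings;

my @comments;

# A comment callback written for the &$callback(*comment_text) calling
# convention: the typeglob argument is aliased to *text, so $text is the
# caller's string itself rather than a copy.
sub save_comment {
    local (*text) = @_;
    our $text;                      # declare $main::text for 'use strict'
    push @comments, $text;
}

# Stand-in for the call DTDread_dtd makes when it reads a comment.
# $comment_text must be a package variable: typeglobs cannot alias 'my'
# variables. The comment text itself is made up.
our $comment_text = ' Revision 2 of the memo DTD ';
my $callback = \&save_comment;
&$callback(*comment_text);
```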

       

DTDset_pi_callback

&'DTDset_pi_callback($callback);

DTDset_pi_callback sets the function, $callback, to be called when a processing instruction is read during DTDread_dtd. $callback is called as follows:

&$callback(*pi_text);

*pi_text is a typeglob (Perl 4's way of passing a variable by reference) for the string containing all the text within the processing instruction, excluding the open and close delimiters.
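A PI callback follows the same pattern. As a hypothetical example, the sketch below files each PI under its first word. SGML leaves the content of a processing instruction system-specific, so that split is an assumption, and record_pi and %pi_by_keyword are illustrative names.

```perl
use strict;
use warnings;

my %pi_by_keyword;

# A PI callback in the same typeglob style. Treating the first word of
# the PI as a keyword is an assumption made for this sketch.
sub record_pi {
    local (*pi) = @_;
    our $pi;                        # declare $main::pi for 'use strict'
    my ($keyword, $rest) = $pi =~ /^\s*(\S+)\s*(.*)$/s;
    return unless defined $keyword;
    push @{ $pi_by_keyword{$keyword} }, $rest;
}

# Stand-in for the call DTDread_dtd makes; the PI text is hypothetical.
our $pi_text = 'formatter page-break';
record_pi(*pi_text);
```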

       

DTDset_verbosity

&'DTDset_verbosity($value);

DTDset_verbosity sets the verbosity flag for DTDread_dtd. If $value is non-zero, then DTDread_dtd outputs status messages as it parses a DTD.

       



Earl Hood, ehood@convex.com
dtd.pl 2.1.0