These Guidelines have been developed by the Text Encoding Initiative
(TEI); see
They provide means of representing those features of a text which
need to be identified explicitly in order to facilitate processing of
the text by computer programs. In particular, they specify a set of
markers (or
The Guidelines formulated in this document are intended for use in interchange between individuals and research groups using different programs and computer systems over a broad range of applications. Since they contain an inventory of the features most often found useful for text processing, the Guidelines also provide help to those creating texts in electronic form. They can also be used for the local storage of text which is to be processed with multiple software packages requiring different input formats.
The Guidelines apply to texts in any natural language, of any date,
in any literary genre or text type, without restriction on form or
content. They treat both continuous materials (
The rules and recommendations made in these Guidelines conform to
ISO 8879, which defines the Standard Generalized Markup Language (SGML),
and make reference to ISO 646, which defines a standard seven-bit
character set in terms of which the recommendations on character-level
interchange are formulated. For more information on SGML see chapter
This document provides the authoritative statement of the requirements and usage of the TEI encoding scheme. Although it includes numerous small examples, it must be stressed that it is intended as a reference manual and that readers unfamiliar with SGML or text markup in general will find it difficult to learn the encoding scheme by reading this document alone.
This document will be complemented by a series of tutorials in text
encoding (document TEI U1 et seq.) and a case book of extended examples
with discussion of the rationale for various markup choices (TEI T1).
TEI
, in other cases
the TEI work group number, e.g. TEI AI5
), the type of document
(here U
and T
, meaning
The remainder of this chapter comprises three sections. The first
gives an overview of the structure and notational conventions used
throughout the document. The second enumerates the design principles
underlying the TEI scheme and the application environments in which it
may be found useful. Finally, the third section gives a brief account
of the origins and development of the Text Encoding Initiative itself.
Part I provides some relevant background information about the
Guidelines themselves (in this chapter); a brief technical
review of SGML (chapter
Part II provides a systematic treatment of issues common to all
text types: character representation (chapter
Part III documents various
Part IV documents various
Part V defines certain specialized
Part VI contains a number of technical discussions of a more
specialist interest. Topics covered include the notion of formal
Part VII consists of an alphabetical reference list of all elements
and element classes defined in the TEI encoding scheme.
Part VIII provides further reference material: specifically, a
description of how to obtain current versions of the full TEI DTDs and
the set of standard Writing System Declarations, a sample Feature System
Declaration for basic grammatical annotation, sample tag documentation,
and a formal grammar for the subset of SGML used in the TEI
interchange format.
In the back matter, a bibliography lists works cited in the text of
the Guidelines. A mechanically generated index is also provided, which
can serve, it is hoped, as a finding aid for the use of the Guidelines.
This section describes the typographic and stylistic conventions used
throughout this document. The use of many terms and concepts which have
not yet been defined is unavoidable in this section. All such terms and
concepts will be explained in later chapters of Part I.
When SGML elements are mentioned in the text, the mentions take the
form
These Guidelines distinguish encoding practices, and SGML elements,
which are required, recommended, or optional. The phrases
In the reference section in Part VII, elements and their
attributes are all classed as one of:
This reference section includes cross-references to the chapter or
section of the main text within which each element is discussed. Most
sections of the main text in which elements are defined begin with a
descriptive list of the elements concerned in the following format:
Not all attributes are always included in these lists; those which
are shared with other elements in a class are usually listed separately,
and those of relatively specialized interest are usually listed only in
the reference section. The values of the attribute are introduced with
one of the following formulaic phrases:
Each list of elements is followed by some discussion of its
semantics and usage, followed by one or more examples, taken
wherever possible from real texts, and presented in the following
format:
It should be noted that the examples demonstrate a variety of tagging
styles, mostly aimed at making the tagging legible while also showing
fairly explicitly where all elements begin and end. No claim is made or
implied as to the appropriateness of the style adopted here for other
purposes; in particular, those using SGML for local processing may often
prefer to use empty end-tags more frequently than is shown in the
examples, or to omit end-tags.
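The relationship between the explicit end-tags shown in the examples and the empty end-tag shorthand can be sketched in a few lines of Python. This is only an illustration, not part of the Guidelines: it assumes that every element is closed by some end-tag (no omitted end-tags and no empty elements), and the function name and sample markup are invented for the purpose.

```python
import re

# Matches an end-tag (with or without a name) or a start-tag with attributes.
TAG = re.compile(r"<(?:/([A-Za-z][\w.-]*)?|([A-Za-z][\w.-]*)[^<>]*)>")

def expand_empty_end_tags(text):
    """Rewrite each empty end-tag "</>" as an explicit end-tag.

    Assumption: no end-tags are omitted, so a simple stack of open
    element names is enough to know which element "</>" closes.
    """
    open_elements = []
    pieces = []
    pos = 0
    for match in TAG.finditer(text):
        pieces.append(text[pos:match.start()])
        pos = match.end()
        end_name, start_name = match.groups()
        if start_name is not None:
            # Start-tag: remember the element and copy the tag unchanged.
            open_elements.append(start_name)
            pieces.append(match.group(0))
        else:
            if not open_elements:
                raise ValueError("end-tag with no open element")
            opened = open_elements.pop()
            # An empty end-tag closes the most recently opened element.
            pieces.append("</%s>" % (end_name or opened))
    pieces.append(text[pos:])
    return "".join(pieces)

print(expand_empty_end_tags("<p>A <hi rend=italic>short</> example</>"))
```

A real SGML parser handles many more minimization features; the sketch only shows why the fully tagged style used in the examples is easier to read than the abbreviated one.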
After the examples and usage notes, each section typically concludes
with a DTD fragment containing the formal declarations for the elements
described. Each DTD fragment is given a heading, and may contain
element and attribute list declarations, entity declarations, parameter
entity references, comments, and references to DTD fragments in other
sections. The DTD fragments of a single chapter almost invariably
belong to the same DTD file, the structure of which is typically
described (with references to the included fragments) in one of the
first or last sections of the chapter.
The DTD fragments are identical to the DTDs distributed with
these Guidelines, with the following exceptions:
What appears in the text, therefore, as:
For further discussion, see chapter
The planning conference held at Vassar College in November, 1987 (see
section
Because of its roots in the humanistic research community, the TEI
scheme is driven by its original goal of serving the needs of research,
and is therefore committed to providing a maximum of comprehensibility,
flexibility, and extensibility. More specific design goals of the TEI
have been that the Guidelines should:
The goals of creating a common interchange format which is
application independent require the definition of a specific markup
syntax as well as the definition of a large predefined tag set. The
syntax of the recommendations made in this document conforms to the
international standard ISO 8879, which defines the Standard Generalized
Markup Language; reference is also made to ISO 646, which defines a
standard seven-bit character set. Full SGML document type declarations
are provided for the scheme described in these Guidelines.
The goal of providing guidance for text encoding requires that
recommendations be made as to what textual features should be recorded
in various situations. This mandate is fulfilled by the explicit
specification, in the reference section for each tag, that the tag is
However, the TEI Guidelines make (with relatively rare exceptions)
no suggestions or restrictions as to the relative importance of textual
features. The philosophy of the Guidelines is
The Guidelines have been written largely with a focus on text capture
(i.e. the representation in electronic form of an already existing copy
text in another medium) rather than text creation (where no such copy
text exists). Hence the frequent use of terms like
Concerning text capture, the TEI Guidelines do not specify a
particular approach to the problem of fidelity to the source text and
recoverability of the original; such a choice is the responsibility of
the text encoder. The current version of these Guidelines, however,
provides a more fully elaborated set of tags for markup of rhetorical,
linguistic, and simple typographic characteristics of the text than for
detailed markup of page layout or for fine distinctions among type fonts
or manuscript hands.
In these Guidelines, no hard and fast distinction is drawn between
In general, the accuracy and the reliability of the encoding and the
appropriateness of the interpretation are for the individual user of the
text to determine. The Guidelines provide a means of documenting the
encoding in such a way that a user of the text can know the reasoning
behind that encoding, and the general interpretive decisions on which it
is based. It is strongly recommended that the TEI header be used to
give an account of these aspects of the encoding. The TEI header is
described in chapter
In many situations more than one view of a text is needed. No
absolute recommendation to embody one specific view of text can apply to
all texts and all approaches to them. The syntax of SGML ensures that
some encodings can be ignored for some purposes. To enable encoding
multiple views, these Guidelines not only treat a variety of text
features, but they sometimes provide several alternative encodings for
what appear to be identical textual phenomena. These Guidelines
therefore offer the possibility of encoding many different views of the
text, simultaneously if necessary.
However, the Guidelines are built on the assumption that there is a
common core of textual features shared by virtually all texts and
virtually all serious work on texts. This core set of tags is defined
in Chapter
In brief, the TEI Guidelines define a general-purpose encoding
scheme which makes it possible to encode different views of text,
possibly intended for different applications, serving the majority of
scholarly purposes of text studies in the humanities. However, no
predefined encoding scheme can serve all research purposes. Therefore,
the TEI also provides means of modifying and extending the encoding
scheme defined by the Guidelines (see chapter
We envisage three primary functions for these Guidelines:
The description of textual features found in the chapters which
follow should provide a useful checklist from which scholars planning to
create electronic texts should select the subset of features suitable
for their project. Problems specific to text creation or text
We include here only some general points which are often raised about
SGML and the process of data capture.
SGML can appear distressingly verbose, particularly when (as in these
Guidelines) the names of tags and attributes are chosen for clarity and
not for brevity. Editor macros and keyboard shorthands can allow a
typist to enter frequently used tags with single keystrokes.
Special-purpose software may be purchased which scans word-processor or
scanner data and inserts SGML tags. SGML-aware software can help with
maintaining the hierarchical structure of the document, and display the
document with visual formatting rather than raw tags.
The techniques described in chapter
The SGML standard provides ways of abbreviating, omitting, or
otherwise
When the TEI Guidelines are used for interchange, it is expected
that researchers using other encoding schemes in their work will
translate outgoing data from such schemes into the scheme described by
these Guidelines, and similarly translate incoming data from the scheme
described here into those used internally. For such translations to be
carried out without loss of information, the scheme proposed here must
be as expressive (in a formal sense) as any encoding scheme now known to
be in wide use for textual research. To ensure that this is the case, a
set of extension techniques is provided (see chapter
For example, to translate from encoding scheme X into the TEI
scheme:
The ease with which this translation can be carried out will of
course depend on the clarity and explicitness with which scheme X
represents the features it encodes.
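The first planning step of such a translation can be sketched in Python. The sketch is illustrative only: the scheme-X feature names, the mapping table, and the function name are all hypothetical, and a real translation would also have to convert the markup syntax itself. It simply separates the features of scheme X that have a TEI equivalent from those that would call for an extension of the TEI scheme.

```python
# Hypothetical mapping from scheme-X feature names to TEI element names.
X_TO_TEI = {"para": "p", "italic": "hi", "title": "head"}

def plan_translation(x_features, mapping):
    """Sort scheme-X features by whether a TEI equivalent is known.

    Returns a pair: a dict of features that translate directly, and a
    list of features for which the TEI scheme would need an extension.
    """
    translatable = {}
    needs_extension = []
    for feature in x_features:
        if feature in mapping:
            translatable[feature] = mapping[feature]
        else:
            needs_extension.append(feature)
    return translatable, needs_extension

direct, missing = plan_translation(["para", "italic", "marginal-doodle"], X_TO_TEI)
print(direct)   # features with a TEI equivalent
print(missing)  # features needing an extension
```

The more explicitly scheme X names its features, the more of them will land in the first group without manual inspection.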
Translating from the TEI into scheme X follows the same pattern,
except that if a TEI feature has no equivalent in X, and X cannot be
extended, information must be lost in translation.
Similar procedures may be followed where the TEI scheme is to be
used as an interlanguage for interchange among several different sites
or applications, although the degree of TEI-conformance may vary.
In the simplest case, where two sites or individuals exchanging texts
know each other and know or can inquire what equipment the other is
using, these Guidelines serve primarily as documentation for a file
format, which can be referred to without actually being transmitted
together with the file. In the general case, where sender and recipient
cannot communicate such information, a stricter degree of
The rules defining such strict conformance to the Guidelines are
given in some detail in chapter
Note that the interchange format makes no formal restriction on the
character set to be used in interchange, as this will depend on the
medium of interchange and the local character sets in use by sender and
receiver. For further information, refer to chapter
Machine-readable text can be manipulated in many ways; some users:
These applications cover a wide range of likely uses but are by no
means exhaustive. The aim has been to make the TEI Guidelines useful
for encoding the same texts for different purposes. We have avoided
anything which would restrict the use of the text for other
applications. We have also tried not to omit anything essential to any
single application.
The Text Encoding Initiative grew out of a planning conference
sponsored by the Association for Computers and the Humanities (ACH) and
funded by the U.S. National Endowment for the Humanities (NEH), which
was held at Vassar College in November 1987. At this conference some
thirty representatives of text archives, scholarly societies, and
research projects met to discuss the feasibility of a standard encoding
scheme and to make recommendations for its scope, structure, content,
and drafting. During the conference, the Association for Computational
Linguistics and the Association for Literary and Linguistic Computing
agreed to join ACH as sponsors of a project to develop the Guidelines.
The outcome of the conference was this set of principles, which
determined the further course of the project.
In the course of the work, some of these goals assumed greater, some
lesser importance; some proved easier, some harder to achieve. The
document in hand does define a standard form for the interchange of
textual material, and adumbrate principles for the creation of new
electronic texts. The only metalanguage used, however, is that of SGML,
and no formal definitions are given of other common encoding schemes.
These Guidelines do define a minimal set of conventions for text
encoding (i.e. those SGML elements classed as recommended or required),
though few researchers will be satisfied to encode
The Text Encoding Initiative proper began in June 1988 with funding
from the NEH, soon followed by further funding from the Commission of
the European Communities, the Andrew W. Mellon Foundation, and the
Social Science and Humanities Research Council of Canada. Four working
committees, composed of distinguished scholars and researchers from both
Europe and North America, were named to deal with problems of text
documentation (resulting largely in chapter
A first draft version (1.0) of the Guidelines was distributed in July
1990 under the title
Extensive public comment and further work on areas not covered in
version 1 resulted in the drafting of a revised version, TEI P2,
distribution of which began in April 1992. This version includes
substantial amounts of new material, resulting from work carried out by
several specialist working groups, set up in 1990 and 1991 to propose
extensions and revisions to the text of P1. The overall organization,
both of the draft itself and of the scheme it describes, was entirely
revised and reorganized in response to public comment on the first
draft.
In June, 1993, the Advisory Board of the Text Encoding Initiative met
to review the current state of the Guidelines, and recommended the
formal publication of the work done to that time. The present version
of the TEI Guidelines, TEI P3,
represents a further revision of all chapters
published under the document number TEI P2, and the addition of further
chapters. Although it will be subject to revision and amendment on the
basis of practical experience and public discussion, this version of the
Guidelines is published without the label
Work on areas still not satisfactorily covered in this manual will
continue, and resulting recommendations will be issued as supplements to
the published Guidelines. Work is expected to continue in at least the
following areas:
The encoding recommended by this document may be used without fear
that future versions of the TEI scheme will be inconsistent with it in
fundamental ways. The TEI will be sensitive, in revising these
Guidelines, to the possible problems which revision might pose for those
who are already using this draft. Wherever consistent with the
long-term goals of the project, consistency with this version will be
preserved in future revisions.
name
is the attname
is the name of the attribute.
Where the elements and attributes
thus mentioned are part of the TEI encoding scheme,
they are included in the index.
The phrases "must", "is required to", etc., mark practices and tags
which are required for TEI conformance. The phrases "should", "it is
recommended that", "it is preferable to ...", etc., are used in
describing practices which are recommended but not required for TEI
conformance. Modal verbs like "may", "might", etc., mark practices
which are strictly optional. Qualifying phrases like "if desired",
"where appropriate", or "under some circumstances" are used when some
tag or practice described may be desirable or acceptable under some
circumstances and not under others.
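The wording convention just described lends itself to a simple lookup, sketched here in Python. The table holds only the phrases named in this section; the function name is hypothetical, the matching is deliberately naive (whole words, case-insensitive, strongest level wins), and the sample sentences are invented. It is meant as a reading aid, not as a definition of TEI conformance.

```python
import re

# Phrases named in this section, ordered from strongest to weakest level.
LEVEL_PHRASES = [
    ("required", ["must", "is required to"]),
    ("recommended", ["should", "it is recommended that", "it is preferable to"]),
    ("optional", ["may", "might"]),
]

def requirement_level(sentence):
    """Return the strongest level whose phrase occurs in the sentence, else None."""
    lowered = sentence.lower()
    for level, phrases in LEVEL_PHRASES:
        for phrase in phrases:
            # Word boundaries keep "may" from matching inside "maybe".
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                return level
    return None

print(requirement_level("The header must be present."))
print(requirement_level("Line breaks may be recorded."))
```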
italicized phrase
Attribute values are given indifferently in single quotes or double
quotes; unquoted attribute values are sometimes used where SGML requires
no quotation marks.
This has led to a number of important design decisions, such as:
These goals and principles are expounded in more detail below.
"if you want to encode this feature, do it this way" --- but very few
features are mandatory.
These three functions are so thoroughly interwoven in practice that it
is hardly possible to address any one without addressing the others.
However, the distinction provides a useful framework for discussing the
possible role of the Guidelines in work with electronic texts.
The first case is unproblematic. The second requires an extension to
the TEI scheme, as described in chapter
coordinated by a steering committee of representatives of the principal
sponsoring organizations.
sets of coding conventions suited for various applications, since
consensus on suitable conventions for different applications proved
elusive; this remains a goal for future work.