The XML Translations (XTR) subsystem handles translations of texts. Translations may be entered interlinearly, within the ATF file, or extralinearly, either within the ATF file or in a separate file. Interlinear translations are automatically aligned with the translated material at the line level; extralinear translations may be aligned either at the level of labels (including label ranges like [o i 14 - o ii 3]) or at the level of units (units are normally sentences). We do not expect XTR to be written by hand, though it could be; the normal practice is to prepare translations in a simple format which is an extension to ATF. This format and the special facilities it provides for rendering translations are described here.
Translations are a special part of an ATF text; they may be given
within an ATF text which contains a transliteration or they may be in
their own ATF text. If they are in their own ATF text they require
their own &-line which must be identical to the
&-line used in the corresponding transliteration (see
the ATF tutorial for
further details on &-lines).
Because the ATF processor requires access to both the transliteration and the translation of a text, the easiest practice is to give translations along with the transliteration. This is not a requirement, because the ATF processor can read multiple files in a single run and process them together; however, the multi-file facility is not presently available in the webservice.
Every translation must contain a code specifier which identifies
its project of origin, author, compilation or other salient key. This
code is restricted to letters and digits and is used to implement
XTR's support for multiple translations of the same text. For
interlinear translations there is no method of setting the code: the
code i (lowercase letter i) is reserved for interlinear
translations.
NEW: In order to facilitate cross-project sharing
of texts, the recommended best practice is now to use the translation
code project for the default translation of a work within
a project.
All translation types except interlinear are introduced by a
@translation command followed by the translation
type and a language code.
Interlinear translations are given using the ATF protocol #tr.<LANG>:; the translation follows. A
language code may be given after a period: these language codes must
follow the CDL rules for language codes as given in the GDL tutorial. For an English
translation the protocol would then be #tr.en:.
If no language code is given the default is en.
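The interlinear protocol line can be recognized with a small pattern. The following Python sketch is illustrative only and is not the ATF processor's implementation; the function name parse_interlinear and the pattern used for language codes are assumptions (the actual rules for language codes are those given in the GDL tutorial).

```python
import re

# Matches "#tr:" or "#tr.<LANG>:" at the start of a line.
# The language-code pattern is an assumption made for this sketch.
TR_LINE = re.compile(r"^#tr(?:\.([A-Za-z][A-Za-z0-9-]*))?:\s*(.*)$")


def parse_interlinear(line, default_lang="en"):
    """Return (language, text) for an interlinear translation line, else None."""
    match = TR_LINE.match(line)
    if match is None:
        return None
    # No language code after the period means the default, en.
    lang = match.group(1) or default_lang
    return lang, match.group(2)


print(parse_interlinear("#tr.en: For Bau"))
print(parse_interlinear("#tr: the good woman"))
```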
Parallel translations start with the command:
@translation parallel en tinney
(Here and in the following examples en may be any
standard language code; tinney may be any legal translation
code.)
The remainder of the translation must use exactly the same structural labeling as the transliteration; the ATF processor will automatically align the transliteration and translation using their common structure.
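A header line of the form @translation TYPE LANG CODE can be split into its three fields as in the following Python sketch. It is a hedged illustration, not the processor's own code: the function name parse_translation_header is hypothetical, the accepted type names are the extralinear types described in this document (parallel, labeled, unitary), and "letters and digits" is assumed here to mean ASCII letters and digits.

```python
import re

# Extralinear translation types described in this document.
EXTRALINEAR_TYPES = {"parallel", "labeled", "unitary"}

# Assumption: translation codes are ASCII letters and digits only.
CODE = re.compile(r"^[A-Za-z0-9]+$")


def parse_translation_header(line):
    """Split '@translation TYPE LANG CODE' into a dict, or raise ValueError."""
    parts = line.split()
    if len(parts) != 4 or parts[0] != "@translation":
        raise ValueError("expected '@translation TYPE LANG CODE'")
    _, ttype, lang, code = parts
    if ttype not in EXTRALINEAR_TYPES:
        raise ValueError(f"unknown translation type: {ttype}")
    if not CODE.match(code):
        raise ValueError(f"code must be letters and digits: {code}")
    if code == "i":
        # The code i is reserved for interlinear translations.
        raise ValueError("code 'i' is reserved for interlinear translations")
    return {"type": ttype, "lang": lang, "code": code}


print(parse_translation_header("@translation parallel en tinney"))
```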
Labeled translations start with the command:
@translation labeled en saa1
Subsequent blocks of translation are introduced by
@label commands where the remainder of the line gives the
label. The label must be either a single label which matches a line
label in the translated source, or a range, i.e., a pair of labels
giving the start and end lines. For an explanation of how labels work
see the ... documentation.
N.B.: Labeling in translations must always be
done at the line level; a label such as r (reverse) is an
error in XTR.
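The value of a @label line is either one label or a range. The Python sketch below splits such a value into a start and an end label. It assumes that ranges are written with a hyphen surrounded by spaces, as in the examples in this document. The is_line_level check is only a rough heuristic invented for this illustration (a label whose last token contains no digit, such as r, is treated as not line-level); the real label grammar is defined in the label documentation.

```python
def parse_label_value(value):
    """Return (start, end) labels for the text following @label."""
    value = value.strip()
    if " - " in value:
        # Assumption: a range is two labels separated by " - ".
        start, end = (part.strip() for part in value.split(" - ", 1))
    else:
        start = end = value
    if not start or not end:
        raise ValueError("empty label")
    return start, end


def is_line_level(label):
    """Heuristic (an assumption): the last token of a line label has a digit."""
    tokens = label.split()
    return bool(tokens) and any(ch.isdigit() for ch in tokens[-1])


print(parse_label_value("o i 14 - o ii 3"))
print(is_line_level("r"))
```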
Unitary translations--those whose translation blocks correspond to the annotated units (usually sentences) of a transliteration--start with the command:
@translation unitary en psd
Subsequent blocks of translation are introduced by
@unit commands where the remainder of the line gives a
unit number; these can be found in the unit-view of the source
texts--typically unit 1 is the first sentence, unit 2 is the second
sentence, and so on.
Unitary translations may also follow @unit with a
@span command which follows the same rules as a
@label line. This is not used for alignment but is
intended for the convenience of the translator; such
@span lines can be output automatically by translation
template generators.
A small set of essential block-level commands is available in the labeled and unitary translation styles. These block commands cannot be used with the interlinear or parallel styles; to use them, first convert your translation to labeled or unitary style.
For labeled and unitary translation types the content of a translation unit is a single paragraph. A blank line is required to close the paragraph.
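The paragraph rule for labeled translations can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the ATF processor's parser: the function name parse_labeled_blocks is hypothetical, headings and notes are skipped, and the sketch is more lenient than the rule above in that it also closes a paragraph at the end of input or at the next @label.

```python
def parse_labeled_blocks(lines):
    """Return a list of (label, paragraph) pairs from labeled translation lines."""
    blocks = []
    label, para = None, []

    def flush():
        nonlocal label, para
        if label is not None and para:
            blocks.append((label, " ".join(para)))
        label, para = None, []

    for line in lines:
        if line.startswith("@label"):
            flush()
            # The remainder of the @label line gives the label.
            label = line[len("@label"):].strip()
        elif not line.strip():
            # A blank (or spaces-only) line closes the current paragraph.
            if para:
                flush()
        elif label is not None and not line.startswith("@"):
            para.append(line.strip())
    flush()
    return blocks


example = [
    "@label 1 - 5",
    "For Bau, the good woman,",
    "child of An, lady of Iri-kug, his lady,",
    "",
    "@label 6 - 8",
    "Gudea, ruler of Lagaš,",
    "",
]
print(parse_labeled_blocks(example))
```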
For parallel and interlinear translation types the content of a
translation unit is the rest of the line following the label or
#tr:, as well as any following lines which start with at
least one space and which do not consist only of spaces:
#tr: this is a long interlinear translation which is more comfortably
  handled if split over more than one line.
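The continuation rule (a following line belongs to the unit if it starts with at least one space and is not made up only of spaces) can be illustrated with a short Python sketch. The function name collect_unit is hypothetical, and joining the pieces with a single space is a choice made for this example.

```python
def collect_unit(lines, start):
    """Return (text, next_index) for the translation unit starting at lines[start]."""
    parts = [lines[start].strip()]
    i = start + 1
    # A continuation line starts with a space and has some non-space content.
    while i < len(lines) and lines[i].startswith(" ") and lines[i].strip():
        parts.append(lines[i].strip())
        i += 1
    return " ".join(parts), i


lines = [
    "#tr: this is a long interlinear translation which is more comfortably",
    "  handled if split over more than one line.",
    "2. munus sag9-ga",
]
text, next_index = collect_unit(lines, 0)
print(text)
print(next_index)
```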
In labeled and unitary translation types headings may be used
before the @labels. A heading consists of a paragraph
beginning with one of the commands: @h1,
@h2, @h3 specifying first, second and third
level headings respectively.
The content of a heading may use the same inline conventions as translation content.
@h1 Inana steals the @me

@label 1 - 10
...
Translations may contain dollar-lines, in which case they must ordinarily correspond on a 1:1 basis with the $-lines in the transliteration. See below for a discussion of how to handle situations which do not conform to this constraint.
In labeled and unitary translation types notes may be given immediately after either headings or translation paragraphs. These notes may begin with a note marker corresponding to a note marker in the preceding heading or translation content (see Inline Conventions below):
@label 1
The girl^1^ stood on the burning deck.

@note ^1^ Three manuscripts read instead: boy.

@label 2
...
If no note marker is given the note is automatically linked to the entire preceding heading or translation unit.
A note may contain one or more paragraphs; paragraphs are separated by blank lines (the processor understands that a line containing only spaces is a "blank line"). A blank line is required to close the final paragraph of the note.
The character set used must be Unicode--no ATF translation of sz and similar conventions is done.
Text supplied for the sense is given in parentheses, e.g., He (Gudea) brought (stone) down from (the
mountain).
Where the literal rendering is known but inadequate for the
context, a word or words may be bracketed by matched pairs of @"..."@ commands, e.g., ...ignore the @"striking"@ among....
Foreign words are indicated by placing an '@'-sign before the word,
e.g., Inana took the @me.
Uncertain translations are bracketed by matched pairs of
@?...?@ commands (note that the close-uncertain form is
query then at-sign), e.g., @?he built the temple?@.
Untranslatable passages should be indicated by an ellipsis
(...). At the end of a sentence, a four-dot ellipsis
should be used (....).
Note markers may be given by placing numbers between matched pairs
of caret (^) characters: this is
noted.^1^. Multiple notes may be referenced in a single marker
by separating them with commas.
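The inline conventions above are regular enough to locate with simple patterns. The Python sketch below is an illustration only and does not reproduce the ATF processor's tokenizer: the function name inline_features is hypothetical, a foreign word is assumed to consist of word characters and hyphens, and the note-number pattern follows the marker pattern in the schema later in this document.

```python
import re

NOTE = re.compile(r"\^(\d+[a-z]*(?:,\d+[a-z]*)*)\^")   # ^1^ or ^1,2^
UNCERTAIN = re.compile(r"@\?(.*?)\?@")                  # @?...?@
LITERAL = re.compile(r'@"(.*?)"@')                      # @"..."@
# '@' directly before a word; the lookbehind skips the closing ?@ and "@ forms.
FOREIGN = re.compile(r'(?<![?"])@(\w[\w-]*)')
SUPPLIED = re.compile(r"\(([^()]*)\)")                  # (text supplied for the sense)


def inline_features(text):
    """Collect the inline features found in a piece of translation content."""
    notes = [n for group in NOTE.findall(text) for n in group.split(",")]
    return {
        "notes": notes,
        "foreign": FOREIGN.findall(text),
        "uncertain": UNCERTAIN.findall(text),
        "literal": LITERAL.findall(text),
        "supplied": SUPPLIED.findall(text),
    }


print(inline_features("Inana took the @me.^1,2^"))
```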
Certain aspects of translation are project-specific and should be defined in a project style-manual. These include, among others, practices for normalizing proper-names and whether or not to indicate breakage on the original object in the translation.
For most situations the default handling of alignment of transliteration and translation is adequate. In the normal case, each labeled translation block must have a corresponding label in the transliteration, and each translation dollar-line must have a matching dollar-line in the transliteration.
For special purposes--including migrating legacy data--several non-standard combinations of alignment are supported in ATF, however.
This does not necessarily require any special action; the translation automatically aligns on labels. An empty $-line may, however, be used in the translation. The difference between no dollar-line and an empty dollar-line is this: in the former case the block-alignment of a labeled translation will extend to include the transliteration's $-line; in the latter case the two $-lines will align, so the translation block will stop at the line before the transliteration's $-line.
To create this effect an empty $-line must be entered in the transliteration.
To do this, the transliteration $-line must first be given a label
using the syntax $@(LABEL), where
LABEL is any unique label within the text. To achieve
the alignment, the translation must then use the same label; this
label is purely a convenience and is never rendered.
Again, the $-line must be labeled in order to achieve this
effect--in this case, the label of the transliteration line must be
given as in, e.g., $@(r 1).
If the transliteration should have space facing it, or some kind of comment, simply use an empty $-line with a label, as in the previous example.
This effect can be obtained by creating a dummy transliteration
line, which must have a unique label and contain only the inline
comment (#DUMMY#). The translation must use the dummy
line's label; the dummy line creates no output other than blank
space.
&Q000887 = Gudea 1 [E3/1.1.7.1]
1. {d}ba-u2#
#tr.en: For Bau
2. munus sag9-ga
#tr.en: the good woman
3. dumu an-na
#tr.en: child of An
4. nin iri-kug-ga
#tr.en: lady of Iri-kug
5. nin-a-ni
#tr.en: his lady
6. gu3-de2-a
#tr.en: Gudea
7. ensi2
#tr.en: ruler
8. lagasz{ki}-ke4
#tr.en: of Lagaš
9. e2 iri-kug-ga-ni
#tr.en: her house in Iri-kug
10. mu-na-du3
#tr.en: he built.
&Q000887 = Gudea 1 [E3/1.1.7.1]
@translation parallel en cdli
1. For Bau,
2. the good woman,
3. child of An,
4. lady of Iri-kug,
5. his lady,
6. Gudea,
7. ruler
8. of Lagaš,
9. her house in Iri-kug
10. he built.
&Q000887 = Gudea 1 [E3/1.1.7.1]
@translation labeled en tinney
@label 1 - 5
For Bau, the good woman, child of An, lady of Iri-kug, his lady,

@label 6 - 8
Gudea, ruler of Lagaš,

@label 9 - 10
built her house in Iri-kug.
&Q000887 = Gudea 1 [E3/1.1.7.1]
@translation unitary en psd
@unit 1
@span 1 - 10
For Bau, the good woman, daughter of An, lady of Iri-kug, his lady, Gudea, ruler of Lagaš built her house in Iri-kug.
The remainder of this document provides the RNG schema used by the XML version of translations and explains implementation details useful for programmers; if you are typing translations in ATF you don't need to read past this point.
The ATF processor transparently handles the relationship between the different surface forms of translations and the single internal data structure and serialized XML form. Linkage between the transliterations and their corresponding translations is also handled automatically.
A multi-pass approach is used to achieve this:
All of the inputs are read, skipping transliterations and interlinear translations but processing extralinear translations.
When a translation is processed it is parsed into an internal data structure which is the same for all surface forms of translation; the type of translation input is preserved. Note that this applies to interlinear translations as well as extralinear ones.
Each unit in the translation is assigned an ID based on the memory address of the translation unit; this is guaranteed to be unique but is irreproducible.
When a transliteration which has an extralinear translation is being parsed a table of lines indexed by labels is built at parse-time by the processor.
When a transliteration which has an interlinear translation is being parsed the translation parsing takes place simultaneously. The end result is that interlinear and extralinear translations both end up in the same internal form.
After parsing, extralinear translations are processed again with each label in the internal structure being looked up in the transliteration index. If the label is not found a warning is emitted. If the label is found the ID for the label is stored in the translation; the translation unit's ID is also added to the transliteration line's node.
In the case of unitary translations, references to the unit ID are
generated at this point. Note that the @span, even if
present, is not used to set start/end references; these can, however,
be looked up at a later time based on the document-centric version of
the XTF document which is generated during linguistic annotation
processing.
The transliteration and translation are then output to their own
files. The transliteration is named by the ID and the extension
.xtf as usual; the translation is named by the ID, a
language suffix and the extension .xtr. For example, Gudea 1
is Q000887 in the CDL composites catalog, so its English and German
translations would be named Q000887-en.xtr and Q000887-de.xtr respectively.
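The label-lookup pass and the output naming described above can be sketched in Python as follows. This is a simplified model, not the ATF processor's code: all function names, the dictionary layout of a translation unit, and the sample IDs are hypothetical, and each unit is assumed to carry only the label of its first line.

```python
import warnings


def build_label_index(translit_lines):
    """Map each line label to the ID of its transliteration line."""
    return {label: line_id for line_id, label in translit_lines}


def link_units(units, label_index):
    """Store the line ID on each unit; return the unit IDs attached to each line."""
    back_links = {}
    for unit in units:
        line_id = label_index.get(unit["label"])
        if line_id is None:
            # A label missing from the transliteration index yields a warning.
            warnings.warn(f"label {unit['label']!r} not found in transliteration")
            continue
        unit["ref"] = line_id
        back_links.setdefault(line_id, []).append(unit["id"])
    return back_links


def xtr_filename(text_id, lang):
    """Name the translation output by text ID and language, e.g. Q000887-en.xtr."""
    return f"{text_id}-{lang}.xtr"


# Hypothetical line IDs and unit IDs, for illustration only.
index = build_label_index([("Q000887.1", "1"), ("Q000887.6", "6")])
units = [{"id": "u1", "label": "1"}, {"id": "u2", "label": "6"}]
print(link_units(units, index))
print(xtr_filename("Q000887", "de"))
```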
XTR translations are defined as a tiny subset of XHTML so that
rendering them is little more than a matter of inserting appropriate
linking conventions to support user-navigation between the IDREFs in
xtr:ref attributes and the IDs in either the
transliteration or the translation.
Each translation unit is expressed as an xhtml:p
element with a unique ID in the id attribute. This ID is
not derivable by an externally reproducible algorithm, but access to
the translation unit is enabled by setting references to the IDs of
translation units in the xtr:ref attribute on the
relevant line of the XTF tree. In labeled translations only one
translation unit can begin on each line; if you are translating a text
which calls for multiple translation units on a single line you must
use the unitary translation style and annotate the units appropriately in
the ATF transliteration.
Back references from the translation to the transliteration are
given by placing the ID of the first corresponding line in the
transliteration on the xhtml:p element's
xtr:ref in the XTR tree.
The values of the @label, @unit and
@span commands are preserved in XTR attributes with
corresponding names.
namespace xtr = "http://emegir.info/xtr"
namespace xh = "http://www.w3.org/1999/xhtml"

translation =
  element xtr:translation {
    attribute ref { xsd:NMTOKEN },
    attribute n { text },
    attribute xml:lang { xsd:NMTOKEN },
    attribute xtr:code { xsd:NMTOKEN },
    attribute xtr:type {
      "interlinear" | "parallel" | "labeled" | "unitary"
    },
    attribute xtr:cols { xsd:nonNegativeInteger },
    (trans-unit | trans-note | h)*,
    map?
  }

id = attribute xml:id { xsd:ID }
ctr = attribute class { "tr" }

trans-note =
  element xh:div {
    attribute class { "note" },
    id,
    ctr,
    element xh:p { htext }*
  }

trans-unit =
  element xh:p {
    id,
    ctr,
    ( ref
    | (refs,label?)
    | (unit,refs?,label?)),
    (innerp+ | htext*)
  }

h = h1 | h2 | h3
h1 = element xh:h1 { id , ctr, htext }
h2 = element xh:h2 { id , ctr, htext }
h3 = element xh:h3 { id , ctr, htext }

innerp = element xh:innerp { htext }
htext = (text | trcell | foreign | literal | marker | supplied | uncertain)
trcell = element xh:span { attribute class { "cell" },
                           attribute xtr:span { xsd:nonNegativeInteger }?,
                           htext }
foreign = element xh:span { attribute class { "tr foreign" }, htext }
literal = element xh:span { attribute class { "tr literal" }, htext }
supplied = element xh:span { attribute class { "tr supplied" }, htext }
uncertain = element xh:span { attribute class { "tr uncertain" },htext }

marker =
  element xh:a {
    attribute class { "marker" },
    nrefs,
    xsd:string { pattern = "\d+[a-z]*" }
  }

ref = attribute xtr:ref { xsd:IDREF }
refs = start-ref , end-ref , all-refs? , rows, overlap?
start-ref = attribute xtr:sref { xsd:IDREF }
end-ref = attribute xtr:eref { xsd:IDREF }
all-refs = attribute xtr:refs { xsd:IDREFS }
uref = attribute xtr:uref { xsd:IDREF }
nrefs = attribute xtr:nrefs { xsd:IDREFS }
rows = attribute xtr:rows { xsd:integer }
overlap = attribute xtr:overlap { xsd:boolean }
label = attribute xtr:label { text }
unit = attribute xtr:unit { text }

map =
  element xtr:map {
    element xtr:l2t {
      attribute lid { xsd:IDREF },
      attribute tid { xsd:IDREF }
    }*
  }
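As a hedged illustration of how a consumer might read the back references, the Python sketch below uses the standard library's xml.etree.ElementTree to collect the xtr:ref value of each translation unit. The namespace URIs are those declared in the schema; the sample document, its IDs, and the function name unit_refs are invented for this example, and the sample has not been validated against the schema.

```python
import xml.etree.ElementTree as ET

XTR = "http://emegir.info/xtr"
XH = "http://www.w3.org/1999/xhtml"
XML = "http://www.w3.org/XML/1998/namespace"

# Hypothetical, abbreviated XTR document; all IDs are made up.
SAMPLE = """<xtr:translation xmlns:xtr="http://emegir.info/xtr"
    xmlns="http://www.w3.org/1999/xhtml"
    ref="Q000887" n="Gudea 1" xml:lang="en"
    xtr:code="tinney" xtr:type="labeled" xtr:cols="1">
  <p xml:id="t1" class="tr" xtr:ref="Q000887.1">For Bau, the good woman,</p>
  <p xml:id="t2" class="tr" xtr:ref="Q000887.6">Gudea, ruler of Lagaš,</p>
</xtr:translation>"""


def unit_refs(xml_text):
    """Map each translation unit's xml:id to the line ID in its xtr:ref."""
    root = ET.fromstring(xml_text)
    refs = {}
    for p in root.iter(f"{{{XH}}}p"):
        unit_id = p.get(f"{{{XML}}}id")
        ref = p.get(f"{{{XTR}}}ref")
        # Paragraphs without both attributes (e.g. inside notes) are skipped.
        if unit_id and ref:
            refs[unit_id] = ref
    return refs


print(unit_refs(SAMPLE))
```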
Questions about this document may be directed to Steve Tinney (stinney at sas dot upenn dot edu).