
XTR: XML Translations

(http://emegir.info/xtr)

Steve Tinney
Version of 2009-11-19

Introduction

The XML Translations subsystem handles translations of texts. These may be input as interlinear, within the ATF files, or extralinear either within ATF files or separately. Interlinear translations are automatically aligned with the translated material at the line-level; extralinear translations may be aligned either at the level of labels (including label ranges like [o i 14 - o ii 3]) or at the level of units (units are normally sentences). We do not expect that XTR will be generated manually, though it could be; the normal practice will be to prepare translations in a simple format which is an extension to ATF. This format and the special facilities it provides for rendering translations are described here.

Location

Translations are a special part of an ATF text; they may be given within an ATF text which contains a transliteration or they may be in their own ATF text. If they are in their own ATF text they require their own &-line which must be identical to the &-line used in the corresponding transliteration (see the ATF tutorial for further details on &-lines).

Because the ATF processor requires access to both the transliteration and the translation of a text, the easiest practice is to give translations along with the transliteration. This is not a strict requirement, since the ATF processor can read multiple files in a single run and process them together; however, the multi-file facility is not presently available in the webservice.

Codes

Every translation must contain a code specifier which identifies its project of origin, author, compilation or other salient key. This code is restricted to letters and digits and is used to implement XTR's support for multiple translations of the same text. For interlinear translations there is no method of setting the code: the code i (lowercase letter i) is reserved for interlinear translations.

NEW: In order to facilitate cross-project sharing of texts, the recommended best practice is now to use the translation code project for the default translation of a work within a project.
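The constraints on translation codes can be sketched as a small check. This is only an illustration, not part of the ATF processor: the function name check_code is hypothetical, "letters and digits" is assumed to mean ASCII, and it is assumed that the reserved code i may not be used outside interlinear translations.

```python
import re

# Assumption: "letters and digits" means ASCII letters and digits only.
CODE_RE = re.compile(r"[A-Za-z0-9]+")

# The code "i" is reserved for interlinear translations.
RESERVED_INTERLINEAR = "i"


def check_code(code, translation_type):
    """Return True if `code` looks acceptable for `translation_type`.

    Hypothetical helper: interlinear translations always carry the code
    "i"; other types are assumed not to be allowed to reuse it.
    """
    if not CODE_RE.fullmatch(code):
        return False
    if translation_type == "interlinear":
        return code == RESERVED_INTERLINEAR
    return code != RESERVED_INTERLINEAR
```

For instance, check_code("saa1", "labeled") is accepted, while check_code("my-code", "labeled") is rejected because of the hyphen.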

Types

All translation types except interlinear are introduced by a @translation command followed by the translation type, a language code and a translation code.

Interlinear

Interlinear translations are given using the ATF protocol #tr:, with the translation following the colon. A language code may be given after a period, in the form #tr.<LANG>:; these language codes must follow the CDL rules for language codes as given in the GDL tutorial. For an English translation the protocol would then be #tr.en:.

If no language code is given the default is en.
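As a rough sketch of how the #tr protocol and its default language might be recognized, the following splits an interlinear line into language and text. The function name parse_tr_line and the exact regular expression are assumptions made for illustration; the real ATF processor may be stricter about language codes.

```python
import re

# "#tr" optionally followed by ".<LANG>", then ":" and the translation.
# Assumption: a language code contains no colon and no whitespace.
TR_RE = re.compile(r"#tr(?:\.(?P<lang>[^:\s]+))?:\s*(?P<text>.*)")


def parse_tr_line(line):
    """Return (lang, text) for an interlinear line, or None otherwise."""
    m = TR_RE.fullmatch(line.rstrip("\n"))
    if m is None:
        return None
    # If no language code is given the default is "en".
    return (m.group("lang") or "en", m.group("text"))
```

With this sketch, parse_tr_line("#tr: For Bau") and parse_tr_line("#tr.en: For Bau") both yield the language en.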

Parallel

Parallel translations start with the command:

@translation parallel en tinney

(Here and in the following examples en may be any standard language code; tinney may be any legal translation code.)

The remainder of the translation must use exactly the same structural labeling as the transliteration; the ATF processor will automatically align the transliteration and translation using their common structure.
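The @translation command line can be taken apart along these lines. This is a minimal sketch assuming that the command always carries exactly three whitespace-separated fields (type, language code, translation code), as in the examples in this document; parse_translation_command is a hypothetical name.

```python
# Types that are introduced by @translation; interlinear uses #tr instead.
TYPES = {"parallel", "labeled", "unitary"}


def parse_translation_command(line):
    """Split '@translation TYPE LANG CODE' into a dictionary.

    Assumption: exactly four whitespace-separated fields are present.
    """
    fields = line.split()
    if len(fields) != 4 or fields[0] != "@translation":
        raise ValueError("not a @translation command: %r" % line)
    _, ttype, lang, code = fields
    if ttype not in TYPES:
        raise ValueError("unknown translation type: %r" % ttype)
    return {"type": ttype, "lang": lang, "code": code}
```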

Labeled

Labeled translations start with the command:

@translation labeled en saa1

Subsequent blocks of translation are introduced by @label commands where the remainder of the line gives the label. The label must be either a single label which matches a line label in the translated source, or a range, i.e., a pair of labels giving the start and end lines. For an explanation of how labels work see the ... documentation.

N.B.: Labeling in translations must always be done at the line level; a label such as r (reverse) is an error in XTR.
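The single-label and range forms of @label can be told apart with a sketch like the one below. It assumes that a range is written with a hyphen surrounded by spaces, as in "o i 14 - o ii 3", and that labels themselves never contain that sequence. The name parse_label_line is hypothetical, and no attempt is made to check that a label is at the line level.

```python
def parse_label_line(line):
    """Return (start, end) labels for an '@label ...' line.

    A single label is returned as a range whose start and end coincide.
    Assumption: ranges are separated by ' - ' (space, hyphen, space).
    """
    line = line.strip()
    if not line.startswith("@label "):
        raise ValueError("not a @label command: %r" % line)
    spec = line[len("@label "):].strip()
    start, sep, end = spec.partition(" - ")
    start, end = start.strip(), end.strip()
    if not start or (sep and not end):
        raise ValueError("malformed label: %r" % spec)
    return (start, end if sep else start)
```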

Unitary

Unitary translations--those whose translation blocks correspond to the annotated units (usually sentences) of a transliteration--start with the command:

@translation unitary en psd

Subsequent blocks of translation are introduced by @unit commands where the remainder of the line gives a unit number; these can be found in the unit-view of the source texts--typically unit 1 is the first sentence, unit 2 is the second sentence, and so on.

Unitary translations may also follow @unit with a @span command which follows the same rules as a @label line. This is not used for alignment but is intended for the convenience of the translator; such @span lines can be output automatically by translation template generators.
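To illustrate how @unit, the optional @span and the following paragraph belong together, here is a small reader for the body of a unitary translation. It is a sketch under simplifying assumptions: headings and notes are not interpreted, only the first paragraph after each @unit is collected, and the name read_units is hypothetical.

```python
# Block-level commands named in this document; any of them ends a paragraph.
BLOCK_COMMANDS = {"@unit", "@span", "@label", "@note", "@h1", "@h2", "@h3"}


def read_units(lines):
    """Collect {'unit', 'span', 'text'} dictionaries from ATF lines."""
    units = []
    current = None  # the unit whose paragraph is still open, if any
    for raw in lines:
        line = raw.strip()
        first = line.split(None, 1)[0] if line else ""
        if line.startswith("@unit "):
            current = {"unit": line[len("@unit "):].strip(),
                       "span": None, "text": ""}
            units.append(current)
        elif current is None:
            continue  # material outside any unit is ignored here
        elif line.startswith("@span "):
            current["span"] = line[len("@span "):].strip()
        elif first in BLOCK_COMMANDS:
            current = None  # another block command: stop collecting
        elif line:
            current["text"] = (current["text"] + " " + line).strip()
        elif current["text"]:
            current = None  # a blank line closes the paragraph
    return units
```

The @span value is kept only as a string, since it is not used for alignment.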

Conventions

Block

A small set of essential block-level commands is available in the labeled and unitary translation styles. These block commands cannot be used with the interlinear or parallel styles; to use them, you must first convert your translation to labeled or unitary style.

Paragraphs

For labeled and unitary translation types the content of a translation unit is a single paragraph. A blank line is required to close the paragraph.

For parallel and interlinear translation types the content of a translation unit is the rest of the line following the label or #tr:, as well as any following lines which start with at least one space and which do not consist only of spaces:

#tr: this is a long interlinear translation which is more comfortably
     handled if split over more than one line.
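The continuation rule for parallel and interlinear styles can be sketched as follows. The function name join_continuations is hypothetical, and the sketch assumes that only a leading space character (not a tab) marks a continuation and that a continuation always attaches to the most recent non-blank line.

```python
def join_continuations(lines):
    """Merge continuation lines into the line they continue.

    A continuation starts with at least one space and does not consist
    only of spaces; blank and all-space lines are dropped.
    """
    joined = []
    for line in lines:
        content = line.strip()
        if not content:
            continue  # blank line or spaces only
        if line.startswith(" ") and joined:
            joined[-1] += " " + content
        else:
            joined.append(line.rstrip())
    return joined
```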

Headings

In labeled and unitary translation types headings may be used before the @labels. A heading consists of a paragraph beginning with one of the commands: @h1, @h2, @h3 specifying first, second and third level headings respectively.

The content of a heading may use the same inline conventions as translation content.

@h1 Inana steals the @me

@label 1 - 10
...

Dollar Lines

Translations may contain dollar-lines, in which case they must ordinarily correspond on a 1:1 basis with the $-lines in the transliteration. See below for a discussion of how to handle situations which do not conform to this constraint.

Notes

In labeled and unitary translation types notes may be given immediately after either headings or translation paragraphs. These notes may begin with a note marker corresponding to a note marker in the preceding heading or translation content (see Inline Conventions below):

@label 1

The girl^1^ stood on the burning deck.

@note ^1^ Three manuscripts read instead: boy.

@label 2
...

If no note marker is given the note is automatically linked to the entire preceding heading or translation unit.

A note may contain one or more paragraphs; paragraphs are separated by blank lines (the processor understands that a line containing only spaces is a "blank line"). A blank line is required to close the final paragraph of the note.
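The optional note marker at the start of a note can be separated from the note text roughly like this. The name parse_note_line is hypothetical, and the sketch looks only at the first line of a note; further paragraphs would have to be gathered separately.

```python
import re

# "@note", then an optional ^marker^, then the start of the note text.
NOTE_RE = re.compile(r"@note\s+(?:\^(?P<marker>[^^]+)\^\s*)?(?P<text>.*)")


def parse_note_line(line):
    """Return (marker, text) for an '@note' line, or None otherwise.

    marker is None when the note applies to the whole preceding
    heading or translation unit.
    """
    m = NOTE_RE.fullmatch(line.strip())
    if m is None:
        return None
    return (m.group("marker"), m.group("text"))
```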

Inline

Characters

The character set used must be Unicode--no ATF translation of sz and similar conventions is done.

Supplied

Text supplied for the sense is given in parentheses, e.g., He (Gudea) brought (stone) down from (the mountain).

Literal

Where the literal rendering is known but inadequate for the context, a word or words may be bracketed by matched pairs of @"..."@ commands, e.g., ...ignore the @"striking"@ among....

Foreign

Foreign words are indicated by placing an '@'-sign before the word, e.g., Inana took the @me.

Uncertain

Uncertain translations are bracketed by matched pairs of @?...?@ commands (note that the close-uncertain form is query then at-sign), e.g., @?he built the temple?@.

Untranslatable

Untranslatable passages should be indicated by an ellipsis (...). At the end of a sentence, a four-dot ellipsis should be used (....).

Note markers

Note markers may be given by placing numbers between matched pairs of caret (^) characters: this is noted.^1^. Multiple notes may be referenced in a single marker by separating them with commas.
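The inline conventions above can be recognized with a single regular expression. The tokenizer below is a simplified sketch, not the processor's actual parser: the name tokenize_inline is hypothetical, nesting of conventions is not handled, a foreign word is assumed to consist of word characters only, and note markers are assumed to be comma-separated numbers without spaces.

```python
import re

INLINE_RE = re.compile(
    r'@"(?P<literal>.+?)"@'              # @"..."@  literal rendering
    r'|@\?(?P<uncertain>.+?)\?@'         # @?...?@  uncertain translation
    r'|@(?P<foreign>\w+)'                # @word    foreign word
    r'|\((?P<supplied>[^()]*)\)'         # (...)    supplied for the sense
    r'|\^(?P<marker>\d+(?:,\d+)*)\^'     # ^1^ or ^1,2^  note marker
)


def tokenize_inline(text):
    """Split translation text into (kind, content) pairs."""
    tokens = []
    pos = 0
    for m in INLINE_RE.finditer(text):
        if m.start() > pos:
            tokens.append(("text", text[pos:m.start()]))
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens
```

The kinds returned here correspond loosely to the foreign, literal, supplied, uncertain and marker patterns of the schema given later in this document.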

Projects

Certain aspects of translation are project-specific and should be defined in a project style-manual. These include, among others, practices for normalizing proper-names and whether or not to indicate breakage on the original object in the translation.

Alignment

For most situations the default handling of alignment of transliteration and translation is adequate. In the normal case, each labeled translation block must have a corresponding label in the transliteration, and each translation dollar-line must have a matching dollar-line in the transliteration.

For special purposes--including migrating legacy data--several non-standard combinations of alignment are supported in ATF, however.

$-line only in Transliteration

This does not necessarily require any special action; the translation automatically aligns on labels. An empty $-line may, however, be used in the translation, and the two cases differ. With no $-line, the block-alignment of a labeled translation extends to include the transliteration's $-line. With an empty $-line, the two $-lines align, so the translation block stops at the line before the transliteration's $-line.

$-line only in Translation

To create this effect an empty $-line must be entered in the transliteration.

$-line in Transliteration aligns with Translation block

To do this, the transliteration $-line must first be given a label using the syntax $@(LABEL), where LABEL is any unique label within the text. To achieve the alignment, the translation must then use the same label; this label is purely a convenience and is never rendered.

$-line in Translation aligns with text line in Transliteration

Again, the $-line must be labeled in order to achieve this effect; in this case the label used is that of the transliteration line, e.g., $@(r 1).
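The $@(LABEL) syntax can be picked apart as in the sketch below. The name parse_dollar_line is hypothetical, and it is an assumption of this sketch that any remaining $-line text follows the closing parenthesis on the same line.

```python
import re

# "$@(" LABEL ")" optionally followed by the rest of the $-line.
DOLLAR_LABEL_RE = re.compile(r"\$@\((?P<label>[^)]+)\)\s*(?P<rest>.*)")


def parse_dollar_line(line):
    """Return (label, rest) for a $-line, or None for other lines.

    label is None when the $-line carries no $@(LABEL) prefix.
    """
    line = line.strip()
    if not line.startswith("$"):
        return None
    m = DOLLAR_LABEL_RE.fullmatch(line)
    if m:
        return (m.group("label").strip(), m.group("rest"))
    return (None, line[1:].strip())
```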

Transliteration line is Untranslated

If the transliteration should have space facing it, or some kind of comment, simply use an empty $-line with a label, as in the previous example.

Translation has no Corresponding Transliteration

This effect can be obtained by creating a dummy transliteration line, which must have a unique label and contain only the inline comment (#DUMMY#). The translation must use the dummy line's label; the dummy line creates no output other than blank space.

Examples

Gudea 1 in Interlinear style

&Q000887 = Gudea 1 [E3/1.1.7.1]
1.	{d}ba-u2#
#tr.en: For Bau

2.	munus sag9-ga
#tr.en: the good woman

3.	dumu an-na
#tr.en: child of An

4.	nin iri-kug-ga
#tr.en: lady of Iri-kug

5.	nin-a-ni
#tr.en: his lady

6.	gu3-de2-a
#tr.en: Gudea

7.	ensi2
#tr.en: ruler

8.	lagasz{ki}-ke4
#tr.en: of Lagaš

9.	e2 iri-kug-ga-ni
#tr.en: her house in Iri-kug

10.	mu-na-du3
#tr.en: he built.

Gudea 1 in Parallel style

&Q000887 = Gudea 1 [E3/1.1.7.1]
@translation parallel en cdli

1.	For Bau,
2.	the good woman,
3.	child of An,
4.	lady of Iri-kug,
5.	his lady,
6.	Gudea,
7.	ruler
8.	of Lagaš,
9.	her house in Iri-kug
10.	he built.

Gudea 1 in Labeled style

&Q000887 = Gudea 1 [E3/1.1.7.1]
@translation labeled en tinney

@label 1 - 5

For Bau, the good woman, child of An, lady of Iri-kug, his lady,

@label 6 - 8

Gudea, ruler of Lagaš,

@label 9 - 10
	
built her house in Iri-kug.

Gudea 1 in Unitary style

&Q000887 = Gudea 1 [E3/1.1.7.1]
@translation unitary en psd
@unit 1
@span 1 - 10

For Bau, the good woman, daughter of An, lady of Iri-kug, his
lady, Gudea, ruler of Lagaš built her house in Iri-kug.

xtr.rnc

The remainder of this document provides the Relax NG schema (in compact syntax) used by the XML version of translations and explains implementation details useful for programmers; if you are typing translations in ATF you don't need to read past this point.

Processing

The ATF processor transparently handles the relationship between the different surface forms of translations and the single internal data structure and serialized XML form. Linkage between the transliterations and their corresponding translations is also handled automatically.

A multi-pass approach is used to achieve this:

Schema

XTR translations are defined as a tiny subset of XHTML so that rendering them is little more than a matter of inserting appropriate linking conventions to support user-navigation between the IDREFs in xtr:ref attributes and the IDs in either the transliteration or the translation.

Each translation unit is expressed as an xhtml:p element with a unique ID in the id attribute. This ID is not derivable by an externally reproducible algorithm, but access to the translation unit is enabled by setting references to the IDs of translation units in the xtr:ref attribute on the relevant line of the XTF tree. In labeled translations only one translation unit can begin on each line; if you are translating a text which calls for multiple translation units on a single line you must use the unitary translation style and annotate the units appropriately in the ATF transliteration.

Back references from the translation to the transliteration are given by placing the ID of the first corresponding line in the transliteration on the xhtml:p element's xtr:ref in the XTR tree.

The values of the @label, @unit and @span commands are preserved in XTR attributes with corresponding names.

namespace xtr = "http://emegir.info/xtr"
namespace xh  = "http://www.w3.org/1999/xhtml"

translation =
  element xtr:translation {
    attribute ref      { xsd:NMTOKEN },
    attribute n        { text },
    attribute xml:lang { xsd:NMTOKEN },
    attribute xtr:code { xsd:NMTOKEN },
    attribute xtr:type { 
      "interlinear" | "parallel" | "labeled" | "unitary" 
    },
    attribute xtr:cols { xsd:nonNegativeInteger },
    (trans-unit | trans-note | h)*,
    map?
  }

id  = attribute xml:id { xsd:ID }
ctr = attribute class { "tr" }

trans-note = 
  element xh:div {
    attribute class { "note" },
    id,
    ctr,
    element xh:p { htext }*
  }

trans-unit = 
  element xh:p {
    id,
    ctr,
    ( ref
    | (refs,label?) 
    | (unit,refs?,label?)),
    (innerp+ | htext*)
  }

h = h1 | h2 | h3
h1 = element xh:h1 { id , ctr, htext }
h2 = element xh:h2 { id , ctr, htext }
h3 = element xh:h3 { id , ctr, htext }

innerp = element xh:innerp { htext }
htext = (text | trcell | foreign | literal | marker | supplied | uncertain)

trcell	  = element xh:span { attribute class { "cell" },  
			      attribute xtr:span { xsd:nonNegativeInteger }?,
			      htext }
foreign   = element xh:span { attribute class { "tr foreign" },  htext }
literal   = element xh:span { attribute class { "tr literal" },  htext }
supplied  = element xh:span { attribute class { "tr supplied" }, htext }
uncertain = element xh:span { attribute class { "tr uncertain" },htext }

marker = 
  element xh:a {
    attribute class { "marker" },
    nrefs,
    xsd:string { pattern = "\d+[a-z]*" }
  }

ref       = attribute xtr:ref     { xsd:IDREF  }
refs      = start-ref , end-ref , all-refs? , rows, overlap?
start-ref = attribute xtr:sref    { xsd:IDREF  }
end-ref   = attribute xtr:eref    { xsd:IDREF  }
all-refs  = attribute xtr:refs    { xsd:IDREFS }
uref      = attribute xtr:uref    { xsd:IDREF  }
nrefs     = attribute xtr:nrefs   { xsd:IDREFS }
rows 	  = attribute xtr:rows    { xsd:integer }
overlap   = attribute xtr:overlap { xsd:boolean }

label = attribute xtr:label { text }
unit  = attribute xtr:unit  { text }

map =
  element xtr:map {
    element xtr:l2t {
      attribute lid { xsd:IDREF },
      attribute tid { xsd:IDREF }
    }*
  }
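To make the shape of the schema more concrete, the following sketch builds a small xtr:translation tree with Python's standard xml.etree.ElementTree, using the simplest trans-unit form (a single xtr:ref). All identifier values, the values given for n and xtr:cols, and the function names are hypothetical; the IDs actually generated by the ATF processor are not derivable and will differ.

```python
import xml.etree.ElementTree as ET

XTR = "http://emegir.info/xtr"
XH = "http://www.w3.org/1999/xhtml"
XML = "http://www.w3.org/XML/1998/namespace"

ET.register_namespace("xtr", XTR)
ET.register_namespace("xh", XH)


def make_trans_unit(unit_id, line_id, text):
    """Build an xh:p translation unit pointing back at one line."""
    p = ET.Element("{%s}p" % XH)
    p.set("{%s}id" % XML, unit_id)      # id  = attribute xml:id
    p.set("class", "tr")                # ctr = attribute class { "tr" }
    p.set("{%s}ref" % XTR, line_id)     # ref = attribute xtr:ref
    p.text = text
    return p


def make_translation(text_id, name, lang, units):
    """Build an interlinear xtr:translation; attribute values are guesses."""
    tr = ET.Element("{%s}translation" % XTR)
    tr.set("ref", text_id)
    tr.set("n", name)                   # hypothetical value
    tr.set("{%s}lang" % XML, lang)
    tr.set("{%s}code" % XTR, "i")       # reserved interlinear code
    tr.set("{%s}type" % XTR, "interlinear")
    tr.set("{%s}cols" % XTR, "0")       # hypothetical value
    for unit_id, line_id, text in units:
        tr.append(make_trans_unit(unit_id, line_id, text))
    return tr


root = make_translation(
    "Q000887", "Gudea 1", "en",
    [("u1", "Q000887.1", "For Bau"),          # hypothetical IDs
     ("u2", "Q000887.2", "the good woman")],
)
xml_bytes = ET.tostring(root)
```

The result is not validated against xtr.rnc here; a Relax NG validator would be needed for that.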

Resources

gudea1.atf: ASCII Transliteration Format file.
oxtr.xdf: XDF source for this documentation.
xtr-HTML.xsl: XSL transform from xtr to HTML.
xtr.rnc: Xtr Relax NG Compact Syntax grammar.
xtr.rng: Xtr Relax NG grammar.
xtr.xdf

Questions about this document may be directed to Steve Tinney (stinney at sas dot upenn dot edu).