The basic unit in the CDL system is the project. This document gives a basic introduction to how projects are organized and how to create and work with them.
Before we begin, it is useful to explain the fundamentals which are available to all projects.
While it may not be obvious, the most fundamental part of any project is the catalogue. The catalogue provides the text metadata (at the very least the CDLI ID and a human-readable designation), and that metadata is the organizational basis for all other components of the project.
The easiest way to provide a catalogue for a corpus is to derive it dynamically from the CDLI catalogue. Some projects have special needs, however, and in those cases the catalogue-processing software can be tailored to the metadata fields and values they require.
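To make the "minimum metadata" idea concrete, here is a minimal sketch of a catalogue record holding just a CDLI ID and a designation, indexed by ID so other components can look texts up. The class and function names are hypothetical, and the ID shape (P or Q followed by six digits) is an assumption based on the IDs used in this document; this is not the CDL catalogue software.

```python
import re
from dataclasses import dataclass

# Assumed ID shape: "P" or "Q" followed by six digits (e.g. P336297).
PQID_PATTERN = re.compile(r"[PQ]\d{6}")


@dataclass(frozen=True)
class CatalogueEntry:
    """Minimal catalogue record: a CDLI ID plus a human-readable designation."""
    pqid: str
    designation: str

    def __post_init__(self) -> None:
        if not PQID_PATTERN.fullmatch(self.pqid):
            raise ValueError(f"not a P- or Q-number: {self.pqid!r}")


def index_catalogue(entries: list[CatalogueEntry]) -> dict[str, CatalogueEntry]:
    """Key entries by ID so texts, translations and glossary hits can find their metadata."""
    index: dict[str, CatalogueEntry] = {}
    for entry in entries:
        if entry.pqid in index:
            raise ValueError(f"duplicate catalogue ID: {entry.pqid}")
        index[entry.pqid] = entry
    return index


catalogue = index_catalogue([CatalogueEntry("P336297", "SAA 01 01")])
print(catalogue["P336297"].designation)
```

Rejecting malformed or duplicate IDs at load time reflects the point above: if the catalogue is wrong, everything keyed on it is wrong too.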
Most projects relate in some way to a text corpus. The texts are entered in, or converted to, the ATF format and may have translations. The project management software takes care of turning the ATF sources into the various formats used for web display and other purposes.
The ATF format supports lemmatization, which is the process of adding references to dictionary headwords into the texts. If a corpus is lemmatized, it can be used to generate glossaries directly from the texts with no glossary-editing at all. Normally, however, the glossary and text corpus are used together: the glossary is maintained and may be edited or augmented with bibliography, and the corpus is synchronized with the glossary so that all of the instances of terms are instantly reachable from the glossary articles.
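The idea that a lemmatized corpus can yield a glossary whose articles point back to every instance can be sketched as a simple index from lemma to text locations. This is a deliberately simplified illustration, assuming Oracc-style "#lem:" lines with one lemma per word in the form headword[guide word]POS; the sample texts and their IDs are made up, the function name is hypothetical, and real ATF processing handles far more cases.

```python
from collections import defaultdict

# Made-up sample in simplified Oracc-style ATF (assumption): "&" opens a text,
# numbered lines carry transliteration, and a "#lem:" line gives one lemma per word.
SAMPLE_ATF = """\
&P000001 = Example 1
1. a-na LUGAL
#lem: ana[to]PRP; šarru[king]N
&P000002 = Example 2
1. LUGAL
#lem: šarru[king]N
"""


def build_glossary(atf: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lemma, e.g. 'šarru[king]N', to the (text ID, line label) pairs citing it."""
    glossary = defaultdict(list)
    text_id, line_label = None, None
    for raw in atf.splitlines():
        line = raw.strip()
        if line.startswith("&"):
            # "&P000001 = Example 1" -> "P000001"; a new text resets the line label.
            text_id = line[1:].split("=")[0].strip()
            line_label = None
        elif line.startswith("#lem:"):
            for lemma in line[len("#lem:"):].split(";"):
                lemma = lemma.strip()
                # Skip unlemmatized words (no "[guide word]") and lemma lines with no context.
                if "[" in lemma and text_id and line_label:
                    glossary[lemma].append((text_id, line_label))
        elif line[:1].isdigit():
            # "1. a-na LUGAL" -> "1."
            line_label = line.split()[0]
    return dict(glossary)


for headword, hits in sorted(build_glossary(SAMPLE_ATF).items()):
    print(headword, hits)
```

Each key plays the role of a glossary article and each (text ID, line) pair is an instance reachable from it; a maintained glossary would add senses and bibliography on top of such an index.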
The pager is the name given to the web interface which enables users to interact with the corpus. The pager understands how to present long lists of results in pages, and also how to assemble metadata, texts and translations into pages displaying individual texts.
The link to the pager display for a project `cams' is:
http://cdl.museum.upenn.edu/cgi-bin/cdlpager?project=cams
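If pager links are generated by a script rather than typed by hand, the URL can be built with the standard library so that project names are always correctly escaped. This is a minimal sketch; only the URL pattern comes from this page, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Pager endpoint as given in this document.
PAGER = "http://cdl.museum.upenn.edu/cgi-bin/cdlpager"


def pager_url(project: str) -> str:
    """Return the pager URL for a project, e.g. 'cams'."""
    return f"{PAGER}?{urlencode({'project': project})}"


print(pager_url("cams"))
```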
A project may use the pager directly as the user interface, or it may have additional pages some of which contain links to the pager. The website may be on the same server as the project data, or it may be located elsewhere.
ANY FILES FOR A PROJECT WEBSITE THAT ARE LOCATED ON THE CDL SERVER *MUST* BE PLACED IN THE websources/ DIRECTORY.
The link to the website for a project `cams' is:
http://cdl.museum.upenn.edu/cams
The initial installation redirects this URL to the pager.
The effect of a static HTML page for any text in any project can be achieved with the following CGI call:
http://cdl.museum.upenn.edu/cgi-bin/cdlpager?prod=html&project=PROJECT&item=PQID
Here PROJECT is the project name and PQID is the P- or Q-number of the text. Thus, to retrieve the HTML version of SAA 01 01 with all SAA styling, the call would be:
http://cdl.museum.upenn.edu/cgi-bin/cdlpager?prod=html&project=saa&item=P336297
This form is suitable for referencing in an <object> tag. A typical code fragment would look something like this (the example has been formatted to fit the width of the text; delete backslash-newline-space sequences to use this example in your HTML):
<object type="text/html"
data="http://cdl.museum.upenn.edu/cgi-bin/cdlpager\
?prod=html&amp;project=saa&amp;item=P334164"
style="height: 1350px; width: 600px; display: block;">
<p>You are seeing this message because your browser does not
support the &lt;object&gt; tag. The transliteration and
translation of this text are available at <a
href="http://cdl.museum.upenn.edu/cgi-bin/cdlpager\
?prod=html&amp;project=saa&amp;item=P334164"
class="external" target="_blank" title="Link opens in new
window">http://cdl.museum.upenn.edu/cgi-bin/cdlpager\
?prod=html&amp;project=saa&amp;item=P334164</a><span
class="externallinktext">
[http://cdl.museum.upenn.edu/cgi-bin/cdlpager\
?prod=html&amp;project=saa&amp;item=P334164]</span></p>
</object>
An example of how to use this may be found on the Knowledge and Power Highlights page.
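For sites that embed many texts, the prod=html URL and the <object> fragment can be generated rather than hand-edited. The sketch below assumes the URL pattern given above; the function names, the six-digit P/Q ID check and the simplified fallback markup are illustrative assumptions, and the default size is copied from the sample fragment.

```python
import re
from html import escape
from urllib.parse import urlencode

PAGER = "http://cdl.museum.upenn.edu/cgi-bin/cdlpager"
# Assumed ID shape: "P" or "Q" followed by six digits.
PQID_PATTERN = re.compile(r"[PQ]\d{6}")


def html_url(project: str, pqid: str) -> str:
    """Build the prod=html call that renders one text with the project's styling."""
    if not PQID_PATTERN.fullmatch(pqid):
        raise ValueError(f"not a P- or Q-number: {pqid!r}")
    # Parameter order follows the example URLs in the text.
    return f"{PAGER}?{urlencode({'prod': 'html', 'project': project, 'item': pqid})}"


def object_embed(project: str, pqid: str, height_px: int = 1350, width_px: int = 600) -> str:
    """Return an <object> fragment embedding one text, with a plain link as fallback."""
    # Escape "&" as "&amp;" so the URL is valid inside HTML attributes and text.
    url = escape(html_url(project, pqid), quote=True)
    return (
        f'<object type="text/html" data="{url}"\n'
        f'        style="height: {height_px}px; width: {width_px}px; display: block;">\n'
        f'  <p>Your browser does not support the &lt;object&gt; tag. This text is\n'
        f'  available at <a href="{url}">{url}</a>.</p>\n'
        f'</object>'
    )


print(html_url("saa", "P336297"))
print(object_embed("saa", "P334164"))
```

Because the generated fragment is on single logical lines, there are no backslash-newline sequences to remove before pasting it into a page.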
The user-callable mechanism for emulating the lists of texts displayed by the pager is the adhoc producer, which has the following paradigmatic form (long lines split as above):
http://cdl.museum.upenn.edu/cgi-bin/cdlpager\
?prod=adhoc&caller=PROJECT&input=PQIDS&project=PROJECT
Here PROJECT is the project name and PQIDS is a comma-separated list of P- or Q-numbers. To get a pager display of P334278 and P334279 in project saa you might say (long lines split as above):
http://cdl.museum.upenn.edu/cgi-bin/cdlpager\
?prod=adhoc&caller=saa&input=P334278,P334279&project=saa
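The adhoc call is easy to get wrong by hand when the ID list is long. The sketch below mirrors the paradigmatic form above, using the same project name for caller and project as the paradigm does; the function name and the six-digit P/Q ID check are assumptions.

```python
import re
from urllib.parse import urlencode

PAGER = "http://cdl.museum.upenn.edu/cgi-bin/cdlpager"
# Assumed ID shape: "P" or "Q" followed by six digits.
PQID_PATTERN = re.compile(r"[PQ]\d{6}")


def adhoc_url(project: str, pqids: list[str]) -> str:
    """Build a prod=adhoc call that lists the given texts as the pager would."""
    if not pqids:
        raise ValueError("at least one P- or Q-number is required")
    for pqid in pqids:
        if not PQID_PATTERN.fullmatch(pqid):
            raise ValueError(f"not a P- or Q-number: {pqid!r}")
    query = urlencode(
        {"prod": "adhoc", "caller": project, "input": ",".join(pqids), "project": project},
        safe=",",  # keep the commas literal, as in the examples in the text
    )
    return f"{PAGER}?{query}"


print(adhoc_url("saa", ["P334278", "P334279"]))
```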
The project organization is intended for use with multi-user systems. At the operating system level, each project is a user with a password and a home directory.
Projects can also own subprojects, which means that regular users on a system can have their own personal projects.
The files used by a project live in several different folders (aka directories). The most important of these are:
web/ when the project is rebuilt.
Two interfaces are presently provided for project management tasks: the command-line interface (CLI) and the menu-driven Emacs interface. The latter is documented on a separate page.
Access to the CLI is generally provided via the Secure Shell (SSH) program ssh, run either from the command line of the user's computer or through a graphical user interface.
Once logged in as the project user on the server, most tasks are accomplished via the program cdlproject. Each of the following headings corresponds to an invocation of cdlproject followed by the heading; for example, the heading `rebuild' means that in the CLI you type:
cdlproject rebuild
N.B.: you must be in the home folder/directory when using cdlproject.
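If cdlproject tasks are driven from a script, the working-directory requirement is easy to forget. The sketch below pins the working directory to the home directory before invoking cdlproject. It assumes the script runs as the project user on the server with cdlproject on the PATH; the function names are hypothetical.

```python
import os
import subprocess
from typing import List, Optional, Tuple


def cdlproject_command(task: str, home: Optional[str] = None) -> Tuple[List[str], str]:
    """Return the argument list and working directory for 'cdlproject TASK'."""
    # cdlproject must be run from the project's home directory.
    cwd = home if home is not None else os.path.expanduser("~")
    return ["cdlproject", task], cwd


def run_cdlproject(task: str) -> None:
    """Run a cdlproject task from the home directory; raises CalledProcessError on failure."""
    args, cwd = cdlproject_command(task)
    subprocess.run(args, cwd=cwd, check=True)
```

Calling run_cdlproject("rebuild") is then equivalent to changing to the home directory and typing cdlproject rebuild at the prompt; check=True makes a failed rebuild stop the script instead of passing silently.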
If you are using the CDLI catalogue, no action is required. If you are using your own catalogue, the project must be correctly configured, and the catalogue updates must then be placed in the catalogue folder with the file name(s) the project has been configured to use.
There is a separate page about setting up your own project catalogue.
Transliterations should be placed in the sources/ folder. There can be one big file, one file per text, or something in between; the rebuild process uses all the relevant files in sources/.
When new texts are added, simply run:
cdlproject rebuild
to update the website, indexes, etc.
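The add-then-rebuild step can be scripted as well. This sketch copies new transliteration files into sources/ under the home directory and then runs the rebuild. The .atf file extension and the function names are assumptions, and it presumes the script runs as the project user with cdlproject on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def stage_sources(atf_files: list[Path], home: Path) -> list[Path]:
    """Copy ATF files into HOME/sources/ and return their new paths."""
    sources = home / "sources"
    sources.mkdir(exist_ok=True)
    staged = []
    for src in atf_files:
        dest = sources / src.name
        shutil.copy2(src, dest)  # overwrites a file of the same name in sources/
        staged.append(dest)
    return staged


def add_texts_and_rebuild(atf_files: list[Path]) -> None:
    """Stage new transliterations, then regenerate the website, indexes, etc."""
    home = Path.home()
    stage_sources(atf_files, home)
    # Same as typing "cdlproject rebuild" in the home directory.
    subprocess.run(["cdlproject", "rebuild"], cwd=home, check=True)
```

Note that a file already in sources/ with the same name is replaced, which suits the one-file-per-text arrangement but should be kept in mind if several people contribute files.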
The recommended workflow for glossary building is:
Questions about this document may be directed to Steve Tinney (stinney at sas dot upenn dot edu).