Postgres Database Structure

Schema capitularia:

The teiHeader contains in msDesc/msPart references to capitularies and loci but no reference to chapters. The teiHeader includes capitularies not yet transcribed. This relation is caught in table mss_capitularies. The map client uses this relation.

The body contains finer grained references to chapters and loci in the @corresp and @xml:id, but only for already transcribed material. This relation is caught in table mss_chapters. The collation tool uses this relation.

The mss_chapters_text table contains the chapter’s text preprocessed for the collation tool.

Note that the relation between the mss_chapters and msparts tables was inferred using the loci, because there are no milestones for msParts/msItems in the body.

Note that the mscap_n of the mss_capitularies and mss_chapters tables do not indicate the same concept.

G manuscripts             manuscripts           ms_id VARCHAR title VARCHAR filename VARCHAR status VARCHAR siglum VARCHAR msparts             msparts           ms_id VARCHAR msp_part VARCHAR locus_cooked ARRAY date INT4RANGE leaf ARRAY written ARRAY msparts->manuscripts ms_id capitularies             capitularies           cap_id VARCHAR title VARCHAR date INT4RANGE chapters             chapters           cap_id VARCHAR chapter VARCHAR chapters->capitularies cap_id mss_capitularies             mss_capitularies                       according to <msDesc>           ms_id VARCHAR cap_id VARCHAR mscap_n INTEGER msp_part VARCHAR locus VARCHAR locus_cooked ARRAY mss_capitularies->msparts ms_id, msp_part mss_capitularies->capitularies cap_id mss_chapters             mss_chapters                       according to <body>           ms_id VARCHAR cap_id VARCHAR mscap_n INTEGER chapter VARCHAR msp_part VARCHAR locus VARCHAR locus_index INTEGER locus_cooked INTEGER transcribed INTEGER xml NULL mss_chapters->msparts ms_id, msp_part mss_chapters->chapters cap_id, chapter mss_chapters_text             mss_chapters_text           ms_id VARCHAR cap_id VARCHAR mscap_n INTEGER chapter VARCHAR type VARCHAR text TEXT mss_chapters_text->mss_chapters ms_id, cap_id, mscap_n, chapter

Schema capitularia

Schema gis:

Note: the gis.geonames table and relatives contain standard hierarchies of place names, extracted from the geonames, dnb, and viaf databases.

Note: the gis.geoplaces table and relatives contain a different, custom hierarchy of place names, which is the one displayed in the manuscript search box. This hierarchy is extracted from the file mss/lists/capitularia_geo.xml. Unfortunately it is linked to manuscripts and not to manuscript parts.

G gis.mn_msparts_geonames             gis.mn_msparts_geonames           ms_id VARCHAR msp_part VARCHAR geo_id VARCHAR geo_source VARCHAR gis.geonames             gis.geonames           geo_id VARCHAR geo_source VARCHAR parent_id VARCHAR geo_name VARCHAR geo_fcode VARCHAR geom NULL blob JSONB gis.mn_msparts_geonames->gis.geonames geo_source, geo_id msparts             msparts           ms_id VARCHAR msp_part VARCHAR locus_cooked ARRAY date INT4RANGE leaf ARRAY written ARRAY gis.mn_msparts_geonames->msparts ms_id, msp_part gis.geoplaces             gis.geoplaces           geo_id VARCHAR parent_id VARCHAR gis.geonames->gis.geoplaces parent_id->geo_id gis.geoplaces->gis.geoplaces parent_id->geo_id manuscripts             manuscripts           ms_id VARCHAR title VARCHAR filename VARCHAR status VARCHAR siglum VARCHAR msparts->manuscripts ms_id gis.geoareas             gis.geoareas           geo_id VARCHAR geo_source VARCHAR geo_name VARCHAR geo_fcode VARCHAR geo_color VARCHAR geo_label_x DOUBLE PRECISION geo_label_y DOUBLE PRECISION geom NULL gis.geoplaces_names             gis.geoplaces_names           geo_id VARCHAR geo_lang VARCHAR geo_name VARCHAR gis.geoplaces_names->gis.geoplaces geo_id gis.mn_mss_geoplaces             gis.mn_mss_geoplaces           ms_id VARCHAR geo_id VARCHAR gis.mn_mss_geoplaces->gis.geoplaces geo_id gis.mn_mss_geoplaces->manuscripts ms_id

Schema gis

db.py

This module contains the sqlalchemy classes that initialize the database structure.

To create a new database: (must be database superuser)

sudo -u postgres psql
CREATE USER capitularia PASSWORD '<password>';
CREATE DATABASE capitularia OWNER capitularia;

\c capitularia
CREATE EXTENSION pg_trgm WITH SCHEMA public;
CREATE EXTENSION postgis WITH SCHEMA public;
CREATE SCHEMA capitularia AUTHORIZATION capitularia;
CREATE SCHEMA gis AUTHORIZATION capitularia;
ALTER DATABASE capitularia SET search_path = capitularia, gis, public;
\q
make rebuild_db
class Capitularies(**kwargs)

All capitularies catalogued according to BK or Mordek.

G capitularies             capitularies           cap_id VARCHAR title VARCHAR date INT4RANGE
cap_id

The capitulary number, eg. “BK.42”

title

The capitulary title assigned by BK.

class Chapters(**kwargs)

All chapters catalogued according to BK or Mordek.

G chapters             chapters           cap_id VARCHAR chapter VARCHAR
chapter

The chapter number from 1 to N. Also: 1_inscription, etc.

class GeoAreas(**kwargs)

Custom defined geographic areas

G gis.geoareas             gis.geoareas           geo_id VARCHAR geo_source VARCHAR geo_name VARCHAR geo_fcode VARCHAR geo_color VARCHAR geo_label_x DOUBLE PRECISION geo_label_y DOUBLE PRECISION geom NULL
class GeoPlaces(**kwargs)

Data extracted from capitularia_geo.xml

G gis.geoplaces             gis.geoplaces           geo_id VARCHAR parent_id VARCHAR
class GeoPlacesNames(**kwargs)

Data extracted from capitularia_geo.xml

G gis.geoplaces_names             gis.geoplaces_names           geo_id VARCHAR geo_lang VARCHAR geo_name VARCHAR
class Geonames(**kwargs)

Data scraped from geonames.org et al. and cached here.

G gis.geonames             gis.geonames           geo_id VARCHAR geo_source VARCHAR parent_id VARCHAR geo_name VARCHAR geo_fcode VARCHAR geom NULL blob JSONB
class Manuscripts(**kwargs)
G manuscripts             manuscripts           ms_id VARCHAR title VARCHAR filename VARCHAR status VARCHAR siglum VARCHAR
filename

The filename of the TEI file containing the transcription of the manuscript.

ms_id

The manuscript id assigned by the Capitularia project.

status

The Wordpress publication status: either ‘publish’ or ‘private’

title

The official title of the manuscript.

class MnManuscriptsGeoPlaces(**kwargs)

The M:N relationship between manuscripts and geoplaces

G gis.mn_mss_geoplaces             gis.mn_mss_geoplaces           ms_id VARCHAR geo_id VARCHAR
class MnMsPartsGeonames(**kwargs)

The M:N relationship between msparts and geonames

G gis.mn_msparts_geonames             gis.mn_msparts_geonames           ms_id VARCHAR msp_part VARCHAR geo_id VARCHAR geo_source VARCHAR
class MsParts(**kwargs)

The parts of a manuscript

G msparts             msparts           ms_id VARCHAR msp_part VARCHAR locus_cooked ARRAY date INT4RANGE leaf ARRAY written ARRAY
date

When did the manuscript part originate? Range of years.

leaf

Size of the leaf.

locus_cooked

Ranges of cooked loci.

msp_part

The official designation of the manuscript part.

written

Size of the written area.

class MssCapitularies(**kwargs)

A capitulary in a manuscript according to <msDesc>.

This table also contains capitularies that are not yet transcribed.

A finer granularity (chapters instead of capitularies) can be found in the MssChapters table, albeit only already transcribed ones.

G mss_capitularies             mss_capitularies                       according to <msDesc>           ms_id VARCHAR cap_id VARCHAR mscap_n INTEGER msp_part VARCHAR locus VARCHAR locus_cooked ARRAY
locus

The locus of this capitulary instance in the ms as recorded by the editor, eg. 42ra-45vb.

locus_cooked

Ranges of cooked loci.

mscap_n

The n_th occurence of the capitulary in the manuscript. Default is 1.

Since msItem does not contain milestones, this value is inferred by counting the number of preceding loci that contain this capitulary.

N.B.: The value in the mss_chapters table is found in a different way.

msp_part

The official designation of the manuscript part.

class MssChapters(**kwargs)

A chapter in a manuscript according to <body>.

This table contains only chapters that were already transcribed.

Note: The table MssCapitularies relates manuscripts to capitularies yet untranscribed.

G mss_chapters             mss_chapters                       according to <body>           ms_id VARCHAR cap_id VARCHAR mscap_n INTEGER chapter VARCHAR msp_part VARCHAR locus VARCHAR locus_index INTEGER locus_cooked INTEGER transcribed INTEGER xml NULL
locus

The locus of the chapter in the manuscript. As recorded by the editor, eg. 42ra

locus_cooked

The cooked locus. Locus transformed to a sortable integer.

locus_index

The index at the locus, eg. the ‘1’ in 42ra-1

mscap_n

This chapter was found in the n_th occurence of the capitulary in the manuscript. Default is 1.

The value is read from the milestone, eg.:

<milestone unit=’capitulare’ n=’BK.139_2’ />

marks the second occurence of capitulary 139 in this manuscript. All chapters of BK.139 following this milestone get a value of 2 in this field.

N.B.: The value in the mss_capitularies table is found in a different way.

msp_part

The official designation of the manuscript part.

transcribed

Is this chapter already transcribed? 0 == no, 1 == partially, 2 == completed

xml

The XML text of the chapter.

class MssChaptersText(**kwargs)

Various kinds of preprocessed texts extracted from the chapter.

There may be more than one text extracted from the same chapter: the original hand and later corrector hands.

G mss_chapters_text             mss_chapters_text           ms_id VARCHAR cap_id VARCHAR mscap_n INTEGER chapter VARCHAR type VARCHAR text TEXT
text

The preprocessed plain text of the chapter.

type_

Either ‘original’ or ‘later_hands’. The type of preprocessing applied. Whether the original hand was followed or a later corrector.

class XML
bind_processor(dialect)

Return a conversion function for processing bind values.

Returns a callable which will receive a bind parameter value as the sole positional argument and will return a value to send to the DB-API.

If processing is not necessary, the method should return None.

Note

This method is only called relative to a dialect specific type object, which is often private to a dialect in use and is not the same type object as the public facing one, which means it’s not feasible to subclass a types.TypeEngine class in order to provide an alternate _types.TypeEngine.bind_processor() method, unless subclassing the _types.UserDefinedType class explicitly.

To provide alternate behavior for _types.TypeEngine.bind_processor(), implement a _types.TypeDecorator class and provide an implementation of _types.TypeDecorator.process_bind_param().

See also

types_typedecorator

Parameters:

dialect – Dialect instance in use.

result_processor(dialect, coltype)

Return a conversion function for processing result row values.

Returns a callable which will receive a result row column value as the sole positional argument and will return a value to return to the user.

If processing is not necessary, the method should return None.

Note

This method is only called relative to a dialect specific type object, which is often private to a dialect in use and is not the same type object as the public facing one, which means it’s not feasible to subclass a types.TypeEngine class in order to provide an alternate _types.TypeEngine.result_processor() method, unless subclassing the _types.UserDefinedType class explicitly.

To provide alternate behavior for _types.TypeEngine.result_processor(), implement a _types.TypeDecorator class and provide an implementation of _types.TypeDecorator.process_result_value().

See also

types_typedecorator

Parameters:
  • dialect – Dialect instance in use.

  • coltype – DBAPI coltype argument received in cursor.description.