Scripts

Documentation of the Python scripts.

import_data.py

Inserts and updates data from TEI files into Postgres.

  • Inserts the contents of each individual capitulary chapter into the Postgres database.

  • Inserts the geographic information into Postgres.

This tool is run by cron at regular intervals to keep the Postgres database in sync with the TEI file collection.

usage: import_data.py [-h] [-v] [-c CONFIG_FILE] [--my-cnf .MY.CNF.FILE] [--init]
                      [--mss FILES [FILES ...]] [--cap-list FILES [FILES ...]]
                      [--extracted FILES [FILES ...]] [--fulltext FILES [FILES ...]]
                      [--solr FILES [FILES ...]] [--geoareas FILES [FILES ...]]
                      [--publish] [--geoplaces GEOPLACES] [--geonames] [--dnb] [--viaf]
                      [--truncate]

-h, --help

show this help message and exit

-v, --verbose

increase output verbosity

-c <config_file>, --config-file <config_file>

the config file (default: 'server.conf')

--my-cnf <.my.cnf.file>

the MySQL .my.cnf file to use (default: ~/.my.cnf); the client and mysql sections are read

--init

initialize the Postgres database

--mss <files>

the manuscript files (or the corpus file) to import

--cap-list <files>

the capitularies lists to import

--extracted <files>

import per-chapter extracted XML from files

--fulltext <files>

import per-chapter extracted fulltext from files

--solr <files>

index per-chapter extracted fulltext from files with Solr

--geoareas <files>

import geoareas from GeoJSON files

--publish

get the publish status from the WordPress Ajax API

--geoplaces <geoplaces>

import geoplaces XML

--geonames

look up entries on geonames.org

--dnb

look up entries on dnb.de

--viaf

look up entries on viaf.org

--truncate

truncate the respective Postgres table before importing into it
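
The --my-cnf option described above reads the client and mysql sections of a MySQL option file. As a rough illustration of what that involves, and not the script's actual code, the following sketch merges those two sections using the standard-library configparser module. The function name read_my_cnf and the rule that values in the mysql section override those in the client section are assumptions made for this example.

```python
import configparser
import os


def read_my_cnf(path="~/.my.cnf", sections=("client", "mysql")):
    """Return the merged options of the given sections as a dict.

    Hypothetical helper for illustration; not part of import_data.py.
    """
    # allow_no_value: option files may contain bare flags such as "quick".
    # interpolation=None: passwords may contain literal "%" characters.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read(os.path.expanduser(path))  # a missing file is silently skipped

    options = {}
    for section in sections:
        if parser.has_section(section):
            # Later sections override earlier ones (an assumption of this sketch).
            options.update(parser.items(section))
    return options
```

Under these assumptions, read_my_cnf() returns a plain dict with keys such as user, password and host that could be passed on to a database driver. Option files that use MySQL-specific directives such as !include are not handled by this sketch.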

import_solr.py

Indexes TEI files and WordPress pages into Solr.

  • Connects to the WordPress MySQL database and indexes all pages and posts into Solr.

  • Indexes the header (Mordek) sections of TEI files to Solr.

This program is run nightly by cron.

usage: import_solr.py [-h] [-v] [-c CONFIG_FILE] [--my-cnf .MY.CNF.FILE] [--wordpress]
                      [--incremental TOUCHFILE] [--mss FILE.TEI [FILE.TEI ...]]

-h, --help

show this help message and exit

-v, --verbose

increase output verbosity

-c <config_file>, --config-file <config_file>

the config file (default: 'server.conf')

--my-cnf <.my.cnf.file>

the MySQL .my.cnf file to use (default: ~/.my.cnf); the client and mysql sections are read

--wordpress

indexes all WordPress pages and posts to Solr

--incremental <touchfile>

only indexes pages changed after the date of the given file

--mss <file.tei>

the manuscript files to index
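
The --incremental option compares page dates against the date of a file. One plausible reading, sketched below, takes the modification time of the touchfile as the cutoff and keeps only pages modified after it. The helper names and the (page_id, modified) tuple layout are hypothetical and are not taken from import_solr.py.

```python
import os
from datetime import datetime, timezone


def touchfile_cutoff(touchfile):
    """Return the touchfile's modification time as an aware UTC datetime."""
    return datetime.fromtimestamp(os.path.getmtime(touchfile), tz=timezone.utc)


def changed_since(pages, cutoff):
    """Keep only the (page_id, modified) pairs modified strictly after the cutoff.

    `modified` is assumed to be a timezone-aware datetime.
    """
    return [(page_id, modified) for page_id, modified in pages if modified > cutoff]
```

A caller following this pattern would presumably refresh the touchfile after a successful run, for example with os.utime(touchfile), although the documentation above does not say whether the script does so itself.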