M-SALT API 0.1 Documentation
An API for consulting dictionaries in Sanskrit and Pāli, and possibly other ancient South Asian languages.
Warning
This API is not yet finalized!
Overview
An article is the atomic unit of content of a dictionary. An article can have one or more headwords. The article may be available in different formats, e.g. as fully marked-up TEI or as a series of scanned images. The article may also be available in different transliterations.
The API is designed to allow a client to search multiple dictionaries at once. See Headword Search.
All responses of this API are in JSON format.
Overview of Endpoints
This is an overview of the API structure, with all endpoints and the respective response classes.
Transliterations
Transliteration (t13n) is the act of representing Devanagari script in Latin script.
The API defines the following private use language subtags to indicate transliterations to Latin script:
Subtag | Name | Description |
---|---|---|
Deva | Devanagari | (official subtag, no transliteration) |
Latn-x-hk | Harvard-Kyoto | https://en.wikipedia.org/wiki/Harvard-Kyoto |
Latn-x-iast | IAST | https://en.wikipedia.org/wiki/IAST |
Latn-x-iso | ISO 15919 | https://en.wikipedia.org/wiki/ISO_15919 |
Latn-x-itrans | ITRANS | https://en.wikipedia.org/wiki/ITRANS |
Latn-x-slp1 | SLP1 | https://en.wikipedia.org/wiki/SLP1 |
Latn-x-velthuis | Velthuis | https://en.wikipedia.org/wiki/Velthuis |
Latn-x-wx | WX notation | https://en.wikipedia.org/wiki/WX_notation |
The server MUST answer with valid RFC 5646 language tags, that is, it should prepend a language subtag (e.g. sa or pi) to the script and transliteration subtags. These are valid tags:
- sa-Latn-x-iso
- pi-Deva
- x-slp1
When declaring transliterations in HTML, use the lang attribute. Examples:
- <span lang="sa-Latn-x-iso">…</span>
- <span lang="pi-Deva">…</span>
- <span lang="x-slp1">…</span>
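The tag structure described above can be sketched in a few lines. This is only an illustration: the helper names `make_tag` and `split_tag` are hypothetical and not part of the API, and the check covers just the transliteration subtags from the table, not full RFC 5646 validation.

```python
# Transliteration schemes from the table above (the part after "-x-").
SCHEMES = {"hk", "iast", "iso", "itrans", "slp1", "velthuis", "wx"}


def make_tag(language, script, scheme=None):
    """Build a tag such as 'sa-Latn-x-iso' or 'pi-Deva' (hypothetical helper)."""
    if script == "Deva":
        if scheme is not None:
            raise ValueError("Devanagari takes no transliteration subtag")
        return f"{language}-Deva"
    if script != "Latn" or scheme not in SCHEMES:
        raise ValueError(f"unsupported script/scheme: {script}/{scheme}")
    return f"{language}-Latn-x-{scheme}"


def split_tag(tag):
    """Split a tag into (language, script, scheme); missing parts are None."""
    parts = tag.split("-")
    scheme = None
    if "x" in parts:
        i = parts.index("x")
        private = parts[i + 1:]
        parts = parts[:i]
        if len(private) != 1 or private[0] not in SCHEMES:
            raise ValueError(f"unknown transliteration subtag in {tag!r}")
        scheme = private[0]
    language = parts[0] if len(parts) > 0 else None
    script = parts[1] if len(parts) > 1 else None
    return language, script, scheme
```

A client could use `split_tag` to compare only the scheme part when a server answers with a bare private-use tag such as x-slp1.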
Embedded HTML
The client using the API may wish to display the HTML of the article embedded in a page of the client’s choice. To make this seamless, the HTML must be sanitized by the client, and the CSS needed for correct display of your HTML should be provided by the GET /v1 endpoint.
Sanitized HTML in articles may contain only the tags: div, p, span, i, b, em, strong, sup, sub, br, all with an optional class attribute.
Sanitized HTML in headwords may contain only the tags: i, sup, sub, without attributes. This is to allow the client to merge the headwords it got from different dictionaries in a visually pleasing way.
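A minimal sketch of such a whitelist filter, using only Python's standard `html.parser`, is shown below. The names `sanitize`, `ARTICLE_TAGS` and `HEADWORD_TAGS` are hypothetical. The sketch drops disallowed tags and attributes but keeps their text content, and it does not repair unbalanced markup, so a real client would probably prefer a vetted sanitizing library.

```python
from html import escape
from html.parser import HTMLParser

# Tag whitelists taken from the two rules above.
ARTICLE_TAGS = {"div", "p", "span", "i", "b", "em", "strong", "sup", "sub", "br"}
HEADWORD_TAGS = {"i", "sup", "sub"}


class _Sanitizer(HTMLParser):
    def __init__(self, allowed_tags, allow_class):
        super().__init__(convert_charrefs=True)
        self.allowed_tags = allowed_tags
        self.allow_class = allow_class
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed_tags:
            return  # drop the tag itself; its text content is still emitted
        kept = ""
        if self.allow_class:
            for name, value in attrs:
                if name == "class" and value:
                    kept = f' class="{escape(value, quote=True)}"'
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        # br is a void element and gets no closing tag.
        if tag in self.allowed_tags and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def sanitize(html, allowed_tags, allow_class):
    """Return html with only whitelisted tags (and optionally class) kept."""
    parser = _Sanitizer(allowed_tags, allow_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Article HTML would be passed through `sanitize(html, ARTICLE_TAGS, True)` and headword HTML through `sanitize(html, HEADWORD_TAGS, False)`.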
Endpoints
GET /v1
Get information about the dictionary. Used during client initialization.
Example request:
    GET /v1 HTTP/1.1
    Host: api.cpd.uni-koeln.de
Example response:
    HTTP/1.1 200 OK
    Content-Type: application/json

    {
      "css": "span.smalltext { font-size: smaller }",
      "main_page_url": "http://cpd.uni-koeln.de/",
      "name": "A Critical P\u0101li Dictionary",
      "short_name": "CPD",
      "supported_langs_query": [ "pi-Latn-x-iso" ]
    }
Response Headers:
- Content-Type – application/json
Status Codes:
- 200 OK – no error
Response JSON Object:
- short_name (string) – A siglum or short name of the dictionary. Max. 10 Unicode characters.
- name (string) – A longer name of the dictionary. Max. 80 Unicode characters.
- main_page_url (url) – The URL of the main page of the dictionary.
- css (string) – Optional. Any CSS needed to display the HTML version of your articles. Use either css or css_url or neither. See Embedded HTML.
- css_url (url) – Optional. A URL to your CSS sheet. Use either css or css_url or neither. See Embedded HTML.
- supported_langs_query (array) – The transliterations supported by the server for queries, in order of preference.
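As an illustration, a client could check a parsed GET /v1 response against the constraints listed above. The function name `check_info` is hypothetical, and treating supported_langs_query as required is an assumption based on it not being marked optional.

```python
def check_info(info):
    """Return a list of problems found in a parsed GET /v1 response."""
    problems = []
    # Python's len() counts Unicode code points, matching "Unicode characters".
    for key, max_len in (("short_name", 10), ("name", 80)):
        value = info.get(key)
        if not isinstance(value, str):
            problems.append(f"{key} is missing")
        elif len(value) > max_len:
            problems.append(f"{key} is longer than {max_len} characters")
    # The fields are documented as alternatives: one of them or neither.
    if "css" in info and "css_url" in info:
        problems.append("css and css_url are both present")
    # Assumption: the field is required because it is not marked optional.
    if not info.get("supported_langs_query"):
        problems.append("supported_langs_query is missing or empty")
    return problems
```

An empty list means the response satisfies these particular checks; it says nothing about fields the sketch does not inspect.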
When sending the query to the server, the client MAY transliterate the user’s chosen transliteration to one accepted by the server. The client MUST display an error message if unable to do so. The client SHOULD use the user’s chosen transliteration scheme if the server accepts it.
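The selection rule in the paragraph above might be sketched as follows. The function `choose_query_lang` and the `can_convert` callback are hypothetical; the callback stands in for whatever transliteration library the client uses. Tags are compared as exact strings here, which is a simplification.

```python
def choose_query_lang(user_lang, server_langs, can_convert):
    """Pick the transliteration to use for the q and fulltext parameters.

    user_lang    -- the tag of the scheme the user typed in
    server_langs -- supported_langs_query from GET /v1, in order of preference
    can_convert  -- callable(src, dst) -> bool, supplied by the client
    """
    # SHOULD: use the user's own scheme if the server accepts it.
    if user_lang in server_langs:
        return user_lang
    # MAY: transliterate to the first server-preferred scheme we can produce.
    for lang in server_langs:
        if can_convert(user_lang, lang):
            return lang
    # MUST: the caller turns this into an error message for the user.
    raise LookupError(f"cannot transliterate {user_lang} for this server")
```

The caller would catch `LookupError` and show the required error message instead of sending the query.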
GET /v1/headwords
Get a list of headwords. Used during search.
Example request:
    GET /v1/headwords?q=ahimsa*&lang=x-slp1&limit=3 HTTP/1.1
    Host: api.cpd.uni-koeln.de
Example response:
    HTTP/1.1 200 OK
    Content-Type: application/json

    {
      "data": [
        {
          "articles_url": "v1/articles/11411",
          "headwords_url": "v1/headwords/43681",
          "lang": "pi-Latn-x-iso",
          "normalized_text": "a-hi\u1e41sa",
          "text": "[a-hi\u1e41sa"
        },
        {
          "articles_url": "v1/articles/11412",
          "headwords_url": "v1/headwords/43685",
          "lang": "pi-Latn-x-iso",
          "normalized_text": "a-hi\u1e41sa",
          "text": "a-hi\u1e41sa"
        },
        {
          "articles_url": "v1/articles/11413",
          "headwords_url": "v1/headwords/43687",
          "lang": "pi-Latn-x-iso",
          "normalized_text": "a-hi\u1e41saka",
          "text": "a-hi\u1e41saka"
        }
      ],
      "limit": 3
    }
Query Parameters:
- q – The query. Restrict the result to headwords matching this query.
- fulltext – Full-text query. Restrict the result to headwords of articles matching this text.
- lang – The transliteration scheme of the q and fulltext parameters. Default “x-iso”.
- limit – The maximum number of headwords to return. Default 100.
- offset – The number of headwords to skip. Default 0.
Response Headers:
- Content-Type – application/json
Status Codes:
- 200 OK – no error
- 400 Bad Request – returned for a fulltext query if the server does not support fulltext searches.
Response JSON Object:
- limit (number) – The limit applied by the server to the number of headwords returned. This MUST NOT be higher but MAY be lower than the limit requested in the query. The limit actually used by the server MUST be indicated in the response.
- articles_url (url) – The article endpoint URL of the article, relative to the API root.
- headwords_url (url) – The headword endpoint URL, relative to the API root.
- normalized_text (string) – The headword as it would be sent in the q parameter.
- lang (string) – The transliteration applied to the headword. Default “x-iso”.
- text (string) – The headword. Some HTML allowed.
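Because articles_url and headwords_url are relative to the API root, a client has to resolve them before following them. A small sketch with the standard library follows; the function name `absolute_url` is hypothetical, and the scheme of the API root is an assumption, since the examples show only the Host header.

```python
from urllib.parse import urljoin


def absolute_url(api_root, relative_url):
    """Resolve a URL taken from a response against the API root."""
    # urljoin replaces the last path segment of a base that lacks a
    # trailing slash, so make sure the root ends in "/".
    root = api_root if api_root.endswith("/") else api_root + "/"
    return urljoin(root, relative_url)


# Assumed root: host from the examples, scheme guessed.
article = absolute_url("http://api.cpd.uni-koeln.de", "v1/articles/11411")
```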
If both q and fulltext are specified, both filters are applied. If neither q nor fulltext is specified, this call retrieves a list of all headwords.
q is allowed to contain globs, e.g. the character “*” stands for any sequence of characters and the character “?” stands for any single character.
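To make the glob semantics concrete, here is one way a server might translate a query into a regular expression. This is a sketch under assumptions: the names `glob_to_regex` and `matches` are hypothetical, and the code ignores transliteration and any normalization the server applies to headwords before matching.

```python
import re


def glob_to_regex(q):
    """Translate a query glob into a compiled regular expression."""
    parts = []
    for ch in q:
        if ch == "*":
            parts.append(".*")   # any sequence of characters, possibly empty
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def matches(q, headword):
    """True if the whole headword matches the glob q."""
    return glob_to_regex(q).fullmatch(headword) is not None
```

Escaping every other character keeps regex metacharacters in a query, such as a period, from being interpreted as wildcards.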
The lang parameter on the request is the transliteration used in the q and fulltext parameters. The transliteration used in the response may be different and is indicated in the response’s lang parameter.
A server not supporting fulltext searches MUST answer a fulltext query with HTTP status 400 Bad Request.
See also: the GET /v1 endpoint.
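A client-side sketch of assembling the request path for a headword search is given below. The helper name `headwords_path` is hypothetical; only the query parameter names come from this endpoint's description. The glob character “*” is left unescaped so that the result reads like the example request.

```python
from urllib.parse import urlencode


def headwords_path(q=None, fulltext=None, lang=None, limit=None, offset=None):
    """Build the path and query string for GET /v1/headwords."""
    candidates = (
        ("q", q),
        ("fulltext", fulltext),
        ("lang", lang),
        ("limit", limit),
        ("offset", offset),
    )
    # Leave out parameters the caller did not set so server defaults apply.
    params = {key: value for key, value in candidates if value is not None}
    query = urlencode(params, safe="*")
    return "/v1/headwords" + ("?" + query if query else "")


path = headwords_path(q="ahimsa*", lang="x-slp1", limit=3)
```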
GET /v1/headwords/(id)
Get one headword.
Example request:
    GET /v1/headwords/43704 HTTP/1.1
    Host: api.cpd.uni-koeln.de
Example response:
    HTTP/1.1 200 OK
    Content-Type: application/json

    {
      "data": [
        {
          "articles_url": "v1/articles/11421",
          "headwords_url": "v1/headwords/43704",
          "lang": "pi-Latn-x-iso",
          "normalized_text": "a-hi\u1e41s\u0101",
          "text": "a-hi\u1e41s\u0101"
        }
      ],
      "limit": 100
    }
Parameters:
- id – The headword id. See: GET /v1/articles/(id).
Response Headers:
- Content-Type – application/json
Status Codes:
- 200 OK – no error
- 404 Not Found – headword not found
For the response object parameters see: GET /v1/headwords.
GET /v1/headwords/(id)/context
Get some headwords that alphabetically surround the given headword.
Example request:
    GET /v1/headwords/43704/context?limit=1 HTTP/1.1
    Host: api.cpd.uni-koeln.de
Example response:
    HTTP/1.1 200 OK
    Content-Type: application/json

    {
      "data": [
        {
          "articles_url": "v1/articles/11420",
          "headwords_url": "v1/headwords/43700",
          "lang": "pi-Latn-x-iso",
          "normalized_text": "a-hi\u1e41sayat",
          "text": "a-hi\u1e41sayat"
        },
        {
          "articles_url": "v1/articles/11421",
          "headwords_url": "v1/headwords/43704",
          "lang": "pi-Latn-x-iso",
          "normalized_text": "a-hi\u1e41s\u0101",
          "text": "a-hi\u1e41s\u0101"
        },
        {
          "articles_url": "v1/articles/11437",
          "headwords_url": "v1/headwords/43733",
          "lang": "pi-Latn-x-iso",
          "normalized_text": "a-hita",
          "text": "a-hita"
        }
      ],
      "limit": 1
    }
Parameters:
- id – The headword id. See: GET /v1/articles/(id).
Query Parameters:
- limit – The number of headwords to return on each side. The call returns limit headwords before the headword, the headword itself, and limit headwords after it, totaling (limit * 2 + 1) headwords. Default 100.
Response Headers:
- Content-Type – application/json
Status Codes:
- 200 OK – no error
- 404 Not Found – headword not found
For the response object parameters see: GET /v1/headwords.
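The window arithmetic for the context call can be illustrated with a plain list slice. This is a sketch, not the server's implementation: the function name `context_window` is hypothetical, and returning fewer entries near the start or end of the dictionary is an assumption, since that case is not specified here.

```python
def context_window(sorted_headwords, index, limit=100):
    """Return up to limit entries before index, the entry, and limit after."""
    # Clamp at 0 so a negative start does not wrap around to the list's end.
    start = max(0, index - limit)
    return sorted_headwords[start:index + limit + 1]
```

Away from the edges the slice holds limit * 2 + 1 entries, which matches the three headwords in the example response for limit=1.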
GET /v1/articles
Get a list of articles.
Example request:
    GET /v1/articles?offset=3&limit=3 HTTP/1.1
    Host: api.cpd.uni-koeln.de
Example response:
    HTTP/1.1 200 OK
    Content-Type: application/json

    {
      "data": [
        { "articles_url": "v1/articles/4" },
        { "articles_url": "v1/articles/5" },
        { "articles_url": "v1/articles/6" }
      ],
      "limit": 3
    }
Query Parameters:
- limit – The maximum number of articles to return. Default 100.
- offset – The number of articles to skip. Default 0.
Response Headers:
- Content-Type – application/json
Status Codes:
- 200 OK – no error
- 404 Not Found – article not found
Response JSON Object:
- articles_url (url) – The endpoint URL of the article.
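Since a server may apply a lower limit than the one requested, a client that walks the whole list should advance by the number of items actually received. The generator below is a sketch of that idea. The name `iter_items` and the `fetch` callback are hypothetical; `fetch(offset, limit)` is assumed to perform the HTTP request and return the parsed JSON object. Stopping on an empty data array is also an assumption about how a server signals the end of the list.

```python
def iter_items(fetch, limit=100):
    """Yield every item of a paged list endpoint such as GET /v1/articles."""
    offset = 0
    while True:
        page = fetch(offset, limit)
        items = page["data"]
        if not items:
            return  # assumed end-of-list signal
        yield from items
        # Advance by what was received, not by what was asked for,
        # because the server may have lowered the limit.
        offset += len(items)
```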
GET /v1/articles/(id)
Get the article.
Example request:
    GET /v1/articles/42 HTTP/1.1
    Host: api.cpd.uni-koeln.de
Example response:
    HTTP/1.1 200 OK
    Content-Type: application/json

    {
      "data": [
        { "articles_url": "v1/articles/42" }
      ],
      "limit": 100
    }
Parameters:
- id – The article id. Can be any string that is convenient to the server and does not contain URL special characters.
Response Headers:
- Content-Type – application/json
Status Codes:
- 200 OK – no error
- 404 Not Found – article not found
Response JSON Object:
- articles_url (url) – The endpoint URL of the article.
At present this endpoint serves little purpose. It is included as a placeholder for a later POST method, and for symmetry with GET /v1/headwords/(id).
GET /v1/articles/(id)/formats
Get a list of an article’s available formats. Used during article retrieval.
Example request:
    GET /v1/articles/42/formats HTTP/1.1
    Host: api.cpd.uni-koeln.de
Example response:
    HTTP/1.1 200 OK
    Content-Type: application/json

    [
      {
        "embeddable": true,
        "lang": "pi-Latn-x-iso",
        "mimetype": "text/x-html-literal",
        "text": "<div>...</div>"
      },
      {
        "canonical": true,
        "embeddable": true,
        "lang": "pi-Latn-x-iso",
        "mimetype": "text/html",
        "root": "article",
        "urls": [ "http://cpd.uni-koeln.de/search?article_id=42" ]
      },
      {
        "embeddable": true,
        "lang": "pi-Deva",
        "mimetype": "text/html",
        "root": "article",
        "urls": [ "https://..." ]
      },
      {
        "mimetype": "application/xml+tei",
        "urls": [ "https://..." ]
      },
      {
        "embeddable": true,
        "lang": "pi-Deva",
        "mimetype": "image/jpeg",
        "urls": [ "https://img1", "https://img2", "..." ]
      }
    ]
Parameters:
- id – The article id. Can be any string that is convenient to the server and does not contain URL special characters.
Response Headers:
- Content-Type – application/json
Status Codes:
- 200 OK – no error
- 404 Not Found – article not found
Response JSON Object:
- mimetype (string) – The mimetype of the resource pointed to by urls or contained in text.
- embeddable (boolean) – Optional. True if the resource is embeddable.
- canonical (boolean) – Optional. True if this URL is the citeable canonical URL for the article.
- lang (string) – The transliteration applied to that article. Default “x-iso”.
- urls (array) – Optional. An array of URLs to a series of resources containing the article.
- text (string) – Optional. Alternatively the article can be included literally. Some HTML allowed.
- root (string) – Optional. A CSS selector pointing to the element in the DOM that contains the article proper. Default is the root element of the returned resource.
The article can be served in the fashion most convenient for the server.
If the article is available as HTML, then the URL to the HTML version SHOULD always be included. If the article is available as an image (or as a series of images), the URLs to all images SHOULD be provided in the correct order of reading.
The client will pick the most appropriate resource from the list depending on user preferences. The client will allow the user to cycle through a set of images.
The mimetype parameter indicates the mimetype of the resource. It MUST be the same as the content-type of the server’s response.
The type text/x-html-literal is a custom mimetype used to indicate that the article HTML has been included literally in the text parameter instead of being referenced by URL. Including the article’s text may save the client one trip to the server.
The embeddable parameter SHOULD be true if the resource (or the element pointed to by root) is embeddable, e.g. if the resource
- contains only the article proper and
- is self-contained HTML,
but it MUST NOT be true if the resource is not embeddable.
The canonical parameter MUST be true if and only if the URL is the canonical, citeable URL for the article.
The lang parameter indicates which transliteration was used for Devanagari script in the article.
The urls parameter is always an array even with only one URL returned.
The root parameter is a CSS selector to the root element in the HTML containing the article proper. Set this if the HTML you serve contains extraneous information like headers, footers, navigation bars, etc. Default “article”.
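How a client picks among the returned formats is left to the client. One possible policy is sketched below, assuming the parsed JSON array is passed in as a list of dicts. The names `pick_format`, `canonical_url` and the preference order are hypothetical choices for illustration, not something this API prescribes.

```python
# Hypothetical preference order: inline HTML first (saves a round trip),
# then linked HTML, then scanned images.
PREFERENCE = ["text/x-html-literal", "text/html", "image/jpeg"]


def pick_format(formats, preference=PREFERENCE):
    """Return the preferred embeddable format entry, or None."""
    candidates = [
        f for f in formats
        if f.get("embeddable") and f.get("mimetype") in preference
    ]
    if not candidates:
        return None
    # min() keeps the first entry among equals, so server order breaks ties.
    return min(candidates, key=lambda f: preference.index(f["mimetype"]))


def canonical_url(formats):
    """Return the citeable URL of the article, or None if none is marked."""
    for f in formats:
        if f.get("canonical") and f.get("urls"):
            return f["urls"][0]
    return None
```

A real client would also filter on lang according to the user's preferred transliteration before applying the mimetype preference.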
GET /v1/articles/(id)/headwords
Get a list of an article’s headwords.
Example request:
    GET /v1/articles/11412/headwords HTTP/1.1
    Host: api.cpd.uni-koeln.de
Example response:
    HTTP/1.1 200 OK
    Content-Type: application/json

    {
      "data": [
        {
          "articles_url": "v1/articles/11412",
          "headwords_url": "v1/headwords/43685",
          "lang": "pi-Latn-x-iso",
          "normalized_text": "a-hi\u1e41sa",
          "text": "a-hi\u1e41sa"
        },
        {
          "articles_url": "v1/articles/11412",
          "headwords_url": "v1/headwords/43683",
          "lang": "pi-Latn-x-iso",
          "normalized_text": "a-hi\u1e41sat",
          "text": "a-hi\u1e41sat"
        }
      ],
      "limit": 100
    }
Parameters:
- id – The article id. See: GET /v1/articles/(id).
Query Parameters:
- limit – The maximum number of headwords to return. Default 100.
- offset – The number of headwords to skip. Default 0.
Response Headers:
- Content-Type – application/json
Status Codes:
- 200 OK – no error
- 404 Not Found – article not found
For the response object parameters see: GET /v1/headwords.