Command-Line Interface
The netrc file is supported by commands that interact with Elasticsearch.
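As an illustration of the netrc format, the sketch below uses Python's standard `netrc` module to look up the credentials for a host. The machine name, login, and password are placeholders, and whether ocdsindex resolves credentials in exactly this way is an assumption, not something this page confirms.

```python
import netrc
import os
import tempfile

# Hypothetical netrc entry: the machine name, login, and password are placeholders.
NETRC_TEXT = "machine host login user password pass\n"


def lookup_credentials(netrc_path, host):
    """Return (login, password) for host, or None if the file has no entry for it."""
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


# Write the sample entry to a temporary file rather than touching ~/.netrc.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "netrc")
    with open(path, "w") as f:
        f.write(NETRC_TEXT)
    credentials = lookup_credentials(path, "host")
    missing = lookup_credentials(path, "other-host")
```

With an entry like this in place, a connection URI could presumably omit the `user:pass@` part, but check the tool's behavior before relying on that.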
sphinx
Prints the URL and the documents to index from the OCDS documentation as JSON.
ocdsindex sphinx DIRECTORY BASE_URL
DIRECTORY
: the directory to crawl, containing language directories and HTML files
BASE_URL
: the URL of the website whose files are crawled
Example:
ocdsindex sphinx path/to/standard/build/ https://standard.open-contracting.org/staging/1.1-dev/ > data.json
The output looks like:
{
"base_url": "https://standard.open-contracting.org/staging/1.1-dev/",
"created_at": 1577880000,
"documents": {
"en": [
{
"url": "https://standard.open-contracting.org/staging/1.1-dev/en/#about",
"title": "Open Contracting Data Standard: Documentation - About",
"text": "The Open Contracting Data Standard …"
}
]
}
}
The documents object has one key per language, and each language's array has one object per document.
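To make the structure concrete, here is a minimal sketch that parses output of this shape and counts the documents per language. The `count_documents` helper is hypothetical and not part of ocdsindex. The sample is the single-document output shown above, embedded as a string so the snippet runs on its own.

```python
import json

# A sample with the same shape as the output shown above.
SAMPLE = """
{
  "base_url": "https://standard.open-contracting.org/staging/1.1-dev/",
  "created_at": 1577880000,
  "documents": {
    "en": [
      {
        "url": "https://standard.open-contracting.org/staging/1.1-dev/en/#about",
        "title": "Open Contracting Data Standard: Documentation - About",
        "text": "The Open Contracting Data Standard …"
      }
    ]
  }
}
"""


def count_documents(data):
    """Map each language code to the number of documents listed under it."""
    return {
        language: len(documents)
        for language, documents in data["documents"].items()
    }


data = json.loads(SAMPLE)
counts = count_documents(data)
```

To inspect a real file, replace `json.loads(SAMPLE)` with `json.load(f)` on an open handle to data.json.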
extension-explorer
Prints the URL and the documents to index from the Extension Explorer as JSON.
ocdsindex extension-explorer FILE
FILE
: the Extension Explorer’s extensions.json file
Example:
ocdsindex extension-explorer path/to/extension_explorer/data/extensions.json > data.json
index
Adds documents to Elasticsearch indices.
ocdsindex index HOST FILE
HOST
: the connection URI for Elasticsearch, like https://user:pass@host:9200
FILE
: the file containing the output of the sphinx or extension-explorer command
Example:
ocdsindex index https://user:pass@host:9200 data.json
copy
Adds a document with a DESTINATION base URL for each document with a SOURCE base URL.
ocdsindex copy HOST SOURCE DESTINATION
HOST
: the connection URI for Elasticsearch, like https://user:pass@host:9200
SOURCE
: the base URL of the documents to copy
DESTINATION
: the base URL of the documents to create
Example:
ocdsindex copy https://user:pass@host:9200 https://standard.open-contracting.org/staging/latest/ https://standard.open-contracting.org/latest/
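The effect of copy on a single document can be sketched as follows. This is an illustration under assumptions, not the tool's implementation: it assumes each stored document carries `base_url` and `url` fields and that the copy's URL is formed by swapping the SOURCE prefix for DESTINATION. The `copy_document` helper is hypothetical.

```python
def copy_document(document, source, destination):
    """Return a copy of document under the destination base URL.

    Returns None if the document does not belong to the source base URL.
    """
    if document["base_url"] != source:
        return None
    copied = dict(document)
    copied["base_url"] = destination
    # Assumption: the document URL starts with the source base URL, so the
    # copy's URL is the destination base URL plus the remaining path.
    if document["url"].startswith(source):
        copied["url"] = destination + document["url"][len(source):]
    return copied


SOURCE = "https://standard.open-contracting.org/staging/latest/"
DESTINATION = "https://standard.open-contracting.org/latest/"

original = {
    "base_url": SOURCE,
    "url": SOURCE + "en/#about",
    "title": "Open Contracting Data Standard: Documentation - About",
}
copied = copy_document(original, SOURCE, DESTINATION)
skipped = copy_document(original, "https://example.org/", DESTINATION)
```

The original document is left unchanged, which matches the description: a document is added for the DESTINATION base URL and nothing is moved.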
expire
Deletes documents that were crawled more than 180 days ago from Elasticsearch indices.
ocdsindex expire HOST --exclude-file FILENAME
HOST
: the connection URI for Elasticsearch, like https://user:pass@host:9200
--exclude-file FILENAME
: exclude any document whose base URL is equal to a line in this file
Example:
ocdsindex expire https://user:pass@host:9200 --exclude-file exclude.txt
Where exclude.txt contains:
https://standard.open-contracting.org/latest/
https://standard.open-contracting.org/1.1/
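The expiry rule can be sketched in a few lines. This is a sketch under assumptions rather than the tool's actual code: it assumes each document carries a `base_url` and a `created_at` Unix timestamp (as in the sphinx output), and the `read_excluded` and `is_expired` helpers are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The 180-day threshold stated in the command description.
EXPIRY = timedelta(days=180)


def read_excluded(text):
    """Parse the exclude file's contents: one base URL per non-blank line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def is_expired(document, now, excluded):
    """Return True if the document is older than 180 days and not excluded."""
    if document["base_url"] in excluded:
        return False
    created = datetime.fromtimestamp(document["created_at"], tz=timezone.utc)
    return now - created > EXPIRY


excluded = read_excluded(
    "https://standard.open-contracting.org/latest/\n"
    "https://standard.open-contracting.org/1.1/\n"
)
now = datetime(2021, 1, 1, tzinfo=timezone.utc)

old_staging = {
    "base_url": "https://standard.open-contracting.org/staging/1.1-dev/",
    "created_at": 1577880000,  # 2020-01-01T12:00:00Z
}
old_latest = {
    "base_url": "https://standard.open-contracting.org/latest/",
    "created_at": 1577880000,
}
recent_staging = {
    "base_url": "https://standard.open-contracting.org/staging/1.1-dev/",
    "created_at": int((now - timedelta(days=10)).timestamp()),
}
```

Here only `old_staging` would be deleted: `old_latest` is just as old but its base URL is listed in the exclude file, and `recent_staging` is within the 180-day window.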