From e8e58a3bb9dc233dacf573973457c5b48d369503 Mon Sep 17 00:00:00 2001
From: Terry Truong
Date: Tue, 30 Aug 2022 12:27:42 +1000
Subject: Add scripts for generating eol/enwiki mappings

- New data sources: OTOL taxonomy, EOL provider-ids, Wikidata dump
- Add 'node_iucn' table
- Remove 'redirected' field from 'wiki_ids' table
- Make 'eol_ids' table have 'name' as the primary key
- Combine name-generation scripts into genNameData.py
- Combine description-generation scripts into genDescData.py
---
 backend/tolData/wikidata/README.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
 create mode 100644 backend/tolData/wikidata/README.md

(limited to 'backend/tolData/wikidata/README.md')

diff --git a/backend/tolData/wikidata/README.md b/backend/tolData/wikidata/README.md
new file mode 100644
index 0000000..db45b3c
--- /dev/null
+++ b/backend/tolData/wikidata/README.md
@@ -0,0 +1,18 @@
+This directory holds files obtained via [Wikidata](https://www.wikidata.org/).
+
+# Downloaded Files
+- `latest-all.json.bz2`
+  Obtained from (on 23/08/22).
+  Format info can be found at .
+
+# Other Files
+- genTaxonSrcData.py
+  Used to generate a database holding taxon information from the dump.
+- offsets.dat
+  Holds bzip2 block offsets for the dump. Generated and used by
+  genTaxonSrcData.py for parallel processing of the dump.
+- taxonSrcs.db
+  Generated by genTaxonSrcData.py.
+  Tables:
+  - `src_id_to_title`: `src TEXT, id INT, title TEXT, PRIMARY KEY(src, id)`
+  - `title_iucn`: `title TEXT PRIMARY KEY, status TEXT`
-- 
cgit v1.2.3
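
The two `taxonSrcs.db` tables listed in the README can be tried out with a small sketch like the one below. It is not taken from genTaxonSrcData.py: the helper names (`create_taxon_srcs_db`, `lookup_title`, `lookup_status`) are hypothetical, the sample rows are made-up placeholders, and the meaning of the `src` column (here assumed to name the external source an ID comes from, such as `eol`) is an assumption. Only the column definitions are copied from the README.

```python
import sqlite3

# Column definitions copied from the README's table list.
SCHEMA = (
    "CREATE TABLE src_id_to_title (src TEXT, id INT, title TEXT, PRIMARY KEY(src, id))",
    "CREATE TABLE title_iucn (title TEXT PRIMARY KEY, status TEXT)",
)


def create_taxon_srcs_db(path=":memory:"):
    """Open a database at `path` and create both tables (hypothetical helper)."""
    con = sqlite3.connect(path)
    for stmt in SCHEMA:
        con.execute(stmt)
    return con


def lookup_title(con, src, src_id):
    """Return the title stored for (src, src_id), or None if there is no row."""
    row = con.execute(
        "SELECT title FROM src_id_to_title WHERE src = ? AND id = ?",
        (src, src_id),
    ).fetchone()
    return row[0] if row else None


def lookup_status(con, title):
    """Return the status stored for `title`, or None if there is no row."""
    row = con.execute(
        "SELECT status FROM title_iucn WHERE title = ?",
        (title,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    con = create_taxon_srcs_db()
    # Placeholder rows, invented for illustration; "eol" as a src value is an assumption.
    con.execute("INSERT INTO src_id_to_title VALUES (?, ?, ?)", ("eol", 1, "Example title"))
    con.execute("INSERT INTO title_iucn VALUES (?, ?)", ("Example title", "example status"))
    print(lookup_title(con, "eol", 1))
    print(lookup_status(con, "Example title"))
```

Because `(src, id)` is the primary key of `src_id_to_title`, the same numeric ID can appear once per source, while `title_iucn` holds at most one status per title.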