author Terry Truong <terry06890@gmail.com> 2022-05-07 11:09:03 +1000
committer Terry Truong <terry06890@gmail.com> 2022-05-07 11:09:03 +1000
commit ad82c9dc1eb35036c4078b9cd36ae0924e1ff0d2
tree 83db4a0308009e7d516daf864bf23897224f0508 /backend/data
parent 5f8c7e12b6978e50850b434efbdf4062a4284979
Update README line breaks
Diffstat (limited to 'backend/data')
-rw-r--r-- backend/data/README.md 17
-rw-r--r-- backend/data/enwiki/README.md 12
-rw-r--r-- backend/data/eol/README.md 8
-rw-r--r-- backend/data/otol/README.md 8
4 files changed, 25 insertions, 20 deletions
diff --git a/backend/data/README.md b/backend/data/README.md
index 329de09..209a2cc 100644
--- a/backend/data/README.md
+++ b/backend/data/README.md
@@ -25,12 +25,17 @@ File Generation Process
data.db tables
==============
-nodes: name TEXT PRIMARY KEY, children TEXT, parent TEXT, tips INT, p\_support INT
-names: name TEXT, alt\_name TEXT, pref\_alt INT, PRIMARY KEY(name, alt\_name)
-eol\_ids: id INT PRIMARY KEY, name TEXT
-spellfix\_alt\_names
-images: eol\_id INT PRIMARY KEY, source\_url TEXT, license TEXT, copyright\_owner TEXT
-descs: name TEXT PRIMARY KEY, desc TEXT, redirected INT
+- nodes <br>
+ name TEXT PRIMARY KEY, children TEXT, parent TEXT, tips INT, p\_support INT
+- names <br>
+ name TEXT, alt\_name TEXT, pref\_alt INT, PRIMARY KEY(name, alt\_name)
+- eol\_ids <br>
+ id INT PRIMARY KEY, name TEXT
+- spellfix\_alt\_names
+- images <br>
+ eol\_id INT PRIMARY KEY, source\_url TEXT, license TEXT, copyright\_owner TEXT
+- descs <br>
+ name TEXT PRIMARY KEY, desc TEXT, redirected INT
spellfix.so
===========
diff --git a/backend/data/enwiki/README.md b/backend/data/enwiki/README.md
index 8e748c9..e4e1aae 100644
--- a/backend/data/enwiki/README.md
+++ b/backend/data/enwiki/README.md
@@ -1,22 +1,22 @@
Downloaded Files
================
-- enwiki\_content/enwiki-20220420-pages-articles-*.xml.gz:
+- enwiki\_content/enwiki-20220420-pages-articles-*.xml.gz <br>
Obtained via https://dumps.wikimedia.org/backup-index.html (site suggests downloading from a mirror).
Contains text content and metadata for pages in English Wikipedia (current revision only, excludes talk pages).
Some file content and format information was available from
https://meta.wikimedia.org/wiki/Data_dumps/What%27s_available_for_download.
-- enwiki-20220420-page.sql.gz:
+- enwiki-20220420-page.sql.gz <br>
Obtained like above. Contains page-table information including page id, namespace, title, etc.
Format information was found at https://www.mediawiki.org/wiki/Manual:Page_table.
-- enwiki-20220420-redirect.sql.gz:
+- enwiki-20220420-redirect.sql.gz <br>
Obtained like above. Contains page-redirection info.
Format information was found at https://meta.wikimedia.org/wiki/Data_dumps/What%27s_available_for_download.
Generated Files
===============
-- enwiki\_content/enwiki-*.xml and enwiki-*.sql:
+- enwiki\_content/enwiki-*.xml and enwiki-*.sql <br>
Uncompressed versions of downloaded files.
-- enwikiData.db:
+- enwikiData.db <br>
An sqlite database representing data from the enwiki dump files.
Generation:
1 Install python, and packages mwsql, mwxml, and mwparsefromhell. Example:
@@ -31,5 +31,5 @@ Generated Files
4 Run genDescData.py, which reads the page-content xml dumps, and the 'pages' and 'redirects' tables,
and associates page ids with (potentially redirect-resolved) pages, and attempts to parse some
wikitext within those pages to obtain the first descriptive paragraph, with markup removed.
-- .venv:
+- .venv <br>
Provides a python virtual environment for packages needed to generate data.
diff --git a/backend/data/eol/README.md b/backend/data/eol/README.md
index d863099..6f1f6c6 100644
--- a/backend/data/eol/README.md
+++ b/backend/data/eol/README.md
@@ -1,15 +1,15 @@
Downloaded Files
================
-- imagesList.tgz:
+- imagesList.tgz <br>
Obtained from https://opendata.eol.org/dataset/images-list on 24/04/2022.
Listed as being last updated on 05/02/2020.
-- vernacularNames.csv:
+- vernacularNames.csv <br>
Obtained from https://opendata.eol.org/dataset/vernacular-names on 24/04/2022.
Listed as being last updated on 27/10/2020.
Generated Files
===============
-- imagesList/:
+- imagesList/ <br>
Obtained by extracting imagesList.tgz.
-- imagesList.db:
+- imagesList.db <br>
Represents data from eol/imagesList/*, and is created by genImagesListDb.sh.
diff --git a/backend/data/otol/README.md b/backend/data/otol/README.md
index 58aad3c..a6f13c2 100644
--- a/backend/data/otol/README.md
+++ b/backend/data/otol/README.md
@@ -1,6 +1,6 @@
Downloaded Files
================
-- labelled\_supertree\_ottnames.tre
- Obtained from https://tree.opentreeoflife.org/about/synthesis-release/v13.4
-- annotations.json
- Obtained from https://tree.opentreeoflife.org/about/synthesis-release/v13.4
+- labelled\_supertree\_ottnames.tre <br>
+ Obtained from https://tree.opentreeoflife.org/about/synthesis-release/v13.4.
+- annotations.json <br>
+ Obtained from https://tree.opentreeoflife.org/about/synthesis-release/v13.4.
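The data.db table layout listed in backend/data/README.md can be illustrated with a minimal Python sqlite3 sketch. This is not the project's generation code. The helper names (`create_db`, `preferred_alt_name`, `description`), the sample rows, the JSON-list encoding of `children`, and the reading of `pref_alt` as a 0/1 flag are all hypothetical. `spellfix_alt_names` is left out because it presumably depends on the spellfix.so extension mentioned in the same README.

```python
import sqlite3

# Tables transcribed from the data.db section of backend/data/README.md.
# spellfix_alt_names is omitted: it presumably needs the spellfix.so extension.
# "desc" is quoted because DESC is an SQL keyword.
SCHEMA = """
CREATE TABLE nodes (name TEXT PRIMARY KEY, children TEXT, parent TEXT, tips INT, p_support INT);
CREATE TABLE names (name TEXT, alt_name TEXT, pref_alt INT, PRIMARY KEY(name, alt_name));
CREATE TABLE eol_ids (id INT PRIMARY KEY, name TEXT);
CREATE TABLE images (eol_id INT PRIMARY KEY, source_url TEXT, license TEXT, copyright_owner TEXT);
CREATE TABLE descs (name TEXT PRIMARY KEY, "desc" TEXT, redirected INT);
"""


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create an empty database with the README's table layout (hypothetical helper)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def preferred_alt_name(conn: sqlite3.Connection, name: str):
    """Return the alt_name with pref_alt = 1 for a node, or None.

    Assumption: pref_alt is a 0/1 flag; the README only gives its type.
    """
    row = conn.execute(
        "SELECT alt_name FROM names WHERE name = ? AND pref_alt = 1", (name,)
    ).fetchone()
    return row[0] if row else None


def description(conn: sqlite3.Connection, name: str):
    """Return (desc, redirected) for a node, or None if there is no row."""
    return conn.execute(
        'SELECT "desc", redirected FROM descs WHERE name = ?', (name,)
    ).fetchone()


if __name__ == "__main__":
    conn = create_db()
    # Hypothetical sample rows; children stored as a JSON list string is an assumption.
    conn.execute("INSERT INTO nodes VALUES (?, ?, ?, ?, ?)",
                 ("homo sapiens", "[]", "homo", 1, 1))
    conn.execute("INSERT INTO names VALUES (?, ?, ?)", ("homo sapiens", "human", 1))
    conn.execute("INSERT INTO descs VALUES (?, ?, ?)",
                 ("homo sapiens", "A species of primate.", 0))
    print(preferred_alt_name(conn, "homo sapiens"))  # human
    print(description(conn, "homo sapiens"))  # ('A species of primate.', 0)
```

The composite `PRIMARY KEY(name, alt_name)` on `names` lets one node carry several alternative names, so a lookup for the preferred one filters on `pref_alt` rather than relying on uniqueness of `name`.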