author Terry Truong <terry06890@gmail.com> 2022-08-30 17:54:10 +1000
committer Terry Truong <terry06890@gmail.com> 2022-08-30 17:54:10 +1000
commit 0cd58b3c1a8c5297579ea7a24a14d82ae8fed169
tree 17c02e7578a0f7b09461f3bca0fa785301292744 /backend/tolData/README.md
parent 0f39be89c3d5620b8187b1d7621b7680800c268b
Add node-popularity data for search-sugg ordering
Add Wikipedia pageview dumps to enwiki/pageview/
Add scripts to generate viewcount averages
Update backend to sort search suggestions by popularity
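The commit message says the backend now sorts search suggestions by popularity. Below is a minimal sketch of such a query. It assumes a SQLite database with a `nodes` table that has a `name` column (its full schema isn't shown in this diff) and the `node_pop (name TEXT PRIMARY KEY, pop INT)` table the diff documents. The function name `search_suggestions`, the prefix-matching rule, and treating nodes without a popularity row as 0 are illustrative choices, not the backend's actual code.

```python
import sqlite3


def search_suggestions(db: sqlite3.Connection, prefix: str, limit: int = 5) -> list[str]:
    """Return node names starting with `prefix`, most popular first.

    Hypothetical helper: assumes a `nodes(name, ...)` table plus the
    `node_pop(name, pop)` table; nodes lacking a node_pop row rank as pop 0.
    """
    # Escape LIKE wildcards so '%' and '_' in the user's input match literally.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = db.execute(
        "SELECT nodes.name FROM nodes"
        " LEFT JOIN node_pop ON nodes.name = node_pop.name"
        " WHERE nodes.name LIKE ? ESCAPE '\\'"
        " ORDER BY COALESCE(node_pop.pop, 0) DESC, nodes.name"
        " LIMIT ?",
        (escaped + "%", limit),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE nodes (name TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE node_pop (name TEXT PRIMARY KEY, pop INT)")
    db.executemany("INSERT INTO nodes VALUES (?)", [("cat",), ("cattle",), ("catfish",)])
    db.executemany("INSERT INTO node_pop VALUES (?, ?)", [("cat", 900), ("cattle", 300)])
    print(search_suggestions(db, "cat"))
    # → ['cat', 'cattle', 'catfish']
```

The `LEFT JOIN` keeps nodes that have no `node_pop` row, so they still appear as suggestions, just after the ranked ones; ties fall back to alphabetical order.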
Diffstat (limited to 'backend/tolData/README.md')
-rw-r--r-- backend/tolData/README.md 12
1 files changed, 10 insertions, 2 deletions
diff --git a/backend/tolData/README.md b/backend/tolData/README.md
index ece07b4..3b78af8 100644
--- a/backend/tolData/README.md
+++ b/backend/tolData/README.md
@@ -45,7 +45,10 @@ This directory holds files used to generate the tree-of-life database data.db.
## Other
- `node_iucn` <br>
Format: `name TEXT PRIMARY KEY, iucn TEXT` <br>
- Associated nodes with IUCN conservation status strings (eg: 'endangered')
+ Associates nodes with IUCN conservation status strings (eg: 'endangered')
+- `node_pop` <br>
+ Format: `name TEXT PRIMARY KEY, pop INT` <br>
+ Associates nodes with popularity values (higher means more popular)
# Generating the Database
@@ -135,7 +138,12 @@ Some of the scripts use third-party packages:
images of its children. Adds the `linked_imgs` table, and uses the
`nodes`, `edges`, and `node_imgs` tables.
-## Do some Post-Processing
+## Generate Reduced Trees
1. Run genReducedTrees.py, which generates multiple reduced versions of the tree,
adding the `nodes_*` and `edges_*` tables, using `nodes` and `names`. Reads from
pickedNodes.txt, which lists names of nodes that must be included (1 per line).
+
+## Generate Node Popularity Data
+1. Obtain 'page view files' in enwiki/, as specified in its README.
+2. Run genPopData.py, which adds the `node_pop` table, using data in enwiki/,
+ and the `wiki_ids` table.
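The "Generate Reduced Trees" step requires every node listed in pickedNodes.txt (one name per line) to survive the reduction. One simple way to guarantee that, sketched here as an assumption rather than what genReducedTrees.py actually does, is to keep each picked node together with all of its ancestors, so the reduced tree stays connected to the root. Edges are taken as `(parent, child)` pairs, and `read_picked` and `reduce_tree` are hypothetical names.

```python
from typing import Iterable, Optional


def read_picked(lines: Iterable[str]) -> set[str]:
    """Parse pickedNodes.txt-style input: one node name per line, blank lines ignored."""
    return {line.strip() for line in lines if line.strip()}


def reduce_tree(edges: list[tuple[str, str]], picked: set[str]) -> list[tuple[str, str]]:
    """Return the edges on paths from the root down to each picked node.

    Hypothetical reduction rule: keep every picked node plus all its ancestors.
    """
    parent_of = {child: parent for parent, child in edges}
    keep: set[str] = set()
    for name in picked:
        node: Optional[str] = name
        # Walk upward, stopping at the root or at a node already kept
        # (its ancestors were added when it was first reached).
        while node is not None and node not in keep:
            keep.add(node)
            node = parent_of.get(node)
    return [(p, c) for p, c in edges if p in keep and c in keep]


if __name__ == "__main__":
    edges = [("life", "animals"), ("animals", "cat"), ("animals", "dog")]
    print(reduce_tree(edges, read_picked(["cat\n"])))
    # → [('life', 'animals'), ('animals', 'cat')]
```

Stopping the upward walk at an already-kept node means each ancestor is visited at most once, so the cost stays linear in the number of edges even when many picked nodes share a lineage.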
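Per the commit message, genPopData.py turns the pageview dumps in enwiki/pageview/ into view-count averages and stores them in `node_pop`, matching pages to nodes through `wiki_ids`. The sketch below shows that flow under simplified assumptions: each pageview file is reduced to lines of `PAGE_ID VIEW_COUNT` (real Wikimedia dumps carry more columns), `wiki_ids` is assumed to have columns `(name, id)` mapping node names to page IDs, and a page missing from a file counts as 0 views for that file. The names `average_views` and `write_node_pop` are hypothetical.

```python
import sqlite3
from collections import defaultdict
from typing import Iterable


def average_views(files: list[Iterable[str]]) -> dict[int, float]:
    """Average view count per page ID across all files.

    Hypothetical input format: each line is 'PAGE_ID VIEW_COUNT'.
    A page missing from a file counts as 0 views for that file.
    """
    totals: dict[int, int] = defaultdict(int)
    for lines in files:
        for line in lines:
            parts = line.split()
            # Skip blank or malformed lines rather than failing the whole run.
            if len(parts) != 2 or not (parts[0].isdecimal() and parts[1].isdecimal()):
                continue
            totals[int(parts[0])] += int(parts[1])
    if not files:
        return {}
    return {page_id: total / len(files) for page_id, total in totals.items()}


def write_node_pop(db: sqlite3.Connection, avg_views: dict[int, float]) -> None:
    """Create node_pop (schema from the README) and fill it via wiki_ids.

    Assumes wiki_ids has columns (name, id) mapping node names to page IDs.
    """
    db.execute("CREATE TABLE IF NOT EXISTS node_pop (name TEXT PRIMARY KEY, pop INT)")
    rows = db.execute("SELECT name, id FROM wiki_ids").fetchall()
    db.executemany(
        "INSERT OR REPLACE INTO node_pop VALUES (?, ?)",
        # Round averages to integers to fit the INT column; unmatched nodes get no row.
        [(name, round(avg_views[page_id])) for name, page_id in rows if page_id in avg_views],
    )
    db.commit()
```

Dividing by the total number of files, rather than only the files where a page appears, compares every page over the same time span, so a single-day spike is diluted. With real dumps, each file would be opened (ideally in a `with` block) and its lines converted to this shape first.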