From 0cd58b3c1a8c5297579ea7a24a14d82ae8fed169 Mon Sep 17 00:00:00 2001
From: Terry Truong
Date: Tue, 30 Aug 2022 17:54:10 +1000
Subject: Add node-popularity data for search-sugg ordering

Add Wikipedia pageview dumps to enwiki/pageview/
Add scripts to generate viewcount averages
Update backend to sort search suggestions by popularity
---
 backend/tolData/README.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

(limited to 'backend/tolData/README.md')

diff --git a/backend/tolData/README.md b/backend/tolData/README.md
index ece07b4..3b78af8 100644
--- a/backend/tolData/README.md
+++ b/backend/tolData/README.md
@@ -45,7 +45,10 @@ This directory holds files used to generate the tree-of-life database data.db.
 ## Other
 - `node_iucn`
   Format: `name TEXT PRIMARY KEY, iucn TEXT`
-  Associated nodes with IUCN conservation status strings (eg: 'endangered')
+  Associates nodes with IUCN conservation status strings (eg: 'endangered')
+- `node_pop`
+  Format: `name TEXT PRIMARY KEY, pop INT`
+  Associates nodes with popularity values (higher means more popular)
 
 # Generating the Database
 
@@ -135,7 +138,12 @@ Some of the scripts use third-party packages:
    images of it's children. Adds the `linked_imgs` table, and uses the
    `nodes`, `edges`, and `node_imgs` tables.
 
-## Do some Post-Processing
+## Generate Reduced Trees
 1. Run genReducedTrees.py, which generates multiple reduced versions of the tree,
    adding the `nodes_*` and `edges_*` tables, using `nodes` and `names`. Reads from
    pickedNodes.txt, which lists names of nodes that must be included (1 per line).
+
+## Generate Node Popularity Data
+1. Obtain 'page view files' in enwiki/Run genPopData.py, as specified in it's README.
+2. Run genPopData.py, which adds the `node_pop` table, using data in enwiki/,
+   and the `wiki_ids` table.
-- 
cgit v1.2.3
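
Note on the patch: the commit message says the backend sorts search suggestions by popularity, and the README hunk documents `node_pop` as `name TEXT PRIMARY KEY, pop INT`. The backend code is not part of this diff, so the following is only a minimal sketch of how such an ordering might look with Python's `sqlite3` module. The `demo_nodes` table, the `suggest` function, the sample names, and the popularity values are all hypothetical; only the `node_pop` schema comes from the patch.

```python
import sqlite3


def suggest(con, prefix, limit=5):
    """Return names starting with `prefix`, most popular first.

    Hypothetical helper: names with no row in node_pop are treated
    as having popularity 0 and sort last, alphabetically.
    """
    rows = con.execute(
        """
        SELECT n.name
        FROM demo_nodes AS n
        LEFT JOIN node_pop AS p ON p.name = n.name
        WHERE substr(n.name, 1, length(?)) = ?
        ORDER BY IFNULL(p.pop, 0) DESC, n.name ASC
        LIMIT ?
        """,
        (prefix, prefix, limit),
    )
    return [name for (name,) in rows]


con = sqlite3.connect(":memory:")
# demo_nodes is a stand-in for the real node table, whose schema is not shown here.
con.execute("CREATE TABLE demo_nodes (name TEXT PRIMARY KEY)")
# Schema as documented in the README hunk above.
con.execute("CREATE TABLE node_pop (name TEXT PRIMARY KEY, pop INT)")
con.executemany(
    "INSERT INTO demo_nodes VALUES (?)",
    [("panthera leo",), ("panthera onca",), ("panthera tigris",), ("pan troglodytes",)],
)
# Made-up popularity values, for illustration only.
con.executemany(
    "INSERT INTO node_pop VALUES (?, ?)",
    [("panthera leo", 900), ("panthera tigris", 1200)],
)

print(suggest(con, "panthera"))
# ['panthera tigris', 'panthera leo', 'panthera onca']
```

The `LEFT JOIN` keeps names that have no popularity row, which seems a reasonable choice if `node_pop` only covers nodes with a matching Wikipedia page; whether the actual backend does this is not shown in the diff.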