From 551fbe163b90cc1f318612c167fbdfe738dd7132 Mon Sep 17 00:00:00 2001
From: Terry Truong
Date: Fri, 1 Jul 2022 19:28:12 +1000
Subject: Generate 3 reduced trees, keeping the original, and serve only those

Generate a 'trimmed' reduced tree instead of changing the original.
Generate an 'images-only' reduced tree, and use it as the default.
Combine 'picked' reduced tree code with that of other reduced trees.
Adapt server API to allow selecting between more than 2 trees.
Add client setting for selecting between 3 trees.
---
 backend/data/README.md | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

(limited to 'backend/data/README.md')

diff --git a/backend/data/README.md b/backend/data/README.md
index 776ff17..13aeb89 100644
--- a/backend/data/README.md
+++ b/backend/data/README.md
@@ -38,13 +38,11 @@ This directory holds files used to generate data.db, which contains tree-of-life
     Associates a node with an image from another node.
     `otol_ids` can be an otol ID, or two comma-separated otol IDs or empty strings. The latter is used for compound nodes.
 
-## Reduced-tree data
-- `r_nodes`
-    Format: `name TEXT PRIMARY KEY, tips INT`
-    Like `nodes`, but for a reduced tree.
-- `r_edges`
-    Format: `node TEXT, child TEXT, p_support INT, PRIMARY KEY (node, child)`
-    Like `edges` but for a reduced tree.
+## Reduced tree data
+- `nodes_t`, `nodes_i`, `nodes_p`
+    These are like `nodes`, but describe the nodes for various reduced trees.
+- `edges_t`, `edges_i`, `edges_p`
+    Like `edges` but for reduced trees.
 
 # Generating the Database
 
@@ -147,9 +145,8 @@ Some of the python scripts require third-party packages:
     - pickedNames.txt: Has lines of the form `nodeName1|altName1|prefAlt1`.
       These correspond to entries in the `names` table. `prefAlt` should be 1 or 0.
       A line like `name1|name1|1` causes a node to have no preferred alt-name.
-3. Run genReducedTreeData.py, which generates a second, reduced version of the tree,
-    adding the `r_nodes` and `r_edges` tables, using `nodes` and `names`. Reads from
-    pickedReducedNodes.txt, which lists names of nodes that must be included (1 per line).
-4. Optionally run trimTree.py, which tries to remove some 'low significance' nodes,
-    for the sake of performance and content-relevance. Otherwise, some nodes may have
-    over 10k children, which can take a while to render (took over a minute in testing).
+3. Run genReducedTrees.py, which generates multiple reduced versions of the tree,
+    adding the `nodes_*` and `edges_*` tables, using `nodes` and `names`. Reads from
+    pickedNodes.txt, which lists names of nodes that must be included (1 per line).
+    The original tree isn't used for web-queries, as some nodes would have over
+    10k children, which can take a while to render (took over a minute in testing).
-- 
cgit v1.2.3