author: Terry Truong <terry06890@gmail.com> 2022-07-01 19:28:12 +1000
committer: Terry Truong <terry06890@gmail.com> 2022-07-01 19:28:12 +1000
commit: 551fbe163b90cc1f318612c167fbdfe738dd7132
tree: 00286538d754fdf686751a3d4c1689d799ecd65e /backend/data/README.md
parent: c2b9a8b7a706cdca58dab7f4a980401e1c20a602
Generate 3 reduced trees, keeping the original, and serve only those
Generate a 'trimmed' reduced tree instead of changing the original. Generate an 'images-only' reduced tree, and use it as the default. Combine 'picked' reduced tree code with that of other reduced trees. Adapt server API to allow selecting between more than 2 trees. Add client setting for selecting between 3 trees.
Diffstat (limited to 'backend/data/README.md')
-rw-r--r-- backend/data/README.md | 23
1 files changed, 10 insertions, 13 deletions
diff --git a/backend/data/README.md b/backend/data/README.md
index 776ff17..13aeb89 100644
--- a/backend/data/README.md
+++ b/backend/data/README.md
@@ -38,13 +38,11 @@ This directory holds files used to generate data.db, which contains tree-of-life
Associates a node with an image from another node.
`otol_ids` can be an otol ID, or two comma-separated otol IDs or empty strings.
The latter is used for compound nodes.
-## Reduced-tree data
-- `r_nodes` <br>
- Format: `name TEXT PRIMARY KEY, tips INT` <br>
- Like `nodes`, but for a reduced tree.
-- `r_edges` <br>
- Format: `node TEXT, child TEXT, p_support INT, PRIMARY KEY (node, child)` <br>
- Like `edges` but for a reduced tree.
+## Reduced tree data
+- `nodes_t`, `nodes_i`, `nodes_p` <br>
+ These are like `nodes`, but describe the nodes for various reduced trees.
+- `edges_t`, `edges_i`, `edges_p` <br>
+ Like `edges` but for reduced trees.
# Generating the Database
@@ -147,9 +145,8 @@ Some of the python scripts require third-party packages:
- pickedNames.txt: Has lines of the form `nodeName1|altName1|prefAlt1`.
These correspond to entries in the `names` table. `prefAlt` should be 1 or 0.
A line like `name1|name1|1` causes a node to have no preferred alt-name.
-3. Run genReducedTreeData.py, which generates a second, reduced version of the tree,
- adding the `r_nodes` and `r_edges` tables, using `nodes` and `names`. Reads from
- pickedReducedNodes.txt, which lists names of nodes that must be included (1 per line).
-4. Optionally run trimTree.py, which tries to remove some 'low significance' nodes,
- for the sake of performance and content-relevance. Otherwise, some nodes may have
- over 10k children, which can take a while to render (took over a minute in testing).
+3. Run genReducedTrees.py, which generates multiple reduced versions of the tree,
+ adding the `nodes_*` and `edges_*` tables, using `nodes` and `names`. Reads from
+ pickedNodes.txt, which lists names of nodes that must be included (1 per line).
+ The original tree isn't used for web-queries, as some nodes would have over
+ 10k children, which can take a while to render (took over a minute in testing).
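
Note on the new tables: the README change describes `nodes_*` and `edges_*` tables for each reduced tree, but not how a reader would pick one. The sketch below is one possible way to do that, not the project's server code. It assumes the suffixes `t`, `i`, and `p` stand for the 'trimmed', 'images-only', and 'picked' trees named in the commit message, and that the new tables keep the column layout of the removed `r_nodes` and `r_edges` tables. The names `TREE_SUFFIXES`, `get_children`, and `make_demo_db` are hypothetical.

```python
import sqlite3

# Assumption: suffix meanings are inferred from the commit message, not stated in the README.
TREE_SUFFIXES = {"trimmed": "t", "images": "i", "picked": "p"}


def get_children(conn, tree, node):
    """Return the sorted child names of `node` in the chosen reduced tree."""
    # Looking the suffix up in a fixed dict raises KeyError for unknown trees,
    # so no caller-supplied text is ever interpolated into the table name.
    suffix = TREE_SUFFIXES[tree]
    query = f"SELECT child FROM edges_{suffix} WHERE node = ? ORDER BY child"
    return [row[0] for row in conn.execute(query, (node,))]


def make_demo_db():
    """Build an in-memory database with the assumed schema and a few sample rows."""
    conn = sqlite3.connect(":memory:")
    for suffix in TREE_SUFFIXES.values():
        # Assumed to match the old r_nodes / r_edges formats.
        conn.execute(f"CREATE TABLE nodes_{suffix} (name TEXT PRIMARY KEY, tips INT)")
        conn.execute(
            f"CREATE TABLE edges_{suffix} "
            "(node TEXT, child TEXT, p_support INT, PRIMARY KEY (node, child))"
        )
    conn.executemany(
        "INSERT INTO edges_i VALUES (?, ?, ?)",
        [("life", "plants", 1), ("life", "animals", 1)],
    )
    conn.execute("INSERT INTO edges_p VALUES (?, ?, ?)", ("life", "animals", 1))
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    print(get_children(demo, "images", "life"))
    print(get_children(demo, "picked", "life"))
```

Mapping tree names to suffixes in one dict keeps the table name out of user control while still allowing more than two trees to be selected, which is the capability the commit message describes adding to the server API.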