From eb72584af8f5a598740a87ee024d0d899fdffc8d Mon Sep 17 00:00:00 2001
From: Terry Truong
Date: Thu, 26 May 2022 01:06:16 +1000
Subject: Trim otol tree to avoid certain slowdowns

Some nodes had multiple ancestors with over 10k children, and
jump-searching to them could take almost a minute for Vue to load.
---
 backend/data/README.md | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

(limited to 'backend/data/README.md')

diff --git a/backend/data/README.md b/backend/data/README.md
index a1bc287..8cfa960 100644
--- a/backend/data/README.md
+++ b/backend/data/README.md
@@ -4,7 +4,8 @@ File Generation Process
 1 Tree Structure Data
     1 Obtain data in otol/, as specified in its README.
     2 Run genOtolData.py, which creates data.db, and adds
-       'nodes' and 'edges' tables using data in otol/*.
+       'nodes' and 'edges' tables using data in otol/*, as well as
+       namesToKeep.txt, if present.
 2 Name Data for Search
     1 Obtain data in eol/, as specified in its README.
     2 Run genEolNameData.py, which adds 'names' and 'eol\_ids' tables to data.db,
@@ -57,3 +58,12 @@ Other Files
     tries to associate tree-of-life node names with DBpedia node labels.
     It writes data about them to conflicts.txt, which can be manually edited
     to resolve them.
+- namesToKeep.txt
+    Contains names to avoid trimming off the tree data generated by
+    genOtolData.py. Usage is optional, but, without it, a large number
+    of possibly significant nodes are removed, using a short-sighted
+    heuristic.
+    One way to generate this list is to generate the files as usual,
+    then get node names that have an associated image, linked-image,
+    description, or presence in r_nodes. Then run the genOtolData.py
+    and genEolNameData.py scripts again.
--
cgit v1.2.3
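
The namesToKeep.txt procedure described in the patch could be sketched roughly as below. This is not part of the commit. It assumes that data.db is an SQLite database, that each relevant table stores node names in a single column, and that namesToKeep.txt holds one name per line; the patch states none of this. The helper name collect_names_to_keep is hypothetical, and the caller supplies the (table, column) pairs, because the README text shown here names only r_nodes and does not give any column names.

```python
import sqlite3


def collect_names_to_keep(db_path, name_sources, out_path="namesToKeep.txt"):
    """Write the union of node names found in the given (table, column) pairs.

    name_sources is a list of (table, column) tuples. The table and column
    names are supplied by the caller: they are assumptions, not taken from
    the patch.
    """
    names = set()
    con = sqlite3.connect(db_path)
    try:
        for table, column in name_sources:
            # Identifiers cannot be bound as SQL parameters, so quote them.
            query = 'SELECT DISTINCT "{}" FROM "{}"'.format(column, table)
            for (name,) in con.execute(query):
                if name is not None:
                    names.add(name)
    finally:
        con.close()
    # Assumed file format: one node name per line, sorted for stable diffs.
    with open(out_path, "w", encoding="utf-8") as f:
        for name in sorted(names):
            f.write(name + "\n")
    return names


# Example call (every table and column name here is a placeholder):
# collect_names_to_keep("data.db", [("images", "name"), ("r_nodes", "name")])
```

After writing the file, genOtolData.py and genEolNameData.py would be run again, as the README text says.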