File Generation Process
=======================

1 Tree Structure Data
    1 Obtain data in otol/, as specified in its README.
    2 Run genOtolData.py, which creates data.db, and adds 'nodes' and 'edges' tables using data in otol/*.
2 Name Data for Search
    1 Obtain data in eol/, as specified in its README.
    2 Run genEolNameData.py, which adds 'names' and 'eol\_ids' tables to data.db, using data in eol/vernacularNames.csv and the 'nodes' table.
3 Image Data
    1 Use downloadImgsForReview.py to download EOL images into imgsForReview/. It uses data in eol/imagesList.db, and the 'eol\_ids' table.
    2 Use reviewImgs.py to filter images in imgsForReview/ into EOL-id-unique images in imgsReviewed/ (uses 'names' and 'eol\_ids' to display extra info).
    3 Use genImgsForWeb.py to create cropped/resized images in img/, using images in imgsReviewed/, and also to add an 'images' table to data.db.
4 Node Description Data
    - Using DBpedia
        1 Obtain data in dbpedia/, as specified in its README.
        2 Run genDbpData.py, which adds a 'descs' table to data.db, using data in dbpedia/dbpData.db, dbpPickedLabels.txt, and the 'nodes' table.
    - Supplementing with Wikipedia dump
        1 Obtain data in enwiki/, as specified in its README.
        2 Run genEnwikiData.py, which adds to the 'descs' table, using data in enwiki/enwikiData.db, reducedTol/names.txt, and the 'nodes' table.
5 Reduced Tree Structure Data
    1 Run genReducedTreeData.py, which adds 'r\_nodes' and 'r\_edges' tables to data.db, using reducedTol/names.txt, and the 'nodes' and 'names' tables.

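The script order above can be sketched as a small driver. This is only an illustration: it assumes each script takes no arguments and is run from the directory that contains it, which this README does not state. The data-obtaining steps are manual and are left out, and reviewImgs.py may need a person at the keyboard. 'PIPELINE' and 'run\_pipeline' are hypothetical names, not part of the project.

```python
import subprocess
import sys

# Scripts in the order listed above. Running them with no arguments from
# the current directory is an assumption, not something this README states.
PIPELINE = [
    "genOtolData.py",
    "genEolNameData.py",
    "downloadImgsForReview.py",
    "reviewImgs.py",
    "genImgsForWeb.py",
    "genDbpData.py",
    "genEnwikiData.py",
    "genReducedTreeData.py",
]

def run_pipeline(scripts=PIPELINE, dry_run=True):
    """Build one command per script; run them in order unless dry_run is set.

    Returns the list of commands. When actually running, the first failing
    script raises subprocess.CalledProcessError and stops the sequence.
    """
    commands = [[sys.executable, script] for script in scripts]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands

if __name__ == "__main__":
    # Dry run: only print what would be executed.
    for command in run_pipeline():
        print(" ".join(command))
```

Defaulting to a dry run keeps the sketch safe to try, since several of the real scripts modify data.db or download files.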

data.db Tables
==============

- nodes: name TEXT PRIMARY KEY, tips INT
- edges: node TEXT, child TEXT, p\_support INT, PRIMARY KEY (node, child)
- names: name TEXT, alt\_name TEXT, pref\_alt INT, PRIMARY KEY(name, alt\_name)
- eol\_ids: id INT PRIMARY KEY, name TEXT
- images: eol\_id INT PRIMARY KEY, source\_url TEXT, license TEXT, copyright\_owner TEXT
- descs: name TEXT PRIMARY KEY, desc TEXT, redirected INT
- r\_nodes: name TEXT PRIMARY KEY, tips INT
- r\_edges: node TEXT, child TEXT, p\_support INT, PRIMARY KEY (node, child)

Other Files
===========

- dbpPickedLabels.txt
  Contains DBpedia labels, one per line. Used by genDbpData.py to help resolve conflicts when associating tree-of-life node names with DBpedia node labels. It was generated by manually editing the output of genDbpConflicts.py.
- genDbpConflicts.py
  Reads data from dbpedia/dbpData.db and the 'nodes' table of data.db, and looks for potential conflicts that would arise when genDbpData.py tries to associate tree-of-life node names with DBpedia node labels. It writes data about them to conflicts.txt, which can be manually edited to resolve them.
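
As a sketch of how the data.db tables listed earlier relate to each other, the example below builds three of them in an in-memory SQLite database and looks up a node's children together with a preferred alternative name. The column definitions come from the table list; the sample rows are made up, and the meanings assumed here for 'tips' (a count of leaf nodes) and 'pref\_alt' (1 marks the preferred alternative name) are guesses from the column names. 'children\_with\_names' is a hypothetical helper.

```python
import sqlite3

# In-memory stand-in for data.db; the rows below are invented sample data.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE nodes (name TEXT PRIMARY KEY, tips INT);
    CREATE TABLE edges (node TEXT, child TEXT, p_support INT,
                        PRIMARY KEY (node, child));
    CREATE TABLE names (name TEXT, alt_name TEXT, pref_alt INT,
                        PRIMARY KEY (name, alt_name));
""")
con.executemany("INSERT INTO nodes VALUES (?, ?)",
                [("felidae", 2), ("felis catus", 1), ("panthera leo", 1)])
con.executemany("INSERT INTO edges VALUES (?, ?, ?)",
                [("felidae", "felis catus", 1), ("felidae", "panthera leo", 1)])
con.executemany("INSERT INTO names VALUES (?, ?, ?)",
                [("felis catus", "cat", 1), ("panthera leo", "lion", 1)])

def children_with_names(con, node):
    """Return (child, tips, alt_name) rows for each child of the given node.

    alt_name is None when the child has no row in 'names' with pref_alt = 1
    (treating pref_alt = 1 as "preferred" is an assumption).
    """
    query = """
        SELECT e.child, n.tips, a.alt_name
        FROM edges e
        JOIN nodes n ON n.name = e.child
        LEFT JOIN names a ON a.name = e.child AND a.pref_alt = 1
        WHERE e.node = ?
        ORDER BY e.child
    """
    return con.execute(query, (node,)).fetchall()

print(children_with_names(con, "felidae"))
```

The same query shape should apply to 'r\_nodes' and 'r\_edges', since they share column definitions with 'nodes' and 'edges'.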