This directory holds files obtained via the [Encyclopedia of Life](https://eol.org/).

# Mapping Files
-   `provider_ids.csv.gz` <br>
    Obtained from <https://opendata.eol.org/dataset/identifier-map> on 22/08/2022 (last updated on 27/07/2022).
    Associates EOL IDs with taxon IDs from sources such as NCBI and Index Fungorum.
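
A minimal sketch of turning this file into a lookup table from one source's taxon IDs to EOL IDs. It assumes (not verified here) that the gzipped CSV has a header row with columns named `resource_id` (the source, e.g. NCBI), `resource_pk` (the source's own taxon ID) and `page_id` (the EOL ID). The function name `load_provider_map` is hypothetical and is not part of this directory's scripts.

```python
import csv
import gzip


def load_provider_map(path: str, resource_id: str) -> dict[str, int]:
    """Map one source's taxon IDs to EOL page IDs.

    Assumes hypothetical header columns `resource_id`, `resource_pk` and `page_id`.
    """
    mapping: dict[str, int] = {}
    # Open the gzip file in text mode so the csv module can read it directly
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["resource_id"] != resource_id:
                continue  # row belongs to a different source
            # Keep the first EOL ID seen for each source taxon ID
            mapping.setdefault(row["resource_pk"], int(row["page_id"]))
    return mapping
```

The resource ID that corresponds to NCBI (or any other source) is not stated in this README and would have to be looked up before calling `load_provider_map("provider_ids.csv.gz", resource_id=...)`.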

# Name Data Files
-   `vernacularNames.csv` <br>
    Obtained from <https://opendata.eol.org/dataset/vernacular-names> on 24/04/2022 (last updated on 27/10/2020).
    Contains vernacular (common) names from EOL, used as alternative node names.
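
A minimal sketch of grouping these names by EOL page ID. It assumes (not verified here) header columns named `page_id`, `vernacular_string` and `language_code`, and that English names use the code `"eng"`; the function name `load_vernacular_names` is hypothetical.

```python
import csv
from collections import defaultdict


def load_vernacular_names(path: str, lang: str = "eng") -> dict[int, list[str]]:
    """Return {EOL page ID: [names]} for one language, without duplicates.

    Column names and the "eng" language code are assumptions.
    """
    names: defaultdict[int, list[str]] = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["language_code"] != lang:
                continue  # skip names in other languages
            name = row["vernacular_string"].strip()
            page_id = int(row["page_id"])
            # Skip empty strings and repeated names for the same page
            if name and name not in names[page_id]:
                names[page_id].append(name)
    return dict(names)
```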

# Image Metadata Files
-   `imagesList.tgz` <br>
    Obtained from <https://opendata.eol.org/dataset/images-list> on 24/04/2022 (last updated on 05/02/2020).
    Contains metadata for images from EOL.
-   `imagesList/` <br>
    Extracted from imagesList.tgz.
-   `gen_images_list_db.py` <br>
    Creates a database and imports the imagesList/*.csv files into it.
-   `images_list.db` <br>
    Created by running gen_images_list_db.py <br>
    Tables: <br>
    -   `images`:
        `content_id INT PRIMARY KEY, page_id INT, source_url TEXT, copy_url TEXT, license TEXT, copyright_owner TEXT`
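
A minimal sketch of the import step, using the `images` schema above. It assumes (not verified against the real `gen_images_list_db.py`) that each CSV in `imagesList/` has one header row followed by rows whose first six fields are, in order, the content ID, page ID, source URL, copy URL, license and copyright owner. The function name is hypothetical.

```python
import csv
import glob
import os
import sqlite3


def gen_images_list_db(csv_dir: str, db_path: str) -> None:
    """Create the `images` table and load every CSV file in csv_dir into it."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS images ("
            "content_id INT PRIMARY KEY, page_id INT, source_url TEXT, "
            "copy_url TEXT, license TEXT, copyright_owner TEXT)"
        )
        for csv_path in sorted(glob.glob(os.path.join(csv_dir, "*.csv"))):
            with open(csv_path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                next(reader, None)  # skip the header row (assumed present)
                # Assumed column order; short or malformed rows are dropped
                rows = (row[:6] for row in reader if len(row) >= 6)
                # INSERT OR IGNORE keeps the first row for a repeated content ID
                conn.executemany(
                    "INSERT OR IGNORE INTO images VALUES (?, ?, ?, ?, ?, ?)", rows
                )
        conn.commit()
    finally:
        conn.close()
```

Called as `gen_images_list_db("imagesList", "images_list.db")`. The `INT` column affinity makes SQLite store numeric strings such as `"123"` as integers, so IDs read back as `int`.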

# Image Generation Files
-   `download_imgs.py` <br>
    Used to download image files into imgs_for_review/.
-   `review_imgs.py` <br>
    Used to review images in imgs_for_review/, moving acceptable ones into imgs/.
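
A minimal sketch of the download-then-review workflow, built on the `images` table in `images_list.db`. It is not the actual `download_imgs.py` or `review_imgs.py`: the function names, the `<page_id>_<content_id><ext>` file naming, the per-page `limit`, and the simplification of review into "move a list of accepted file names" are all hypothetical.

```python
import os
import shutil
import sqlite3
import urllib.parse
import urllib.request


def get_image_urls(db_path: str, page_id: int) -> list[tuple[int, str]]:
    """Return (content_id, copy_url) pairs for one EOL page."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT content_id, copy_url FROM images WHERE page_id = ? ORDER BY content_id",
            (page_id,),
        ).fetchall()
    finally:
        conn.close()


def download_imgs(db_path: str, page_id: int, out_dir: str = "imgs_for_review", limit: int = 5) -> list[str]:
    """Download up to `limit` images for a page into out_dir (hypothetical naming)."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for content_id, url in get_image_urls(db_path, page_id)[:limit]:
        # Take the extension from the URL path, ignoring any query string
        ext = os.path.splitext(urllib.parse.urlparse(url).path)[1] or ".jpg"
        path = os.path.join(out_dir, f"{page_id}_{content_id}{ext}")
        if not os.path.exists(path):  # don't re-download existing files
            urllib.request.urlretrieve(url, path)
        paths.append(path)
    return paths


def accept_imgs(filenames: list[str], review_dir: str = "imgs_for_review", out_dir: str = "imgs") -> None:
    """Move reviewed, accepted files from review_dir into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for name in filenames:
        shutil.move(os.path.join(review_dir, name), os.path.join(out_dir, name))
```

Keeping review as a separate move step means rejected images simply stay in `imgs_for_review/`, and `imgs/` only ever contains files that were explicitly accepted.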