author Terry Truong <terry06890@gmail.com> 2022-10-02 21:15:53 +1100
committer Terry Truong <terry06890@gmail.com> 2022-10-02 21:15:53 +1100
commit 3e256d2fd048997370b2c043ea293ea9a3e2430c
tree 14ea5dc358720ce9adedaaae3240e0b3d8f18793 /backend/hist_data/README.md
parent 149dc178c491d8e447a05ff3705fdc6ceddf129e
Add gen_imgs.py
Add package.json, for using npm package smartcrop-cli
Add unit test
Diffstat (limited to 'backend/hist_data/README.md')
-rw-r--r-- backend/hist_data/README.md 11
1 files changed, 9 insertions, 2 deletions
diff --git a/backend/hist_data/README.md b/backend/hist_data/README.md
index 5b64462..c5cf66f 100644
--- a/backend/hist_data/README.md
+++ b/backend/hist_data/README.md
@@ -21,6 +21,12 @@ This directory holds files used to generate the history database data.db.
- `pop`: <br>
Format: `id INT PRIMARY KEY, pop INT` <br>
Associates each event with a popularity measure (currently an average monthly viewcount)
+- `images`: <br>
+ Format: `id INT PRIMARY KEY, url TEXT, license TEXT, artist TEXT, credit TEXT` <br>
+ Holds metadata for available images
+- `event_imgs`: <br>
+ Format: `id INT PRIMARY KEY, img_id INT` <br>
+ Associates events with images
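The two new tables can be sketched with Python's `sqlite3` module. This is a minimal, hypothetical sketch: the column formats come from the README, but the assumption that `event_imgs.id` holds an event ID (as `id` does in `pop`) and that `img_id` refers to `images.id` is ours, as is the helper name `image_for_event`.

```python
import sqlite3
from typing import Optional, Tuple

# Column formats copied from the README. ASSUMPTION: event_imgs.id is an event ID
# and event_imgs.img_id refers to images.id.
SCHEMA = """
CREATE TABLE images (id INT PRIMARY KEY, url TEXT, license TEXT, artist TEXT, credit TEXT);
CREATE TABLE event_imgs (id INT PRIMARY KEY, img_id INT);
"""


def create_tables(conn: sqlite3.Connection) -> None:
    """Create the `images` and `event_imgs` tables."""
    conn.executescript(SCHEMA)


def image_for_event(conn: sqlite3.Connection, event_id: int) -> Optional[Tuple]:
    """Hypothetical helper: return (url, license, artist, credit) for an event's
    image, or None if the event has no associated image."""
    return conn.execute(
        "SELECT images.url, images.license, images.artist, images.credit"
        " FROM event_imgs INNER JOIN images ON event_imgs.img_id = images.id"
        " WHERE event_imgs.id = ?",
        (event_id,),
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    conn.execute("INSERT INTO images VALUES (?, ?, ?, ?, ?)",
                 (7, "https://example.org/a.jpg", "cc-by-sa 4.0", "Some Artist", ""))
    conn.execute("INSERT INTO event_imgs VALUES (?, ?)", (100, 7))
    print(image_for_event(conn, 100))
```

Because `event_imgs.id` is declared as the primary key, this layout allows at most one image per event, while several events may share one row of `images`.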
# Generating the Database
@@ -46,10 +52,11 @@ Some of the scripts use third-party packages:
1. In enwiki/, run `download_img_license_info.py`, which downloads licensing info for found
images, and adds them to the image database.
1. In enwiki/, run `download_imgs.py`, which downloads images into enwiki/imgs/.
-1. Run
+1. Run `gen_imgs.py`, which creates resized/cropped images in img/ from the images in enwiki/imgs/,
+ and adds the `images` and `event_imgs` tables.
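The commit message says cropping relies on the npm package smartcrop-cli, which chooses a crop region based on image content. As a simpler stand-in, the geometry of a plain center crop to a target aspect ratio can be sketched in Python. The function name `center_crop_box` and the target sizes below are hypothetical and are not taken from `gen_imgs.py`.

```python
from typing import Tuple


def center_crop_box(src_w: int, src_h: int, dst_w: int, dst_h: int) -> Tuple[int, int, int, int]:
    """Return (left, top, width, height) of the largest centered region of a
    src_w x src_h image that has the aspect ratio dst_w:dst_h."""
    if min(src_w, src_h, dst_w, dst_h) <= 0:
        raise ValueError("dimensions must be positive")
    # Compare aspect ratios by integer cross-multiplication to avoid float error.
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target: keep the full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * dst_w // dst_h
    else:
        # Source is taller than (or equal to) the target: keep the full width,
        # trim the top and bottom.
        crop_w = src_w
        crop_h = src_w * dst_h // dst_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return left, top, crop_w, crop_h


if __name__ == "__main__":
    # Hypothetical example: a 400x300 photo cropped for a square 200x200 thumbnail.
    print(center_crop_box(400, 300, 200, 200))
```

With Pillow, the returned box could be applied as `img.crop((left, top, left + w, top + h)).resize((dst_w, dst_h))`. A content-aware cropper such as smartcrop differs in choosing where to place the window instead of always centering it.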
## Generate Description Data
1. Obtain an enwiki dump in enwiki/, as specified in the README.
1. In enwiki/, run `gen_dump_index.db.py`, which generates a database for indexing the dump.
1. In enwiki/, run `gen_desc_data.py`, which extracts page descriptions into a database.
-1. Run
+1. Run
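The README does not say how the dump index is built. If the dump is the multistream variant of the enwiki pages-articles dump (an assumption), the file is a concatenation of independent bz2 streams, and its companion index file has lines of the form `offset:page_id:title`. A minimal sketch of looking up a page's stream and decompressing only that stream follows; a dict stands in for the database that `gen_dump_index.db.py` produces, and the function names are hypothetical.

```python
import bz2
import io
from typing import BinaryIO, Dict, Iterable


def parse_index(lines: Iterable[str]) -> Dict[str, int]:
    """Map page title -> byte offset of the bz2 stream containing that page.
    Index lines look like 'offset:page_id:title'; titles may themselves contain ':'."""
    offsets: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        offset, _page_id, title = line.split(":", 2)
        offsets[title] = int(offset)
    return offsets


def read_stream(f: BinaryIO, offset: int, chunk_size: int = 1 << 16) -> bytes:
    """Decompress the single bz2 stream that starts at `offset`.
    BZ2Decompressor stops at the end of one stream and sets `eof`;
    any following bytes are left in `unused_data` and ignored here."""
    f.seek(offset)
    decomp = bz2.BZ2Decompressor()
    out = []
    while not decomp.eof:
        chunk = f.read(chunk_size)
        if not chunk:
            raise ValueError("truncated bz2 stream")
        out.append(decomp.decompress(chunk))
    return b"".join(out)


if __name__ == "__main__":
    # Build a tiny two-stream "dump" in memory to demonstrate the lookup.
    s1 = bz2.compress(b"<page>A</page>")
    s2 = bz2.compress(b"<page>B</page>")
    index = parse_index(["0:1:Alpha\n", f"{len(s1)}:2:Beta\n"])
    print(read_stream(io.BytesIO(s1 + s2), index["Beta"]))
```

Seeking straight to one stream means a page's text can be extracted without decompressing the whole dump, which is the usual reason for indexing it.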