From 3e256d2fd048997370b2c043ea293ea9a3e2430c Mon Sep 17 00:00:00 2001
From: Terry Truong
Date: Sun, 2 Oct 2022 21:15:53 +1100
Subject: Add gen_imgs.py

Add package.json, for using npm package smartcrop-cli
Add unit test
---
 backend/hist_data/README.md | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

(limited to 'backend/hist_data/README.md')

diff --git a/backend/hist_data/README.md b/backend/hist_data/README.md
index 5b64462..c5cf66f 100644
--- a/backend/hist_data/README.md
+++ b/backend/hist_data/README.md
@@ -21,6 +21,12 @@ This directory holds files used to generate the history database data.db.
 - `pop`:
     Format: `id INT PRIMARY KEY, pop INT`
     Associates each event with a popularity measure (currently an average monthly viewcount)
+- `images`:
+    Format: `id INT PRIMARY KEY, url TEXT, license TEXT, artist TEXT, credit TEXT`
+    Holds metadata for available images
+- `event_imgs`:
+    Format: `id INT PRIMARY KEY, img_id INT`
+    Associates events with images
 
 # Generating the Database
 
@@ -46,10 +52,11 @@ Some of the scripts use third-party packages:
 1. In enwiki/, run `download_img_license_info.py`, which downloads licensing info for found
     images, and adds them to the image database.
 1. In enwiki/, run `download_imgs.py`, which downloads images into enwiki/imgs/.
-1. Run
+1. Run `gen_imgs.py`, which creates resized/cropped images in img/, from images in enwiki/imgs/.
+    Adds the `imgs` and `event_imgs` tables.
 
 ## Generate Description Data
 1. Obtain an enwiki dump in enwiki/, as specified in the README.
 1. In enwiki/, run `gen_dump_index.db.py`, which generates a database for indexing the dump.
 1. In enwiki/, run `gen_desc_data.py`, which extracts page descriptions into a database.
-1. Run
+1. Run
--
cgit v1.2.3
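
Review note: the two table formats documented in this patch can be sketched as follows. This is a minimal illustration, not the code in `gen_imgs.py`. It assumes data.db is an SQLite database, uses the table name `images` from the format listing, and assumes that `event_imgs.id` is an event ID and `event_imgs.img_id` refers to `images.id`. The helper names and the sample row are hypothetical.

```python
import sqlite3

def create_image_tables(con: sqlite3.Connection) -> None:
    """Create the two tables using the column formats listed in the README.
    Hypothetical helper; assumes an SQLite database."""
    con.execute(
        'CREATE TABLE IF NOT EXISTS images'
        ' (id INT PRIMARY KEY, url TEXT, license TEXT, artist TEXT, credit TEXT)')
    con.execute(
        'CREATE TABLE IF NOT EXISTS event_imgs (id INT PRIMARY KEY, img_id INT)')

def image_for_event(con: sqlite3.Connection, event_id: int):
    """Return (url, license, artist, credit) for an event, or None.
    Assumes event_imgs.id is the event ID and img_id points at images.id."""
    return con.execute(
        'SELECT url, license, artist, credit FROM event_imgs'
        ' INNER JOIN images ON event_imgs.img_id = images.id'
        ' WHERE event_imgs.id = ?', (event_id,)).fetchone()

# Usage with an in-memory database and made-up sample data
con = sqlite3.connect(':memory:')
create_image_tables(con)
con.execute(
    'INSERT INTO images VALUES (?, ?, ?, ?, ?)',
    (1, 'https://example.org/a.jpg', 'CC BY-SA 4.0', 'Example Artist', 'Example credit'))
con.execute('INSERT INTO event_imgs VALUES (?, ?)', (10, 1))
```

Keeping the licensing metadata in `images` and only the association in `event_imgs` would let several events share one image row without duplicating the license, artist, and credit fields.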