| field | value | date |
|---|---|---|
| author | Terry Truong <terry06890@gmail.com> | 2022-10-02 21:15:53 +1100 |
| committer | Terry Truong <terry06890@gmail.com> | 2022-10-02 21:15:53 +1100 |
| commit | 3e256d2fd048997370b2c043ea293ea9a3e2430c | |
| tree | 14ea5dc358720ce9adedaaae3240e0b3d8f18793 /backend/hist_data/README.md | |
| parent | 149dc178c491d8e447a05ff3705fdc6ceddf129e | |
Add gen_imgs.py
Add package.json, for using npm package smartcrop-cli
Add unit test
Diffstat (limited to 'backend/hist_data/README.md')
| -rw-r--r-- | backend/hist_data/README.md | 11 |
1 files changed, 9 insertions, 2 deletions
diff --git a/backend/hist_data/README.md b/backend/hist_data/README.md
index 5b64462..c5cf66f 100644
--- a/backend/hist_data/README.md
+++ b/backend/hist_data/README.md
@@ -21,6 +21,12 @@ This directory holds files used to generate the history database data.db.
 - `pop`: <br>
   Format: `id INT PRIMARY KEY, pop INT` <br>
   Associates each event with a popularity measure (currently an average monthly viewcount)
+- `images`: <br>
+  Format: `id INT PRIMARY KEY, url TEXT, license TEXT, artist TEXT, credit TEXT` <br>
+  Holds metadata for available images
+- `event_imgs`: <br>
+  Format: `id INT PRIMARY KEY, img_id INT` <br>
+  Assocates events with images
 
 # Generating the Database
 
@@ -46,10 +52,11 @@ Some of the scripts use third-party packages:
 1. In enwiki/, run `download_img_license_info.py`, which downloads licensing info for found images,
    and adds them to the image database.
 1. In enwiki/, run `download_imgs.py`, which downloads images into enwiki/imgs/.
-1. Run
+1. Run `gen_imgs.py`, which creates resized/cropped images in img/, from images in enwiki/imgs/.
+   Adds the `imgs` and `event_imgs` tables.
 
 ## Generate Description Data
 1. Obtain an enwiki dump in enwiki/, as specified in the README.
 1. In enwiki/, run `gen_dump_index.db.py`, which generates a database for indexing the dump.
 1. In enwiki/, run `gen_desc_data.py`, which extracts page descriptions into a database.
-1. Run
+1. Run
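The two tables added to the README in this diff could be sketched as follows. This is a minimal illustration, not the contents of `gen_imgs.py`: the column lists are copied from the README text, while the helper names `create_image_tables` and `image_for_event` are hypothetical, and it is an assumption that `event_imgs.id` is the event ID and `img_id` refers to `images.id`. The README names the table `images` in the schema list and `imgs` in the generation step; the sketch uses `images`.

```python
import sqlite3
from typing import Optional, Tuple


def create_image_tables(con: sqlite3.Connection) -> None:
    """Create the two tables using the column formats quoted in the README."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "id INT PRIMARY KEY, url TEXT, license TEXT, artist TEXT, credit TEXT)"
    )
    con.execute(
        "CREATE TABLE IF NOT EXISTS event_imgs (id INT PRIMARY KEY, img_id INT)"
    )


def image_for_event(con: sqlite3.Connection, event_id: int) -> Optional[Tuple]:
    """Return (url, license, artist, credit) for an event, or None.

    Assumes event_imgs.id holds the event ID and img_id points at images.id.
    """
    return con.execute(
        "SELECT images.url, images.license, images.artist, images.credit"
        " FROM event_imgs"
        " INNER JOIN images ON event_imgs.img_id = images.id"
        " WHERE event_imgs.id = ?",
        (event_id,),
    ).fetchone()
```

With `id` as the primary key of `event_imgs`, each event maps to at most one image, while several events may share the same `img_id`.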
