From 0e5e46cedaaeacf59cfd0f2e30c1ae6923466870 Mon Sep 17 00:00:00 2001
From: Terry Truong
Date: Fri, 30 Dec 2022 23:28:09 +1100
Subject: Generate event_disp data before image-generation

Make gen_disp_data.py delete non-displayable events
Make reduce_event_data.py also delete from 'dist' and 'event_disp'
Remove MAX_IMGS_PER_CTG from enwiki/gen_img_data.py
Make gen_desc_data.py include events without images
---
 backend/hist_data/enwiki/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/backend/hist_data/enwiki/README.md b/backend/hist_data/enwiki/README.md
index 29fc2ff..262ebdb 100644
--- a/backend/hist_data/enwiki/README.md
+++ b/backend/hist_data/enwiki/README.md
@@ -38,7 +38,7 @@ This directory holds files obtained/derived from [English Wikipedia](https://en.
 Used to download licensing metadata for image names, via wikipedia's online API, and store them into a database.
 - `img_data.db`
 Used to hold metadata about infobox images for a set of page IDs.
- Generated using `get_enwiki_img_data.py` and `download_img_license_info.py`.
+ Generated using `gen_img_data.py` and `download_img_license_info.py`.
 Tables:
 - `page_imgs`: `page_id INT PRIMARY KEY, title TEXT UNIQUE, img_name TEXT`
 `img_name` may be NULL, which means 'none found', and is used to avoid re-processing page IDs.
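
Reviewer note: the `page_imgs` schema quoted in the context lines above can be exercised with a minimal sketch like the one below. It only illustrates how a NULL `img_name` can mark a page as already processed with no image found. It is not taken from `gen_img_data.py`: the helper names, the in-memory database, and the sample rows are all assumptions made for illustration.

```python
import sqlite3


def create_page_imgs(con):
    # Schema copied from the README text; everything else here is illustrative.
    con.execute(
        "CREATE TABLE page_imgs ("
        "page_id INT PRIMARY KEY, title TEXT UNIQUE, img_name TEXT)"
    )


def unprocessed_ids(con, candidate_ids):
    # Hypothetical helper: a page counts as processed if it has any row,
    # even one whose img_name is NULL ('none found').
    seen = {row[0] for row in con.execute("SELECT page_id FROM page_imgs")}
    return [pid for pid in candidate_ids if pid not in seen]


con = sqlite3.connect(":memory:")  # assumption: in-memory DB instead of img_data.db
create_page_imgs(con)

# Made-up sample rows: page 1 has an image, page 2 was checked but has none.
con.execute("INSERT INTO page_imgs VALUES (?, ?, ?)", (1, "Page A", "A.jpg"))
con.execute("INSERT INTO page_imgs VALUES (?, ?, ?)", (2, "Page B", None))

# Page 2 is skipped on a later run because its NULL row already exists.
todo = unprocessed_ids(con, [1, 2, 3])

# Pages that actually have an image to work with.
with_images = [
    row[0]
    for row in con.execute(
        "SELECT page_id FROM page_imgs WHERE img_name IS NOT NULL ORDER BY page_id"
    )
]
```

Storing the NULL row rather than omitting the page is what lets a rerun tell "not yet examined" apart from "examined, nothing found".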