From 1b4fc8667714ef4ce9f326bd14f795fc2417ecb9 Mon Sep 17 00:00:00 2001
From: Terry Truong
Date: Sat, 1 Oct 2022 23:14:08 +1000
Subject: Add per-event-category image limit

---
 backend/hist_data/enwiki/README.md | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/backend/hist_data/enwiki/README.md b/backend/hist_data/enwiki/README.md
index e50c7e2..dd090ca 100644
--- a/backend/hist_data/enwiki/README.md
+++ b/backend/hist_data/enwiki/README.md
@@ -29,6 +29,18 @@ This directory holds files obtained/derived from [English Wikipedia](https://en.
     - `redirects`: `id INT PRIMARY KEY, target TEXT`
     - `descs`: `id INT PRIMARY KEY, desc TEXT`
 
+# Page View Files
+- `pageviews/pageviews-*-user.bz2`
+    Each holds wikimedia article page view data for some month.
+    Obtained via .
+    Some format info was available from .
+- `gen_pageview_data.py`
+    Reads pageview/* and `dump_index.db`, and creates a database holding average monthly pageview counts.
+- `pageview_data.db`
+    Generated using `gen_pageview_data.py`.
+    Tables:
+    - `views`: `title TEXT PRIMARY KEY, id INT UNIQUE, views INT`
+
 # Image Files
 - `gen_img_data.py`
     Used to find infobox image names for page IDs, and store them into a database.
@@ -46,15 +58,3 @@ This directory holds files obtained/derived from [English Wikipedia](https://en.
     Might lack some matches for `img_name` in `page_imgs`, due to licensing info unavailability.
 - `download_imgs.py`
     Used to download image files into imgs/.
-
-# Page View Files
-- `pageviews/pageviews-*-user.bz2`
-    Each holds wikimedia article page view data for some month.
-    Obtained via .
-    Some format info was available from .
-- `gen_pageview_data.py`
-    Reads pageview/* and `dump_index.db`, and creates a database holding average monthly pageview counts.
-- `pageview_data.db`
-    Generated using `gen_pageview_data.py`.
-    Tables:
-    - `views`: `title TEXT PRIMARY KEY, id INT UNIQUE, views INT`
--
cgit v1.2.3
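
Note on the `views` table moved by this patch: the README describes `pageview_data.db` as holding average monthly pageview counts in a table `views` with columns `title TEXT PRIMARY KEY, id INT UNIQUE, views INT`. The following is a minimal sketch of how such a table might be filled, not the actual logic of `gen_pageview_data.py`. The helper names (`average_monthly_views`, `build_views_table`), the in-memory input dictionaries, the integer division, and the choice to treat a month with no entry for a title as zero views are all assumptions made for illustration.

```python
import sqlite3
from collections import defaultdict


def average_monthly_views(monthly_counts):
    """Average each title's views over all supplied months.

    `monthly_counts` is a hypothetical list with one dict per month,
    mapping article title -> view count for that month. A title absent
    from a month is treated as having zero views (an assumption).
    """
    if not monthly_counts:
        return {}
    totals = defaultdict(int)
    for month in monthly_counts:
        for title, count in month.items():
            totals[title] += count
    num_months = len(monthly_counts)
    # Integer division, since the schema stores `views` as INT (assumption)
    return {title: total // num_months for title, total in totals.items()}


def build_views_table(con, title_to_id, averages):
    """Create the `views` table from the README schema and populate it."""
    con.execute(
        'CREATE TABLE views (title TEXT PRIMARY KEY, id INT UNIQUE, views INT)')
    # Skip titles with no known page ID (stand-in for a dump_index.db lookup)
    rows = [(title, title_to_id[title], views)
            for title, views in averages.items() if title in title_to_id]
    con.executemany('INSERT INTO views VALUES (?, ?, ?)', rows)
    con.commit()


# Hypothetical sample data: two months of counts and a title -> ID mapping
monthly = [{'Moon': 300, 'Sun': 100}, {'Moon': 500}]
ids = {'Moon': 1, 'Sun': 2}

con = sqlite3.connect(':memory:')
build_views_table(con, ids, average_monthly_views(monthly))
rows = con.execute('SELECT title, id, views FROM views ORDER BY id').fetchall()
print(rows)
```

With the sample data, `Moon` totals 800 views over two months and `Sun` totals 100, so the stored averages are 400 and 50. A real run would read the `pageviews/pageviews-*-user.bz2` files and resolve IDs through `dump_index.db` rather than using in-memory dictionaries.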