From 0a9b2c2e5eca8a04e37fbdd423379882863237c2 Mon Sep 17 00:00:00 2001
From: Terry Truong
Date: Sat, 21 Jan 2023 12:21:03 +1100
Subject: Adjust backend coding style

Increase line spacing, add section comments, etc
---
 backend/hist_data/README.md | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

(limited to 'backend/hist_data/README.md')

diff --git a/backend/hist_data/README.md b/backend/hist_data/README.md
index 9fe2d0e..09a71fc 100644
--- a/backend/hist_data/README.md
+++ b/backend/hist_data/README.md
@@ -9,37 +9,38 @@ This directory holds files used to generate the history database data.db.
 - `start*` and `end*` specify start and end dates.
   `start_upper`, `end`, and `end_upper`, are optional.
   If `start_upper` is present, it and `start` denote an uncertain range of start times.
-  Similarly for 'end' and 'end_upper'.
+  Similarly for `end` and `end_upper`.
 - `fmt` indicates format info for `start`, `start_upper`, `end`, and `end_upper`.
   - If 0, they denote a number of years AD (if positive) or BC (if negative).
   - If 1, they denote a Julian date number.
     This allows simple comparison of events with day-level precision, but only goes back to 4713 BC.
   - If 2, same as 1, but with a preference for display using the Julian calendar, not the Gregorian calendar.
     For example, William Shakespeare's birth appears 'preferably Julian', but Samuel Johnson's does not.
-  - If 3, same as 2, but where 'start' and 'start_upper' are 'preferably Julian'.
+  - If 3, same as 2, but where only `start` and `start_upper` are 'preferably Julian'.
     For example, Galileo Galilei's birth date appears 'preferably Julian', but his death date does not.
 - `pop`:
   Format: `id INT PRIMARY KEY, pop INT`
-  Associates each event with a popularity measure (currently an average monthly viewcount)
+  Associates each event with a popularity measure (currently an average monthly viewcount).
 - `dist`:
   Format: `scale INT, unit INT, count INT, PRIMARY KEY (scale, unit)`
-  Maps scale units to counts of events in them.
+  For each scale, maps its units to event counts.
+  For example, on the monthly scale, the unit for Jan 2010 might have 10 events.
 - `event_disp`:
   Format: `id INT, scale INT, unit INT, PRIMARY KEY (id, scale)`
   Maps events to scales+units they are 'displayable' on (used to make displayed events more uniform across time).
-- `img_dist`:
-  Like `dist`, but only counts events with images.
-- `img_disp`:
-  Like `events_disp`, but only counts events with images.
 - `images`:
   Format: `id INT PRIMARY KEY, url TEXT, license TEXT, artist TEXT, credit TEXT`
-  Holds metadata for available images
+  Holds metadata for available images.
 - `event_imgs`:
   Format: `id INT PRIMARY KEY, img_id INT`
-  Assocates events with images
+  Assocates events with images.
 - `descs`:
   Format: `id INT PRIMARY KEY, wiki_id INT, desc TEXT`
   Associates an event's enwiki title with a short description.
+- `img_dist`:
+  Like `dist`, but only counts events with images.
+- `img_disp`:
+  Like `events_disp`, but only counts events with images.

 # Generating the Database

@@ -66,12 +67,12 @@ Some of the scripts use third-party packages:
    looks for infobox image names, and stores them in an image database.
 1. In enwiki/, run `download_img_license_info.py`, which downloads licensing info for found
    images, and adds them to the image database. You should probably first change the USER_AGENT
-   script variable to identify yourself to the online API (this is expected
-   [best practice](https://www.mediawiki.org/wiki/API:Etiquette)).
+   script variable to identify yourself to the online API (this is
+   [expected best practice](https://www.mediawiki.org/wiki/API:Etiquette)).
 1. In enwiki/, run `download_imgs.py`, which downloads images into enwiki/imgs/.
    Setting the USER_AGENT variable applies here as well.
    In some rare cases, the download won't produce an image file, but a text file containing
-   'File not found: ...'. These can simply be deleted.
+   'File not found: ...'. These can be deleted.
 1. Run `gen_imgs.py`, which creates resized/cropped images in img/, from images in enwiki/imgs/.
    Adds the `imgs` and `event_imgs` tables.
    The output images might need additional manual changes:
--
cgit v1.2.3