| field | value | date |
|---|---|---|
| author | Terry Truong <terry06890@gmail.com> | 2023-01-21 12:21:03 +1100 |
| committer | Terry Truong <terry06890@gmail.com> | 2023-01-21 12:32:01 +1100 |
| commit | 0a9b2c2e5eca8a04e37fbdd423379882863237c2 | |
| tree | 1812bdb6bb13e4f76fdd7ef04075b291f775c213 /backend/hist_data/README.md | |
| parent | 8321e2f92dbc073b8f1de87895d6620a2021b22e | |
Adjust backend coding style
Increase line spacing, add section comments, etc
Diffstat (limited to 'backend/hist_data/README.md')
| -rw-r--r-- | backend/hist_data/README.md | 27 |
1 files changed, 14 insertions, 13 deletions
diff --git a/backend/hist_data/README.md b/backend/hist_data/README.md
index 9fe2d0e..09a71fc 100644
--- a/backend/hist_data/README.md
+++ b/backend/hist_data/README.md
@@ -9,37 +9,38 @@ This directory holds files used to generate the history database data.db.
 - `start*` and `end*` specify start and end dates.
   `start_upper`, `end`, and `end_upper`, are optional.
   If `start_upper` is present, it and `start` denote an uncertain range of start times.
-  Similarly for 'end' and 'end_upper'.
+  Similarly for `end` and `end_upper`.
 - `fmt` indicates format info for `start`, `start_upper`, `end`, and `end_upper`.
   - If 0, they denote a number of years AD (if positive) or BC (if negative).
   - If 1, they denote a Julian date number.
     This allows simple comparison of events with day-level precision, but only goes back to 4713 BC.
   - If 2, same as 1, but with a preference for display using the Julian calendar, not the Gregorian calendar.
     For example, William Shakespeare's birth appears 'preferably Julian', but Samuel Johnson's does not.
-  - If 3, same as 2, but where 'start' and 'start_upper' are 'preferably Julian'.
+  - If 3, same as 2, but where only `start` and `start_upper` are 'preferably Julian'.
     For example, Galileo Galilei's birth date appears 'preferably Julian', but his death date does not.
 - `pop`: <br>
   Format: `id INT PRIMARY KEY, pop INT` <br>
-  Associates each event with a popularity measure (currently an average monthly viewcount)
+  Associates each event with a popularity measure (currently an average monthly viewcount).
 - `dist`: <br>
   Format: `scale INT, unit INT, count INT, PRIMARY KEY (scale, unit)` <br>
-  Maps scale units to counts of events in them.
+  For each scale, maps its units to event counts.
+  For example, on the monthly scale, the unit for Jan 2010 might have 10 events.
 - `event_disp`: <br>
   Format: `id INT, scale INT, unit INT, PRIMARY KEY (id, scale)` <br>
   Maps events to scales+units they are 'displayable' on (used to make displayed events more uniform across time).
-- `img_dist`: <br>
-  Like `dist`, but only counts events with images.
-- `img_disp`: <br>
-  Like `events_disp`, but only counts events with images.
 - `images`: <br>
   Format: `id INT PRIMARY KEY, url TEXT, license TEXT, artist TEXT, credit TEXT` <br>
-  Holds metadata for available images
+  Holds metadata for available images.
 - `event_imgs`: <br>
   Format: `id INT PRIMARY KEY, img_id INT` <br>
-  Assocates events with images
+  Assocates events with images.
 - `descs`: <br>
   Format: `id INT PRIMARY KEY, wiki_id INT, desc TEXT` <br>
   Associates an event's enwiki title with a short description.
+- `img_dist`: <br>
+  Like `dist`, but only counts events with images.
+- `img_disp`: <br>
+  Like `events_disp`, but only counts events with images.
 
 # Generating the Database
 
@@ -66,12 +67,12 @@ Some of the scripts use third-party packages:
     looks for infobox image names, and stores them in an image database.
 1. In enwiki/, run `download_img_license_info.py`, which downloads licensing info for found
     images, and adds them to the image database. You should probably first change the USER_AGENT
-    script variable to identify yourself to the online API (this is expected
-    [best practice](https://www.mediawiki.org/wiki/API:Etiquette)).
+    script variable to identify yourself to the online API (this is
+    [expected best practice](https://www.mediawiki.org/wiki/API:Etiquette)).
 1. In enwiki/, run `download_imgs.py`, which downloads images into enwiki/imgs/.
     Setting the USER_AGENT variable applies here as well. <br>
     In some rare cases, the download won't produce an image file, but a text file containing
-    'File not found: ...'. These can simply be deleted.
+    'File not found: ...'. These can be deleted.
 1. Run `gen_imgs.py`, which creates resized/cropped images in img/, from images in enwiki/imgs/.
     Adds the `imgs` and `event_imgs` tables. <br>
     The output images might need additional manual changes:
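
Note on the `fmt` values touched by the first hunk: values 1 to 3 store dates as Julian date numbers, which put Julian-calendar and Gregorian-calendar dates on one integer scale. The Python sketch below shows one way such numbers could be computed. It uses a widely published integer formula and astronomical year numbering (1 BC is year 0, 4713 BC is year -4712). The function names `gregorian_to_jdn` and `julian_to_jdn` are hypothetical and are not taken from this repository, whose scripts may compute these values differently.

```python
def _shifted(year: int, month: int) -> tuple[int, int]:
    """Shift the year to start in March so the leap day falls at year end.

    Returns (y, m) where y is offset by 4800 years to stay non-negative
    for the whole supported range, and m counts months from March (0-11).
    """
    a = (14 - month) // 12  # 1 for January and February, else 0
    return year + 4800 - a, month + 12 * a - 3


def gregorian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian date number of a (proleptic) Gregorian calendar date."""
    y, m = _shifted(year, month)
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian date number of a Julian calendar date."""
    y, m = _shifted(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


if __name__ == "__main__":
    # Day 0 of the scale is 1 January 4713 BC in the Julian calendar,
    # which is why this format cannot represent earlier dates.
    print(julian_to_jdn(-4712, 1, 1))     # 0
    # The Gregorian reform: Julian 4 Oct 1582 was followed by Gregorian 15 Oct 1582.
    print(julian_to_jdn(1582, 10, 4))     # 2299160
    print(gregorian_to_jdn(1582, 10, 15)) # 2299161
```

Because both functions return plain integers on the same scale, two events can be ordered by comparing their `start` values directly, whichever calendar is preferred for display.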
