Clean up some docs and naming inconsistencies

author: Terry Truong <terry06890@gmail.com> 2023-01-23 18:00:43 +1100
committer: Terry Truong <terry06890@gmail.com> 2023-01-23 18:01:13 +1100
commit: 94a8ad9b067e5a2c442ce47ce72d1a53eb444160 (patch)
tree: 2056373ee56b8b2f8269ac3e94d40f8f0e6eec0d /backend/tol_data/enwiki/README.md
parent: 796c4e5660b1006575b8f2af9d99e2ce592c767a (diff)
1 files changed, 11 insertions, 11 deletions
diff --git a/backend/tol_data/enwiki/README.md b/backend/tol_data/enwiki/README.md
index ba1de33..6f27d7f 100644
--- a/backend/tol_data/enwiki/README.md
+++ b/backend/tol_data/enwiki/README.md
@@ -14,12 +14,12 @@ This directory holds files obtained/derived from [English Wikipedia](https://en.
 # Dump-Index Files
 -   `gen_dump_index_db.py` <br>
     Creates a database version of the enwiki-dump index file.
--   `dumpIndex.db` <br>
+-   `dump_index.db` <br>
     Generated by `gen_dump_index_db.py`. <br>
     Tables: <br>
     -   `offsets`: `title TEXT PRIMARY KEY, id INT UNIQUE, offset INT, next_offset INT`
 
-# Description Database Files
+# Description Files
 -   `gen_desc_data.py` <br>
     Reads through pages in the dump file, and adds short-description info to a database.
 -   `desc_data.db` <br>
@@ -29,20 +29,20 @@ This directory holds files obtained/derived from [English Wikipedia](https://en.
     -   `redirects`: `id INT PRIMARY KEY, target TEXT`
     -   `descs`:     `id INT PRIMARY KEY, desc TEXT`
 
-# Image Database Files
+# Image Files
 -   `gen_img_data.py` <br>
-    Used to find infobox image names for page IDs, storing them into a database.
--   `downloadImgLicenseInfo.py` <br>
-    Used to download licensing metadata for image names, via wikipedia's online API, storing them into a database.
+    Used to find infobox image names for page IDs, and store them into a database.
+-   `download_img_license_info.py` <br>
+    Used to download licensing metadata for image names, via wikipedia's online API, and store them into a database.
 -   `img_data.db` <br>
-    Used to hold metadata about infobox images for a set of pageIDs.
+    Used to hold metadata about infobox images for a set of page IDs.
     Generated using `get_enwiki_img_data.py` and `download_img_license_info.py`. <br>
     Tables: <br>
     -   `page_imgs`: `page_id INT PRIMAY KEY, img_name TEXT` <br>
-        `img_name` may be null, which means 'none found', and is used to avoid re-processing page-ids.
+        `img_name` may be null, which means 'none found', and is used to avoid re-processing page IDs.
     -   `imgs`: `name TEXT PRIMARY KEY, license TEXT, artist TEXT, credit TEXT, restrictions TEXT, url TEXT` <br>
         Might lack some matches for `img_name` in `page_imgs`, due to licensing info unavailability.
--   `downloadImgs.py` <br>
+-   `download_imgs.py` <br>
     Used to download image files into imgs/.
 
 # Page View Files
@@ -51,7 +51,7 @@ This directory holds files obtained/derived from [English Wikipedia](https://en.
     Obtained via <https://dumps.wikimedia.org/other/pageview_complete/monthly/>.
     Some format info was available from <https://dumps.wikimedia.org/other/pageview_complete/readme.html>.
 -   `gen_pageview_data.py` <br>
-    Reads pageview/*, and creates a database holding average monthly pageview counts.
+    Reads pageview/* and `dump_index.db`, and creates a database holding average monthly pageview counts.
 -   `pageview_data.db` <br>
     Generated using `gen_pageview_data.py`. <br>
     Tables: <br>
@@ -60,4 +60,4 @@ This directory holds files obtained/derived from [English Wikipedia](https://en.
 # Other Files
 -   `lookup_page.py` <br>
     Running `lookup_page.py title1` looks in the dump for a page with a given title,
-    and prints the contents to stdout. Uses dumpIndex.db.
+    and prints the contents to stdout. Uses dump_index.db.
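
Note on the renamed `dump_index.db`: the README documents its `offsets` table as `title TEXT PRIMARY KEY, id INT UNIQUE, offset INT, next_offset INT`. The sketch below shows how a title lookup against that schema might look. It is not the code from `lookup_page.py`; the helper name `lookup_offsets`, the in-memory database, and the sample rows are all assumptions made for illustration.

```python
import sqlite3


def lookup_offsets(con, title):
    """Return (id, offset, next_offset) for a title, or None if it is absent.

    Hypothetical helper, not taken from the repository.
    """
    # "offset" is double-quoted because OFFSET is also an SQL keyword.
    cur = con.execute(
        'SELECT id, "offset", next_offset FROM offsets WHERE title = ?',
        (title,),
    )
    return cur.fetchone()


# In-memory stand-in for dump_index.db, using the schema from the README.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE offsets ("
    'title TEXT PRIMARY KEY, id INT UNIQUE, "offset" INT, next_offset INT)'
)
# Made-up rows, only here so the lookup has something to find.
con.executemany(
    "INSERT INTO offsets VALUES (?, ?, ?, ?)",
    [("Example page", 1, 0, 1024), ("Another page", 2, 1024, 2048)],
)

print(lookup_offsets(con, "Example page"))
print(lookup_offsets(con, "Missing page"))
```

Against the real file, the connection would presumably be opened with `sqlite3.connect("dump_index.db")` instead of `":memory:"`.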