From 6cb2cccad2fae70ce7e857e3aab232a6e7eeb358 Mon Sep 17 00:00:00 2001
From: Terry Truong
Date: Sat, 11 Jun 2022 02:05:08 +1000
Subject: Add yet more manual-correction for node-desc generation

---
 backend/data/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'backend/data/README.md')

diff --git a/backend/data/README.md b/backend/data/README.md
index 6ec629a..0845450 100644
--- a/backend/data/README.md
+++ b/backend/data/README.md
@@ -25,12 +25,12 @@ File Generation Process
     1 Obtain data in dbpedia/, as specified in it's README.
     2 Run genDbpData.py, which adds a 'descs' table to data.db, using data in
       dbpedia/dbpData.db, the 'nodes' table, and possibly
-      dbpNamesToSkip.txt and dbpPickedLabels.txt.
+      genDescNamesToSkip.txt and dbpPickedLabels.txt.
 5 Supplementary Name/Description/Image Data
     1 Obtain data in enwiki/, as specified in it's README.
     2 Run genEnwikiDescData.py, which adds to the 'descs' table, using data in
-      enwiki/enwikiData.db, and the 'nodes' table. Also uses genEnwikiDesc*.txt
-      files for skipping/resolving some name-page associations.
+      enwiki/enwikiData.db, and the 'nodes' table. Also uses genDescNamesToSkip.txt and
+      genEnwikiDescTitlesToUse.txt for skipping/resolving some name-page associations.
     3 Optionally run genEnwikiNameData.py, which adds to the 'names' table,
       using data in enwiki/enwikiData.db, and the 'names' and 'descs' tables.
     4 In enwiki/, run getEnwikiImgData.py, which generates a list of
-- 
cgit v1.2.3