| author | Terry Truong <terry06890@gmail.com> | 2022-06-11 02:05:08 +1000 |
|---|---|---|
| committer | Terry Truong <terry06890@gmail.com> | 2022-06-11 02:05:08 +1000 |
| commit | 6cb2cccad2fae70ce7e857e3aab232a6e7eeb358 | |
| tree | 9a4ca11359abf79c8daa3c74bad83228fabaa3b4 /backend/data/README.md | |
| parent | 5220d744dc3f7b2629d0ad8bd4bb4634d21e0d96 | |
Add yet more manual-correction for node-desc generation
Diffstat (limited to 'backend/data/README.md')
| -rw-r--r-- | backend/data/README.md | 6 |
1 files changed, 3 insertions, 3 deletions
diff --git a/backend/data/README.md b/backend/data/README.md
index 6ec629a..0845450 100644
--- a/backend/data/README.md
+++ b/backend/data/README.md
@@ -25,12 +25,12 @@ File Generation Process
 1 Obtain data in dbpedia/, as specified in it's README.
 2 Run genDbpData.py, which adds a 'descs' table to data.db, using
   data in dbpedia/dbpData.db, the 'nodes' table, and possibly
-  dbpNamesToSkip.txt and dbpPickedLabels.txt.
+  genDescNamesToSkip.txt and dbpPickedLabels.txt.
 5 Supplementary Name/Description/Image Data
 1 Obtain data in enwiki/, as specified in it's README.
 2 Run genEnwikiDescData.py, which adds to the 'descs' table, using data in
-  enwiki/enwikiData.db, and the 'nodes' table. Also uses genEnwikiDesc*.txt
-  files for skipping/resolving some name-page associations.
+  enwiki/enwikiData.db, and the 'nodes' table. Also uses genDescNamesToSkip.txt and
+  genEnwikiDescTitlesToUse.txt for skipping/resolving some name-page associations.
 3 Optionally run genEnwikiNameData.py, which adds to the 'names' table,
   using data in enwiki/enwikiData.db, and the 'names' and 'descs' tables.
 4 In enwiki/, run getEnwikiImgData.py, which generates a list of
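
The change makes genDbpData.py and genEnwikiDescData.py share one skip list, genDescNamesToSkip.txt. A rough sketch of how a shared skip list could gate inserts into the 'descs' table is shown here. It is not taken from either script: the function names `load_skip_names` and `add_descs`, the one-name-per-line file layout, and the two-column 'descs' schema are all assumptions made for illustration.

```python
import sqlite3


def load_skip_names(lines):
    """Return the set of non-empty, whitespace-stripped names from an iterable of lines.

    Assumes one name per line; the real file format is not shown in the diff.
    """
    return {line.strip() for line in lines if line.strip()}


def add_descs(db_con, candidates, skip_names):
    """Insert (name, description) pairs into 'descs', skipping listed names.

    Returns the number of rows inserted.
    """
    cur = db_con.cursor()
    # Hypothetical schema: the actual 'descs' table layout is not given in the diff.
    cur.execute("CREATE TABLE IF NOT EXISTS descs (name TEXT, description TEXT)")
    added = 0
    for name, description in candidates:
        if name in skip_names:
            continue  # Manually excluded name, so no description is stored for it
        cur.execute("INSERT INTO descs VALUES (?, ?)", (name, description))
        added += 1
    db_con.commit()
    return added
```

Under these assumptions, both scripts could open the skip file once, pass its lines to `load_skip_names`, and reuse the resulting set, which may be the motivation for replacing the dbpedia-specific file name with a generic one.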
