| author | Terry Truong <terry06890@gmail.com> | 2022-05-14 19:30:43 +1000 |
|---|---|---|
| committer | Terry Truong <terry06890@gmail.com> | 2022-05-14 19:39:10 +1000 |
| commit | c97acf8852e2017fd4776d65069f707121405f43 (patch) | |
| tree | 1c0d725b6ae496239036b0f1d1c4a2caadf209cf /backend/data/dbpedia/README.md | |
| parent | 7003ef7f92f3a8fed059dab2b37c0e203c000dba (diff) | |
Use DBpedia data for node descriptions
Add backend/data/dbpedia/ directory containing scripts and README
for obtaining DBpedia data, storing it into a db, converting/adding
description data to data.db, and for resolving tol-node DBpedia-node
association conflicts (via DBpedia relations, manual listing, etc).
Resulted in fewer descriptions (about 3/4 as many) than when using enwiki,
but with notably fewer mis-associations (e.g. node Thor is now described as
a shrimp instead of a god).
Diffstat (limited to 'backend/data/dbpedia/README.md')
| -rw-r--r-- | backend/data/dbpedia/README.md | 25 |
1 files changed, 25 insertions, 0 deletions
diff --git a/backend/data/dbpedia/README.md b/backend/data/dbpedia/README.md
new file mode 100644
index 0000000..0e7c266
--- /dev/null
+++ b/backend/data/dbpedia/README.md
@@ -0,0 +1,25 @@
+Downloaded Files
+================
+- labels\_lang=en.ttl.bz2 <br>
+    Obtained via https://databus.dbpedia.org/dbpedia/collections/latest-core,
+    using the link <https://databus.dbpedia.org/dbpedia/generic/labels/2022.03.01/labels_lang=en.ttl.bz2>.
+- redirects\_lang=en\_transitive.ttl.bz2 <br>
+    Downloaded from <https://databus.dbpedia.org/dbpedia/generic/redirects/2022.03.01/redirects_lang=en_transitive.ttl.bz2>.
+- disambiguations\_lang=en.ttl.bz2 <br>
+    Downloaded from <https://databus.dbpedia.org/dbpedia/generic/disambiguations/2022.03.01/disambiguations_lang=en.ttl.bz2>.
+- instance-types\_lang=en\_specific.ttl.bz2 <br>
+    Downloaded from <https://databus.dbpedia.org/dbpedia/mappings/instance-types/2022.03.01/instance-types_lang=en_specific.ttl.bz2>.
+- short-abstracts\_lang=en.ttl.bz2 <br>
+    Downloaded from <https://databus.dbpedia.org/vehnem/text/short-abstracts/2021.05.01/short-abstracts_lang=en.ttl.bz2>.
+
+Generated Files
+===============
+- dbpData.db <br>
+    An sqlite database representing data from the ttl files.
+    Generated by running genData.py.
+    Tables
+    - labels: iri TEXT PRIMARY KEY, label TEXT
+    - redirects: iri TEXT PRIMARY KEY, target TEXT
+    - disambiguations: iri TEXT PRIMARY KEY
+    - types: iri TEXT, type TEXT
+    - abstracts: iri TEXT PRIMARY KEY, abstract TEXT
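The README says genData.py turns the ttl files into the tables of dbpData.db, but the script itself is not part of this diff. As a rough illustration only, the sketch below shows one way the `labels` table could be filled from `labels_lang=en.ttl.bz2`. The function name `load_labels`, the regular expression, and the assumption that each line has the shape `<iri> <predicate> "label"@en .` are all hypothetical and may not match what genData.py actually does.

```python
import bz2
import re
import sqlite3

# Assumed line shape (not confirmed by the README):
#   <subject-iri> <predicate-iri> "literal"@en .
LABEL_LINE = re.compile(r'<([^>]+)> <[^>]+> "((?:[^"\\]|\\.)*)"@en \.')

def load_labels(ttl_bz2_path, db_path):
    """Hypothetical helper: copy label triples into a 'labels' table."""
    con = sqlite3.connect(db_path)
    # Column layout taken from the README's description of dbpData.db
    con.execute(
        "CREATE TABLE IF NOT EXISTS labels (iri TEXT PRIMARY KEY, label TEXT)")
    # bz2.open in text mode decompresses line by line, so the whole
    # dump never has to fit in memory
    with bz2.open(ttl_bz2_path, mode="rt", encoding="utf-8") as f:
        for line in f:
            m = LABEL_LINE.match(line)
            if m is None:
                continue  # skip comments and lines of any other shape
            iri, label = m.groups()
            # Escape sequences inside the literal are stored as-is here.
            # INSERT OR IGNORE keeps the first label seen for an IRI.
            con.execute(
                "INSERT OR IGNORE INTO labels VALUES (?, ?)", (iri, label))
    con.commit()
    con.close()
```

A call such as `load_labels("labels_lang=en.ttl.bz2", "dbpData.db")` would then populate the table. The other tables could presumably be filled the same way with a different pattern per file.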
