author Terry Truong <terry06890@gmail.com> 2022-05-14 19:30:43 +1000
committer Terry Truong <terry06890@gmail.com> 2022-05-14 19:39:10 +1000
commit c97acf8852e2017fd4776d65069f707121405f43
tree 1c0d725b6ae496239036b0f1d1c4a2caadf209cf /backend/data/dbpedia/README.md
parent 7003ef7f92f3a8fed059dab2b37c0e203c000dba
Use DBpedia data for node descriptions
Add backend/data/dbpedia/ directory containing scripts and a README for obtaining DBpedia data, storing it in a db, converting/adding description data to data.db, and resolving conflicts in tol-node to DBpedia-node associations (via DBpedia relations, manual listing, etc). This resulted in fewer descriptions than with enwiki (about 3/4 as many), but with notably fewer mis-associations (e.g. the node Thor is described as a shrimp instead of a god).
Diffstat (limited to 'backend/data/dbpedia/README.md')
-rw-r--r-- backend/data/dbpedia/README.md | 25
1 files changed, 25 insertions, 0 deletions
diff --git a/backend/data/dbpedia/README.md b/backend/data/dbpedia/README.md
new file mode 100644
index 0000000..0e7c266
--- /dev/null
+++ b/backend/data/dbpedia/README.md
@@ -0,0 +1,25 @@
+Downloaded Files
+================
+- labels\_lang=en.ttl.bz2 <br>
+ Obtained via https://databus.dbpedia.org/dbpedia/collections/latest-core,
+ using the link <https://databus.dbpedia.org/dbpedia/generic/labels/2022.03.01/labels_lang=en.ttl.bz2>.
+- redirects\_lang=en\_transitive.ttl.bz2 <br>
+ Downloaded from <https://databus.dbpedia.org/dbpedia/generic/redirects/2022.03.01/redirects_lang=en_transitive.ttl.bz2>.
+- disambiguations\_lang=en.ttl.bz2 <br>
+ Downloaded from <https://databus.dbpedia.org/dbpedia/generic/disambiguations/2022.03.01/disambiguations_lang=en.ttl.bz2>.
+- instance-types\_lang=en\_specific.ttl.bz2 <br>
+ Downloaded from <https://databus.dbpedia.org/dbpedia/mappings/instance-types/2022.03.01/instance-types_lang=en_specific.ttl.bz2>.
+- short-abstracts\_lang=en.ttl.bz2 <br>
+ Downloaded from <https://databus.dbpedia.org/vehnem/text/short-abstracts/2021.05.01/short-abstracts_lang=en.ttl.bz2>.
+
+Generated Files
+===============
+- dbpData.db <br>
+ An SQLite database holding data from the ttl files.
+ Generated by running genData.py.
+ Tables:
+ - labels: iri TEXT PRIMARY KEY, label TEXT
+ - redirects: iri TEXT PRIMARY KEY, target TEXT
+ - disambiguations: iri TEXT PRIMARY KEY
+ - types: iri TEXT, type TEXT
+ - abstracts: iri TEXT PRIMARY KEY, abstract TEXT
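
The table list in the README can be read as a small SQLite schema. The sketch below is not the commit's genData.py (which is not shown here); it is a minimal illustration that creates the five listed tables and loads label triples. The function names `create_db`, `add_labels`, and `lookup_label`, the regular expression, and the assumed one-triple-per-line shape `<iri> <predicate> "text"@en .` are all assumptions made for this example.

```python
import re
import sqlite3

# Table definitions taken from the README's list of tables.
SCHEMA = """
CREATE TABLE labels (iri TEXT PRIMARY KEY, label TEXT);
CREATE TABLE redirects (iri TEXT PRIMARY KEY, target TEXT);
CREATE TABLE disambiguations (iri TEXT PRIMARY KEY);
CREATE TABLE types (iri TEXT, type TEXT);
CREATE TABLE abstracts (iri TEXT PRIMARY KEY, abstract TEXT);
"""

# Assumed line shape: <subject-iri> <predicate-iri> "literal"@en .
# Escape sequences inside the literal are kept as-is, not decoded.
LABEL_LINE = re.compile(r'<([^>]+)> <[^>]+> "((?:[^"\\]|\\.)*)"@en \.')


def create_db(path=":memory:"):
    """Open an SQLite database and create the five tables (hypothetical helper)."""
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con


def add_labels(con, lines):
    """Insert (iri, label) rows parsed from an iterable of text lines.

    Lines that do not match the assumed shape (comments, blanks) are skipped.
    Returns the number of lines that matched.
    """
    rows = []
    for line in lines:
        match = LABEL_LINE.match(line)
        if match:
            rows.append((match.group(1), match.group(2)))
    # iri is the primary key, so repeated IRIs are ignored rather than raising.
    con.executemany("INSERT OR IGNORE INTO labels VALUES (?, ?)", rows)
    con.commit()
    return len(rows)


def lookup_label(con, iri):
    """Return the label stored for an IRI, or None if there is none."""
    row = con.execute("SELECT label FROM labels WHERE iri = ?", (iri,)).fetchone()
    return row[0] if row else None
```

For a real run, the lines could come from the compressed download without unpacking it first, for example `add_labels(con, bz2.open("labels_lang=en.ttl.bz2", "rt", encoding="utf-8"))` after `import bz2`. The other four tables would need their own parsers, since their triples carry IRIs or types rather than language-tagged literals.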