From c97acf8852e2017fd4776d65069f707121405f43 Mon Sep 17 00:00:00 2001
From: Terry Truong
Date: Sat, 14 May 2022 19:30:43 +1000
Subject: Use DBpedia data for node descriptions

Add backend/data/dbpedia/ directory containing scripts and README for
obtaining DBpedia data, storing it in a db, converting/adding
description data to data.db, and for resolving tol-node DBpedia-node
association conflicts (via DBpedia relations, manual listing, etc).

Resulted in fewer descriptions (about 3/4 as many) than with enwiki, but
with notably fewer mis-associations (eg: node Thor being described as a
shrimp instead of a god).
---
 backend/data/dbpedia/README.md | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
 create mode 100644 backend/data/dbpedia/README.md

(limited to 'backend/data/dbpedia/README.md')

diff --git a/backend/data/dbpedia/README.md b/backend/data/dbpedia/README.md
new file mode 100644
index 0000000..0e7c266
--- /dev/null
+++ b/backend/data/dbpedia/README.md
@@ -0,0 +1,25 @@
+Downloaded Files
+================
+- labels\_lang=en.ttl.bz2
+  Obtained via https://databus.dbpedia.org/dbpedia/collections/latest-core,
+  using the link .
+- redirects\_lang=en\_transitive.ttl.bz2
+  Downloaded from .
+- disambiguations\_lang=en.ttl.bz2
+  Downloaded from .
+- instance-types\_lang=en\_specific.ttl.bz2
+  Downloaded from .
+- short-abstracts\_lang=en.ttl.bz2
+  Downloaded from .
+
+Generated Files
+===============
+- dbpData.db
+  An sqlite database representing data from the ttl files.
+  Generated by running genData.py.
+  Tables
+  - labels: iri TEXT PRIMARY KEY, label TEXT
+  - redirects: iri TEXT PRIMARY KEY, target TEXT
+  - disambiguations: iri TEXT PRIMARY KEY
+  - types: iri TEXT, type TEXT
+  - abstracts: iri TEXT PRIMARY KEY, abstract TEXT
--
cgit v1.2.3
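
To make the dbpData.db schema above concrete, here is a minimal sketch of how the five tables could be created and queried with Python's built-in sqlite3 module. It is not taken from genData.py: the helper names `make_example_db` and `lookup_abstract`, the `ex:` IRIs, and the sample rows are all hypothetical. The query assumes that a single redirect hop is enough, on the grounds that the redirects file is the transitive variant.

```python
import sqlite3

# Table definitions copied from the README's "Tables" list.
SCHEMA = """
CREATE TABLE labels (iri TEXT PRIMARY KEY, label TEXT);
CREATE TABLE redirects (iri TEXT PRIMARY KEY, target TEXT);
CREATE TABLE disambiguations (iri TEXT PRIMARY KEY);
CREATE TABLE types (iri TEXT, type TEXT);
CREATE TABLE abstracts (iri TEXT PRIMARY KEY, abstract TEXT);
"""


def make_example_db():
    """Build an in-memory database with made-up rows (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany(
        "INSERT INTO labels VALUES (?, ?)",
        [("ex:A", "Alpha"), ("ex:A_old", "Alpha old"), ("ex:B", "Beta")],
    )
    # ex:A_old is a redirect page pointing at ex:A.
    con.execute("INSERT INTO redirects VALUES (?, ?)", ("ex:A_old", "ex:A"))
    # ex:B is a disambiguation page, so it should not supply a description.
    con.execute("INSERT INTO disambiguations VALUES (?)", ("ex:B",))
    con.executemany(
        "INSERT INTO abstracts VALUES (?, ?)",
        [("ex:A", "Alpha is an example."), ("ex:B", "Beta may refer to ...")],
    )
    con.commit()
    return con


def lookup_abstract(con, label):
    """Return the abstract for a label, or None if there is no usable match.

    Follows at most one redirect and skips disambiguation pages.
    """
    row = con.execute(
        """
        SELECT a.abstract
        FROM labels l
        LEFT JOIN redirects r ON r.iri = l.iri
        JOIN abstracts a ON a.iri = COALESCE(r.target, l.iri)
        WHERE l.label = ?
          AND l.iri NOT IN (SELECT iri FROM disambiguations)
        """,
        (label,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    con = make_example_db()
    print(lookup_abstract(con, "Alpha old"))
```

Filtering on the disambiguations table is one simple way to avoid attaching a "may refer to" page to a node. The association-conflict resolution that the commit message mentions (via DBpedia relations and manual listing) is more involved and is not modeled here.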