DM4T PROJECT
Converting CSV Tables into RDF Triples with Grafter
PLEASE NOTE: This article is a work in progress!
Context: The DM4T Project
TEDDINET is a multi-disciplinary research project that monitors the energy usage of homes and appliances and works out ways to reduce it. From 2013 until 2016, millions of sensor readings were taken from hundreds of homes, recording data such as power usage, temperature, humidity, and sound levels. Now that all the data have been gathered, we have to figure out how to process them so that they are easy to query and visualise.
DM4T is a sub-project of TEDDINET that aims to manage these data.
The CSV Problem
Tools to Solve the Problem
Generating RDF from CSV
Grafter: Industrial-strength RDF Production
What is Grafter?
Table Conversions
Graph Generation
Pipeline Generation
Datasets Used
Choosing an Ontology
Example Triples
Setting up a SPARQL Endpoint
Converting our TSV files into N-Triples (.nt) format balloons their size, in this case roughly 20-fold. Take, for example, the light readings dataset:
- Original TSV size: 890MB
- Converted N-Triples size: 19GB
Besides taking up this much disk space, a set of triples this large is impractical to load into memory for use in a SPARQL endpoint.
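To see where the extra bytes come from, here is a minimal Python sketch (not Grafter, and not the project's actual pipeline) that turns a tiny TSV sample into N-Triples and compares the sizes. The namespace `http://example.org/dm4t/`, the column names, and the sample readings are all invented for illustration. The point is only that every cell becomes its own triple, and every triple repeats the full subject and predicate URIs.

```python
import csv
import io

# Hypothetical namespace, chosen only for illustration; this is not the
# ontology or URI scheme used by the DM4T project.
BASE = "http://example.org/dm4t/"


def row_to_ntriples(row_id, row):
    """Turn one table row (a dict of column -> value) into N-Triples lines.

    Every cell becomes its own triple, and each triple repeats the full
    subject and predicate URIs, which is where the extra bytes come from.
    """
    subject = f"<{BASE}reading/{row_id}>"
    lines = []
    for column, value in row.items():
        predicate = f"<{BASE}def/{column}>"
        # Escape backslashes and double quotes inside the literal.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


def expansion_factor(tsv_text):
    """Return (tsv_bytes, nt_bytes, ratio) for a small TSV sample."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    nt_lines = []
    for i, row in enumerate(reader):
        nt_lines.extend(row_to_ntriples(i, row))
    nt_text = "\n".join(nt_lines) + "\n"
    tsv_bytes = len(tsv_text.encode("utf-8"))
    nt_bytes = len(nt_text.encode("utf-8"))
    return tsv_bytes, nt_bytes, nt_bytes / tsv_bytes


# Made-up sample rows; the real light readings file has a different layout.
sample = (
    "timestamp\tsensor\tlux\n"
    "2014-01-01T00:00:00Z\tS1\t120\n"
    "2014-01-01T00:01:00Z\tS1\t118\n"
)

tsv_bytes, nt_bytes, ratio = expansion_factor(sample)
print(f"TSV: {tsv_bytes} B, N-Triples: {nt_bytes} B, ratio: {ratio:.1f}x")
```

The exact ratio depends on URI length and on how many columns each row has, so the figure printed for this toy sample should not be read as a prediction for the real datasets.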