7

How can we implement version control of semantic data, at the level of data design and SPARQL queries, for an RDF store like Sesame? In a wiki-like way, for example. What information do you have on this subject?

5 Answers

5

Here's a few resources that may be of interest to you:

0

The DatasetDynamics page at the ESW Wiki has an overview of some of the ways to describe changing RDF data.

0

Here's some info I've found:

0

There are a number of approaches one could take to versioning Semantic Data. The simplest is to use standard versioning software (e.g., SVN or CVS) with RDF or N3 or Turtle files. This approach has its drawbacks, but it also has the virtue that the work process is well-understood.

The nature of RDF (in which resources have global identifiers as URIs) makes it possible to manage versions in a much more sophisticated way - for instance, querying for changes between one version and the next. The possibilities here go beyond what one could write up in a short reply.
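As a rough illustration of "querying for changes between one version and the next": because each statement is built from global identifiers, two versions can be compared as plain sets. This is only a minimal sketch; the `Triple` type, the `diff_versions` helper and the example URIs are made up for illustration, and it assumes there are no blank nodes (which have no global identifier and would need extra matching logic).

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """One RDF statement, with terms kept as plain strings (an assumption)."""
    s: str
    p: str
    o: str


def diff_versions(old: Iterable[Triple], new: Iterable[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (added, removed) statements between two versions."""
    old_set, new_set = set(old), set(new)
    return new_set - old_set, old_set - new_set


PAGE = "http://example.org/page/1"
TITLE = "http://purl.org/dc/terms/title"
CREATOR = "http://purl.org/dc/terms/creator"
ALICE = "http://example.org/user/alice"

# Two hypothetical versions of the same small dataset.
v1 = {Triple(PAGE, TITLE, '"Draft"'), Triple(PAGE, CREATOR, ALICE)}
v2 = {Triple(PAGE, TITLE, '"Final"'), Triple(PAGE, CREATOR, ALICE)}

added, removed = diff_versions(v1, v2)
```

Here `added` holds the new title statement and `removed` holds the old one; the unchanged creator statement appears in neither.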

TopQuadrant supports both of these approaches. TopBraid Composer integrates smoothly with SVN and CVS. The Enterprise Vocabulary Management Solution provides facilities for querying change histories and comparing versions of RDF (or OWL, or SKOS) data.

0

I personally quite like the pragmatic approach which Freebase has adopted.

Browse and edit views for humans:

The data model exposed here:

Strictly speaking, it's not RDF (it's probably a superset of it), but part of it can be exposed as RDF:

Since it's a community-driven website, they not only need to track who said what and when, but they are probably also keeping the full history (never deleting anything):

To conclude, the way I would tackle your problem is very similar (and pragmatic). AFAIK, you will not find a solution that works out of the box. But you could use a "tuple" store: tuples of 3 or 4 elements aren't enough to keep history at the finest granularity, i.e. per triple or quad.

I would use the TDB code as a library (since it gives you B+Trees and a lot of other useful things you need), and I would use a data model that lets me count quads and attach to each quad an owner, a timestamp and links to the previous/next quad(s), if available:

[ id | g | s | p | o | user | timestamp | prev | next ]

Where:

       id - long (unique identifier; the same (g,s,p,o) will have
            different ids. This costs a lot of space, but it lets you
            count quads, and on a community-driven website (like this
            one) counting things is important)
        g - URI (or blank node? | absent, i.e. default graph)
        s - URI | blank node
        p - URI
        o - URI | blank node | literal
     user - URI
timestamp - when the quad was created
     prev - id of the previous quad (if present)
     next - id of the next quad (if present)
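To make the record layout above a bit more concrete, here is a small in-memory sketch. It is not TDB code: the `QuadRecord` and `QuadLog` names are hypothetical, a plain dict stands in for the B+Tree storage, and reading `prev`/`next` as "the record this one replaces / the record that replaced it" is my assumption about how the chain would be used. Deletions are not modelled.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class QuadRecord:
    id: int
    g: Optional[str]            # None stands for the default graph
    s: str
    p: str
    o: str
    user: str
    timestamp: float
    prev: Optional[int] = None  # id of the record this one replaces
    next: Optional[int] = None  # id of the record that replaced this one


class QuadLog:
    """Append-only log: records are never deleted, only superseded."""

    def __init__(self) -> None:
        self.records: Dict[int, QuadRecord] = {}
        self._next_id = 1

    def add(self, g, s, p, o, user, replaces=None, timestamp=None) -> int:
        old = self.records[replaces] if replaces is not None else None
        ts = time.time() if timestamp is None else timestamp
        rec = QuadRecord(self._next_id, g, s, p, o, user, ts, prev=replaces)
        self.records[rec.id] = rec
        self._next_id += 1
        if old is not None:
            old.next = rec.id   # link the superseded record forward
        return rec.id

    def history(self, rec_id: int) -> List[QuadRecord]:
        """Follow prev links from rec_id, newest record first."""
        out = []
        cur = rec_id
        while cur is not None:
            rec = self.records[cur]
            out.append(rec)
            cur = rec.prev
        return out

    def current(self) -> List[QuadRecord]:
        """Records that have not been superseded."""
        return [r for r in self.records.values() if r.next is None]


log = QuadLog()
first = log.add(None, "ex:page1", "dc:title", '"Draft"', "ex:alice", timestamp=1.0)
second = log.add(None, "ex:page1", "dc:title", '"Final"', "ex:bob",
                 replaces=first, timestamp=2.0)
```

With this, `log.history(second)` walks from the current title back to the draft, and `len(log.records)` counts every quad ever asserted, which is the "counting" property mentioned above.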

Then, you need to think about which indexes you need and this would depend on the way you want to expose and access your data.
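For example, a wiki-style page view would look records up by subject, while a "contributions" page would look them up by user. A toy sketch of those two indexes, with Python dicts standing in for real on-disk indexes and made-up rows in the `(id, g, s, p, o, user, timestamp)` order:

```python
from collections import defaultdict

# Hypothetical rows: (id, g, s, p, o, user, timestamp); None = default graph.
rows = [
    (1, None, "ex:page1", "dc:title", '"Draft"', "ex:alice", 1.0),
    (2, None, "ex:page1", "dc:title", '"Final"', "ex:bob", 2.0),
    (3, None, "ex:page2", "dc:title", '"Other"', "ex:alice", 3.0),
]

by_subject = defaultdict(list)  # subject -> record ids, for page views
by_user = defaultdict(list)     # user -> record ids, for contribution lists

for rec_id, g, s, p, o, user, ts in rows:
    by_subject[s].append(rec_id)
    by_user[user].append(rec_id)
```

Each extra index costs space and write time, so it only makes sense to add the ones your access patterns actually need.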

You do not need to expose all your internal structures/indexes to external users/people/applications. And when (and if) RDF vocabularies or ontologies for representing versioning, etc. emerge, you will be able to quickly expose your data using them (if you want to).

Be warned: this is not common practice, and if you look at it through your "semantic web glasses" it probably looks wrong, bad, etc. But I am sharing the idea because I believe it's not harmful and it offers a solution to your question (although it will be slower and use more space than a quad store), and part of it can be exposed to the semantic web as RDF / Linked Data.

My 2 (heretic) cents.
