4

When dealing with XML, a common way to transform data between different schemas is to use XSLT. Is there a similar standard for transforming RDF, or how would you go about handling heterogeneous RDF data?

Example: Say that I have defined an RDF Schema for people that I use internally in my application, and I want to be able to import/export data using the FOAF vocabulary.

4 Answers

3

First part of the answer is: don't do that in the first place, if you can avoid it. If your domain model needs a person representation, or any other kind of entity, consider re-using existing vocabulary. If foaf:Person is close to what you need, you would ideally use FOAF unless there's a good reason for your application why that doesn't work.

If you can't simply re-use existing vocabularies, the equivalent to XSLT transformations would be re-write rules. Jena has a generic rules engine, Pellet supports SWRL rules, and there are other rule engines. Parenthetically, the W3C has just moved RIF to recommendation status - though there are few implementations available as yet.

7

The CONSTRUCT feature of SPARQL can be quite handy for transforming from one schema to another, and in my opinion it's easier than using SWRL or RIF rules. With the new features in SPARQL 1.1 in particular, CONSTRUCT becomes quite powerful for transforming from one vocabulary to another. At this point in time, SPARQL 1.1 features are readily available in Jena's ARQ query engine and in OpenLink Virtuoso. More SPARQL stores will certainly follow in the coming months.

1 
Many SPARQL 1.1 features are also available in the Perl-based RDF::Query library. Anzo's Glitter SPARQL engine also implements many 1.1 features. – lee Jul 5 at 1:56
4

SPARQL CONSTRUCT queries can be used as an equivalent to XSLT stylesheets for transforming RDF graphs. CONSTRUCT queries can be used to both add and remove structure from a graph, e.g. to map one modelling pattern onto another, as well as doing simple transformations of predicates.
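
As a minimal sketch of such a mapping, the query below rewrites a made-up internal vocabulary into FOAF. The ex: namespace and the ex:Person, ex:fullName and ex:email terms are assumptions for illustration only, and ex:email is assumed to hold a mailto: IRI, which is what foaf:mbox expects.

```sparql
PREFIX ex:   <http://example.com/schema#>   # placeholder namespace (assumed)
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Emit FOAF triples for every internal person record.
CONSTRUCT {
  ?person a foaf:Person ;
          foaf:name ?name ;
          foaf:mbox ?mbox .
}
WHERE {
  ?person a ex:Person ;
          ex:fullName ?name .
  # Email is optional; when ?mbox is unbound the foaf:mbox triple is simply not produced.
  OPTIONAL { ?person ex:email ?mbox }
}
```

Swapping the two sides (FOAF patterns in the WHERE clause, ex: terms in the template) should give the corresponding import mapping.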

Arguably, CONSTRUCT queries are the most portable solution for RDF transformations currently available. Rules engines and/or reasoners can be used to carry out similar manipulations but not all stores or endpoints support these extra technologies. For most current use cases, and particularly when massaging externally sourced data or schema migrations within an application, CONSTRUCT queries should be sufficient.

The main limitation of CONSTRUCT queries in SPARQL 1.0 is that you can't generate new values, e.g. by manipulating strings. SPARQL 1.1 will add further features, such as nested queries, more functions, and possibly a LET keyword, which will give greater flexibility.
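
In the SPARQL 1.1 Recommendation as eventually published, the assignment keyword is BIND rather than LET. A sketch of generating a new value with it, assuming hypothetical ex:firstName and ex:lastName properties in a placeholder ex: namespace:

```sparql
PREFIX ex:   <http://example.com/schema#>   # placeholder namespace (assumed)
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

CONSTRUCT {
  ?person foaf:name ?fullName .
}
WHERE {
  ?person ex:firstName ?first ;
          ex:lastName  ?last .
  # BIND and CONCAT are SPARQL 1.1 features; this query is not valid SPARQL 1.0.
  BIND (CONCAT(?first, " ", ?last) AS ?fullName)
}
```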

The downside to using CONSTRUCT over an alternative rules/reasoning approach is that you will typically have to write the transformed data back to the store. This can make it difficult to unpick or reapply transformations at a later date. The advantage of the alternative approach is that you can apply the transformation at the point of extracting or querying the data. This allows more flexibility, at the potential cost of extra performance overhead.

It's worth noting that neither the rules/reasoning nor the query-based approaches are set up for large batch-based transformation tasks, e.g. there is no option to suspend and resume a large-scale transformation. There are alternative approaches to explore here, e.g. SELECTing a set of resources to process, then CONSTRUCTing the transformed data for that set. This would allow checkpointing and resumption of processing.
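
One way to sketch the select-then-construct idea in a single SPARQL 1.1 query is a subquery that picks one page of resources, with the outer pattern transforming only that page. The ex: terms, the page size of 1000 and the use of OFFSET as the checkpoint are all assumptions; a driver script (not shown) would re-issue the query with increasing OFFSET values and record the last offset it completed.

```sparql
PREFIX ex:   <http://example.com/schema#>   # placeholder namespace (assumed)
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

CONSTRUCT {
  ?person a foaf:Person ;
          foaf:name ?name .
}
WHERE {
  {
    # Select one page of resources to process.
    # ORDER BY keeps the paging stable between runs, provided the data does not change.
    SELECT DISTINCT ?person
    WHERE { ?person a ex:Person }
    ORDER BY ?person
    LIMIT 1000
    OFFSET 0        # checkpoint: advance by the page size after each completed batch
  }
  ?person ex:fullName ?name .
}
```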

1

An approach you should consider, which is a W3C standard and I believe has good tool support, is using OWL itself, or rather its standard inferences. Note, however, that this is somewhat less expressive than the languages already mentioned that let you write custom inference rules (e.g. RIF).

The owl:equivalentClass predicate prescribes the following inference:

If A owl:equivalentClass B, then
    A rdfs:subClassOf B and
    B rdfs:subClassOf A
which, in turn, means that
    for all ?x rdf:type A => ?x rdf:type B and
    for all ?y rdf:type B => ?y rdf:type A

This effectively maps between the two classes. When the classes come from different models, asserting the class equivalence amounts to defining a mapping between those models.

Another useful pattern in this case is the Relationship Transfer Pattern, e.g.:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
# ex: was not declared in the original snippet; this namespace is a placeholder.
@prefix ex: <http://example.com/schema#> .

ex:RedThing rdfs:subClassOf owl:Thing .
# Anything whose ex:hasColor is "red"@en is classified as an ex:RedThing.
ex:RedThing owl:equivalentClass [ a owl:Restriction ;
    owl:onProperty ex:hasColor ;
    owl:hasValue "red"@en ] .

This means that, for any resource ?x for which you define the triple

?x ex:hasColor "red"@en

the following triple will be inferred:

?x a ex:RedThing

The use of OWL for class equivalence and the Relationship Transfer Pattern is well explained in Hendler and Allemang's Semantic Web for the Working Ontologist, which I'd recommend reading.

Back to your example model. If you already have an owl:Class defined for people, you could just assert:

ex:Person owl:equivalentClass foaf:Person .

If you do have an owl:Class, but it's more specific than the meaning of foaf:Person (say ex:Employee), you can just do:

ex:Employee rdfs:subClassOf foaf:Person .

Now suppose you don't have an owl:Class defined for people (pretty bad modeling practice), but instead have it defined by a property, e.g.:

ex:joe a ex:Record ;
    ex:relationshipWithFirm "Employee"@en ;
    ex:hasEmail <mailto:joe@example.com> .

and want to infer proper foaf triples from that, you could do:

ex:Employee a owl:Class;
    rdfs:subClassOf foaf:Person;
    owl:equivalentClass [ a owl:Restriction;
        owl:onProperty ex:relationshipWithFirm;
        owl:hasValue "Employee"@en ] .

ex:hasEmail owl:equivalentProperty foaf:mbox .

That way, for every triple ?x ex:relationshipWithFirm "Employee"@en in your source model, ?x will be inferred to be an ex:Employee and also a foaf:Person. Additionally, since we have also asserted equivalence between the properties ex:hasEmail and foaf:mbox, a foaf:mbox triple is inferred for every ex:hasEmail triple on that resource.
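
If the store has no OWL reasoner, roughly the same triples could be materialized with a CONSTRUCT query instead. This is only a sketch of the employee case under an assumed ex: namespace; unlike owl:equivalentProperty, it does not map ex:hasEmail for resources that are not employees.

```sparql
PREFIX ex:   <http://example.com/schema#>   # placeholder namespace (assumed)
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

CONSTRUCT {
  ?x a ex:Employee , foaf:Person ;
     foaf:mbox ?mbox .
}
WHERE {
  ?x ex:relationshipWithFirm "Employee"@en .
  # Records without an email still get the two type triples.
  OPTIONAL { ?x ex:hasEmail ?mbox }
}
```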

Edited: I suggest you see also this answer and the UMBEL project for more info on mapping ontologies.
