3

I was reading Tom Scott's post about Linking Things the other week, and thought it would be interesting to get an Overflow perspective on the idea of using RDF in a couple of ways.

Firstly, should we be concentrating on RDF for non-information resources (i.e. things, not documents about things) only, as per Tom's piece? Is RDF for documents really boring, and even if it is, is it necessary to concentrate on?

Some people will tell you that the whole non-information resource thing isn’t necessary – we have a web of documents and we just don’t need to worry about URIs for non-information resources; others will claim that everything is a thing and so every URL is, in effect, a non-information resource.

Michael, however, recently made a very good point (as usual): all the interesting assertions are about real world things not documents. The only metadata, the only assertions people talk about when it comes to documents are relatively boring: author, publication date, copyright details etc.

If this is the case then perhaps we should focus on using RDF to describe real world things, and not the documents about those things.
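
To make the distinction concrete, here is a minimal sketch in plain Python, with triples written as tuples rather than with any RDF library. The example.org URIs, the band, and the ex: properties are all made up for illustration; dc:creator, dc:date and foaf:primaryTopic are used only as familiar-looking labels.

```python
# Triples as (subject, predicate, object) tuples.
# All URIs and ex: properties below are hypothetical.
DOC = "http://example.org/doc/example-band"    # a web page (information resource)
THING = "http://example.org/id/example-band"   # the band itself (non-information resource)

triples = [
    # Assertions about the document: the "boring" metadata.
    (DOC, "dc:creator", "Alice Example"),
    (DOC, "dc:date", "2009-07-01"),
    # The link from the document to the thing it is about.
    (DOC, "foaf:primaryTopic", THING),
    # Assertions about the real world thing.
    (THING, "ex:formedIn", "http://example.org/id/some-city"),
    (THING, "ex:genre", "http://example.org/id/some-genre"),
]

def describe(subject, triples):
    """Return a dict of predicate -> object for one subject."""
    return {p: o for s, p, o in triples if s == subject}

doc_facts = describe(DOC, triples)
thing_facts = describe(THING, triples)
```

With two separate URIs, the author and date attach to the page while the genre attaches to the band, so neither set of statements is mistaken for the other.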

Secondly, what do people think about the whole notion of RDF within language resources (i.e. content, like blog posts or articles)? Would metadata for these raise the barrier too high for useful implementation of RDF tools, or would it be a useful thing to be building?

I'm very much interested in the subtlety of this notion, because it seems important to balance keeping RDF simple enough for barriers to be low, while also being useful for a wide variety of applications.


4 Answers

4

I see describing web resources (web pages, images, downloadable files and so on) as the very first problem that RDF must be able to solve. It's not the most exciting problem, but it's the simple and fundamental one. All the stuff about non-web resources should be layered on top of a solid foundation for describing web resources. So I disagree with Tom on this specific point.

I'm not sure what you mean by your second question. Are you talking about expressing part of the semantics of the article's text in RDF, like the stuff that OpenCalais and similar systems extract from the text?

Sorry for not being clearer about the second point. What I'm asking is whether focusing on RDF for non-structured resources, which are meant for people anyway, is a useful activity at this stage. Not so much basic metadata (i.e. dc:author), but RDFising the content of blog posts, articles and suchlike themselves. (See the comments on Tom's piece for further context.) – Zach Beauvais Jul 12 at 16:31
0

As we also discussed this issue recently at #swig, I afterwards came up with this proposal for a concrete domain. That is, we need access to both kinds: document-based information and direct access (in the form of a specific serialisation).

1

For a library and information scientist it is funny to see that computer science frequently exemplifies data by bibliographic data, and metadata by data about documents. At first glance it is all quite familiar and simple (an author, a title, a date...), so it seems to be a good starting point. On closer inspection everything falls apart: how about multiple authors, editors, publishers, and translators? What if authors are unknown or known under different names? What about subtitles and abbreviations? Which date do you specify, in what detail, and when does a document change?

Library science has elaborated detailed cataloging rules to answer these questions, at least for documents bound to a physical representation. The Meta Content Framework (a precursor of RDF), Dublin Core, and RDF were all created first to capture data about digital documents. I think the shift to non-information resources is misleading. Expressing data about things has already been solved by traditional database research: just create a data model based on your application's needs and implement it. But if you want to express data about things independently of a particular application, you first have to document these things. You may think you are talking about the things, but essentially all you have are documents. These documents often do not have a simple author-title-date structure. But as said before: on a closer look, documents are not so simple anyway.

4

Document metadata doesn't have to be so boring. If you know the authors, creation dates and topics of lots of documents, you can analyse the topics people have been interested in (interested enough to write about anyway), and how that's changed over time. Analysis could reveal trends like "people who used to be interested in Foo in 2005 seem to be writing a lot about Bar today".
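
As a rough sketch of that kind of analysis, assuming you already have (author, year, topic) records pulled out of document metadata, something like the following would do. The records, the years compared, and the function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical document metadata: (author, year, topic).
docs = [
    ("alice", 2005, "Foo"),
    ("bob", 2005, "Foo"),
    ("carol", 2005, "Baz"),
    ("alice", 2009, "Bar"),
    ("bob", 2009, "Bar"),
    ("carol", 2009, "Baz"),
]

def topics_by_author_year(docs):
    """Index the topics each author wrote about in each year."""
    index = defaultdict(set)
    for author, year, topic in docs:
        index[(author, year)].add(topic)
    return index

def moved(docs, old_topic, old_year, new_topic, new_year):
    """Authors who wrote about old_topic in old_year and new_topic in new_year."""
    index = topics_by_author_year(docs)
    authors = {author for author, _, _ in docs}
    return sorted(
        a for a in authors
        if old_topic in index.get((a, old_year), set())
        and new_topic in index.get((a, new_year), set())
    )

switchers = moved(docs, "Foo", 2005, "Bar", 2009)
```

Here `switchers` holds the authors who wrote about Foo in 2005 and about Bar in 2009. Real data would need the topics and author names normalised first, which is where shared URIs help.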

Knowing which documents link to which other documents might give you an idea of the authoritativeness of the document authors, and the connectedness of the real world things that are the primary topics of those documents.
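
A very crude sketch of the first idea, assuming a list of document-to-document links and a known author per document: count distinct inbound links per author. This is only a stand-in for proper link analysis (PageRank and the like), and the document identifiers and author names are invented.

```python
from collections import Counter

# Hypothetical link data: (source document, target document).
links = [
    ("doc:a", "doc:c"),
    ("doc:b", "doc:c"),
    ("doc:c", "doc:d"),
    ("doc:b", "doc:c"),  # duplicate link, counted once
]

# Hypothetical authorship metadata: document -> author.
authors = {"doc:a": "alice", "doc:b": "bob", "doc:c": "carol", "doc:d": "dave"}

def inbound_by_author(links, authors):
    """Count distinct inbound links to each author's documents."""
    counts = Counter()
    for source, target in set(links):
        if source != target:  # ignore self-links
            counts[authors[target]] += 1
    return counts

scores = inbound_by_author(links, authors)
```

The same counting could be done over the primary topics of the documents instead of their authors to get at how connected the things themselves are.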
