1

I'm trying to compile a comparison of the major .NET RDF APIs.

The APIs I've gathered are: Semweb, Greengrass, OwlNetApi, NotNetRdf, Rowlex, intellidimension and linqtordf.

Among other criteria, such as storage type and SPARQL capabilities, I'm trying to classify them by performance.

What do you look for when assessing RDF API performance, apart from BSBM results?


2 Answers

0

It's surely going to depend a lot on your application and what you want to do.

If your application is built on SPARQL, you're going to want something with good BSBM performance, but how good depends very much on the size of the dataset you want to work with. Also, while BSBM serves as a good general indicator, actual performance will always depend on the type of queries you make: if the majority of your queries are very simple, you'll get much faster performance than BSBM might suggest.
If you just want to do a lot of in-memory manipulation of graphs, you'll want something with good in-memory query performance (not necessarily SPARQL, just basic lookups).
If you need to read/write a lot of RDF then the input/output speeds will be important to you.

Most RDF APIs do a lot of different things, so no single benchmark will be a definitive measure of performance. As with most things, you'll choose one you like that has performance characteristics suitable for your scenario.
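To illustrate why these dimensions are worth measuring separately, here is a minimal sketch that times two kinds of workload independently. Everything in it is an assumption for illustration: the graph is a toy set of tuples, the `load` and `lookup` workloads are stand-ins, and `time_workload` is a hypothetical helper, not part of any RDF API. In practice each lambda would be replaced with calls into the library being evaluated.

```python
import time

def time_workload(fn, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# Toy stand-in data: a graph as a set of (subject, predicate, object) tuples.
triples = {(f"s{i}", "p", f"o{i % 10}") for i in range(10_000)}
lines = [" ".join(t) for t in triples]

# Hypothetical workloads; replace each with calls into the API under test.
workloads = {
    # Input speed: rebuild the graph from a line-based serialization.
    "load": lambda: {tuple(line.split(" ")) for line in lines},
    # Basic in-memory lookup: all triples with a given object.
    "lookup": lambda: [t for t in triples if t[2] == "o3"],
}

results = {name: time_workload(fn) for name, fn in workloads.items()}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```

Keeping one number per workload makes it easier to see which library suits which scenario, rather than folding everything into a single score.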

I assume by NotNetRdf you actually meant my API, dotNetRDF? I'd like to point out that the current public release still has quite poor performance in some regards (especially SPARQL), but there'll be a new release at the end of the month which makes major improvements to this. The library is also still in early alpha development: I'm only on version 0.1.1 at the moment!

1

While evaluating several triple stores in our project, I used the main use cases of the system we are building. I know it's not applicable in every case, but you can consider this approach.

We created several queries of varying complexity, then set up a test environment by importing test/real data into every triple store. We created a template test plan along the lines of: execute query #1 with these parameters, execute query #15 with these parameters, and so on. It's good to run a large number of query executions (we used 16,000). Query templates are used in random order with random existing parameters to create a test sequence of queries.
Parameters were chosen to be existing objects from the graph.
While running the test, make 2-3 warm-up runs to fill caches and let the store show its optimal performance.
You will also need a test of how many simultaneous clients the triple store can serve, so create several clients running the test together. Changing the number of clients affects performance, so make sure you have a convenient way to adjust it.
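The procedure above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the harness we actually used: the query templates, the `EXISTING_NODES` list and the `execute_query` placeholder are all hypothetical, and a real run would replace `execute_query` with a call to the triple store being tested.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical query templates; {node} is filled with an existing graph object.
TEMPLATES = {
    "q1": "SELECT ?p ?o WHERE {{ <{node}> ?p ?o }}",
    "q2": "SELECT ?s WHERE {{ ?s ?p <{node}> }} LIMIT 100",
}
# Assumed to be identifiers that really exist in the loaded data.
EXISTING_NODES = [f"http://example.org/item/{i}" for i in range(50)]

def build_plan(n_queries, seed=0):
    """Pick templates and parameters at random to form a query sequence."""
    rng = random.Random(seed)
    plan = []
    for _ in range(n_queries):
        name = rng.choice(sorted(TEMPLATES))
        node = rng.choice(EXISTING_NODES)
        plan.append((name, TEMPLATES[name].format(node=node)))
    return plan

def execute_query(sparql):
    """Placeholder for the real store call; returns a list of result rows."""
    return [sparql]  # pretend every query yields exactly one row

def run_client(plan):
    """Run one client's plan, recording (template, seconds, result count)."""
    samples = []
    for name, sparql in plan:
        start = time.perf_counter()
        rows = execute_query(sparql)
        samples.append((name, time.perf_counter() - start, len(rows)))
    return samples

def run_test(n_queries, n_clients=4, warmup_runs=2):
    """Warm up, then run n_clients concurrent clients and time the whole run."""
    plans = [build_plan(n_queries, seed=i) for i in range(n_clients)]
    for _ in range(warmup_runs):
        run_client(plans[0])  # warm-up results are discarded
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        per_client = list(pool.map(run_client, plans))
    return per_client, time.perf_counter() - start

per_client, total_seconds = run_test(n_queries=200, n_clients=4)
print(len(per_client), "clients,", sum(len(s) for s in per_client), "queries")
```

Threads are used here on the assumption that the clients spend most of their time waiting on the store; `n_clients` is a single parameter so the load level is easy to vary between runs.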

We saved each query time, the number of query results, the total test run time and several derived metrics.
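As a sketch of what derived metrics might look like, the snippet below reduces recorded samples to a few summary figures. The particular metrics (mean, median, 95th percentile, throughput), the `summarize` function and the sample values are assumptions chosen for illustration, not the exact set we computed.

```python
import math
import statistics

def summarize(samples, total_seconds):
    """samples: list of (query_name, seconds, result_count) tuples."""
    times = sorted(s[1] for s in samples)
    # Nearest-rank 95th percentile over the sorted query times.
    p95_index = max(0, math.ceil(0.95 * len(times)) - 1)
    return {
        "queries": len(times),
        "total_results": sum(s[2] for s in samples),
        "mean_s": statistics.mean(times),
        "median_s": statistics.median(times),
        "p95_s": times[p95_index],
        "queries_per_second": len(times) / total_seconds,
    }

# Made-up sample data purely to exercise the function.
samples = [("q1", 0.010, 3), ("q2", 0.020, 5), ("q1", 0.030, 1), ("q2", 0.040, 0)]
summary = summarize(samples, total_seconds=0.1)
for key, value in summary.items():
    print(key, value)
```

Grouping the samples by query name before calling `summarize` gives per-template figures, which tends to show which query shapes a given store handles poorly.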

Plotting all the results in Excel or in a diagram makes them easy to grasp and gives management something to look at :)
