The current web is heavily polluted with various forms of spam. Can semantic technologies help? If the spammers start using fake metadata then it could reduce the usefulness of what we currently have.

Could we use personal networks to assign reputation to sites and people? I've heard of people using their FOAF network as an email filter, i.e. accepting incoming email only from people in their extended network. Google's social search can give you search results based on network data in your profile, which can be derived from FOAF and XFN data.
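To make the email-filter idea concrete, here is a minimal sketch in Python. It assumes the FOAF `foaf:knows` links have already been loaded into a plain dictionary; the names `extended_network`, `accept_sender`, `EXAMPLE_KNOWS` and the two-hop limit are illustrative assumptions, not part of FOAF or of any existing filter.

```python
from collections import deque

# Hypothetical example data: who lists whom as a contact (foaf:knows).
EXAMPLE_KNOWS = {
    "me": ["alice"],
    "alice": ["bob"],
    "bob": ["carol"],
}

def extended_network(knows, me, max_hops=2):
    """Breadth-first walk of the 'knows' graph, returning {person: hops}."""
    hops = {me: 0}
    queue = deque([me])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in knows.get(person, ()):
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    return hops

def accept_sender(knows, me, sender, max_hops=2):
    """Accept mail only from people within max_hops of me."""
    return sender in extended_network(knows, me, max_hops)
```

With the example data, `accept_sender(EXAMPLE_KNOWS, "me", "bob")` accepts a friend of a friend, while `carol` (three hops away) and any unknown sender are rejected. A real filter would also have to verify that the sender really is who the address claims.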

This question was partly inspired by the classic Metacrap article.

I know I keep mentioning FOAF and XFN, but those are the semantic technologies which I have played with most.


5 Answers

vote up 4 vote down
check

The Semantic Web has to track the provenance and trustworthiness of data, as judged by various people, agents, and groups.

You should be able to state that you completely trust your friend's data, or that you trust somebody's judgment of trust - like some spam detection software/service.

You should be able to keep several such trust profiles, each for a different purpose.
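As a rough illustration of what such profiles could look like, here is a small Python sketch. Everything in it is an assumption made for the example: the `TrustProfile` class, the 0 to 1 trust scale, the rule that a direct rating overrides a delegated one, and the `.example` source names. It does not follow any existing trust ontology.

```python
from dataclasses import dataclass, field

@dataclass
class TrustProfile:
    purpose: str
    # Sources I rate myself: source -> trust in [0, 1].
    direct: dict = field(default_factory=dict)
    # Advisors whose judgment I trust: name -> (weight, their ratings).
    delegates: dict = field(default_factory=dict)

    def trust_in(self, source):
        """My own rating wins; otherwise take the best weighted advisor rating."""
        if source in self.direct:
            return self.direct[source]
        scores = [weight * ratings[source]
                  for weight, ratings in self.delegates.values()
                  if source in ratings]
        return max(scores, default=0.0)  # unknown sources get no trust

    def visible(self, statements, threshold=0.5):
        """Keep only (source, statement) pairs from sufficiently trusted sources."""
        return [s for s in statements if self.trust_in(s[0]) >= threshold]

# Hypothetical ratings published by a spam detection service.
SPAM_SERVICE = {"cheap-pills.example": 0.0, "news.example": 0.9}

EMAIL_PROFILE = TrustProfile(
    purpose="email",
    direct={"friend.example": 1.0},
    delegates={"spam-service": (0.8, SPAM_SERVICE)},
)
RESEARCH_PROFILE = TrustProfile(purpose="research", direct={"news.example": 0.2})
```

The same source can then be treated differently depending on the purpose: `news.example` passes the email profile through the delegated rating but is filtered out of the stricter research profile.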

Unfortunately, ontologies and tools for this are not yet well developed.

Personally, I foresee a common Semantic Web gateway that applications would call remotely, and which would use the user's identity and trust information to present the user with a subset of the Semantic Web: only the information they trust.

vote up 2 vote down

I wrote recently about some of the vectors by which spam could attack the semantic web. I don't offer any solutions, but I think it's important to understand the vulnerable areas first. See http://iandavis.com/blog/2009/09/linked-data-spam-vectors

vote up 2 vote down

My take is that spam is just a form of incorrect data, so it will be dealt with in much the same way as markup that says (for example) that Sweden is a country in Africa. Semantic apps will need to calculate how much they trust their data sources, and look for consensus among disparate sources.
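One simple way to combine trust and consensus is a trust-weighted vote, sketched below in Python. The function name `consensus`, the source names, the trust values and the small default trust for unknown sources are all assumptions chosen for the example.

```python
from collections import defaultdict

def consensus(claims, trust, default_trust=0.1):
    """Pick the value with the most trust behind it.

    claims: list of (source, value) pairs about one fact.
    trust:  source -> trust weight; unknown sources get default_trust.
    Returns (best_value, share_of_total_trust).
    """
    weight = defaultdict(float)
    for source, value in claims:
        weight[value] += trust.get(source, default_trust)
    total = sum(weight.values())
    if total == 0:
        return None, 0.0
    best = max(weight, key=weight.get)
    return best, weight[best] / total

# Hypothetical claims about which continent Sweden is in.
CLAIMS = [
    ("atlas.example", "Europe"),
    ("encyclopedia.example", "Europe"),
    ("spam1.example", "Africa"),
    ("spam2.example", "Africa"),
    ("spam3.example", "Africa"),
]
TRUST = {"atlas.example": 0.9, "encyclopedia.example": 0.8}
```

Here `consensus(CLAIMS, TRUST)` settles on "Europe" even though more sources say "Africa", because the three unknown sources carry little weight. The weak point is obvious: whoever can cheaply create many sources that look trustworthy can still swing the vote.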

Search engines like Google already assign a measure of trust (PDF here) to domains and pages and assume that good pages don't link to bad ones. Something similar will soon have to be applied to semantic data.
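The "good pages don't link to bad ones" assumption can be sketched as trust flowing outward from a hand-picked set of trusted seeds. The Python below is a generic link-based propagation written for illustration, not Google's actual algorithm; `propagate_trust`, the damping factor, the number of rounds and the `.example` sites are all assumptions.

```python
def propagate_trust(links, seeds, damping=0.85, rounds=20):
    """Spread trust from seed sites along outgoing links.

    links: site -> list of sites it links to.
    seeds: set of sites trusted by hand.
    """
    nodes = set(links) | set(seeds) | {t for ts in links.values() for t in ts}
    seed_share = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    trust = dict(seed_share)
    for _ in range(rounds):
        # Each round, seeds get a fixed share and every site passes
        # a damped fraction of its trust to the sites it links to.
        nxt = {n: (1 - damping) * seed_share[n] for n in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * trust[src] / len(targets)
                for t in targets:
                    nxt[t] += share
        trust = nxt
    return trust

# Hypothetical link graph: spam sites link to good ones, never the reverse.
LINKS = {
    "good.example": ["also-good.example"],
    "also-good.example": ["good.example"],
    "spam.example": ["good.example", "spam2.example"],
    "spam2.example": ["spam.example"],
}
SCORES = propagate_trust(LINKS, seeds={"good.example"})
```

In this graph the spam sites end up with zero trust because nothing trusted links to them, no matter how many links they make themselves. For semantic data, the "links" could be statements in which one dataset references or vouches for another.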

vote up 0 vote down

Perhaps the semantic web will be better able to accommodate advertising if the advertising has to be semantically relevant to gain linkage. The irritating thing about spam (discounting malware for a moment) is that it is frequently irrelevant. If it weren't irrelevant, I doubt people would be so irritated by it.

Obviously, there's nothing to stop spammers from, say, scanning the ontology referenced in some content and then randomly sprinkling the ontology's concepts all over their own advertisement. To detect that, you'd need something a bit more intelligent than a baseline semantic web application. Another evolutionary pressure towards general purpose AI.

I guess it depends on what form of spam you're referring to:

  • unsolicited emails
  • comment spam
  • irrelevant wiki edits
  • semi-relevant answers that plug specific products...

Each of these arises in a different context, and that context determines what counts as relevant.

vote up 0 vote down

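I believe one measure worth mentioning is the SHA-1 hash of the mailbox in FOAF (foaf:mbox_sha1sum), which lets you publish an identifier for your mailbox without exposing the address itself. Obviously, it can only be used to check whether an address you already know corresponds to the hash. Whether it really does counter spam, I don't know...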
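For reference, the check looks roughly like this in Python. FOAF defines `foaf:mbox_sha1sum` as the SHA-1 of the full mailbox URI, including the `mailto:` prefix; the helper names and the example address are assumptions made for illustration.

```python
import hashlib

def mbox_sha1sum(email):
    """SHA-1 hex digest of the mailto: URI, as used by foaf:mbox_sha1sum."""
    uri = "mailto:" + email.strip()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()

def matches(email, published_sum):
    """True if a known address corresponds to a published mbox_sha1sum."""
    return mbox_sha1sum(email) == published_sum.strip().lower()

# Hypothetical value as it might appear in someone's FOAF file.
PUBLISHED = mbox_sha1sum("alice@example.org")
```

`matches("alice@example.org", PUBLISHED)` is true, while any other address fails, so a harvester cannot read the address off the FOAF file directly. It can still test guessed or previously harvested addresses against the hash, which is why this hides addresses from casual scraping rather than from a determined spammer.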
