Sam Gentle.com

Open Spam Net

Spam net

I've been looking into migrating away from Gmail for a while, largely out of concern that Google can shut down my email on a whim. My mail already runs through my own mail server, but I use Gmail for its nice UI, searching, and other neat features. I still haven't found a mail client that I actually like, but I'm hopeful that something will come along.

Aside from UI, the other big advantage of Gmail over your own server is the quality of its spam filtering. This isn't even just an algorithmic thing, although I'm sure their algorithms are top-notch. Rather, it's systemic: Google has access to everyone's email, so they can do spam filtering across everyone's email at once. They can apply detection techniques in the large that are impossible for an individual mail server operator to match.

Well, maybe. Most spam filters use some kind of Bayes classifier. Essentially, when you mark some emails are spam or not spam, you know the probability of different features (keywords, usually) appearing in those emails. What you really want to know is the reverse: the probability of spam given the features. That's exactly what Bayes' theorem is for.

And there's no reason that model has to be restricted to a single user, or single server. You can chain many layers of predictions together in a Bayes network, and you could use that exact structure to federate your predictions. So: I think a certain set of features are likely to be spam. You can subscribe to my ideas of spamminess, and any time I'm wrong, not only does your prediction of those features being spammy go down, your prediction of me being right goes down too!

The nice thing about this is that there's very little benefit in spammers getting hold of it. Anyone (including a spammer) entering the network would start with a 50/50 predictiveness, which is to say there's no assumption that they have any value. They would earn that value by making successful predictions, and if spammers want to help fight spam then great.

The end result would be a massive predictive network where each user's spam predictions are combined. A place where everyone from big corporate networks to little private VPS mail servers can collaborate to fight spam together.