[Caml-list] the maddening filter
From: Xavier Leroy <xavier.leroy@i...>
Subject: Re: [Caml-list] the maddening filter
> Try as I might, I cannot get a message about the Seattle ML SIG to go
> through.  I am avoiding the use of certain words on the presumption that
> I am filtered because of them.

You piqued my curiosity enough that I looked at the "spam" box for
caml-list.  Some general facts first:

- caml-list@inria.fr is a heavily spammed address -- 41000 spams since
  Jan 1st, or about 200 spams per day.

- Filtering is entirely automatic, using the "SpamOracle" Bayesian filter.
  No human would have the fortitude to filter 200+ messages per day.

- Yes, one or two spams occasionally get through, but that's still
  99.9% accuracy.

- An alternative would be to restrict posting to list members.
  Unfortunately, many members receive the list with one e-mail address
  and post with another, and Majordomo doesn't handle this.
  We'll reconsider when we switch caml-list to Mailman at some point
  this summer.

Now, for the funny bit, here is why SpamOracle thinks that a number of
your messages are spam.  I'll disguise the tokens just to make sure.

For some reason, "Br*ndon" and "Se*ttle" have high spam probability.
You're just unlucky :-)

Moreover, your signature contains the words "entr*preneur", "certifi*d"
and "anti-vir*s", which are also strongly correlated with spam.  This
you might want to change.

You occasionally talk about "mon*y", "mark*ters", "prod*cts", and use
the adjectives "pa-id", "bri*f" and "spurio*s", all of which occur
disproportionately often in spam compared to regular caml-list messages.

On the other hand, you score good "ham" points by using "ocaml" (what
else?), "excessive", "complicated" and "crappy" (score one for strong
language!). "Cheers" is also a good ham indicator.

There is no denying that all this filtering nonsense is getting crazy,
and I'm sorry it went berserk on your messages.  Still, it is true
that your messages are somewhat different in wording from the training
"ham" used; the Bayesian filter just notices this...
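
For readers wondering how individual words can add up to a "spam"
verdict, here is a minimal sketch of the general idea behind a Bayesian
token filter, written in Python. It is not SpamOracle's actual code: the
per-token probabilities, the value used for unknown words, the `keep`
cutoff and the function name `spam_score` are all made up for
illustration. Each token carries an estimated probability of appearing
in spam, and the most telling tokens are combined into one score.

```python
from math import prod

# Hypothetical per-token spam probabilities, invented for this example.
# A real filter estimates them from a training corpus of spam and ham.
TOKEN_SPAM_PROB = {
    "entrepreneur": 0.95,
    "certified": 0.90,
    "money": 0.92,
    "ocaml": 0.01,
    "cheers": 0.05,
    "crappy": 0.10,
}
UNKNOWN = 0.4  # assumed probability for tokens never seen in training


def spam_score(tokens, table=TOKEN_SPAM_PROB, keep=15):
    """Combine per-token probabilities into one spam probability."""
    probs = [table.get(t.lower(), UNKNOWN) for t in tokens]
    # Keep only the most telling tokens: those farthest from 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    probs = probs[:keep]
    spam = prod(probs)                  # evidence for "spam"
    ham = prod(1.0 - p for p in probs)  # evidence for "ham"
    return spam / (spam + ham)


print(spam_score(["entrepreneur", "certified", "money"]))  # close to 1
print(spam_score(["ocaml", "cheers"]))                     # close to 0
```

With numbers like these, a few strongly "spammy" words in a signature
can outweigh a message body that otherwise looks like ordinary list
traffic, which is roughly what happened here.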

At any rate, here is one of your messages that didn't get through, so
that everyone gets it.  I took the liberty of disguising some words
just in case :-)

- Xavier Leroy

--------------------

On Monday, June 14 we had a meeting of the (first ever?) Se*ttle OCaml
Special Interest Group. It was 3 people, the beer was good, and the
discussion was lively. I now want to broaden the miss*on statement to
include all ML'ers. I think we might get a few more people that way.

We agreed that The Stumbling Monk, a Belgian pub in Capitol Hill, was
a good venue for b*er. We also thought that meetings at roughly 3 week
intervals is the right pace. Yes that's a funny number to remember,
but 2 weeks is too quick and once a month is too slow. So we
think. Anyways, I would like to organize the next meeting for the week
of July 5th, at The Monk again. Day and time to be decided.

Please e-ma*il me if interested.

--------------------

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners