Spam filter - Journal of Omnifarious

Jun. 1st, 2003

09:20 am - Spam filter

Well, I finished my spam filter early this morning, at 3am or so. It only tries to protect hopper@omnifarious.mn.org right now, an email address that will be going away in a few years. I get a LOT of spam on that address because that's the address I used when I still posted to Usenet. If ever you want lots and lots of spam, just post to Usenet. It'll start coming in by the truckload.

Anyway, it's a wrapper around CRM114, so it's a learning filter. The learning interface works like this: you drop mail you want it to learn as spam or not spam into the 'learn spam' or 'learn notspam' mailbox, and a little program periodically runs by, pulls all the mail from those boxes, and adds it to the spam or not-spam training set.
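
For the curious, the "little program" can be sketched roughly as below. This is only an illustration, not the actual wrapper: it assumes the learning mailboxes are mbox files, and `drain_learning_box` and the `learn` callback are made-up names standing in for whatever really hands a message to the CRM114 training step.

```python
import mailbox
from typing import Callable


def drain_learning_box(path: str, learn: Callable[[bytes], None]) -> int:
    """Feed every message in the mbox at `path` to `learn`, then empty the box.

    `learn` is a hypothetical callback that adds one raw message to a
    training set. Returns the number of messages handed over.
    """
    box = mailbox.mbox(path)
    box.lock()  # keep the delivery agent from writing while we drain
    try:
        count = 0
        for key in list(box.keys()):
            learn(box.get_bytes(key))  # raw message, without the mbox "From " line
            box.remove(key)
            count += 1
        box.flush()  # write the now-empty mailbox back to disk
        return count
    finally:
        box.unlock()
        box.close()
```

Running it once per mailbox from something like cron (again, an assumption) would give the "periodically runs by" behaviour, with one callback for the spam set and another for the not-spam set.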

Hopefully, that's a simple enough method of configuration that, after testing, I can let other people who use my computer use that method.

Current Mood: tired

Comments:

From:agnosticfont
Date:June 1st, 2003 08:13 am (UTC)
That looks very cool, and according to the SourceForge page it looks very efficient too.

What did you write it in? (I'm assuming it's running on a *nix platform?)

I did quite a bit of work on deploying complex systems, and people's reactions to those were similar to people's initial reactions to spam, i.e. it's too complex to map & understand. But applications like this disprove that :)
From:omnifarious
Date:June 1st, 2003 12:25 pm (UTC)
I wrote the wrapper in Python. :-) I've completely given up on Perl. :-)

I had to make the wrapper do some pre-processing on the spam to try to remove some of the obfuscation spammers put in. For example, it removes HTML comments that are prefixed and suffixed by a non-whitespace character, decodes anything encoded in base64, and removes all non-text attachments. The de-obfuscated mail is only sent to the classifier and learning engine; it is not placed in a mailbox. I won't have some program (even my own) randomly munging messages before I read them.
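
Roughly, those three steps could look like the sketch below. It is a simplified illustration rather than the real wrapper: `deobfuscate` and `EMBEDDED_COMMENT` are names invented here, it returns only the body text (headers are left out), and it leans on Python's standard `email` package to undo the base64 encoding.

```python
import email
import re

# An HTML comment with a non-whitespace character directly on each side,
# e.g. "mort<!-- x -->gage". Comments standing alone between words are kept.
EMBEDDED_COMMENT = re.compile(r"(?<=\S)<!--(?:(?!-->).)*-->(?=\S)", re.DOTALL)


def deobfuscate(raw: bytes) -> str:
    """Return the text a classifier would see; the original mail is untouched."""
    msg = email.message_from_bytes(raw)
    chunks = []
    for part in msg.walk():
        # Skip multipart containers and drop every non-text attachment.
        if part.is_multipart() or part.get_content_maintype() != "text":
            continue
        # decode=True undoes base64 (and quoted-printable) transfer encoding.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # charset name Python does not know
            text = payload.decode("utf-8", errors="replace")
        chunks.append(EMBEDDED_COMMENT.sub("", text))
    return "\n".join(chunks)
```

The function only builds a new string, so the copy that lands in the mailbox is never modified.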

How do applications like crm114 prove that complex systems aren't too complex to map & understand? I can see that people have the same reaction to spam as to complex systems, but I can't see how something like crm114 might apply to other complex systems.

From:agnosticfont
Date:June 1st, 2003 02:14 pm (UTC)
"How do applications like crm114 prove that complex systems aren't too complex to map & understand? I can see that people have the same reaction to spam as to complex systems, but I can't see how something like crm114 might apply to other complex systems."

I was trying to make the point that in the company I worked for (a global telecoms company), people's reactions to complex systems then (which was 5-6 years ago) were the same as their reactions to spam: that they were too complex and changed too frequently to be catered for.
From:omnifarious
Date:June 1st, 2003 05:01 pm (UTC)
*grin* Ahh, OK. I thought maybe that's what you meant, but I wasn't sure.

Actually, I have a couple of ideas that would totally eliminate the spam problem if they became widely adopted. But, they largely involve building some infrastructure on top of the Internet as it currently exists.

From:prettydark
Date:June 1st, 2003 12:35 pm (UTC)
yay!
i am so glad it's going!
*hugs*
From:omnifarious
Date:June 1st, 2003 05:02 pm (UTC)
*smile* Thanks for the encouragement. :-)

From:brighteyez
Date:June 1st, 2003 11:04 pm (UTC)
Can I just say that I completely HATE spam, especially those penis enlargement ones. I'd have to have outdoor plumbing before finding that email beneficial!! hehehe. I wonder how much money companies actually make from spam.