• Bayesian Spam Filter

    • This topic has 32 replies, 10 voices, and was last updated 22 years ago.
    #388043

    After reading an IDC report, I downloaded SpamBayes to check out its capabilities. The download site for the Outlook add-in is http://starship.python.net/crew/mhammond/spambayes/ . The add-in adds buttons to Outlook and inserts a new field view definition.
    The documentation on setting up and training the programme was easy to follow. After setup, I released a number of known spam e-mails sitting in the Mail Marshall quarantine. All went directly to the Spam folder. I did the same with a couple more e-mails that might or might not have been spam.
    One was moved into the unsure folder, as it should be, and the other went into my in-box with a spam score of 6%. I used the programme’s “Delete as Spam” button to move the first from the unsure folder to the spam folder (which adds to the programme’s training). The 6% e-mail looked like spam to me, so I asked for that message to be treated as spam too.
    So far I am impressed with the simplicity and functioning of this FREE Outlook add-in.
    This is an early version, and the authors are soliciting feedback.
    cheers

    • #679738

      Paul–
      You might want a look at
      Paul Graham’s Plan for Spam
      Arc
      Better Bayesian Filtering

      SMBP

    • #679871

      Hi Paul:
      Have you seen any Bayesian add-ins for Netscape?

      • #679872

        Morning Phil. Sadly, I have seen nothing on a Bayesian product for Netscape. Perhaps you can post a question to the creators of SpamBayes, at the address in my earlier posting.

        • #679877

          On reading the article at the first link, I was disappointed to see that there was to be no support for Outlook Express.

          But it’s certainly worth a try at work on Outlook, where about 30% of my messages are spamular. And that’s just in my InBox! No- surrender

          • #679879

            I had the same thoughts as you, John. So far I am finding that SpamBayes is working better than anything I had in the past. I can see that updating (aka training) is an important part of how this works, but once set up it is a minor task to do, say, weekly.
            This morning the inbox was cleared correctly, and the two messages in the “could be spam” category went to the separate folder where, after looking at the headers, I used the SpamBayes button to “delete as Spam”. It certainly saved time for me.

          • #679950
            • #679955

              SMBP, Popfile claims to work with OE. I use it at work on four machines with an add-in to make it work with Outlook. It works very well for us at work. I can’t say for sure about OE.

              Joe

              --Joe

            • #679957

              You’re right, Joe: the ultimate test is how it works. I’ll have to give a couple of these Bayesian filters a try. Fortunately, the simple little tricks, the “no-spam” and other tips on some of the Tech TV links, have been working for me, but I know some boxes/clients where I’d like to test some filters. I’m not getting much spam in them, but enough to test with. I’m no longer getting spammed at an msn newsgroup address since I put “-no spam” in the exact same address. Before inserting it, I was getting spammed quite a lot after using the address on msn groups, which is of course tantamount to posting it. Some spam newsgroups, or fake newsgroups that were spam groups and not legitimate msn groups, were also showing up on the msn “list”; they aren’t anymore.

              SMBP

          • #679967

            John–
            Another Bayesian spam filter for Outlook worth looking at is Junk Out from The Office Maven, which Stuart posted about in Post 25538 of this thread.

            SMBP

          • #681450

            After reviewing several filter apps, I chose K9 from http://keir.net/k9.html
            I did not look at SpamBayes because of Python. I have nothing against Python, just didn’t have it installed and didn’t want it. Many others that looked interesting required permanent storage of the designated spam folder or continual re-training. K9 is what I wanted, is not a plug-in, the site has specific instructions for setting up Outlook, Outlook Express, Netscape and Eudora, and should work with other e-mail clients. I have been very pleased with it!

            • #681465

              I think that’s another case of “you pays your money and you takes your choice” (even though both products are free!).

              SpamBayes works on Outlook 2000 and 2002, but not Outlook Express, and only requires you to do an install; then it’s all integrated and requires almost no effort to use.

              The only Pythons I know anything about are Monty Python and the snake of that name, and I don’t see any signs of extraneous software, having installed SpamBayes.
              K9 requires you to do the following after installation:
              “For each POP3 account in your POP3 email program that you want to work with K9…
              Change the POP3 server port to 9999.
              Change the POP3 server to 127.0.0.1.
              The POP3 account user name should be changed to incorporate the original POP3 server, port and user name values into one long string, separated by a “/” character. ”
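For anyone unsure what the third step produces, here is a minimal sketch of the quoted K9 settings. The field order (server, then port, then user name) is assumed from the order in which the quote lists them, and `k9_username`, `pop.example.com` and `alice` are made-up names for illustration only.

```python
def k9_username(server: str, port: int, user: str) -> str:
    """Join the original POP3 server, port and user name with "/".

    The order is an assumption taken from the quoted instructions.
    """
    return "/".join([server, str(port), user])


# Hypothetical account: the mail client now talks to K9 on the local machine,
# and the real server details travel inside the user name.
client_settings = {
    "pop3_server": "127.0.0.1",
    "pop3_port": 9999,
    "username": k9_username("pop.example.com", 110, "alice"),
}

print(client_settings["username"])
```

The mail client itself needs no plug-in with this approach, because the proxy learns where to fetch the mail from the user name string.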

              Is this better or worse than a one-off install? Both products appear to be based on the same principles.

              I hope we’re not going to get into a “My Mac is better than your PC”-type of argument. Surely the important thing, since we can’t of ourselves actually prevent spam, is to deal with it at minimum cost and effort to ourselves? As always, people will prefer different products, and, as I have been known to say before, Your kilometerage may vary…

            • #681522

              Uhgg, you use a Mac? … just kidding!

              Actually, I believe SpamBayes uses newer, more advanced scoring techniques while K9 uses the original Graham algorithm. You make a good point regarding installation. I would add that on first run K9 will automatically configure Outlook or Outlook Express.

              The main purpose of my post was to offer a possible solution to messages in this thread regarding Outlook Express and Netscape.

      • #679943

        If I remember correctly, Phil, you are using Mozilla–something on this list should work for you and you may want to tweak the search a little bit to find what you want. Hope this helps. If not let me know and I’ll look harder.

        Bayesian Spam Filters for Netscape Search

        SMBP

    • #680320

      Paul

      Am trying SpamBayes on Outlook 2002. Yesterday I must have had about 30 spams, but today, when I came to train it, I had just 6!

      • #680336

        Behold eMouseTrap…..coming soon!

      • #680503

        John, I am finding that as time passes this programme gets better and better. I have decided to do a weekly training run, as by then I have large numbers of spam examples (most from Yahoo, where I once logged on to a user group!).
        I have had only a couple of false positives, and the “possible” spam has in almost all cases been crafty spam.
        For over a week now there has been no spam in my inbox; it has all gone straight to the spam folders.

        • #680511

          Just to be annoying, the spammers seem mostly to be giving me a miss this week. [No, thanks, I don’t want any from anyone else!]. There have only been about a dozen, and the most recent ones have been put by SpamBayes into my “Possible Spam” folder.

          No doubt a silly question, but once you’ve assigned the Spam category to rubbish messages, either by doing “Delete as Spam” or moving them into the “Spam” folder, you can then just delete them, as you always used to, since SpamBayes has taken a note of their contents?

          • #680515

            Hi John, yes you could. At the moment I keep a week’s worth in my Spam folder before deleting. Even though SpamBayes has learned from those that are there, I just prefer to have a good sample when I run each week’s database update.

            • #680564

              This is good stuff – I was a bit hesitant about trying it, being an “alpha” release according to the site. After noting the absence of reported problems here, I decided to give it a go, and glad I did. It seems more stable than most MS commercial releases.

              I was lucky that I’d been slack cleaning out my deleted files for a while, so training was a cinch, apart from categorising several hundred emails. I’m finding it has a better than 90% hit rate in separating out spam with the default “confidence” levels at 90% and 25%. I’m now starting to tweak them to try and improve that 90%.
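To make the two “confidence” levels concrete, here is a minimal sketch of how a pair of thresholds sorts a score into three folders. The 90% and 25% values are the ones quoted above; the function name and the use of “at or above” for each boundary are assumptions, not taken from the SpamBayes source.

```python
def classify(score, spam_threshold=0.90, unsure_threshold=0.25):
    """Sort a spam score (0.0 to 1.0) into one of three folders.

    Boundary handling (>=) is an assumption made for this sketch.
    """
    if score >= spam_threshold:
        return "spam"
    if score >= unsure_threshold:
        return "unsure"
    return "inbox"


for score in (0.06, 0.50, 0.97):
    print(f"{score:.0%} -> {classify(score)}")
```

Lowering the spam threshold catches more spam automatically at the cost of more false positives; raising the unsure threshold lets more low-scoring mail stay in the inbox.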

              Only problem to date? It’s almost fun getting spam, and seeing it get sidelined automatically.

            • #680571

              I know the feeling, Tim. I almost imagine that it must have been the same for the Romans in the Colosseum when their thumbs went down.

            • #680660

              Paul, you say “when I run each week’s database update”. I can’t see why this would be necessary, because doesn’t it get updated each time you’ve told it that a particular message is Spam? I don’t see anything about this “database update” in the “SpamBayes Outlook Plugin” four-page document… I observe from the Logs that individual messages can be “trained as spam”.

            • #680663

              I think you may be right. I’ve just used the “Delete as Spam” and “Recover from Spam” buttons, with both “Automatically train” options ticked. After the initial training and brief use I had a look at the log. It marked any message that I had accidentally processed more than once with “already was trained as spam”.

              That is, after the initial batch of training, all you need to do for further training is correct it when it gets a message wrong (as long as you’ve got the auto training set on).
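The “train only on corrections” idea can be sketched as below. This is a toy model under stated assumptions, not the SpamBayes implementation: the class, its whitespace tokenizer and its in-memory store are hypothetical, and only the “already was trained as spam” wording is borrowed from the log mentioned above.

```python
from collections import Counter


class TinyTrainer:
    """Toy incremental trainer: keeps word counts per category."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.trained = {}  # message id -> "spam" or "ham"

    def _counts(self, label):
        return self.spam_tokens if label == "spam" else self.ham_tokens

    def train(self, msg_id, text, label):
        previous = self.trained.get(msg_id)
        if previous == label:
            # Processing the same message twice must not double-count it.
            return f"already was trained as {label}"
        tokens = text.lower().split()
        if previous is not None:
            # A correction: undo the earlier training before relearning.
            self._counts(previous).subtract(tokens)
        self._counts(label).update(tokens)
        self.trained[msg_id] = label
        return f"trained as {label}"


trainer = TinyTrainer()
print(trainer.train("m1", "Cheap pills now", "spam"))
print(trainer.train("m1", "Cheap pills now", "spam"))
```

Because each correction updates the counts immediately, there is nothing left over for a separate weekly database update to do.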

            • #680805

              John, you are absolutely right. After reading the documentation again, I see that the programme learns as it goes. Sorry to put you off track, and thanks for pointing this out.

        • #680711

          “I have decided to do a weekly training run, as I have by then large numbers of spam examples.”

          How many spam messages do these filters require in order to be trained? Is there a generally accepted minimum spam to non-spam ratio or message volume level that would indicate that these filters would actually be beneficial? Thankfully, the biggest spam problem I have right now is co-workers multi-forwarding chain letters around and around the office. However, the possibility of getting buried is always a send/receive away.

          • #680807

            Hi David, firstly I need to correct my mistaken impression that I should do a weekly re-train. John correctly pointed out that the programme learns as it goes. (I will read the documentation carefully next time.)
            After the install I had about 700 “good” emails and about 40-odd in the spam folder. After the initial training using that quantity of emails, the programme’s performance has been impressive. Most of my “re-training” has been for emails in the “possible spam” folder, which I then classify as spam or not. As of now I have had no spam emails appear in my In-box folders that need to be manually handled.

            • #682811

              Quote from today’s (UK) “Computer Weekly”…

              “Microsoft’s chief spam fighter has claimed that the spread of junk e-mail can be contained within two years, but he admitted that the problem will get worse before it gets better. “For a lot of people the situation has got so bad that they are willing to give up e-mail if the spam situation does not get any better,” said Ryan Hamlin, general manager of Microsoft’s anti-spam technology and strategy group. He added that almost half of all e-mail is spam, and the figure is likely to rise to 65% next year.”

              Missing from this is any account of how this reduction is going to be achieved… I think we should be told.

            • #682890

              “Missing from this is any account of how this reduction is going to be achieved… I think we should be told.”

              So the spammers can start working on a workaround, and defeat the process as they are doing now?

              DaveA I am so far behind, I think I am First
              Genealogy....confusing the dead and annoying the living

            • #683074

              Sorry, I was being a bit obscurely British there. “I think we should be told” comes from a satirical magazine called Private Eye, and it is a sort of “running gag” or stock sentence, the usual suffix to some story of how information is being kept from the general public by (usually!) the government, ever anxious to hide everything from us plebs or to sweep it quietly under the carpet.

              Reminds me of an old tagline:
              The best form of security is ignorance.

            • #683064

              With Microsoft’s history of obfuscation we may be told but would not know what was being said!
              Spammers will adapt as technology changes; after all, isn’t it a tenet of advertising that “the message must get through”?
              On the same level as spam are the advertising inserts in my daily newspaper, as this circumvents my mailbox notice “no advertising material”.
