Ru.Board Computer Forum » Computers » Help for the System Administrator » FAQ on Exim MTA


ShriEkeR (16-05-2011 18:50): FAQ on Exim MTA #2

   

hoochie



Full Member
Q. I've fed DSPAM thousands of spam messages and am only getting marginal accuracy. What's up?
A. The problem may be that you've fed DSPAM thousands of spam messages but not enough nonspam (ham) for it to learn adequately. Feeding a statistical filter a grossly unbalanced corpus of mail is generally bad practice. On top of that, if your version of DSPAM has the "training buffer" enabled by default, feeding it a large amount of spam makes it water down its results until you feed it more ham. To prevent false positives, this watering down grows stronger as your spam ratio rises, so the more spam you feed it, the worse your accuracy gets. There are a few things you can do to remedy this:
1. Turn off the training buffer ("Feature tb=5" in dspam.conf) if it is on, or lower the buffering level. Since 5 is DSPAM's default, use a value below 5; a value of 0 disables this protection entirely. Find the value that gives you the best spam filtering without letting through too many false positives.
2. A better solution may be to feed DSPAM enough nonspam to exceed the training threshold (2500 messages). This not only disengages the statistical sedation feature (the watering down described above) but also lets other algorithms kick in, such as Bayesian Noise Reduction, which engage only after this training period.
3. Delete your database and retrain it with the dspam_train tool instead of dspam_corpus. dspam_corpus isn't really designed for building highly accurate pretrained databases.
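To decide which of these remedies applies, you can check two things: whether a training buffer is set in dspam.conf, and whether your nonspam count has passed 2500. The following is a minimal sketch, assuming you paste in the text of your dspam.conf and read the TI and IC values from dspam_stats by hand; the function names are hypothetical, and the only DSPAM details it relies on are the "Feature tb=N" syntax and the 2500-message threshold quoted above.

```python
import re
from typing import List, Optional

# Threshold quoted in the FAQ: nonspam count after which sedation disengages.
TRAINING_THRESHOLD = 2500

# Matches an active (uncommented) "Feature tb=N" line in dspam.conf text.
TB_LINE = re.compile(r"^\s*Feature\s+tb=(\d+)\s*$")


def training_buffer_level(conf_text: str) -> Optional[int]:
    """Return N from the last active 'Feature tb=N' line, or None if there is none."""
    level = None
    for line in conf_text.splitlines():
        match = TB_LINE.match(line)
        if match:
            level = int(match.group(1))
    return level


def diagnose(conf_text: str, ti: int, ic: int) -> List[str]:
    """Suggest which remedy applies, given TI and IC read by hand from dspam_stats."""
    hints = []
    innocent = ti + ic
    if innocent > TRAINING_THRESHOLD:
        # Enough nonspam: sedation is off, so look at retraining arguments instead.
        hints.append("past the training threshold; check retraining uses --source=error")
        return hints
    tb = training_buffer_level(conf_text)
    if tb is not None and tb > 0:
        hints.append(f"training buffer active (tb={tb}); try a value below 5, or 0 to disable")
    hints.append(f"only {innocent} nonspam trained; feed more to exceed {TRAINING_THRESHOLD}")
    return hints


if __name__ == "__main__":
    sample_conf = "# Feature tb=3\nFeature tb=5\n"  # the commented line is ignored
    for hint in diagnose(sample_conf, ti=40, ic=60):
        print(hint)
```

Once the nonspam count is past 2500 the sketch stops suggesting buffer changes, since the FAQ says the sedation feature disengages at that point.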
If none of this works, or dspam_stats already shows TI+IC values over 2500 for your user, another common problem is incorrect training parameters. When you retrain a message in DSPAM, be careful to submit it as an error, not as corpus-fed spam. Check your command-line arguments and make sure you're using --source=error and NOT --source=corpus: --source=corpus is for messages that DSPAM has not processed, while --source=error is for messages that DSPAM processed and misclassified.
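The rule above boils down to one question: has DSPAM already seen this message? Here is a small sketch that builds the retraining command line from that answer. It assumes the dspam 3.x syntax (`--user`, `--class=spam|innocent`, `--source=error|corpus`), with the message itself fed on stdin; `retrain_args` is a hypothetical helper, so check `dspam --help` on your version before relying on it.

```python
from typing import List


def retrain_args(user: str, correct_class: str, processed_by_dspam: bool) -> List[str]:
    """Build a dspam argv for retraining one message (the message goes on stdin)."""
    if correct_class not in ("spam", "innocent"):
        raise ValueError("correct_class must be 'spam' or 'innocent'")
    # error:  DSPAM processed the message and classified it wrongly
    # corpus: DSPAM has never processed the message
    source = "error" if processed_by_dspam else "corpus"
    return ["dspam", "--user", user, f"--class={correct_class}", f"--source={source}"]


if __name__ == "__main__":
    # A spam that DSPAM delivered as innocent must be retrained as an error.
    print(" ".join(retrain_args("bob", "spam", processed_by_dspam=True)))
    # prints: dspam --user bob --class=spam --source=error
```

To actually run it, pass the list to `subprocess.run` with the message file opened as `stdin`.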
 
It's important not to use corpus training on missed spam, because DSPAM only learns corpus messages; it doesn't relearn (correct) them. The missed spam was already learned as innocent when DSPAM processed it, so you'll end up with 1 spam tick mark and 1 innocent tick mark instead of the correct result: 1 spam tick mark and 0 innocent tick marks.
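The tick-mark arithmetic can be reproduced with a toy model. This is a hypothetical illustration, not DSPAM's data structures or algorithm. It assumes the filter learns every message it processes under its own verdict (which is what produces the stray innocent tick). Under that assumption, error retraining adds the correct tick and removes the wrong one, while corpus training only adds.

```python
from collections import Counter

CLASSES = ("spam", "innocent")


class ToyTally:
    """Hypothetical spam/innocent tick marks for one message's tokens (not DSPAM's code)."""

    def __init__(self) -> None:
        self.ticks = Counter({c: 0 for c in CLASSES})

    def process(self, verdict: str) -> None:
        # Assumption: the filter learns every message it processes under its own verdict.
        self.ticks[verdict] += 1

    def retrain(self, correct: str, source: str) -> None:
        if correct not in CLASSES or source not in ("error", "corpus"):
            raise ValueError("bad class or source")
        self.ticks[correct] += 1
        if source == "error":
            # Error retraining also removes the tick the wrong verdict added.
            wrong = "innocent" if correct == "spam" else "spam"
            self.ticks[wrong] -= 1
        # Corpus training only adds; the earlier wrong tick stays.


def missed_spam(source: str) -> dict:
    tally = ToyTally()
    tally.process("innocent")  # the spam slipped through as innocent
    tally.retrain("spam", source)
    return dict(tally.ticks)


if __name__ == "__main__":
    print("corpus:", missed_spam("corpus"))  # {'spam': 1, 'innocent': 1}
    print("error: ", missed_spam("error"))   # {'spam': 1, 'innocent': 0}
```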

Total posts: 434 | Registered: 30-03-2003 | Posted: 11:07 21-02-2007
   
