Lmst

LOL.
Until last week, the scraperbots were poisoning models with the arcane performance details and masscheck logs of the ASF #SpamAssassin project. 2 decades of data that requires a deep knowledge of SA to make any sense of. It's freely available as a matter of principle. Between thousands of days & hundreds of rules with about a dozen distinct corpora, we poisoned the models with TBs of junk.
Now we're just sending '404 Forbidden' tens of thousands of times per hour.
https://mastodon.social/@wingo/115340116991402157

Finally found the right trick for protecting the #SpamAssassin RuleQA site: no more deep links into date+rule detail. You've gotta navigate in, or at least lie credibly about it.
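
A minimal sketch of what a "navigate in" gate could look like, assuming it keys off the Referer header (the post does not say how it is actually implemented). The host name `ruleqa.example.org` and the `/detail` path layout are hypothetical placeholders, not the real RuleQA configuration:

```python
from typing import Optional
from urllib.parse import urlparse

SITE_HOST = "ruleqa.example.org"  # hypothetical host name


def is_detail_path(path: str) -> bool:
    # Hypothetical layout: /<date>/<rule>/detail
    return path.rstrip("/").endswith("/detail")


def allow_request(path: str, referer: Optional[str]) -> bool:
    """Allow deep detail pages only when the client claims to come from this site."""
    if not is_detail_path(path):
        return True  # index and listing pages stay open to everyone
    if not referer:
        return False  # direct deep link with no Referer at all
    return urlparse(referer).hostname == SITE_HOST
```

A client can of course forge the header, which matches the "lie credibly about it" caveat: the gate only filters out crawlers that do not bother.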

As I posted on the SA mailing lists, I regret (slightly) the fact that our incomprehensible stats dumps are no longer serving to gum up the works of whatever LLMs are being so thirsty for sewage.

The spread of network space being used to DDoS the ASF #SpamAssassin RuleQA server is amazing. 25k+ unique /20 networks involved.
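
For anyone wanting to do a similar count on their own logs, here is a rough sketch of grouping client addresses into /20 networks with Python's standard `ipaddress` module. The input is assumed to be a list of IPv4 address strings already extracted from an access log; the function name is made up for illustration:

```python
import ipaddress
from collections import Counter


def count_by_slash20(addresses):
    """Count requests per enclosing /20 network (IPv4 addresses assumed)."""
    counts = Counter()
    for addr in addresses:
        # strict=False masks off the host bits instead of raising an error
        net = ipaddress.ip_network(f"{addr}/20", strict=False)
        counts[str(net)] += 1
    return counts


sample = ["192.0.2.1", "192.0.2.200", "198.51.100.7"]
per_net = count_by_slash20(sample)
print(len(per_net), "unique /20 networks")
```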

But I think I have it constrained. Too bad about China, Russia, and Brazil...

@dalias @musl Similar pattern for the ASF #SpamAssassin infra. Once we got a lot of the source networks blocked, the overload got much more bursty. I suspect that some are blindly cycling the query load around their various network providers without bothering to see when/where they are being blocked.
This is a pattern very similar to some historical spambots. They operate on a scale where the blockages are not worth tracking.

We are NOT AMUSED.

If you're hosted by Aceville, you're dead to me.

#Sysadminnery #SpamAssassin #AIBots

Screenshot from the VM used for SpamAssassin RuleQA. It shows a run of the command 'w', which reveals one logged-in user and triple-digit load averages.

Raw Text:

root@sa-vm:/var/log/apache2# w
14:28:18 up 111 days, 8:42, 1 user, load average: 208.31, 203.42, 138.01
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
billcole pts/0 67.149.19.2 14:02 4.00s 7.52s 0.22s sshd: billcole [priv]

How the open web closes, FOSS edition...

We are running a contest to create a new logo for ASF #SpamAssassin, but because of the steps taken to consolidate wikis for #TheASF and protect the ASF Confluence instance from various malefactors we have yet to figure out how to allow submissions from people who don't want to have an ASF account.

Gio is working with the Infra team to find a solution.

I need some help with #SpamAssassin because I am lost… I am trying to "fix" Microsoft's #M365 weird behaviour with sending calendar invites which, thanks to whatever they are doing now, is super-spammy:

1. the Message-Id contains a new line, i.e. (Message-Id:\nSomething very long) which breaks stuff
2. their relays (at least in Europe) present a different name at HELO than the PTR records - they are clearly migrating the naming to be regional but forgot about anti-spam, SPF and DMARC, for example:

UZPR83CU001.outbound.protection.outlook.com (mail-northeuropeazlp17012010.outbound.protection.outlook.com [40.93.64.10])

3. they also love sending empty messages, as in completely empty.

The behaviour is not consistent, that is to say that if you send a meeting invite to 90 or so people, then about 80% come back with some form of reply which is catalogued as spam with:

0.9 FORGED_SPF_HELO No description available.
1.8 DMARC_REJECT DMARC reject policy
2.3 EMPTY_MESSAGE Message appears to have no textual parts

Obviously I could "turn off" the rules but I would like to do so selectively for just a bunch of IPs (i.e. the damned Exchange ones).

Would anyone be able to help me with writing a conditional rule? Can it even be done? I've been searching my life away but I land on either AI-generated text or "just whitelist the IPs" which is not what I want to do.

:flan_despair:
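
The logic being asked for (cancel specific rule scores only when the message arrived from specific relays) can be sketched in Python as below. This is not SpamAssassin configuration; in SA itself the usual approach, as far as I know, is a meta rule that combines a relay match with the rule in question and carries a compensating negative score. The network range `40.93.64.0/24` is a hypothetical stand-in built around the one address quoted above, and `OTHER` is a made-up rule name:

```python
import ipaddress

# Hypothetical range; the real set of Exchange Online relays is much larger.
TRUSTED_NETS = [ipaddress.ip_network("40.93.64.0/24")]

# Rules to neutralise, taken from the report quoted in the post.
COMPENSATED = {"FORGED_SPF_HELO", "DMARC_REJECT", "EMPTY_MESSAGE"}


def adjusted_score(relay_ip, hits):
    """hits maps rule name -> score; returns the total after compensation."""
    total = sum(hits.values())
    ip = ipaddress.ip_address(relay_ip)
    if any(ip in net for net in TRUSTED_NETS):
        # Subtract exactly what the selected rules contributed, nothing more.
        total -= sum(s for rule, s in hits.items() if rule in COMPENSATED)
    return round(total, 2)


hits = {"FORGED_SPF_HELO": 0.9, "DMARC_REJECT": 1.8, "EMPTY_MESSAGE": 2.3, "OTHER": 1.0}
print(adjusted_score("40.93.64.10", hits))
print(adjusted_score("203.0.113.5", hits))
```

Unlike a blanket whitelist, every other rule still counts against mail from those relays.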

As a consequence of The Mothership changing its logo, ASF #SpamAssassin is also doing so, with a contest open to all.

We have not yet put together the details for this, but the last time we did this (before I even *used* SA) was 2004 and we did it like this then: https://cwiki.apache.org/confluence/display/SPAMASSASSIN/LogoContest

That page will change when we figure out a one-liner to revise all the dates.
(That's a Perl joke. Please Laugh.) https://fosstodon.org/@TheASF/115186533653931867

This is only bad because of the fully bogus URIs (a minority actually) and the breakneck pace of connections. If they'd just hit real rule detail pages at a serviceable rate, it would do a fabulous job of feeding an endless supply of incomprehensible and essentially meaningless numbers into their models.

#SpamAssassin

A couple hours of trying to elbow my way into the #SpamAssassin RuleQA server for repairs has finally succeeded. Now to work on a machine that takes 2 minutes to respond to anything. #FML #Sysadminnery

DAMNIT

Something is abusing the #SpamAssassin RuleQA system again. I assume it's AI only because it hits URLs that seem like they could exist but do not. Generated URLs structured like real "detail" pages that have bogus dates, bogus rule names, or a valid date and valid rule name, but the rule didn't exist at that time.

Pounding it so hard that I can't get in to fix it.

Incidentally, the suspicious TLD lists are public and here's the file from which it is drawn:

https://svn.apache.org/repos/asf/spamassassin/trunk/rulesrc/sandbox/pds/20_ntld.cf

Edit: That's now a direct link, since apparently the "ViewVC" tool has been put behind authentication.

If you use SA you can find the current lists in the 72_active.cf file of the daily rules distribution.

#PSA #SpamAssassin #spam #Sysadminnery #domains

This #PSA was brought to you by the 4th person in 3 years to report a SpamAssassin bug due to their .online domain.

I feel for these people, really, I do. It is absolutely unfair that some gTLDs correlate so strongly to spam. BUT THEY DO. We have to cap the strength of the relevant rules or the rescoring system would make a "wrong" domain reliably fatal.

#SpamAssassin #spam #Sysadminnery #domains

#PSA: BEFORE selecting a domain name which you want to use for email, you definitely should consult the #SpamAssassin list of "suspicious" gTLDs. Those are gTLDs which have been so badly run that the overwhelming majority (99%+ in most cases) of messages using them for email addresses or even in URIs in the body are #spam. This means that if you pick example.online or examp-le.pro or any other domain in a suspicious gTLD, your mail will have delivery problems.

#Sysadminnery #domains
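
A quick way to sanity-check a candidate domain before registering it, sketched under the assumption that you have already extracted the TLD list from the SpamAssassin rules into a plain set. The two entries below are only the examples named in the post, not the full list, and the function name is made up:

```python
# Illustrative subset only; the real list lives in the SpamAssassin rules files.
SUSPICIOUS_TLDS = {"online", "pro"}


def in_suspicious_tld(domain, suspicious=SUSPICIOUS_TLDS):
    """Return True if the domain's last label is in the suspicious set."""
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in suspicious


for candidate in ("example.online", "examp-le.pro", "example.org"):
    print(candidate, in_suspicious_tld(candidate))
```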

WARNING: #SpamAssassin will take pathologically long times to check pathological message content. <sigh>

There are 2 views on this. One is that it is best to train/test with such 'spicy ham' messages because that makes SA less likely to mark them as spam. My view is that sysadmins handle such pathological mail in such tiny volumes that teaching SA to treat it nicely is a waste of time and CPU.

Exempt your ham from heuristic analysis. Especially the spicy stuff.

https://bsd.network/@gbechis/115167091439496063

Hrm… it turns out that the just-released #SpamAssassin 4.0.2’s more aggressive CNAME chasing is causing some people problems with slowness, apparently related to slow DNS and/or specific pathological names.

I didn’t see any issues myself despite having run “trunk” for months. I am tempted to blame reporters’ crappy DNS, but maybe I misunderstand. In any case, if you see trouble you may want to check out the latest revision from svn/git

#spam #email

@blindcoder
Have a look at
/etc/spamassassin/local.cf
whitelist_from *@anymail.com

If spamassassin works with amavis you will find more settings in
/etc/amavis/conf.d/20-debian_defaults

Search for $whitelist_sender or @whitelist_sender_maps. These entries are self-explanatory.

In the same file search for
ENVELOPE SENDER SOFT-WHITELISTING / SOFT-BLACKLISTING

This allows you to control the scoring. (NOTE: positive: black, negative: white)

Go to the array definition and add something like the following to the whitelist part:

'.foo.example.com' => -6.0,
'nobody@cert.org' => -3.0,

Then restart your services like #amavis, #spamassassin, #postfix, #dovecot ...

Is there an option in SpamAssassin to exclude certain domains from Spamhaus DBL checks?
Let's say SH thinks "foo.example.com" is bad, but I want email from them anyway, what do I need to configure in spamassassin?

#Spamhaus #Spamassassin #Email #Selfhosted

If you're using #spamassassin on #debian, please consider testing the 4.0.2-rc1 packages. They're currently in experimental for unstable and built for bookworm in the bookworm-backports suite in my personal repo on people.debian.org. https://people.debian.org/~noahm/repo/

My plan is to update bookworm and trixie point releases to 4.0.2 once it's released, so all the testing I can get is helpful. Please report any issues via the BTS or directly to me.

Whew. I think I finally got spamass-milter talking to spamd. At least, there's no errors in the logs and my test emails arrive in a reasonable time frame.

Now just to wait for more spam.

#Postfix #SpamAssassin
