Thursday, August 31. 2006
Serendipity Spam Statistics
I just downloaded this great looking spam statistics plugin for Serendipity from Andreas. Unfortunately, once installed, it didn't seem to work, so I got stuck in to see what was up.
Turns out it only works when the spamblock plugin logs to the database, so I'll either look into making it work with log files or maybe think about adding something to the admin stats plugin if that's possible. Or do neither, since it's not uber important to me: I already get a raft of info on the spam stats each night via a cron job.

That cron job checks various spam-related things on a daily basis - referer spam, quarantined files uploaded via PHP, mod_security log entries that need attention and finally serendipity / weblog spam. The situation with weblog spam had gotten so bad on the old domain munk.nu that I even ended up creating a script to convert spamblock log entries into firewall rules for ipf. I'm not kidding, at least 100 trackback spam entries per day through June and July - for the year 2006 so far there are nearly 9000 unique IPs dropping new trackback spam. What's annoying too is that even after adding offending IPs to my firewall block list, each and every new day there would be another 100 new unique IP addresses spamming the blog. No doubt this is a botnet - 100 new zombies found per day sounds like a professional organisation. Ho hum.

Anyway I'll add the 'log2ipf.pl' perl script in the extended part of this article. It's little more than an extended 'grep | sed' which searches for text in a file and then reports how many results it found for each item. In the default case using just 'log2ipf.pl somefile.log' it searches for:

CODE:
"s9y"=>qr/.*\[REJECTED: (?:No API-created comments|Trackback URL invalid|Filtered by Akismet\.com).*, IP (.*?)\].*/,

In this case it reports a list of IP addresses and how many times each IP address was 'caught' trying to spam - but it could be modified to do anything.
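To make the log-to-firewall idea concrete, here is a rough shell sketch of the two steps: tally the rejected IPs, then print one ipf block rule per offender. It is not the actual log2ipf.pl. The function names `count_spam_ips` and `to_ipf_rules` are made up for illustration, the log line layout is only inferred from the "s9y" pattern above, and the `block in quick from ... to any` rule form is an assumption about what such a block list might contain.

```shell
#!/bin/sh
# Sketch only: assumes spamblock log lines contain text such as
#   [REJECTED: Trackback URL invalid, IP 192.0.2.7]
# which is inferred from the "s9y" pattern, not copied from a real log.

# Print "count ip" for each rejected IP, least frequent first.
# Reads the named files, or stdin when no file is given.
count_spam_ips() {
    grep -E '\[REJECTED: (No API-created comments|Trackback URL invalid|Filtered by Akismet\.com)' "$@" |
        sed -n 's/.*, IP \([^]]*\)].*/\1/p' |
        sort | uniq -c | sort -n |
        awk '{ print $1, $2 }'
}

# Turn "count ip" lines into ipf rules for IPs seen at least $1 times
# (default 1). The rule form is an assumption, not the author's output.
to_ipf_rules() {
    awk -v min="${1:-1}" '$1 >= min { printf "block in quick from %s/32 to any\n", $2 }'
}
```

Something like `count_spam_ips spamblock.log | to_ipf_rules 3` (with spamblock.log as a placeholder file name) would then print rules only for IPs caught three or more times, which can be reviewed before going anywhere near the firewall.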
For example I have another 'filter' set up to see how many people use a google search to find pics on my server by searching for the term 'picasa.ini':

CODE:
"picasa" =>qr/^.*?\s+(.*?)\s+.*%22index\+of%22\+%2F\+picasa\.ini.*/

so I can feed apache logfiles to log2ipf.pl using this commandline:

CODE:
; log2ipf.pl -l picasa /var/log/httpd/all/2006/07/*/*
24.242.97.20: 1
67.141.28.129: 1

telling me there were just 2 such searches during July 2006 (woo). I seem to remember that search returning more than that at the time I wrote the filter though lol. You get the idea anyway.

To add a new 'filter', the best thing to do is take a sample logfile line you want to match, then customize the script's %re variable to include your custom filter. For example, say you wanted to search for auth log failures for SSH (this is actually done for you by the periodic utility on FreeBSD if you set it up in /etc/periodic.conf, but that's another article!) - you could write something like this for the %re filter:

CODE:
my %re=(
    "s9y"=>qr/.*\[REJECTED: (?:No API-created comments|Trackback URL invalid|Filtered by Akismet\.com).*, IP (.*?)\].*/,
    #Example of logfile line we want to catch:
    # Aug 26 14:57:35 users sshd[30136]: Failed password for root from 211.48.62.102 port 50706 ssh2
    "ssh" =>qr/.*Failed password for .* from (.*?) .*/,
    "picasa" =>qr/^.*?\s+(.*?)\s+.*%22index\+of%22\+%2F\+picasa\.ini.*/
);

which would result in:

CODE:
; log2ipf.pl -l ssh /var/log/auth.log
168.126.71.148: 1
210.34.14.53: 3
84.10.149.105: 3
211.48.62.102: 3
220.231.54.232: 3
195.10.193.4: 5
213.179.181.26: 11

As I say you can do the equivalent with grep, sed, sort and uniq on the commandline:

CODE:
; grep "Failed password for" /var/log/auth.log | sed -e 's/.*Failed password for .* from \([^ ]*\).*/\1/' \
    | sort | uniq -c | sort -n
 1 168.126.71.148
 3 210.34.14.53
 3 211.48.62.102
 3 220.231.54.232
 3 84.10.149.105
 5 195.10.193.4
11 213.179.181.26

But for a very large file the timing differences between this method and the perl script are massive.

Anyhoo this is turning into a crazy long entry so I'll turn it in. The script log2ipf.pl - should rename that since it's got little to do with ipf really! - is in the extended article below if anyone's interested.
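Before adding a new filter it can help to check the pattern against the one sample line you started from. The sketch below does that with sed rather than Perl, so the lazy `(.*?) ` from the %re entry is written as `\([^ ]*\)`, the same way the grep | sed version does it. `try_filter` is a hypothetical helper, not part of log2ipf.pl, and it assumes the pattern has exactly one capture group and contains no `#` character.

```shell
#!/bin/sh
# Sketch: try a candidate pattern on a single sample log line.
# try_filter is a made-up helper name for illustration.

# Usage: try_filter 'sed BRE with one \( \) group' 'sample log line'
# Prints the captured text; returns non-zero when nothing was captured.
try_filter() {
    pattern=$1
    sample=$2
    # '#' is used as the sed delimiter, so the pattern must not contain it.
    captured=$(printf '%s\n' "$sample" | sed -n "s#$pattern#\\1#p")
    [ -n "$captured" ] || return 1
    printf '%s\n' "$captured"
}
```

Run with the SSH sample line from the %re comment and the pattern `.*Failed password for .* from \([^ ]*\).*`, it prints the offending IP; a pattern that doesn't match prints nothing and returns a failure status, which is the cue to fix it before it goes into the script.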

