Thursday, August 31. 2006
Serendipity Spam Statistics
I just downloaded this great looking spam statistics plugin for Serendipity from Andreas. Unfortunately, after installing it, it didn't seem to work, so I got stuck in to see what was up.
Turns out it only works when the spamblock plugin logs to the database, so I'll either look into making it work with log files or maybe think about adding something to the admin stats plugin if that's possible. Or do neither, since it's not uber important to me: I already get a raft of info on the spam stats each night via a cron job.
That cron job checks various spam-related things on a daily basis: referer spam, quarantined files uploaded via PHP, mod_security log entries that need attention and finally serendipity / weblog spam.
The situation with weblog spam had gotten so bad on the old domain munk.nu that I even ended up creating a script to convert spamblock log entries into firewall rules for ipf. I'm not kidding, at least 100 trackback spam entries per day through June and July - for the year 2006 so far there are nearly 9000 unique IPs dropping new trackback spam. What's annoying too is that even after adding offending IPs to my firewall block list, each and every new day there would be another 100 new unique IP addresses spamming the blog. No doubt this is a botnet - 100 new zombies found per day sounds like a professional organisation. Ho hum.
Anyway I'll add the 'log2ipf.pl' perl script in the extended part of this article. It's little more than an extended 'grep | sed' which searches for text in a file and then reports how many results it found for each item. In the default case, using just 'log2ipf.pl somefile.log', it searches for:

CODE:
"s9y"=>qr/.*\[REJECTED: (?:No API-created comments|Trackback URL invalid|Filtered by Akismet\.com).*, IP (.*?)].*/,

In this case it reports a list of IP addresses and how many times each IP address was 'caught' trying to spam - but it could be modified to do anything.
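As a rough sketch of that default case (not the actual script, which is in the extended article), the match-and-count idea boils down to something like the following. The sample log lines, the simplified pattern and the count_hits helper are all made up for illustration; real spamblock log lines may well look different.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, simplified pattern: capture whatever follows "IP "
# up to the next closing bracket on a REJECTED line.
my $re = qr/\[REJECTED: .*, IP (.*?)\]/;

# Count how many lines match $pattern for each captured value.
# Returns a hash reference of captured value => number of hits.
sub count_hits {
    my ($pattern, @lines) = @_;
    my %hits;
    for my $line (@lines) {
        next unless $line =~ $pattern;
        $hits{$1}++;
    }
    return \%hits;
}

# Made-up sample lines, only loosely modelled on a spamblock log.
my @sample = (
    '[REJECTED: Trackback URL invalid] - [#1, IP 192.0.2.10] - [x]',
    '[REJECTED: Filtered by Akismet.com] - [#2, IP 192.0.2.10] - [x]',
    '[REJECTED: Trackback URL invalid] - [#3, IP 198.51.100.7] - [x]',
    '[APPROVED: ok] - [#4, IP 203.0.113.5] - [x]',
);

my $hits = count_hits($re, @sample);

# Report the least frequent offenders first.
for my $ip (sort { $hits->{$a} <=> $hits->{$b} || $a cmp $b } keys %$hits) {
    printf "%15s:%10s\n", $ip, $hits->{$ip};
}
```

Run as is, this reports 198.51.100.7 once and 192.0.2.10 twice; the APPROVED line is ignored. Swap in a different pattern and capture group and the same loop counts something else entirely.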
For example I have another 'filter' set up to see how many people use a Google search to find pics on my server by searching for the term 'picasa.ini':

CODE:
"picasa" =>qr/^.*?\s+(.*?)\s+.*%22index\+of%22\+%2F\+picasa\.ini.*/

so I can feed apache logfiles to log2ipf.pl using this commandline:

CODE:
; log2ipf.pl -l picasa /var/log/httpd/all/2006/07/*/*
24.242.97.20: 1
67.141.28.129: 1

telling me there were just 2 such searches during July 2006 (woo). I seem to remember that search returning more than that at the time I wrote the filter though lol. You get the idea anyway.
To add a new 'filter', the best thing to do is grab a sample logfile line you want to produce a result for, then customize the script's %re variable to include your custom filter. For example, say you wanted to search for auth log failures for SSH (this is actually done for you by the periodic utility on FreeBSD if you set it up in /etc/periodic.conf, but that's another article!) - you could write something like this for the %re filter:

CODE:
my %re=(
    "s9y"    =>qr/.*\[REJECTED: (?:No API-created comments|Trackback URL invalid|Filtered by Akismet\.com).*, IP (.*?)].*/,
    #Example of logfile line we want to catch:
    # Aug 26 14:57:35 users sshd[30136]: Failed password for root from 211.48.62.102 port 50706 ssh2
    "ssh"    =>qr/.*Failed password for .* from (.*?) .*/,
    "picasa" =>qr/^.*?\s+(.*?)\s+.*%22index\+of%22\+%2F\+picasa\.ini.*/
);

which would result in:

CODE:
; log2ipf.pl -l ssh /var/log/auth.log
168.126.71.148: 1
210.34.14.53: 3
84.10.149.105: 3
211.48.62.102: 3
220.231.54.232: 3
195.10.193.4: 5
213.179.181.26: 11

As I say you can do the equivalent with grep, sed, sort and uniq on the commandline:

CODE:
; grep "Failed password for" /var/log/auth.log | sed -e 's/.*Failed password for .* from \([^ ]*\).*/\1/' \
    | sort | uniq -c | sort -n
1 168.126.71.148
3 210.34.14.53
3 211.48.62.102
3 220.231.54.232
3 84.10.149.105
5 195.10.193.4
11 213.179.181.26

But for a very large file the timing differences between this method and the perl script are massive. Anyhoo this is turning into a crazy long entry so I'll turn it in. The script log2ipf.pl - should rename that really since it's got little to do with ipf! - is in the extended article below if anyone's interested.

CODE:
#!/usr/bin/perl -w
# File: log2ipf.pl
# Author: Jez Hancock
# Description: strips a list of IP addresses out of a logfile based on
# various perl regular expressions.
# License: GPL

use strict;
use Getopt::Long;

########################################################################
# Start config
########################################################################
eval 'exec /usr/bin/perl -w -S $0 ${1+"$@"}'
    if 0; # not running under some shell

my $progname = $0;
$progname =~ s,.*/,,;    # use basename only
$progname =~ s/\.\w*$//; # strip extension, if any

my $VERSION = sprintf("%d.%d", q$Revision: 0.1 $ =~ /(\d+)\.(\d+)/);

# Commandline params:
my($help, $verbose, $debug, $err)=
    ( undef, undef, undef, undef );

# Path to the ipfilter ruleset:
my $ipf_rules="/etc/ipf.rules";

# Threshold of blocked entries above which we should include this IP:
my $threshold = 0;

# list of offending ips:
my %ips;

# indicates to use ipf ruleset output:
my $ipf=0;

# Holds the ipf rules created if applic:
my $ipfrules="";

# count of abuse entries:
my $count=0;

# for buffering output:
my $output="";

# whether to add a comment block at the top of ipf rules:
my $comments=0;

# whether to display list of offending ip addresses only:
my $ips_only=0;

# whether or not to output rules for an ip if a rule already exists
# in /etc/ipf.rules for it:
my $check_dups=0;

# Type of logfile we're searching in, default to s9y:
my $logtype="s9y";

# whether to output ipf rules in a form ready for execution on the
# commandline:
my $executable=0;

# Types of regular expression to use for different types of logs
# the regexp should contain one single pair of capturing brackets - the
# IP of the offending host:
my %re=(
    "s9y"    =>qr/.*\[REJECTED: (?:No API-created comments|Trackback URL invalid|Filtered by Akismet\.com).*, IP (.*?)].*/,
    "picasa" =>qr/^.*?\s+(.*?)\s+.*%22index\+of%22\+%2F\+picasa\.ini.*/
);
########################################################################
# End config
########################################################################

sub usage{
    $err && (print $err,"\n");
    die<<"~USAGE~";
Usage: $progname [-h] [-v] [-d] [-i [-c] [-e]] [-z] [-n] [-t <number>] [-l <logtype>] [file ...]
    -h            Display this help.
    -v            Display verbose logging output.
    -d            Display added debug info.
    -i            Print out results in ipf rule format.
    -c            Add a comment block at the top of the ipf rules
                  (only valid with -i).
    -z            Don't display duplicate rules - if a rule already exists
                  in /etc/ipf.rules for the current ip, don't create another.
    -n            Only print a list of offending IP addresses and exit.
    -e            Make any ipf rules output 'executable'.
    -t <number>   Number of times an IP must have been blocked for us to
                  include it in our reports.
    -l <logtype>  Specify the <logtype> to search in. Valid options are:
                  s9y or picasa

$progname takes lines of spamblock log entries from the serendipity
weblogging application and works out how many times each IP address has
attempted to 'spam' us.

With the -t <number> option, only list IPs with at least <number>
attempts.

With the -l <logtype> option, allows the user to specify alternative
types of logfile to search for attack vectors in. Current supported
searches are s9y and picasa.

With the -e option, $progname will output ipfilter rules in a form
suitable for executing from the commandline. Output looks as follows:

    echo 'block in quick from x.x.x.x to any group 100' | ipf -f -

Can be executed by adding '|sh' onto the end of the $progname command
once output has been checked over.

Version $VERSION
~USAGE~
}

MAIN:{
    # validate input:
    GetOptions(
        'v|verbose'    => \$verbose,
        'h|help'       => \$help,
        'd|debug'      => \$debug,
        't=i'          => \$threshold,
        'c'            => \$comments,
        'n'            => \$ips_only,
        'z'            => \$check_dups,
        'i|ipf'        => \$ipf,
        'e|executable' => \$executable,
        'l=s'          => \$logtype
    );
    usage() if($help);

    if($comments && !$ipf){
        $err="-c specified without -i, must use -i with -c\n";
        usage();
        exit;
    }
    if($ips_only && $ipf){
        $err="-n makes no sense with -i, quitting\n";
        usage();
        exit;
    }
    if(!defined($re{$logtype})){
        $err="Logtype '$logtype' not recognized, see below for valid options for -l\n";
        usage();
        exit;
    }

    # Read the logfile(s) named on the commandline (or stdin) and count
    # one hit per matching line against the captured IP:
    while(<>){
        next if(!/$re{$logtype}/);
        $ips{$1}++;
        $count++;
    }

    print "Total log entries for period: $count\n\n" if $verbose;
    print "Unique IPs for period: " . scalar(keys %ips) . "\n\n" if $verbose;
    if($threshold && $verbose){
        print "Threshold is set to $threshold - only display IPs with $threshold or more entries.\n";
    }

    my $format_space=4;
    # Walk the IPs from fewest to most hits:
    IP: foreach (sort {$ips{$a}<=>$ips{$b}} keys %ips) {
        if($ips{$_}>=$threshold){
            if($check_dups){
                # check if this ip is already in /etc/ipf.rules:
                my $ipf_check="";
                $ipf_check=`grep $_ $ipf_rules`;
                chomp $ipf_check;
                if ( $ipf_check ne ""){
                    $verbose && print "Skipping $_, already exists in $ipf_rules...\n";
                    next IP;
                }
            }
            if($ipf){
                $output.= $executable ? "echo '" : "";
                $output.= sprintf "block in quick from %15s to any group 100", $_;
                $output.= $executable ? "' | ipf -f -" : "";
                $output.= "\n";
            }elsif($ips_only){
                $output.="$_\n";
            }else{
                $output.=sprintf "%15s:%10s\n", $_, $ips{$_};
            }
        }
    }

    if($output ne ""){
        if($ipf && $comments){
            my $date=`date`;
            chomp $date;
            $output="\n# $date - entries blocked by $logtype:\n".$output."\n";
        }
        print $output;
    }
}

