<?xml version="1.0" encoding="utf-8" ?>

<rss version="2.0" 
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:admin="http://webns.net/mvcb/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
   xmlns:wfw="http://wellformedweb.org/CommentAPI/"
   xmlns:content="http://purl.org/rss/1.0/modules/content/"
   xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">
<channel>
    <title>freebsd.munk.me.uk - ipfilter</title>
    <link>http://freebsd.munk.me.uk/</link>
    <description>FreeBSD System Administration</description>
    <dc:language>en</dc:language>
    <generator>Serendipity 1.5.2 - http://www.s9y.org/</generator>
    
    <image>
        <url>http://freebsd.munk.me.uk/templates/default/img/s9y_banner_small.png</url>
        <title>RSS: freebsd.munk.me.uk - ipfilter - FreeBSD System Administration</title>
        <link>http://freebsd.munk.me.uk/</link>
        <width>100</width>
        <height>21</height>
    </image>

<item>
    <title>Serendipity Spam Statistics</title>
    <link>http://freebsd.munk.me.uk/archives/170-Serendipity-Spam-Statistics.html</link>
            <category>ipfilter</category>
            <category>Perl</category>
            <category>Serendipity</category>
            <category>Spam</category>
    
    <comments>http://freebsd.munk.me.uk/archives/170-Serendipity-Spam-Statistics.html#comments</comments>
    <wfw:comment>http://freebsd.munk.me.uk/wfwcomment.php?cid=170</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://freebsd.munk.me.uk/rss.php?version=2.0&amp;type=comments&amp;cid=170</wfw:commentRss>
    

    <author>nospam@example.com (munk)</author>
    <content:encoded>
    I just downloaded this great-looking &lt;a href=&quot;http://andreas.id.au/blog/archives/77-Akismet-Spam-Statistics.html&quot;  title=&quot;Spam Statistics Plugin&quot;&gt;spam statistics plugin for Serendipity from Andreas&lt;/a&gt;.  Unfortunately, after installing it, it didn&#039;t seem to work, so I got stuck in to see what was up.&lt;br /&gt;
&lt;br /&gt;
Turns out it only works when the &lt;a href=&quot;http://blog.s9y.org/archives/123-Spamblock-Improvements,-Part-II.html&quot;  title=&quot;Serendipity Spamblock Plugin&quot;&gt;spamblock plugin&lt;/a&gt; logs to the database, so I&#039;ll either look into making it work with log files or maybe think about adding something to the admin stats plugin if that&#039;s possible.  Or do neither, since it&#039;s not uber important to me: I already get a raft of info on the spam stats each night via a cron job.&lt;br /&gt;
&lt;br /&gt;
I have a cron job that checks various things spam related on a daily basis - checking for referer spam, quarantined files uploaded via PHP, mod_security log entries that need attention and finally checking for serendipity / weblog spam.  The situation with weblog spam had gotten so bad on the old domain munk.nu that I even ended up creating a script to convert spamblock log entries into firewall rules for ipf.  I&#039;m not kidding, at least 100 trackback spam entries per day through June and July - for the year 2006 so far there are nearly 9000 unique IPs dropping new trackback spam.&lt;br /&gt;
&lt;br /&gt;
What&#039;s annoying too is that even after adding offending IPs to my firewall block list, each and every new day there would be another 100 new unique IP addresses spamming the blog.  No doubt this is a botnet - 100 new zombies found per day sounds like a professional organisation.&lt;br /&gt;
&lt;br /&gt;
Ho hum.  Anyway I&#039;ll add the &#039;log2ipf.pl&#039; perl script in the extended part of this article.  It&#039;s a perl script that&#039;s little more than an extended &#039;grep | sed&#039; which searches for text in a file and then reports how many results it found for each item.  In the default case using just &#039;log2ipf.pl somefile.log&#039; it searches for:&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;bb-code-title&quot;&gt;CODE:&lt;/div&gt;&lt;div class=&quot;bb-code&quot;&gt;&quot;s9y&quot;=&amp;#62;qr/.&amp;#42;\&amp;#91;REJECTED&amp;#58;&amp;#160;&amp;#40;?&amp;#58;No&amp;#160;API-created&amp;#160;comments|Trackback&amp;#160;URL&amp;#160;invalid|Filtered&amp;#160;by&amp;#160;Akismet\.com&amp;#41;.&amp;#42;,&amp;#160;IP&amp;#160;&amp;#40;.&amp;#42;?&amp;#41;\&amp;#93;.&amp;#42;/,&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
In this case it reports a list of IP addresses and how many times each IP address was &#039;caught&#039; trying to spam - but it could be modified to do anything.  For example I have another &#039;filter&#039; set up to see how many people use a Google search to find pics on my server by searching for the term &#039;picasa.ini&#039;:&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;bb-code-title&quot;&gt;CODE:&lt;/div&gt;&lt;div class=&quot;bb-code&quot;&gt;&quot;picasa&quot;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;=&amp;#62;qr/^.&amp;#42;?\s+&amp;#40;.&amp;#42;?&amp;#41;\s+.&amp;#42;%22index\+of%22\+%2F\+picasa\.ini.&amp;#42;/&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
so I can feed apache logfiles to log2ipf.pl using this commandline:&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;bb-code-title&quot;&gt;CODE:&lt;/div&gt;&lt;div class=&quot;bb-code&quot;&gt;;&amp;#160;log2ipf.pl&amp;#160;-l&amp;#160;picasa&amp;#160;/var/log/httpd/all/2006/07/&amp;#42;/&amp;#42;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;24.242.97.20&amp;#58;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;1&lt;br /&gt;
&amp;#160;&amp;#160;67.141.28.129&amp;#58;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;1&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
telling me there were just 2 such searches during July 2006 (woo).  I seem to remember that search returning more than that at the time I wrote the filter though lol.  You get the idea anyway.&lt;br /&gt;
&lt;br /&gt;
To add a new &#039;filter&#039;, the best thing to do is grab a sample logfile line that you want to match, then customize the script&#039;s %re variable to include your custom filter.&lt;br /&gt;
&lt;br /&gt;
For example, say you wanted to search for auth log failures for SSH (this is actually done for you by the periodic utility on FreeBSD if you set it up in /etc/periodic.conf, but that&#039;s another article!) - you could write something like this for the %re filter:&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;bb-code-title&quot;&gt;CODE:&lt;/div&gt;&lt;div class=&quot;bb-code&quot;&gt;my&amp;#160;%re=&amp;#40;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&quot;s9y&quot;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;=&amp;#62;qr/.&amp;#42;\&amp;#91;REJECTED&amp;#58;&amp;#160;&amp;#40;?&amp;#58;No&amp;#160;API-created&amp;#160;comments|Trackback&amp;#160;URL&amp;#160;invalid|Filtered&amp;#160;by&amp;#160;Akismet\.com&amp;#41;.&amp;#42;,&amp;#160;IP&amp;#160;&amp;#40;.&amp;#42;?&amp;#41;\&amp;#93;.&amp;#42;/,&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;#&amp;#160;Example&amp;#160;of&amp;#160;logfile&amp;#160;line&amp;#160;we&amp;#160;want&amp;#160;to&amp;#160;catch&amp;#58;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;#&amp;#160;Aug&amp;#160;26&amp;#160;14&amp;#58;57&amp;#58;35&amp;#160;users&amp;#160;sshd&amp;#91;30136&amp;#93;&amp;#58;&amp;#160;Failed&amp;#160;password&amp;#160;for&amp;#160;root&amp;#160;from&amp;#160;211.48.62.102&amp;#160;port&amp;#160;50706&amp;#160;ssh2&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&quot;ssh&quot;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;=&amp;#62;qr/.&amp;#42;Failed&amp;#160;password&amp;#160;for&amp;#160;.&amp;#42;&amp;#160;from&amp;#160;&amp;#40;.&amp;#42;?&amp;#41;&amp;#160;.&amp;#42;/,&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&quot;picasa&quot;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;=&amp;#62;qr/^.&amp;#42;?\s+&amp;#40;.&amp;#42;?&amp;#41;\s+.&amp;#42;%22index\+of%22\+%2F\+picasa\.ini.&amp;#42;/&lt;br /&gt;
&amp;#41;;&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
which would result in:&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;bb-code-title&quot;&gt;CODE:&lt;/div&gt;&lt;div class=&quot;bb-code&quot;&gt;;&amp;#160;log2ipf.pl&amp;#160;-l&amp;#160;ssh&amp;#160;/var/log/auth.log&lt;br /&gt;
&amp;#160;168.126.71.148&amp;#58;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;1&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;210.34.14.53&amp;#58;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;3&lt;br /&gt;
&amp;#160;&amp;#160;84.10.149.105&amp;#58;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;3&lt;br /&gt;
&amp;#160;&amp;#160;211.48.62.102&amp;#58;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;3&lt;br /&gt;
&amp;#160;220.231.54.232&amp;#58;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;3&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;195.10.193.4&amp;#58;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;5&lt;br /&gt;
&amp;#160;213.179.181.26&amp;#58;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;11&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
As I say you can do the equivalent with grep, sed, sort and uniq on the commandline:&lt;br /&gt;
&lt;br /&gt;
&lt;div class=&quot;bb-code-title&quot;&gt;CODE:&lt;/div&gt;&lt;div class=&quot;bb-code&quot;&gt;;&amp;#160;grep&amp;#160;&quot;Failed&amp;#160;password&amp;#160;for&quot;&amp;#160;/var/log/auth.log&amp;#160;|&amp;#160;sed&amp;#160;-e&amp;#160;&#039;s/.&amp;#42;Failed&amp;#160;password&amp;#160;for&amp;#160;.&amp;#42;&amp;#160;from&amp;#160;\&amp;#40;&amp;#91;^&amp;#160;&amp;#93;&amp;#42;\&amp;#41;.&amp;#42;/\1/&#039;&amp;#160;\&lt;br /&gt;
&amp;#160;&amp;#160;|&amp;#160;sort&amp;#160;|&amp;#160;uniq&amp;#160;-c&amp;#160;|&amp;#160;sort&amp;#160;-n&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;1&amp;#160;168.126.71.148&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;3&amp;#160;210.34.14.53&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;3&amp;#160;211.48.62.102&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;3&amp;#160;220.231.54.232&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;3&amp;#160;84.10.149.105&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;5&amp;#160;195.10.193.4&lt;br /&gt;
&amp;#160;&amp;#160;11&amp;#160;213.179.181.26&lt;/div&gt;&lt;br /&gt;
&lt;br /&gt;
But for a very large file the timing differences between this method and the perl script are massive.&lt;br /&gt;
&lt;br /&gt;
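As for the ipf side of things, the last step is just turning a list of offending IPs into block rules.  Here&#039;s a rough sketch, assuming rules of the form &#039;block in quick from IP/32 to any&#039; suit your ruleset; the function name ips_to_ipf_rules is made up for illustration and this is not the code from log2ipf.pl:&lt;br /&gt;
&lt;br /&gt;

```shell
# Hypothetical helper (not the code from log2ipf.pl): read IP addresses on
# stdin, one per line, drop duplicates and emit one ipf block rule per address.
ips_to_ipf_rules() {
  sort -u | while read -r ip; do
    # Skip empty lines so no malformed rule is printed.
    [ -n "$ip" ] || continue
    printf 'block in quick from %s/32 to any\n' "$ip"
  done
}

# Example input: three entries, one of them a repeat offender.
printf '%s\n' 211.48.62.102 84.10.149.105 211.48.62.102 | ips_to_ipf_rules
```

&lt;br /&gt;
The output is plain text, so it can be checked by hand before it goes anywhere near the live ruleset.&lt;br /&gt;
&lt;br /&gt;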
Anyhoo this is turning into a crazy long entry so I&#039;ll turn it in.  The script log2ipf.pl - I should really rename that since it&#039;s got little to do with ipf! - is in the extended article below if anyone&#039;s interested.&lt;br /&gt;
 &lt;br /&gt;&lt;a href=&quot;http://freebsd.munk.me.uk/archives/170-Serendipity-Spam-Statistics.html#extended&quot;&gt;Continue reading &quot;Serendipity Spam Statistics&quot;&lt;/a&gt;
    </content:encoded>

    <pubDate>Thu, 31 Aug 2006 13:16:31 +0000</pubDate>
    <guid isPermaLink="false">http://freebsd.munk.me.uk/archives/170-guid.html</guid>
    <creativeCommons:license>http://creativecommons.org/licenses/by/2.5/</creativeCommons:license>
</item>

</channel>
</rss>