Blocking Annoying Penetration Test Bots

A firewalld-based pipeline to block annoying bots looking for a way in

Getting sick of bot traffic in your web logs? Me too.

The UNIX philosophy is based around creating small, single-purpose tools that each do one thing really well and integrate with other tools in a logical way. You can use this opinionated approach to leverage the power of your underlying OS through your web application. As an example, I recently implemented a small pipeline to tackle a problem that most webmasters run into: automated bot traffic. For a small business, making sense of your web analytics can be rough when more than half of your traffic comes from bots probing a wide range of IP addresses for common exploits - and you’re the target today!

So, how do we approach this? In my case, I combined a report of failed SSH logins with a report of suspicious web requests, and created a set of cronjobs that ingest those reports and automatically add the matching IP addresses to my firewall’s reject list on a daily basis. While this is in no way a complete solution, it is still quite effective, and I thought I would share it. This project implements the solution on a CodeIgniter 4 website running on a Red Hat-based Linux distro. However, the same basic principle can be applied to any PHP framework running on any Linux distro. You will just have to swap the firewall commands for those of your chosen distribution’s built-in firewall.

The CodeIgniter 4 Setup

This tutorial is not intended to be used with a WordPress website! In fact, most of the things I block are WordPress endpoints. Since I don't use WordPress, almost everything associated with it gets blocked.

The PenetrationWatcher Library Class

All CodeIgniter route requests “should” be mapped to a corresponding controller. In a default CodeIgniter 4 app, controllers extend the BaseController class, whose initController method is the place to load helpers and perform other setup. I created a simple library class called PenetrationWatcher, which looks like the following:

<?php

namespace App\Libraries;


use CodeIgniter\HTTP\RequestInterface;

class PenetrationWatcher {

    protected static $log_path = WRITEPATH."penetrations.json";

    // Keep every entry lowercase: process() lowercases the route before matching
    protected static $bad_patterns = [
        "wp-login.php",
        "admin",
        "wp-admin",
        "wp-includes",
        "wp-comments",
        "wp-sitemap",
        "administrator",
        "ads.txt",
        "phpmyadmin",
        "xmlrpc.php",
        "old",
        "new",
        "wordpress",
        "bk",
        "backup",
        "laravel",
        ".env",
        "autodiscover",
        "cgi-bin",
        ".git",
        ".hg",
        "java.lang.string"
    ];

    protected static $unused_methods = [
        'head', 'put', 'patch', 'delete'
    ];

    // Keep every entry lowercase: process() lowercases the user agent before matching
    protected static $bad_agents = [
        'python-requests', 'zgrab', 'masscan-ng', 'guzzlehttp'
    ];

    protected static $log_data = [];

    public static function init() {
        if (!file_exists(static::$log_path)) {
            $h = fopen(static::$log_path, "w");
            fwrite($h, json_encode([]));
            fclose($h);
        }
        // Fall back to an empty log if the file is empty or not valid JSON
        static::$log_data = json_decode(file_get_contents(static::$log_path), true) ?? [];
    }

    public static function knownIP($ip) {
        return array_key_exists($ip, static::$log_data);
    }

    public static function process(RequestInterface $request) {
        // Entry and management
        $route  = $request->getUri()->getPath();
        $method = $request->getMethod();
        $agent  = $_SERVER['HTTP_USER_AGENT'] ?? "NA";
        $agent  = trim($agent);
        if (static::strpos_arr(static::$unused_methods, strtolower($method))) {
            static::logMethodViolation($request);
            static::save();
        }
        if (static::strpos_arr(static::$bad_patterns, strtolower($route))) {
            static::logRouteViolation($request);
            static::save();
        }
        if (static::strpos_arr(static::$bad_agents, strtolower($agent))) {
            // Note: agent matches are recorded under the 'methods' key of the log
            static::logMethodViolation($request);
            static::save();
        }
    }

    public static function logRouteViolation(RequestInterface $request) {
        $route = $request->getUri()->getPath();
        if (static::knownIP($request->getIPAddress())) {
            if (!isset(static::$log_data[$request->getIPAddress()]['routes'])) {
                static::$log_data[$request->getIPAddress()]['routes'] = [$route];
            } else {
                static::$log_data[$request->getIPAddress()]['routes'][] = $route;
            }
        } else {
            static::$log_data[$request->getIPAddress()] = [ 'routes' => [$route] ];
        }
    }

    public static function logMethodViolation(RequestInterface $request) {
        if (static::knownIP($request->getIPAddress())) {
            if (!isset(static::$log_data[$request->getIPAddress()]['methods'])) {
                static::$log_data[$request->getIPAddress()]['methods'] = [$request->getMethod()];
            } else {
                static::$log_data[$request->getIPAddress()]['methods'][] = $request->getMethod();
            }
        } else {
            static::$log_data[$request->getIPAddress()] = [ 'methods' => [$request->getMethod()] ];
        }
    }

    public static function save() {
        $h          = fopen(static::$log_path, "w");
        $content    = json_encode(static::$log_data);
        fwrite($h, $content);
        fclose($h);
        return true;
    }

    private static function strpos_arr(array $haystack, $needle) {
        foreach ($haystack as $focus) {
            if (strpos($needle, $focus) !== false) {
                return true;
            }
        }
        return false;
    }

}
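
For orientation, here is a rough sketch of what the resulting log can look like, with a small shell helper that pulls the IPv4 keys out of it. The sample data and the helper name list_logged_ips are made up for illustration; the shape follows from the class above (one entry per IP address, each with an optional routes and methods list), and json_encode() writes compact JSON with "/" escaped as "\/" by default.

```bash
# Made-up sample of what writable/penetrations.json might contain.
sample_log='{"203.0.113.7":{"routes":["\/wp-login.php","\/.env"]},"198.51.100.23":{"methods":["head"]}}'

# Hypothetical helper: list the IPv4 keys of the log without needing PHP.
# It only recognises dotted-quad keys directly followed by ":{", which is how
# the top-level entries appear in the compact output.
list_logged_ips() {
  grep -oE '"[0-9]{1,3}(\.[0-9]{1,3}){3}":\{' | grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}'
}

printf '%s\n' "$sample_log" | list_logged_ips
```

This is only meant for a quick look from the shell; it does not handle IPv6 keys.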

The BaseController initialization

That PenetrationWatcher class is initialized in the BaseController class like this:

// Requires "use App\Libraries\PenetrationWatcher;" at the top of BaseController.php
public function initController(RequestInterface $request, ResponseInterface $response, LoggerInterface $logger) {

    parent::initController($request, $response, $logger);
    PenetrationWatcher::init();
    PenetrationWatcher::process($request);

}

In my case, all my routes are mapped to a controller and will, therefore, pass their Request object through the PenetrationWatcher class for inspection. As you can see, I have listed a small, targeted setup that inspects each request’s:

  • Route path i.e. URL
  • User Agent
  • HTTP Method

Each check has a corresponding static list of string values that the request value is compared against. The URI path, captured from $route = $request->getUri()->getPath();, is compared to the list of bad patterns. The user agent, captured from $_SERVER['HTTP_USER_AGENT'], is compared against a small list of common scraper/bot agents. The HTTP method, captured from $request->getMethod(), is compared against the methods that are not in use on my site and therefore likely indicate an automated probe. Again, these are the things that overwhelm my logs, so I have included them here; you will want to tailor the lists to your application’s web traffic. Some are obvious security targets, e.g. git repos and .env files. If someone is looking for these in their URL request, they are definitely targeting me for sensitive information, so they (the IP address) will be blocked on the next run of my firewall script.
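
If you want to try a pattern list against a few paths before deploying it, the same check is easy to mimic in the shell. The sketch below is a rough stand-in for the class's strpos_arr method, not a copy of it: the function name is_flagged_path and the shortened pattern list are my own choices for illustration.

```bash
# Shortened, illustrative subset of the pattern list (space-separated).
bad_patterns="wp-login.php wp-admin phpmyadmin .env .git cgi-bin old new"

# Returns 0 (true) when the given path contains any listed pattern.
is_flagged_path() {
  # Lowercase the input first, as the PHP code does with strtolower()
  path=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  for pattern in $bad_patterns; do
    case "$path" in
      # The quoted pattern is matched literally, anywhere in the path
      *"$pattern"*) return 0 ;;
    esac
  done
  return 1
}

is_flagged_path "/WP-Login.php" && echo "flagged: /WP-Login.php"
is_flagged_path "/newsletter" && echo "flagged: /newsletter"
is_flagged_path "/blog/my-first-post" || echo "clean: /blog/my-first-post"
```

Note the second call: because this is a plain substring test, short patterns such as "new", "old" or "admin" also match legitimate paths like /newsletter. That is fine if your site has no such routes, but it is worth checking before you let the firewall act on the results.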

The Command Classes

As you may have noticed, the PenetrationWatcher class saves this data to the writable folder as a JSON log. The class is strictly a passive sensor: it inspects a request and saves any matches that I define to a report. So now we have to get the data out of this log in a form that my firewall can ingest. I chose this approach because I sometimes like to inspect the report in a human-readable way, to get an idea of how many of my patterns are being matched and who (which locale) is performing most of the actions. To achieve this, I created three commands to inspect the log and prep it for the firewall script:

Security
pen:dump            Will delete the penetration watcher log
pen:ips             Will get IP addresses in a desired format
pen:show            Will display information from the penetration watcher log

The command classes behind these are:

  • PenetrationDump
  • PenetrationIPs
  • PenetrationShow

PenetrationDump Command Class

This class simply empties out the log every once in a while. I would recommend adding a backup feature if you’re a security research nut like me.
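
One way such a backup could look, as a minimal shell sketch that runs before the dump rather than inside the command class. The function name backup_log, the timestamped file name and the backup directory are all assumptions for illustration.

```bash
# backup_log <log-file> <backup-dir>
# Copies the log to a timestamped file and prints the path of the copy.
# Returns non-zero when there is no log to back up.
backup_log() {
  log_file=$1
  backup_dir=$2
  [ -f "$log_file" ] || return 1
  mkdir -p "$backup_dir" || return 1
  target="$backup_dir/penetrations-$(date +%Y%m%d-%H%M%S).json"
  # -p keeps the original modification time on the copy
  cp -p "$log_file" "$target" && printf '%s\n' "$target"
}
```

With the site path used later in this post, a call might look like backup_log /var/www/html/yourci4site/writable/penetrations.json /root/pen-backups, followed by the pen:dump command only if the copy succeeded.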

Usage:

Usage:
pen:dump

Description:
Will delete the penetration watcher log

Class Definition Code:

<?php

namespace App\Commands;

use CodeIgniter\CLI\BaseCommand;
use CodeIgniter\CLI\CLI;

class PenetrationDump extends BaseCommand
{

    protected $group = 'Security';
    protected $name = 'pen:dump';
    protected $description = 'Will delete the penetration watcher log';
    protected $usage = 'pen:dump';
    protected $arguments = [];
    protected $options = [];

    public function run(array $params) {
        if (!file_exists(WRITEPATH."penetrations.json")) {
            CLI::error("There is no log to dump!");exit;
        }
        unlink(WRITEPATH."penetrations.json");
        CLI::write("Log cleared!", "green");
    }
}

PenetrationIPs Command Class

This class is responsible for printing a plain list of the matched IP addresses, one per line, so my firewall script can easily feed the output into a chain of firewall-cmd invocations.

Usage:

Usage:
pen:ips [options]

Description:
Will get IP addresses in a desired format

Options:
format  json|csv outputs available
silent  Suppress no finding output for iptables

Class Definition Code:

<?php

namespace App\Commands;

use CodeIgniter\CLI\BaseCommand;
use CodeIgniter\CLI\CLI;

class PenetrationIPs extends BaseCommand {

    protected $group        = 'Security';
    protected $name         = 'pen:ips';
    protected $description  = 'Will get IP addresses in a desired format';
    protected $usage        = 'pen:ips [options]';
    protected $arguments    = [];
    protected $options      = [
        'format' => 'json|csv outputs available',
        'silent' => 'Suppress no finding output for iptables'
    ];

    public function run(array $params) {
        if ($format = CLI::getOption('format')) {
            if (!in_array($format, ['json', 'csv'])) {
                CLI::error("You must specify a format of either: json or csv.");exit;
            }
        }
        $path = WRITEPATH."penetrations.json";
        // A missing log (e.g. right after pen:dump) is treated as empty
        $log  = is_file($path) ? json_decode(file_get_contents($path), true) : [];
        if (empty($log)) {
            if (!CLI::getOption('silent')) {
                CLI::write("Hoorah! There is nothing to show.");
            }
            exit;
        }
        $output = ''; // CLI::write() expects a string
        if ($format) {
            switch ($format) {
                case 'json':
                    $output = json_encode($log);
                    break;
                case 'csv':
                    // TODO: Make CSV if you need it
                    break;
            }
            CLI::write($output);
        } else {
            foreach ($log as $ipaddress => $value) {
                CLI::write($ipaddress);
            }
        }
    }
}

PenetrationShow Command Class

This class is primarily a way for me to do a quick inspection of what’s in the log at any given time. I created some basic command switches to refine how the output is displayed.

Usage:

Usage:
pen:show <mode> [options]

Description:
Will display information from the penetration watcher log

Arguments:
mode  [ip|route] Will determine output either by IP or by Target.

Options:
route  Show all logged issues using a certain route
ip     Show all logged issues using a certain IP address

Class Definition Code:

<?php

namespace App\Commands;

use CodeIgniter\CLI\BaseCommand;
use CodeIgniter\CLI\CLI;

class PenetrationShow extends BaseCommand
{

    protected $group = 'Security';
    protected $name = 'pen:show';
    protected $description = 'Will display information from the penetration watcher log';
    protected $usage = 'pen:show <mode> [options]';
    protected $arguments = [
        'mode' => '[ip|route] Will determine output either by IP or by Target.'
    ];
    protected $options = [
        'route' => 'Show all logged issues using a certain route',
        'ip'    => 'Show all logged issues using a certain IP address'
    ];

    public function run(array $params) {
        if (empty($params)) {
            CLI::error("You must specify a mode of either: ip or route.");exit;
        }
        $mode = $params[0];
        if (!in_array($mode, ['ip', 'route'])) {
            CLI::error("You must specify a mode of either: ip or route.");exit;
        }
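        $path = WRITEPATH."penetrations.json";
        // A missing log (e.g. right after pen:dump) is treated as empty
        $log  = is_file($path) ? json_decode(file_get_contents($path), true) : [];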
        if (empty($log)) {
            CLI::write("Hoorah! There is nothing to show."); exit;
        }
        if ($mode == 'ip') {
            foreach ($log as $ipaddress => $value) {
                $route_offenses     = (isset($value['routes'])) ? count($value['routes']) : 0;
                $method_offenses    = (isset($value['methods'])) ? count($value['methods']) : 0;
                CLI::write("{$ipaddress} has been flagged for {$route_offenses} routes and {$method_offenses} methods.");
                // Value is either routes or methods
                if ($route_offenses > 0) {
                    CLI::write("Routes flagged:");
                    foreach ($value['routes'] as $route) {
                        CLI::write($route);
                    }
                }
                if ($method_offenses > 0) {
                    CLI::write("Methods flagged:");
                    foreach ($value['methods'] as $method) {
                        CLI::write($method);
                    }
                }
                CLI::write(str_repeat("=",10));
            }
        }
        if ($mode == 'route') {
            $routes = [];
            foreach ($log as $ip_address) {
                if (isset($ip_address['routes'])) {
                    foreach ($ip_address['routes'] as $route) {
                        if (!array_key_exists($route, $routes)) {
                            $routes[$route] = 1;
                        } else {
                            $routes[$route] += 1;
                        }
                    }
                }
            }
            asort($routes);
            foreach ($routes as $r => $count) {
                CLI::write("{$r} ({$count})");
            }
        }
    }
}

Integrating our Sensor Data into FirewallD

So, with all the data getting collected, we now need to get it into our server’s firewall. I use a Red Hat-based distribution, so in this example I am using FirewallD. CodeIgniter commands have an annoying habit of printing a header that shows the CodeIgniter version and some other miscellaneous information. To remove this header, call your command with the --no-header flag. Now that we have exposed a CLI endpoint to get our bad-actor IPs, we can run a cronjob at any desired interval to call the command, parse the response, and add the targets to our reject or drop list.

#!/usr/bin/bash

ZONE="FedoraServer"
# --silent keeps the "nothing to show" message out of the IP list when the log is empty
badweb=$(/usr/bin/php /var/www/html/yourci4site/spark pen:ips --no-header --silent);
echo "Processing bad web actor IP addresses from our Codeigniter4 Site"
for IP in $badweb;
do
  echo "Adding $IP to shit list...";
  # No destination clause: with destination address="127.0.0.1" the rule would only match packets addressed to loopback
  firewall-cmd --permanent --zone="$ZONE" --add-rich-rule="rule family=\"ipv4\" source address=\"$IP\" reject"
  #iptables -I INPUT -s $IP -j DROP;
done;

#Check bad SSH logins file collected by cronjob
badlogin=$(cat /root/bad-ips.log | awk '{$1=$1;print}' | cut -d" " -f 2);
echo "Processing bad SSH login actor IP addresses"
for IP in $badlogin;
do
  echo "Adding $IP to shit list...";
  # No destination clause: with destination address="127.0.0.1" the rule would only match packets addressed to loopback
  firewall-cmd --permanent --zone="$ZONE" --add-rich-rule="rule family=\"ipv4\" source address=\"$IP\" reject"
  #iptables -I INPUT -s $IP -j DROP;
done;
echo "Reloading firewalld"
firewall-cmd --reload

I have included the equivalent iptables command as an additional example. Keep in mind, though, that iptables rules added this way are not persistent, and to be honest, I kept running into issues with my chain getting reset even though I run this daily.
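
The loops above hand whatever text they receive straight to firewall-cmd, so a stray message or a PHP error in the command output would end up in a rule. A small filter in front of the loop guards against that. This is a sketch under my own naming (filter_ipv4 and is_ipv4 are hypothetical helpers), and it only covers IPv4, which is all the rules above handle.

```bash
# filter_ipv4: reads candidates on stdin (one per line) and prints only
# well-formed dotted quads whose four octets are each in the range 0-255.
filter_ipv4() {
  awk -F. 'NF == 4 && /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
    ok = 1
    for (i = 1; i <= 4; i++) {
      # Reject octets longer than three digits or larger than 255
      if (length($i) > 3 || $i + 0 > 255) ok = 0
    }
    if (ok) print
  }'
}

# is_ipv4 <string>: returns 0 when the single argument passes the filter.
is_ipv4() {
  [ -n "$(printf '%s\n' "$1" | filter_ipv4)" ]
}

printf '%s\n' "203.0.113.7" "Hoorah!" "999.1.1.1" "198.51.100.23" | filter_ipv4
```

In the script, this would sit between the command and the loop, for example by piping the spark pen:ips --no-header output through filter_ipv4 before assigning it to badweb.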

Bonus Script to identify bad SSH login attempts

As you may have noticed in my example above, I reference an SSH report log. This is the list of IP addresses that ram my SSH port all the time with bad login attempts. You can get this information using the following one-liner:

lastb | cut -f3 | grep -P "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" -o | sort | uniq -c | sort -rn
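
Since uniq -c puts the number of attempts in front of each address, you can also choose to block only the repeat offenders and leave out someone who mistyped a password once or twice. A minimal sketch, assuming the "count address" line format that the one-liner produces; the function name ips_over_threshold and the threshold of 10 are arbitrary picks for illustration.

```bash
# ips_over_threshold <min-count>
# Reads "count ip" lines on stdin and prints the IPs seen at least <min-count> times.
ips_over_threshold() {
  # The "+ 0" forces a numeric comparison; leading spaces from uniq -c are ignored by awk
  awk -v min="$1" '$1 + 0 >= min + 0 { print $2 }'
}

# Made-up sample in the shape of the one-liner's output
printf '%s\n' "    412 203.0.113.7" "     37 198.51.100.23" "      2 192.0.2.10" \
  | ips_over_threshold 10
```

Run against the report file, that would be ips_over_threshold 10 < /root/bad-ips.log, and the result can be looped over in the same way as in the firewall script.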

Wrapping it up

This is just one way to address this issue. It is by no means the only way to do it. I also recommend using a log analysis application like Graylog or OpenSearch to inspect your Apache log traffic for other anomalous requests. Security is a process, and you need to stay vigilant to make sure you are being proactive, not reactive. It’s hard to balance sometimes. On one hand, you want legitimate customers to have access to your website, but you also want to keep bad guys out. My initial reason for doing this was to help clear up my Apache logs of clutter from automated bots. In the process, I learned about vulnerabilities floating out there in the wild through the requests I was capturing. This process helped me stay informed and conscious of my attack surface.
