Join Jeff Starr for an in-depth discussion in this video, "Detect and block bad bots," part of WordPress: Developing Secure Sites.
Here we are looking at our site's root htaccess file. To block some bad bots, grab a copy of the htaccess code that's included in the Exercise Files for this tutorial and then paste the code into the root htaccess file before any existing htaccess rules. This chunk of code is like a virtual control panel for blocking bad bots. In this first section, we block some of the worst known bots via their user-agents. This section is added for your convenience, so you can see how to easily add more user-agents as necessary.
And the last section here is the part that actually does the blocking based on any patterns that are matched in the previous directives. Older versions of Apache will use this first set of rules, while newer versions of Apache will use the second set. And best of all, no upfront editing is required for this code to work. Just save, and upload the file, and you are ready to go. To see this technique in action, let's visit the awesome Request Maker. First let's try accessing our site using a legitimate agent, such as Googlebot.
First we set the Request Type to GET, and we enter the request URL, which is our demo site. Then we delete the Content-Type header and copy the Googlebot user-agent. Then we add a User-Agent request header and paste the Googlebot user-agent as the value. Then we click Submit and scroll down to see the results.
As expected, our site is accessible for good bots, such as Googlebot. 200 OK basically means that the Googlebot, or any good bot, will have normal access. Now let's verify that our htaccess code is working by spoofing a request from one of the blocked bad bots. We return to our FTP client and copy skygrid, for example. Then we return to the Request Maker and replace the Googlebot user-agent with skygrid.
Then once more we click Submit, and perfection: "403 Forbidden" means that the request has been blocked. This is exactly the message that we want to send bad bots: a simple response that's easy on server resources. Again, using a plugin to block bad bots may be more convenient, but a plugin runs only after PHP and WordPress have loaded, so it typically takes more resources to get the same result. Using htaccess enables Apache to block the request directly, before WordPress is involved, which is an efficient way of controlling traffic.
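For readers following along without the Exercise Files, here is a minimal sketch of how code with this layout is commonly written: a user-agent list, followed by one set of blocking rules for older Apache and one for newer Apache. This is not the code from the Exercise Files. The variable name `bad_bot` and the name `examplebot` are hypothetical, and the sketch assumes that mod_setenvif is available and that the server lets htaccess files override these directives.

```apache
# Hypothetical sketch, not the Exercise Files code.

# Section 1: flag any request whose User-Agent contains a listed name.
# The match is case-insensitive; "bad_bot" is an assumed variable name
# and "examplebot" is a made-up entry for illustration.
<IfModule mod_setenvif.c>
    SetEnvIfNoCase User-Agent "(skygrid|examplebot)" bad_bot
</IfModule>

# Section 2a: older Apache (2.2 and earlier), where mod_authz_core
# does not exist. Allow everyone, then deny flagged requests.
<IfModule !mod_authz_core.c>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</IfModule>

# Section 2b: newer Apache (2.4 and later). Grant access to everyone
# except requests that carry the bad_bot flag.
<IfModule mod_authz_core.c>
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
</IfModule>
```

With rules like these, a flagged request is answered with 403 Forbidden by Apache itself, while any other user-agent, such as Googlebot, passes through to the site as usual.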
So with our bad bot code in place, let's return to the code editor and add a new bot to the list. To do so, we can either add a vertical bar and then the user-agent string, like so, or we could just start a new line and add more bot names like this. Using either of these techniques, we can add as many user-agent strings as needed to block bad bots. Check out the htaccess notes in the Exercise Files for more details. In this video, we've seen how to block bad bots and user-agents from accessing our website.
Using htaccess instead of a plugin, we are able to block bad bots with greater efficiency and better site performance.
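As a rough illustration of the two ways of adding a bot mentioned above, the sketch below assumes the list is written with `SetEnvIfNoCase` and a variable named `bad_bot`. Both are assumptions, as are the names `examplebot` and `anotherbot`; the code in the Exercise Files may be organized differently.

```apache
# Hypothetical sketch: two equivalent ways to add "anotherbot".
# Use one option or the other, not both.

# Option 1: extend the existing pattern with a vertical bar,
# which means "or" in a regular expression.
SetEnvIfNoCase User-Agent "(skygrid|examplebot|anotherbot)" bad_bot

# Option 2: leave the existing line alone and add a new line
# that sets the same variable for the new name.
SetEnvIfNoCase User-Agent "anotherbot" bad_bot
```

Either form sets the same flag, so the blocking rules that check it do not need to change.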
- Backing up and restoring your site
- Setting up strong passwords
- Understanding users and roles
- Choosing trusted plugins and themes
- Changing and recovering passwords
- Configuring authentication keys
- Securing the login page
- Fighting spam in the comments
- Blocking access and detecting hacks
- Building a firewall for WordPress
- Detecting and blocking bots
- Auditing your WordPress security