How To Block Bots By User-agent

 

Why you should block some crawling bots

 

Crawlers and spider bots from well-known search engines usually add little load and do not noticeably affect a website's speed. Most other crawling bots, however, are not helpful and can actively harm site performance.

For example, bots such as DotBot or SemrushBot. We have seen these bots send so many requests that the effect resembled a small DDoS attack: the site and the server were heavily overloaded, and the site became inaccessible to other visitors.

We strongly recommend blocking overly active bots if your site has more than 100 pages, especially if your account has already exceeded its allotted load limits.
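Before blocking anything, it helps to know which bots are actually hitting your site. A minimal sketch, assuming a combined-format access log; the sample file below stands in for a real log (a path such as /var/log/nginx/access.log is an assumption about your setup):

```shell
# Create a small sample access log (combined log format) for illustration;
# point the awk command at your real access log instead.
cat > sample_access.log <<'EOF'
1.2.3.4 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 100 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl)"
1.2.3.5 - - [01/Jan/2024:00:00:02 +0000] "GET /a HTTP/1.1" 200 100 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl)"
1.2.3.6 - - [01/Jan/2024:00:00:03 +0000] "GET /b HTTP/1.1" 200 100 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0)"
EOF

# In the combined log format the user-agent is the 6th quote-delimited field.
# Count requests per user-agent, most active first.
awk -F'"' '{print $6}' sample_access.log | sort | uniq -c | sort -rn
```

The user-agents at the top of the output with abnormally high request counts are the candidates for blocking.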

 

Two ways to block harmful bots

 

1. Using the CleanTalk Anti-Spam plugin with the Anti-Flood and Anti-Crawler options enabled.

 

This way is preferred because the plugin detects bots by their behavior. Any bot with unusually high activity is automatically served a 403 response for some time, regardless of its user-agent or other attributes. Crawlers of major search engines such as Google, Bing, MSN, and Yandex are excluded and will not be blocked.

More information about the options: https://cleantalk.org/help/anti-flood-and-anti-crawler 

Installation guide: https://cleantalk.org/help/install-wordpress

 

2. Using the .htaccess file for Apache servers or the nginx.conf file for Nginx.

 

We do not recommend these methods as a first choice. Note that too large a list of records in .htaccess will slow down the web server!

 

 

How to block popular crawling bots using the .htaccess file for Apache and nginx.conf for Nginx

 

1. How to block Baidu bot

 


Using .htaccess:

Add this code to the end of your .htaccess file:

# block baidu bot htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} baidu [NC]
RewriteRule .* - [F,L]
</IfModule>


Using nginx.conf:

Add this code to the http {} section of nginx.conf:

#block baidu bot nginx
if ($http_user_agent ~* (baidu|baidubot) ) {
return 403;
}

 

2. How to block AhrefsBot

 


Using .htaccess:

Add this code to the end of your .htaccess file:

# block AhrefsBot bot htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC]
RewriteRule .* - [F,L]
</IfModule>


Using nginx.conf:

Add this code to the http {} section of nginx.conf:

#block AhrefsBot bot nginx
if ($http_user_agent ~* (AhrefsBot) ) {
return 403;
}

 

3. How to block MJ12bot

 


Using .htaccess:

Add this code to the end of your .htaccess file:

# block MJ12bot bot htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC]
RewriteRule .* - [F,L]
</IfModule>


Using nginx.conf:

Add this code to the http {} section of nginx.conf:

#block MJ12bot bot nginx
if ($http_user_agent ~* (MJ12bot) ) {
return 403;
}

 

4. How to block Detectify bot

 


Using .htaccess:

Add this code to the end of your .htaccess file:

# block detectify bot htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Detectify [NC]
RewriteRule .* - [F,L]
</IfModule>


Using nginx.conf:

Add this code to the http {} section of nginx.conf:

#block detectify bot nginx
if ($http_user_agent ~* (Detectify) ) {
return 403;
}

 

5. How to block DuckDuckGo bot

 


Using .htaccess:

Add this code to the end of your .htaccess file:

# block DuckDuckGo bot htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} DuckDuckGo [NC]
RewriteRule .* - [F,L]
</IfModule>


Using nginx.conf:

Add this code to the http {} section of nginx.conf:

#block DuckDuckGo bot nginx
if ($http_user_agent ~* (DuckDuckGo) ) {
return 403;
}

 

6. How to block Semrush bot

 


Using .htaccess:

Add this code to the end of your .htaccess file:

# block Semrush bot htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} semrush [NC]
RewriteRule .* - [F,L]
</IfModule>


Using nginx.conf:

Add this code to the http {} section of nginx.conf:

#block Semrush bot nginx
if ($http_user_agent ~* (semrush) ) {
return 403;
}

 

7. How to block Seznam bot

 


Using .htaccess:

Add this code to the end of your .htaccess file:

# block Seznam bot htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} seznam [NC]
RewriteRule .* - [F,L]
</IfModule>


Using nginx.conf:

Add this code to the http {} section of nginx.conf:

#block Seznam bot nginx
if ($http_user_agent ~* (seznam) ) {
return 403;
}

 

8. How to block Zgrab bot

 


Using .htaccess:

Add this code to the end of your .htaccess file:

# block Zgrab bot htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} zgrab [NC]
RewriteRule .* - [F,L]
</IfModule>


Using nginx.conf:

Add this code to the http {} section of nginx.conf:

#block Zgrab bot nginx
if ($http_user_agent ~* (zgrab) ) {
return 403;
}

 

9. How to block Petalbot bot

 


Using .htaccess:

Add this code to the end of your .htaccess file:

# block Petalbot bot htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} petalbot [NC]
RewriteRule .* - [F,L]
</IfModule>


Using nginx.conf:

Add this code to the http {} section of nginx.conf:

#block Petalbot bot nginx
if ($http_user_agent ~* (petalbot) ) {
return 403;
}

 

10. How to block Jorgee bot

 


Using .htaccess:

Add this code to the end of your .htaccess file:

# block Jorgee bot htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} jorgee [NC]
RewriteRule .* - [F,L]
</IfModule>


Using nginx.conf:

Add this code to the http {} section of nginx.conf:

#block Jorgee bot nginx
if ($http_user_agent ~* (Jorgee) ) {
return 403;
}

 

11. How to block Yandex bot

 


Using .htaccess:

Add this code to the end of your .htaccess file:

# block Yandex bot htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} yandex [NC]
RewriteRule .* - [F,L]
</IfModule>


Using nginx.conf:

Add this code to the http {} section of nginx.conf:

#block Yandex bot nginx
if ($http_user_agent ~* (yandex) ) {
return 403;
}

 

12. How to block Dotbot

 


Using .htaccess:

Add this code to the end of your .htaccess file:

# block Dotbot bot htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} dotbot [NC]
RewriteRule .* - [F,L]
</IfModule>


Using nginx.conf:

Add this code to the http {} section of nginx.conf:

#block Dotbot bot nginx
if ($http_user_agent ~* (dotbot) ) {
return 403;
}

 

13. How to block Sogou bot

 


Using .htaccess:

Add this code to the end of your .htaccess file:

# block Sogou bot htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} sogou [NC]
RewriteRule .* - [F,L]
</IfModule>


Using nginx.conf:

Add this code to the http {} section of nginx.conf:

#block Sogou bot nginx
if ($http_user_agent ~* (sogou) ) {
return 403;
}

 

14. How to block multiple bots at the same time

 


Using .htaccess:

Add this code to the end of your .htaccess file. Note the [OR] flag: consecutive RewriteCond directives are combined with AND by default, and no single user-agent would ever match all three patterns at once.

# block bots htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} baidu [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC]
RewriteRule .* - [F,L]
</IfModule>


Using nginx.conf:

Add this code to the http {} section of nginx.conf:

#block bot nginx
if ($http_user_agent ~* (baidu|baidubot|AhrefsBot|MJ12bot) ) {
return 403;
}
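The nginx "~*" operator performs a case-insensitive regular-expression match, and grep -Ei behaves the same way, so you can sanity-check an alternation pattern from the command line before deploying it. A sketch; the user-agent strings below are illustrative samples, not an exhaustive set:

```shell
# Same alternation as in the nginx example above.
pattern='baidu|AhrefsBot|MJ12bot'

for ua in \
  'Mozilla/5.0 (compatible; Baiduspider/2.0)' \
  'Mozilla/5.0 (compatible; AhrefsBot/7.0)' \
  'Mozilla/5.0 (compatible; Googlebot/2.1)'
do
  # grep -Ei: extended regex, case-insensitive -- same semantics as "~*".
  if printf '%s' "$ua" | grep -Eiq "$pattern"; then
    echo "BLOCK (403): $ua"
  else
    echo "ALLOW:       $ua"
  fi
done
```

Only the first two agents match the pattern; Googlebot passes through, which is exactly what you want to confirm before reloading the server.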

 

You can block any user-agent you need. See the list of known crawlers.


yandex 
baidu
petalbot
semrush
Cliqzbot
SurdotlyBot
zgrab
Jorgee
dotbot
seznam
duckduckgo
sogou
exabot
AhrefsBot
InterfaxScanBot
SputnikBot
SolomonoBot
MJ12bot
Detectify
Riddler
omgili
socialmediascanner
Jooblebot
SeznamBot
Scrapy
CCBot
linkfluence
veoozbot
Leikibot
Seopult
Faraday
hybrid
Go-http-client
SMUrlExpander
SNAPSHOT
getintent
ltx71
Nuzzel
SMTBot
Laserlikebot
facebookexternalhit
mfibot
OptimizationCrawler
crazy
Dispatch
ubermetrics
HTMLParser
musobot
filterdb
InfoSeek
omgilibot
DomainSigma
SafeSearch
CommentReader
meanpathbot
statdom
proximic
spredbot
StatOnlineRuBot
openstat
DeuSu
semantic
postano
masscan
Embedly
NewShareCounts
linkdexbot
GrapeshotCrawler
Digincore
NetSeer
help.jp
PaperLiBot
getprismatic
360Spider
Ahrefs
ApacheBench
Aport
Applebot
archive
BaiduBot
Baiduspider
Birubot
BLEXBot
bsalsa
Butterfly
Buzzbot
BuzzSumo
CamontSpider
curl
dataminr
discobot
DomainTools
DotBot
Exabot
Ezooms
FairShare
FeedFetcher
FlaxCrawler
FlightDeckReportsBot
FlipboardProxy
FyberSpider
Gigabot
gold crawler
HTTrack
ia_archiver
InternetSeer
Jakarta
Java
JS-Kit
km.ru
kmSearchBot
Kraken
larbin
libwww
Lightspeedsystems
Linguee
LinkBot
LinkExchanger
LinkpadBot
LivelapBot
LoadImpactPageAnalyzer
lwp-trivial
majestic
Mediatoolkitbot
MegaIndex
MetaURI
MLBot
NerdByNature
NING
NjuiceBot
Nutch
OpenHoseBot
Panopta
pflab
PHP/
pirst
PostRank
ptd-crawler
Purebot
PycURL
Python
QuerySeekerSpider
rogerbot
Ruby
SearchBot
SemrushBot
SISTRIX
SiteBot
Slurp
Sogou
solomono
Soup
spbot
suggybot
Superfeedr
SurveyBot
SWeb
trendictionbot
TSearcher
ttCrawler
TurnitinBot
TweetmemeBot
UnwindFetchor
urllib
uTorrent
Voyager
WBSearchBot
Wget
WordPress
woriobot
Yeti
YottosBot
Zeus
zitebot
ZmEu
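
Because every extra record in .htaccess adds parsing overhead (see the warning above), long blocklists are usually written as a single alternation rather than one condition per bot. A sketch using a few agents from the list above; extend the alternation as needed:

```apache
# One combined, case-insensitive condition instead of many separate ones.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (AhrefsBot|MJ12bot|SemrushBot|DotBot|BLEXBot|PetalBot) [NC]
RewriteRule .* - [F,L]
</IfModule>
```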

