Firewall rules issue

irene · January 22, 2021, 10:36pm

This is my rule, but the only thing it stops is python-request:

(http.user_agent contains "python-request") or (http.user_agent contains "MJ12bot") or (http.user_agent contains "Barkrowler") or (http.user_agent contains "Proximic" and http.user_agent contains "X-Middleton") or (http.user_agent contains "ADmantX" and http.user_agent contains "X-Middleton") or (http.user_agent contains "SEOkicks") or (http.user_agent contains "oBot" and not http.user_agent contains "Googlebot" and not http.user_agent contains "bingbot" and not http.user_agent contains "ezoic" and cf.client.bot)

Is this because my server has implemented X-Middleton headers?
I also tried User Agent Blocking, but that is not working either.
Any way to fix this?
Thanks in advance

sdayman · January 22, 2021, 11:06pm

It should stop MJ12bot as well, plus those other standalone user_agent checks. That last string of checks doesn’t make sense, unless for some reason there’s a User Agent String that combines several user agents into one…and…the logic on that one is hard to untangle. Why check for contains “obot” and NOT contain “Googlebot” at the same time? If it contains obot, then it can’t contain Googlebot.
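To untangle that last group of checks, here is a minimal Python sketch that mirrors it. It assumes `contains` behaves as a case-sensitive substring match and that `cf.client.bot` is a plain true/false flag. The function name and the sample user agents are hypothetical and only serve as illustration.

```python
def last_clause(user_agent: str, known_bot: bool) -> bool:
    """Mirror of the final parenthesised group in the rule above.

    Assumption: `contains` is a case-sensitive substring test and
    `known_bot` stands in for cf.client.bot.
    """
    return (
        "oBot" in user_agent
        and "Googlebot" not in user_agent
        and "bingbot" not in user_agent
        and "ezoic" not in user_agent
        and known_bot
    )


# Hypothetical requests, for illustration only: (user agent, known-bot flag).
samples = [
    ("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", True),
    ("Mozilla/5.0 (compatible; FooBot/1.0)", True),
    ("Mozilla/5.0 (compatible; FooBot/1.0)", False),
]

for ua, known_bot in samples:
    print(last_clause(ua, known_bot), known_bot, ua)
```

Under these assumptions the group only matches a user agent that contains "oBot" and is also flagged as a known bot. In this sketch the Googlebot sample never matches, and the made-up FooBot matches only when the known-bot flag is set.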

freitasm · January 23, 2021, 4:14am

The last block is confusing - since it’s AND for all conditions, it will never be true.

Are you trying to allow known bots through and block the rest? If that’s the case, create a rule ALLOW cf.client.bot OR http.user_agent contains “ezoic” and create another rule where you BLOCK the other user agents with the OR blocks.

irene · January 23, 2021, 4:20am

OK, maybe this is better:
(http.user_agent contains "python-request") or (http.user_agent contains "MJ12bot") or (http.user_agent contains "Barkrowler") or (http.user_agent contains "Proximic" and http.user_agent contains "X-Middleton") or (http.user_agent contains "ADmantX" and http.user_agent contains "X-Middleton") or (http.user_agent contains "SEOkicks") or (http.user_agent contains "oBot")

However, I don’t think this was the reason the rule is not blocking.

irene · January 23, 2021, 4:23am

Thanks @sdayman and @freitasm. In a few hours I should see the result.

freitasm · January 23, 2021, 5:39am

Why test for the X-Middleton string in the UA string? Why not just test for “Proximic” or “ADmantX”?

irene · January 23, 2021, 12:08pm

OK, now I have gone back to this:
(http.user_agent contains "python-request") or (http.user_agent contains "MJ12bot") or (http.user_agent contains "Barkrowler") or (http.user_agent contains "Proximic") or (http.user_agent contains "ADmantX") or (http.user_agent contains "SEOkicks") or (http.user_agent contains "oBot")

I added X-Middleton because I was experimenting. The problem is that the only bots being blocked are the ones that don’t have X-Middleton in the user agent, because my server sets X-Middleton headers to allow Ezoic. And almost all bots have X-Middleton in the user agent.

irene · January 24, 2021, 11:22pm

I have some questions:

The firewall rule for blocking bots is only blocking the bots that don’t have X-Middleton in the user agent. Do you think this is because X-Middleton headers are set up on the origin server?
I hadn’t checked my backlinks for a while. I discovered that a lot of backlinks had been added that link to my site’s internal search with pharma keywords. These links are followed by bots such as Google, Bing, etc.
I have set up a firewall rule to block the referral traffic from those domains. However, as I have allowed good bots in another firewall rule, I think those requests are not going to be blocked. Any suggestions?

irene · January 25, 2021, 3:19pm

I need help:

The firewall rule to block traffic from spam referrer sites is blocking what seem to be legitimate users. I inspected the IPs and most of them are not listed in the honeypot. And the links are not like the following.
Example:
Spam Site: https://www.cmaxfanatics.com/forum/showthread.php?page=41&t=304356
Link to my site: Home
Page in my site:

The firewall rule:
(http.referer contains "concerns.sportshouse.com.ph") or (http.referer contains "subglobal.net") or (http.referer contains "unm.org.ua") or (http.referer contains "pinballspares.com.au") or (http.referer contains "emmcforum.com") or (http.referer contains "wallpaper144-781ed.web.app") or (http.referer contains "ucapanbagus.web.app") or (http.referer contains "semogalekasi.web.app") or (http.referer contains "robuxgenerator2018.web.app") or (http.referer contains "robloxjailbreakhackgenerator.web.app") or (http.referer contains "breakingnewstrend.web.app") or (http.referer contains "quizzical-boyd-79e1b7.netlify.app") or (http.referer contains "affectionate-cori-3dc1f3.web.app") or (http.referer contains "affectionate-cori-3dc1f3.netlify.app") or (http.referer contains "bdmedicin.info") or (http.referer contains "fseriesfanatics.com") or (http.referer contains "cmaxfanatics.com") or (http.referer contains "para.inria.fr") or (http.referer contains "forums.subglobal.net") or (http.referer contains "forum.unm.org.ua") or (http.referer contains "pinballspares.com.au") or (http.referer contains "") or (http.referer contains "donia2link.xyz") or (http.referer contains "saldogratispoker.com") or (http.referer contains "spinbotstudio.fr") or (http.referer contains "dubaiescorts24forum.com") or (http.referer contains "palais.beesims.com") or (http.referer contains "primalcarnageforums.com") or (http.referer contains "www.e-tahmin.com") or (http.referer contains "brodzio.pl") or (http.referer contains "euvapor.com") or (http.referer contains "movietato.com") or (http.referer contains "phwow.sk6.ru") or (http.referer contains "plbm.eu") or (http.referer contains "forum.3dnatives.com") or (http.referer contains "edgefanatics.com")

What the firewall rule is actually blocking are links to posts on my site, robots.txt, or images (I have hotlink protection enabled).
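One thing that may be worth checking is the list of referrer substrings itself. As posted, the rule includes `http.referer contains ""`, lists one domain twice, and has entries that are already covered by shorter ones. The Python sketch below audits a subset of the list. Whether Cloudflare treats `contains ""` as always true is an assumption to verify; in Python the empty string is a substring of every string. The `audit` helper is hypothetical.

```python
from collections import Counter

# A subset of the referrer substrings from the rule above, copied as posted.
patterns = [
    "subglobal.net",
    "pinballspares.com.au",
    "cmaxfanatics.com",
    "forums.subglobal.net",
    "pinballspares.com.au",
    "",
    "donia2link.xyz",
]


def audit(patterns):
    """Return (indexes of empty patterns, duplicated patterns, redundant patterns)."""
    empty = [i for i, p in enumerate(patterns) if p == ""]
    duplicates = sorted(p for p, n in Counter(patterns).items() if n > 1)
    # A pattern is redundant when a different, non-empty pattern is already
    # contained in it, so the shorter one matches everything the longer one does.
    redundant = sorted(
        p for p in set(patterns)
        if any(q and q != p and q in p for q in patterns)
    )
    return empty, duplicates, redundant


empty, duplicates, redundant = audit(patterns)
print("empty pattern at index:", empty)
print("duplicates:", duplicates)
print("redundant:", redundant)

# In Python, an empty substring matches any referrer at all.
print("" in "https://www.example.com/some-post/")
```

If the empty pattern really is in the deployed rule and behaves the same way there, it could explain why ordinary requests to posts, robots.txt, and images are being caught.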

If I look in my access log, I see the following:
Example:
13.66.139.2 - - [24/Jan/2021:20:58:27 +0000] "GET /es/?s=%F0%9F%8E%8D%20www.Getmaple.store%20%F0%9F%8E%8Dviagra%20barata%20online%20espa%C3%B1a/feed/rss2/ HTTP/1.0" 200 12331 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) X-Middleton/1"
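To see what is actually being searched for, the percent-encoded value of `s=` can be decoded. This is a small sketch using Python's standard library; the string is the query value from the access log entry above, with any whitespace inside the percent-encoded sequence removed and the trailing `/feed/rss2/` left off.

```python
from urllib.parse import unquote

# Value of the s= parameter from the access log entry above.
encoded = (
    "%F0%9F%8E%8D%20www.Getmaple.store%20"
    "%F0%9F%8E%8Dviagra%20barata%20online%20espa%C3%B1a"
)

# unquote() decodes percent-escapes as UTF-8 by default.
decoded = unquote(encoded)
print(decoded)
```

The decoded value is a spam phrase wrapped in emoji, which fits a request to the internal search rather than a request for a real page.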

In my Error Logs:
Example:
[Sun Jan 24 21:13:05.698651 2021] [proxy_fcgi:error] [pid 1030:tid 139875935971072] [client 35.158.99.113:53122] AH01071: Got error 'PHP message: Error Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8mb4_unicode_520_ci,COERCIBLE) for operation 'like' de la base de datos de WordPress para la consulta SELECT SQL_CALC_FOUND_ROWS ar3_2_posts.ID FROM ar3_2_posts WHERE 1=1 AND (((ar3_2_posts.post_title LIKE '% Kamagra oral jelly 50mg kaufen Kamagra 100mg kaufen apotheke erfahrungen\xe2\xa3\x97\xf0\x9f\xa7\xb9 www.WebMD.shop \xf0\x9f\xa7\xb9\xe2\xa3\x97 Cialis generika 5mg rezeptfrei%') OR (ar3_2_posts.post_excerpt LIKE '% Kamagra oral jelly 50mg kaufen Kamagra 100mg kaufen apotheke erfahrungen\xe2\xa3\x97\xf0\x9f\xa7\xb9 www.WebMD.shop \xf0\x9f\xa7\xb9\xe2\xa3\x97 Cialis generika 5mg rezeptfrei%') OR (ar3_2_posts.post_content LIKE '% Kamagra oral jelly 50mg kaufen Kamagra 100mg kaufen apotheke erfahrungen\xe2\xa3\x97\xf0\x9f\xa7\xb9 www.WebMD.shop \xf0\x9f\xa7\xb9\xe2\xa3\x97 Cialis generika 5mg rezeptfrei%'))) AND (ar3_2_posts.post_password = '') AND ar3_2_posts.post_type IN ('post', 'page', 'attachment') AND (ar3_2_posts.post_status = 'publish') ORDER BY (CASE WHEN ar3_2_posts.post_title LIKE '% Kamagra oral jelly 50mg kaufen Kamagra 1...'

What’s wrong?? All your help would be much appreciated.

sdayman · January 25, 2021, 3:28pm

With data like that, it looks like your site has been compromised (EDIT: Apparently not). Along with many others that are trying to form a sort of spam affiliate network. I could be wrong, though.

As for the firewall rule, I’d work it backwards. Block everything that Does Not Contain (your own site, google, and other search engines you want). Then keep an eye on your Firewall Event Log for false positives and add them to your Firewall Rule.
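As a rough illustration of working it backwards, the Python sketch below blocks any referrer that does not contain an allowed substring. The allowlist entries, the function name, and the choice to leave requests without a referrer alone are all assumptions made for this example.

```python
# Hypothetical allowlist: replace with your own domain and the search
# engines you want to keep.
ALLOWED_REFERRERS = ("example.com", "google.", "bing.com")


def should_block(referer: str) -> bool:
    """Return True when the referrer matches nothing on the allowlist."""
    # Assumption: requests without a Referer header (logged as "-") are
    # not blocked by this check.
    if not referer or referer == "-":
        return False
    return not any(allowed in referer for allowed in ALLOWED_REFERRERS)


print(should_block("https://www.cmaxfanatics.com/forum/showthread.php?page=41&t=304356"))
print(should_block("https://www.google.com/search?q=something"))
print(should_block("-"))
```

Substring matching is loose, so a hostile domain that merely contains an allowed string would pass. That is one reason to keep watching the Firewall Event Log after switching to this approach.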

irene · January 25, 2021, 4:53pm

sdayman:

Block everything that Does Not Contain (your own site, google, and other search engines you want).

Every bad search link has my URL.

I have set up the following:

Instead of the referrers:
(http.request.uri contains "es/?s=")
This one is from 2 days ago and is not blocking anything:
(cf.threat_score gt 5)

This looks really bad. I don’t think this can be stopped by blocking. From the way it behaves, it seems that all my traffic is redirected through spam sites, as if they were a reverse proxy. The original user is not a spammer.
I am like David fighting with Goliath, armed with a toothpick.

freitasm · February 12, 2021, 5:42am

irene:

t 35.158.99.113:53122] AH01071: Got error 'PHP message: Error Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8mb4_unicode_520_ci,COERCIBLE) for operation ‘like’ de la base de datos de WordPress para la consulta SELECT SQL_CALC_FOUND_ROWS ar3_2_posts.ID FROM ar3_2_posts WHERE 1=1 AND (((ar3_2_posts.post_title LIKE ‘% Kamagra oral jelly 50mg kaufen Kamagra 100mg kaufen apotheke erfahrungen\xe2\xa3\x97\xf0\x9f\xa7\xb9 www.WebMD.shop \xf0\x9f\xa7\xb9\xe2\xa3\x97 Cialis generika 5mg rezeptfrei%’) OR (ar3_2_posts.post_excerpt LIKE ‘% Kamagra oral jelly 50mg kaufen Kamagra 100mg kaufen apotheke erfahrungen\xe2\xa3\x97\xf0\x9f\xa7\xb9 www.WebMD.shop \xf0\x9f\xa7\xb9\xe2\xa3\x97 Cialis generika 5mg rezeptfrei%’) OR (ar3_2_posts.post_content LIKE ‘% Kamagra oral jelly 50mg kaufen Kamagra 100mg kaufen apotheke erfahrungen\xe2\xa3\x97\xf0\x9f\xa7\xb9 www.WebMD.shop \xf0\x9f\xa7\xb9\xe2\xa3\x97 Cialis generika 5mg rezeptfrei%’))) AND (ar3_2_posts.post_password = ‘’) AND ar3_2_posts.post_type IN (‘post’, ‘page’, ‘attachment’) AND (ar3_2_posts.post_status = ‘publish’) ORDER BY (CASE WHEN ar3_2_posts.post_title LIKE '% Kamagra oral jelly 50mg kaufen Kamagr

Looking at this single entry I would say this is a SQL injection attempt. We cannot be sure this caused your website to be compromised or not as we don’t know how your site handles malformed URLs like this.

As for spam referrers you mentioned, I’d have a couple of questions:

Do the requests impact your site operation, e.g. do these requests make your website slower or have other consequences?
Is the parameter ?s= just a search?

irene · February 12, 2021, 6:07pm

What I researched until now is the following:

I fixed my database for the “Error Illegal mix of collations”, so I no longer see this in the error log.
I had this kind of request in the access log; they come basically from bots, good and bad. What I discovered is that in October hundreds of links to my site were set up on spam sites and forums. The URL of those links to my site is as follows:
Example:
Example Domain
This request goes directly to the internal search, and the results page shows only the search with no results, because I don’t have this type of content. This seems to be a kind of negative pharma SEO. I disavowed the links, but I didn’t find anything else to do.
Blocking referrals didn’t work, but since I blocked the ASN I have fewer requests of this kind from bots. However, so far I can’t stop Bing from making these requests.
I scanned my site with several plugins, and it is clean.

freitasm · February 12, 2021, 8:49pm

Ok, looking at the log entry again I see it’s not a SQL injection, just the log for the internal search query, not a non-sanitised input. So that’s #1 done as you said.

Blocking the referrals will not influence the SEO. The only thing it will do is reduce the load on your database engine. Disavowing the inbound links was the right thing to do.

Perhaps instead of blocking referrals you should look at creating a rule to block Known Bots from accessing the search page. They should be indexing the content pages anyway, not internal search pages. Something like “Known Bots” and “Search page” Action: BLOCK.

fritex · February 12, 2021, 9:00pm

I am not sure if this can help, but why not use a robots.txt file at your origin to block the bad ones and allow only Googlebot to keep indexing your sitemap (if one exists)?
That way you would not let those “bad guys” keep indexing your website and harming your website or domain and your SEO score.

See a good example here:

freitasm · February 12, 2021, 9:12pm

The problem is that the majority of “bad bots” simply ignore robots.txt

irene · February 13, 2021, 3:04pm

Ok I changed to the following allow firewall rule:

(not http.request.uri contains "?s=" and cf.client.bot)

Do you think it’s OK?

freitasm · February 13, 2021, 9:20pm

Is this rule ALLOW or BLOCK?

You want to allow known bots on all pages except search.

Because of “not” I would change the order to make it easier to understand and make it

(cf.client.bot and http.request.uri contains "?s=") BLOCK

The only thing you really get out of this is making sure these internal search results are not indexed and do not appear later on public search engines, but it won’t solve anything else.
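The difference between the two rules can be sketched in Python. This assumes `contains` is a plain substring match and treats `cf.client.bot` as a boolean. The function names and the sample requests are hypothetical.

```python
def allow_rule(uri: str, known_bot: bool) -> bool:
    # irene's ALLOW rule: (not http.request.uri contains "?s=" and cf.client.bot)
    return "?s=" not in uri and known_bot


def block_rule(uri: str, known_bot: bool) -> bool:
    # freitasm's BLOCK rule: (cf.client.bot and http.request.uri contains "?s=")
    return known_bot and "?s=" in uri


# Hypothetical requests, for illustration only: (URI, known-bot flag).
requests = [
    ("/es/?s=viagra", True),    # known bot hitting the search page
    ("/es/some-post/", True),   # known bot on a content page
    ("/es/?s=recipes", False),  # ordinary visitor using the search
]

for uri, known_bot in requests:
    print(uri, "| allow matches:", allow_rule(uri, known_bot),
          "| block matches:", block_rule(uri, known_bot))
```

In this sketch the ALLOW rule simply does not match a known bot on the search page; it does not stop that request by itself, so whether the request is blocked depends on the other rules. The BLOCK rule matches exactly that case and nothing else.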

irene · February 13, 2021, 10:28pm

Hi,

This rule is allow.
I am sure these results are not indexed, because I use the Yoast plugin, which by default doesn’t let the internal search results be indexed.
Maybe these search requests are not harmful, but I prefer not to have them.