Bots going "around" Cloudflare to Heroku


#1

My domain has an application on heroku, which is pointed to from Cloudflare via CNAME.

The heroku log for a Cloudflare access to this app looks like this:

2018-08-22T23:15:22.877801+00:00 heroku[router]: at=info method=GET path="/mods/2807/details" host=cmx1mods.greenasjade.net request_id=fc4ccc01-782c-4683-9024-11633070d726 fwd="66.249.69.211,172.68.59.28" dyno=web.1 connect=24ms service=25ms status=200 bytes=5552 protocol=http

I can block these with the Cloudflare firewall, by blacklisting ‘66.249.69.211’ (the first address in the fwd field) in this case.

However, I am getting these:

2018-08-22T23:24:03.161580+00:00 heroku[router]: at=info method=GET path="/sessions/new" host=cmx1mods.greenasjade.net request_id=235bcf43-aa43-4a04-91c0-d066a61e05cf fwd="94.130.18.15" dyno=web.1 connect=0ms service=11ms status=200 bytes=6264 protocol=http

It doesn’t seem to have come through Cloudflare, and blacklisting ‘94.130.18.159’ in Cloudflare doesn’t block it.

What is the deal with that - can I handle these somehow?

Thanks!


#2

After searching around, it seems Heroku doesn’t have a true firewall that allows whitelisting certain IP blocks. It also looks like it doesn’t support client certificates, so you likely can’t use Authenticated Origin Pulls.

One thing you can try is an application-level firewall. Add a middleware or an entry script that checks whether the request reached your app from a Cloudflare IP address, and returns something like a 403 error if it did not. If you use Node and Express you may be able to use express-ipfilter as your firewall, or use its source code as a reference when building your own application-level firewall.
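
To make the check concrete, here is a minimal sketch in Python (chosen only for illustration, since the thread doesn't say what the app is written in). It assumes the `fwd` value in the Heroku router log mirrors the X-Forwarded-For header your app receives, and that the last entry is the address that actually connected to Heroku, which is what the first log line above suggests. The function name `came_through_cloudflare` is made up, and the range list is only a subset; take the full, current list from Cloudflare's published IP ranges.

```python
import ipaddress

# Illustrative subset of Cloudflare's IPv4 ranges. Assumption: replace this
# with the full, current list that Cloudflare publishes before relying on it.
CLOUDFLARE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "173.245.48.0/20",
        "104.16.0.0/13",
        "162.158.0.0/15",
        "172.64.0.0/13",
    )
]


def came_through_cloudflare(forwarded_for: str) -> bool:
    """Return True if the last hop in the forwarded chain is a Cloudflare address.

    forwarded_for is the raw X-Forwarded-For value, e.g. "client, proxy".
    Only the last entry is trusted here; earlier entries are supplied by
    whoever sent the request and can be spoofed.
    """
    hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
    if not hops:
        return False
    try:
        last_hop = ipaddress.ip_address(hops[-1])
    except ValueError:
        # Not a parseable IP address, so treat it as not coming from Cloudflare.
        return False
    return any(last_hop in network for network in CLOUDFLARE_RANGES)


# The two fwd values from the log lines in the first post:
print(came_through_cloudflare("66.249.69.211,172.68.59.28"))  # via Cloudflare
print(came_through_cloudflare("94.130.18.15"))                # direct to Heroku
```

In a middleware you would call something like this with the incoming X-Forwarded-For header and return a 403 when it comes back false.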


#3

Thanks - I feared as much.

Does this mean that the bot is somehow finding the heroku URL for my app and using that, rather than accessing via the subdomain? It’s the only explanation I can think of. It doesn’t explain why heroku’s log message lists the subdomain as the route being used - I would have thought this would contain the heroku URL if that’s what the bot came in on.

Also, I can’t see how the heroku URL is discoverable :frowning:

Unfortunately this app is a legacy app - still used, but not actively maintained. Even something as simple as updating the robots.txt (which is currently fully permissive) requires a repush and rebuild on heroku which requires a stack version update which … is a can of worms I’ve been avoiding. Adding application level firewalling is well in that bucket! It will trigger the question “so - is this the end?” for this app.


#4

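Typically there is no secret URL. They’re likely just requesting your normal public URL, but sending the request straight to the IP address of the Heroku server instead of resolving the name through Cloudflare. The request would still carry your subdomain as its Host header, which would explain why the Heroku log shows the subdomain rather than the Heroku URL.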

