Bug Report: Cloudflare crawls commented-out script tags and seems to get relative paths wrong

Edit: See sdayman’s reply before reading this. It’s not Cloudflare’s crawler. Please ignore this post :slight_smile:

I noticed in my server monitoring software that nginx was returning a lot of 404s, and looking into the logs, I saw that the requests were coming from Cloudflare’s IPs (162.158.62.114 and 162.158.63.249, for example).

I had some old commented-out script tags on my template page (that’s used to render almost all pages on the website) like this:

<!--
<script src="js/thing1.js"></script>
<script src="js/thing2.js"></script>
<script src="js/thing3.js"></script>
<script src="js/thing4.js"></script>
-->

So the first bug here is that Cloudflare is attempting to crawl files that don’t exist, but that’s not a huge deal, because it would just be 4 requests.

These files are predictably stored at paths like example.com/js/thing1.js

But the crawler was making tens of thousands of requests like this:

https://example.com/page-on-my-website/js/thing1.js
https://example.com/another-page-on-my-website/js/thing1.js
https://example.com/different-page-on-my-website/js/thing1.js
...

If I remember correctly, the interpretation of something like src="js/thing1.js" depends on whether there’s a trailing slash on the URL of the page being loaded. My site doesn’t have trailing slashes on page URLs, so I think it should be treated as example.com/js/thing1.js.
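
For anyone who wants to check the trailing-slash behaviour, here is a minimal sketch using Python’s standard urllib.parse.urljoin, which resolves relative references against a base URL. The helper name resolve_script is made up for illustration, and I’m assuming browsers and well-behaved crawlers resolve this simple case the same way.

```python
from urllib.parse import urljoin


def resolve_script(page_url: str, src: str) -> str:
    """Resolve a script src against the URL of the page that contains it.

    resolve_script is a hypothetical helper for this post, not part of any
    crawler or of Cloudflare.
    """
    return urljoin(page_url, src)


# Page URL without a trailing slash: the last path segment is replaced.
no_slash = resolve_script("https://example.com/page-on-my-website", "js/thing1.js")

# Page URL with a trailing slash: the relative path is appended to it.
with_slash = resolve_script("https://example.com/page-on-my-website/", "js/thing1.js")

# A leading slash makes the path independent of the page URL.
root_relative = resolve_script("https://example.com/page-on-my-website/", "/js/thing1.js")

print(no_slash)       # https://example.com/js/thing1.js
print(with_slash)     # https://example.com/page-on-my-website/js/thing1.js
print(root_relative)  # https://example.com/js/thing1.js
```

So the requests in my logs look like what you would get if the page URL were treated as ending in a slash.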

I’m not sure, though - I’m just quickly reporting this in case there are bugs here. It’s an easy fix for me: I removed the commented-out script tags and added a leading slash to the remaining ones, like this: src="/js/thing4.js".

Cheers!

This isn’t Cloudflare crawling your site. Your server is not restoring the original IP address of the actual visitor, so every request proxied through Cloudflare shows up in your logs with a Cloudflare IP.
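
As a rough illustration of what "restoring" means, here is a minimal Python sketch of the idea. It assumes the connecting address is checked against Cloudflare’s published ranges (only one of them, 162.158.0.0/15, is hard-coded here as an example) and that the visitor’s address is read from the CF-Connecting-IP request header. The function name real_client_ip is hypothetical. On nginx this is normally done in configuration with the real_ip module (set_real_ip_from and real_ip_header) rather than in application code.

```python
import ipaddress

# Assumption: a single example range from Cloudflare's published IPv4 list.
# A real setup would load the full, current list instead.
TRUSTED_PROXIES = [ipaddress.ip_network("162.158.0.0/15")]


def real_client_ip(remote_addr: str, headers: dict) -> str:
    """Return the visitor's IP, trusting CF-Connecting-IP only from known proxies.

    real_client_ip is a hypothetical helper used for illustration.
    """
    peer = ipaddress.ip_address(remote_addr)
    if any(peer in network for network in TRUSTED_PROXIES):
        # The connection came from a trusted proxy, so use the header it set.
        return headers.get("CF-Connecting-IP", remote_addr)
    # Direct connection (or unknown proxy): never trust the header.
    return remote_addr


# 203.0.113.7 and 198.51.100.9 are documentation-only example addresses.
print(real_client_ip("162.158.62.114", {"CF-Connecting-IP": "203.0.113.7"}))
print(real_client_ip("198.51.100.9", {"CF-Connecting-IP": "203.0.113.7"}))
```

The header is only honoured when the connection itself comes from a trusted range, since otherwise any client could spoof its address.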

Ohh I see - thank you! So it’s an external crawler I guess. Sorry about that!
