Facebook unable to scrape website correctly

# Facebook is unable to scrape data correctly from my website

In the FB debugger, http://brindisi.tk pulls very old data and https://brindisi.tk pulls outdated data.
Most of the internal links (strangely, all but one!) either cause cURL error 56 or “Check that the webserver is running, and that there are no firewalls blocking Facebook’s crawlers.”

e.g.


https://developers.facebook.com/tools/debug/sharing/?q=https%3A%2F%2Fwww.brindisi.tk%2Fpage%2F2%2F

or

kind of works, even though it shows a cURL error, but the image can’t be scraped correctly
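For anyone wanting to check what a crawler would find on a page, here is a minimal sketch that pulls the Open Graph tags (including `og:image`) out of the HTML. The function names are my own, and `fetch_as_crawler` only imitates the crawler's User-Agent string: the request still comes from your own IP, so any firewall rule keyed on Facebook's networks will behave differently.

```python
import urllib.request
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..."> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            # Keep only the first occurrence of each property.
            self.tags.setdefault(prop, attr["content"])


def extract_open_graph(html):
    """Return a dict of og:* properties found in the given HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags


def fetch_as_crawler(url, timeout=10):
    """Fetch a page with a Facebook-like User-Agent (hypothetical helper)."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "facebookexternalhit/1.1"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage would be something like `extract_open_graph(fetch_as_crawler("https://brindisi.tk/"))`. If `og:image` is missing from the result, the problem is in the page markup rather than in the crawl.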

I have already whitelisted Facebook’s ASNs:
AS32931
AS32934
AS63293

And that improved crawl ability just a little bit on very few pages.
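If the allowlist is done on the origin server instead of by ASN, the check comes down to testing the client IP against a list of prefixes. A rough sketch of that follows. The prefixes are documentation placeholders, not Facebook's real ranges; the real list would have to be looked up for the ASNs above in a routing registry.

```python
import ipaddress

# Placeholder prefixes (reserved documentation ranges), NOT Facebook's.
ALLOWED_PREFIXES = [
    ipaddress.ip_network(p) for p in ("192.0.2.0/24", "2001:db8::/32")
]


def is_allowed(client_ip, prefixes=ALLOWED_PREFIXES):
    """Return True if client_ip falls inside any allowlisted prefix."""
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr.version == net.version and addr in net for net in prefixes
    )
```

Keep in mind that behind Cloudflare the origin sees Cloudflare's IP as the peer address, so a check like this only makes sense against the restored visitor IP.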

Does anyone have an idea on how to solve this?

@cloonan you seem to be the “great master of Facebook scraping on Cloudflare” hopefully you know how to solve this, sir?

Post a full page screenshot of https://dash.cloudflare.com/redirect?zone=ssl-tls/edge-certificates

Hello @sandro, thank you for replying. Here it is:

Hmm, alright. My assumption was that your minimum TLS version was set too high (i.e. 1.3), but that does not seem to be the case.
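The minimum-version theory can also be tested without the dashboard, by connecting with a client capped at TLS 1.2 and seeing whether the handshake goes through. A minimal sketch, with function names of my own choosing:

```python
import socket
import ssl


def make_context(max_version):
    """Build a default client context capped at the given TLS version."""
    ctx = ssl.create_default_context()
    ctx.maximum_version = max_version
    return ctx


def negotiated_version(host, max_version, port=443, timeout=10):
    """Connect to host and return the negotiated protocol, e.g. 'TLSv1.2'.

    Raises ssl.SSLError if the server refuses the capped version.
    """
    ctx = make_context(max_version)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_version("brindisi.tk", ssl.TLSVersion.TLSv1_2)` should fail with an `ssl.SSLError` if the zone really required 1.3, and return a version string otherwise.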

However, it now seems to work, except for that one warning.

Did you make any changes?

It actually works only on that specific page. Try the homepage, http://brindisi.tk or https://brindisi.tk, or another internal page like https://brindisi.tk/sequestri-record-per-i-carabinieri-di-fasano/

and you’ll see a miscellany of errors, this being the worst one, where nothing at all gets crawled:

That URL seems to work too

However, it would appear that it always takes a few attempts to actually get it successfully crawled.

Do you have any rate limiting configured, either on Cloudflare or on your server?
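To put a number on "a few attempts", one option is to request the same URL several times and tally the outcomes. This is only a sketch: `probe` and `http_status` are hypothetical helpers, and the requests come from your own machine, not from Facebook's network.

```python
import time
import urllib.error
import urllib.request
from collections import Counter


def probe(fetch, url, attempts=5, delay=0.0):
    """Call fetch(url) repeatedly and count each outcome.

    Successful calls are keyed by str(result); failures are keyed by
    the exception class name.
    """
    outcomes = Counter()
    for _ in range(attempts):
        try:
            outcomes[str(fetch(url))] += 1
        except Exception as exc:
            outcomes[type(exc).__name__] += 1
        if delay:
            time.sleep(delay)
    return dict(outcomes)


def http_status(url, timeout=10):
    """Return the HTTP status code for url, including 4xx/5xx codes."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "facebookexternalhit/1.1"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Running `probe(http_status, "https://brindisi.tk/", attempts=10, delay=1)` would show whether failures are spread evenly (which hints at something flaky upstream) or only appear after a burst of requests (which would look more like rate limiting).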

Still missing the image, though… Nope, no rate limiting on the edge or the origin server that I am aware of… and I should be aware of that :slight_smile:

that specific page now works like a charm …I’m even more confused

It always takes a few attempts and then it seems to work fine, hence the question about rate limiting.

Do you have any (related) entries whatsoever in your Cloudflare firewall event log?

no rate limiting on my edge/origin servers

The weird thing is that I have been trying to scrape those pages for a week and nothing happened. It seems that when I try to scrape them they don’t work, but when someone else tries they start working. It doesn’t make sense to me, and it makes me look like a fool. To be doubly sure, would you mind trying to scrape http://brindisi.tk? (NO https)

This is what I get when I scrape it, and it is obviously totally wrong:

Event log is clear

It would seem that the entire .tk domain has some DNS issues with its TLD name servers

https://dnsviz.net/d/brindisi.tk/dnssec/

That could explain that erratic issue
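A crude way to see whether resolution itself is flaky is to resolve the name repeatedly and count the failures. The sketch below uses the system resolver through `socket.getaddrinfo`; the function name is my own. Local caching can hide intermittent failures and this does no DNSSEC validation, so treat it as a rough signal next to the dnsviz report, not a replacement for it.

```python
import socket


def count_resolution_failures(host, attempts=10, resolver=socket.getaddrinfo):
    """Resolve host several times and return how many attempts failed."""
    failures = 0
    for _ in range(attempts):
        try:
            resolver(host, 443)
        except socket.gaierror:
            failures += 1
    return failures
```

For example, `count_resolution_failures("brindisi.tk", attempts=20)` returning anything above zero would support the DNS theory.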

I just ran it and got the cached values

Forcing it made the picture disappear

I remember a recent thread (can’t find it right now, I’m afraid) where similar issues were reported about a .tk domain.

well that’s something at least…not a solution but definitely a clue

That would suggest nothing is blocked on Cloudflare’s side.

At this point I’d really rather attribute it to a DNS issue with .tk than to an issue at the HTTP level. Would you have another non-.tk domain which you could temporarily swap in for yours?

If I had one, I definitely wouldn’t use a .tk :smiley: I will try to contact them about making DNSSEC configuration available on .tk, and file a complaint with ICANN if that doesn’t work.
