Fetching a CSV file from Google Colab via Python pandas fails with a 403, but works elsewhere

What is the name of the domain?

What is the error number?

403

What is the error message?

HTTP Error 403: Forbidden

What is the issue you’re encountering?

When fetching a CSV file from a Google Colab Notebook server, I get an HTTP 403 error.

What steps have you taken to resolve the issue?

I have a short piece of Python pandas code that fetches a CSV file from our website, which is hosted on Cloudflare. This code works without problems when run on most machines. When the same code is run in the Google Colab Notebook environment, so that the request comes from a Google IP address, the fetch fails with an HTTP 403 error.

Setting the User Agent of the HTTP request to anything other than the default makes the error go away on Google Colab. This makes me think that a DDoS mitigation is kicking in.
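For reference, the User Agent workaround described above can be sketched roughly as follows. This is only an illustration, not our actual code: the helper names, the example URL, and the "my-data-client/1.0" string are made up, and any non-default value seemed to work in our tests.

```python
import io
import urllib.request


def build_request(url, user_agent="my-data-client/1.0"):
    """Build a GET request that replaces urllib's default User-Agent.

    The default value "my-data-client/1.0" is a hypothetical placeholder.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def read_csv_with_user_agent(url, user_agent="my-data-client/1.0", timeout=30):
    """Download a CSV with a custom User-Agent and parse it with pandas."""
    # pandas is imported here so build_request can be used without it installed.
    import pandas as pd

    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Hand the downloaded bytes to pandas instead of letting it fetch the URL.
        return pd.read_csv(io.BytesIO(response.read()))


# Example call (hypothetical URL):
# df = read_csv_with_user_agent("https://example.com/data.csv")
```

As far as I know, recent pandas versions (1.3 and later) can also forward headers for HTTP(S) URLs directly, e.g. pd.read_csv(url, storage_options={"User-Agent": "my-data-client/1.0"}), which avoids the manual download step.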

We would like this file to be downloadable regardless of the user agent and regardless of origin of the request.

The security events view reports no events.

We have turned off all the bot protection settings on Cloudflare that we could find. Most of our website is static HTML in a Cloudflare Pages project, with a few Cloudflare Functions in the mix. The CSV file is served via one of these Functions. Security → Security Settings is on “Essentially off”. Most Managed Rules are set to off (the remaining ones all seem unrelated and not applicable to us).

We also tried setting up a custom WAF rule that matches paths ending with .csv (ends_with(http.request.uri.path, ".csv")) with the action set to “skip”. This made the CSV requests show up in the Security Events view (both those coming from Google Colab and others that come in organically) with “Action taken: skip”, but the requests from Google Colab still failed with the 403 error. The default User Agent used by pandas shows up in these events as “Python-urllib/3.10”.

It seems to me that some kind of DDoS rule is still kicking in. Since our website is a public good and these CSV files are important data, we’d really like to allowlist them so that Python users can always fetch them, even from Google Colab, without having to customize the User Agent. Please let us know if there is more we can do. Thank you!
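To narrow down whether the User Agent alone decides the outcome, a small probe like the one below can be run from both Colab and another machine. It is a rough sketch using only the Python standard library; the function names and the example URL are hypothetical. A User Agent of None means urllib sends its default "Python-urllib/3.x" value.

```python
import urllib.error
import urllib.request


def probe(url, user_agent=None, timeout=10):
    """Request `url` and return (status, headers) without raising on HTTP errors."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, dict(response.headers)
    except urllib.error.HTTPError as error:
        # A 403 is raised as an exception, but it still carries status and headers.
        return error.code, dict(error.headers)


def compare_user_agents(url, user_agents):
    """Map each User-Agent (None = urllib default) to the status code it receives."""
    return {user_agent: probe(url, user_agent)[0] for user_agent in user_agents}


# Example call (hypothetical URL):
# print(compare_user_agents("https://example.com/data.csv", [None, "my-data-client/1.0"]))
```

The headers returned by probe (for example Server and CF-RAY) may help confirm that the 403 is produced at the Cloudflare edge rather than by the Function itself.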

Was the site working with SSL prior to adding it to Cloudflare?

Yes

What is the current SSL/TLS setting?

Full (strict)

What are the steps to reproduce the issue?

Open the Notebook at Google Colab and run it:
