Adding robots.txt to a Worker?

I have a Worker that is used to set headers, which has allowed us to pass our internal compliance scan. Now this internal scan is looking for the robots.txt file. How can I add this to my Worker? What is the easiest way to implement this using Cloudflare? I’m a novice, so some examples or a link would be great.

Since I’m lazy, I’d create a separate Worker just for robots.txt:

// Serve a static robots.txt along with the security headers the scan expects.
async function handleRequest(request) {
  const init = {
    headers: {
      // robots.txt is plain text, not HTML
      'content-type': 'text/plain;charset=UTF-8',
      'cache-control': 'max-age=31536000',
      'X-Frame-Options': 'SAMEORIGIN',
      'Referrer-Policy': 'no-referrer',
      'content-security-policy': 'upgrade-insecure-requests',
      'X-XSS-Protection': '1; mode=block',
    },
  }
  return new Response(robotsTxt, init)
}

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

// An "allow everything" robots.txt body
const robotsTxt = `
User-agent: *
Disallow:
`

We have an external compliance scan tool hosted internally. It scans for headers, robots.txt, CSP, etc. We have two identical applications that returned the exact same results before we added one of them into Access. We have a robots.txt file set at the application level for both hostnames. The hostname using Access is now failing the robots.txt check and not presenting the headers we set in a Worker. The same Worker shows the proper values for the hostname not using Access. Ideally we’d like to use the Worker to resolve the robots.txt.

Hi Sdayman,

You are soooooo lazy… Thanks very much, I appreciate it! I have another issue with this same Access URL and the Worker. Since we added this hostname to Access, our standard security headers are blocked in our scan. This same Worker presents the proper values for URLs not added to Access. Can this issue be resolved using a Worker?

Thanks

Are you talking about Cloudflare’s “Access” app for password protecting URLs? If so, then you can add a “Bypass” policy for the IP address of your scanner.
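
For reference, a Bypass policy is normally added in the Access (Zero Trust) dashboard, but it can also be created through the Access policies API. The sketch below is only illustrative: the account ID, application ID, token, and scanner IP are all placeholders, and the request shape should be verified against the current API docs.

// Rough sketch: create a Bypass policy for the scanner's IP via the
// Cloudflare Access policies API. All IDs, the token, and the IP are
// placeholders -- verify the exact schema in the current API documentation.
async function addScannerBypassPolicy() {
  const accountId = '<ACCOUNT_ID>'   // your Cloudflare account ID
  const appId = '<ACCESS_APP_ID>'    // the Access application protecting the hostname

  const res = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/access/apps/${appId}/policies`,
    {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer <API_TOKEN>',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        name: 'Bypass internal scanner',
        decision: 'bypass',
        include: [{ ip: { ip: '203.0.113.10/32' } }], // scanner IP (placeholder)
      }),
    }
  )
  return res.json()
}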

But where would I find this robots.txt file in Cloudflare?

Cloudflare does not host files. Robots.txt should be in the root directory of your website. But if you’re coding in Workers, you can use Workers to respond to requests for robots.txt.
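
To make that concrete, here is a minimal sketch (not tied to anyone’s exact setup) of a Worker that answers requests for /robots.txt itself and passes every other request through to the origin:

// Sketch: serve robots.txt from the Worker, proxy everything else to the origin.
const ROBOTS_TXT = `User-agent: *
Disallow:
`

addEventListener('fetch', event => {
  event.respondWith(handleRobots(event.request))
})

async function handleRobots(request) {
  const url = new URL(request.url)

  // Only the /robots.txt path gets the canned response.
  if (url.pathname === '/robots.txt') {
    return new Response(ROBOTS_TXT, {
      headers: { 'content-type': 'text/plain;charset=UTF-8' },
    })
  }

  // Every other request is passed through to the origin unchanged.
  return fetch(request)
}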


Hello Everyone!
I need help! My website is built using ClickFunnels and is hosted on Cloudflare, and now I am having trouble finding or modifying the robots.txt file in Cloudflare. Could you help me with the steps to edit/add the robots.txt file?

Thank you for the reply!

Hey sdayman, I have finished the coding part of the robots.txt file in the Worker’s developer mode, but I am having difficulty deploying the robots.txt changes to the live site. Could you help me with how to do this? Any suggestions?
Thanks in advance!

Hello sdayman

I have used your Worker with the robots.txt, but for some applications, once it is implemented, the actual robots.txt is presented instead of the web page. Any idea why?

Thanks

Hello Sdayman,

I also need a Worker that can remove the cookie below. The vendor cannot remove it or the app will stop working, so we wanted to remove it at the edge with a Worker if possible.

Headers for URL https://account.activedirectory.windowsazure.com/applications/signin/application/60eeace9-003b-4f8e-b61b-fc2a5452ac60?tenantId=f3211d0e-125b-42c3-86db-322b19a65a22&SAMLRequest=lVJLj9MwEP4rufnk%2BpEmtFZTVG2FVGlBaHfhwAVNnEnXIrGDPQH23%2BNkhVguK3Gd%2BR4z38whwThM5jTTo7%2FD7zMmKk4pYSQX%2FE3waR4x3mP84Sx%2Burtt2CPRlIwQHRCASwRX3IC16GmOuLFhFFdHA7RrR8xZKQnI6mIxEhaGoQX7jRXn7OQ8LDZ%2FRccnmKa0GZ2NIYWeVr3krt55kTuDsytB1BIRLO65lGXLt%2F0OeVurlvdWQ7WtNNhaviX04OnSNX2pleokcqWrjNa25Lu6a3mpdav2UFegNSsu54Z93QL2uMWa129KlaEt8l1m8S4TlNbbvYZ9hqY048XnFT01TEutuNRc1g%2BqMpUylfzCis958XU3vZGs%2BDUOPpklgobN0ZsAySXjYcRkyJr70%2Ftbk4EG%2FkT%2FkjK9zplioGDDwI6HBW3W6eLxvw51EC%2Bph%2Ben%2BJCtLuePIcf%2BVJyGIfy8iQiEDaM4IyvehTgCvT7cUnEd71eomZZQUj4MMXF89vz3946%2FAQ%3D%3D (ignored because domain different to parent)

That robots.txt will only work in domains where you’ve added that Worker as a Route.

I have two issues:

  1. For some applications where I add the Worker below, we see the actual robots.txt in the browser. Is there an adjustment I can make in the Worker to stop this from happening?

  2. This is related to my last request; maybe I did not explain it clearly. We have an internal scan tool that checks for our approved headers and cookies. In the scan we are seeing an unapproved cookie (below). The vendor is not able to remove this cookie or the app would stop working. I have added the Worker we adjusted to the route, and it is presenting the proper headers. We need to adjust the Worker to block, bypass, or conceal the cookie below at the edge. Is that possible?

Vendor cookie
Headers for URL https://account.activedirectory.windowsazure.com/applications/signin/application/60eeace9-003b-4f8e-b61b-fc2a5452ac60?tenantId=f3211d0e-125b-42c3-86db-322b19a65a22&SAMLRequest=lVJLj9MwEP4rufnk%2BpEmtFZTVG2FVGlBaHfhwAVNnEnXIrGDPQH23%2BNkhVguK3Gd%2BR4z38whwThM5jTTo7%2FD7zMmKk4pYSQX%2FE3waR4x3mP84Sx%2Burtt2CPRlIwQHRCASwRX3IC16GmOuLFhFFdHA7RrR8xZKQnI6mIxEhaGoQX7jRXn7OQ8LDZ%2FRccnmKa0GZ2NIYWeVr3krt55kTuDsytB1BIRLO65lGXLt%2F0OeVurlvdWQ7WtNNhaviX04OnSNX2pleokcqWrjNa25Lu6a3mpdav2UFegNSsu54Z93QL2uMWa129KlaEt8l1m8S4TlNbbvYZ9hqY048XnFT01TEutuNRc1g%2BqMpUylfzCis958XU3vZGs%2BDUOPpklgobN0ZsAySXjYcRkyJr70%2Ftbk4EG%2FkT%2FkjK9zplioGDDwI6HBW3W6eLxvw51EC%2Bph%2Ben%2BJCtLuePIcf%2BVJyGIfy8iQiEDaM4IyvehTgCvT7cUnEd71eomZZQUj4MMXF89vz3946%2FAQ%3D%3D (ignored because domain different to parent)

Current header Worker used in the route:

let securityHeaders = {
	"Content-Security-Policy" : "upgrade-insecure-requests",
	"Strict-Transport-Security" : "max-age=1000",
	"X-Xss-Protection" : "1; mode=block",
	"X-Frame-Options" : "DENY",
	"X-Content-Type-Options" : "nosniff",
	"Referrer-Policy" : "strict-origin-when-cross-origin",
}

let sanitiseHeaders = {
	"Server" : "My New Server Header!!!",
}

let removeHeaders = [
	"Public-Key-Pins",
	"X-Powered-By",
	"X-AspNet-Version",
]

addEventListener('fetch', event => {
	event.respondWith(addHeaders(event.request))
})

async function addHeaders(req) {
	let response = await fetch(req)
	let newHdrs = new Headers(response.headers)

	if (newHdrs.has("Content-Type") && !newHdrs.get("Content-Type").includes("text/html")) {
        return new Response(response.body , {
            status: response.status,
            statusText: response.statusText,
            headers: newHdrs
        })
	}

	// Apply the security headers and the sanitised Server header.
	Object.keys(securityHeaders).forEach(function(name) {
		newHdrs.set(name, securityHeaders[name]);
	})

	Object.keys(sanitiseHeaders).forEach(function(name) {
		newHdrs.set(name, sanitiseHeaders[name]);
	})

	removeHeaders.forEach(function(name){
		newHdrs.delete(name)
	})

	return new Response(response.body , {
		status: response.status,
		statusText: response.statusText,
		headers: newHdrs
	})
}
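
For what it’s worth, here is a minimal sketch of how a Worker along these lines could drop a named cookie from the response’s Set-Cookie headers at the edge. The cookie name is a placeholder, and this only affects responses that actually pass through your Worker route; it cannot touch cookies set by a third-party domain such as account.activedirectory.windowsazure.com.

// Sketch: strip one named cookie from the origin response's Set-Cookie headers.
// UNWANTED_COOKIE is a placeholder; substitute the real cookie name.
const UNWANTED_COOKIE = 'example_cookie'

addEventListener('fetch', event => {
  event.respondWith(stripCookie(event.request))
})

async function stripCookie(req) {
  const response = await fetch(req)
  const newHdrs = new Headers(response.headers)

  // Headers.getAll() is a Workers-specific extension that returns each
  // Set-Cookie header separately (get() would join them into one string).
  const cookies = newHdrs.getAll('Set-Cookie')

  // Remove all Set-Cookie headers, then re-add every cookie except the unwanted one.
  newHdrs.delete('Set-Cookie')
  for (const cookie of cookies) {
    if (!cookie.startsWith(UNWANTED_COOKIE + '=')) {
      newHdrs.append('Set-Cookie', cookie)
    }
  }

  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers: newHdrs,
  })
}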

What’s your Worker route? You should just need to create a Worker route for www.example.com/robots.txt, but not www.example.com/*.
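
If the Worker is deployed with Wrangler rather than through the dashboard, the same idea can be expressed in wrangler.toml. The snippet below is a sketch using placeholder names; the exact keys depend on the Wrangler version in use.

# Sketch: attach the robots.txt Worker only to the /robots.txt path.
# Domain and zone names are placeholders.
name = "robots-txt-worker"

routes = [
  { pattern = "www.example.com/robots.txt", zone_name = "example.com" }
]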
