I recently set up an API for users to pull information, and it’s been a huge success. The API returns records with fields such as Name and ID. I then tell users they can get the matching image by pulling it from:
mysite.com/pics/ID_Retrieved_From_API.jpg
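As an illustration of that URL scheme, here is a minimal Python sketch of how a client might turn one API record into its image URL. The record shape (a dict with an `ID` key) and the `https://` scheme are assumptions, and `image_url` is a hypothetical helper, not part of the API.

```python
def image_url(record: dict, base: str = "https://mysite.com/pics/") -> str:
    """Build the image URL for one API record using the pattern
    mysite.com/pics/<ID>.jpg. The "ID" key and https scheme are assumptions."""
    return f"{base}{record['ID']}.jpg"


print(image_url({"Name": "example", "ID": 76600549}))
```

Because every image URL is predictable from the ID, a client that has the full API listing can request every image in a tight loop, which is what the log below shows.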
This had been working fine until today, when I woke up to find my site completely down because it had hit its resource limit (shared hosting). I checked my logs and noticed a specific IP pulling thousands of my images:
[08/Feb/2019:07:06:54 +0000] "GET /pics/76600549.jpg HTTP/1.1" 508 224 "-" "python-requests/2.21.0"
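To find out which clients are responsible for most image requests, a short script over the access log helps. This is a sketch that assumes Apache's "combined" log format; the excerpt above omits the leading client-IP field, which a full combined-format line places first. `top_image_clients` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed Apache "combined" log format:
# IP ident user [time] "METHOD PATH PROTO" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def top_image_clients(lines, prefix="/pics/", n=5):
    """Return the n most frequent (ip, user_agent) pairs requesting paths under prefix."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        # Skip lines that do not parse or are not image requests.
        if m and m.group("path").startswith(prefix):
            counts[(m.group("ip"), m.group("agent"))] += 1
    return counts.most_common(n)


# Usage: with open("access.log") as f: print(top_image_clients(f))
```

Grouping by IP and user agent together makes a script such as `python-requests/2.21.0` stand out from ordinary browsers sharing an address.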
I blocked the IP temporarily to bring my site back up. I’m not annoyed at the user, since I pretty much encouraged this, but I didn’t expect anyone to pull the images so aggressively. I’m unsure how to proceed now. What would be an optimal way of allowing users to pull images? I was thinking of zipping them all together, but the result would be a file over 5 GB, and I think serving that would still exhaust my server resources.
I tried rate limiting in Cloudflare, but that didn’t seem to work: when I unblocked the IP, my resources maxed out again. My rate limit rule was:
mysite.com/pics/*
10 requests per 1 second = Block
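To see what a rule like this means for a single client, here is a rough Python model of a per-client sliding-window limiter allowing 10 requests per 1-second window. It is a hypothetical sketch, not Cloudflare's implementation; Cloudflare's actual counting and block duration may differ, and this model simply rejects the requests over the limit.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within any `window`-second span.

    Hypothetical local model of a rule like "10 requests per 1 second = Block".
    Only allowed requests are recorded; rejected ones do not extend the window."""

    def __init__(self, limit=10, window=1.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # client -> timestamps of allowed requests

    def allow(self, client, now):
        q = self.hits[client]
        # Drop timestamps that have fallen out of the current window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False


lim = SlidingWindowLimiter(limit=10, window=1.0)
# A page that loads 100 images within about half a second:
allowed = sum(lim.allow("visitor", i * 0.005) for i in range(100))
print(allowed)  # only the first 10 image requests get through
```

Under this model, a single page of 100 thumbnails trips the same threshold meant to stop a scraper.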
The problem is that I also have a front-end search page backed by the database: users can view 100 items per page, and each item displays its image. How can I protect myself from scraping without affecting my front-end users?
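One commonly used way to separate "a page load" from "a scraper" is a token bucket, which limits the sustained rate while still permitting a burst. This technique is swapped in here as an illustration, not something the post has tried. The sketch assumes a hypothetical capacity of 100 (one results page) and a refill rate of 2 requests per second; both numbers are placeholders.

```python
class TokenBucket:
    """Token bucket: up to `capacity` requests in a burst, refilled at `rate`
    tokens per second. Capacity and rate values used below are hypothetical."""

    def __init__(self, capacity, rate, now=0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)  # start full so the first page loads at once
        self.last = now

    def allow(self, now):
        # Refill for the time elapsed since the last request, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# A visitor opening one results page: 100 image requests at once all succeed.
page = TokenBucket(capacity=100, rate=2.0)
print(all(page.allow(0.0) for _ in range(100)))

# A scraper sending 50 requests/second for 10 seconds gets roughly
# the initial burst plus about 2 per second afterwards.
scraper = TokenBucket(capacity=100, rate=2.0)
print(sum(scraper.allow(i * 0.02) for i in range(500)))
```

The burst size covers a normal page view, while the refill rate caps how fast any one client can walk through the whole catalogue; a visitor paging very quickly could still run the bucket down, so the numbers would need tuning against real traffic.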
Any advice is appreciated.