Cloudflare Logpush - optimize for AWS Athena

Currently, Logpush will store data in the specified AWS S3 bucket and prefix and it will create a new prefix for each date. The prefixes will be formatted as yyyymmdd so today’s date would be 20201005. In order to query this data using AWS Athena in a partitioned way, you need to manually load each partition via something like

ALTER TABLE cloudflare_logs ADD PARTITION (dt='20201005') location 's3://cloudflare-logs/20201005/'

In AWS Athena, there is a way to make this a bit more automated. If Cloudflare were to allow for the customization of the per-date prefix to write prefixes such as ‘dt=20201005/’, then all partitions could be loaded at once by using MSCK REPAIR TABLE cloudflare_logs.

My specific feedback is to either allow customization of the per-date prefix used for Cloudflare or to have the default per-date prefix specify some key such as dt=20201005.

Best,
Brian

Do you mean a new folder? It does not have to do this. When creating the LogPush job in the UI set the “Daily Subfolders” option to “No subfolders”.

In the context of AWS S3 people usually refer to the folders as prefixes, but they essentially mean the same thing.

I do currently have “Daily subfolders” set up to create a new subfolder each day, but I wanted to be able to customize the naming of those daily subfolders so that they can be more easily indexed with AWS Athena. If Cloudflare were to allow its users to name the daily subfolders as “dt=20201005” rather than the current naming of “20201005”, it would simplify the querying and management of those folders.

For reference: https://docs.aws.amazon.com/athena/latest/ug/partitions.html