Note: I am creating this post hoping that someone who works on R2 at Cloudflare finds it helpful.
I was testing a couple of S3-compatible storage providers, R2 being one of them. I have found mc, the MinIO client, to be a more natural tool for interacting with any S3-compatible API.
I was testing uploads of larger files, which trigger a multipart upload. I tested against S3, Backblaze B2, and then finally R2. However, when I got to testing R2, I was getting the following error from mc:
Failed to copy `**REDACTED**/2022-01-01T05_00_00Z_PT1H.parquet`. The XML you provided was not well formed or did not validate against our published schema.
After digging a little into mc and running mc with debug logs, I found that the error message is returned by R2 along with a 400 Bad Request:
POST **REDACTED**/2022-01-01T05_00_00Z_PT1H.parquet?uploadId=ABrSSDpJ9qd09qLB27UAKz%2BMCaCGxKVHtOiVxCvlnNyz%2BAznC4TymYpV6WglQltBUwXMLEzHDCqwjz4ZZaW62gQ02eGm%2FtyB19A2qC%2FyaH2fL%2FEfSS0AMxG%2B60dHrUlaK2aopVS9zs6p62iJehgivRMWINhbX7OS12cXt%2BXlJqxFo8eodxn0fHCJrpaMtz7o%2BSAdAtmia5xWepMr6dXFNeZtpARYjydNvCMWmJWdDi%2BUDkbG2hFePsHLESZD7CXNgSCFGj30K33JrhwiImcfJivc4KomkS8XGigZqTf94Ityn%2FhuCIJKCaarT2HAsPgf4A%3D%3D HTTP/1.1
Host: **REDACTED**.r2.cloudflarestorage.com
User-Agent: MinIO (linux; amd64) minio-go/v7.0.27 mc/RELEASE.2022-06-26T18-51-48Z
Content-Length: 5144
Authorization: AWS4-HMAC-SHA256 Credential=**REDACTED**/20220703/auto/s3/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=**REDACTED**
Content-Type: application/octet-stream
X-Amz-Content-Sha256: b0ff115848d1eade087bf77d66c70a30ff93af21b3dfa8ac8dc12d0af9dc74b1
X-Amz-Date: 20220703T101550Z
Accept-Encoding: gzip

HTTP/1.1 400 Bad Request
Content-Length: 149
Cf-Ray: 724edcd00d693607-MAN
Connection: keep-alive
Content-Type: application/xml
Date: Sun, 03 Jul 2022 10:15:53 GMT
Expect-Ct: max-age=604800, report-uri="****"
Server: cloudflare

<Error><Code>MalformedXML</Code><Message>The XML you provided was not well formed or did not validate against our published schema.</Message></Error>
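For anyone scripting a similar investigation, here is a minimal Python sketch that pulls the code and message out of an error body shaped like the one above. It assumes an un-namespaced <Error> root with <Code> and <Message> children, as in this response; the helper name parse_s3_error is made up for illustration and is not part of mc or any SDK.

```python
import xml.etree.ElementTree as ET


def parse_s3_error(body: str):
    """Return (code, message) from an S3-style <Error> XML body.

    Assumes the un-namespaced layout seen in the response above.
    Missing elements come back as None.
    """
    root = ET.fromstring(body)
    return root.findtext("Code"), root.findtext("Message")


if __name__ == "__main__":
    sample = (
        "<Error><Code>MalformedXML</Code>"
        "<Message>The XML you provided was not well formed or did not "
        "validate against our published schema.</Message></Error>"
    )
    code, message = parse_s3_error(sample)
    print(code)
    print(message)
```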
To check whether the XML sent as part of the request was malformed, I modified mc to output the XML before sending it to R2. I found the XML that was triggering the 400 was a CompleteMultipartUpload message. After comparing with the AWS docs for this message, everything seemed to match up. Here is part of the XML:
<?xml version="1.0"?>
<CompleteMultipartUpload xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Part xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <PartNumber>1</PartNumber>
    <ETag>AC1GJQgYfVqoSXiBAbsceB7AGHhTrGQo5yxb/gKRPa5U8mFjEJ8mY5yXCSWXVCgnSC3htAhdtH4PPIBuk43sZRwyZ2NMI+jdBp277XB6iGWa7gokwzP+5D+qOT3F6yfvoTdlhMCAHlUuZGXLCGMLAaB9nMdUdM40xJenddNWb/PdGI5Ywn5IKKsRFrb4Xwnh85SMp90I7LZxpGuy6J3mqSdYSuqhPh4MndcnHDd4aS6EXmm9lfm2hnISFXgz4Pngog==</ETag>
  </Part>
  <Part xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <PartNumber>2</PartNumber>
    <ETag>APGSqyqEDKN2bG68x/4CykDaifv9wit9RVwXTvC10MDGu+6gXviP4ws1I/SOqb8unzH0WM3QaELjbKshCTw02HA5N6T7vjQ/bsHwAon1dfC9Q4t7S/fis/LTjlY55IIE5fzsSAZPR8BN3T7LdADDiSO7HomrlhTlX5sgajRq0CA1JmP+aqFQENouDpi/U2vvZsrUWxwTxIUMV3pjwdPdJ1xs+V/kU7qT0hCIpO9cItI69ZzrzLx0/M8IdYKlRDYwDw==</ETag>
  </Part>
  ...
</CompleteMultipartUpload>
At this point, I decided to try uploading the same file using the AWS CLI. This worked as expected. After inspecting the difference between the XML for CompleteMultipartUpload sent by aws s3 and the one sent by mc, I noticed that the text body of ETag was quoted by aws s3 and left unquoted by mc.
To make sure this was the problem, I modified mc to quote the body of ETag and tada, it worked! Here is the XML that worked:
<?xml version="1.0"?>
<CompleteMultipartUpload xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Part xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <PartNumber>1</PartNumber>
    <ETag>"AOCRehBvt+Esfw9HEvgxgFDspcj1uubiJbAqFcY5u6iv0qBikr57n0Qs7FvHR10LBaoIk7VO4y9GCw1GtbaOUgaT+aViejBuJf04w3vOBq0doF3H0SeZFYHkAqWhcX7QUGiwAc+k8R9nP+WVMXSiXGhzIh1jUX/oRNOfp5cQt951JFxuu9G/ft28dajy7HOb7DXoSBSiwqT6hscj3mqQ3N1XvnuHS71zEwAhnxSpAc392hGr6SH6U+41ssuUeM7bKw=="</ETag>
  </Part>
  <Part xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <PartNumber>2</PartNumber>
    <ETag>"AMomqON9e7lrEVVHugMeVOLX74XWpe/CIKabpT8Xbg34iYWfPXUNI/Dz2OfPiswj5E/ZrdCSmGg1nH1lrZKy1mrWD/LP03enTcKkb14wiV4I5UQ2J2HIYX427fYGsIM6yAWWPasz/Py1xWQmTA7ZLtaGnC6DaMX1dBnuFHk3FfQjxLU3evFmHA/bC3Ci06foa8rm7ygFetoTGq9CEbUBBnFjDVj2sOQSmKfGBuAsxKLPQrK6vrnthLPbbo9cHg1jaQ=="</ETag>
  </Part>
  ...
</CompleteMultipartUpload>
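To make the client-side workaround concrete, here is a rough Python sketch of building a CompleteMultipartUpload body with every ETag wrapped in exactly one pair of double quotes. This is not the actual patch to mc (which is written in Go); the function names quote_etag and build_complete_multipart_upload are invented for illustration, and the sketch only sets the namespace on the root element rather than repeating it on each Part.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def quote_etag(etag: str) -> str:
    """Wrap an ETag in exactly one pair of double quotes.

    Already-quoted values are not double-quoted again.
    """
    return '"' + etag.strip().strip('"') + '"'


def build_complete_multipart_upload(parts) -> str:
    """Build the XML body from an iterable of (part_number, etag) pairs.

    Parts are emitted in ascending part-number order.
    """
    root = ET.Element("CompleteMultipartUpload", {"xmlns": S3_NS})
    for part_number, etag in sorted(parts):
        part = ET.SubElement(root, "Part")
        ET.SubElement(part, "PartNumber").text = str(part_number)
        ET.SubElement(part, "ETag").text = quote_etag(etag)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder ETags; real ones come from the UploadPart responses.
    print(build_complete_multipart_upload([(1, "etag-one"), (2, '"etag-two"')]))
```

Normalizing instead of blindly adding quotes means the same helper works whether the ETag was stored with or without the quotes that came back in the UploadPart response header.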
So here is the question: is this something that needs to be changed on the R2 side? It looks like other S3 implementations, including S3 itself, can handle the ETag body without quotes, since mc works fine with these other providers. On the other hand, I could raise this issue with mc and maybe get the change for quoting the ETag body upstreamed, but I feel skeptical about the latter.
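To illustrate what "handling the ETag body without quotes" could mean on the server side, here is a purely hypothetical Python sketch of a lenient comparison that treats quoted and unquoted ETags as equal. I have no knowledge of how R2 or S3 actually implement this; the names normalize_etag and etags_match are made up.

```python
def normalize_etag(etag: str) -> str:
    """Strip surrounding whitespace and double quotes from an ETag."""
    return etag.strip().strip('"')


def etags_match(client_etag: str, stored_etag: str) -> bool:
    """Compare two ETags, ignoring whether either one is quoted."""
    return normalize_etag(client_etag) == normalize_etag(stored_etag)


if __name__ == "__main__":
    print(etags_match('"abc=="', "abc=="))
    print(etags_match("abc==", "xyz=="))
```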