How to copy a Cloudflare HTTPS site using WinHTTrack


#1

Hi Team,

We want to copy a complete site using WinHTTrack. How can we do that?
Since moving the site to Cloudflare, we are no longer able to copy it. Please suggest a way, or an alternative, to copy it.

Thanks


#2

The best way to copy a site is to download all files directly from the server.


#3

If we download manually, directly from the server, we can’t get external resources that are loaded from other places, such as PDFs, videos, MP3s, etc.

And if the site is huge, we can’t hunt down every resource on the server.

Thanks,
Krishna MM


#4

Are you getting a particular error message?


#5

HTTrack3.48-22+htsswf+htsjava launched on Thu, 23 Nov 2017 13:46:10 at www.fridaylaw.com +.css +.js -ad.doubleclick.net/* -mime:application/foobar +* +.gif +.jpg +.jpeg +.png +.tif +.bmp +.zip +.tar +.tgz +.gz +.rar +.z +.exe +.mov +.mpg +.mpeg +.avi +.asf +.mp3 +.mp2 +.rm +.wav +.vob +.qt +.vid +.ac3 +.wma +.wmv
(winhttrack -qiC2%Ps2u1%s%uN0%I0p3DaK0H0%kf2A25000%f#f -F “Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)” -%F “” -%l "en, " www.fridaylaw.com -O1 “C:\My Web Sites\aa\www.fridaylaw.com” +.css +.js -ad.doubleclick.net/ -mime:application/foobar +* +.gif +.jpg +.jpeg +.png +.tif +.bmp +.zip +.tar +.tgz +.gz +.rar +.z +.exe +.mov +.mpg +.mpeg +.avi +.asf +.mp3 +.mp2 +.rm +.wav +.vob +.qt +.vid +.ac3 +.wma +.wmv )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive information,
such as username/password authentication for websites mirrored in this project
do not share these files/folders if you want these information to remain private
13:46:11 Warning: Moved Permanently for www.fridaylaw.com/robots.txt
13:46:11 Warning: Redirected link is identical because of ‘URL Hack’ option: www.fridaylaw.com/robots.txt and https://www.fridaylaw.com/robots.txt
13:46:11 Warning: Warning moved treated for www.fridaylaw.com/robots.txt (real one is https://www.fridaylaw.com/robots.txt)
13:46:11 Warning: Moved Permanently for www.fridaylaw.com/
13:46:11 Warning: Redirected link is identical because of ‘URL Hack’ option: www.fridaylaw.com/ and https://www.fridaylaw.com/
13:46:11 Warning: File has moved from www.fridaylaw.com/ to https://www.fridaylaw.com/
13:46:11 Warning: No data seems to have been transferred during this session! : restoring previous one!


#6

Cloudflare is likely to treat such requests as an attack. Throttling your requests also makes the task lengthy. One solution is to turn off CF, do the job, and then enable protection again.


#7

We understand the risk.
We have a huge list of sites and resources, so we can’t turn off CF every time a client asks us to copy a site or keep a copy.

Is there a secure way to allow this, for example with a header or a proxy, so that we can restrict it to a limited set of internal users?
Please reply privately or email me directly if there is a solution. We really need this.

Thanks,
Krishna M


#8

Maybe if you Whitelist your IP address in Cloudflare’s Firewall settings, you won’t trigger any attack warnings.

As for your pull command, it looks like one issue is you didn’t specify https://, so a lot of your requests are being redirected from http to https.
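
To illustrate the https:// point, here is a minimal Python sketch that rewrites a start URL so it carries an explicit https:// scheme before it is handed to the mirroring tool. The helper name `ensure_https` is hypothetical, and the sketch only normalizes the string; it does not contact the site or say anything about how Cloudflare will respond.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_https(start_url: str) -> str:
    """Return start_url with an explicit https:// scheme (hypothetical helper)."""
    # A bare host such as "www.example.com" has no scheme, so urlsplit would
    # treat it as a path. Prefixing "//" makes it parse as the host instead.
    if "://" not in start_url:
        start_url = "//" + start_url
    parts = urlsplit(start_url)
    # Keep host, path, query and fragment; only the scheme is forced to https.
    return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))


print(ensure_https("www.fridaylaw.com"))
print(ensure_https("http://www.fridaylaw.com/"))
```

Starting from the https:// address avoids the "Moved Permanently" warnings in the log above, though it does not by itself get past any firewall rule.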


#9

Thanks Sdayman,
We will try whitelisting our IP address and let you know the result.
I also tried adding https://, but got a “Forbidden” (403) error:

10:46:52 Error: “Forbidden” (403) at link https://www.fridaylaw.com/ (from primary/primary)

HTTrack3.48-22+htsswf+htsjava launched on Fri, 24 Nov 2017 10:46:51 at https://www.fridaylaw.com +.css +.js -ad.doubleclick.net/* -mime:application/foobar +* +.gif +.jpg +.jpeg +.png +.tif +.bmp +.zip +.tar +.tgz +.gz +.rar +.z +.exe +.mov +.mpg +.mpeg +.avi +.asf +.mp3 +.mp2 +.rm +.wav +.vob +.qt +.vid +.ac3 +.wma +.wmv
(winhttrack -qC2%Ps2u1%s%uN0%I0p3DaK0H0%kf2A25000%f#f -F “Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)” -%F “” -%l "en, " -r1p0C0I0t https://www.fridaylaw.com -O1 “C:\My Web Sites\aa\www.fridaylaw.com” +.css +.js -ad.doubleclick.net/ -mime:application/foobar +* +.gif +.jpg +.jpeg +.png +.tif +.bmp +.zip +.tar +.tgz +.gz +.rar +.z +.exe +.mov +.mpg +.mpeg +.avi +.asf +.mp3 +.mp2 +.rm +.wav +.vob +.qt +.vid +.ac3 +.wma +.wmv )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive information,
such as username/password authentication for websites mirrored in this project
do not share these files/folders if you want these information to remain private
10:46:52 Warning: HTML file (620 bytes) retransferred due to lack of cache: https://www.fridaylaw.com/
10:46:52 Error: “Forbidden” (403) at link https://www.fridaylaw.com/ (from primary/primary)
10:46:52 Warning: No data seems to have been transferred during this session! : restoring previous one!
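
When many sites are mirrored, reading each hts-log.txt by hand gets tedious. The following is a rough Python sketch that counts the warnings and collects the errors from log text. It assumes lines follow the "HH:MM:SS Level: message" layout seen in the log above. The function name `summarize_log` and the regular expression are illustrative, not part of HTTrack.

```python
import re
from collections import Counter

# Matches lines such as: 10:46:52 Error: "Forbidden" (403) at link ...
# (layout assumed from the pasted log, not from HTTrack documentation)
LOG_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(Warning|Error):\s+(.*)$")


def summarize_log(text):
    """Return (counts per level, list of error messages) for log text."""
    counts = Counter()
    errors = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # header, command line, or note: not a timed entry
        _, level, message = match.groups()
        counts[level] += 1
        if level == "Error":
            errors.append(message)
    return counts, errors


sample = """\
10:46:52 Warning: HTML file (620 bytes) retransferred due to lack of cache: https://www.fridaylaw.com/
10:46:52 Error: "Forbidden" (403) at link https://www.fridaylaw.com/ (from primary/primary)
10:46:52 Warning: No data seems to have been transferred during this session! : restoring previous one!
"""

counts, errors = summarize_log(sample)
print(counts["Warning"], counts["Error"])
for message in errors:
    print(message)
```

A 403 among the collected errors, as here, suggests the request was refused before any page content was served.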