Argo-ingress not working anymore: lookup cftunnel.com on 10.0.0.10:53: no such host


#1

Hi,

I have argo-ingress deployed in my cluster from https://github.com/cloudflare/cloudflare-ingress-controller

I cloned the repo and installed it using:

helm install --name $RELEASE_NAME --namespace $NS --set rbac.install=$USE_RBAC --set secret.install=true,secret.domain=$DOMAIN,secret.certificate_b64=$CERT_B64 --set image.pullPolicy=Always ./chart --debug

For the past 4 days I have been seeing 503 errors:

503 Service Unavailable
The origin has been unregistered from Argo Tunnel

Restarting the pod resolved this each time. The pod logs showed no error suggesting that the tunnel had been closed; I could still see "Validation ok for running default/httpbin/httpbin with 1 endpoint(s)".

Today, however, all my ingresses have stopped working. I get the usual log lines for the ingress, tunnel creation and validation, but I don’t see the message that names the PoP the tunnel connected to (e.g. connected to SIN). I just see:

controller.go:722] Starting tunnel to url httpbin.default:80
controller.go:698] Validation ok for running default/httpbin/httpbin with 1 endpoint(s)
controller.go:635] created tunnel for ingress httpbin, default/httpbin/httpbin

Running cloudflared directly with --hello-world works and creates the appropriate CNAME records. I ssh’d into the pod and ran argot manually. Then I deleted all pods behind a service, which argot detects as the endpoints becoming unavailable. When the replication controller re-creates the pod, I see the following events:

controller.go:307] Watching endpoint default/httpbin
controller.go:694] Endpoints not ready for tunnel default/httpbin/httpbin
controller.go:502] Error processing update:default/httpbin: at least one error occured handling update:default/httpbin: lookup cftunnel.com on 10.0.0.10:53: no such host
controller.go:694] Endpoints not ready for tunnel default/httpbin/httpbin
controller.go:307] Watching endpoint default/httpbin
controller.go:698] Validation ok for running default/httpbin/httpbin with 1 endpoint(s)
controller.go:722] Starting tunnel to url httpbin.default:80

To diagnose the lookup error, I ran the following commands from another pod in the same namespace:

/ # dig +trace a cftunnel.com

; <<>> DiG 9.11.3 <<>> +trace a cftunnel.com
;; global options: +cmd
.                       241933  IN      NS      m.root-servers.net.
.                       241933  IN      NS      b.root-servers.net.
.                       241933  IN      NS      c.root-servers.net.
.                       241933  IN      NS      d.root-servers.net.
.                       241933  IN      NS      e.root-servers.net.
.                       241933  IN      NS      f.root-servers.net.
.                       241933  IN      NS      g.root-servers.net.
.                       241933  IN      NS      h.root-servers.net.
.                       241933  IN      NS      i.root-servers.net.
.                       241933  IN      NS      a.root-servers.net.
.                       241933  IN      NS      j.root-servers.net.
.                       241933  IN      NS      k.root-servers.net.
.                       241933  IN      NS      l.root-servers.net.
.                       241933  IN      RRSIG   NS 8 0 518400 20180825050000 20180812040000 41656 . AKpBAC+GLUffj3ssEoEkbd03Kcsq+yKvzaLIorw4kcwWeXGiD7zvECyb 74erZSoeA25J4W75bUyetwOEj+JVoTey5mPxQGyIR2t5sRKrHdKDJiSs BsW5gvayV/m+3BltYSQhUEihzbmEcj6JZLCAZxlH1C7KyXeOInDK5XYg epSMumair6RiMNaIm7zH74jFG5BiIjXo/oAprDiPP5oWqBMJNOgkdAvz LZNENPFweTEskKzXOsTP3V0MQxqxPcmTbe4G3WEAkrD7TiFJZLK/1nWZ NgFZ5IcTGo/QxgWEiycfyaM2sdqXHQ+JMptSrJvcfYWnPxM+Z7YfDQ6z koaD5A==
;; Received 525 bytes from 10.0.0.10#53(10.0.0.10) in 0 ms

com.                    172800  IN      NS      a.gtld-servers.net.
com.                    172800  IN      NS      b.gtld-servers.net.
com.                    172800  IN      NS      c.gtld-servers.net.
com.                    172800  IN      NS      d.gtld-servers.net.
com.                    172800  IN      NS      e.gtld-servers.net.
com.                    172800  IN      NS      f.gtld-servers.net.
com.                    172800  IN      NS      g.gtld-servers.net.
com.                    172800  IN      NS      h.gtld-servers.net.
com.                    172800  IN      NS      i.gtld-servers.net.
com.                    172800  IN      NS      j.gtld-servers.net.
com.                    172800  IN      NS      k.gtld-servers.net.
com.                    172800  IN      NS      l.gtld-servers.net.
com.                    172800  IN      NS      m.gtld-servers.net.
com.                    86400   IN      DS      30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CF C41A5766
com.                    86400   IN      RRSIG   DS 8 1 86400 20180825050000 20180812040000 41656 . rTOR/bcUhIjlLufuHmwodcGHCV1T3McqK08tTtHgBwmGUS/CAxD7LE0l R8RRsuhu1F3En6MIbz69/RLWURm8S69QIPkrXLMXko+k5bW2IWJatPe3 IxeswJl9gN2/oKsHD4UnpLJ+amLSUH0gZ44yFQpiyelRpg+GYHk2L6r2 4yjBi2+Gz0wVTmZZeY3GeluTstIAu/35LPhiLEiwJwf7WfhAQWka+noE 60I+5Ne21l74Gffkn5S5UmEOc+eihtj1v/RADyO2F6p/KuazVLwjbD/b RDlqG93y+ObsmxJwnRLg2eEt4+hnl7MLRXMyYsboShIeH0ox+5r+nW23 0gLkwA==
;; Received 1172 bytes from 193.0.14.129#53(k.root-servers.net) in 128 ms

cftunnel.com.           172800  IN      NS      kevin.ns.cloudflare.com.
cftunnel.com.           172800  IN      NS      marjory.ns.cloudflare.com.
cftunnel.com.           86400   IN      DS      2371 13 2 E628814E43E924733990D4B7DA1B2E3E42107ECB754AEDAA01B97D20 85648728
cftunnel.com.           86400   IN      RRSIG   DS 8 2 86400 20180818043630 20180811032630 46475 com. JRdrMEBKm2CXW8tev8FY8j6YrzA5M9eJQD6LIKJU826OOXL5McJ7yTsD 0R4jMg89xTLd7KCmHYoHkDNvONxwD9nNOhZvQ2O1BSzvuO5H89ADfdth MAv/jqeDEG1hmkIeXww0DzyLmoEbEfN6qAjB7nEAA215wjNmGR2yDik2 cE8=
;; Received 396 bytes from 192.26.92.30#53(c.gtld-servers.net) in 270 ms

cftunnel.com.           30      IN      SOA     kevin.ns.cloudflare.com. dns.cloudflare.com. 2028576619 10000 2400 604800 30
cftunnel.com.           30      IN      NSEC    \000.cftunnel.com. NS SOA HINFO MX TXT AAAA LOC SRV CERT SSHFP RRSIG NSEC DNSKEY TLSA HIP CDS CDNSKEY OPENPGPKEY SPF
cftunnel.com.           30      IN      RRSIG   SOA 13 2 30 20180813194652 20180811174652 35273 cftunnel.com. l5det6ze+36Ek8me415LC5cIK6j1GU4eVAoQmqaLk6NBEOzmZiXBfHZW x6isewP2UWYQ0QKUzKTb5YJzXTOw3w==
cftunnel.com.           30      IN      RRSIG   NSEC 13 2 30 20180813194652 20180811174652 35273 cftunnel.com. wGvw362LmbifiPDprnMsLx//ou+BoWxWK9VVtLTky6Uov87Xm0+L0YRu r8zcXHaXCu0mlCiMrEJy07fBeRsgEw==
;; Received 360 bytes from 173.245.58.193#53(marjory.ns.cloudflare.com) in 64 ms

/ # dig a cftunnel.com

; <<>> DiG 9.11.3 <<>> a cftunnel.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44398
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;cftunnel.com.                  IN      A

;; AUTHORITY SECTION:
cftunnel.com.           29      IN      SOA     kevin.ns.cloudflare.com. dns.cloudflare.com. 2028576619 10000 2400 604800 30

;; Query time: 69 msec
;; SERVER: 10.0.0.10#53(10.0.0.10)
;; WHEN: Sun Aug 12 18:47:26 UTC 2018
;; MSG SIZE  rcvd: 101

/ # dig cftunnel.com ANY

; <<>> DiG 9.11.3 <<>> cftunnel.com ANY
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8216
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;cftunnel.com.                  IN      ANY

;; ANSWER SECTION:
cftunnel.com.           3788    IN      HINFO   "ANY obsoleted" "See draft-ietf-dnsop-refuse-any"
cftunnel.com.           3788    IN      RRSIG   HINFO 13 2 3789 20180813194828 20180811174828 35273 cftunnel.com. IQ58x0PakpDhBFlgtukJzI37ti6kCgE81B0Lc3P1RNX4ABcubbnLylAq BGhaMuSxh8t3AV/+IkJOFHkG4WTsCA==

;; Query time: 69 msec
;; SERVER: 10.0.0.10#53(10.0.0.10)
;; WHEN: Sun Aug 12 18:48:28 UTC 2018
;; MSG SIZE  rcvd: 207

It looks like the lookup of an A record for cftunnel.com is not working: the query returns NOERROR, but the answer section is empty. Otherwise, it is some other issue with argot.
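
To make the reading of the dig output above explicit, here is a minimal Python sketch that classifies a response from its header lines. The helper name classify_dig_response is hypothetical and not part of argot or dig. A NOERROR status with ANSWER: 0 is commonly called NODATA: the name exists but has no record of the requested type. As far as I understand, Go's resolver reports that case with the same "no such host" text it uses for NXDOMAIN, which would match the controller log, but treat that as an assumption rather than a confirmed cause.

```python
import re


def classify_dig_response(output: str) -> str:
    """Classify dig output as ANSWER, NODATA, NXDOMAIN, or another status name."""
    status = re.search(r"status: (\w+)", output)
    answers = re.search(r"ANSWER: (\d+)", output)
    if status is None or answers is None:
        raise ValueError("no dig header found in output")
    if status.group(1) == "NXDOMAIN":
        return "NXDOMAIN"
    if status.group(1) == "NOERROR":
        # NOERROR with zero answers means the name exists but has no such record.
        return "ANSWER" if int(answers.group(1)) > 0 else "NODATA"
    return status.group(1)


# Header lines copied from the "dig a cftunnel.com" run above.
header = (
    ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44398\n"
    ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1\n"
)
print(classify_dig_response(header))  # prints: NODATA
```

Run against the header of the ANY query instead, the same function returns ANSWER, because that response carries two records (the HINFO and its RRSIG).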

Any help with this issue is greatly appreciated.

Thanks,
Shantanu


#2

We are having the same issue: the origin is not created in the traffic page, no matter how hard I try.
Quick help would be really appreciated.


#3

I got the same. Because the tunnels fall over a lot, I actually have a script that checks them and restarts the argo pod if necessary. Recently, though, I found that something had changed, so I upgraded the container image to

gcr.io/stackpoint-public/argot:20180812-c

with the hash 7a55aedbf0b1

which did work, but today it disappeared from the repo!

However, I was able to retrieve a copy of this working container image from one of the hosts, and I retagged it and pushed it to the public Docker Hub as

barrymac/argobackup

It got my tunnels back up and running, but we suffered some downtime while I was frantically retrieving the image.

The latest stable one, 0.5.1, is not working for me.
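
For anyone who wants a similar stopgap, a check-and-restart script of the kind mentioned above could look roughly like this. It is a minimal sketch, not the actual script: it assumes the failure shows up as the 503 page quoted in the first post, and the function names, the URL, the namespace and the label selector are all hypothetical placeholders. The only external command used is kubectl delete pod with -n and -l, which leaves it to the deployment to re-create the pod.

```python
import subprocess
import urllib.error
import urllib.request
from typing import Callable, Tuple

# Text of the 503 page quoted in the first post.
UNREGISTERED = "The origin has been unregistered from Argo Tunnel"


def tunnel_is_down(status: int, body: str) -> bool:
    """True when the response looks like the unregistered-origin 503 page."""
    return status == 503 and UNREGISTERED in body


def probe(url: str) -> Tuple[int, str]:
    """Fetch url and return (status, body), including for HTTP error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")


def restart_controller(namespace: str, selector: str) -> None:
    """Delete the matching pods so that they are re-created."""
    subprocess.run(
        ["kubectl", "delete", "pod", "-n", namespace, "-l", selector],
        check=True,
    )


def check_once(
    url: str,
    namespace: str,
    selector: str,
    probe: Callable[[str], Tuple[int, str]] = probe,
    restart: Callable[[str, str], None] = restart_controller,
) -> bool:
    """Probe once; restart and return True if the tunnel looks down."""
    status, body = probe(url)
    if tunnel_is_down(status, body):
        restart(namespace, selector)
        return True
    return False
```

Calling check_once("https://httpbin.example.com/", "default", "app=argo-ingress") from cron or a loop would be one way to use it; all three arguments are placeholders to replace with your own hostname, namespace and pod labels. Note that a restart would not help in the case from the first post where the tunnel never connects at all.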


#4

Are you all still experiencing issues, or is it resolved?


#5

I had to remove argo-ingress and add a static IP with a LoadBalancer to our production backend. I haven’t yet re-enabled it on our staging. I’ll try and let you know by tomorrow. Is 0.5.1 the image tag we should be using?

Also, I’m interested in knowing the technicalities of the issue (if you guys found something).


#6

I need to make sure the issue is still happening and is reproducible before I ask engineers to jump off of other tasks. So let me know the latest status and reproduction steps and I’ll get it filed right away.