Unbound forwarding to 1.1.1.1#853 stopped working

I have had a working configuration of Unbound forwarding DNS over TLS to 1.1.1.1 for the past few months, but things started failing today. Not all domains/websites fail, but most of them no longer resolve.

[Edit: This is from an AWS EC2 host. I have a similar setup at home on a Raspberry Pi, which is still working fine with no resolution issues.]

Here is an example:

➜  ubuntu ~  dig m.schwab.com @127.0.0.1 -p5335                                                            

; <<>> DiG 9.16.1-Ubuntu <<>> m.schwab.com @127.0.0.1 -p5335
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 57228
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;m.schwab.com.                  IN      A

;; Query time: 420 msec
;; SERVER: 127.0.0.1#5335(127.0.0.1)
;; WHEN: Mon Jul 06 22:39:29 UTC 2020
;; MSG SIZE  rcvd: 41

Unbound logs

unbound[9737]: [1594074834] unbound[9737:0] info: reply from <.> 1.1.1.1#853
unbound[9737]: [1594074834] unbound[9737:0] info: query response was CNAME
unbound[9737]: [1594074834] unbound[9737:0] info: resolving lms-auth.schwab.com. A IN
unbound[9737]: [1594074834] unbound[9737:0] info: response for lms-auth.schwab.com. AAAA IN
unbound[9737]: [1594074834] unbound[9737:0] info: reply from <.> 1.1.1.1#853
unbound[9737]: [1594074834] unbound[9737:0] info: query response was CNAME
unbound[9737]: [1594074834] unbound[9737:0] info: resolving lms-auth.schwab.com. AAAA IN
unbound[9737]: [1594074834] unbound[9737:0] info: response for lms-auth.schwab.com. A IN
unbound[9737]: [1594074834] unbound[9737:0] info: reply from <.> 1.1.1.1#853
unbound[9737]: [1594074834] unbound[9737:0] info: query response was THROWAWAY
unbound[9737]: [1594074834] unbound[9737:0] info: response for lms-auth.schwab.com. A IN
unbound[9737]: [1594074834] unbound[9737:0] info: reply from <.> 1.0.0.1#853
unbound[9737]: [1594074834] unbound[9737:0] info: query response was THROWAWAY
unbound[9737]: [1594074834] unbound[9737:0] info: response for lms-auth.schwab.com. A IN
unbound[9737]: [1594074834] unbound[9737:0] info: reply from <.> 1.1.1.1#853
unbound[9737]: [1594074834] unbound[9737:0] info: query response was THROWAWAY
unbound[9737]: [1594074834] unbound[9737:0] info: response for lms-auth.schwab.com. A IN
unbound[9737]: [1594074834] unbound[9737:0] info: reply from <.> 1.1.1.1#853
unbound[9737]: [1594074834] unbound[9737:0] info: query response was THROWAWAY
unbound[9737]: [1594074834] unbound[9737:0] info: response for lms-auth.schwab.com. A IN
unbound[9737]: [1594074834] unbound[9737:0] info: reply from <.> 1.1.1.1#853
unbound[9737]: [1594074834] unbound[9737:0] info: query response was THROWAWAY
unbound[9737]: [1594074834] unbound[9737:0] info: response for lms-auth.schwab.com. A IN
unbound[9737]: [1594074834] unbound[9737:0] info: reply from <.> 1.0.0.1#853
unbound[9737]: [1594074834] unbound[9737:0] info: query response was THROWAWAY
unbound[9737]: [1594074834] unbound[9737:0] info: response for lms-auth.schwab.com. AAAA IN
unbound[9737]: [1594074834] unbound[9737:0] info: reply from <.> 1.1.1.1#853
unbound[9737]: [1594074834] unbound[9737:0] info: query response was THROWAWAY
unbound[9737]: [1594074834] unbound[9737:0] info: response for lms-auth.schwab.com. AAAA IN
unbound[9737]: [1594074834] unbound[9737:0] info: reply from <.> 1.0.0.1#853
unbound[9737]: [1594074834] unbound[9737:0] info: query response was THROWAWAY
unbound[9737]: [1594074834] unbound[9737:0] info: response for lms-auth.schwab.com. AAAA IN
unbound[9737]: [1594074834] unbound[9737:0] info: reply from <.> 1.1.1.1#853
unbound[9737]: [1594074834] unbound[9737:0] info: query response was THROWAWAY
unbound[9737]: [1594074834] unbound[9737:0] info: response for lms-auth.schwab.com. AAAA IN
unbound[9737]: [1594074834] unbound[9737:0] info: reply from <.> 1.0.0.1#853
unbound[9737]: [1594074834] unbound[9737:0] info: query response was THROWAWAY
unbound[9737]: [1594074834] unbound[9737:0] info: response for lms-auth.schwab.com. AAAA IN
unbound[9737]: [1594074834] unbound[9737:0] info: reply from <.> 1.0.0.1#853
unbound[9737]: [1594074834] unbound[9737:0] info: query response was THROWAWAY
unbound[9737]: [1594074834] unbound[9737:0] info: response for lms-auth.schwab.com. AAAA IN
unbound[9737]: [1594074834] unbound[9737:0] info: reply from <.> 1.1.1.1#853
unbound[9737]: [1594074834] unbound[9737:0] info: query response was THROWAWAY
unbound[9737]: [1594074834] unbound[9737:0] info: response for lms-auth.schwab.com. AAAA IN
unbound[9737]: [1594074834] unbound[9737:0] info: reply from <.> 1.0.0.1#853
unbound[9737]: [1594074834] unbound[9737:0] info: query response was THROWAWAY
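
To see which upstream produces the unusable replies, the log excerpt can be tallied per upstream address. This is a minimal Python sketch that assumes the exact line format shown above, and assumes each "query response was" line belongs to the closest preceding "reply from" line. tally_responses is a hypothetical helper, not part of Unbound. To my understanding, THROWAWAY is Unbound's label for an upstream reply it discarded as unusable before retrying, which would fit the alternation between 1.1.1.1 and 1.0.0.1 here.

```python
import re
from collections import Counter

# "... info: reply from <.> 1.1.1.1#853" -> captures "1.1.1.1#853"
REPLY_RE = re.compile(r"info: reply from <[^>]*> (\S+)$")
# "... info: query response was THROWAWAY" -> captures "THROWAWAY"
KIND_RE = re.compile(r"info: query response was (\S+)$")


def tally_responses(lines):
    """Count (upstream, response kind) pairs in Unbound verbose log lines.

    Assumes each "query response was" line describes the reply named on the
    closest preceding "reply from" line, as in the excerpt above.
    """
    counts = Counter()
    upstream = None
    for line in lines:
        line = line.rstrip()
        reply = REPLY_RE.search(line)
        if reply:
            upstream = reply.group(1)
            continue
        kind = KIND_RE.search(line)
        if kind and upstream is not None:
            counts[(upstream, kind.group(1))] += 1
            upstream = None  # do not attribute a later line to the same reply
    return counts
```

For example, tally_responses(open("/var/log/syslog")).most_common() would show whether both upstreams return THROWAWAY or only one. The log path is an assumption and depends on where Unbound's output is written on the host.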

Relevant Unbound config:

forward-zone:
    name: "."
    forward-addr: 1.1.1.1@853#cloudflare-dns.com
    forward-addr: 1.0.0.1@853#cloudflare-dns.com
    forward-ssl-upstream: yes
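
Each forward-addr value packs three fields: the IP address, the port after "@", and, after "#", the name used to authenticate the upstream's TLS certificate. The small Python sketch below splits a value the same way, purely to illustrate the syntax. parse_forward_addr and ForwardAddr are hypothetical names, and the port is left as None when absent rather than guessing Unbound's default.

```python
from typing import NamedTuple, Optional


class ForwardAddr(NamedTuple):
    ip: str
    port: Optional[int]       # None when no "@port" is given
    auth_name: Optional[str]  # None when no "#name" is given


def parse_forward_addr(value: str) -> ForwardAddr:
    """Split an Unbound forward-addr value of the form IP[@port][#auth-name]."""
    addr, _, auth_name = value.strip().partition("#")
    ip, has_port, port = addr.partition("@")
    return ForwardAddr(ip, int(port) if has_port else None, auth_name or None)
```

Splitting on "#" first and then on "@" keeps IPv6 addresses intact, since they contain colons but neither of those characters.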

I can reach both 1.1.1.1 and 1.0.0.1 consistently (ping/traceroute), and I can telnet to both addresses on port 853.
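
Ping, traceroute and telnet only show that the addresses are reachable and that TCP port 853 accepts connections. They do not exercise the TLS handshake, the certificate check against cloudflare-dns.com, or the DNS answer itself. The Python sketch below covers all three with only the standard library: it opens a verified TLS connection, sends one query framed with the two-byte length prefix that DNS over TLS uses (RFC 7858), and reports the response code. dot_probe and its helpers are hypothetical names, and the query is deliberately minimal (no EDNS), so it may not match everything Unbound sends.

```python
import secrets
import socket
import ssl
import struct

RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name, qtype=1, qclass=1):
    """Build a minimal DNS query (RD set, one question, no EDNS). Defaults: A, IN."""
    header = struct.pack("!HHHHHH", secrets.randbits(16), 0x0100, 1, 0, 0, 0)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, qclass)


def rcode_name(response):
    """Extract the 4-bit RCODE from the header flags and name it."""
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F
    return RCODE_NAMES.get(code, str(code))


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() may return fewer than requested."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full message arrived")
        buf += chunk
    return buf


def dot_probe(server_ip, name, auth_name="cloudflare-dns.com", port=853, timeout=5.0):
    """Send one A query over DNS over TLS and return the response code name.

    Raises ssl.SSLError (e.g. certificate name mismatch) or OSError on
    transport failure, so TLS problems are distinguishable from SERVFAIL.
    """
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((server_ip, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=auth_name) as tls:
            query = build_query(name)
            tls.sendall(struct.pack("!H", len(query)) + query)  # length prefix
            (length,) = struct.unpack("!H", recv_exact(tls, 2))
            response = recv_exact(tls, length)
    if response[:2] != query[:2]:
        raise ValueError("response ID does not match query ID")
    return rcode_name(response)
```

Running dot_probe("1.1.1.1", "m.schwab.com") and dot_probe("1.0.0.1", "m.schwab.com") from the EC2 host while the failures are happening would separate a TLS or certificate problem (raised as an exception) from an upstream SERVFAIL (returned as a string).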

Hi,

Can you increase the log verbosity and see what you can find there? I’m not able to reproduce it locally.

It would also help to follow the README-first guide to gather more information, especially which data center Unbound connects to, using “dig CHAOS TXT id.server @1.1.1.1”.
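
The data center lookup can also be scripted, for example to record it next to Unbound's logs if the failures come back. This Python sketch sends the same CHAOS-class TXT query for id.server over plain UDP port 53 and returns the first TXT string (for this host it came back as "IAD"). colo_of and the other function names are hypothetical, and the parser only handles what this one reply needs: a compressed or uncompressed owner name and a TXT record.

```python
import secrets
import socket
import struct

TYPE_TXT = 16
CLASS_CH = 3  # CHAOS class, used for server-identity queries such as id.server


def build_chaos_query(name="id.server"):
    """Build a DNS query for NAME with type TXT, class CH, and RD set."""
    header = struct.pack("!HHHHHH", secrets.randbits(16), 0x0100, 1, 0, 0, 0)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", TYPE_TXT, CLASS_CH)


def skip_name(msg, offset):
    """Return the offset just past the (possibly compressed) name at OFFSET."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1  # root label ends the name
        if length & 0xC0 == 0xC0:
            return offset + 2  # a compression pointer ends the name
        offset += 1 + length


def first_txt(msg):
    """Return the first character-string of the first TXT answer, or None."""
    qdcount, ancount = struct.unpack("!HH", msg[4:8])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(msg, offset) + 4  # skip QTYPE and QCLASS
    for _ in range(ancount):
        offset = skip_name(msg, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        if rtype == TYPE_TXT:
            size = msg[offset]
            return msg[offset + 1:offset + 1 + size].decode("ascii", "replace")
        offset += rdlength
    return None


def colo_of(resolver_ip="1.1.1.1", timeout=3.0):
    """Ask RESOLVER_IP for CH TXT id.server over UDP and return the TXT value."""
    query = build_chaos_query()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (resolver_ip, 53))
        reply, _ = sock.recvfrom(512)
    if reply[:2] != query[:2]:
        raise ValueError("reply ID does not match query ID")
    return first_txt(reply)
```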

Thank you for trying to reproduce this. I switched back to Cloudflare (I had pointed Unbound at a different resolver in the meantime) and can no longer reproduce it on my end either. FWIW, the issue persisted for more than 3 hours across 2020-07-06/2020-07-07 UTC. I may still have more verbose Unbound logs from that period, if you are still interested.

Also,

➜  ubuntu ~  dig CHAOS TXT id.server @1.1.1.1                           

; <<>> DiG 9.16.1-Ubuntu <<>> CHAOS TXT id.server @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12277
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;id.server.                     CH      TXT

;; ANSWER SECTION:
id.server.              0       CH      TXT     "IAD"

;; Query time: 0 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Thu Jul 09 00:00:05 UTC 2020
;; MSG SIZE  rcvd: 52

A detailed Unbound log would help us see what error it received from 1.1.1.1. Unfortunately, we don’t know the root cause at the moment. If it happens again (hopefully not), could you collect logs then?