Nftables experiments: ICMPv6, Hop-by-Hop Options header

I experimented a little with nftables. It's not clear whether it will ever completely replace iptables, especially after the news about bpfilter, but it was an interesting exercise anyway.

After converting some iptables rules, I looked at how others were using nftables so that I could write a more idiomatic configuration. I found some rulesets for a host firewall and tried to add some of their rules to my configuration.

When testing those rules I found that some ICMPv6 packets were not being matched as I expected.

The problem was with this rule:

# Allow multicast listener discovery on link-local addresses.
ip6 nexthdr ipv6-icmp icmpv6 type {
} ip6 saddr fe80::/10 accept

Surprisingly, this was not enough to cover Multicast Listener Discovery (RFC 2710). The fall-back logging rule kept showing:

[ 1029.326686] [INPUT]: IN=enp0s10 OUT= MAC=...
                        LEN=76 TC=0 HOPLIMIT=1 FLOWLBL=0
                        PROTO=ICMPv6 TYPE=130 CODE=0

Here PROTO=ICMPv6 TYPE=130 means mld-listener-query.

When I enabled tracing:

ip6 saddr fe80::1 meta nftrace set 1

The trace described the packet as follows:

trace id 6ef0d41b inet filter input packet: iif "enp0s10" ether saddr ... ether daddr ... ip6 saddr fe80::1 ip6 daddr ff02::1 ip6 dscp cs0 ip6 ecn not-ect ip6 hoplimit 1 ip6 flowlabel 0 ip6 nexthdr ip ip6 length 36 ...

In particular the ip6 nexthdr ip part caught my attention.

I then proceeded to capture the traffic with tshark, using a capture filter:

$ sudo tshark -i enp0s10 -f "icmp6" -w /tmp/ICMPv6.pcap

The decoded dump of the packet showed all the needed information:

  Internet Protocol Version 6, Src: fe80::1, Dst: ff02::1
      0110 .... = Version: 6
      Next Header: IPv6 Hop-by-Hop Option (0)
      Hop Limit: 1
      Source: fe80::1
      Destination: ff02::1
      IPv6 Hop-by-Hop Option
          Next Header: ICMPv6 (58)
  Internet Control Message Protocol v6
      Type: Multicast Listener Query (130)

That confirmed that the packet was standards-compliant. Quoting RFC 2710:

MLD message types are a subset of the set of ICMPv6 messages, and MLD messages are identified in IPv6 packets by a preceding Next Header value of 58. All MLD messages described in this document are sent with a link-local IPv6 Source Address, an IPv6 Hop Limit of 1, and an IPv6 Router Alert option [RTR-ALERT] in a Hop-by-Hop Options header.

So the problem had to be in the nftables rule.

I figured that the issue could be about the Hop-by-Hop Options header.

I looked in /etc/protocols and I saw that:

  ip		0	IP			# internet protocol, pseudo protocol number
  hopopt	0	HOPOPT		# IPv6 Hop-by-Hop Option [RFC1883]

So, in this case, ip6 nexthdr ip in the nftables trace output really means ip6 nexthdr hopopt.
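The ambiguity can be reproduced with a few lines of Python. This is only a sketch: it parses the two entries quoted above from an inline string (not the real /etc/protocols), and the helper name names_by_number is made up for illustration. That nftables prints the first name it finds for a number is my assumption, based on the trace output.

```python
from collections import defaultdict

# The two entries quoted above, copied into a string for the example.
PROTOCOLS = """\
ip      0   IP      # internet protocol, pseudo protocol number
hopopt  0   HOPOPT  # IPv6 Hop-by-Hop Option [RFC1883]
"""


def names_by_number(text: str) -> dict:
    """Group protocol names by protocol number, keeping file order."""
    table = defaultdict(list)
    for line in text.splitlines():
        # Drop the trailing comment, then split name / number / alias.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            table[int(fields[1])].append(fields[0])
    return dict(table)


names = names_by_number(PROTOCOLS)
print(names[0])     # ['ip', 'hopopt']
print(names[0][0])  # ip  (what a "first match wins" reverse lookup returns)
```

Both names share the number 0, so a number-to-name lookup cannot tell them apart and has to pick one.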

After that, I double-checked the nftables manual, which says:

Caution when using ip6 nexthdr, the value only refers to the next header, i.e. ip6 nexthdr tcp will only match if the ipv6 packet does not contain any extension headers.

Basically the problem with my rule was that:

  ip6 nexthdr ipv6-icmp icmpv6 type {

was not matching the packet because between the IPv6 header and the ICMPv6 header there was an extension header, the Hop-by-Hop Options header.
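To make the difference concrete, here is a minimal Python sketch that compares the Next Header field of the fixed IPv6 header with the protocol found at the end of the extension header chain. The packet is hand-built to resemble the captured one (same addresses, Hop Limit and lengths), the ICMPv6 body is zero-filled and its checksum is not computed, and all function names are hypothetical. It only skips the extension headers that share the generic "next header, length" layout.

```python
import struct

# IANA protocol numbers; only these extension headers are skipped here.
# (Fragment and AH headers use a different length encoding and are left out.)
HOPOPT, ROUTING, DSTOPTS = 0, 43, 60
ICMPV6 = 58
SKIPPABLE = {HOPOPT, ROUTING, DSTOPTS}


def build_mld_query() -> bytes:
    """Hand-built packet modelled on the capture: fe80::1 -> ff02::1."""
    src = bytes.fromhex("fe80" + "00" * 13 + "01")
    dst = bytes.fromhex("ff02" + "00" * 13 + "01")
    # Hop-by-Hop header: next header 58, length field 0 (8 bytes in total),
    # Router Alert option (type 5, len 2, value 0), PadN (type 1, len 0).
    hbh = bytes([ICMPV6, 0, 5, 2, 0, 0, 1, 0])
    # ICMPv6: type 130 (listener query), code 0, remaining bytes zeroed.
    icmp = bytes([130, 0]) + bytes(26)
    payload = hbh + icmp
    # Version 6, traffic class 0, flow label 0, Hop Limit 1.
    # The first Next Header is 0 (hopopt), not 58.
    header = struct.pack("!IHBB", 6 << 28, len(payload), HOPOPT, 1)
    return header + src + dst + payload


def first_next_header(packet: bytes) -> int:
    """Next Header field of the fixed IPv6 header (byte 6)."""
    return packet[6]


def upper_layer_protocol(packet: bytes) -> tuple:
    """Walk the chain and return (protocol, offset) of the upper-layer header."""
    proto, offset = packet[6], 40
    while proto in SKIPPABLE:
        next_proto, hdr_ext_len = packet[offset], packet[offset + 1]
        offset += (hdr_ext_len + 1) * 8  # length is in 8-byte units, minus one
        proto = next_proto
    return proto, offset


packet = build_mld_query()
proto, offset = upper_layer_protocol(packet)
print(first_next_header(packet))  # 0
print(proto, packet[offset])      # 58 130
```

The first value is what a match on the fixed header's Next Header field sees, while the second line shows what is found only after stepping over the Hop-by-Hop Options header.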

Skipping all extension headers with the following rule would work:

  meta l4proto ipv6-icmp icmpv6 type {

But I settled for a more explicit rule, which better documents what is going on:

  hbh nexthdr ipv6-icmp icmpv6 type {

A rule representing the whole header chain would be valid too, but it would be unnecessarily verbose. I am pasting it just for reference:

  ip6 nexthdr hopopt hbh nexthdr ipv6-icmp icmpv6 type {

BTW, some other great resources about nftables are:

Side notes

I like the nftables syntax, especially the nested form.

Unfortunately, some projects still have a hard dependency on iptables; one example is the virtual networking mechanism in libvirt. This means that I won't be able to use nftables on my workstation just yet.

I uploaded my nftables ruleset for a host firewall. Even if I won't use nftables, the experience surely helped to improve my iptables setup.

