Hi Ilario, hi all,
On Sunday, 14 July 2024 16:03:52 CEST Ilario wrote:
> I just replicated the problem at my house connecting 2 LibreMesh
> devices (a YouHua WR1200JS and an old Ubiquiti NanoStation LoCo M2 XM
> with just 32 MB of RAM, I had to remove some stuff in order to have it
> working) running LibreMesh 2024.1-rc1 on OpenWrt 23 to the house
> commercial router, via their LAN ports.
> I observed these messages in the kernel logs (dmesg):
>
> [ 121.472686] batman_adv: bat0: Possible loop on VLAN -1 detected
> which can't be handled by BLA - please check your network setup!
>
> [ 117.539621] br-lan: received packet on bat0 with own address as
> source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
> [ 117.555507] br-lan: received packet on bat0 with own address as
> source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
> [ 117.566445] br-lan: received packet on bat0 with own address as
> source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
> [ 118.340415] mt7530-mdio mdio-bus:1f: port 1 failed to delete
> dc:9f:db:37:28:a9 vid 0 from fdb: -2
> [ 122.441546] net_ratelimit: 1011 callbacks suppressed
>
> [ 113.499078] br-lan: received packet on eth0 with own address as
> source address (addr:dc:9f:db:37:28:a9, vlan:0)
> [ 113.512276] br-lan: received packet on eth0 with own address as
> source address (addr:dc:9f:db:37:28:a9, vlan:0)
> [ 113.524799] br-lan: received packet on eth0 with own address as
> source address (addr:dc:9f:db:37:28:a9, vlan:0)
> [ 118.420838] net_ratelimit: 427 callbacks suppressed
>
I find it strange that batman-adv's BLA (bridge loop avoidance) does not work here. I think we should find out exactly when and why this happens. Unfortunately, I was not able to replicate it. I installed LibreMesh 24.01-rc1, obtained from
https://downloads.libremesh.org/selector/ on an AVM FRITZ!Box 4040 and a MERCUSYS MR70X v1 (both DSA). With the default configuration, I connected them through a dumb switch. I then checked the kernel logs, but there were no warnings. Here is what the backbone table and the neighbor table look like:
**AVM FRITZ!Box 4040**
root@LiMe-26aebf:~# batctl bbt
Warning - name already known (changing mac from '9A:C7:F8:77:1B:C4' to 'c2:5e:37:34:aa:15'): LiMe_26aebf_bat0
[B.A.T.M.A.N. adv 2023.1-openwrt-6, MainIF/MAC: eth0_29/02:95:39:26:ae:bf (bat0/c2:5e:37:34:aa:15 BATMAN_IV), group id: 0x78dd]
Originator VID last seen (CRC )
LiMe_8611a8_eth0_29 on -1 0.950s (0x0000)
root@LiMe-26aebf:~# batctl n
Warning - name already known (changing mac from '9A:C7:F8:77:1B:C4' to 'c2:5e:37:34:aa:15'): LiMe_26aebf_bat0
[B.A.T.M.A.N. adv 2023.1-openwrt-6, MainIF/MAC: eth0_29/02:95:39:26:ae:bf (bat0/c2:5e:37:34:aa:15 BATMAN_IV)]
IF Neighbor last-seen
lan1_29 LiMe_8611a8_lan1_29 0.080s
wlan1-mesh_29 LiMe_8611a8_wlan1_mesh_29 1.040s
wlan0-mesh_29 LiMe_8611a8_wlan0_mesh_29 1.840s
**MERCUSYS MR70X v1**
root@LiMe-8611a8:~# batctl bbt
Warning - name already known (changing mac from '9A:C7:F8:77:1B:C4' to 'c2:5e:37:34:aa:15'): LiMe_26aebf_bat0
[B.A.T.M.A.N. adv 2023.1-openwrt-6, MainIF/MAC: eth0_29/02:95:39:86:11:a8 (bat0/42:55:60:34:6a:8c BATMAN_IV), group id: 0x78dd]
Originator VID last seen (CRC )
LiMe_26aebf_eth0_29 on -1 6.910s (0x0000)
root@LiMe-8611a8:~# batctl n
Warning - name already known (changing mac from '9A:C7:F8:77:1B:C4' to 'c2:5e:37:34:aa:15'): LiMe_26aebf_bat0
[B.A.T.M.A.N. adv 2023.1-openwrt-6, MainIF/MAC: eth0_29/02:95:39:86:11:a8 (bat0/42:55:60:34:6a:8c BATMAN_IV)]
IF Neighbor last-seen
lan1_29 LiMe_26aebf_lan1_29 1.840s
wlan1-mesh_29 LiMe_26aebf_wlan1_mesh_29 0.160s
wlan0-mesh_29 LiMe_26aebf_wlan0_mesh_29 1.840s
The fact that each device lists the other in its backbone table means they correctly detected that their bat0 interfaces are bridged to the same LAN segment, as described here:
https://www.open-mesh.org/doc/batman-adv/Understand-your-batman-adv-network.html#bridge-loop-avoidance-backbone-table
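When repeating this check on other devices, the originators listed in the backbone table can be pulled out of the `batctl bbt` output with a small filter. This is only a sketch: the function name `bbt_peers` is made up for illustration, and it assumes the rows have the same `<originator> on <vid> <last-seen> (<crc>)` layout as in the output above.

```shell
# List the originators found in `batctl bbt` output read on stdin.
# Assumes table rows look like: "<originator> on <vid> <last-seen>s (0x<crc>)".
# Header, banner and warning lines do not match and are skipped.
bbt_peers() {
    awk '$2 == "on" && $NF ~ /^\(0x[0-9a-fA-F]+\)$/ { print $1 }'
}
```

Usage would be `batctl bbt | bbt_peers` on each node. An empty result would suggest that the node does not see any other backbone gateway on its LAN segment.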
Normally there should be no difference between connecting the devices directly and putting a switch in between. Maybe some switches do something with the 802.1ad VLAN tags?
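To compare the two setups (direct cable vs. switch or router in the middle), it may help to count the loop-related kernel messages per interface instead of scrolling through dmesg. The following is a rough sketch; the function name `loop_summary` and the output format are my own invention, and it only matches the two message types quoted from Ilario's logs, assuming each message sits on a single line as in real dmesg output.

```shell
# Summarise bridge-loop symptoms from a kernel log read on stdin.
# Output lines (sorted):
#   bla-possible-loop <count>
#   own-address <interface> <mac> <count>
loop_summary() {
    awk '
        /received packet on .* with own address as/ {
            ifname = "?"; mac = "?"
            for (i = 2; i <= NF; i++) {
                # the interface name follows the words "packet on"
                if ($i == "on" && $(i - 1) == "packet") ifname = $(i + 1)
                # the MAC is inside the "(addr:<mac>," field
                if ($i ~ /^\(addr:/) {
                    mac = $i
                    sub(/^\(addr:/, "", mac)
                    sub(/,$/, "", mac)
                }
            }
            seen[ifname " " mac]++
        }
        /batman_adv: .*Possible loop on VLAN/ { bla++ }
        END {
            for (k in seen) print "own-address", k, seen[k]
            print "bla-possible-loop", bla + 0
        }
    ' | LC_ALL=C sort
}
```

Running `dmesg | loop_summary` once with the devices connected directly and once through the switch should make any difference between the two cases easy to spot.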