Hi Pony, hi all,
On July 26, 2024 5:08:40 PM GMT+02:00, pony via LibreMesh <libremesh@???> wrote:
>Hi Ilario, hi all,
>
>On Sunday, 14 July 2024 16:03:52 CEST Ilario wrote:
>> I just replicated the problem at my house connecting 2 LibreMesh
>> devices (a YouHua WR1200JS and an old Ubiquiti NanoStation LoCo M2 XM
>> with just 32 MB of RAM, I had to remove some stuff in order to have it
>> working) running LibreMesh 2024.1-rc1 on OpenWrt 23 to the house
>> commercial router, via their LAN ports.
>> I observed these messages in the kernel logs (dmesg):
>>
>> [ 121.472686] batman_adv: bat0: Possible loop on VLAN -1 detected
>> which can't be handled by BLA - please check your network setup!
>>
>> [ 117.539621] br-lan: received packet on bat0 with own address as
>> source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
>> [ 117.555507] br-lan: received packet on bat0 with own address as
>> source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
>> [ 117.566445] br-lan: received packet on bat0 with own address as
>> source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
>> [ 118.340415] mt7530-mdio mdio-bus:1f: port 1 failed to delete
>> dc:9f:db:37:28:a9 vid 0 from fdb: -2
>> [ 122.441546] net_ratelimit: 1011 callbacks suppressed
>>
>> [ 113.499078] br-lan: received packet on eth0 with own address as
>> source address (addr:dc:9f:db:37:28:a9, vlan:0)
>> [ 113.512276] br-lan: received packet on eth0 with own address as
>> source address (addr:dc:9f:db:37:28:a9, vlan:0)
>> [ 113.524799] br-lan: received packet on eth0 with own address as
>> source address (addr:dc:9f:db:37:28:a9, vlan:0)
>> [ 118.420838] net_ratelimit: 427 callbacks suppressed
>>
>
>I find it strange that batman-adv's BLA does not work here. I think we should find out exactly when and why this happens. Unfortunately, I was not able to replicate it. I installed LibreMesh 2024.1-rc1, obtained from https://downloads.libremesh.org/selector/, on an AVM FRITZ!Box 4040 and a MERCUSYS MR70X v1 (both DSA). With the default configuration, I connected them through a dumb switch. I then checked the kernel logs, but there were no warnings. Here is what the backbone table and the neighbor table look like:
>
>**AVM FRITZ!Box 4040**
>
>root@LiMe-26aebf:~# batctl bbt
>Warning - name already known (changing mac from '9A:C7:F8:77:1B:C4' to 'c2:5e:37:34:aa:15'): LiMe_26aebf_bat0
>[B.A.T.M.A.N. adv 2023.1-openwrt-6, MainIF/MAC: eth0_29/02:95:39:26:ae:bf (bat0/c2:5e:37:34:aa:15 BATMAN_IV), group id: 0x78dd]
>Originator VID last seen (CRC )
>LiMe_8611a8_eth0_29 on -1 0.950s (0x0000)
>
>root@LiMe-26aebf:~# batctl n
>Warning - name already known (changing mac from '9A:C7:F8:77:1B:C4' to 'c2:5e:37:34:aa:15'): LiMe_26aebf_bat0
>[B.A.T.M.A.N. adv 2023.1-openwrt-6, MainIF/MAC: eth0_29/02:95:39:26:ae:bf (bat0/c2:5e:37:34:aa:15 BATMAN_IV)]
>IF Neighbor last-seen
> lan1_29 LiMe_8611a8_lan1_29 0.080s
> wlan1-mesh_29 LiMe_8611a8_wlan1_mesh_29 1.040s
> wlan0-mesh_29 LiMe_8611a8_wlan0_mesh_29 1.840s
>
>**MERCUSYS MR70X v1**
>
>root@LiMe-8611a8:~# batctl bbt
>Warning - name already known (changing mac from '9A:C7:F8:77:1B:C4' to 'c2:5e:37:34:aa:15'): LiMe_26aebf_bat0
>[B.A.T.M.A.N. adv 2023.1-openwrt-6, MainIF/MAC: eth0_29/02:95:39:86:11:a8 (bat0/42:55:60:34:6a:8c BATMAN_IV), group id: 0x78dd]
>Originator VID last seen (CRC )
>LiMe_26aebf_eth0_29 on -1 6.910s (0x0000)
>
>root@LiMe-8611a8:~# batctl n
>Warning - name already known (changing mac from '9A:C7:F8:77:1B:C4' to 'c2:5e:37:34:aa:15'): LiMe_26aebf_bat0
>[B.A.T.M.A.N. adv 2023.1-openwrt-6, MainIF/MAC: eth0_29/02:95:39:86:11:a8 (bat0/42:55:60:34:6a:8c BATMAN_IV)]
>IF Neighbor last-seen
> lan1_29 LiMe_26aebf_lan1_29 1.840s
> wlan1-mesh_29 LiMe_26aebf_wlan1_mesh_29 0.160s
> wlan0-mesh_29 LiMe_26aebf_wlan0_mesh_29 1.840s
>
>The fact that each device lists the other in its backbone table means they correctly detected that their bat0 interfaces are bridged to the same LAN segment, as described here: https://www.open-mesh.org/doc/batman-adv/Understand-your-batman-adv-network.html#bridge-loop-avoidance-backbone-table
>
>Normally there should be no difference between connecting the devices directly and putting a switch in the middle. Maybe some switches do something with the 802.1ad VLAN tags?
This testing is super useful!!!
Thanks Pony!
I recall that with "batctl o" I could not see the other device over the Ethernet cable.
So, unlike in your tests, something in my setup was behaving unexpectedly.
I will not manage to do more testing until the end of August.
At some point I would like to try to reproduce this with different devices in the middle (to see if they break the 802.1ad QinQ VLAN packets): I can try 2 different switches, a commercial router, a DSA-supported router (YouHua WR1200JS), and a non-DSA router (a TP-Link WDR3600 that I recovered from my hometown).
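
A minimal sketch of how the "does the middle device break the 802.1ad tag" check could look, assuming raw Ethernet frames have been captured on the far side of the device under test. The function names are hypothetical, the example frame is hand-built rather than captured, and VID 29 is only a guess based on the interface names (eth0_29, lan1_29) in the batctl output above:

```python
import struct

TPID_8021Q = 0x8100   # 802.1Q customer tag (C-tag)
TPID_8021AD = 0x88A8  # 802.1ad service tag (S-tag), the outer tag in QinQ


def vlan_tags(frame: bytes):
    """Return the (tpid, vid) tags that follow the two MAC addresses."""
    tags = []
    offset = 12  # skip destination and source MAC, 6 bytes each
    while len(frame) >= offset + 4:
        tpid, tci = struct.unpack_from("!HH", frame, offset)
        if tpid not in (TPID_8021Q, TPID_8021AD):
            break  # reached the real EtherType, no more tags
        tags.append((tpid, tci & 0x0FFF))  # low 12 bits of the TCI are the VLAN ID
        offset += 4
    return tags


def survived_8021ad(frame: bytes, vid: int) -> bool:
    """True if the outermost tag is still an 802.1ad tag with the expected VID."""
    tags = vlan_tags(frame)
    return bool(tags) and tags[0] == (TPID_8021AD, vid)


# Hand-built frames (assumption: not real captures). Broadcast destination,
# made-up source MAC, then either an S-tag with VID 29 or no tag at all,
# followed by the batman-adv EtherType 0x4305 and padding.
TAGGED = bytes.fromhex("ffffffffffff" "029539aabbcc" "88a8" "001d" "4305") + b"\x00" * 46
STRIPPED = bytes.fromhex("ffffffffffff" "029539aabbcc" "4305") + b"\x00" * 50

if __name__ == "__main__":
    print(vlan_tags(TAGGED), survived_8021ad(TAGGED, 29))
    print(vlan_tags(STRIPPED), survived_8021ad(STRIPPED, 29))
```

If frames sent with the tag arrive on the other side without it (or with an 802.1Q tag instead), that would point at the middle device rather than at batman-adv.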
I would also like to try using DSA-supported routers at the ends of this 3-device chain, but I will have to find another such router :)
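
To compare runs across the different combinations, a small script could tally the loop symptoms from the kernel log. This is only a sketch: the function name is hypothetical, the patterns are derived from the dmesg lines quoted at the top of this thread, and it assumes the raw dmesg text is passed in (without the ">" quoting added by the mail client):

```python
import re
from collections import Counter

# Bridge complaint, e.g. "br-lan: received packet on bat0 with own address as
# source address (addr:d4:5f:25:eb:7e:ac, vlan:0)"
OWN_ADDR = re.compile(
    r"(?P<bridge>\S+): received packet on (?P<port>\S+) with own address as "
    r"source address \(addr:(?P<mac>[0-9a-f:]{17}), vlan:(?P<vlan>-?\d+)\)"
)
# batman-adv complaint, e.g. "batman_adv: bat0: Possible loop on VLAN -1 detected"
BLA_LOOP = re.compile(
    r"batman_adv: (?P<mesh>\S+): Possible loop on VLAN (?P<vlan>-?\d+) detected"
)


def tally_loop_symptoms(log_text: str):
    """Count own-address warnings per (port, MAC) and BLA loop warnings per (mesh, VLAN)."""
    # Collapse all whitespace so messages wrapped over two lines still match.
    text = " ".join(log_text.split())
    own = Counter((m["port"], m["mac"]) for m in OWN_ADDR.finditer(text))
    bla = Counter((m["mesh"], int(m["vlan"])) for m in BLA_LOOP.finditer(text))
    return own, bla


# A few of the lines from the first message in this thread.
SAMPLE = """\
[ 121.472686] batman_adv: bat0: Possible loop on VLAN -1 detected
which can't be handled by BLA - please check your network setup!
[ 117.539621] br-lan: received packet on bat0 with own address as
source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
[ 117.555507] br-lan: received packet on bat0 with own address as
source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
[ 113.499078] br-lan: received packet on eth0 with own address as
source address (addr:dc:9f:db:37:28:a9, vlan:0)
"""

if __name__ == "__main__":
    own, bla = tally_loop_symptoms(SAMPLE)
    print(dict(own))
    print(dict(bla))
```

Note that the kernel rate-limits these messages ("net_ratelimit: ... callbacks suppressed"), so the counts are a lower bound and mainly useful for telling "no warnings at all" apart from "warnings present".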
Thanks and ciao!
Ilario