Author: Ilario
Date:  
To: LibreMesh.org project mailing list
Old-Topics: Re: [lime] GSoC - Cable purpose autodetection
Subject: [lime] Some testing [was: GSoC - Cable purpose autodetection]
Hi Pony, hi all,

On Tue, 25 Jun 2024 at 12:15, pony via LibreMesh
<libremesh@???> wrote:
> What I was trying to say is that this is already implemented: https://github.com/libremesh/lime-packages/blob/4bdd010ef8d7182467bb86035e83f922b70d83d5/packages/lime-proto-batadv/files/usr/lib/lua/lime/proto/batadv.lua#L86-L96
> @Ilario, did you mean these error messages, or are there others?
> The mac address is only changed on the vlan sub-interface where batman-adv is running on. Afaik, nothing needs to be reconfigured.


Nope, I was referring to another error that we observed last year at
BattleMesh in Calafou.
It happens when two LibreMesh devices are connected by cable to the
switch of a third router (regardless of whether this third device runs
LibreMesh or not).
You can see this happening in the topology represented here:
https://github.com/libremesh/network-profiles/tree/master/calafou#environment-and-physical-connections
where both routers C and D are connected via cable to router B.
Additionally, both routers B and E are connected via cable to router D.
I recall that, when I spoke with Gio, he mentioned that this was a
corner case that could not be handled by the default configuration,
and that the default configuration was designed to handle the two
common cases: 1) connecting one LibreMesh router to another LibreMesh
router via cable, and 2) connecting a client (e.g. a laptop) to the
ethernet port of a LibreMesh device.
So this situation definitely requires some extra configuration.
In Calafou, what we did was configure some ethernet ports to do only
mesh and not accept clients, like this:
https://github.com/libremesh/network-profiles/blob/2732653348e01b1e6c243861217a22bf0f7f7f69/calafou/indoor2/root/etc/config/lime-community#L29-L41
"lan1" and "lan2" were the ports that we wanted to use for connecting
other LibreMesh routers.
The other lines
    list protocols anygw
    list protocols batadv:%N1
    list protocols babeld:17
are simply a copy of the list of default protocols with the lan one
left out, so that they override the default list without the lan
protocol (and so the lan1 and lan2 ports are not included in br-lan).
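To make the override concrete, here is a hypothetical Python helper that prints one lime-community section per mesh-only port. It is a minimal sketch assuming, as described above, that the default protocol list is lan plus the three protocols shown, and that per-interface overrides use a "config net" section with "option linux_name"; the helper names and the "meshonly_" section naming are made up for illustration, and the actual Calafou file linked above may differ in details.

```python
import sys

# Assumption: as described above, the default protocols are "lan" plus
# the three protocols copied into the Calafou configuration.
DEFAULT_PROTOCOLS = ("lan", "anygw", "batadv:%N1", "babeld:17")


def mesh_only_protocols(defaults: tuple[str, ...] = DEFAULT_PROTOCOLS) -> list[str]:
    """Return the default protocols without "lan", keeping their order."""
    return [p for p in defaults if p != "lan"]


def mesh_only_section(ifname: str, defaults: tuple[str, ...] = DEFAULT_PROTOCOLS) -> str:
    """Build a hypothetical lime-community section for a mesh-only port."""
    # UCI section names may only contain letters, digits and underscores,
    # so "eth0.1" becomes "meshonly_eth0_1".
    safe = "".join(c if c.isalnum() else "_" for c in ifname)
    lines = [f"config net 'meshonly_{safe}'", f"\toption linux_name '{ifname}'"]
    lines += [f"\tlist protocols '{p}'" for p in mesh_only_protocols(defaults)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Ports that should only do mesh, like "lan1" and "lan2" in Calafou.
    for port in sys.argv[1:] or ["lan1", "lan2"]:
        print(mesh_only_section(port))
        print()
```

Called with "eth0.1" instead, the same helper would produce the equivalent override for a device whose LAN ports all share that single interface.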


On Friday we observed the same old batman-adv warnings in the network
that Pedro, Bruno and I assembled at Canodrom in Barcelona.

[109082.265621] br-lan: received packet on eth0.1 with own address as source address (addr:10:fe:ed:3b:3d:72, vlan:0)
[109082.276203] br-lan: received packet on eth0.1 with own address as source address (addr:10:fe:ed:3b:3d:72, vlan:0)
[109082.286786] br-lan: received packet on eth0.1 with own address as source address (addr:10:fe:ed:3b:3d:72, vlan:0)

We are building a "testbed" there, a network for testing mesh network
stuff, initially for the BattleMesh community but also for the whole
mesh community.
More info here:
https://www.battlemesh.org/barcelona-testbed
The topology we planned is this one (all the routers are TP-Link WDR4300):
https://agora.exo.cat/t/propuesta-de-localizaciones-para-el-testbed-canodrom-de-digicoria/262/7

What happened is that the Canodrom has a network technician who
decided to connect all the LibreMesh routers, via their LAN ports, to
the main switch of the building (we did not ask the technician to do
this).
So the routers could see each other via the wifi mesh, but the
batman-adv traffic sent on the LAN was also reaching the other nodes
through the switch.
This caused a lot of network instability.
So we applied the same config we used in Calafou, and the network
became much more stable.
You can see this in the Smokeping graphs here (the change happened
around 6 PM on Friday, 12 July):
https://barcelona-testbed.battlemesh.org/smokeping/smokeping.cgi?target=testbed
The change is very clearly visible here:
https://barcelona-testbed.battlemesh.org/smokeping/smokeping.cgi?displaymode=n;start=2024-07-12%2000:00;end=2024-07-13%2008:00;target=testbed.ctrl4

The only difference between the solution here and the one in Calafou
is that here all 4 LAN ports appear as a single interface, eth0.1,
instead of lan1, lan2, lan3, lan4. So we applied the configuration to
the whole of eth0.1. Maybe in the future we will need to apply this
configuration to only one of the 4 ports; I have no idea how to do
that. A problem for our future selves.

Some of these messages were observed in the past, but that case was
likely already solved by changing the MAC address of the dummy0
interface:
https://github.com/libremesh/lime-packages/issues/189

I just replicated the problem at my house by connecting 2 LibreMesh
devices running LibreMesh 2024.1-rc1 on OpenWrt 23 (a YouHua WR1200JS
and an old Ubiquiti NanoStation LoCo M2 XM with just 32 MB of RAM; I
had to remove some stuff to get it working) to the house's commercial
router, via their LAN ports.
I observed these messages in the kernel logs (dmesg):

[ 121.472686] batman_adv: bat0: Possible loop on VLAN -1 detected which can't be handled by BLA - please check your network setup!

[ 117.539621] br-lan: received packet on bat0 with own address as source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
[ 117.555507] br-lan: received packet on bat0 with own address as source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
[ 117.566445] br-lan: received packet on bat0 with own address as source address (addr:d4:5f:25:eb:7e:ac, vlan:0)
[ 118.340415] mt7530-mdio mdio-bus:1f: port 1 failed to delete dc:9f:db:37:28:a9 vid 0 from fdb: -2
[ 122.441546] net_ratelimit: 1011 callbacks suppressed

[ 113.499078] br-lan: received packet on eth0 with own address as source address (addr:dc:9f:db:37:28:a9, vlan:0)
[ 113.512276] br-lan: received packet on eth0 with own address as source address (addr:dc:9f:db:37:28:a9, vlan:0)
[ 113.524799] br-lan: received packet on eth0 with own address as source address (addr:dc:9f:db:37:28:a9, vlan:0)
[ 118.420838] net_ratelimit: 427 callbacks suppressed
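When comparing logs from several nodes, a small script can help count these warnings. The following is a hypothetical Python sketch: Python is probably not installed on the routers, so it assumes the dmesg output was first saved and copied to a computer. It expects one kernel message per line, as dmesg prints them, and its regular expressions only cover the two message formats quoted above.

```python
import re
import sys
from collections import Counter

# Bridge warning quoted above, e.g. "br-lan: received packet on eth0 with own
# address as source address (addr:dc:9f:db:37:28:a9, vlan:0)".
OWN_ADDR_RE = re.compile(
    r"(?P<bridge>\S+): received packet on (?P<port>\S+) with own address as "
    r"source address \(addr:(?P<mac>[0-9a-f:]{17}), vlan:(?P<vlan>\d+)\)"
)
# batman-adv loop warning quoted above.
BLA_LOOP_RE = re.compile(
    r"batman_adv: (?P<mesh>\S+): Possible loop on VLAN (?P<vlan>-?\d+) detected"
)


def summarize(dmesg_text: str) -> tuple[Counter, list[str]]:
    """Count own-address warnings per (bridge, port, MAC) and list loop warnings."""
    own_addr: Counter = Counter()
    loops: list[str] = []
    for line in dmesg_text.splitlines():
        m = OWN_ADDR_RE.search(line)
        if m:
            own_addr[(m["bridge"], m["port"], m["mac"])] += 1
            continue
        m = BLA_LOOP_RE.search(line)
        if m:
            loops.append(f"{m['mesh']} VLAN {m['vlan']}")
    return own_addr, loops


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python3 summarize_dmesg.py saved-dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        counts, loops = summarize(f.read())
    for (bridge, port, mac), n in counts.most_common():
        print(f"{bridge}: {n} own-address warnings on {port} (addr {mac})")
    for loop in loops:
        print(f"possible loop: {loop}")
```

Because of the net_ratelimit lines, which report suppressed messages, these counts are only a lower bound of how many such packets were actually received.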

If we want the autodetection to deal with this, the question to answer
is not simply "is there a LibreMesh router on the other side of the
cable?", as this problem also happens when 2 LiMe devices are
connected LAN-to-LAN via cable to a non-LiMe router, but rather "is
there any LibreMesh device reachable from this interface?".

> Maybe LibreMesh could use its own ipv6 multicast address, like ff12::bee, and use that to discover other LibreMesh nodes?


For this we could use the IPv6 link-local ping, and Pony's idea of
having our own LibreMesh-specific multicast address is amazing :D
Just trying with a router, it seems that using ping6 ff02::1%br-lan
gets answers both from routers connected via the wifi mesh and from
routers connected via LAN port, switch, LAN port. I have no idea how
to distinguish them, nor how to add the new LibreMesh-specific
multicast address...?
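A minimal sketch of how a node could join Pony's proposed group (ff12::bee, a multicast address with link-local scope) on one specific interface, and how another node could probe that interface, using only the Python standard library. The UDP port 4712, the payloads and the function names are hypothetical; nothing in LibreMesh defines them. The sketch assumes it runs on an interface that has its own IPv6 link-local address and is not enslaved to br-lan: on Linux, frames received on a bridge port are normally handed to the bridge, and probing br-lan itself would again mix mesh and cable neighbours, as observed above with ff02::1.

```python
import socket
import struct
import sys

# Group proposed in the thread: ff12::/16 = transient, link-local scope.
LIME_GROUP = "ff12::bee"
# Hypothetical UDP port and payloads; nothing in LibreMesh defines them today.
LIME_PORT = 4712
PROBE = b"lime-probe"
REPLY = b"lime-here"


def build_mreq(group: str, ifindex: int) -> bytes:
    """Pack a struct ipv6_mreq: 16-byte group address + interface index."""
    return socket.inet_pton(socket.AF_INET6, group) + struct.pack("@I", ifindex)


def run_responder(ifname: str) -> None:
    """Join LIME_GROUP on one interface only and answer every probe by unicast."""
    ifindex = socket.if_nametoindex(ifname)
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
        sock.bind(("::", LIME_PORT))
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP,
                        build_mreq(LIME_GROUP, ifindex))
        while True:
            data, addr = sock.recvfrom(1500)
            if data == PROBE:
                # addr includes the sender's scope id, so the reply leaves
                # through the interface the probe arrived on.
                sock.sendto(REPLY, addr)


def probe(ifname: str, timeout: float = 2.0) -> set[str]:
    """Send one probe out of ifname and return the addresses that replied."""
    ifindex = socket.if_nametoindex(ifname)
    peers: set[str] = set()
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_IF, ifindex)
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_HOPS, 1)
        # Do not deliver the probe to a responder running on this same node.
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_LOOP, 0)
        # Each recvfrom waits at most `timeout` seconds for the next reply.
        sock.settimeout(timeout)
        sock.sendto(PROBE, (LIME_GROUP, LIME_PORT, 0, ifindex))
        try:
            while True:
                data, addr = sock.recvfrom(1500)
                if data == REPLY:
                    peers.add(addr[0])
        except socket.timeout:
            pass
    return peers


if __name__ == "__main__" and len(sys.argv) == 3:
    mode, ifname = sys.argv[1], sys.argv[2]
    if mode == "respond":
        run_responder(ifname)
    else:
        print(sorted(probe(ifname)))
```

In this sketch each node would run the responder on its candidate interfaces, and a node deciding how to configure a port would call probe() on it; an empty result would suggest that no LibreMesh device is reachable from that interface, which is the question formulated above. It does not by itself solve the problem of telling mesh and cable neighbours apart: that depends entirely on which interface is probed.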

We can continue the discussion on this here and/or on GitHub:
https://github.com/libremesh/lime-packages/issues/1118

To me, this seems different from
https://github.com/libremesh/lime-packages/issues/1032 . Pony, do you
agree, or is it the same situation?

Ciao!
Ilario