The NetBSD® net80211 Renewal Project

ETA and why we are not there yet.

sys/net80211
renewal

Martin Husemann <martin@NetBSD.org>

What is this about?

  • NetBSD -current only supports 802.11a, b, and g
  • The code is not NET_MPSAFE
  • There is no support for multiple wifi networks per hardware device

The original Renewal project

  • Sync the code with FreeBSD-current
  • Keep it easily syncable again later (and sync regularly)
  • Feed all required changes to non-NetBSD-specific code back to FreeBSD

Expected Benefits

  • SMP ready code
  • Support for newer 802.11 standards and higher rates
  • The code is already structured for easy use outside of FreeBSD and well encapsulated
  • The code base is relatively static/quiet

Expected Porting Issues

  • Many USB drivers involved, the USB stacks are vastly different
  • Many NetBSD USB drivers have received empirical fine tuning on various architectures that are not easily available for testing
  • ... so we tried to keep as much of them untouched as possible
  • Throwing them all out and replacing them with the FreeBSD ones wholesale was discussed but rejected

The Early Years

  • Phil Nelson started the first (TNF-funded) porting work in a CVS branch back in 2018. The result was a linkable kernel.
  • I joined in 2020 (TNF funded)
  • James Browning (back then a student of Phil) joined in 2022
  • Nathanial Sloss joined in 2022
  • Vihas Makwana worked on it during GSoC 2022
  • Soon after I started the work, we moved it to a Mercurial “topic” to avoid CVS merges and to test our (new) Mercurial infrastructure.

    This helped shake out the infrastructure a lot, but did not speed up the wifi project.

    First Successes

    All relevant userland changes were made; ifconfig(8) needed special treatment.

    We got a few PCI and USB based drivers converted. Details are documented on the wiki:
    https://wiki.netbsd.org/wifi_driver_state_matrix/

    Obstacles Solved...

    We found several differences between FreeBSD and NetBSD that had not been clear up front. Among others:

  • The “IC” lock in FreeBSD is a big hammer lock per wifi hardware instance. It is allowed to lock recursively. There is no equivalent locking primitive in NetBSD.
  • Memory allocation in the FreeBSD wifi stack still uses the old-style malloc(9) primitives. NetBSD has moved on (where feasible) to the Solaris-style kmem(9) primitives and pools.
  • NetBSD has very strict locking assertions (and we are building the wifi test kernels with all of them enabled). This caught several locking errors in the original code.
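
    The difference between a recursive lock and a strictly checked non-recursive one can be illustrated in userland. The following is only an analogy, assuming POSIX threads: a pthread mutex stands in for the kernel lock, and relock_result() is a hypothetical helper, not part of either wifi stack. A recursive mutex accepts a nested acquisition by its owner, while an error-checking mutex reports it as a deadlock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Userland analogy only: a pthread mutex stands in for the kernel lock.
 * Returns the result of a second lock attempt made by the thread that
 * already holds the mutex (0 on success, an errno value on failure).
 */
int
relock_result(int mutex_type)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t lock;
	int error;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, mutex_type);
	pthread_mutex_init(&lock, &attr);
	pthread_mutexattr_destroy(&attr);

	error = pthread_mutex_lock(&lock);	/* first acquisition */
	assert(error == 0);

	error = pthread_mutex_lock(&lock);	/* nested acquisition */
	if (error == 0)
		pthread_mutex_unlock(&lock);	/* undo the nested lock */

	pthread_mutex_unlock(&lock);		/* undo the first lock */
	pthread_mutex_destroy(&lock);
	return error;
}
```

    Calling relock_result(PTHREAD_MUTEX_RECURSIVE) yields 0, while relock_result(PTHREAD_MUTEX_ERRORCHECK) yields EDEADLK. Code written against the first behaviour trips over the second as soon as a nested path is exercised.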

    ... more Obstacles Solved

  • ioctl locking rules in NetBSD are inconsistent and a mess.
  • bpf handling is slightly different.
  • link state changes are handled differently (and have different locking rules).
  • struct ifnet is separate from struct ethercom in NetBSD, but some code assumes cast-equivalence:
    This still needs to be cleaned up all over the NetBSD network stack:
    
    		     struct ifnet *ifp;
    		     struct ethercom *ec = (struct ethercom *)ifp;
    		   
    We gave the VAP a struct ethercom for now.
  • debug output/logging is different (format strings vs. printing addresses)
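
    The cast-equivalence assumption mentioned above can be sketched with toy structures. These are deliberately simplified stand-ins with hypothetical names (toy_ifnet, toy_ethercom, toy_vap), not the real NetBSD definitions. The cast is only valid when the ifnet is the first member of an object that really is an ethercom, which is why embedding a full ethercom in the VAP keeps generic code working.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, NOT the real NetBSD definitions. */
struct toy_ifnet {
	int if_index;
};

struct toy_ethercom {
	struct toy_ifnet ec_if;	/* must stay the first member for the cast */
	int ec_capabilities;
};

/*
 * A VAP that embeds a full ethercom.  If it embedded only a toy_ifnet,
 * the cast in toy_ec_capabilities() would read unrelated VAP fields.
 */
struct toy_vap {
	struct toy_ethercom iv_ec;
	int iv_state;
};

/* The pattern generic code relies on: treat the ifnet as an ethercom. */
int
toy_ec_capabilities(struct toy_ifnet *ifp)
{
	struct toy_ethercom *ec = (struct toy_ethercom *)ifp;

	return ec->ec_capabilities;
}
```

    With this layout, passing &vap.iv_ec.ec_if to toy_ec_capabilities() returns the capabilities stored in the embedded ethercom, because both offsets are zero.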

    ... and “Homemade” Obstacles

  • The state of the NetBSD drivers we started with was not as good as expected.
  • Many devices supported by old NetBSD drivers are hard to obtain nowadays.
  • Many NetBSD USB drivers have received empirical fine tuning on various architectures that are not easily available for testing
  • Nearly every piece of hardware I received caused kernel diagnostic assertion failures, crashes, or was unable to create a working network connection in my test lab setup (using a -current NetBSD kernel, i.e. the non-converted driver).
  • This is even more true for non-USB hardware (anyone remember CardBus or PCMCIA?)

    It works - mostly done?

    With one of the now supported devices we could boot a system, configure VAPs and have wifi connectivity.

    
    	[~] martin@h-pulse > ifconfig -a
    	enet0: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
            capabilities=0x3ff00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx>
            capabilities=0x3ff00<UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx,TCP6CSUM_Tx>
            capabilities=0x3ff00<UDP6CSUM_Rx,UDP6CSUM_Tx>
            enabled=0
            ec_capabilities=0x1<VLAN_MTU>
            ec_enabled=0
            address: 00:00:41:a7:3a:f1
            media: Ethernet autoselect (1000baseT full-duplex)
            status: active
            inet6 fe80::200:41ff:fea7:3af1%enet0/64 flags 0 scopeid 0x1
            inet 192.168.149.48/22 broadcast 192.168.151.255 flags 0
    	lo0: flags=0x8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33624
            status: active
            inet6 ::1/128 flags 0x20<NODAD>
            inet6 fe80::1%lo0/64 flags 0 scopeid 0x2
            inet 127.0.0.1/8 flags 0
    	[~] martin@h-pulse > sysctl net.wlan
    	net.wlan.debug = 0
    	net.wlan.debug_console = 0
    	net.wlan.devices = urtwn0
    	[~] martin@h-pulse > su
    	Password:
    	[~] root@h-pulse # ifconfig wlan0 create wlandev urtwn0
    	[~] root@h-pulse # ifconfig wlan0
    	wlan0: flags=0x8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
            ssid ""
            powersave on (100ms sleep)
            address: 3c:33:00:5e:70:96
            media: IEEE802.11 autoselect (autoselect mode 11b)
            status: no network
    	[~] root@h-pulse # ifconfig wlan0 up
    	[~] root@h-pulse # 
       

    Preparing for others to help

    We documented the changes required from an “old” driver to the new stack on the wiki:
    https://wiki.netbsd.org/Converting_drivers_to_the_new_wifi_stack/

    Technical Helpers

    For USB ethernet interfaces we have usbnet(4), which is a bit similar to FreeBSD's iflib(4), but specialized for USB devices.

    We created a special version for USB wifi interfaces, usbwifi(9). This eases conversion of pure USB wifi device drivers, but it is (currently) unclear whether it helps with multi-bus devices like ath(9).

    Real World Testing...

    When testing the system more intensively and in real world use, we quickly found that we were not there yet.

    The locking differences in link state change handling and multicast ioctl handling caused various diagnostic assertion failures or deadlocks.

    This took a serious amount of time to fully analyze and debug.

    We moved a lot of events to task context (via a workqueue(9)) and are still dealing with structuring the ioctl locking systematically.

    The big “IC-lock” was held across USB calls that could sleep, which led to various deadlock scenarios.
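
    The general shape of deferring work to task context can be sketched in userland. This is an analogy under stated assumptions, not the project's code: a pthread mutex stands in for the lock, a small ring buffer stands in for the work queue, and all toy_* names are hypothetical. The posting side only records the event under the lock; the worker drops the lock before calling anything that may sleep.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_MAX_EVENTS 8

struct toy_softc {
	pthread_mutex_t sc_lock;		/* stands in for the big lock */
	int sc_events[TOY_MAX_EVENTS];		/* ring buffer of pending events */
	size_t sc_head;				/* index of the oldest event */
	size_t sc_count;			/* number of pending events */
	int sc_last_event;			/* last event handled, -1 if none */
};

void
toy_init(struct toy_softc *sc)
{
	pthread_mutex_init(&sc->sc_lock, NULL);
	sc->sc_head = 0;
	sc->sc_count = 0;
	sc->sc_last_event = -1;
}

/* Called from a context that must not sleep: only record the event. */
int
toy_post_event(struct toy_softc *sc, int event)
{
	int queued = 0;

	pthread_mutex_lock(&sc->sc_lock);
	if (sc->sc_count < TOY_MAX_EVENTS) {
		sc->sc_events[(sc->sc_head + sc->sc_count) % TOY_MAX_EVENTS] = event;
		sc->sc_count++;
		queued = 1;
	}
	pthread_mutex_unlock(&sc->sc_lock);
	return queued;
}

/* Placeholder for a bus transfer that may sleep. */
void
toy_slow_hw_access(struct toy_softc *sc, int event)
{
	int error;

	/* The trylock only proves that the caller dropped sc_lock. */
	error = pthread_mutex_trylock(&sc->sc_lock);
	assert(error == 0);
	(void)error;
	sc->sc_last_event = event;
	pthread_mutex_unlock(&sc->sc_lock);
}

/* Runs in task context: drains the queue in FIFO order. */
int
toy_worker(struct toy_softc *sc)
{
	int handled = 0;

	for (;;) {
		int event;

		pthread_mutex_lock(&sc->sc_lock);
		if (sc->sc_count == 0) {
			pthread_mutex_unlock(&sc->sc_lock);
			return handled;
		}
		event = sc->sc_events[sc->sc_head];
		sc->sc_head = (sc->sc_head + 1) % TOY_MAX_EVENTS;
		sc->sc_count--;
		pthread_mutex_unlock(&sc->sc_lock);

		toy_slow_hw_access(sc, event);	/* lock is not held here */
		handled++;
	}
}
```

    The point of the split is that toy_post_event() never waits on hardware, and toy_slow_hw_access() is never entered with sc_lock held.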

    ... showing homemade problems?

    The IC lock is recursive in FreeBSD, so they do not see the same deadlocks - but it is unclear to me if they are just lucky most of the time or how their drivers avoid other fallout in this scenario.

    We added additional locks to our drivers to protect different parts of the softc and hardware access.

    This is worth another review pass when everything is ready to be merged!

    Meanwhile at FreeBSD

    After being quite a stable/quiet upstream for years, the FreeBSD wifi team started doing major work/enhancements again.

    Recently they started improving locking and requiring more work from individual drivers. We expect this to help a lot, as we already had to modify/extend the big-hammer “IC-lock” model (a single lock per hardware instance) for link status changes and similar events in NetBSD.

    A while ago they decided to avoid re-doing fast-changing Linux drivers and instead try to use the Linux code as directly as possible.

    Having used similar techniques in our DRM/KMS import project, we decided to follow.

    Sync and Import!

    Phil synced our hg topic with a newer FreeBSD version and started working on the LinuxKPI layer, reusing parts of DRM/KMS.

    He also imported the iwlwifi(4) driver which uses the LinuxKPI shim.

    This is close to working now.

    Open Issues

    We have two places where locking decisions are made based on the mutex_owned() predicate, which is a nonstarter. This happens in:

  • some ioctl(4) handling (did I mention locking is a mess there?)
  • some cases of beacon handling

    Both cases will be fully analyzed and solved ASAP. We expect the result to be an actually mostly working state for all devices with already converted drivers.
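
    One common way to avoid branching on lock ownership is to split a function into a variant that takes the lock and a "_locked" variant that requires it, so that ownership is only ever asserted. The sketch below is a userland analogy with hypothetical toy_* names, and it is not necessarily the fix this project will choose. The ic_lock_held flag exists solely to feed assertions, much as mutex_owned() is, as far as the mutex(9) manual describes it, meant for diagnostic checks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct toy_ic {
	pthread_mutex_t ic_lock;
	bool ic_lock_held;	/* maintained only to feed assertions */
	int ic_beacon_count;
};

void
toy_ic_init(struct toy_ic *ic)
{
	pthread_mutex_init(&ic->ic_lock, NULL);
	ic->ic_lock_held = false;
	ic->ic_beacon_count = 0;
}

void
toy_ic_lock(struct toy_ic *ic)
{
	pthread_mutex_lock(&ic->ic_lock);
	ic->ic_lock_held = true;
}

void
toy_ic_unlock(struct toy_ic *ic)
{
	assert(ic->ic_lock_held);
	ic->ic_lock_held = false;
	pthread_mutex_unlock(&ic->ic_lock);
}

/* Caller must hold ic_lock; ownership is asserted, never branched on. */
int
toy_beacon_update_locked(struct toy_ic *ic)
{
	assert(ic->ic_lock_held);
	return ++ic->ic_beacon_count;
}

/* Entry point for callers that do not hold ic_lock. */
int
toy_beacon_update(struct toy_ic *ic)
{
	int count;

	toy_ic_lock(ic);
	count = toy_beacon_update_locked(ic);
	toy_ic_unlock(ic);
	return count;
}
```

    Each call site then states statically which variant it needs, instead of a single function guessing at run time whether to take the lock.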

    TODO

  • iwlwifi(4) and the LinuxKPI wrapper are work in progress
  • the latter needs to be coordinated with ongoing DRM/KMS updates
  • all the remaining “old” drivers need to be converted and tested
  • We have a few “new” drivers, like mtw(4) (MediaTek MT7601, ported from OpenBSD but to the new FreeBSD wifi stack), that need review, testing and merging.

    The Plan...

  • Once all “old” USB drivers, and a few major PCI drivers are converted, merge the branch into -current.
  • Disable all non-converted drivers at that point and do further conversions/testing in -current.
  • If needed, even branch for netbsd-12 with some unconverted drivers.
  • If the first point cannot be achieved quickly (or we have other reasons to branch netbsd-12 early, like a major DRM/KMS update ready for primetime), we may delay the merge until after 12.

    And the Bright! Future

  • So NetBSD 12 should come with the new WiFi stack and all(most all) drivers that NetBSD 11 had
  • We will start adding more USB drivers for modern chipsets where docs or reference drivers are available
  • We will also port more LinuxKPI drivers from FreeBSD
  • We plan to feed all required changes back to FreeBSD, after cleaning them up and minimizing them

    History (2): Hindsight

  • The original FreeBSD code (at least in sys/net80211) was not locked in as fine-grained a manner as we expected.
  • The current state in NetBSD was far worse than expected.
  • The idea of avoiding regressions during the switchover as much as possible did not fit well with the limited availability of test systems and hardware.

    Lessons learned

  • We should have had a far better understanding of the imported code
  • ... but that is hard to get without a debuggable system where you can get logs for various situations and backtraces easily
  • We should have had a fully developed plan for how things would look (especially locking) after the import.
  • Making the “IC-lock” non-recursive (differing from upstream) might have been a bad idea.
  • ... but given the experience in this project, which fed my personal hate for recursive locks even more, I would do it again!

    What could have been done differently?

  • Drop the idea of keeping most/all “old” drivers during switchover and accept (temporary) losses, especially since most drivers worked relatively badly.
  • Have a relatively short sprint (like maybe 3-6 months) and then force results into -current
  • Throw more developer-power at it (at whatever cost).
  • Avoid mixing the project with SCM experiments.
  • “Cosmetic” changes (like the kmem(9) transition) should be done in a separate (cleanup) pass.
  • When upstream became a moving target again, finish with the old state and start a new project for the new incoming enhancements (and especially the LinuxKPI shim based stuff).

    Any Questions?

    Thank you!