There is an ever-increasing demand for more functionality and complexity to be added to wireless access points.

To address these requirements, system designers must rely on hardware-implemented features such as dedicated programmable classification, hardware-based queuing and QoS with inter-layer optimization. These provide better performance than software-based implementations and also provide tight latency bounds when serving low-power clients.

In this article, we propose an architecture that offloads most of the per-packet functions across the different OSI layers from the host processor, allowing it to serve other system-specific functions and to reduce power consumption within existing PoE budgets.

The Wi-Fi module is the data acquisition point for the Wi-Fi access point and is an optimal place for deploying the functionality described above. It scales with these performance needs, eliminates unnecessary data forwarding (tunnelling), and can provide a unified policy framework to implement the much-needed QoS and short latency for low power clients. The proposed architecture is suitable for concurrent dual-band in single Wireless LAN modules (supported by the existing Ensigma RPUs) or to support two different Wireless LAN cards for 2.4 GHz and 5 GHz.

The firewall and QoS rules need to be applied on the fly on all the packets at around 1 Gbps, and the firewall rules would be changing based on client mobility etc.

Imagination provides the Ensigma Network Processing Units (NPU) and Radio Processing Units (RPU) to address the packet processing and baseband functions required to provide a wired network experience in a wireless LAN.

With ever-increasing functionality and complexity, dedicated hardware-based packet processing in access points provides much better performance than software-based packet processing. It also offloads network functions from the host processor by implementing the fast-path portion of the policy framework shown above, leaving the host free to serve compute needs.

  The architecture of typical Access Points includes a networking processor, the baseband and a host CPU

The following functions need to be handled differently from existing solutions.

In-line classification

The packets need to be inline classified without making a round trip to the DDR memory. The DDR memory bandwidth is a scarce resource. The classification needs to be programmable owing to the diverse requirements and also changing deployments/features/standards. So a multi-core programmable engine would be best suited for this function.

This would enable the classifier to implement not only static rules/setting, but also dynamic flows as defined in the policy framework.

  • DPI
  • Stateful inspection of packets
  • Snooping of packets
    • Beyond IGMP to do OSPF etc. for L3 routing
  • Application level gateways

Hardware assisted QoS

As mentioned previously, processor-based or software-based QoS is very MIPS intensive and sub-optimal in implementation, especially with a large number of queues. To provide enterprise-class QoS, it is essential to have a queue per AC per STA, so that one rogue device cannot block other STAs, and to restrict the bandwidth across a set of STAs (for example, all STAs in a GUEST SSID) using dedicated hardware. Rate limiting at this granularity needs three-level hierarchical queuing with a rate shaper at each level:

  • Per AC level
  • Per STA level
  • Port level

The hierarchical queuing and per-AC/per-STA queuing also enable video and voice synchronization. Parameters such as the round robin/fair queuing weights and the shaper token and maximum counts are changed dynamically in line with the rate adaptation and the application properties.

Another distinctive feature of Wi-Fi, being half duplex in a shared medium, is that the bandwidth allocated to a station can be an aggregate of the traffic in both directions. So when a packet is received from a station (on a given AC), the amount of data can be deducted from the rate shaper of that AC+STA. Note that the classification described above is implemented in both directions.
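A minimal Python sketch of this idea follows. It models the three shaper levels as token buckets and charges received bytes against the same buckets as transmitted bytes. The class and method names (`TokenBucket`, `HierarchicalShaper`, `try_transmit`, `on_receive`) are hypothetical and chosen for illustration. Real hardware would refill from a timer rather than an explicit `refill()` call.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Rate shaper: tokens are bytes, refilled by an explicit clock tick."""
    rate_bps: float       # sustained rate in bits per second
    burst_bytes: float    # bucket depth (maximum token count)
    tokens: float = field(init=False)

    def __post_init__(self):
        self.tokens = self.burst_bytes

    def refill(self, elapsed_s):
        added = self.rate_bps / 8.0 * elapsed_s
        self.tokens = min(self.burst_bytes, self.tokens + added)

    def charge(self, nbytes):
        # May go negative: received traffic has already used the air.
        self.tokens -= nbytes


class HierarchicalShaper:
    """Three levels: per (STA, AC), per STA, and port (assumed layout)."""

    def __init__(self, port_bucket):
        self.port = port_bucket
        self.sta = {}       # sta_id -> TokenBucket
        self.sta_ac = {}    # (sta_id, ac) -> TokenBucket

    def add_sta(self, sta_id, bucket):
        self.sta[sta_id] = bucket

    def add_ac(self, sta_id, ac, bucket):
        self.sta_ac[(sta_id, ac)] = bucket

    def _levels(self, sta_id, ac):
        return (self.sta_ac[(sta_id, ac)], self.sta[sta_id], self.port)

    def refill(self, elapsed_s):
        for bucket in (self.port, *self.sta.values(), *self.sta_ac.values()):
            bucket.refill(elapsed_s)

    def try_transmit(self, sta_id, ac, nbytes):
        """Send only if every level has enough tokens, then charge all."""
        levels = self._levels(sta_id, ac)
        if any(bucket.tokens < nbytes for bucket in levels):
            return False
        for bucket in levels:
            bucket.charge(nbytes)
        return True

    def on_receive(self, sta_id, ac, nbytes):
        """Half duplex: uplink bytes consume the same AC+STA budget."""
        for bucket in self._levels(sta_id, ac):
            bucket.charge(nbytes)
```

In this sketch a downlink packet is held back whenever any of the three buckets is short, so a burst of uplink traffic from a station reduces what the AP will send to that station until the buckets refill.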

With this scheme, it is very easy to control and provide guaranteed QoS across a large number of STAs with rules such as

  • The total guest traffic will be 1 Mbps
  • Priority for certain users
  • VoWi-Fi traffic classification and prioritization with shaping

The Wi-Fi transmit path

Spectrum analysis

This feature requires access points to detect and report non-Wi-Fi interferers e.g. Bluetooth, video monitors, cordless phones, microwaves. Typically the expected features include:

  • Classify the source of interference (e.g. microwave vs Bluetooth)
  • Provide a UI to display real-time signals
  • Real-time FFT chart – show energy levels at each frequency component
  • FFT duty cycle chart – display the duty cycle of interfering device
  • Detect and show if the device is frequency hopping
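As a rough illustration of the duty cycle chart, the sketch below computes, for each frequency bin, the fraction of FFT frames in which the energy exceeds a threshold. The function name `fft_duty_cycle`, the frame layout (one list of per-bin energies in dBm per capture) and the threshold value used in the example are assumptions, not part of any specific product.

```python
def fft_duty_cycle(frames, threshold_dbm):
    """Return, per frequency bin, the fraction of frames above threshold.

    frames: list of FFT captures, each a list of per-bin energies in dBm
            (assumed layout for this sketch).
    """
    if not frames:
        raise ValueError("need at least one FFT frame")
    n_bins = len(frames[0])
    busy = [0] * n_bins
    for frame in frames:
        if len(frame) != n_bins:
            raise ValueError("all frames must have the same number of bins")
        for i, energy in enumerate(frame):
            if energy > threshold_dbm:
                busy[i] += 1
    return [count / len(frames) for count in busy]


# Four captures of a two-bin spectrum; -62 dBm is an illustrative threshold.
captures = [[-90, -50], [-90, -50], [-90, -95], [-40, -50]]
print(fft_duty_cycle(captures, threshold_dbm=-62))
```

A bin that is busy in most frames suggests a continuous interferer, while a duty cycle that moves from bin to bin over time is one hint of a frequency-hopping device.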

In addition to interference from non-Wi-Fi sources, the access points need to detect and report other Wi-Fi BSSs operating on a specified channel or a list of channels, along with their occupancy/utilization.

During operation on a channel, if this feature is turned on then the access point shall detect the interferers without affecting the actual traffic. Alternatively, access points should also be able to initiate a spectral scan on a list of channels and report the results.

A programmable DSP-based baseband would be extremely useful for real-time signal analysis, as different code can be downloaded and run for different application environments.

Wi-Fi IEEE power management

The solution has to be very latency sensitive for certain modes of operation such as U-APSD. The latency with which the access point responds to a U-APSD trigger defines the power dissipation of the client, because the client sends a PS-Poll packet and then waits for the data. The total time to respond should be in the order of 100 µs. Hardware-based queues, with data transmitted directly from the queues, enable much lower latency when scheduling a packet to the client. In addition, special queues are needed at the MAC level to get the U-APSD packets ahead of the normal queues. The TX-QoS block can block or enable queues dynamically based on the incoming packet, so the processor does not have to schedule them.

Packet coalescing and DDR store

One of the predominant factors driving the efficiency of the spectrum/air-time is the ability to burst the traffic of a particular station (and a particular TID) using AMSDU and AMPDU. With hardware-based shapers per queue, and an additional shaper parameter to increase the burst capability, multiple packets from a queue can be scheduled in one scheduling instance.

The Wi-Fi receive path

In addition, owing to the number of stations and the data queues across all the stations, it is preferable that the host processor DDR memory is the only memory used for packet storage. The AMSDU+AMPDU aggregation of 802.11ac provides total packet sizes of up to one megabyte and if multiple buffers are kept internally, the solution would be cost prohibitive. So the architecture needs to traverse through the packet only once and utilize the DDR memory for packet store.

Multicast to Unicast Conversion

Wi-Fi as it is specified is not reliable for multicast traffic (no ACKs for multicast). So opportunistically, the access points convert the multicast traffic to unicast to make the communication reliable. Each time this conversion is done there is usually a full packet copy of the data to each of the unicast paths. This is a very expensive operation. A block that can perform the multicast to unicast transfers without having to do individual buffer copies and that can perform header conversion in the datapath is essential to support high rates of multicast traffic (addressed to multiple clients). It is desired to have reference counting in conjunction with this block, instead of having the host processor maintain reference counts per packet.
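The copy-free conversion can be sketched as follows: each unicast transmit descriptor carries its own destination header but points at one shared payload buffer, and a reference count decides when the buffer can be released. All names here (`SharedPayload`, `TxDescriptor`, `multicast_to_unicast`, `on_tx_complete`) are hypothetical. In the architecture described above this bookkeeping would sit in hardware rather than on the host.

```python
from dataclasses import dataclass


@dataclass
class SharedPayload:
    data: bytes
    refcount: int = 0
    freed: bool = False

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            # Stand-in for returning the buffer to the packet pool.
            self.freed = True


@dataclass
class TxDescriptor:
    dest_mac: str            # per-client unicast header field
    payload: SharedPayload   # reference to the shared buffer, not a copy


def multicast_to_unicast(payload_bytes, group_members):
    """Build one descriptor per client, all sharing a single payload."""
    payload = SharedPayload(payload_bytes)
    descriptors = []
    for mac in group_members:
        payload.refcount += 1
        descriptors.append(TxDescriptor(dest_mac=mac, payload=payload))
    return descriptors


def on_tx_complete(descriptor):
    """Called once per client after its unicast copy has been sent."""
    descriptor.payload.release()
```

Only the headers differ between descriptors, so the cost of serving another group member is one descriptor and one counter increment rather than a full buffer copy.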

Programmability

AP system companies have significant value differentiation in implementing certain protocols like rate adaptation, fast roaming, DFS etc. So the solution should enable reuse of the existing software in these areas. As the deployment scenarios and standards are evolving, it is imperative to have a programmable packet classification and packet editing solution for future upgrades.

Different system solutions can differ significantly in their treatment of very common fields, such as the BSSID, so hardware-level assumptions about lookups or classification could prove inadequate. Fine-grained software control over the main/fallback rates, transmit power, carrier sense thresholds and contention window parameters is required.

So to accomplish these two tasks, it is desired to have programmability in the MAC layer and also at the packet-processing layer above the lower MAC.

Scalability

Outdoor, enterprise access points, for example at university campuses, airports and stadiums need to work with hundreds of clients at a time. The access points need to scale the number of stations with respect to security sessions, QoS across all the stations etc. The 802.11i security on the data packets and protected management packet of 802.11w should be handled in the Wi-Fi chip for scalability and to avoid tunnelling of data packets to the controller. The scalability is required with respect to the number of VLANs, multiple SSIDs and virtualized WLANs.

So hardware-based key lookup and context swapping are required to support up to 256 security sessions. To support 64K sessions, a hash is computed over a 5- or 6-tuple and used as an index into the flow entry table. Hardware-based QoS as described above provides scalability of the traffic management.
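A small sketch of the tuple-hash lookup is shown below, reading 64K as 65,536 entries. The choice of CRC-32 as the hash, the field order of the key and the function name `flow_index` are assumptions made for illustration, as the hash used by the hardware is not specified here.

```python
import ipaddress
import struct
import zlib

FLOW_TABLE_BITS = 16                      # 2**16 = 65,536 entries (assumed)
FLOW_TABLE_SIZE = 1 << FLOW_TABLE_BITS


def flow_index(src_ip, dst_ip, src_port, dst_port, protocol):
    """Hash a 5-tuple down to an index into the flow entry table."""
    key = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + struct.pack("!HHB", src_port, dst_port, protocol)
    )
    # CRC-32 is a stand-in hash; mask it to the table size.
    return zlib.crc32(key) & (FLOW_TABLE_SIZE - 1)


# Example: a TCP flow (protocol 6) from a client to a web server.
index = flow_index("10.0.0.1", "10.0.0.2", 49152, 80, 6)
print(index)
```

Different flows can hash to the same index, so each table entry would still need to store the full tuple (or a chain of entries) so that a lookup can confirm the match.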

Airtime fairness across stations

Packet schedulers generally perform round robin arbitration across the stations. Stations far away take the same Ethernet bandwidth as stations close by. However, far-away stations take a lot more airtime. In an AP, the data rates of two clients could differ by 10x (100 Mbps vs 10 Mbps). For the same amount of data, the slower client would then use ten times the airtime of the faster one (10:1), in turn reducing the overall system performance.

The AP arbitration scheme has to allocate airtime fairly across all the clients. This is accomplished with per-station/per-AC shaper queues, with the rate shapers set in line with the rate adapted between the AP and the client. Note that the adapted rate should be treated as the total bandwidth for the two directions, as Wi-Fi is half duplex.
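The effect can be worked through with the 100 Mbps and 10 Mbps clients from the example. The sketch below compares packet-based round robin (every station moves the same amount of data) with equal airtime. It is an idealized calculation that ignores MAC overhead, contention and retries, and the function names are hypothetical.

```python
def equal_bytes_share(rates_mbps):
    """Round robin by packets: every station moves the same data volume."""
    airtime = [1.0 / rate for rate in rates_mbps]   # time per unit of data
    total = sum(airtime)
    shares = [t / total for t in airtime]           # fraction of airtime
    aggregate_mbps = len(rates_mbps) / total
    return shares, aggregate_mbps


def equal_airtime_share(rates_mbps):
    """Airtime fairness: every station gets the same fraction of airtime."""
    n = len(rates_mbps)
    per_station_mbps = [rate / n for rate in rates_mbps]
    return per_station_mbps, sum(per_station_mbps)


rates = [100.0, 10.0]
shares, rr_total = equal_bytes_share(rates)
per_sta, fair_total = equal_airtime_share(rates)
print(shares, rr_total)      # slow client holds about 10/11 of the airtime
print(per_sta, fair_total)   # 50 + 5 Mbps with equal airtime
```

Under these assumptions, round robin gives an aggregate of about 18 Mbps, while equal airtime gives 55 Mbps. The per-station figures from `equal_airtime_share` are the kind of values the per-station shapers would be set to as the adapted rates change.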

Conclusion

We hope you’ve enjoyed our networking-focused miniseries. Please let us know if you have any questions or contact us directly if you want to learn more about MIPS and Ensigma.

About the author: Narayanan Raman
