Note: I might have missed some details; please feel free to correct them in the comments. - H
- The three-day event was hosted at KTPO.
- No storage/compute infrastructure was hosted; the entire network infrastructure was dedicated to providing public wifi access and LAN connectivity to certain areas of the venue.
During the event:
- Wifi connectivity had issues throughout, impacting certain sections of the venue, which in turn affected participants.
- LAN connectivity remained fine, except for minor configuration mismatches on the client side.
- The network team saw a few uninvited access requests on open ports. They also mentioned the possibility of a network takeover, but we could not confirm it.
- We later concluded that multiple issues were going on at once, which initially made it hard to narrow things down to a single cause.
- People were able to connect, but the speed was too slow.
- People were able to connect and fast.com showed good enough speeds, but latency was higher than expected.
- People were not able to connect to the network.
- The current setup made it very easy to spoof the official SSID; there was no way to know whether you were connected to the legitimate one.
- People were using mobile phone hotspots, which led to co-channel or adjacent-channel interference.
- There might have been other, unknown sources of interference.
- The main stage console's Ethernet line had some issues; it turned out the console had a static IP set.
- It took far too long to determine whether it was a security issue or a network issue, so debugging and the subsequent fixes got delayed.
- When the network degraded, we did not have any historical data to see exactly when its health started degrading.
- There was no logging or historical data on network usage in general. We had to manually go to locations and survey them.
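The SSID-spoofing problem listed above could at least be partially detected by periodically comparing a wifi scan against an inventory of our own APs. A minimal sketch, assuming scan results (SSID + BSSID pairs) can be exported from some wifi scanning tool; `ScanResult`, `find_rogue_aps`, and all SSIDs/MACs below are hypothetical. A determined attacker can spoof the BSSID too, so this only catches naive clones (like a phone hotspot reusing the name and password); it is a detection aid, not a fix.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResult:
    ssid: str   # network name being advertised
    bssid: str  # MAC address of the radio advertising it


def find_rogue_aps(
    scan: list[ScanResult], official_ssids: set[str], known_bssids: set[str]
) -> list[ScanResult]:
    """Return scan entries that advertise one of our SSIDs from a radio we don't own."""
    known = {b.lower() for b in known_bssids}  # MACs compared case-insensitively
    return [r for r in scan if r.ssid in official_ssids and r.bssid.lower() not in known]


# Hypothetical scan: SSID names and MAC addresses are made up for illustration.
scan = [
    ScanResult("Zone1", "AA:BB:CC:00:00:01"),     # one of our APs
    ScanResult("Zone1", "12:34:56:78:9a:bc"),     # same name, unknown radio
    ScanResult("CafeWifi", "66:77:88:99:aa:bb"),  # someone else's network
]
print(find_rogue_aps(scan, {"Zone1", "Zone2", "Zone3"}, {"aa:bb:cc:00:00:01"}))
# [ScanResult(ssid='Zone1', bssid='12:34:56:78:9a:bc')]
```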
All the issues above could have been handled more gracefully by keeping the following in mind:
- Capacity and interference planning
- Network monitoring and intrusion detection
- Preparing the network keeping attacks in mind
The following is a breakdown of each area:
Capacity and interference planning
What did we do?
- Tried fine-tuning channels and transmit power, which seemed to work, but we do not have any metrics to back it up.
- Split the network into smaller sections. This also seemed to work at first, but we are unsure how effective it was.
- Tested other possible sources of interference, such as speakers. This did not help.
- [ ] Which medium to use? (Ethernet/Wifi)
- [ ] Which frequency to use? (2.4GHz/5GHz/6GHz)
- [ ] Which encryption standard to use? (WPA2/WPA3/WPA3-SAE-PK etc.)
- [ ] Which channels to use?
- [ ] Transmit power optimization.
- [ ] Check the venue and sources of interference in advance: speakers, Bluetooth usage, mobile hotspots, and other unexpected sources depending on the location.
- [ ] Discuss which ports are essential. (There were discussions about blocking the SSH port across the entire network, even though developers needed it to work.)
- [ ] Position the wireless devices correctly.
- [ ] Prioritize sections of the network and plan fallbacks accordingly. E.g. network access for participants was the utmost priority: it was fine if the public wifi was not working, but a total disaster when participants could not access the internet.
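For the "which channels to use?" question, a rough sketch of one approach: given which APs can hear each other (from a pre-event site survey), greedily give each AP the channel least used by its neighbors. This is a plain greedy graph-coloring heuristic, not what the network team used; the AP names and survey data are hypothetical, and the 5GHz list is an assumed non-DFS subset that must be checked against local regulations and what the APs support.

```python
from __future__ import annotations

from collections import Counter

# 1/6/11 are the usual non-overlapping 20MHz channels on 2.4GHz.
CHANNELS_24GHZ = [1, 6, 11]
# Assumption: a non-DFS 5GHz subset; check local regulations and AP support first.
CHANNELS_5GHZ = [36, 40, 44, 48, 149, 153, 157, 161]


def assign_channels(neighbors: dict[str, set[str]], channels: list[int]) -> dict[str, int]:
    """Greedily give each AP the channel least used by its already-assigned neighbors.

    `neighbors` maps an AP name to the APs it can hear (e.g. from a site survey).
    APs with the most neighbors are placed first, since they are the hardest to fit.
    """
    assignment: dict[str, int] = {}
    for ap in sorted(neighbors, key=lambda a: len(neighbors[a]), reverse=True):
        used = Counter(assignment[n] for n in neighbors[ap] if n in assignment)
        # Fewest neighboring users wins; ties go to the earliest channel in the list.
        assignment[ap] = min(channels, key=lambda ch: used[ch])
    return assignment


# Hypothetical survey of four APs in one zone (who can hear whom).
survey = {
    "z1-ap1": {"z1-ap2", "z1-ap3"},
    "z1-ap2": {"z1-ap1", "z1-ap3"},
    "z1-ap3": {"z1-ap1", "z1-ap2", "z1-ap4"},
    "z1-ap4": {"z1-ap3"},
}
print(assign_channels(survey, CHANNELS_24GHZ))
# {'z1-ap3': 1, 'z1-ap1': 6, 'z1-ap2': 11, 'z1-ap4': 6}
```

The same function works with `CHANNELS_5GHZ`. With only three usable channels on 2.4GHz, channel reuse between neighbors becomes unavoidable as soon as APs are packed densely, which is one more reason to lean on 5GHz in crowded halls.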
Network monitoring and intrusion detection
What did we do?
- We tried monitoring for suspicious devices, but all of them turned out to be false positives.
- We did not have a good unified view of the network, so we relied on whatever the running version of Ubiquiti's software had to offer.
- [ ] Since security is critical, we need at least one person from the network team who understands security at the venue at all times.
- [ ] Define all network needs as close to reality as possible.
- [ ] Make sure enough logging and monitoring data is available at all levels, and that alerting is set up.
- [ ] Make sure debugging tools are in place in case anything goes wrong with the network: spectrum analyzers, 5GHz monitor-mode adapters, whatever else, you name it.
- [ ] Programmatic access to the network is a plus
- [ ] A monitoring interface accessible from a handheld device is a plus; organizers should have at least read-only access to it.
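Even a simple stored time series would have told us when the network started degrading. A minimal sketch of the alerting side, assuming latency samples are already being collected per zone (e.g. a probe device pinging the gateway every minute); `Sample`, `first_degradation`, the 50ms threshold, and the window size are hypothetical choices, not anything the network team ran.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class Sample:
    timestamp: float   # seconds since the probe started
    latency_ms: float  # round-trip time measured by the probe


def first_degradation(samples: list[Sample], threshold_ms: float, window: int = 5) -> float | None:
    """Return the timestamp where the rolling median latency first exceeds the threshold.

    A rolling median over `window` samples ignores a single slow ping, so the
    alert fires only on sustained degradation. Returns None if it never does.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    ordered = sorted(samples, key=lambda s: s.timestamp)
    for i in range(window - 1, len(ordered)):
        recent = ordered[i - window + 1 : i + 1]
        if median(s.latency_ms for s in recent) > threshold_ms:
            return ordered[i].timestamp
    return None


# Hypothetical one-sample-per-minute latency series for one zone.
latencies = [20, 22, 21, 25, 24, 60, 80, 90, 95, 100]
samples = [Sample(timestamp=60.0 * i, latency_ms=ms) for i, ms in enumerate(latencies)]
print(first_degradation(samples, threshold_ms=50))  # 420.0
```

The alert fires at minute 7 even though latency first jumped at minute 5: the window trades a little delay for fewer false alarms.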
Preparing the network keeping attacks in mind
What did we do?
- The network team added some firewall rules to protect their network from possible attacks
- We assumed that if it was a deauth attack, it would most probably be on 2.4GHz, so we restricted wifi to 5GHz only to minimize the attack surface.
Multiple attacks were hypothesized, but we have conclusive proof of none. We did not have enough skills, equipment, or tools to verify the hypotheses in the required timeframe.
- [ ] Prefer LAN for high-priority networks: even if attacks are possible, isolation becomes easier. Deauth attacks and wireless network degradation are particularly tricky because there’s no real way to prevent them (afaik) other than “deauth the deauther” or “find the person behind it” kinds of approaches.
- [ ] If using wifi, prefer 5GHz/6GHz with WPA3/WPA3 SAE-PK. Wifi is prone to all kinds of attacks even with the strongest security measures in place.
- [ ] Set up honeypots and decide in advance how we would use them if they turn out to be useful.
- [ ] Deauth packets can also be false positives: APs “can” send them when they need to reboot or change channels, and badly written software on some APs might send them too. So when filtering deauth packets, make sure to apply appropriate filters.
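A sketch of one such filter, assuming deauth frames have already been captured in monitor mode and parsed into (timestamp, BSSID) records by whatever capture tool is used. The idea: an AP rebooting or changing channels sends the occasional deauth, while a flood shows up as a sustained burst, so only our own BSSIDs exceeding a per-window count are flagged. `DeauthFrame`, `suspicious_bssids`, and both thresholds are hypothetical and would need tuning against a baseline capture of the venue.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DeauthFrame:
    timestamp: float  # seconds since capture start
    bssid: str        # BSSID field of the captured deauth frame


def suspicious_bssids(
    frames: list[DeauthFrame],
    our_bssids: set[str],
    window_s: float = 10.0,
    max_per_window: int = 20,
) -> set[str]:
    """Flag our own BSSIDs that see a burst of deauths within one fixed time window.

    Isolated deauths (AP reboot, channel change) stay under the threshold; a
    sustained flood does not. Frames for networks we do not run are ignored.
    """
    ours = {b.lower() for b in our_bssids}
    counts: Counter[tuple[str, int]] = Counter()
    for frame in frames:
        bssid = frame.bssid.lower()
        if bssid in ours:
            # Bucket frames into fixed, non-overlapping windows of window_s seconds.
            counts[(bssid, int(frame.timestamp // window_s))] += 1
    return {bssid for (bssid, _window), n in counts.items() if n > max_per_window}


ours = {"aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"}  # hypothetical AP inventory
frames = (
    [DeauthFrame(0.1 * i, "AA:BB:CC:00:00:01") for i in range(30)]     # 30 in 3s: burst
    + [DeauthFrame(5.0, "aa:bb:cc:00:00:02")]                          # one-off: ignored
    + [DeauthFrame(0.1 * i, "de:ad:be:ef:00:00") for i in range(50)]  # not our network
)
print(suspicious_bssids(frames, ours))  # {'aa:bb:cc:00:00:01'}
```

Ignoring BSSIDs outside our inventory is what would have filtered out the hotspot dongles; fixed windows can split a burst across a boundary, so a sliding window is a reasonable refinement.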
What did we do?
- Most of the support we provided was manual; for most issues, the network system did not alert us.
- [ ] Make sure IP allocation on the client is happening as expected.
- [ ] Physical locations where network equipment is set up need to be attended or monitored by CCTV.
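For checking IP allocation on a client (and catching cases like the main stage console's static IP), a quick classification of a client's IPv4 address against the subnet DHCP is expected to hand out, using Python's standard `ipaddress` module. The subnet and messages are hypothetical; how the client's address is obtained is left out.

```python
import ipaddress


def check_client_ip(ip: str, expected_subnet: str) -> str:
    """Classify a client's IPv4 address against the subnet DHCP should hand out."""
    addr = ipaddress.ip_address(ip)
    if addr.is_link_local:
        # 169.254.0.0/16: the OS self-assigned an address because no DHCP lease arrived.
        return "no DHCP lease (self-assigned link-local address)"
    if addr in ipaddress.ip_network(expected_subnet):
        return "ok"
    # Outside the expected range: most likely a leftover static IP on the client.
    return "unexpected address (static IP?)"


EXPECTED = "10.20.0.0/22"  # hypothetical venue subnet
for ip in ["10.20.1.15", "169.254.33.7", "192.168.1.50"]:
    print(ip, "->", check_client_ip(ip, EXPECTED))
```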
- What was causing people to be thrown off the network? Was it:
- Deauth attack?
- Congested channel?
- AP not able to take in new devices?
- Something else?
- What was causing the network degradation? Was it:
- Someone attacking the network and consuming airtime? (This is still unlikely, as we do not have any trace or logging of it.)
- Too many devices connected to the AP, putting it under load?
- Inappropriate AP positioning?
- Wireless interference?
- What caused the bandwidth improvement on Day3 (Judgment day) once 2.4GHz was disabled on the Zone3 SSID?
- In terms of throughput, 20 devices on one SSID are functionally the same as 20 devices split across several SSIDs (e.g. 5 devices on each of 4 SSIDs), all on the same AP, since every SSID on a radio shares the same airtime.
- But in our case, we had two different SSIDs on the same AP:
- Zone3 (the one known to everyone since day0)
- ETHDay3 for the judges (created on the morning of Day3)
- So if there were any issues, clients connected to both SSIDs should have faced them. But in our case:
- Zone3 was facing network degradation; at times new devices could not even connect to it.
- ETHDay3, meanwhile, was working absolutely fine.
- Zone3 was operating on both 2.4 and 5GHz; the network team then disabled 2.4GHz and the network improved.
- We are still not sure why this happened: whether an attack stopped or the load on the AP dropped, leading to the improvement.
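A naive model of the throughput point above: if SSIDs share one radio, each active client gets roughly the radio's capacity divided by the total number of active clients, however they are split across SSIDs. This is a simplification (it ignores per-SSID beacon overhead, which makes extra SSIDs slightly worse, and differences in client data rates), and `per_client_mbps` and the 200Mbps figure are hypothetical. It also only holds per radio: clients that land on a dual-band SSID's 2.4GHz radio share that radio's airtime, not the 5GHz one.

```python
from __future__ import annotations


def per_client_mbps(radio_capacity_mbps: float, clients_per_ssid: list[int]) -> float:
    """Naive equal-share throughput per active client on a single AP radio.

    Only the total client count on the radio matters here; how clients are
    spread across SSIDs does not change the result in this model.
    """
    total_clients = sum(clients_per_ssid)
    if total_clients <= 0:
        raise ValueError("need at least one active client")
    return radio_capacity_mbps / total_clients


# 20 clients on one SSID vs. 5 clients on each of 4 SSIDs, same radio:
print(per_client_mbps(200, [20]))          # 10.0
print(per_client_mbps(200, [5, 5, 5, 5]))  # 10.0
```

Under this model, Zone3 and ETHDay3 clients on the same radio should have seen the same share, which is why the difference we observed remains unexplained.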
- 3:00PM-4:00PM
- Initial network architecture: Zone1 (new hall), Zone2 (new hall), Zone3 (old hall)
- Things looked great across all 3 zones: 100Mbps+ in each zone on my phone, 900Mbps+ on LAN.
- We made no assumptions about bad actors at all; no stress test was done, and no individual AP range/interference tests were done afaik.
- Wifi network config: 2.4+5GHz, with 1 AP on WiFi 6. All on WPA2, most probably without PMF.
- 4:00PM-5:00PM
- Request for a new router in front of the new hall for early registrations.
- This was delayed a little due to a physical network cut. Once set up, everything was flying like a G6.
- 8:00AM-11:00AM
- Tested bandwidth over wifi on mobile; working alright, as expected.
- ~11:30AM
- Kartik mentioned latency issues in Z1 and Z2. I checked: latency was around 30-40ms, sometimes higher.
- Z1 wifi was dropping packets.
- The network team was informed about this. They marked it as a non-issue, saying this was expected and that if we needed better performance we would have to provide Ethernet to tables.
- ~3:00PM
- At this point:
- ~600 devices to Z2
- ~300 devices to Z1
- ~150 devices in Z3
- Issues and reports of bad wifi at Z2 and Z1. Z3 alright.
- We noticed that total bandwidth usage was only about 100Mbps, with about 300 users in Z1, which meant people were not even able to use the available bandwidth.
- Based on personal checks, latency issues persisted and got worse.
- ~6:00PM
- Preet joined in and started investigating what could be causing the issues.
- The wifi network was still bad, and we decided to split each zone into smaller zones.
- A new wired connection was set up to the main stage console.
- ~8:00PM
- After a while, a few folks who had been at ETHIndia’18 joined in and started discussing possibilities and mitigations.
- They explained how a deauth attack could happen and that it seemed the likely explanation.
- People sitting next to each other, connected to the same SSID, were getting drastically different treatment from the network.
- Ashwin tried creating an AP on his phone with the same name and password, and people started connecting to it.
- Debugging was very hard in this system because there were no logs or historical data. Not much was making sense.
- ~11:00PM
- Preet and Ashwin dug around the Ubiquiti forums and figured that the channels and transmit power in use might be causing the problems.
- ~01:00AM(next day)
- Together with the network team, we fine-tuned power and channels and switched to 5GHz only. For that night, people got great connectivity across Zones 1 and 2 (we verified manually, table by table, and also got feedback from participants).
- We also manually asked people to turn off their mobile hotspots and saw a visible change in 2.4GHz channel occupancy afterwards.
- ~9AM
- The network was working alright; not too many people in Z1 and Z2.
- ~10AM
- The network went bad again, but was still better than the previous day.
- This time we managed to capture one or two deauth packets, but they were mostly false positives.
- The network team was not able to access their own control panel and suspected an attack (which cannot be confirmed yet). This was later resolved by manually resetting each AP in Zone2.
- ~3PM
- A security person from the network team came to the premises and configured the on-prem devices for better security, because they had apparently received unwanted/unexpected requests from our clients trying to infiltrate the network on their side.
- ~6PM
- Set up wired connections to the main stage and a few other places.
- Amogh came in with his 2.4GHz monitor-mode adapter, and we soon saw a lot of deauth requests. It turned out none of those devices were on our network when we checked them, and quite a few of them were hotspot dongles.
- In the evening we tried drilling down on interference but did not find anything significant.
- ~8PM
- We split zone2 into 3 different zones.
- We shared the new SSIDs and passwords with participants one at a time.
- We did not want to mass-announce this, as we were not sure whether there was an attack targeting specific SSIDs.
- Earlier: lots of APs, 1 SSID
- Now: more SSIDs, fewer APs per SSID (which makes clients far from their SSID's APs less likely to get good connectivity)
- This keeps the number of devices connected to each AP lower, which is good.
- It turned out this split was effective at first, but not so much after a while.
- ~1AM
- Z2: Things were running on 5GHz, and Z2 was split. Connectivity was bad (I was getting kicked out), though some participants were getting it fine.
- Z1: This was not split; things were as is. Connectivity was alright, and very good in some places.
- Pretty bad at one particular place, near the AP set to 5GHz/channel 36, which was not healthy according to the WifiAnalyzer app. Other APs in Z1 were running on channel numbers >100.
- We could not even investigate what was causing the slowness because the network team was not at the venue and their laptop was not available.
- ~3:30AM
- The network team arrived and started cabling Zone3. There were some issues with cable length, but they managed after a couple of hours.
- ~8AM
- A new Zone3 SSID, ETHDay3, was created.
- ~11AM
- The main stage console was facing Ethernet connectivity issues; this was resolved by enabling DHCP on the client machine.
- ~2PM
- We learned about the issue where the same AP, running different SSIDs, gave different bandwidth depending on the SSID. This was surprising; it was resolved by setting the SSID to 5GHz only, but we could not find the root cause, as mentioned above.