Hey everyone, I’m looking to formalize my process for when things start crawling. We just moved into a new multi-story facility, and lately, I’ve been getting vague tickets about “the internet being slow”. I want to stop being reactive and actually have a “battle plan” for diagnosing these bottlenecks.
What’s your step-by-step logic when you get that first alert or complaint? I’m looking for a methodology that covers everything from the ISP handoff down to the client device. I want to know exactly where the pipe is leaking. Any specific tools or sequences you swear by?
First, I need to figure out where the issue is: inside the local network (LAN) or on the external link (WAN). I usually run two continuous pings in parallel: one to a reliable external DNS server (e.g., 8.8.8.8) and one to the default gateway. If latency climbs or packets are lost on the way to the gateway, the issue is almost certainly on the LAN, so I check the switches and other LAN hardware. If the gateway responds perfectly but the external ping takes over 500 ms, the problem is at the edge or beyond it. In that case, I either call the provider right away or check the NAT table on the edge router; perhaps we’ve simply reached the session limit.
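To make the two-ping comparison repeatable, here is a minimal Python sketch of the decision logic. It assumes you have already collected round-trip times in milliseconds from both pings, with None standing in for each lost packet. The function names, the 10 ms gateway threshold, and the 2% loss threshold are illustrative assumptions on my part; only the 500 ms external figure comes from the post above.

```python
from statistics import mean


def summarize(samples):
    """Return (average RTT in ms or None, loss fraction) for a list of samples.

    Each sample is an RTT in milliseconds, or None for a lost packet.
    """
    replies = [s for s in samples if s is not None]
    loss = 1 - len(replies) / len(samples) if samples else 1.0
    avg = mean(replies) if replies else None
    return avg, loss


def classify(gateway_samples, external_samples,
             gw_max_ms=10.0, ext_max_ms=500.0, max_loss=0.02):
    """Classify a slowdown as 'LAN', 'WAN', or 'OK'.

    Thresholds are assumptions; tune them to your own baseline.
    """
    gw_avg, gw_loss = summarize(gateway_samples)
    ext_avg, ext_loss = summarize(external_samples)

    # Trouble reaching the gateway points at the local network first.
    if gw_avg is None or gw_avg > gw_max_ms or gw_loss > max_loss:
        return "LAN"
    # Gateway is healthy, so slow or lossy external pings point upstream.
    if ext_avg is None or ext_avg > ext_max_ms or ext_loss > max_loss:
        return "WAN"
    return "OK"
```

For example, classify([1, 1, 1], [600, 700, 650]) returns "WAN", because the gateway is clean while the external average is above 500 ms.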
Users often complain about “slow internet” when their devices are simply connected to a distant access point at a signal strength of around -80 dBm. In a multi-story building, eliminating interference and coverage gaps is critical. To identify the problem, I recommend conducting a radio survey of the building with NetSpot. You’ll get heatmaps showing the dead zones, see how the channels overlap, and get the full picture.
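If you can export per-client scan data (from a survey or from your wireless controller), a small script can flag devices that are sitting on a weak access point while a much stronger one is in range. This is only a sketch: the data layout, the client and AP names, the -75 dBm "weak" cutoff, and the 10 dB minimum gain are all hypothetical values I picked for illustration.

```python
def find_misassociated(clients, weak_dbm=-75, min_gain_db=10):
    """Return clients connected to a weak AP while a stronger AP is visible.

    clients: list of dicts with keys
        'name'      - client label (hypothetical)
        'connected' - name of the AP the client is associated with
        'scan'      - dict mapping AP name to RSSI in dBm as seen by the client
    """
    flagged = []
    for client in clients:
        current = client["scan"][client["connected"]]
        # Strongest AP the client can hear (RSSI closest to zero).
        best_ap, best = max(client["scan"].items(), key=lambda kv: kv[1])
        if current <= weak_dbm and best - current >= min_gain_db:
            flagged.append((client["name"], client["connected"], current,
                            best_ap, best))
    return flagged


# Hypothetical sample data.
sample = [
    {"name": "laptop-a", "connected": "ap-3f",
     "scan": {"ap-3f": -80, "ap-1f": -55}},
    {"name": "desktop-b", "connected": "ap-2f",
     "scan": {"ap-2f": -50, "ap-3f": -70}},
]
```

With the sample data, find_misassociated(sample) flags only laptop-a, which is connected at -80 dBm while another AP is audible at -55 dBm.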
Don’t ignore hidden issues like DNS. If the network feels slow, the primary DNS server may be unresponsive, which causes a 2-3 second delay before the client falls back to the secondary. Check the server logs and point a test machine at a public DNS resolver to compare response times. Also check the ports on the backbone switches for input errors and dropped packets. A damaged SFP module or a pinched fiber patch cord causes numerous retransmissions, which can severely reduce throughput without breaking the connection completely.
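To compare resolvers without changing a machine’s settings, you can time a single DNS query against a specific server. The sketch below uses only the Python standard library and hand-builds a minimal A-record query over UDP. The function names, the example.com test name, and the 3-second timeout are my assumptions; it does no retries and no TCP fallback, so treat it as a rough probe rather than a full resolver test.

```python
import socket
import struct
import time


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet for an A record."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A), QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def time_dns(server, hostname="example.com", timeout=3.0):
    """Return the response time in ms, or None on timeout or error."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.sendto(query, (server, 53))
            data, _ = sock.recvfrom(512)
        except OSError:
            return None
        elapsed_ms = (time.perf_counter() - start) * 1000
    # Accept the reply only if its ID matches the query we sent.
    return elapsed_ms if data[:2] == query[:2] else None
```

Call time_dns("8.8.8.8") and then the same function with your internal DNS server’s address. A result of None, or a time close to the timeout, from the internal server while the public one answers quickly would be consistent with the unresponsive-primary scenario described above.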
If complaints come from specific areas, it’s worth studying the radio spectrum carefully. Sometimes an ordinary microwave oven in the cafeteria or cheap LED lamps can swamp the 2.4 GHz band. Evaluate the signal-to-noise ratio (SNR); NetSpot can do this. If the noise level is too high, it’s better to move users to 5 GHz or 6 GHz. It’s also worth disabling the low data rates to get rid of “sticky” clients, which slow the network down and hog airtime for everyone else.
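For anyone reading survey numbers by hand: SNR in dB is just the signal level minus the noise floor, both in dBm. Here is a tiny sketch of that arithmetic with a rough rating on top. The 25 dB and 15 dB cutoffs are rule-of-thumb assumptions of mine, not values from NetSpot or any standard, so adjust them to what your clients and applications need.

```python
def snr_db(signal_dbm, noise_dbm):
    """SNR in dB is the signal level minus the noise floor (both in dBm)."""
    return signal_dbm - noise_dbm


def rate_snr(snr, good=25, fair=15):
    """Rough rating of an SNR value; the cutoffs are assumed rules of thumb."""
    if snr >= good:
        return "good"
    if snr >= fair:
        return "marginal"
    return "poor"
```

For example, a -65 dBm signal over a -92 dBm noise floor gives snr_db(-65, -92) == 27, which this sketch rates as "good". The same -65 dBm signal next to a noisy source that lifts the floor to -75 dBm drops to 10 dB and rates as "poor".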
Thanks, everyone, for the helpful advice! I followed it: I ran the internal/external ping test, surveyed the network in NetSpot, and discovered that the problem in our multi-story building was twofold. Clients in the lobby were constantly connecting to the access point on the third floor, and the firewall CPU was maxed out during the morning rush because of a misconfigured IPS rule. Checking the SFP modules was also very helpful: one module was generating CRC errors right on the server backbone. I fixed the RF environment, changed the port, and the tickets finally stopped coming in.