Tucked away in suburban New Jersey, a crew of 130 people keeps watch on AT&T’s worldwide telecommunications network, looking for hiccups that could be the first sign of trouble. They’re tracking the daily movement of nearly 30 petabytes of data — including 1.4 billion voice calls and 5 billion text messages — as it travels through nearly 1 million miles of fiber-optic cable, tens of thousands of cell sites, and countless switches and routers.

Welcome to AT&T’s Global Network Operations Center — GNOC for short — a state-of-the-art facility in Bedminster, N.J., which serves as mission control for the network at the heart of the company’s business. Behind these walls, network managers route traffic around potential bottlenecks, reallocate resources to match ever-shifting demand, and play a continuous game of cat and mouse with malicious hackers and cybercriminals.

This is network management on a grand scale: AT&T contends that the GNOC is the largest command and control center of its kind in the world. Inside, it looks like something from a movie set. Rows of workstations are arranged before a 12-foot-tall by 350-foot-wide wall that’s packed with 141 screens showing real-time data on the network’s traffic load, performance and other factors.

And yet, if the folks here are doing their jobs, the facility’s role should be invisible to most network users. Although the GNOC is crucial for keeping vital communications running during high-profile disasters like Hurricane Irene or the Japan earthquake and tsunami, network managers spend 80 percent of their time fixing small problems before they snowball into something bigger.

“There’s a ton of telemetry that comes into this location. We massage it into our systems so that if we see something occurring, we can get in front of it,” said GNOC Director Chuck Kerschner. “But customers will never see most of that.”

Balancing Act

It turns out that network activity on this massive scale is remarkably predictable. People text the same co-workers and call the same friends at the same rate, day after day. These trends show up as smoothly arcing lines on the GNOC’s array of screens. Under normal circumstances, the curving graph of today’s voice and data traffic matches one taken from the same time last week.

But when disaster strikes, the network immediately feels the impact.

In an auditorium overlooking the GNOC floor, AT&T’s Steve Moser offers a glimpse of what happens when things go south. Moser, who runs the GNOC visitor program, points to a spiking graph that shows a surge of voice calls and texts during the 2003 Northeast blackout, which interrupted electric service to an estimated 45 million people in eight U.S. states. The power outage struck around 4:10 p.m. on Aug. 14.

Records from that date show an almost instantaneous jump in traffic on the AT&T network. “The network is a very sensitive barometer of what’s happening,” Moser said.
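The week-over-week comparison that makes a jump like this stand out can be sketched in a few lines of Python. This is only an illustration, not AT&T's tooling: the function name, the 25 percent tolerance and the sample figures are all assumptions made for the example.

```python
def flag_deviations(today, last_week, tolerance=0.25):
    """Return the positions where today's traffic strays from last week's.

    `today` and `last_week` are equal-length lists of traffic samples taken
    at the same times of day. `tolerance` is a hypothetical threshold,
    expressed as a fraction of the baseline value.
    """
    flagged = []
    for i, (now, baseline) in enumerate(zip(today, last_week)):
        if baseline == 0:
            # No baseline traffic at all: any activity counts as a deviation.
            if now > 0:
                flagged.append(i)
            continue
        if abs(now - baseline) / baseline > tolerance:
            flagged.append(i)
    return flagged


# Invented sample values: the last two readings jump far above the baseline.
last_week = [100, 120, 150, 160, 155]
today = [102, 118, 149, 410, 390]
print(flag_deviations(today, last_week))  # → [3, 4]
```

A real monitoring system would presumably account for holidays, seasonal drift and measurement noise, which a fixed tolerance like this one ignores.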

During these events, network managers scramble to balance the load. For instance, at the height of Hurricane Katrina, the network was jammed with phone calls and texts coming into the New Orleans area offering help. A U.S. map in the GNOC that tracks network anomalies dramatically shows the strain. The map is blank during normal operations, but in August 2005 it was laced with bright lines indicating network overflow. The lines streamed from the East Coast, West Coast and Midwest, all converging on Louisiana. To relieve the gridlock, network managers restricted incoming traffic so that residents in the disaster zone could call for help and access the Internet, Moser said. As the event unfolded, managers continually adjusted the ratio of incoming to outgoing traffic to maintain service to the region.
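One way to picture that kind of inbound restriction is a simple capacity split that reserves a share of circuits for outbound calls. The article does not describe the mechanism AT&T actually used, so the sketch below is purely hypothetical: the function, the 60 percent reserve and the call counts are assumptions chosen for illustration.

```python
def admit_calls(capacity, inbound_demand, outbound_demand, outbound_percent=60):
    """Split a fixed number of circuits between inbound and outbound calls.

    All names and the 60 percent default are hypothetical. Outbound callers
    (people inside the affected region) get a guaranteed reserve, and
    inbound callers get whatever capacity is left.
    """
    reserved = capacity * outbound_percent // 100
    # Outbound may exceed its reserve only when inbound leaves room to spare.
    outbound_limit = max(reserved, capacity - inbound_demand)
    outbound_admitted = min(outbound_demand, outbound_limit)
    # Inbound takes whatever outbound callers are not using.
    inbound_admitted = min(inbound_demand, capacity - outbound_admitted)
    return inbound_admitted, outbound_admitted


# Heavy inbound demand, light outbound demand: outbound is fully served.
print(admit_calls(1000, 5000, 400))  # → (600, 400)
# Heavy demand in both directions: outbound is held to its reserve.
print(admit_calls(1000, 5000, 900))  # → (400, 600)
```

Raising or lowering `outbound_percent` as conditions change is the analogue of adjusting the incoming-to-outgoing ratio as the event unfolds.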

Similarly, last year’s Japan earthquake quickly showed up in the GNOC as a spike in phone calls from the area, followed by alerts that the network was automatically rerouting its traffic around a damaged undersea communications cable. Within minutes of the event, GNOC staff had mashed together network telemetry and earthquake alerts from the U.S. Geological Survey to gain an understanding of the disaster and launch a response.

Spotting Cybercrime

While the big events make for dramatic examples, GNOC personnel spend most of their time watching for subtle signs of trouble. For instance, equipment in the GNOC monitors temperature and humidity in AT&T network facilities across the globe, and a map of cell towers shows equipment that may be experiencing “pre-issues” that could lead to failure.

Thanks to the GNOC’s bird’s-eye view of a huge portion of the world’s communications traffic, the facility also is a powerful tool for spotting cybercrime. Among other things, the facility monitors 65,000 IP ports for abnormal activity. Because traffic patterns are so predictable, even the slightest deviation can signal an attack.
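The idea that a small deviation from a predictable pattern can signal trouble can be illustrated with a basic statistical check on per-port activity. This is a generic sketch rather than a description of AT&T's security systems: the function name, the z-score threshold of 3 and the sample counts are assumptions.

```python
from statistics import mean, stdev


def unusual_ports(history, current, z_threshold=3.0):
    """Return ports whose current activity is far from their historical norm.

    `history` maps a port number to a list of past activity counts, and
    `current` maps a port number to the latest count. The threshold is a
    hypothetical number of standard deviations.
    """
    flagged = []
    for port, count in current.items():
        past = history.get(port, [])
        if len(past) < 2:
            continue  # Not enough history to estimate a spread.
        mu = mean(past)
        sigma = stdev(past)
        if sigma == 0:
            # Perfectly flat history: any change at all is a deviation.
            if count != mu:
                flagged.append(port)
            continue
        if abs(count - mu) / sigma > z_threshold:
            flagged.append(port)
    return sorted(flagged)


# Invented counts: port 443 is steady, port 23 suddenly becomes busy.
history = {443: [1000, 1020, 980, 1010, 990], 23: [5, 4, 6, 5, 5]}
current = {443: 1005, 23: 60}
print(unusual_ports(history, current))  # → [23]
```

The more regular a port's history, the smaller its spread, so even a modest change stands out. That is the sense in which predictable traffic makes slight deviations visible.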

“Your network is never going to be 100 percent protected, but the key is to see those early indicators and put the necessary things in place,” said Steve Roderick, director of security technology for AT&T. “Otherwise you’re in a reactive state and then it’s too late to execute.”

The company uses the GNOC’s security monitoring capabilities to safeguard its own network, as well as to protect commercial and government clients of its security services. Threat data also is shared with the U.S. Department of Homeland Security, the U.S. Computer Emergency Readiness Team and other agencies, along with key vendor partners like Microsoft, Apple and McAfee.

Dealing With Data

Globally, wireless data overtook wireless voice traffic in 2010, according to various industry sources. And that shift is plainly evident on AT&T’s mobility network, where text messages outnumber voice calls by more than two to one on a typical day. The company, Moser said, is seeing annual increases of 35 percent in data and IP traffic, even in a down economy.
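To put the 35 percent figure in perspective, a short calculation shows how quickly growth at that pace compounds. It assumes the rate holds constant from year to year, which the article does not claim, and the function names are illustrative.

```python
import math


def growth_factor(rate, years):
    """Total growth multiple after `years` of compounding at `rate`."""
    return (1 + rate) ** years


def doubling_time(rate):
    """Years needed for traffic to double at a constant annual `rate`."""
    return math.log(2) / math.log(1 + rate)


# Assumes a steady 35 percent annual increase.
print(round(doubling_time(0.35), 2))     # → 2.31
print(round(growth_factor(0.35, 5), 2))  # → 4.48
```

In other words, under that assumption traffic would double in a little over two years and more than quadruple in five.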

Changes in the makeup of network traffic have driven the adoption of new management tools over the past dozen years, he said. The growth of social networks has contributed to the shift toward data traffic and also provided network managers with new mechanisms for spotting performance issues. The GNOC follows Twitter feeds and Facebook posts for customer feedback, for instance, in addition to receiving data through more traditional channels like AT&T customer call centers.

Indeed, the perfect storm of pop culture and new technology powers one of the busiest nights of the year on AT&T’s wireless network: the finale of Fox television’s American Idol. Moser points to a graph showing what happens when viewers text and phone in their votes for the winner of the popular reality TV singing contest. At 9 p.m. Eastern, what starts as a typical day instantly turns into a network-straining avalanche of wireless traffic. “You basically have one customer doubling the usage of the network, so it’s a significant event,” he said. “The fact that you can still use the network and never notice that this is happening is really a credit to the kind of thing we’re doing here.”


 
