Impact of the June 2012 Derecho on Communications Networks and Services
Report and Recommendations

A Report of the Public Safety and Homeland Security Bureau
Federal Communications Commission
January 2013

TABLE OF CONTENTS

EXECUTIVE SUMMARY
1. INTRODUCTION
2. PSHSB INFORMATION-GATHERING PROCESS
3. OVERVIEW OF THE DERECHO’S IMPACT ON COMMUNICATIONS
4. THE DERECHO’S EFFECTS ON WIRELINE NETWORKS
  4.1 BACKUP POWER PROBLEMS
    4.1.1 Backup Power Problems in Virginia
    4.1.2 Backup Power Problems in West Virginia
  4.2 SWITCH EFFECTS
    4.2.1 Verizon Switch Effects
    4.2.2 Switch Effects in West Virginia
  4.3 EFFECTS ON TRANSPORT EQUIPMENT
  4.4 PROPOSED CORRECTIVE ACTIONS FROM WIRELINE PROVIDERS
    4.4.1 Corrective Actions Proposed by Verizon
    4.4.2 Corrective Actions Proposed by Frontier
  4.5 WIRELINE OUTAGES IN OTHER STATES
5. THE DERECHO’S EFFECTS ON PUBLIC SAFETY ANSWERING POINTS
  5.1 DESCRIPTION OF THE 9-1-1 NETWORK
  5.2 NETWORK EFFECTS ON PSAPS
  5.3 NETWORK EFFECTS ON PSAPS IN VIRGINIA
  5.4 VIRGINIA PSAPS
    5.4.1 Fairfax County
    5.4.2 Prince William County
    5.4.3 City of Manassas
    5.4.4 Manassas Park
    5.4.5 Arlington County
    5.4.6 Other PSAPs in Virginia Supported by Verizon
    5.4.7 Albemarle County
  5.5 EFFECTS ON PSAPS IN WEST VIRGINIA
  5.6 EFFECTS ON PSAPS OUTSIDE OF VIRGINIA AND WEST VIRGINIA
    5.6.1 Ohio
    5.6.2 Maryland
    5.6.3 Other States
6. THE DERECHO’S EFFECTS ON WIRELESS NETWORKS
  6.1 HIGH-LEVEL SNAPSHOT OF WIRELESS NETWORK EFFECTS
  6.2 CELL SITE BATTERY LIFE
7. RECOMMENDATIONS
  7.1 RECOMMENDATIONS FOR POSSIBLE COMMISSION ACTION
  7.2 DILIGENT IMPLEMENTATION OF BEST PRACTICES
    7.2.1 Actions by CSRIC
  7.3 RECOMMENDATIONS FOR PSAP ACTION
    7.3.1 PSAP Voluntary Reporting to the FCC
  7.4 NG9-1-1 IMPLEMENTATION
8. CONCLUSION

APPENDICES
Appendix A: Preliminary Findings by the Virginia State Corporation Commission
Appendix B: Recommendations of the Metropolitan Washington Council of Governments
Appendix C: Recommendations from PSAPs

FIGURES IN DOCUMENT
Figure 1: Number of NORS Reports by Date of Incident
Figure 2: Number of NORS Reports by State
Figure 3: 9-1-1 Network Architecture
Figure 4: Percent Cell Sites Out by Date
Figure 5: Reasons Cell Sites Out of Service

TABLES IN DOCUMENT
Table 1: Summary of PSAP Effects

Executive Summary

In June 2012, portions of the Midwest and Mid-Atlantic regions of the United States experienced a fast-moving, destructive windstorm called a derecho, resulting in twenty-two deaths and leaving millions without electrical power. Unlike hurricanes and superstorms, which are generally well-forecast, derechos are more like earthquakes, tornados, and man-made events for which there is little-to-no advance notice and opportunity to prepare. The 2012 derecho severely disrupted 9-1-1-related communications. Seventy-seven 9-1-1 call centers (also known as “Public Safety Answering Points” or “PSAPs”) serving more than 3.6 million people in six states lost some degree of connectivity, including vital information on the location of 9-1-1 calls, mostly due to service provider network problems.
From isolated breakdowns in Ohio, New Jersey, Maryland, and Indiana, to systemic failures in northern Virginia and West Virginia, 9-1-1 systems and services were partially or completely down for up to several days. Seventeen PSAPs in three states lost service completely, affecting the ability of more than 2 million people to reach 9-1-1 at all. Even in the context of a storm like the derecho, a large-scale failure of communications – particularly 9-1-1-related communications – is unacceptable, and action must be taken to prevent similar outages in the future. To this end, at the direction of Federal Communications Commission (“FCC” or “Commission”) Chairman Julius Genachowski, the Public Safety and Homeland Security Bureau (“PSHSB” or “Bureau”) conducted an inquiry into the causes of the communications failures that resulted from the derecho and ways to prevent them during future emergencies so we can make the public safer. The Bureau’s inquiry included extensive review of confidential outage reports, public comments and related documents, as well as interviews of many service providers and PSAPs, equipment and backup power vendors, and public safety and community officials. The Bureau found that above and beyond any physical destruction by the derecho, 9-1-1 communications were disrupted in large part because of avoidable planning and system failures, including the lack of functional backup power, notably in central offices. Monitoring systems also failed, depriving communications providers of visibility into critical network functions. In most cases, the 9-1-1 and other problems could and would have been avoided if providers had followed industry best practices and available guidance. 
While important aspects of 9-1-1 service are under state and local jurisdiction, the Commission has a statutory obligation to ensure that our nation’s communications networks “promot[e] safety of life and property,” and action at the federal level could help prevent similar failures in the future. Thus the report recommends areas for the Commission to consider action to ensure the reliability, resiliency, and availability of 9-1-1 communications networks. These include ensuring that service providers: conduct periodic audits of 9-1-1 circuits; maintain adequate backup power at central offices and follow regular maintenance and testing procedures; have adequate network monitoring links; and have a more specific obligation to notify 9-1-1 call centers of breakdowns of 9-1-1 communications. The report also notes the relative advantage in reliability and performance that migration to Next Generation 9-1-1 (“NG9-1-1”), which is based on Internet-protocol architecture, will bring over “legacy” 9-1-1 systems.

The Bureau’s derecho inquiry gathered information relating to broader issues of network reliability and resilience that are also important to the ability of consumers to originate successful calls for help in emergencies, including the availability of wireless networks and of power for consumers’ devices and equipment. Those issues were raised again when, during the preparation of this report, the Mid-Atlantic was hit by another devastating weather event: Superstorm Sandy. Accordingly, we will analyze these issues further, after the conclusion of the field hearings recently announced by Chairman Genachowski about that event. While Superstorm Sandy had widespread and severe impacts on communications, this report focuses on the derecho and, principally, its devastating impact on the networks that connect 9-1-1 call centers to people who need help.

1. Introduction

On June 29, 2012, a fast-moving, destructive, and deadly storm called a derecho1 developed in central Iowa, worked its way across Illinois and Indiana, and by late afternoon and early evening, had moved quickly across Ohio, the central Appalachians, and the Mid-Atlantic states, continuing into the early morning of June 30. It resulted in twenty-two deaths and widespread damage, and left millions of citizens without electrical power for periods ranging from less than an hour to close to two weeks.2 The radar map below shows the derecho’s path over time.3

1 The National Weather Service defines a derecho as “a widespread, long-lived wind storm that is associated with a band of rapidly moving showers or thunderstorms. Although a derecho can produce destruction similar to that of tornadoes, the damage typically is directed in one direction along a relatively straight swath. As a result, the term ‘straight-line wind damage’ sometimes is used to describe derecho damage. By definition, if the wind damage swath extends more than 240 miles (about 400 kilometers) and includes wind gusts of at least 58 mph (93 km/h) or greater along most of its length, then the event may be classified as a derecho.” See http://www.spc.noaa.gov/misc/AbtDerechos/derechofacts.htm.
2 See, e.g., Schwartz, John, Many Areas Still in Dark After Series of Storms, N.Y. TIMES, July 2, 2012; Gresko, Jessica, Mid-Atlantic Region Hammered by Storms, Too, THE COLUMBUS DISPATCH, July 1, 2012. More than 3.7 million customers lost power, including approximately 1 million customers each in Ohio and Virginia and roughly 900,000 in Maryland, 632,000 in West Virginia, 68,000 in Washington, D.C., and 32,500 in Pennsylvania.
3 Map by G. Carbin, National Weather Service Storm Prediction Center.
The derecho caused widespread disruptions to communications generally and 9-1-1 services particularly.4 From isolated breakdowns in Ohio, New Jersey, Maryland, and Indiana, to systemic failures in northern Virginia and West Virginia, a significant number of 9-1-1 systems and services were partially or completely down for up to several days. In all, at least seventeen 9-1-1 call centers in three states lost service completely, affecting more than 2 million residents’ ability to reach 9-1-1. One PSAP alone estimates that it did not receive approximately 1,900 emergency calls during the time it was down.5 Across the storm’s path, at least seventy-seven PSAPs serving more than 3.6 million people in six states lost some degree of connectivity, including vital information on the location of 9-1-1 calls; the overwhelming majority of these full or partial outages suffered by PSAPs were due to problems in providers’ networks.

The public’s inability to reach 9-1-1 and obtain emergency assistance during and after the derecho was not just a theoretical or abstract concern. Whether, and how quickly, help can be called and a first responder arrives might make the difference between life and death.6 As The Washington Post reported, for example, a young man died after being struck by electrical wires brought down during the derecho. Bystanders who came to his aid and called 9-1-1 reportedly were not able to get through, even after calling for more than thirty minutes.7 In another instance, a woman, just a few hundred feet from her Washington, D.C. apartment, was knocked off her motorcycle and pinned under a tree, leaving her partially paralyzed.
She was saved when passersby, unable to get through to 9-1-1, flagged down an ambulance that provided help.8 While it does not appear that the large-scale failures of service providers’ 9-1-1 network infrastructure were factors in these two events (those failures occurred later), these real-life situations reinforce the critical importance of the successful completion of calls to 9-1-1.

Congress has given the Commission the responsibility under the Communications Act to ensure that communications networks of all types “promot[e] safety of life and property.”9 Central to this important responsibility is promoting the reliability, resiliency, and availability of communications networks at all times, including in times of emergency or a natural disaster such as the derecho. To that end, the Commission, through the Bureau, administers rules on communications outage reporting10 and analyzes those reports to identify various communications outage trends nationwide, especially regarding 9-1-1 capabilities. We use this information to assess network reliability and make recommendations for both individual provider and industry-wide improvements.

4 See, e.g., Sullivan, Patricia, 911 Failure Affected 2.3 Million in Northern Virginia, WASH. POST, July 11, 2012.
5 See Comments of Fairfax County, Virginia at 2 (Aug. 17, 2012) (“Fairfax County Comments”). These comments came in response to the Bureau’s July 18, 2012, Public Notice in Public Safety (PS) Docket No. 11-60 seeking input on the effects of the storm and potential remedies. See Public Safety and Homeland Security Bureau Seeks Comment On 9-1-1 Resiliency and Reliability in Wake of June 29, 2012, Derecho Storm in Central, Mid-Atlantic, and Northeastern United States, Public Notice, 27 FCC Rcd 8131, PS Docket No. 11-60, (PSHSB July 18, 2012) (“Derecho Public Notice.”).
6 One study of cardiac emergencies in Pennsylvania found that E9-1-1 adoption reduced the risk of mortality within six hours by 60 percent and the risk of mortality within 48 hours by 35 percent. See Athey, Susan and Stern, Scott, The Impact of Information Technology on Emergency Health Care Outcomes, January 2002, at 32, available at http://kuznets.fas.harvard.edu/~athey/itemer.pdf.
7 Sullivan, Patricia, Help Delayed For Electrocuted Man As 911 Calls Backed Up During Storm, WASH. POST, July 19, 2012.
8 Ruane, Michael E., D.C. Woman Caught In The Derecho Storm Is Left Paralyzed, But Her Attitude Is Optimistic, WASH. POST, Aug. 19, 2012.
9 See 47 U.S.C. § 151.

The severity of the 9-1-1 outages that resulted from the derecho called for a more intensive review process for this inquiry. Accordingly, in July 2012, Chairman Genachowski instructed the Bureau to conduct a comprehensive inquiry into the impact of the derecho on 9-1-1 and other communications. In addition to reviewing data from more than 500 outage reports from twenty-two communications providers, the Bureau released a Public Notice seeking comment on issues including the cause of the outages, their effect on public safety, and the resiliency and reliability of 9-1-1 networks generally.11 In response to that Public Notice, the Bureau received forty-five filings, including twelve comments and reply comments from communications providers and trade associations, thirteen from PSAPs and public safety groups, and twelve from individuals. The Bureau interviewed representatives of eight communications providers (some multiple times), twenty-eight PSAPs, three battery manufacturers, one generator manufacturer, numerous state and county entities, and obtained additional information through six supplemental data requests. In addition, the Bureau participated in several federal, state, and local meetings and hearings on the effects of the derecho.12

In this report, we present our findings and recommendations. Section 2 provides an overview of the Bureau’s information gathering process.
Section 3 offers a general description of the derecho’s impact on communications. Sections 4, 5, and 6 describe the derecho’s impact on three types of communications, specifically wireline, PSAP, and wireless communications. In Section 7, we offer our recommendations for addressing the specific problems identified in this report, as well as other suggestions we believe will promote the reliability of our Nation’s 9-1-1 communications.13

10 See 47 C.F.R. §§ 4.1-4.13.
11 See Derecho Public Notice, 27 FCC Rcd at 8131.
12 See, e.g., “Resilient Communications: Current Challenges and Future Advancement,” Before the Subcomm. on Emergency Preparedness, Response, and Communications, House Committee on Homeland Security, 112th Cong. (Sept. 12, 2012) (statement of David S. Turetsky, Chief, Bureau of Public Safety and Homeland Security); “Reliability of the District’s 911 Call System,” Before the District of Columbia Council Comm. On the Judiciary (Sept. 20, 2012) (statement of David S. Turetsky, Chief, Bureau of Public Safety and Homeland Security); Metropolitan Washington Council of Governments – News Room, “COG to Review 9-1-1 Outages and Other Failures Resulting from ‘Derecho’” (rel. Jul. 11, 2012) available at http://www.mwcog.org/news/press/detail.asp?NEWS_ID=584.
13 This report not only provides our own assessments, but also includes in the attached appendices preliminary findings by the Virginia State Corporation Commission (Appendix A); recommendations by the Metropolitan Washington Council of Governments (Appendix B); and recommendations from various PSAPs interviewed by the Bureau (Appendix C).

2. PSHSB Information-Gathering Process

Outage Reporting and Developing the Record

The Bureau first became aware of the derecho’s effects on communications infrastructure when providers began to submit derecho-related Network Outage Reporting System (“NORS”) reports on June 29, the day the storm struck.14 Communications providers filed a total of thirty-five NORS reports on June 29; the tally of NORS reports with outages beginning on June 29 would eventually reach 135. Under FCC rules, these reports are presumed confidential to protect proprietary information and data with sensitive national security implications.15 Accordingly, outage report data in this public report is presented in the aggregate unless otherwise agreed by the source of the information or available through other public sources.16 To the extent that information derived from outage reports or other confidential sources appears in this report, each provider has waived the presumption of confidentiality with respect to that information. Figure 1 below shows the number of NORS reports17 by incident date:

14 NORS is the Commission’s mandatory web-based filing system through which communications providers covered by the Part 4 outage reporting rules must submit reports to the FCC. This system uses an electronic template to promote ease of reporting and encryption technology to ensure the security of the information filed. The Cybersecurity and Communications Reliability Division of the Public Safety and Homeland Security Bureau administers NORS, monitors the outage reports submitted through NORS, and performs analyses and studies of the communications disruptions reported. Generally, a NORS report must be filed when the effects of an outage reach a certain threshold (e.g., lasting at least thirty minutes and potentially affecting 900,000 user-minutes). Then, the filing party has up to thirty days to supplement the filing with more complete information.
The NORS team aggregates the data in order to identify outage trends. See 47 C.F.R. § 4.1 et seq. See also Network Outage Reporting System (NORS), http://transition.fcc.gov/pshs/services/cip/nors/nors.html.
15 See 47 C.F.R. § 4.2.
16 See New Part 4 of the Commission’s Rules Concerning Disruptions to Communications, ET Docket No. 04-35, Report and Order and Further Notice of Proposed Rulemaking, 19 FCC Rcd 16830, 16855 ¶ 46 (2004) (noting that “the analytical substance of these reports is essential to the development and validation of best practices” and that “[a]s a consequence, we will also use information from those reports in analyses that will enable us to provide guidance to [advisory bodies and other organizations] in a way that does not provide sensitive information to those who might use it for hostile, or competitive, purposes”).
17 We note that providers filed in NORS on a rolling basis, with discrete providers filing multiple times during the event.

[Chart: Number of NORS Reports by Date. Horizontal axis: Incident Date, 6/29 through 7/6. Vertical axis: number of reports, scaled 0 to 250.]
Figure 1: Number of NORS Reports by Date of Incident

Figure 2 provides a geographic breakdown of outage reports.18 As this chart illustrates, the effects of the derecho were widespread geographically, from Illinois in the west, to New Jersey in the east, with West Virginia, Maryland, Virginia and Ohio experiencing the largest impact on communications.

[Chart: Number of NORS Reports by State. Horizontal axis: State, in the order WV, MD, VA, OH, DC, IN, PA, IL, NJ, TN, DE, KY. Vertical axis: Count, scaled 0 to 140.]
Figure 2: Number of NORS Reports by State

18 There were 534 derecho-related NORS reports from June 29 to July 6, 2012. We included outage reports that mentioned the storm, or that indicated that the outage was power-related. Most of the outage reports were for events that occurred on June 29 or June 30.
We included outage reports for events through July 6 because we believe that these reports included residual events from the restoration effort, although some of the reports from July 1 through July 6 may relate to other storms.

Of the 534 outage reports filed, 111 describe effects to wireline users, 161 describe effects to wireless users, 249 describe effects to transport facilities, and eight reports describe effects to cable telephone users. In addition, forty-five reports involve 9-1-1 outages. Although there were numerous outage reports, most of the reported effects were in Virginia, Maryland, Ohio, and the District of Columbia. West Virginia had many PSAPs affected but relatively few wireline or wireless users.19 Other states experienced lesser – but not inconsequential – communications outages related to the storm. In total, more than 1.2 million wireline communications customers in twelve states experienced outages (not counting other residents affected by the inability to reach 9-1-1 on all platforms). In addition, more than 30,000 high capacity transport lines (“DS3s”)20 were affected.

On June 30, 2012, the Commission activated a modified and targeted version of the Disaster Information Reporting System (“DIRS”).21 DIRS allows service providers in the designated area to submit reports on the status of their networks to the Commission during emergencies on a voluntary basis.22 Working with the National Cybersecurity and Communications Integration Center (“NCCIC”) at the U.S. Department of Homeland Security, the Commission activated this modified version of DIRS at 4:20 p.m.
for the following cities and counties:

· West Virginia: Berkeley, Hampshire, Harrison, Jefferson, Kanawha, and Wood counties
· Virginia: Alexandria, Manassas Park, Prince William County, Loudoun County, Arlington County, Falls Church, and Fairfax County
· Maryland: Montgomery, Prince George’s, Calvert, Charles, Anne Arundel, and Howard counties, Baltimore City, and Baltimore County
· The District of Columbia

19 We note that the derecho’s effects were widespread and severe across West Virginia, but its impact on population was greatest in densely-populated northern Virginia.
20 A Digital Signal 3 (“DS3”) line is a digital signal level 3 T-carrier. It may also be referred to as a T3 line. The data rate for this type of signal is 44.736 Mbit/s. See Digital Signal 3, WIKIPEDIA, http://en.wikipedia.org/wiki/Digital_Signal_3.
21 DIRS is a voluntary, web-based system that communications companies, including wireless, wireline, broadcast, and cable providers, can use to report communications infrastructure status and situational awareness information during times of crisis. See Disaster Information Reporting System (DIRS), http://transition.fcc.gov/pshs/services/cip/dirs/dirs.html. Information submitted into DIRS is presumed confidential but may be shared with federal agencies such as the Department of Homeland Security on a confidential basis. See The FCC’s Public Safety and Homeland Security Bureau Launches Disaster Information Reporting System (DIRS), Public Notice, DA 07-3871 (PSHSB 2007).
22 Sometimes a modified version of DIRS is activated, which calls for only certain fields in the system to be completed and only by certain types of communications providers. During the derecho, for example, the Commission did not seek DIRS data from companies such as broadcasters, who were generally understood to be less adversely impacted by this storm.

Overall, the Bureau received DIRS submissions from seven communications providers.
Additional Information from Communications Providers

Shortly after the storm, the Bureau began a series of meetings with cable, wireline, and wireless providers, particularly those serving Virginia and West Virginia, and issued several follow-up requests for additional confidential information to supplement the NORS and DIRS data. In particular, the Bureau asked Verizon Communications, Inc. (“Verizon”) and Frontier Communications Corporation (“Frontier”), the 9-1-1 service providers23 in the most severely affected areas, to provide a detailed timeline of events relating to each PSAP in their service areas affected by the derecho. In addition to information about the cause of each outage, we asked for a description of the 9-1-1 architecture that supports each PSAP that lost service. The Bureau also asked the major wireless providers serving the affected areas—Verizon Wireless, AT&T, Sprint, and T-Mobile—for additional information on the causes of cell site outages during the storm and for information on cell site battery life. Follow-up requests were made of these and other providers, who supplied the Bureau with supplemental reports and explanatory e-mails.24

Providers generally were cooperative and responsive to our requests for information. One issue, however, affected the scope and timing of our inquiry. Information originally made available to the Bureau about whether the generators were in working order just prior to the storm at Verizon’s Arlington central office, where a loss of power severely affected 9-1-1 service in northern Virginia, differed from later accounts.
Verizon indicated to the Bureau and a number of governmental entities at a meeting of the Metropolitan Washington Council of Governments (“MWCOG”), and on other occasions in July 2012, as reported by The Washington Post, that the critical generators at its Arlington central office had functioned properly during a maintenance test just days prior to the derecho.25 However, the Bureau learned from the Virginia State Corporation Commission (“SCC”) Preliminary Staff Report (“Virginia SCC Report”) when it was publicly released on September 14, 2012, that the Arlington central office generator did not pass the test, as confirmed by Verizon’s own maintenance logs.26

23 “9-1-1 service providers” are responsible for routing and delivering 9-1-1 calls to PSAPs. In the current 9-1-1 system, the “provider” of 9-1-1 service and associated network facilities is typically the incumbent local exchange carrier (“ILEC”) in a given area, which aggregates and delivers all incoming 9-1-1 calls (including those originating from other providers, such as wireless providers and VoIP providers) to the PSAP via selective routers. Next-generation technologies such as emergency services Internet protocol networks (“ESInets”), however, may change that pattern by more easily allowing entities other than ILECs to provide 9-1-1 service.
24 At publication of this report, Verizon supplemented its responses with a series of documents not provided earlier that the Bureau will review.
25 Verizon’s statements about the generator test were reported in the press. See Sullivan, Patricia, 911 Failure Affected 2.3 million in Northern Virginia, WASH. POST, July 11, 2012 (reporting that a Verizon representative told the MWCOG that “[b]oth generators underwent routine testing three days earlier”); Sullivan, Patricia and Flaherty, Mary Pat, Verizon, 911 Service Providers Out of Sync on Storm Outage, WASH. POST, July 12, 2012 (reporting that “one of the [Arlington] generators, which had worked smoothly in a routine test three days before, would not start”); Flaherty, Mary Pat, Verizon Details Errors in Derecho, Calls Response to 911 Outages ‘Insufficient,’ WASH. POST, August 13, 2012 (repeating Verizon’s public statement that “the generators failed . . . despite having been tested three days earlier”).

Additional Information from PSAPs

In order to gather information regarding PSAP responses to the derecho, the Bureau interviewed personnel from twenty PSAPs in West Virginia, five PSAPs in Virginia, and three PSAPs in Ohio. We also considered public comments filed by Fairfax County, Virginia, as well as reply comments filed by Loudoun County, Virginia, and Montgomery County, Maryland, and an ex parte letter from Arlington County, Virginia.27 These comments, meetings, and telephone calls greatly improved our understanding of what happened during the storm and informed the recommendations we offer in this report to improve and strengthen 9-1-1 infrastructure and service.

Public Comments in Response to Public Notice

On July 18, 2012, the Bureau released a Public Notice seeking public comment on the background, causes, and restoration efforts related to communications services and facilities impacted directly or indirectly by the derecho.28 Specifically, the Bureau sought to develop a complete and accurate record of all the facts surrounding the outages resulting from the storm as well as outages resulting from other relevant natural disasters in order to evaluate the overall resiliency and reliability of the Nation’s 9-1-1 system and services.
The Bureau also sought comment on the impact the outages had on various segments of the public, including consumers, hospitals and public safety entities.29 In response, the Bureau received forty-five filings, including twelve comments and reply comments from communications providers and trade associations, thirteen from PSAPs and public safety groups, and twelve from individuals.30 26 See Virginia State Corporation Commission, In the Matter of Investigating 911 Emergency Call Service Outages and Problems, Case No. PUC-2012-0042, Staff Report of Preliminary Findings at 8 (Sept. 14, 2012) (“Virginia SCC Report”) (finding that the generator that failed to start in the Arlington central office did not start during routine testing two days before the derecho and noting also that “[a] review of the maintenance logs for the backup generators in the Arlington and Fairfax central offices shows a lack of compliance with Verizon’s maintenance and testing procedures”). See also Flaherty, Mary Pat, Verizon ‘Failures and Deficiencies’ Blamed in Disruption of 911 Services During June Storm, WASH. POST, Sept. 14, 2012 (quoting a Verizon spokesman as saying that the company’s previous statement that the generator started successfully “was based on our knowledge at the time”). 27 See Fairfax County Comments; Reply Comments of Loudoun County, Virginia (Aug. 29, 2012); Reply Comments of Montgomery County, Maryland (Sept. 4, 2012); Ex parte Letter from Joseph N. Pelton, Chair, Arlington County (Virginia) Information Technology Advisory Committee, to FCC Chairman Julius Genachowski (Sept. 5, 2012). 28 See Derecho Public Notice, 27 FCC Rcd at 8131. 29 See id. 30 See, e.g., Comments of Dianna Arens, PS Docket No. 11-60 (urging undergrounding of wires and cutting back trees where undergrounding unfeasible) (Jul. 20, 2012); Comments of Robert F. Duffy, PS Docket No. 11-60 (raising issues related to battery lifespan) (Aug. 7, 2012); Comments of Phillip Wherry, PS Docket No. 
11-60 (suggesting questions that the Commission ought to address to providers) (Jul. 20, 2012). 11 Industry Best Practices The Commission and Bureau have worked with a variety of industry and public safety organizations to develop voluntary solutions and recommendations to improve the reliability of emergency communications. In 2011, for example, the Bureau joined with the Alliance for Telecommunication Industry Solutions (“ATIS”) Network Reliability Steering Committee (“NRSC”) to develop recommendations to prevent failure of centralized automatic message accounting (“CAMA”) 9-1-1 trunks during mass call events, such as the spikes in 9-1-1 calling from natural disasters.31 Similarly, the Communications Security, Reliability, and Interoperability Council (“CSRIC”)32 is a chartered federal advisory committee tasked with developing recommendations for the Commission on actions to enhance the security, reliability, and interoperability of communications systems.33 Many CSRIC recommendations are voluntary “best practices” that the Bureau and stakeholders within the telecommunications industry encourage providers to implement at their discretion.34 As such, these best practices are generally not codified in Commission rules, and the Bureau gauges their implementation primarily through its review of providers’ mandatory outage reports. For reasons explained below, communications failures during the derecho revealed that many providers failed to implement crucial best practices developed by CSRIC that could have mitigated or prevented many of the storm’s most serious effects on communications networks, including 9-1-1 service outages. This failure, and the resulting damage, was costly. 3. Overview of the Derecho’s Impact on Communications The derecho affected communications in eleven states and the District of Columbia, with the greatest effects felt in Virginia, West Virginia, Maryland, and Ohio. 
Most significantly, the derecho disabled 9-1-1 service in counties in Virginia, West Virginia and Ohio, with the greatest number of people affected living in northern Virginia. Four northern Virginia PSAPs lost 9-1-1 service completely: Fairfax and Prince William Counties, Manassas, and Manassas Park. These 9-1-1 communications failures resulted, in significant part, from the loss of commercial power followed by generator failures in Verizon’s Arlington and Fairfax central offices.
31 See NRSC 9-1-1 CAMA Trunk Throughput Optimization Analysis (ATIS-0100034) (rel. Aug. 2011), available at http://www.atis.org/legal/Docs/NRSC/CAMATrunk_Transmittal_Final.pdf.
32 See FCC Encyclopedia, Communications Security, Reliability, and Interoperability Council, http://www.fcc.gov/encyclopedia/communications-security-reliability-and-interoperability-council-iii. CSRIC replaces the Network Reliability and Interoperability Council (“NRIC”), which performed a similar role from 1992 until CSRIC’s creation in 2007. Members for each CSRIC have been selected from among commercial communications entities, public safety agencies, and consumer or community organizations or other non-profit entities to provide a diverse balance of expertise and viewpoints.
33 See CSRIC Charter, http://transition.fcc.gov/pshs/docs/advisory/csric/CSRC_charter_03-19-2009.pdf.
34 The volume of CSRIC best practices (including those developed by CSRIC’s predecessor organization NRIC) has grown to several thousand and covers a wide range of issues involved in providing communications services. In January 2011, CSRIC prioritized these best practices in an effort to help communications providers focus their application. All of the best practices and the prioritization are available on the Commission’s website. See https://www.fcc.gov/nors/outage/bestpractice/BestPractice.cfm.
Under current 9-1-1 architecture, the networks of incumbent wireline providers typically connect 9-1-1 call centers to those seeking help, whether the call for assistance originates on a landline or a wireless phone. Wireline outages, therefore, can have a more sweeping impact on 9-1-1 service than wireless outages. Below is a more detailed summary of our findings. Most Significant Wireline Service Impacts: · The derecho had significant effects on wireline communications infrastructure and resulted in partial or complete 9-1-1 service outages lasting up to several days for customers in large swaths of northern Virginia, West Virginia, and Ohio. The 9-1-1 connections affected by the derecho in Virginia are served by Verizon and CenturyLink. Frontier serves the affected PSAPs in West Virginia. Ohio’s affected PSAPs are served by Frontier, CenturyLink, and AT&T. · The greatest service losses occurred in northern Virginia as the result of backup power failures in Verizon’s Arlington and Fairfax central offices. The service losses included multiple switches in Virginia that became “SS7 isolated,”35 isolated 9-1-1 switches, and major transport system failures. · More than a dozen of Verizon’s host switches and more than three dozen remote switches in Virginia went out of service or were SS7 isolated from the telecommunications signaling network. · A significant amount of transport equipment failed, according to Verizon, because of power surges and low voltages in central offices. For example, more than 200 circuit boards in one of Verizon’s digital cross-connect systems36 failed and had to be replaced. 35 SS7 is a global standard for telecommunications defined by the International Telecommunication Union (“ITU”). The standard defines the procedures and protocol by which network elements in the public switched telephone network (“PSTN”) exchange information over a digital signaling network to effect wireless and wireline call setup, routing and control. 
See Performance Technologies, Inc., SS7 Tutorial, http://pt.com/resources/tutorials/ss7-tutorial. A switch becomes “SS7 isolated” when it cannot communicate with the rest of the SS7 network, meaning that interoffice calls cannot be completed. In practical terms, this means that only the most local of calls that originate and terminate through the same central office will go through.
36 A digital cross-connect system is a piece of circuit-switched network equipment that allows lower-level time-division multiplexing (“TDM”) bit streams to be rearranged and interconnected among higher-level TDM signals. These devices can be used for switching traffic from one circuit to another in the event of a network failure, supporting automated provisioning, and other applications. See Digital Cross Connect System, WIKIPEDIA, http://en.wikipedia.org/wiki/Digital_cross_connect_system.

Most Significant Wireless Service Impacts:
· Overall, the wireless networks in the path of the derecho were adversely impacted, but performed better than during other powerful storms that the Bureau has tracked since establishing NORS, with just about 11 percent of all cell sites in the affected area down at the peak and a rapid restoration of service from July 2 through July 4. Among the states affected by the derecho, West Virginia suffered the greatest percentage of cell sites lost. No mobile switching centers were down during the storm. Failure of cell site infrastructure was a bigger concern during Superstorm Sandy than in the derecho, and the Bureau expects to learn considerably more on this issue through upcoming field hearings and related work.37
· The two main reasons reported for cell site outages were loss of power and the disabling of transport facilities that carry calls from cell sites to mobile switching centers. Cell sites were as likely to fail because of disruptions to landline backhaul communications as due to backup power exhaustion in the first days following the derecho.
Still, backup power remains a serious concern for cell sites. The Bureau also learned that many service providers do not retain and analyze over time data concerning how long batteries called upon to power cell sites actually last. This information would seem helpful when making plans to replenish power to sites before battery power is exhausted. · Even where wireless networks remained operational, data and anecdotal evidence suggest that many wireless customers still could not reach 9-1-1 or complete calls to landlines because of coverage or congestion problems in wireless access networks and disruptions across the PSTN as a whole.38 · Less densely-populated West Virginia suffered the greatest percentage of cell sites lost, making that state susceptible to having sizeable areas in which some residents may not have had any wireless coverage at all.39 More urban areas, with their higher cell density, are more likely to get coverage during disasters – although any such service may suffer from higher network congestion during disasters since the available resources are shared across more users. 37 See FCC Chairman Genachowski Announces Post-Superstorm Sandy Field Hearings to Examine New Challenges to Resiliency of U.S. Communications Networks During Natural Disasters & Other Times of Crisis, News Release (Nov. 21, 2012) (“Superstorm Sandy Field Hearings PN”). 38 See, e.g., Sullivan, Patricia, After Storm, 911, Phone Service Remains Spotty, WASH. POST, July 2, 2012 (reporting that a Washington, D.C., resident’s cell phone “dropped half a dozen calls” and could not receive text messages for several hours). 39 See Cart, Kallie, Metro 911 Outage Causes Problems at Emergency Operations Center, WCHS EYEWITNESS NEWS, July 1, 2012 (reporting that “in some areas, cell service is sketchy,” and “the best thing to do is rely on your neighbors”). 14 Other Communications Issues The key role broadcasters played during and following the derecho should also be recognized. 
As in many times of crisis, broadcasters served as “first informers,” providing the public with information on the storm’s path, the damage it caused, and its effects on other communications services.40 Even where citizens affected by the storm lost commercial power, many could still receive radio and television broadcasts on battery-powered devices.41 PSAPs and other public safety entities took advantage of this capability by working together with broadcasters to provide updates on 9-1-1 service and alternate ways of obtaining emergency assistance.42 For instance, some reports advised residents to call 9-1-1 only for true emergencies, and provided ten-digit administrative numbers for other questions about cooling shelters and restoration of power.43 Since the June derecho, other regions of the United States have experienced significant weather-related disasters. In August 2012, Hurricane Isaac struck the Gulf Coast,44 and in October 2012, Superstorm Sandy struck the Eastern Seaboard – most destructively in New York and New Jersey.45 While all three of these events had major impacts on communications, the aftermath of the derecho stands out for having the most devastating impact on networks connecting citizens to 9-1-1 call centers. As discussed in depth below, the derecho resulted in switch failures and massive outages to 9-1-1 communications; the derecho thus revealed critical 9-1-1 and core backup power issues. In contrast, available data indicates that problems with 9-1-1 communications networks, including switches, were not as glaring in the regions impacted by Isaac or Sandy; powerful as those storms were, most of the basic 9-1-1 infrastructure appears to have held, and there were no reports of widespread, prolonged inability to reach emergency services through 9-1-1 due to network outages. There were, however, problems receiving location information and other data with some calls, and individual PSAPs may have had isolated, non-network issues. 
40 See Comments of the National Association of Broadcasters, PS Docket No. 11-60, at 1-2 (Aug. 17, 2012).
41 See id.
42 See Fairfax County Comments at 12-13, Exhibit 3 (describing Fairfax County’s efforts to communicate with residents during the derecho through social and traditional media). But see No Answer at 911, WASH. POST, July 20, 2012 (“Local officials, desperate to get the word out, resorted to traditional and social media – which in turn were not accessible to many people who lost power and cellphone service.”).
43 See Cart, Kallie, Metro 911 Outage Causes Problems at Emergency Operations Center, WCHS EYEWITNESS NEWS ONLINE, July 1, 2012 (providing ten-digit numbers for non-emergency questions).
44 See, e.g., Robertson, Campbell and Severson, Kim, Isaac Drenches Gulf Coast and High Water Cuts Off Many, N.Y. TIMES, Aug. 29, 2012.
45 See, e.g., Barron, James, Storm Barrels Through Region, Leaving Destructive Path, N.Y. TIMES, Oct. 29, 2012.

Because the derecho was swift and unanticipated, it differed from hurricanes and superstorms, like Isaac and Sandy, for which there typically is advance warning and an opportunity for communications providers to prepare by testing equipment, adjusting staffing levels and supplementing training over an extended period of time. As noted, the derecho arrived with very little notice, and in that sense, had more in common with an earthquake, tornado, or man-made event.
Nonetheless, all three events (i.e., the derecho, the hurricane, and the superstorm) highlighted shortcomings in the reliability and resiliency of communications, and raised concerns about commercial power and telecommunications providers’ implementation of procedures to ensure adequate backup power.46 Moreover, such events shed light on the possible impact of power outages on consumers who rely at their premises on communications devices that operate on commercial power (as opposed to power through a copper telephone line) and/or have a limited battery life (e.g., cordless phones, Voice over Internet Protocol (“VoIP”)47 equipment, and cell phones).48 These and other vital issues will be considered more fully after the Commission has concluded its field hearings, which will include those topics. 4. The Derecho’s Effects on Wireline Networks In this section, we review the derecho’s impact on core wireline networks. This impact was widespread and severe not only for wireline customers, but also for users of other communications services that rely on the wireline network. Most notably, because certain wireline infrastructure supports 9-1-1 calls placed from wireless as well as wireline phones, the failures discussed in this section affected 9-1-1 service for both wireless and wireline customers. 4.1 Backup Power Problems The derecho exposed significant vulnerabilities in wireline providers’ central-office infrastructure, specifically backup power systems designed to maintain service during commercial power outages. Verizon in particular reported a significant number of backup generator failures, resulting in the failure of key network components and the loss of vital communications services to customers across much of the Mid-Atlantic region. According to Verizon, nine out of 136 central-office backup generators in Verizon’s service area affected by 46 See, e.g., Chen, Brian, Cellphone Users Steaming at Hit-or-Miss Service, N.Y. TIMES, November 2, 2012. 
See also Press Release, U.S. Senator Charles E. Schumer, Schumer Calls on Federal Communications Commission To Ensure That Vital Communications Networks Aren’t Cut After Storms or Power Failures (Nov. 19, 2012) (stating that “thousands of residents and first responders [were] left without vital cell service in the wake of Sandy, hindering response and endangering lives,” and calling for a “plan to ensure continuation of cell service in the wake of disasters”). 47 VoIP can refer to facilities-based services, interconnected services, and services that do not connect to the PSTN. VoIP technologies generally differ from traditional telephone service in that VoIP operates on commercial power rather than power from a central office through a copper line. This can have impacts for customers if commercial power is lost for long periods of time. 48 See Metropolitan Washington Council of Governments 9-1-1 Telecommunications Network Steering Group, Preliminary Report of 9-1-1 Service Gaps During and Following the Derecho Storm on June 29, 2012 (Nov. 14, 2012) at 16 (“MWCOG Report”) (calling for increased awareness of the “vulnerability of newer technologies that require battery or backup power”). 16 the derecho did not operate properly.49 Thus, about 7 percent of Verizon’s generators in central offices failed to operate properly when needed. Frontier also reported generator failures in West Virginia and Ohio, causing service disruptions in those states. Some Frontier remote terminals were not equipped with backup generators, and the vulnerability of portable generators placed at other sites led to additional difficulties supplying power to these facilities. 
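The roughly 7 percent figure for Verizon follows directly from the reported counts. As a minimal sketch of that arithmetic (the function name is illustrative and does not come from any filing):

```python
def failure_rate_percent(failed: int, total: int) -> float:
    """Return the share of units that failed, expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * failed / total


# Counts reported by Verizon: 9 of 136 central-office backup generators
# in the affected service area did not operate properly.
verizon_rate = failure_rate_percent(9, 136)
print(f"{verizon_rate:.1f} percent")  # 6.6 percent, i.e. about 7 percent
```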
4.1.1 Backup Power Problems in Virginia Verizon acknowledges that generator failures in its Arlington and Fairfax central offices after the loss of commercial power were the primary causes of the 9-1-1 network failures in northern Virginia.50 In addition, these backup power failures caused widespread communications disruptions across Virginia. For example, critical switching and transport equipment was lost when the Arlington central office went down, seriously impacting Verizon’s network in northern Virginia. In Fairfax, loss of transport equipment made inter-switch communications impossible, which also had broad impacts on customers and PSAPs alike. Together, these problems resulted in more than a dozen switches being SS7 isolated. Because of their importance, we discuss the effects at each office separately. 4.1.1.1 Verizon’s Arlington Central Office The Arlington central office played an important role in the delivery of 9-1-1 services in much of northern Virginia, yet it is not clear Verizon understood the extent of this vulnerability until after the derecho. The failure of backup power at the Arlington central office directly resulted in the loss of 9-1-1 service to residents in northern Virginia, key switching capabilities, and virtually all of Verizon’s network monitoring capabilities in the area. These high-impact failures could have been prevented, or at least mitigated, through the application of CSRIC best practices and other sound engineering practices. The specific events leading to the loss of 9-1-1 service are as follows: At 10:55 p.m. on June 29, the Arlington central office lost commercial power. The backup power system at the Arlington site includes two generators that must operate in tandem to power the facility, which contravenes CSRIC best practice 8-7-5281.51 According to Verizon, one of the two generators 49 See Verizon, 911 Service and the June 29, 2012, Derecho at 1, footnote 1 (Aug. 13, 2012) (“Verizon Public Report”). 
50 See Comments of Verizon and Verizon Wireless at 2 (Aug. 17, 2012) (“Verizon Comments”); Verizon Public Report at 2. 51 CSRIC Best Practice 8-7-5281 provides that “[n]etwork operators, service providers and property managers with buildings serviced by more than one emergency generator, should design, install and maintain each generator as a standalone unit that is not dependent on the operation of another generator for proper functioning, including fuel supply path.” See https://www.fcc.gov/nors/outage/bestpractice/DetailedBestPractice.cfm?number=8-7-5281. 17 failed to start because air had entered the fuel system.52 Consequently, the second generator became overloaded and automatically shut down.53 A power technician arrived at 12:28 a.m. on June 30 but could not start the generator.54 Verizon personnel also attempted to connect a mobile replacement generator, but commercial power was restored before the portable generator could be brought online.55 Without any power from backup generators, Verizon’s Arlington central office ran on battery power for approximately six hours until the batteries ran out around 5:00 a.m. on June 30.56 According to Verizon, some network equipment is particularly sensitive to low voltages and failed even before the batteries were completely exhausted.57 Commercial power was restored to the Arlington central office at 12:45 p.m. on June 30, about eight hours after the batteries fully depleted.58 Although it contravenes a best practice to rely on two generators in tandem to power an entire central office, providers should at a minimum develop procedures to power critical equipment with one generator if the other fails, or to add a third generator, thereby preventing a complete loss of service. In Arlington, according to Verizon, the procedures for getting the working generator online were complicated and were not available to the on-site technicians during the derecho. 
Verizon has since developed procedures for bringing one generator online and shedding non-essential electrical loads when the other generator in a pair fails. Verizon also states that it is adding a third generator backup, and may take steps to make additional backup generators easier to connect. Generators are critical equipment needed during disasters, when commercial power often fails. Here, Verizon’s Arlington generators powered a key central office with critical network equipment. An important central office such as this should not be left in a vulnerable state in which a commercial power failure could result in all the equipment in a central office failing. Worse, Verizon allowed this critical facility to operate with effectively no backup generator power because one Arlington generator had failed to start during routine testing before the derecho and a single generator could not power the entire office. Verizon also stated that actual load testing of the Arlington generators had been suspended since at least 2011 because of a problem with the uninterruptible power supply (“UPS”) in that office. Verizon’s standard operating procedures appear to require monthly generator tests 52 See Verizon Public Report at 3-4. 53 See id. at 3. 54 See id. 55 See id. 56 See id. 57 See id. 58 See id. 18 under actual site load.59 Yet, it does not appear that Verizon corrected the UPS problem at the Arlington central office to permit actual site load testing of the generators until after the derecho. That fact means that, for at least several months, Verizon failed to follow its own maintenance procedures. 
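The durations in the Arlington timeline reported by Verizon can be checked with simple date arithmetic. The sketch below is illustrative only: the variable names are assumptions, and the timestamps are the approximate times stated in this report (battery exhaustion is given only as “around 5:00 a.m.”).

```python
from datetime import datetime, timedelta

# Times as stated in the report (local time, 2012).
commercial_power_lost = datetime(2012, 6, 29, 22, 55)      # 10:55 p.m., June 29
batteries_exhausted = datetime(2012, 6, 30, 5, 0)          # around 5:00 a.m., June 30
commercial_power_restored = datetime(2012, 6, 30, 12, 45)  # 12:45 p.m., June 30


def hours(delta: timedelta) -> float:
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600


on_battery = batteries_exhausted - commercial_power_lost
without_power = commercial_power_restored - batteries_exhausted
total_outage = commercial_power_restored - commercial_power_lost

print(f"On battery:    {hours(on_battery):.2f} h")     # approximately six hours
print(f"Without power: {hours(without_power):.2f} h")  # about eight hours
print(f"Total:         {hours(total_outage):.2f} h")
```

The results are consistent with the report's rounded figures of approximately six hours on battery and about eight hours without any power.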
Finally, the Arlington office was crucial to Verizon’s ability to use its geographically distributed network operations centers (“NOCs”) to monitor the status of equipment at thirty-four sites across its northern Virginia network.60 For example, multiple Verizon NOCs provided visibility about the operating status of critical assets in northern Virginia, but these NOCs were interconnected only through the Arlington hub, which Verizon is now in the process of diversifying. Having a single point of vulnerability that affects monitoring of such a large and densely populated service area is unwise because communications providers rely heavily on telemetry data to identify failing equipment, determine staffing levels, and prioritize repairs during a disaster. Backup power for the monitoring equipment in Arlington was provided by a UPS that lasts only thirty minutes.61 Thus, thirty minutes after losing commercial and generator power in Arlington, and shortly after the rest of the Arlington office went on battery power, Verizon lost the ability to monitor its network in northern Virginia, crippling its repair efforts and ability to receive alarms that signal additional equipment failures.62 Accurate telemetry is essential to restoring service effectively after an outage, and leaving network monitoring equipment solely supported by a thirty-minute UPS, let alone at such a critical point, is not adequate backup power. 4.1.1.2 Verizon’s Fairfax Central Office Like Arlington, Verizon’s Fairfax central office is a major hub for routing calls in northern Virginia. In addition to housing several switches, the Fairfax central office also hosts a large collection of transport equipment used to route 9-1-1 calls in the area. The Fairfax central office has two backup generators, each powering different network components on different floors.63 When commercial power failed at 10:35 p.m. 
on June 29, the generator on the second floor failed to start because of a malfunction in the auto-start mechanism.64 The second floor of the Fairfax office was powered for several hours by 59 Testing under actual site load means that the office is switched off the commercial power grid onto generator power. Unlike testing with simulated load banks, this method verifies that the generator can power the office in an emergency. 60 See Verizon Public Report at 3; Virginia SCC Report at 9. 61 See Virginia SCC Report at 9. The telemetry equipment used to terminate the monitoring links apparently requires line power, not the direct current (“DC”) power provided by the battery plant typically deployed in central offices like Arlington. 62 See Verizon Public Report at 3; Virginia SCC Report at 9. 63 See Verizon Public Report at 3. 64 See id. at 4. 19 batteries. However, by 2:30 a.m. on June 30, the battery voltage had dropped below the level needed to power some equipment on the second floor, and by 6:15 a.m. on June 30, the voltage was too low to power any equipment.65 According to Verizon, when a technician arrived in the office around 7:30 a.m. on June 30, the technician noticed that the generator on the lower floor was functioning but failed to check the generator on the second floor.66 By 12:15 p.m. on June 30, according to Verizon, the technician noticed the problem and manually restarted the second generator.67 The error that Verizon describes could – and should – have been prevented through better internal procedures and training and improved communication with technicians in the field. Of course, it is not clear that the technician knew about equipment failures in the Fairfax office in light of Verizon’s inability to monitor its network. For important buildings with more than one generator, the procedures for determining the effects on power should include verifying that each generator is functioning. 
Furthermore, a miscommunication within Verizon about the severity of the problem in the Fairfax office appears to have hindered restoration efforts. Although Verizon received a “battery on discharge” alarm signifying a power failure at the Fairfax office, the alarm was labeled “major” rather than “critical,” reducing its priority in the maintenance process.68 When batteries in a central office are draining because of a generator failure, the alarms for these conditions should be labeled “critical” and resolved urgently.69

4.1.1.3 Other Verizon Generator Problems
As noted previously, Verizon reported nine generator failures out of 136 central office backup generators in its impacted service area.70 In addition to the failures mentioned above at the Arlington and Fairfax central offices, there were generator failures at Verizon facilities in Virginia and Maryland resulting from factors such as blown fuses, exhausted start batteries, a faulty voltage regulator, and failure of an auto-refueling mechanism. One generator initially started but failed to produce adequate power under load until a technician manually shed non-essential loads. At one location, Verizon personnel were able to connect a portable replacement generator before commercial power was restored.
65 See id. at 3.
66 See id. at 4.
67 See id.
68 See Virginia SCC Report at 9.
69 It is our understanding that Verizon has committed to do this. See Verizon Comments at 32 (stating that “[w]e have enhanced our notification and mobilization procedures to trigger activity more quickly when batteries are activated or when telemetry is lost”).
70 See Verizon Public Report, at 1, footnote 1.
Notably, when the Bureau requested maintenance logs, Verizon could not locate them for one of the generators that failed.
Although Verizon states that its “rigorous network resiliency and service restoration practices largely worked as designed throughout most of the area affected by the Derecho,”71 the Bureau has serious concerns regarding Verizon’s actual repair practices and compliance with its own maintenance policies. 4.1.2 Backup Power Problems in West Virginia In West Virginia, generator failures also led to disruptions in communications over Frontier’s network, affecting more than 15,000 wireline customers. Commercial power outages following the derecho forced 126 of Frontier’s 230 wireline facilities in the state to operate on generator power.72 Many Frontier remote terminals are not equipped with backup generators due either to their small size or physical location. In St. Albans, West Virginia, a central-office backup generator operated for approximately twenty-two hours before failing because of a broken belt, resulting in telephone service outages to over 8,600 customers for more than three hours.73 Generator problems in Petersburg and Worthington, West Virginia, affected more than 3,300 customers, and issues with four smaller generators in remote facilities disrupted service to another 3,700 customers across the state. Although Frontier deployed mobile generators in some locations, as many as twenty were stolen from Frontier facilities in the storm’s aftermath, causing additional backup power failures.74 4.2 Switch Effects 4.2.1 Verizon Switch Effects During the derecho, more than a dozen Verizon host switches went out of service in areas ranging from Baltimore, Maryland, to Richmond, Virginia. Ten of these were SS7 isolated75 due to transport failures in Verizon’s network, and four lost power because of generator failures. Over three dozen of Verizon’s remote switches also went out of service during the storm. In all, nearly 1 million wireline customers lost service for durations ranging from two hours to more than two days. 71 See Verizon Comments at 2. 
72 Comments of Frontier Communications Corp. at 2 (Aug. 17, 2012) (“Frontier Comments”).
73 See generally Frontier Comments at 4 (stating that “generator use . . . was not a foolproof solution” and that “[i]n some instances back-up generators pre-placed in the [c]entral [o]ffices failed, which led to outages while Frontier replaced those generators”).
74 See Frontier Comments at 5; Workman, Megan, Frontier, Suddenlink Bills Will Be Prorated, THE CHARLESTON GAZETTE, July 10, 2012 (reporting that “[t]hieves stole generators at multiple Frontier facilities, which affected service to customers”); Staff Report, Eleven Arrested in Storm Generator Thefts, THE CHARLESTON GAZETTE, July 26, 2012.
75 See supra, note 35.

In the Bureau’s view, Verizon could not initially explain the routing of key circuits in its network when the Bureau inquired how failures in one office could affect communications in other areas. Based on this uncertainty and the time that passed before we received this information, it is clear to the Bureau that Verizon was not fully aware of the routing of its own critical circuits until a considerable time after they failed.

4.2.2 Switch Effects in West Virginia

According to Frontier, the derecho caused five host switches and more than thirty remote switches to fail or become isolated across West Virginia, impacting more than 50,000 wireline customers. The major causes of these failures included fiber cuts and extended commercial power outages leading to drained backup batteries. Not all power-related switch outages involved generator failures, however, because it is not always feasible to equip smaller switching hubs with backup generators. Frontier also lost the ability to monitor its network in West Virginia, preventing it from receiving network alarms and disrupting its ability to reroute 9-1-1 calls.
Frontier attributed this problem to a combination of a fiber cut, which caused the initial outage, and the fact that it had disconnected a backup circuit, which would have provided needed redundancy, a week prior to the storm. Without functional network monitoring, Frontier’s NOCs obtained network access through a more time-consuming dial-up network before they could begin rerouting 9-1-1 calls. Individual Frontier host switch outages affected between 870 and 8,600 customers each and lasted from 3.3 hours to more than seven days. Frontier also experienced more than thirty remote switch outages affecting approximately 31,000 customers. Individual remote switch outages affected between 127 and 4,800 customers for five hours to nearly ten days and were caused by loss of commercial power and subsequent exhaustion of backup power, SS7 isolation, and transport network failures.

4.3 Effects on Transport Equipment

The derecho had a significant effect on transport equipment. This equipment supports network traffic of all kinds, from 9-1-1 calls to everyday communications. For example, more than 600 circuit boards in high-capacity digital cross-connect systems in Verizon’s network were destroyed, likely by power surges or the low voltage that occurred as batteries drained. With one digital cross-connect system, approximately 200 circuit packs failed when fuses mounted on each circuit board blew.

4.4 Proposed Corrective Actions from Wireline Providers

4.4.1 Corrective Actions Proposed by Verizon

Since the Bureau began its inquiry, Verizon has submitted materials to the FCC and other regulatory bodies indicating that it has repaired defective equipment, revised internal policies, and taken other measures to prevent similar failures in the future. Some of these changes have already occurred in areas affected by the derecho, while others are planned to be implemented over time in other service areas.
In July 2012, Verizon and metropolitan Washington PSAPs agreed on a set of immediate actions, including adoption of the National Incident Management System (“NIMS”) model for disaster response; use of a Reverse 9-1-1® type system to notify PSAPs of known or suspected 9-1-1 outages; semi-annual 9-1-1 outage drills; and a current contact list of Verizon personnel, escalating to the vice-president level, that PSAPs can reach directly in the event of an outage.76 Area PSAPs also requested a Verizon representative to be physically present at each jurisdiction’s operations center during an emergency, although Verizon has said it is exploring “virtual participation” through electronic communications. As of November 2012, Verizon and MWCOG reported that these steps are in “various stages of completion.”77

According to filings in PS Docket No. 11-60 and other fora, Verizon also has committed to the following remedial actions:

· Backup power system audits of “mission critical” facilities in the Washington metropolitan region, including the Arlington and Fairfax central offices.78
· A complete review and update of monthly and annual preventative maintenance requirements for generators, batteries, and rectifiers that supply power at host central offices. Power technicians will be trained in critical facility “blackout” testing to simulate total commercial power failure, as well as manual generator start procedures and “prioritized system load transfer” scenarios to distribute backup power to critical equipment.79
· A redesign of its telemetry network to ensure that it conforms to new diversity guidelines adopted by Verizon in August 2012, after the derecho.
Verizon has committed to redesigning its E-9-1-1 tandem architecture in Virginia according to these guidelines.80
· Meetings with the Fairfax County, Prince William County, Manassas, and Manassas Park PSAPs to discuss 9-1-1 trunk diversity improvements specific to each jurisdiction.81

76 See Verizon Public Report at 7-9; MWCOG Report at 9.
77 See MWCOG Report at 9.
78 See Verizon Comments at 5.
79 See Verizon Public Report at 5; Verizon Comments at 5.
80 See Verizon Comments at 5.
81 See Verizon Public Report at 6 (stating that “Verizon will work directly with the specific PSAP partners to decide on improvements”).

Verizon also notes that its wireline division’s practices for follow-up when a generator fails a routine test are being incorporated into a standard, company-wide policy. Specifically:

· The power technician immediately contacts the supervisor if a power equipment deficiency is identified that has potential for broader impact on network power or network reliability, prior to leaving the site.
· The power technician and supervisor jointly assess the criticality of the problem and formulate and implement a multi-solution “action plan.” The action plan can involve the purchase of new parts, contacting appropriate vendors, procuring a portable generator, and escalating service restoration with the utility, as appropriate. The supervisor will provide the area manager with initial notification, and the area manager, in turn, will communicate the information upward to director-level management and above as necessary.
· The power technician updates the National Power Network Operations Center and e-mails other members of the local power team, advising them of the abnormal condition and current action plan. The supervisor and/or vacation coverage supervisor (if applicable) are copied on the e-mails.
· The supervisor updates the area manager with the action plan, and the area manager will communicate the information upward to director-level management and above as necessary.
· The supervisor submits a “jeopardy report” (depending on the nature and impact of the outage) following site stabilization.

The Bureau believes that these actions are appropriate steps, but are not yet sufficient based on the Bureau’s assessment of the information it has received thus far. Verizon correctly notes positive changes in its backup power and network diversity policies and progress in implementing improvements, particularly in the region that includes Virginia, with respect to the facilities most relevant to the provision of connections vital to 9-1-1 service, even as it recognizes it has more to accomplish. It also has much more to do nationwide. For example, we have not received information that would lead us to conclude that Verizon has determined yet whether it has other generator configurations similar to Arlington across its entire footprint, although it states that it is working to do so, with a specific budget and timetable, and has made that determination in some areas. There also are metrics as to which we have not received information to assess, e.g., whether Verizon has committed to maintain central-office backup power for any specific length of time or established specific timeframes for generator repairs. Valuable progress on the auditing of 9-1-1 circuits in some areas has been made, even as further information is sought about the timetable and budget for completing the task throughout Verizon’s relevant service areas. Moreover, this inquiry has raised significant concerns about the extent to which Verizon has followed its stated maintenance procedures in the past. It would be beneficial if Verizon provided additional clear and specific commitments, with budgets and specific timetables for implementation of upgrades and improvements.
4.4.2 Corrective Actions Proposed by Frontier

Frontier’s public comments provided the following list of “lessons learned” based on its investigation into the derecho’s impacts on PSAPs. Generally, the Bureau agrees with Frontier’s commitments to provide alternate ways to monitor its network, audit 9-1-1 circuits for diversity, and ensure that backup generators are available in locations affecting PSAPs. These commitments, however, still require additional specificity, budget, and timetables. Moreover, the derecho revealed instances where many of these procedures should have been followed already, and Frontier must demonstrate that it is committed to implementing them in practice as well as in theory.

· Enhancing preventative maintenance plans to include proactively testing its backup modems monthly.
· Performing quarterly checks via dial-up modems on host offices to ensure network reliability.
· Reviewing Frontier’s network to determine where additional redundancy would be feasible.
· Adding additional remote access to Points of Presence (“POPs”) to ensure increased ability to monitor the network.
· Prioritizing 9-1-1 center sites and facilities for generator back-up.
· Revising and augmenting Frontier’s generator plan.
· Establishing alternate dial-up access to key switches and exploring other backup options.82

4.5 Wireline Outages in Other States

More than 300,000 wireline customers in Ohio, Maryland, Illinois, Pennsylvania, Indiana, Tennessee, New Jersey, and the District of Columbia lost service to some degree during the derecho, according to an analysis of aggregated NORS data. The bulk of those outages occurred in Ohio, with more than 170,000 wireline customers affected for periods ranging from one hour to more than five days, and in Maryland, where roughly 100,000 wireline customers lost service for as long as six days.

82 See Frontier Comments at 6-7.
Most of the Ohio outages resulted from loss of commercial power followed by failure of backup power systems, based on an analysis of NORS data. At least two backup generators in Ohio failed to start, leading to central office outages once backup batteries ran out. Repair crews attempted to distribute portable generators to remote terminals without dedicated backup power, but in some cases had to “rotate” available generators between multiple locations until commercial power was restored. The outages in Maryland were caused by a number of factors, such as transport and switching failures, in conjunction with loss of commercial power. A lesser number of reported outages in Maryland resulted from lightning strikes, cable cuts, and storm-related damage to network infrastructure.

5. The Derecho’s Effects on Public Safety Answering Points

5.1 Description of the 9-1-1 Network

The 9-1-1 network, which is shown at a high level as typically configured in Figure 3, was designed so that anyone dialing 9-1-1 will have his or her call answered by the geographically appropriate PSAP. The primary function of the 9-1-1 network is to route the call to the PSAP serving the caller. The calling number and the location of the caller are also sent to the PSAP to assist in responding to the emergency.

[Figure 3: 9-1-1 Network Architecture. Labels recoverable from the figure: Wireline, Wireless, VoIP, Mobile Switching Center (MSC), PSTN (Selective Router), PSAP.]

When a caller dials 9-1-1 on a wireline telephone, the call is handled similarly to any other call and goes to the local switch serving that caller. The local switch then sends the call to an aggregation point called a selective router, which uses the caller’s phone number and address to determine to which PSAP the call should be sent. Calls to 9-1-1 from wireless phones flow through a switch called a mobile switching center before reaching the selective router.
For wireless calls, the sector of the cell tower serving the call provides the approximate location of the caller and is used to determine to which PSAP the call is sent. To complete the call, a circuit is set up between the selective router and the appropriate PSAP. If all the usual circuits are busy or out of service between a selective router and the PSAP, the selective router implements an alternate path called a reroute.

Once a call reaches the PSAP, the PSAP queries an automatic location information (“ALI”)83 database to determine the location of the caller. For wireline calls, this location is based on the address associated with the caller’s phone number. For wireless calls, providers use various technologies to determine the caller’s location based on global positioning system (“GPS”) data from the caller’s phone and/or other network resources. Because ALI information is passed to the PSAP along a different path than the one carrying 9-1-1 calls, it is possible for a PSAP to lose ALI links without losing 9-1-1 service completely.

5.2 Network Effects on PSAPs

Because PSAPs typically rely on the incumbent wireline communications provider’s infrastructure to receive emergency calls placed from any device, failures in Verizon and Frontier’s wireline networks also disrupted 9-1-1 service to many PSAPs. While any communications breakdown raises concerns, disaster-caused breakdowns affecting 9-1-1 service heighten those concerns because they directly affect the public’s ability to call for help at a time when it is likely that the greatest number of people will need emergency service. During and after the derecho, at least seventy-seven PSAPs experienced adverse effects ranging from a complete loss of 9-1-1 service to failures of ALI and/or automatic number information (“ANI”)84 and other partial disruptions.
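As an illustration only, the call flow described in Section 5.1 can be sketched in a few lines of Python. The class name, field names, routing keys, and sample values below are hypothetical and do not describe any provider's actual selective router. The sketch assumes a router that looks up the primary PSAP for a call and falls back to a preconfigured reroute destination when no circuit to that PSAP is available.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SelectiveRouter:
    """Toy model of a selective router (all names are hypothetical)."""
    # Routing key -> primary PSAP. For a wireline call the key stands in for
    # the caller's number and address; for a wireless call, the cell sector.
    psap_by_key: Dict[str, str]
    # Primary PSAP -> alternate destination used when a reroute is needed.
    reroute_by_psap: Dict[str, str]
    # Destination -> True if at least one circuit is idle and in service.
    circuit_available: Dict[str, bool]

    def route(self, key: str) -> Optional[str]:
        """Return the destination for a call, or None if it cannot be delivered."""
        primary = self.psap_by_key[key]
        if self.circuit_available.get(primary, False):
            return primary
        # All usual circuits are busy or out of service: try the reroute.
        alternate = self.reroute_by_psap.get(primary)
        if alternate is not None and self.circuit_available.get(alternate, False):
            return alternate
        return None


router = SelectiveRouter(
    psap_by_key={"wireline:555-0100": "PSAP-A", "sector:17": "PSAP-A"},
    reroute_by_psap={"PSAP-A": "PSAP-B"},
    circuit_available={"PSAP-A": False, "PSAP-B": True},
)
print(router.route("wireline:555-0100"))  # PSAP-B
```

The sketch deliberately omits the ALI lookup, which, as noted above, travels along a different path than the call itself and can therefore fail independently.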
The following table shows the location and causes of complete or partial service disruptions to PSAPs in various states:

State           Completely down   ALI/reroute/9-1-1 circuits failed   CAMA trunk issue85   Total
West Virginia   11                15                                  1                    27
Virginia        4                 21                                  1                    26
Ohio            2                 11                                  1                    14
New Jersey      -                 6                                   -                    6
Maryland        -                 3                                   -                    3
Indiana         -                 1                                   -                    1
Total           17                57                                  3                    77

Table 1: Summary of PSAP Effects

83 ALI provides the PSAP with the caller’s telephone number, the address/location of the telephone, and supplementary emergency services information. See Glossary of Technical Terms: Wireless E 9-1-1, http://www.apcowireless.com/library/ICMAGlossary.pdf.
84 ANI provides only the telephone number associated with the access line from which a 9-1-1 call originates. See Glossary of Technical Terms: Wireless E 9-1-1, http://www.apcowireless.com/library/ICMAGlossary.pdf.

5.3 Network Effects on PSAPs in Virginia

The most significant impact of Verizon’s network problems in northern Virginia was to cause the total loss of 9-1-1 service connectivity to the Fairfax County, Prince William County, City of Manassas, and Manassas Park PSAPs. Specifically,

[t]he failure of one of two back-up generators to start at each of [Verizon’s] Arlington and Fairfax central offices following the loss of commercial power caused the Northern Virginia 911 disruptions. Multiple failures cascading from these specific generator problems and damage to the transport network combined to cause the outages for the four PSAPs.
Included among those failures were systems that enable [Verizon] to monitor the condition of [its] network facilities in Northern Virginia, and that loss of visibility over [Verizon’s] network hindered [its] initial efforts to assess and repair damages.86

PSAPs, lacking information from Verizon, at times arranged reroutes to non-working routes after the regular route again became operational, or continued to attempt to reroute even though service could have been restored on the original route.87

85 Centralized Automatic Message Accounting (“CAMA”) trunks are a legacy technology used to route 9-1-1 calls to PSAPs in many jurisdictions. During times when a PSAP receives a large volume of calls, a timing mismatch between the selective router and the customer premises equipment (“CPE”) at the PSAP results in trunks being taken out of service even though these trunks have not failed. Whenever a trunk is taken out of service, the number of 9-1-1 calls that can be processed is reduced.
86 Verizon Public Report at 2. See also Fairfax County Comments at 14-18 (noting that the 9-1-1 outage was caused by the failure of Verizon’s backup power sources and equipment failure/damage, and that the outage was compounded by Verizon’s failure to provide prompt and effective notice to the Fairfax PSAP).
87 See Verizon Public Report at 6 (certain PSAPs “would have been better off [not rerouting calls], but without the appropriate information, they were unable to make that determination at the time”).

5.4 Virginia PSAPs

5.4.1 Fairfax County

Fairfax County has a population of more than 1.1 million people, about 20 percent of the entire population of the Washington, D.C.
metropolitan area.88 Its PSAP, the largest in Virginia and among the ten largest in the United States, receives approximately 1 million 9-1-1 calls per year.89 According to Fairfax County’s comments in response to the Bureau’s Public Notice:90

[T]he derecho hit Fairfax County at approximately 10:30 at night on Friday, June 29, 2012. At 7:36 the next morning, as hundreds of thousands of County residents awoke to assess the full extent of the damage in daylight, the phones stopped ringing. Over the next seven hours, no calls were completed to the County’s 9-1-1 Call Center. On the afternoon of Saturday, June 30, sporadic, incomplete service was restored. Three additional days passed before 9-1-1 service was fully restored at 11:30 a.m. on Tuesday, July 3, 2012. Fairfax County’s 9-1-1 Call Center operated exactly as it was designed, intended, and constructed to operate. Verizon has told the County that the 9-1-1 failure was caused primarily by a loss of power in two key Verizon central offices (Arlington and Fairfax). Many other Verizon central offices in the region lost power as well, which exacerbated the effects of the loss of Verizon’s Arlington and Fairfax central offices. The 9-1-1 failure affected several other Northern Virginia jurisdictions in addition to Fairfax County. The data Verizon has provided the County to date show that nearly 1,900 calls made to 9-1-1 entered Verizon’s system but were not routed to the County during the first 29 hours. The County is awaiting the receipt of additional data from Verizon.91

The Fairfax County PSAP is served by redundant selective routers. However, this redundancy was compromised as a result of the transport failures caused by the generator failure in Verizon’s Fairfax central office and other single points of failure in Verizon’s network. According to Fairfax County:

88 Fairfax County Comments at 3.
89 Id. at 4.
90 See Derecho Public Notice, 27 FCC Rcd at 8131.
91 Fairfax County Comments at 2 (footnotes omitted; emphasis in the original).

[P]ortions of [Verizon’s] transport network, in effect one of the “main communications highways” used to complete 9-1-1 calls, and in many cases even regular phone calls, were not operational for more than seven hours on [June 30, 2012]. Power equipment in Verizon’s Fairfax central office also failed, isolating the Fairfax E9-1-1 tandem switch and preventing the routing of 9-1-1 calls to the Fairfax County PSAP through the Fairfax network route. The Alexandria E9-1-1 tandem switch, a secondary route for 9-1-1 call transport, remained operational but the capability to route 9-1-1 calls to the Fairfax County PSAP failed. Essentially, the Verizon-provided 9-1-1 telephone switching systems into Fairfax County facilities were not operating, even though the Fairfax County PSAP staff and Fairfax County telephone and computer systems were operational and unaffected by the power outages (Fairfax County has power-fail systems implemented in its PSAP).92

A diversity audit by Verizon, as called for in CSRIC best practice 8-7-0532,93 might have identified these single points of failure and prevented the loss of service to the Fairfax County PSAP through additional redundant connectivity.

According to Fairfax County, Verizon also did not provide adequate notification of the existence and scope of problems in its network affecting 9-1-1 service. Instead:

Verizon sent a cryptic e-mail to designated Fairfax County staff saying that the Arlington central office was without power or backup battery/generator. The references to Arlington suggested that 9-1-1 service was affected only in Arlington County.
Without a corresponding phone call explaining the situation and the e-mail, Fairfax County’s PSAP staff continued with their normal operations, unaware that incoming 9-1-1 call service from Verizon was about to rapidly deteriorate.94

Fairfax County also identified multiple previous outages where Verizon failures affected PSAPs in the region.95 It asserts that this outage “was the latest in a series of recent 9-1-1 problems in the National Capital Region that demonstrate that the 9-1-1 infrastructure is not as resilient or as reliable as it needs to be.”96

92 Fairfax County Comments at 15.
93 CSRIC Best Practice 8-7-0532 provides that “[n]etwork operators should periodically audit the physical and logical diversity called for by network design and take appropriate measures as needed.” See https://www.fcc.gov/nors/outage/bestpractice/DetailedBestPractice.cfm?number=8-7-0532.
94 Fairfax County Comments at 7.
95 See Fairfax County Comments at 18-20. Disruptions to 9-1-1 service availability included during a January 2011 snowstorm in the Washington, D.C., metropolitan area; a February 2011 equipment failure at Verizon’s Fairfax central office; and a May 2011 power failure in Verizon’s Newark, New Jersey, central office that affected wireless ALI data for PSAPs in Maryland, Virginia (including parts of Fairfax County), Delaware, and Pennsylvania.

5.4.2 Prince William County

Prince William County has a population of 420,000. The PSAP is served by redundant selective routers. In June 2012, before the derecho, the Prince William County PSAP received 15,940 calls to 9-1-1, and 25,698 non-emergency calls.97

The Prince William County PSAP told the Bureau that by 8:07 a.m. on June 30, PSAP staff had determined that they were not receiving 9-1-1 calls. The PSAP attempted unsuccessfully to reroute the calls to backup phones, but found that the only lines working were the area code 703 non-emergency numbers.
At that point Prince William County PSAP officials contacted Verizon to let it know that circuits were down. At the same time, they began to inform the public via media and alerting systems to use the non-emergency numbers to contact the PSAP. At about 10:00 a.m., the Virginia Communications Coordinator (“VCC”) contacted the Prince William PSAP to see if it was having a problem. The VCC then contacted the FCC Operations Center, which also contacted Prince William County. A half-hour later, at 10:30 a.m., Verizon’s technical service manager contacted the PSAP to say Verizon was experiencing 9-1-1 problems and was working on the problems. Verizon tried unsuccessfully to reroute 9-1-1 calls to the Prince William County PSAP’s administrative lines. By the afternoon of June 30, the PSAP began receiving intermittent wireless calls, and by early morning on July 1, it received intermittent wireless and landline calls. Around 11:17 a.m. on July 1, the PSAP was processing most 9-1-1 calls, albeit without ANI or ALI. By early on July 2, Verizon had restored ANI and ALI.

5.4.3 City of Manassas

The City of Manassas, Virginia, has a population of approximately 38,000 and is surrounded completely by Prince William County. The Manassas PSAP is served by redundant selective routers and normally receives fifty to sixty 9-1-1 calls per day from wireline users. Wireless 9-1-1 calls generated from Manassas are normally processed by the Prince William County PSAP, which then routes them to Manassas if the caller needs police. Prince William County dispatches fire and other emergency personnel.

Representatives of the Manassas PSAP told the Bureau that on June 30 between 6:00 and 6:55 a.m., the Manassas PSAP realized it was not receiving 9-1-1 calls, and contacted Verizon to alert them to the outage. At about the same time, the Manassas PSAP received an e-mail from Verizon stating that it was aware of the situation. Despite the 9-1-1 service outage, all non-9-1-1 landlines within the Manassas PSAP worked.

The Manassas PSAP requested that Verizon reroute 9-1-1 calls to the non-emergency lines, where they could get ANI but not ALI information. Verizon tried, but was unable to deliver 9-1-1 calls to the overflow lines or the administrative lines. Normally if there is a problem with overflow lines, calls to Manassas will reroute to Prince William County. However, Prince William County’s lines were not restored until July 1. At that time, the Manassas PSAP’s calls were rerouted to Prince William County. By July 1 between 7:00-8:00 p.m., PSAP trunks were partially restored at the Manassas PSAP, and by 9:00 a.m. on July 2, the Manassas PSAP was fully operational.

96 Id. at 3.
97 Approximately 60 percent of these calls were wireless, and 40 percent were wireline.

5.4.4 Manassas Park

Manassas Park has a population of about 15,000. The Manassas Park PSAP receives certain 9-1-1 calls and others go to or are sent by Manassas Park to PSAPs in either Prince William County or Fairfax County. The Manassas Park PSAP receives about 900 wireline 9-1-1 calls per month.

The Manassas Park PSAP told the Bureau that it realized that its 9-1-1 lines were not working on June 30, at 8:00 a.m., and attempted, unsuccessfully, to call Verizon for assistance. It was six hours before the Manassas Park PSAP finally reached Verizon to report the outage. According to PSAP officials, Verizon was unaware of the outage but agreed to try to reroute landline 9-1-1 calls to another PSAP. The Manassas Park PSAP informed Verizon that other Virginia PSAPs had problems as well. On July 1 at 4:00 p.m., the PSAP rerouted 9-1-1 calls to Prince William County following an e-mail from Verizon stating that 9-1-1 calls to the Manassas City and Manassas Park PSAPs were still not getting through. On the afternoon of July 2, the PSAP routed 9-1-1 calls back to Manassas Park’s administrative lines without ANI or ALI functionality.
On July 3 at 12:15 p.m., Verizon fully restored 9-1-1 service to the Manassas Park PSAP. By July 6, Verizon confirmed that the 9-1-1 system was fully functional.

5.4.5 Arlington County

Arlington County has a population of approximately 213,000. The design of the Arlington County PSAP was based on state-of-the-art concepts, including redundant access “from two different exchanges, for purposes of diverse routing, to ensure 9-1-1 service even if one access route were severed or otherwise failed.”98 Consequently, Arlington County is “quite concerned that Verizon’s commercial service failed despite these precautions and resulted in major telecom and network facility outages of extended duration.”

98 See Ex parte Letter from Joseph N. Pelton, Chair, Arlington County (Virginia) Information Technology Advisory Committee, to FCC Chairman Julius Genachowski (Sept. 5, 2012).

The Arlington County PSAP is served by redundant selective routers and has four ALI links to redundant ALI servers. During the derecho, both links to one ALI server and one of the links to the other failed. All three of these links passed through transport equipment in one Verizon central office. The Bureau believes that a diversity audit by Verizon of these ALI links may have revealed these diversity problems.

The Arlington County PSAP told the Bureau that while it never lost service completely, it experienced intermittent problems and made multiple test calls to 9-1-1 that did not go through. PSAP officials stated that on June 30 around 5:00 a.m., three of the PSAP’s four ALI links and all of its administrative lines failed, although not necessarily simultaneously. Around 9:40 a.m., Verizon’s Fairfax central office had failed, and that failure took down half of the Arlington County PSAP’s 9-1-1 trunks.
At the same time, the 9-1-1 trunks coming from a second selective router experienced intermittent problems but were always able to deliver some calls to the PSAP, which did not reroute calls to other jurisdictions. Although ten-digit administrative lines were back in service by July 1, the Arlington County PSAP could not receive 9-1-1 calls from Verizon Wireless customers for at least part of July 2. Verizon’s service to the PSAP finally became stable on July 3.

5.4.6 Other PSAPs in Virginia Supported by Verizon

According to Verizon, in addition to the four PSAPs that experienced total 9-1-1 outages (i.e., Fairfax and Prince William Counties, and Manassas City and Manassas Park), and the Arlington PSAP, which had substantial difficulties but to which 9-1-1 service was not completely lost, 9-1-1 service to twenty additional PSAPs partially failed. Most of these problems resulted from service disruptions in Verizon’s network, although some were caused by power or CPE issues at the PSAP. Twelve of these PSAPs suffered ALI outages, and eight more experienced various other issues such as failure or partial failure of 9-1-1 trunks, problems with CPE, and power loss.

5.4.7 Albemarle County

The PSAP in Albemarle County, Virginia, which is served by CenturyLink, also was affected by the derecho. At least some 9-1-1 trunks to the Albemarle County PSAP experienced CAMA trunk throughput issues99 discussed in a 2011 report by the ATIS/NRSC.100 Recommendations in the ATIS/NRSC Report were widely communicated to industry, and CenturyLink was a member of the industry team that created them. The report provided detailed recommendations to prevent CAMA trunks from mistakenly being removed from service during mass call events.
CenturyLink adopted the recommendation to provide more active monitoring of the network to identify service issues, but it chose not to implement the recommendation to consider changing the trunk busy percentage parameters in its selective router serving the Albemarle County PSAP to prevent trunks from being removed from service. Based on the experience of other providers that followed both of these recommendations, the Bureau believes that fewer 9-1-1 trunks would have been taken out of service and more 9-1-1 calls would have been received by the Albemarle County PSAP had CenturyLink taken additional actions as included in the ATIS/NRSC report. CenturyLink argues that the Bureau’s conclusion does not account for the fact that changing the trunk busy percentage parameters increases the potential for other service-affecting issues to occur.

99 See supra, note 85.
100 See supra, note 31 and accompanying text.

5.5 Effects on PSAPs in West Virginia

Of the fifty PSAPs in West Virginia served by Frontier, twenty-seven experienced adverse effects from the derecho.101 Eleven West Virginia PSAPs lost 9-1-1 service completely for durations ranging from less than one hour to more than twelve hours, affecting approximately 400,000 residents. Other PSAPs lost ALI links or had to reroute 9-1-1 calls to administrative lines or PSAPs in other jurisdictions. In all, storm-related 9-1-1 issues potentially affected approximately 1.2 million West Virginia residents. Service to most of the affected PSAPs was fully restored by July 1, two days after the storm; however, two PSAPs had to reroute 9-1-1 calls for more than three days.
According to Frontier, “[t]he lack of commercial power was the predominant cause of PSAP interruptions, both because it affected Frontier’s ability to provide communications services, and also because it affected the ability of the PSAPs to use their own equipment.”102 Problems with backup power to Frontier’s network after commercial power outages led to service impacts at twelve West Virginia PSAPs.103 Widespread commercial power outages forced 126 of Frontier’s 230 wireline facilities in the state to run on generator power.104 Frontier acknowledges, however, that “[g]enerator-use . . . was not a foolproof solution” and that “[i]n some instances back-up generators pre-placed in the Central Offices failed, which led to outages while Frontier replaced those generators.”105 Frontier’s other PSAPs were affected by physical damage to the network, including one PSAP that lost service when a truck struck an aerial fiber which was hanging low because of the storm. Another PSAP served by Frontier experienced CAMA trunk throughput problems addressed in the ATIS/NRSC report described above.106

101 See Frontier Comments at 3-4.
102 Id. at 4.
103 Id. at 5.
104 Id. at 2.
105 Id. at 4.
106 See supra, notes 31, 85, and accompanying text.

Eight of the twenty-seven affected PSAPs lost service because of issues at the PSAP itself rather than in Frontier’s network. These issues included loss of power, lightning strikes, and failure of CPE. Although many PSAPs were able to operate on generator power and continue operations despite commercial power failures, backup generators at several PSAPs failed to start. As power was slowly restored, power surges created problems with CPE at some PSAPs, hindering Frontier’s ability to reroute traffic. Staff members’ personal cell phones were essential at several of the PSAPs.
However, wireless service in several counties was down or was unreliable during the storm, adversely affecting the ability to seek 9-1-1 assistance in instances where customers relied on wireless phones to make emergency calls. Frontier was delayed in repairing transport failures by a lack of monitoring access to damaged sites. 107 Frontier’s NOC lost connectivity to a key monitoring circuit, which prevented it from communicating with the various networks in the area and remotely accessing selective routers to reroute calls to the PSAPs that were down due to storm damage. This circuit was the only way Frontier could directly monitor the status of network equipment in West Virginia from its NOC. Several days before the derecho, as part of a circuit rearrangement, Frontier had disconnected a redundant circuit that would have provided it an alternate way to monitor the network equipment in West Virginia. Apparently due to a miscommunication within Frontier, this circuit was not reconnected prior to the derecho and the only remaining circuit was lost during the storm, isolating the Frontier NOC from critical network assets in West Virginia. The NOC management system was fully restored two days after the storm. Due to the loss of monitoring access, Frontier walked its on-site field technicians through the restoration process via phone. Where possible, Frontier used a front-end processor, which collects alarms and allows for remote access into the offices, to access the offices in West Virginia via dial-up. This technique also allowed Frontier to make the 9-1-1 translation changes necessary to reroute the calls for the PSAPs. Though this process worked, it was slower and more difficult to implement than using the direct monitoring that had failed. Although most PSAPs we contacted have concerns about the overall 9-1-1 architecture in West Virginia, they generally were pleased with Frontier’s responsiveness during the derecho. 
Nevertheless, the majority of the PSAPs made the initial call to Frontier to report their outages before Frontier initiated contact with the PSAP. This was mainly due to the fact that Frontier lost the ability to monitor its network in West Virginia, preventing it from seeing network alarms and disrupting its ability to perform reroutes of the 9-1-1 calls to the PSAPs.

5.6 Effects on PSAPs Outside of Virginia and West Virginia

Beyond its effects in Virginia and West Virginia, the derecho also disrupted communications to PSAPs in Ohio, New Jersey, Maryland, and Indiana, according to state regulators and our analysis of NORS data.

107 See Frontier Comments at 5 (noting that “disruptions on Frontier’s major transport facilities led to lost visibility and remote access to some parts of the network, which in turn resulted in communication impacts to some PSAPs”).

5.6.1 Ohio

In Ohio, two PSAPs served by Frontier lost 9-1-1 service completely for durations ranging from three hours to twelve hours, affecting more than 13,000 residents. One PSAP was not receiving 9-1-1 calls because of a transport failure between the PSAP and a Frontier selective router. Another PSAP lost commercial power within the PSAP and went to battery backup immediately since it had no generator onsite. When the batteries were exhausted, the PSAP lost connectivity until commercial power was restored twelve hours later. Later that day, commercial power was again lost and, since the batteries had not had time to recharge, the PSAP did not receive 9-1-1 calls for another two hours. CenturyLink also reported that the Warren County, Ohio, PSAP had CAMA trunk throughput problems,108 resulting in diminished 9-1-1 call capacity.
While CenturyLink adopted the recommendation to provide more active monitoring of the network to identify service issues, it chose not to implement the recommendation to change the trunk busy percentage parameters in the selective router serving the Warren County PSAP to prevent trunks from being removed from service. Had it done so, it may have reduced the impact of that problem without causing other service-affecting issues. Four Ohio PSAPs served by AT&T lost ALI for up to four days after commercial power failures disabled equipment in the network.

5.6.2 Maryland

According to the Maryland Emergency Number Systems Board, several Maryland PSAPs experienced communications issues during and after the derecho, although none appear to have impacted the public’s ability to reach 9-1-1.109 In Garrett County, for example, U.S. Cellular mistakenly routed a cellular trunk to a wireline 9-1-1 trunk, although this had no immediate impact on ALI or the public’s ability to reach 9-1-1.110 In Caroline County, wireless 9-1-1 calls were temporarily rerouted to another PSAP.111 In Montgomery County, Verizon implemented a “mass call mitigation plan” that prevented CAMA trunk issues112 from impacting 9-1-1 service during the derecho. The PSAP later discovered, however, that Verizon failed to return four 9-1-1 trunks to service for “several days after the storm,”113 a condition that increased the probability that 9-1-1 calls would be lost. Verizon has since updated its procedures to ensure that all trunks are restored following mass call events before closing the associated trouble ticket.114

108 See supra, notes 31, 85, and accompanying text.
109 See Maryland Emergency Number Systems Board, Derecho Storm – Maryland Interim Report (Oct. 23, 2012) at 1 (“Maryland ENSB Report”).
110 Id.
111 Id.
112 See supra, note 85 and accompanying text.
113 Maryland ENSB Report at 2.

5.6.3 Other States

Some PSAPs in Indiana and New Jersey retained at least some 9-1-1 functionality but lost ALI links or had to reroute calls through other PSAPs. Most of these PSAPs cited commercial power outages as the cause of their problems, although some ALI equipment also failed. Partial disruptions in 9-1-1 service ranged from approximately one hour to more than four days. For example, one PSAP served by Frontier in Indiana lost ALI for almost an hour and a half, potentially affecting more than 73,000 people. In New Jersey, five PSAPs served by Verizon lost ALI for periods ranging from two hours to more than two days, while a sixth PSAP lost commercial power and rerouted calls to another PSAP.

6. The Derecho’s Effects on Wireless Networks

Commercial wireless networks generally withstood the derecho much better than their wireline counterparts. Cell site outage rates during and after the storm varied by provider and jurisdiction, but were modest both in number and duration in comparison to other superstorms or hurricanes. That being said, wireless customers reported service problems and lost calls immediately after the storm,115 and many likely suffered from cascading effects of wireline service outages. Yet, compared with our observations in other events of similar magnitude, impacts on service were not nearly as pervasive as might have been expected, and most outages were rectified relatively quickly.

Issues arose throughout the affected region as cell towers began to lose all sources of power (commercial, battery, generator) in the hours and days after the storm had dissipated. Generally, cell sites operate on batteries charged either by commercial power or, in some cases when commercial power fails, a backup generator.
When commercial power fails and no generator is available, cell site operation depends on the life of the site’s batteries as a backup power source. Depending on their battery and generator configurations, some wireless providers reported a majority of cell site outages caused by power failures at the site, while others reported more outages due to transport failures between cell sites and the rest of the network. In general, increased deployment of generators at cell sites reduces the probability of outages due to power loss.

The major wireless providers that serve the area most affected by the storm are the nation’s four largest: Verizon Wireless, AT&T, Sprint, and T-Mobile. The Bureau sought information from each of these providers on the performance of their networks during and immediately after the storm, including information on battery backup systems implemented at cell sites.

114 Id.
115 See Sullivan, Patricia, After Storm, 911, Phone Service Remains Spotty, WASH. POST, July 2, 2012 (reporting that a Washington, D.C., resident’s cell phone “dropped half a dozen calls” and could not receive text messages for several hours).

Our analysis of the information we received shows that outages were most extensive on June 30, when roughly 10.8 percent of cell sites were out of service in the affected reporting area.116 Hurricane Isaac, which hit the Gulf Coast in August, resulted in 21.8 percent of area cell sites out of service, and during Hurricane Gustav in 2008 that figure was 28.9 percent, with outages comparable to the worst of the derecho still lingering even four days after restoration efforts had begun. By contrast, only roughly 2 percent of cell sites in the derecho’s affected area remained out of service as of July 3.
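The comparison above is simple arithmetic on peak outage rates. As a rough sketch only, the snippet below restates the peak percentages reported in this section and computes how each hurricane's peak compares with the derecho's. The function names and the example site counts are hypothetical illustrations and are not drawn from the Bureau's data.

```python
# Peak share of cell sites out of service, as reported in this section (percent).
PEAK_OUTAGE_PCT = {
    "June 2012 derecho": 10.8,
    "Hurricane Isaac (2012)": 21.8,
    "Hurricane Gustav (2008)": 28.9,
}


def outage_pct(sites_out: int, sites_total: int) -> float:
    """Percent of cell sites out of service (hypothetical helper)."""
    if sites_total <= 0:
        raise ValueError("sites_total must be positive")
    return 100.0 * sites_out / sites_total


def ratio_to_derecho(event: str) -> float:
    """How many times larger an event's peak outage rate was than the derecho's."""
    return PEAK_OUTAGE_PCT[event] / PEAK_OUTAGE_PCT["June 2012 derecho"]


if __name__ == "__main__":
    for event, pct in PEAK_OUTAGE_PCT.items():
        print(f"{event}: {pct}% peak, {ratio_to_derecho(event):.1f}x the derecho")
    # Illustrative counts only: 108 of 1,000 sites out corresponds to 10.8 percent.
    print(f"{outage_pct(108, 1000):.1f}%")
```

On these figures, the Isaac and Gustav peaks were roughly 2.0 and 2.7 times the derecho's peak, respectively.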
The comparatively favorable performance of wireless networks during the derecho as compared to other major storms, and the severe impacts of the wireline network issues on the basic functionality of 9-1-1 service, have led the Bureau to focus most (but not all) of its recommendations in this report on wireline network enhancements and other efforts to improve 9-1-1 service reliability. But the Commission’s examination of the resiliency of wireless networks in the face of major storms will continue through field hearings planned for early 2013117 and a series of other activities planned by the Bureau. As the Bureau collects information through these mechanisms, the Bureau may, as appropriate, recommend additional areas for the Commission to consider.

6.1 High-Level Snapshot of Wireless Network Effects

As previously noted, the percentage of cell sites unable to provide service as a result of the derecho reached 10.8 percent on June 30 but declined steadily thereafter. The most heavily affected jurisdictions were portions of West Virginia and Virginia, the jurisdictions that also experienced the most extensive wireline and 9-1-1 service disruptions.

116 See supra, Section 2. The Bureau collected data from an area including six counties in West Virginia, seven jurisdictions in Virginia, eight jurisdictions in Maryland, and the District of Columbia.
117 See Superstorm Sandy Field Hearings PN.

Figure 4: Percent Cell Sites Out by Date (vertical axis: percent of all cell sites out, 0% to 12%; horizontal axis: June 30 to July 4)

Loss of power and failure of transport (i.e., backhaul) facilities each accounted for approximately one-half of the overall site outages in the first days; physical damage to cell sites was a minimal factor.

Figure 5: Reasons Cell Sites Out of Service118

6.2 Cell Site Battery Life

Batteries provide a critical source of backup power for cell sites.
In the event that commercial power is unavailable, providers rely on other forms of power (generators and batteries) to keep their systems operating. We note that most cell towers have backup battery power, with reserves ranging from several hours to a few days. It appears that most of the major wireless providers neither retain records of nor analyze how long their cell site batteries actually last when supporting a site during a loss of commercial or generator power, and thus were unable to provide the Bureau with usable data on battery life during the derecho. We understand that actual battery life depends on a number of factors, including the specifications of the battery, its age, whether it has been used previously without commercial or generator power, and the extent of usage of the site it supports. Collecting and reviewing actual performance information would be a useful tool for assessing how long batteries will likely last during a storm when relied on as the sole source of power for a cell site. But providers appear to lack this data and therefore cannot statistically analyze it to estimate battery life and guide their maintenance strategies. It seems that providers should collect and analyze this information. The Bureau recommends further research, both within the Commission and by wireless providers, to better understand the many issues related to the lifespan of these vital batteries.

118 This graph reflects the most common causes of cell site outages in the area affected by the derecho. In some cases, cell sites experienced unspecified problems or more than one problem simultaneously (e.g., power and transport failures). Thus, some daily percentages may be greater or less than 100 percent.

7. Recommendations

For many years, the Commission has worked with industry stakeholders to promote the implementation of industry-led best practices to ensure the availability and reliability of 9-1-1 communications.
As a result of our inquiry following the derecho, it became apparent that service providers’ implementation of best practices was neither as diligent nor consistent as needed, and the result was costly for 9-1-1 service. Consequently, in this section, the Bureau recommends consideration of specific action by the Commission to supplement the current best-practice approach in key areas. We also emphasize the need for providers to implement established best practices, and suggest how the Bureau can promote improved engineering through additional best practices to address apparent shortfalls. Finally, we encourage the continued implementation of NG9-1-1, which could lessen a future storm’s impact on emergency communications.

7.1 Recommendations for Possible Commission Action

The Bureau recommends that the Commission consider action to ensure improved 9-1-1 circuit auditing, central office backup power, and diversity of monitor and control links. Each of these areas has already been addressed in vital CSRIC best practices, but it appears that service providers have not consistently or fully implemented those practices, and problems have resulted.119 Consequently, the areas we suggest to the Commission include:

· 9-1-1 Circuit Auditing: Auditing should lead to fewer 9-1-1 outages and enhance the reliability of 9-1-1 communications. If providers do not regularly audit the physical routes of 9-1-1 circuits and ALI links, they will be ill-equipped to verify diversity and understand, avoid, or address instances where a single failure causes loss of all E9-1-1 circuits or all ALI links for a PSAP. The derecho and other experiences in recent years120 call into question the extent to which providers are implementing these important audits. We believe the benefits of this recommendation will likely outweigh the costs, given the large numbers of customers that can be served successfully in emergencies by circuits that are diverse, and the harms that could result from avoidable failures. The burden would be modest because this obligation would apply only to a limited number of high-priority circuits rather than to the entire commercial wireline network. Nor would auditing necessarily encompass a requirement that providers diversify all circuits in areas that are particularly expensive.
· Central Office Backup Power: The derecho experience makes clear how important it is for the provision of emergency service and reliable and resilient communications to ensure that providers maintain robust, resilient backup power in central offices, supported by appropriate testing, maintenance, and records retention. As the Commission has recognized previously,121 reliable central office backup power is essential for communications during large-scale emergencies. Failure of central office backup power during a commercial power outage can disable wireline communications for a community, including emergency communications. It is likely that the benefits of this recommendation will outweigh the costs, given the significant public-safety concerns and the limited number of central offices; moreover, providers most likely can comply affordably given that much of the needed infrastructure may already be in place.
· Diversity of Monitor and Control Links: The derecho makes clear how vital it is for a provider’s network operations center to have diverse monitor and control links and capabilities throughout the network to ensure network reliability, resiliency, and rapid recovery. We believe the benefits of this recommendation will likely outweigh the costs due to the relatively small set of links involved in network monitoring and control, and the potentially serious impacts of a loss of these links. Some points at which these monitoring networks gain access to the equipment they monitor can become single points of failure, but many can be fortified at modest cost. Only modest, but important, changes from existing practices may be required.

119 In addition to information learned through this inquiry, the Bureau had already issued public notices that emphasized the need for compliance after it received information suggesting that providers may not have been following some relevant best practices. See, e.g., FCC’s Public Safety and Homeland Security Bureau Reminds Telecommunications Service Providers of Importance of Implementing Established 9-1-1 and Enhanced 9-1-1 Services Best Practices, Public Notice, DA 12-891, 27 FCC Rcd 6085 (PSHSB rel. June 6, 2012) (“June 2012 Best Practices Public Notice”) (reminding telecommunications service providers of the “importance of providing diversity and redundancy in the provisioning of 9-1-1/E9-1-1 services”). See also FCC’s Public Safety and Homeland Security Bureau Reminds Telecommunications Service Providers of Importance of Implementing Advisory Committee 9-1-1 and Enhanced 9-1-1 Services Best Practices, Public Notice, DA 10-494, 25 FCC Rcd 2805 (PSHSB rel. March 24, 2010) (“March 2010 Best Practices Public Notice”) (noting that “the Bureau has observed a significant number of 911/E911 service outages caused by a lack of diversity that could have been avoided at little expense to the service provider”).
120 See June 2012 Best Practices Public Notice; March 2010 Best Practices Public Notice.
121 See, e.g., In the Matter of Reliability and Continuity of Communications Networks, Including Broadband Technologies, et al., Notice of Inquiry, PS Docket No. 11-60, et al., 26 FCC Rcd 5614 (2011).

· Revised PSAP Notification Rule: Section 4.9 of the Commission’s rules122 requires that providers suffering an outage of facilities that potentially affect a PSAP must notify the PSAP as soon as possible.
The Commission should consider stating what is expected of providers with more specificity, including, for example, methods of notification and a minimum level of detail in the information provided to PSAPs. Such clarification may improve compliance and result in greater situational awareness for PSAPs.

7.2 Diligent Implementation of Best Practices

Over the years, the FCC has worked with industry stakeholders to promote development and implementation of best practices, and we expect this process to continue. The CSRIC best practices have been developed on the basis of widespread industry participation. This creates a strong presumption that providers would be inclined to implement them, particularly those recommendations that are deemed most vital. Still, many providers failed to implement crucial best practices throughout the area affected by the derecho, which includes the densely populated National Capital Region. We call on providers again to review and implement CSRIC and other best practices and emphasize the importance of doing so. The proper implementation of CSRIC best practices could have prevented many of the derecho’s most serious effects on communications networks, including 9-1-1 service outages. The Bureau’s inquiry revealed multiple failures to implement CSRIC best practices including:

· Network operators, service providers and property managers with buildings serviced by more than one emergency generator, should design, install and maintain each generator as a standalone unit that is not dependent on the operation of another generator for proper functioning, including fuel supply path. (CSRIC best practice 8-7-5281)
· Network operators, service providers and property managers should exercise power generators on a routine schedule in accordance with manufacturer’s specifications. For example, a monthly one-hour engine run on load, and a five-hour annual run. (CSRIC best practice 8-7-0662)
· Network operators, service providers and property managers should design standby generator systems for fully automatic operation and for ease of manual operation, when required. (CSRIC best practice 8-7-0657)
· Network operators, service providers, equipment suppliers and property managers should ensure that all critical infrastructure facilities, including the security equipment, devices and appliances protecting it, are supported by backup power systems (e.g., batteries, generators, fuel cells). (CSRIC best practice 8-7-5058)
· Network operators should periodically audit the physical and logical diversity called for by network design and take appropriate measures as needed. (CSRIC best practice 8-7-0532)

122 See 47 C.F.R. § 4.9.

7.2.1 Actions by CSRIC

The Bureau recommends charging CSRIC with the development and modification of solutions with respect to certain issues, such as low voltage effects, that are addressed in this report but are not covered by the Bureau’s recommendations for consideration of Commission action. The Bureau also recommends that CSRIC consider adopting additional best practices consistent with sound engineering practices identified in this report. Finally, the Bureau may discuss with CSRIC ways to increase the adoption and effective implementation of best practices. As a result of the information already available in the aftermath of the derecho, the Bureau recommends asking CSRIC to address these issues at the earliest opportunity.

7.3 Recommendations for PSAP Action

Our meetings with PSAPs also produced constructive recommendations to ensure 9-1-1 remains in service during future storms. Although PSAPs were not responsible for the widespread breakdown of emergency communications during the derecho,123 the Bureau recommends that to the extent not already done, they take this opportunity to consider several proactive improvements.
123 See MWCOG Report at 19 (finding that “[b]y all indications during this event, the systems and processes in place by the public safety agencies in the COG region, operated as designed, and the 9-1-1 centers were fully prepared to provide service to the public,” but encouraging PSAPs “to perform a full assessment of their current 9-1-1 systems and operations to assure reliability and continuity of 9-1-1 service”).

Recommendations:

· PSAPs should have several different means of communication available, such as mobile phones from different providers, to provide as many alternate means as possible of communicating during an emergency. Some PSAPs may already follow this recommendation, and others that do not could implement it immediately.
· As many do already, PSAPs should have multiple means of backup power, such as multiple generators, and run periodic tests under actual load. Where not already in place, PSAPs should consider installing a simple disconnect switch to permit taking out one generator and installing another.
· Several PSAPs left reroutes in place for longer than necessary after the derecho.124 In some cases, after the E9-1-1 trunks into a PSAP fail, the PSAP will attempt to activate a reroute, only to discover that the backup route is also out of service. In such instances it is impossible to predict whether the provider will restore the original route or the backup route first. When the provider restores either route, it should notify the PSAP; however, in some cases the provider may not notify the PSAP and may not even realize that it has restored a route used for emergency calls. In other instances the provider may believe that it has restored a route even though problems remain on that route. Therefore, even if the PSAP is not notified, after a few hours, it should consider deactivating the reroute, then reactivating it a few hours later, and repeating until service is restored or until it is notified about a working route by the provider.
This recommendation could be implemented immediately whenever reroutes are necessary.

7.3.1 PSAP Voluntary Reporting to the FCC

Under NORS and DIRS, the Commission currently is informed by network providers of outages meeting specified criteria that adversely impact PSAPs. The Bureau also has frequently engaged in outreach to PSAPs in disasters, particularly PSAPs identified as having been adversely impacted, to hear directly about their status. In some instances, including after the derecho, PSAPs have contacted the Bureau directly to advise us of issues that they are facing. There currently is no specific channel or template for PSAPs that may voluntarily wish to report information to the Commission, and thus provide a direct source of information on their status. We are consulting with public safety organizations, and plan to work on a voluntary basis with PSAPs to establish a standardized format and channel that they can use, if they choose, to report network concerns directly to the Commission.

7.4 NG9-1-1 Implementation

NG9-1-1 relies on IP-based architecture rather than the PSTN-based architecture of legacy 9-1-1 to provide an expanded array of emergency communications services that encompasses both the core functionalities of legacy 9-1-1 and additional functionalities that take advantage of the enhanced capabilities of IP-based devices and networks.125 While this report focuses on recommendations for improving the reliability of existing E9-1-1 systems, we note that NG9-1-1 architecture offers certain advantages over legacy technologies, including greater redundancy and reliability,126 the ability to provide more useful information for first responders, wider public accessibility (including to those with disabilities), and enhanced capabilities for sharing data and resources among emergency responders.127 Had these NG9-1-1 architectures and capabilities been in place in the affected areas, they likely could have significantly lessened the derecho’s impact on emergency communications. Below, we summarize some of the most notable potential benefits of implementing NG9-1-1. The Commission should encourage jurisdictions that implement NG9-1-1 to take advantage of these and other capabilities.

124 See Verizon Public Report at 6 (noting that certain PSAPs “would have been better off [not rerouting calls], but without the appropriate information, they were unable to make that determination at the time”).
125 For an overview of NG9-1-1 architecture, see Framework for Next Generation 911 Deployment, PS Docket No. 10-255, Notice of Inquiry, 25 FCC Rcd 17869, 17877-80, ¶¶ 18-26 (2010) (“NG9-1-1 NOI”).
126 As described in the NG9-1-1 NOI, NG9-1-1’s use of IP-based architecture provides far more routing options than legacy circuit-switched architecture because it is not constrained by the location of the caller or the nearest PSAP to the caller. For example, in circuit-switched networks, selective routers must be relatively close to the PSAPs they serve, whereas in NG9-1-1, 9-1-1 traffic can be easily rerouted to servers and locations outside the affected area, providing more resiliency and redundancy in disaster situations. See NG9-1-1 NOI, 25 FCC Rcd at 17880-81, ¶¶ 26, 29.
127 See id., 25 FCC Rcd at 17878-79, ¶¶ 21-25.

· NG9-1-1 routers are being designed to handle fail-over automatically, which would eliminate the need for manually activating and sometimes manually de-activating reroutes. While NG9-1-1 has yet to be widely implemented, its ultimate deployment would likely have helped to avoid or mitigate some derecho-related issues at PSAPs that lost connectivity.
· With E9-1-1, the PSAP generally needs to rely on E9-1-1 trunks and ALI links provided by the ILEC. With NG9-1-1, the PSAP can obtain and combine links from a variety of providers. Typically, at least five diverse IP access link options are available at most PSAP locations: ILEC services; DOCSIS-based128 services provided by the local multiple systems operator; fixed wireless or line-of-sight laser to large institutions, such as a university, hospital, other government agency or multi-tenant office building, located within about a two-mile radius; one or more 4G (LTE) offerings; and satellite. Large PSAPs can also add their own dark fiber providers for additional physical diversity. Thus, it is generally possible for PSAPs to improve access reliability. We encourage PSAPs to take advantage of increased access diversity opportunities when they implement NG9-1-1; we note that we also have seen NG9-1-1 designs where PSAPs fail to do this and instead depend on just one access link from the ILEC rather than diverse links from diverse providers.
· Calls on NG9-1-1 systems can be rerouted, both partially and completely, to any number of backup PSAPs, including out-of-area PSAPs.
· NG9-1-1 service logic (e.g., call routing, databases) can be housed in professionally-managed data centers which may be more reliable than ILEC central offices and/or data centers.

128 Data Over Cable Service Interface Specification (“DOCSIS”) is an international telecommunications standard that permits the addition of high-speed data transfer to an existing cable TV system. It is employed by many cable television operators to provide Internet access over existing hybrid fiber-coaxial infrastructure. See DOCSIS, Wikipedia, http://en.wikipedia.org/wiki/DOCSIS.

· With NG9-1-1, wireless, VoIP and landline providers can connect their 9-1-1 systems to several networks, at multiple locations, rather than all going through the ILEC selective router as with the current system.
This would improve reliability for customers and PSAPs that today depend on only one selective router or one route between the PSAP and both routers.
· With NG9-1-1, network monitoring becomes easier and less subject to single points of failure. For example, servers can “ping” other servers and “traceroute” can identify out-of-order network segments.

8. Conclusion

The June 2012 derecho that affected so much of the central and Mid-Atlantic regions of the United States was unusual in that it hit with very little warning, leaving both communications providers and the public little time to prepare. As such, the storm tested the readiness and day-to-day reliability of a large portion of the Nation’s critical communications infrastructure.

The effects of the storm revealed considerable flaws in system design, personnel management, policies, and procedures of the primary providers of the 9-1-1 networks in the affected region. The storm also revealed that the major wireline providers serving the affected region had not fully implemented best practices and industry-developed solutions relating to backup power, 9-1-1 circuit diversity, and 9-1-1 trunk design, issues with a direct link to the loss of 9-1-1 service after the derecho. Accordingly, the Bureau recommends that the Commission, while continuing to promote use of vital best practices, consider taking additional action to ensure the reliability of the communications infrastructure, especially with respect to 9-1-1 service.

Appendix A: Preliminary Findings from Virginia State Corporation Commission

The Virginia State Corporation Commission Staff Report of Preliminary Findings released September 14, 2012, announced numerous findings consistent with this report and helpful to the Bureau in its inquiry. Below, we include selected findings by the SCC regarding 9-1-1 failures in Virginia:129

· Verizon was the only LEC in Virginia that experienced significant 9-1-1 service problems following the June 29 Derecho.
· The total loss of 9-1-1 capabilities to the Prince William County, Fairfax County, Manassas, and Manassas Park PSAPs was an extremely serious event and it is very fortunate that there were not catastrophic consequences to any citizens in Northern Virginia.

· The Prince William County, Fairfax County, Manassas, and Manassas Park PSAPs were fully prepared to respond to the June 29 Derecho and were not responsible for the 9-1-1 service failures.

· The cause of the 9-1-1 service outages in Northern Virginia from the June 29 Derecho began with the failure of two backup generators that did not start automatically when commercial power was lost. Specifically, a generator in each of Verizon’s Arlington and Fairfax central offices did not start.

· A review of the maintenance logs for the backup generators in the Arlington and Fairfax central offices shows a lack of compliance with Verizon’s maintenance and testing procedures.

· The generator that failed to start in the Arlington office did not start during routine testing conducted two days before the June 29 Derecho. The maintenance log indicated that work to the generator was needed. Verizon confirmed with the FCC that this information is correct; the generator did fail the test two days before the storm.

· A total of nine generators (out of 136) failed to operate properly during the commercial power outages from the June 29 Derecho in Verizon’s Mid-Atlantic region.

· The scope of 9-1-1 problems went well beyond the calling areas served by the Arlington and Fairfax central offices.

· ALI is an important component of 9-1-1 service. The lack of delivery of ALI to many PSAPs could have put citizens across Virginia at greater risk.

· The initial battery on discharge (“BOD”) alarm for the Fairfax central office was sent to the National Power Network Operation Center (“NPNOC”) at 10:29 PM on June 29, 2012, when the one generator failed to start. Under Verizon’s procedures, any BOD alarm should have been seen as a critical power alarm requiring immediate action. However, according to Verizon, this alarm was incorrectly categorized as a major power alarm condition when sent to the NPNOC.

· The Regional Network Center (“RNC”) received a repair ticket (identified as a major alarm, as mentioned above) from the NPNOC for the Fairfax central office at 10:32 PM on June 29, 2012. At that time, and on the morning of June 30, the RNC was only working critical alarms, and a power technician was not dispatched to the office until after the backup batteries had drained completely.

· The telemetry system (alarm monitoring) in the Arlington central office was only supported by the Uninterruptible Power Supply (“UPS”) (i.e., battery power source), which was designed with a 30-minute reserve. The UPS failed at 11:23 PM on June 29, 2012.

· The very early failure of the telemetry system resulted in Verizon being unable to receive further alarms and remotely access its switches to monitor, test, or reroute traffic to 34 sites in the area. Verizon’s inability to monitor its facilities and network in the Northern Virginia area significantly impacted the restoral process from the June 29 Derecho.

· The delay in identifying and repairing the critical conditions in the Fairfax and Arlington offices resulted in unnecessary damage to Verizon’s network and extended the 9-1-1 problems and outages. There were hundreds of damaged or impacted pieces of equipment in those two offices (i.e., circuit cards and digital cross connects).

· The loss of the transport systems in the Arlington and Fairfax central offices was profound and collectively resulted in 17 switches becoming SS7 isolated, and therefore incapable of completing (originating or terminating) any interoffice local, long distance, or 9-1-1 emergency calls. The loss of those transport systems was also responsible for the loss of ALI to the PSAPs.

· Verizon did not activate its emergency Area Control Center located in Maryland until 10 AM on June 30, 2012.

· Verizon did not always provide sufficient, accurate, or timely communications to the affected PSAPs regarding its 9-1-1 problems and outages following the June 29 Derecho.

· Some battery reserves supporting major equipment systems in the Arlington (other than telemetry) and Fairfax central offices were depleted within approximately 3 to 5 hours. In addition, some equipment in those offices failed even before the batteries were exhausted because of sensitivity to low voltage conditions.

· In many instances, Verizon’s workforce was not timely dispatched, prepared, or trained to recognize or correct the critical conditions from the June 29 Derecho.

· Verizon is making progress in implementing its corrective action plan; however, at this time, not all items have been fully defined or timelines determined.

[129] See Virginia SCC Report at 7-10.

Appendix B: Recommendations of the Metropolitan Washington Council of Governments

In July 2012, 9-1-1 directors in the MWCOG region (including City of Alexandria, Arlington County, Fairfax County, Prince William County, and Stafford County) agreed on five immediate recommendations to Verizon based on their experience during the derecho. All of these recommendations, which are listed below, have been accepted by Verizon and are in “various stages of completion,” according to MWCOG.[130]

1. Verizon adopt, embrace, instruct, train and utilize the National Incident Management System (NIMS) model, to address and mitigate any and all significant events/incidents impacting providing 9-1-1 service to the aforementioned jurisdictions.

2. Verizon obtain and utilize a Reverse 9-1-1® type system to notify, via voice and text, those persons identified by the above jurisdictions, as soon as it is known or suspected by Verizon that there is or may be an interruption of 9-1-1 service to any or all of the above jurisdictions.
The immediately transmitted voice and text message should contain, in plain language, the nature of the problem, current or potential impact of the problem, what Verizon is doing to address the problem, recommended actions the impacted 9-1-1 center(s) should take, and other appropriate information, and include the name of the sender, the telephone number (business and mobile) at which the sender can be reached, and their email address.

3. Verizon work with the jurisdictions to develop, by no later than December 31, 2012, a method to conduct a semi-annual drill/exercise with each jurisdiction on actions to be taken by Verizon and the impacted jurisdiction(s) in the event of a potential or actual 9-1-1 outage.

4. Verizon provide the above jurisdictions, during the first week of each month, a current contact list, beginning with the name and contact information (email, business telephone number, business mobile telephone number, and any other appropriate information) for the Verizon account manager assigned to the jurisdiction and four immediately escalating Verizon personnel up to a Vice President level.

5. Verizon, if/when requested by any of the above jurisdictions, have a Verizon representative with authority to act/react respond to, and be present at, the jurisdiction’s Emergency Operations Center (EOC), to provide current, accurate information concerning 9-1-1 service and outages, other telephone service, etc., and liaise with other parties staffing the EOC, when the EOC is activated.

[130] See MWCOG Report at 18.

On November 14, 2012, MWCOG released a preliminary report of its own findings and recommendations regarding 9-1-1 failures during the derecho. In addition to the initial recommendations noted above, the MWCOG Report calls for the following actions:[131]

1. Federal and State Regulatory Authorities should strongly encourage Verizon and other 9-1-1 service providers to perform a comprehensive independent audit of the entire infrastructure, processes and procedures that support 9-1-1 service and related systems, to assure the reliability and continuity of 9-1-1 service under any circumstance. Based on the results of these audits, comprehensive plans and strategies should be developed to immediately resolve any findings. The results of these audits and resolution plans should be made available to the 9-1-1 stakeholders.

2. It is highly recommended that Verizon and other 9-1-1 service providers provide subject matter expertise and make recommendations to the 9-1-1 centers and their stakeholders to assure reliability and continuity of 9-1-1 service. This should include, but not be limited to, network redundancy, 9-1-1 center equipment and systems, and best practices and procedures.

3. It is critical that Verizon review their communications and public notification plans with each 9-1-1 center’s communicators and/or Public Information Officers (PIO) regarding the dissemination of emergency messages (using both traditional and social media) to the public during 9-1-1 outages and update as needed. This process should also explore alternative methods to communicate with the public in case of widespread power and telephone outages. Verizon should coordinate with National Capital Region communicators/PIOs during any future outages, to inform and keep the public updated, and amplify the 9-1-1 center-specific public messages and information.

4. Verizon should keep the public informed of any service issues, the extent of the outage and time for resolution.

5. Federal and State Regulatory Authorities should evaluate the steps and actions of Verizon, related to this event, and the above audits, to ensure Verizon has adequately resolved all issues and continues to improve their processes and infrastructure to ensure reliability and continuity of 9-1-1 service.

6. COG members and localities should work with their State and Federal regulatory authorities and Legislators, as needed, to assure, through proper oversight, best practices and procedures by establishing service level agreements to ensure reliability and continuity of 9-1-1 service.

7. It is recommended that there be further investigation by State and Federal Regulators on whether the 9-1-1 supporting infrastructure of telecommunications providers other than Verizon was also impacted by the Derecho. As an example, AT&T Wireless, in their comments to FCC PS Docket No. 11-60, indicated there was some impact to their infrastructure during and after the Derecho.

[131] See id. at 17-19.

The MWCOG Report noted that “[b]y all indications during this event, the systems and processes in place by the public safety agencies in the COG region, operated as designed, and the 9-1-1 centers were fully prepared to provide service to the public.”[132] It did, however, make the following recommendations to state and local governments:[133]

1. State and local 9-1-1 authorities should be encouraged to perform a full assessment of their current 9-1-1 systems and operations to assure reliability and continuity of 9-1-1 service.

2. It is recommended that State and Federal regulatory authorities review current laws and regulations related to 9-1-1 service, to assure that they emphasize and favor public safety over the 9-1-1 service providers or telecommunications providers. The interest of the public and public safety should come first over the interest of commercial providers.

3. State and local 9-1-1 authorities should work with their Legislators to ensure that the funding required to support the current 9-1-1 services and future Next Generation 9-1-1 is adequate and available, and that the fees and funds collected from the citizens of their States for 9-1-1 services are dedicated and used solely for the purpose as intended for the implementation, operation and maintenance of 9-1-1 emergency telephone services as required by the ENHANCE 911 Act of 2004 (Pub. Law 108-494). In addition, the fees collected should be fairly and equally distributed to the 9-1-1 authorities.

[132] Id. at 19.
[133] Id.

Appendix C: Recommendations from PSAPs

During the preparation of this report, the Bureau interviewed personnel from twenty-eight PSAPs and considered public comments filed by several jurisdictions affected by the derecho. Several recurring themes emerged from these conversations, which are consistent with the Bureau’s conclusions and recommendations in this report. Although PSAPs often suggested actions by communications providers to improve the reliability of the 9-1-1 network, they also identified areas for proactive action by PSAPs themselves. Below is a summary of the recommendations we received from PSAPs:

· Multiple Virginia PSAPs commented that Verizon needs to improve the circuit diversity and redundancy in its network to eliminate single points of failure. They also stated that Verizon should have more reliable backup power, including central-office backup generators that are properly maintained and tested regularly. Some PSAPs stated that Verizon should consider additional backup generators at critical points in the network, and that employees must be trained to manually restore critical equipment when automated systems fail. Fairfax County in particular asserted that Verizon could have avoided failures during the derecho by implementing CSRIC best practices and should perform an audit of its own operating procedures in relation to those best practices.[134]

· West Virginia PSAPs also called for improved redundancy and circuit diversity in Frontier’s network. Some PSAPs asserted that Frontier should have backup generators at all points in the network affecting 9-1-1 service, or at least more portable generators available for remote locations.

· Several PSAPs recommended that PSAPs themselves should have redundant backup generators with enough fuel to last through a prolonged failure of commercial power. PSAPs in some locations recommended that staff members should maintain at least one traditional copper telephone connection and cell phones from a variety of wireless providers to ensure that some devices remain operational if other networks fail.

· PSAPs in both states requested better notification and communication from Verizon and Frontier when problems in their networks have a current or potential effect on 9-1-1 service. PSAPs stated that they should be informed directly of outages, including the specific areas affected, rather than having to infer problems from reroutes or changes in call patterns. They also suggested that providers should use a variety of communications platforms (e.g., telephone, text, e-mail) to ensure that PSAP personnel actually receive notifications. Some PSAPs also requested help from providers when notifying the public of 9-1-1 outages, including alternative numbers the public can call for assistance until 9-1-1 is restored.

[134] See Fairfax County Comments at 22-23.