June 15, 2020 T-Mobile Network Outage Report
PS Docket No. 20-183
A Report of the Public Safety and Homeland Security Bureau
Federal Communications Commission
October 22, 2020

TABLE OF CONTENTS (Para.)
I. INTRODUCTION 1
II. BACKGROUND 3
III. INCIDENT AND RESPONSE 4
   A. Architecture of T-Mobile’s Network 4
   B. Root Cause and Event Summary 11
   C. Mitigation and Restoration Efforts 19
IV. ANALYSIS 26
   A. Impact on Calls over T-Mobile’s Network 26
   B. Impact on Calls Initiated Outside T-Mobile’s Network 29
   C. Impact on 911 Calls Handled by T-Mobile 35
   D. Public Impact 38
V. CORRECTIVE ACTIONS BY T-MOBILE 43
VI. NEXT STEPS 45
APPENDIX A: Outage Timeline

I. INTRODUCTION 1. Midday on June 15, 2020, T-Mobile experienced an outage on its wireless networks that lasted over twelve hours, disrupting calling and texting services nationwide, including 911 service, as well as access to data service in some areas. The Public Safety and Homeland Security Bureau (PSHSB) estimates based on data provided by T-Mobile and other affected service providers that at least 41% of all calls that attempted to use T-Mobile’s network during the outage failed, including at least 23,621 failed calls to 911. The outage was initially caused by an equipment failure and then exacerbated by a network routing misconfiguration that occurred when T-Mobile introduced a new router into its network. In addition, the outage was magnified by a software flaw in T-Mobile’s network that had been latent for months and interfered with customers’ ability to initiate or receive voice calls during the outage. 2. The Bureau investigated this incident, its effects, and the response. As part of its investigation, Bureau staff reviewed and analyzed outage reports, interviewed T-Mobile personnel, submitted written questions to affected service providers, and issued a Public Notice seeking comment on the effects of the outage on public safety and consumers, which yielded 190 responses. Public Safety and Homeland Security Bureau Seeks Comment on the Effects of June 15, 2020 T-Mobile Outage on Public Safety Entities, Government Entities, and Consumers, PS Docket No. 20-183, Public Notice, 35 FCC Rcd 6462 (PSHSB June 23, 2020). This report presents the Bureau’s findings and recommendations. This outage provides the Federal Communications Commission (Commission) and stakeholders with the opportunity to learn valuable lessons about network reliability and the implementation of industry-accepted best practices. For example, the outage demonstrates the importance of network operators periodically auditing the diversity of their networks and taking appropriate measures to ensure resilience as needed. II. BACKGROUND 3. The Commission stays abreast of disruptions to the Nation’s communications infrastructure through outage reports filed by communications providers in the Network Outage Reporting System (NORS) in the wake of major disruptions to their networks. NORS is the Commission’s web-based filing system through which communications providers covered by the part 4 outage reporting rules must submit reports to the Commission. These reports are presumed confidential to protect sensitive and proprietary information about communications networks. See 47 CFR § 4.2. As noted below, however, this report includes material that is not subject to such confidential treatment.
As part of this reporting framework, Commission rules require wireless service providers to report to the Commission “significant degradation[s] in the ability of an end user to establish and maintain a channel of communications as a result of failure or degradation in the performance of a communications provider’s network.” 47 CFR § 4.5(a); New Part 4 of the Commission’s Rules Concerning Disruptions to Communications, ET Docket No. 04-35, Report and Order and Further Notice of Proposed Rulemaking, 19 FCC Rcd 16830, 16895-902, paras. 127-143 (2004). An outage that occurs on facilities that a wireless service provider owns, operates, leases, or otherwise uses is reportable when it is at least 30 minutes in duration and, inter alia, potentially affects at least 900,000 user minutes of telephony and associated data service or potentially affects a 911 special facility (e.g., a Public Safety Answering Point (PSAP)). 47 CFR § 4.9(e)(1)(ii), (v); see also 47 CFR § 4.9(e)(1)(i), (iii), (iv) (requiring that a wireless service outage be reported when it affects a Mobile Switching Center, when it affects at least 667 OC3 minutes, or when it potentially affects any special offices and facilities). PSHSB’s meetings with T-Mobile to discuss their outage reports during the pendency of a permit-but-disclose proceeding fall within exception 10 of the Commission’s ex parte rules. 47 CFR § 1.1204(a)(10). Accordingly, PSHSB did not ask T-Mobile to disclose these meetings in the docket for this proceeding, PS Docket No. 20-183, because doing so would have interfered with the effective conduct of its investigation. Wireless providers must submit this notification in NORS within 120 minutes of discovering that such an outage has occurred. 47 CFR § 4.9(e)(1); 47 CFR § 4.11 (specifying additional information that these reports must contain). Wireless providers must also file an Initial Outage Report not later than 72 hours after discovering the outage, and a Final Communications Outage Report not later than 30 days after discovering the outage. 47 CFR § 4.9(e)(4). The Commission requires wireless providers to notify 911 special facilities as soon as possible when they discover outages that could affect them. See New Part 4 of the Commission’s Rules Concerning Disruptions to Communications, ET Docket No. 04-35, Report and Order and Further Notice of Proposed Rulemaking, 19 FCC Rcd 16830 (2004) (2004 Part 4 Report and Order); 47 CFR § 4.9. Wireless providers must convey all available and potentially useful information to the 911 special facility to help mitigate the effects of the outage on callers to that facility. See New Part 4 of the Commission’s Rules Concerning Disruptions to Communications, ET Docket No. 04-35, Report and Order and Further Notice of Proposed Rulemaking, 19 FCC Rcd 16830 (2004); 47 CFR § 4.9(e).
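To make the duration and user-minute triggers described above concrete, the short sketch below checks a hypothetical wireless outage against them. It is a simplified illustration only: the function name and example figures are assumptions, and it omits the additional triggers in 47 CFR § 4.9(e)(1)(i), (iii), and (iv) cited above.

```python
# Simplified sketch of the part 4 wireless outage-reporting triggers described above.
# The function name and example figures are hypothetical; Mobile Switching Center,
# OC3-minute, and special offices and facilities triggers are omitted.
def wireless_outage_reportable(duration_minutes: int,
                               potentially_affected_users: int,
                               affects_911_special_facility: bool = False) -> bool:
    if duration_minutes < 30:
        return False                                   # below the 30-minute floor
    user_minutes = duration_minutes * potentially_affected_users
    return user_minutes >= 900_000 or affects_911_special_facility

print(wireless_outage_reportable(45, 25_000))   # True: 1,125,000 user minutes
print(wireless_outage_reportable(45, 10_000))   # False: only 450,000 user minutes
print(wireless_outage_reportable(45, 10_000, affects_911_special_facility=True))  # True
```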
III. INCIDENT AND RESPONSE A. Architecture of T-Mobile’s Network 4. T-Mobile operates a nationwide wireless network which supports cellular calling over 4th Generation Long Term Evolution (LTE), 3rd Generation (3G), and 2nd Generation (2G) cellular technologies, as well as calling over Wi-Fi. The overwhelming majority of T-Mobile customers use T-Mobile’s LTE network. All Voice over LTE (VoLTE)-capable handsets sold by T-Mobile have the capacity to make calls over the 3G, 2G, or Voice over Wi-Fi networks when VoLTE calling is not available. While each of these technologies uses different facilities to access the network, they also work interdependently to ensure that phone calls can complete across different networks. 5. Each LTE tower in T-Mobile’s network relies on wired connections and several routers to take the necessary steps to complete calls and provide Internet access. This network is illustrated in Figure 1 below. When a mobile device initiates a VoLTE phone call, the call transits from the device to the LTE tower, from the LTE tower through the Evolved Packet Core, and then to the IP Multimedia Subsystem where the device is registered and the call is routed appropriately towards the destination. The Evolved Packet Core allows the devices to access the Internet and provides a multitude of other critical functions. When a mobile device uses LTE data, this data connection transits from the device to the LTE tower, from the LTE tower to the LTE Evolved Packet Core, and then, unlike a VoLTE call, to the Internet. Figure 1 – Paths of Calls and Data Across T-Mobile’s Network 6. The vast majority of mobile devices connected to T-Mobile’s network are simultaneously registered and connected to both LTE and T-Mobile’s 3G and 2G circuit-switched networks. This allows for calls to be placed over circuit-switched networks in the event a VoLTE or Wi-Fi call fails or times out, or if LTE and Wi-Fi coverage is poor in a particular area. To access the 3G and 2G circuit-switched networks, as well as the 911 network, the IP Multimedia Subsystem uses a node to send calls to the correct gateways that lead to other networks. 7. Routing. Routers connect T-Mobile’s LTE towers to T-Mobile’s LTE network. These routers utilize a routing protocol called Open Shortest Path First. Under this protocol, each set of lines that connects T-Mobile’s routers (also known as a “link”) is assigned a weight. The network decides where to send LTE traffic by selecting the route with the lowest cumulative weight. Figure 2 illustrates an example of how the optimal route is determined. Figure 2 is for illustrative purposes only and does not reflect the actual weighting of T-Mobile links, T-Mobile’s network topology, or the locations of T-Mobile routers. Figure 2 – Example of the Open Shortest Path First Protocol 8. In Figure 2, each line that connects routers is assigned a number that represents its weight. For traffic that originates from the Seattle router and is destined for the Miami router, the path with the lowest cumulative weight transits through the Los Angeles router. However, if the link between Seattle and Los Angeles loses connectivity, the network will adapt by finding a new, alternative route. In that instance, the path through Denver and New York will now have the lowest cumulative weight. 9. Mobile Device Registration. To make a VoLTE or Wi-Fi call, mobile devices must register with T-Mobile’s Internet Protocol (IP) Multimedia Subsystem prior to sending and receiving any phone calls or most text messages on T-Mobile’s network. The IP Multimedia Subsystem’s registration system utilizes three nodes that manage the secure connection to the mobile device, route the call, and initiate the connection with the called party. The mobile device registration system periodically refreshes user registrations to ensure that information that it needs to connect calls and text messages stays updated. To promote efficiency, T-Mobile programmed the system to retain, for each user, a record of which node was last used to initiate a connection with a called party.
If that specific node is congested, the system is designed to try using a different node to complete the registration. T-Mobile states that its IP Multimedia Subsystem contains many instances of this three-node registration system to provide both regional and national redundancy. Conversely, LTE data does not require registration with the IP Multimedia Subsystem. 10. Mobile devices also do not have to go through this same registration authentication process to place 911 calls. FCC rules require unregistered mobile devices to be able to complete 911 calls to Public Safety Answering Points (PSAPs). See 47 CFR § 9.4 (requiring commercial mobile radio service (CMRS) providers subject to the 911 rules to transmit all wireless 911 calls “without respect to their call validation process”); see also 911 Call-Forwarding Requirements for Non-Service-Initialized Phones, Notice of Proposed Rulemaking, 30 FCC Rcd 3449, 3450, para. 1 (2015) (stating that the rule requires providers to transmit both 911 calls originating from customers that have contracts with CMRS providers and calls originating from “non-service-initialized” devices to PSAPs). Accordingly, in T-Mobile’s network, the IP Multimedia Subsystem reroutes 911 calls made over VoLTE to a different node on T-Mobile’s network that is responsible for processing 911 calls. As a result, in the event of an outage affecting T-Mobile’s IP Multimedia Subsystem, 911 calls made over VoLTE and Voice over Wi-Fi would complete at a higher rate than other VoLTE and Voice over Wi-Fi calls. B. Root Cause and Event Summary 11. On June 15, 2020, T-Mobile was midway through the process of installing routers in the southeast region of its network. Once the first of the two routers was installed, active, and handling customer traffic, T-Mobile planned to slowly integrate the second new router into its network. T-Mobile had planned to configure the second new router so that it was passive, connected to the network but only receiving network traffic if another router or link between routers failed. To do this, T-Mobile deliberately configured the links connected to the new passive router to have high Open Shortest Path First weights. In the course of that configuration, T-Mobile misconfigured the weights of the links of another router that was already active in the network segment but was not designed to process call signaling traffic. T-Mobile did not have a fail-safe process in place to prevent or provide notice of this misconfiguration. In the event of a router or link failure, the low Open Shortest Path First weights to this router would cause it to receive a large percentage of call signaling traffic, which it could not pass. 12. Link Failure Exacerbated by Routing Misconfiguration. At 12:33 PM EDT, a fiber transport link in the southeast region of T-Mobile’s VoLTE network failed. A full timeline of this outage and T-Mobile’s attempts to mitigate it is included as Appendix A. Although T-Mobile generally designed its network to mitigate this kind of failure by transferring traffic across a different link, T-Mobile had misconfigured the weight of the links to one of its routers as illustrated in Figure 3. This prevented the traffic from flowing to the new active router as intended. Configuration A represents how T-Mobile misconfigured the Open Shortest Path First weights of the links in the network segment that precipitated this outage. 
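To make the weighting mechanics concrete, the sketch below computes lowest-cumulative-weight paths over a toy topology, in the spirit of Figure 2 and of Configurations A and B (Configuration B is discussed in the following paragraph). The router names, link weights, and the shortest_path helper are illustrative assumptions, not T-Mobile’s actual topology or software.

```python
import heapq

def shortest_path(links, src, dst):
    """Dijkstra over undirected weighted links, mirroring how Open Shortest
    Path First selects the route with the lowest cumulative weight."""
    graph = {}
    for (a, b), w in links.items():
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))
    heap, seen = [(0, src, [src])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

# Hypothetical segment: "core_a" reaches "core_b" over a primary fiber link;
# "unprepared" cannot pass call signaling traffic; "passive_new" is the new
# router intended to carry traffic only after another failure.
misconfigured = {                       # Configuration A: unprepared router's links left low
    ("core_a", "core_b"): 5,
    ("core_a", "unprepared"): 10, ("unprepared", "core_b"): 10,
    ("core_a", "passive_new"): 50, ("passive_new", "core_b"): 50,
}
intended = dict(misconfigured)          # Configuration B: weight those links high
intended[("core_a", "unprepared")] = 1000
intended[("unprepared", "core_b")] = 1000

for label, cfg in (("Configuration A", misconfigured), ("Configuration B", intended)):
    after_failure = {k: w for k, w in cfg.items() if k != ("core_a", "core_b")}
    print(label, shortest_path(after_failure, "core_a", "core_b"))
# Configuration A re-routes through "unprepared", which drops the signaling
# traffic; Configuration B re-routes through "passive_new" as intended.
```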
As a result, when the fiber transport link failed and traffic was redirected to the lowest cumulative weight route, it arrived at a router that was not prepared to receive call signaling traffic, resulting in the traffic being dropped. Configuration B represents how T-Mobile could have configured the Open Shortest Path First weights to properly transfer traffic in the event of a link failure. If T-Mobile had set the weights of the Open Shortest Path First links to the routers that could not process call signaling traffic higher, as Configuration B shows, then traffic would not have flowed to routers that could not process call signaling traffic and been dropped. Instead, the path with the lowest cumulative weights would have passed only through routers that were prepared to handle call signaling traffic and to the passive router, as intended. As configured, however, traffic flowed to a router that was not prepared to receive that traffic and was not properly configured to pass a large percentage of the call signaling traffic it received. The dropped traffic was Multi-Protocol Label Switching (MPLS) traffic, which is used for traffic engineering and optimizing the resources of a network. Because the router could not pass the traffic, the Atlanta market became isolated, causing all LTE users in the market to lose connectivity to LTE data, VoLTE, and the 3G and 2G circuit-switched networks, which disrupted voice, text, and data services in the Southeast. After twelve minutes, at 12:45 PM EDT, the fiber transport link was restored without intervention, ending the isolation of the Atlanta market. However, registration system congestion caused by the link failure would continue to affect T-Mobile’s networks. 13. Software Error. When the fiber transport link failed and the Atlanta market became isolated, mobile device registration attempts in that market timed out. Mobile devices in the Atlanta area then tried to re-register with the IP Multimedia Subsystem using Wi-Fi. While the network was designed to connect to a different node to complete the registration, a software error triggered by the market isolation prevented that connection from being completed. This software error likely did not cause problems before this outage occurred because the outage was the first notable market isolation since T-Mobile integrated this software into its network. Instead, the registration system repeatedly routed re-registration attempts for each mobile device to the last node retained in its records, which was unavailable due to the market isolation. Accordingly, mobile devices repeatedly attempted and failed to register using Wi-Fi, creating a “registration storm” that congested the IP Multimedia Subsystem. After the failed optical link was restored at 12:45 PM EDT, the mobile devices that had attempted and failed to register over Wi-Fi now attempted to re-register over VoLTE. Because the IP Multimedia Subsystem was still congested by the registration storm, the VoLTE re-registration attempts failed, and their network activity further exacerbated the congestion. T-Mobile customers that could not re-register were unable to make VoLTE and Wi-Fi calls, but could use LTE data.
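The retry behavior described above can be sketched as follows. This is an illustrative toy model, not T-Mobile’s software: the node names, the three-node pool, and the register helper are assumptions, and the sticky_bug flag stands in for the latent software error that kept retries pinned to the last-used node instead of failing over to an available one.

```python
import random

# Toy model of the registration-node selection described above; node names,
# the node pool, and the retry limit are hypothetical, not T-Mobile's system.
NODES = ["node_a", "node_b", "node_c"]
last_used = {}                     # per-device record of the last node used
unavailable = {"node_a"}           # e.g., the node isolated with the Atlanta market

def register(device, sticky_bug, max_retries=4):
    """Return the node that accepts the registration, or None if every retry fails."""
    preferred = last_used.get(device, random.choice(NODES))
    for _ in range(max_retries):
        if sticky_bug:
            candidate = preferred           # flaw: every retry returns to the same node
        else:
            candidate = next((n for n in NODES if n not in unavailable), preferred)
        if candidate not in unavailable:
            last_used[device] = candidate
            return candidate
    return None                             # all retries failed; the device tries again,
                                            # feeding the "registration storm"

last_used["device_1"] = "node_a"
print(register("device_1", sticky_bug=True))    # None: retries never leave node_a
print(register("device_1", sticky_bug=False))   # node_b: the intended failover behavior
```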
14. Troubleshooting Misdiagnosis Exacerbates Outage. While T-Mobile engineers attempted to recover from this outage and restore service, they ended up exacerbating its impact because they misdiagnosed the problem. T-Mobile believed that the fiber transport link that failed earlier in the day was continuing to cause the ongoing outage. Acting on this belief, T-Mobile manually shut down the link in an attempt to transfer traffic away from it. Due to the still-misconfigured Open Shortest Path First weights, however, these steps recreated the outage’s initial conditions. LTE customers in the Atlanta market were again disconnected from the LTE network and forced to establish calls over Wi-Fi, and their registration attempts again failed and created a registration storm that added further congestion to T-Mobile’s IP Multimedia Subsystem. The outage remained contained to the Atlanta market at this point because T-Mobile’s network is designed so that registration traffic in one region cannot overflow into other regions. 15. T-Mobile engineers almost immediately recognized that they had misdiagnosed the problem. However, they were unable to resolve the issue by restoring the link because the network management tools required to do so remotely relied on the same paths they had just disabled. When T-Mobile engineers were able to access the equipment on site and correct their mistake by restoring the link an hour later, customers in the Atlanta market were again able to attempt to register to VoLTE. However, this again created additional congestion because T-Mobile engineers had not yet addressed the software error that prevented registrations from completing. 16. Nationwide Spread. This wave of Voice over Wi-Fi and VoLTE registration attempts resulted in the outage spreading out of the Atlanta market and across the country. When the IP Multimedia Subsystem’s registration system for the Atlanta market was unavailable, external incoming traffic destined for that system was redirected to the IP Multimedia Subsystem registration systems for other regions. This, in turn, created enough congestion in those registration systems to cause the T-Mobile network to send the registration attempts to other nodes. The software error again routed re-registration attempts to the last node on record, which was likely already experiencing severe congestion. Around 3:00 PM EDT, IP Multimedia Subsystem, VoLTE, and Voice over Wi-Fi registrations began to fail nationwide as all IP Multimedia Subsystem registration nodes became increasingly congested. 17. Spread to 3G and 2G Networks. The vast majority of T-Mobile customer mobile devices that were unable to connect to the VoLTE or Voice over Wi-Fi networks after 10 seconds fell back to T-Mobile’s 3G and 2G circuit-switched networks to make and receive calls while the device continued its registration attempts to the VoLTE network. The large number of devices attempting to fall back to the 3G and 2G networks created intermittent congestion in those networks, too. When 3G and 2G calls began to fail due to that congestion, the network nodes that choose gateways for IP Multimedia Subsystem calls destined for those networks would hold the resources for these call sessions after the call terminated. These abandoned call sessions’ resource reservations overwhelmed these nodes’ computing resources, which caused many 3G calls and 2G calls to fail.
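A simplified sketch of the resource exhaustion just described: each failed call under congestion leaves a session reservation behind on the gateway-selection node, so capacity fills up and later calls are blocked outright. The class name, capacity figure, and cleanup flag are hypothetical illustrations, not T-Mobile’s implementation.

```python
# Hypothetical model of a gateway-selection node that reserves per-call resources
# but never releases reservations for calls that fail under congestion.
class GatewaySelectionNode:
    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = 0

    def place_call(self, call_succeeds, release_on_failure=False):
        if self.reserved >= self.capacity:
            return "blocked"                 # node has no computing resources left
        self.reserved += 1                   # reserve resources for the call session
        if call_succeeds:
            self.reserved -= 1               # normal teardown releases the reservation
            return "completed"
        if release_on_failure:
            self.reserved -= 1               # cleanup that would prevent the leak
        return "failed"                      # abandoned session keeps its reservation

node = GatewaySelectionNode(capacity=100)
# Under congestion most attempts fail; without cleanup the node fills up and
# begins blocking every later call, including 911 calls routed through it.
outcomes = [node.place_call(call_succeeds=False) for _ in range(150)]
print(outcomes.count("failed"), outcomes.count("blocked"))   # 100 failed, 50 blocked
```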
18. Spread to 911 Networks. While mobile devices do not need to have their registration authenticated in order to complete 911 calls, 911 calls were nonetheless affected by the 3G and 2G network congestion because the same network nodes that choose gateways for calls destined for 3G and 2G networks also choose gateways for 911 calls. When those nodes’ computing resources became overwhelmed by abandoned call sessions’ resource reservations, it also caused many 911 calls to fail. C. Mitigation and Restoration Efforts 19. T-Mobile noticed service disruptions in its LTE network starting with the initial link failure at 12:33 PM EDT on June 15, 2020, when it confirmed that a fiber transport link to one of its routers failed. From 12:45 PM EDT until 3:22 PM EDT, T-Mobile believed the service disruptions were caused by either the new router it connected to the network or the link that had failed at the start of the outage, which led T-Mobile engineers to manually shut down the external link to the new router. T-Mobile filed a notification in NORS at 3:06 PM EDT. T-Mobile filed its initial report on June 18, 2020 at 2:57 PM EDT and its final report on July 15, 2020. PSHSB also reviewed NORS reports filed in connection with the outage from AT&T, US Cellular, and Verizon. 20. From that point onward, T-Mobile’s restoration efforts were primarily focused on quelling the registration congestion. Although T-Mobile had not yet diagnosed or fixed the software error, T-Mobile reduced the number of registration retries allowed by the IP Multimedia Subsystem registration nodes. T-Mobile attempted to mitigate some of the congestion by requesting that its wholesale transport provider, Inteliquent, Inc. (Inteliquent), lock inbound local and long-distance traffic. In instances in which T-Mobile does not directly interconnect with another telecommunications service provider to deliver voice calls, T-Mobile generally has arranged to exchange voice calls indirectly via Inteliquent’s network (e.g., by designating Inteliquent as T-Mobile’s default tandem). Further, T-Mobile increased the capacity of the registration system by activating additional IP Multimedia Subsystem registration nodes. T-Mobile also turned off those nodes’ overload controls, which had transferred excessive signaling to other, regional nodes, spreading the outage beyond the Atlanta market. Finally, T-Mobile restarted, removed, and replaced some of the nodes that choose gateways for IP Multimedia Subsystem calls destined for 3G, 2G, and 911 networks to clear congestion from them. Together, these changes reduced network congestion and restored the network to a normal working state at 12:46 AM EDT on June 16. 21. Public Communication. T-Mobile also attempted to mitigate the effect of the outage by communicating with its subscribers through a variety of channels, including direct communications with subscribers (both individual and enterprise) as well as public statements and responses to media inquiries. Neville Ray, T-Mobile President of Technology, first confirmed the existence of the outage for T-Mobile’s subscribers and the public on Twitter at 4:18 PM EDT by describing it as a “voice and data issue that has been affecting customers around the country,” and he would later tweet to encourage people to use over-the-top voice apps like WhatsApp, iMessage, Signal, and FaceTime. Neville Ray, Twitter, https://twitter.com/NevilleRay/status/1272624569707184128 (last visited Aug. 17, 2020); Neville Ray, Twitter, https://twitter.com/NevilleRay/status/1272650750665953280 (last visited Aug. 18, 2020). T-Mobile tweeted in Spanish to confirm the existence of the outage at 6:25 PM EDT. T-Mobile Latino, Twitter, https://twitter.com/TMobileLatino/status/1272656463148781568 (last visited Aug. 17, 2020).
At 7:00 PM EDT, T-Mobile posted an outage statement as a splash screen across all of their digital properties. T-Mobile also used Twitter to announce that the outage had been resolved. 22. Over-the-top Voice Applications. T-Mobile states that the availability of over-the-top voice applications, such as WhatsApp calling, Facetime, and Facebook Messenger, may have mitigated the customer impact of the outage. Consistent with T-Mobile’s recommendation via Twitter, some T-Mobile subscribers completed calls during the outage in this way. T-Mobile states that over-the-top voice applications were functional and used by customers throughout this outage with the exception of short periods in the Atlanta market when LTE data was unavailable. The LTE data on which over-the-top voice applications rely continued to be operational during this outage because LTE data does not require registration with the IP Multimedia Subsystem. Although T-Mobile states that it cannot calculate the exact number of over-the-top calls made during the outage, its data suggests that customers used over-the-top applications to make voice calls during the outage. Not all over-the-top calls carried by T-Mobile’s network were necessarily successful. Some of these calls may have failed because of the data outage in the Atlanta market and because while some over-the-top calling apps are data-only (e.g., WhatsApp), other over-the-top calling apps allow calls to use the public-switched telephone network (e.g., Skype-out), which could have been affected by this outage. The increase in over-the-top traffic was likely driven by users’ inability to complete calls over T-Mobile’s network. 23. PSAP Notification. T-Mobile also attempted to mitigate the effect of the outage by notifying PSAPs. T-Mobile avers that it began notifying all the PSAPs with which it is connected nationwide via email and/or phone call immediately after determining that the outage was reportable. T-Mobile states that it updates PSAP contact information promptly upon receiving updates from PSAPs, and proactively requests the PSAPs to update their contact information twice a year. Specifically, T-Mobile states that it determined the outage was reportable at 2:35 PM EDT and that it began notifying PSAPs at 2:41 PM EDT. T-Mobile’s notification to PSAPs stated that “911 calls are still completing” and warned PSAPs only that the delivery of location information may be affected. Immediately after sending this email message, T-Mobile placed automated phone calls to those PSAPs around the country that had informed T-Mobile that they require telephonic confirmation of outage notifications. T-Mobile’s Initial PSAP Notification stated that “T-Mobile is notifying you that it is working to resolve a network degradation that may impact the delivery of location information to your PSAP(s). T-Mobile understands that voice calls to 911 are still completing. The FCC will be notified when T-Mobile files the appropriate Outage Notification. T-Mobile’s Network Operations Center is available for any inquiries 24 hours a day, 7 days a week at [phone number omitted], Option #7. Please reference Trouble Ticket [number omitted] when contacting T-Mobile.” Although this notification understated the impact of the outage on 911 calling, T-Mobile states that the information it provided to PSAPs was consistent with its understanding of the outage’s impact at that time. 
T-Mobile did not follow up with PSAPs to update their understanding of how the outage may be affecting them until it began sending emails and/or phone calls to the same PSAPs to inform them that the outage was resolved. T-Mobile’s Follow-Up PSAP Notification stated that “The network degradation T-Mobile recently informed you of (reference Trouble Ticket [number omitted]) is resolved. T-Mobile’s Network Operations Center is available for any inquiries 24 hours a day, 7 days a week at [phone number omitted], Option #7. Please reference Trouble Ticket [number omitted] when contacting T-Mobile.” 24. PSHSB elicited input from public safety entities on this outage’s impact and the notification that they received from T-Mobile. PSHSB did not receive complaints from PSAPs that T-Mobile did not notify them, and PSAPs did not submit comments on the record raising concerns that T-Mobile’s PSAP notifications were inaccurate. Jefferson County, Colorado Emergency Communications Authority (Jefferson County, Colorado) commented that it did not receive an official notification from T-Mobile about the outage until 3:00 PM EDT and that, because of that lapse in time, it “was forced to resort to piecing available information together to discover the scope of the outage,” including later trying to determine which service providers were impacted by the outage. Jefferson County Communications Center Authority July 2, 2020, Comments at 2. 25. Public Notification by PSAPs. Public safety officials in Seminole County, Florida and Jefferson County, Colorado sent emergency alerts using the Emergency Alert System and Wireless Emergency Alerts to mitigate the outage’s impact on the public’s access to emergency services. Both entities’ alerts informed the public that 911 service was down for “some carriers,” and instructed the public that, if they needed emergency assistance, they should call a PSAP’s alternative 10-digit number, which they included in the body of the message. The public notification that these PSAPs provided is a standard response to an observed degradation of 911. T-Mobile’s PSAP notification did not include sufficient information to inform PSAPs about how the public could work around this outage. Jefferson County, Colorado states that, when it sent out its alert at 7:02 PM EDT, it received “122 emergency and administrative calls in the next [five] . . . minutes,” Id. at 1. suggesting that there was significant pent-up demand to reach the PSAP because callers were not able to reach it by dialing 911. Jefferson County, Colorado states that it received over 1,800 calls to its administrative line on June 15, 2020, more than on any other single day ever. Id. Some public safety officials also notified the public about T-Mobile’s outage via social media. For example, Allegheny County, Pennsylvania tweeted that some cell phone callers may not be able to call 911; Allegheny County, Twitter, https://twitter.com/Allegheny_Co/status/1272636542276849664 (last visited Aug. 21, 2020). the Harris County, Texas Sheriff’s Office tweeted that T-Mobile’s outage was affecting 911 service; Harris County Sheriff’s Office, Twitter, https://twitter.com/HCSOTexas/status/1272625869492715521 (last visited Aug. 21, 2020). and the South Salt Lake, Utah Police Department posted on Facebook that texting 911 and its administrative line may work for T-Mobile customers while 911 service was down. South Salt Lake Police Department, Facebook, https://www.facebook.com/SSLPD/posts/1215015118846786 (last visited Aug. 21, 2020). IV. 
ANALYSIS A. Impact on Calls over T-Mobile’s Network 26. T-Mobile states that its network experienced an 18% reduction in completed calls during the over-12-hour period of the outage when compared to the same period during the previous Monday. T-Mobile states that some T-Mobile customers nationwide may have experienced intermittent issues placing and receiving calls over VoLTE, depositing and retrieving voicemails over LTE, and sending and receiving text messages. T-Mobile also states that, with the exception of subscribers in the Atlanta market during the initial transport link failure and market isolation, all T-Mobile customers were able to use LTE data services throughout the event. T-Mobile states that this outage was unrelated to any work to integrate the T-Mobile and Sprint networks, and legacy Sprint customers were unaffected. 27. Critically, an 18% reduction in call success does not mean that only 18% of calls on T-Mobile’s network failed during the outage. T-Mobile acknowledges that network congestion likely required many of its subscribers to make 2-3 call attempts before successfully connecting. The record in this proceeding demonstrates that consumers were frustrated that their calls failed, rather than satisfied because their calls eventually succeeded after many retries. See, e.g., Denae Jones June 24, 2020, Comments at 1; Mackenzie Rouse and Jake Rouse June 26, 2020, Comments at 1; Franco Eulogio Mau June 24, 2020, Comments at 1; Menachem R. June 24, 2020, Comments at 1; Scott Sprague June 23, 2020, Comments at 1. 28. As such, PSHSB requested that T-Mobile disclose the number of failed calls during the outage, rather than the reduction in completed calls. T-Mobile states that it is limited in its ability to measure call failures. T-Mobile states that it cannot provide estimates of the total number of call attempts, nor the total number of failed calls during the outage because its network does not record call attempts that failed to successfully register with the IP Multimedia Subsystem or that failed during the earliest stages of the multi-stage call completion process. T-Mobile also states that it cannot accurately estimate the number of calls originating from other service providers (i.e., calls sent to, rather than by, T-Mobile subscribers) that failed while attempting to reach T-Mobile’s network. T-Mobile was able to estimate the number of call attempts that failed as a result of the congestion on its 3G and 2G networks because those calls failed after registration was complete. T-Mobile states that its customers were generally able to place calls and send messages on the T-Mobile network using 2G, 3G, and over-the-top voice and text applications, with “some limitations.” In short, the outage measurements that T-Mobile provided to PSHSB in the course of its investigation likely do not fully capture the call failures caused by T-Mobile’s outage nor accurately reflect the consumer experience of the outage.
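To see why an 18% drop in completed calls can understate the share of individual call attempts that failed, consider the illustrative arithmetic below. The call volumes and the 2.5-attempts-per-success figure are hypothetical round numbers chosen only to reflect the report’s statement that many subscribers needed 2-3 attempts; they are not T-Mobile data.

```python
# Hypothetical arithmetic only; these volumes are not T-Mobile's actual figures.
baseline_completed = 1_000_000                 # completed calls in the comparison window
completed = baseline_completed * (1 - 0.18)    # an 18% reduction in completed calls
attempts_per_success = 2.5                     # many subscribers needed 2-3 attempts
attempts = completed * attempts_per_success
failed_attempts = attempts - completed
print(f"completed calls:  {completed:,.0f}")
print(f"call attempts:    {attempts:,.0f}")
print(f"failed attempts:  {failed_attempts:,.0f} ({failed_attempts / attempts:.0%} of attempts)")
# Under these assumptions, 60% of attempts fail even though completed calls fall only 18%.
```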
B. Impact on Calls Initiated Outside T-Mobile’s Network 29. Although T-Mobile states that it cannot accurately estimate the number of calls initiated outside T-Mobile’s network that failed due to the outage, T-Mobile did estimate that the outage resulted in an incremental loss of 1.5% of calls originating on other carrier networks that T-Mobile states it would have attempted to complete on a normal day. PSHSB’s investigation of this outage, however, gives it access to data that was not available to T-Mobile when T-Mobile estimated its outage’s impact. PSHSB finds that T-Mobile’s estimate of failed calls originating from other service providers’ networks is significantly lower than, and conflicts with, some of those providers’ estimates. PSHSB estimates, based on confidential and non-confidential data that other service providers shared with PSHSB, that over 250 million calls (or 73% of calls shared with PSHSB) from other service providers’ subscribers to T-Mobile subscribers failed due to the outage. 30. Based on confidential call success and 3G and 2G call failure data shared by T-Mobile, together with data on 911 calls and calls originating outside of T-Mobile’s network, PSHSB estimates that at least 41% of all calls that attempted to use T-Mobile’s network during the outage did not complete successfully. This estimate does not include any possible call failures arising from T-Mobile subscribers’ VoLTE or Voice over Wi-Fi call attempts, which could not be determined. However, PSHSB expects that if this number could be determined, it would result in PSHSB’s estimate being much larger. 31. CenturyLink. CenturyLink found that the number of failed calls on an average July Monday was only 0.195% of the number of failed calls it experienced on June 15, and that the number of dropped calls on an average July Monday was only 0.053% of the number of dropped calls it experienced on June 15. In other words, as a result of this outage, CenturyLink experienced more than 500 times more failed calls and more than 1,800 times more dropped calls than on an average Monday. 32. US Cellular. US Cellular reports that 285,497 calls from its network successfully completed into T-Mobile’s network during the outage, as compared to 951,271 during the same period on the preceding Monday, a 70% reduction in call success. US Cellular states that during two periods, between 3:10 – 5:30 PM EDT and 8:20 – 9:20 PM EDT, most calls (99%) were blocked as compared to the average Monday, when 1.9% of calls are blocked. US Cellular indicates that a higher percentage of calls did go through between 5:30 PM EDT and 8:20 PM EDT. Further, US Cellular reports that it received 308,766 calls from T-Mobile’s network during the outage as compared to 1,032,543 during the same period on the preceding Monday, a 70% reduction in call success. 33. AT&T. AT&T reports that 30,410,776 AT&T Mobility calls and 93,459 wireline calls from AT&T networks were blocked from delivery to T-Mobile’s network during the outage, as compared to 213,704 combined AT&T Mobility and wireline calls blocked from delivery to T-Mobile’s network on an average Monday. AT&T estimates that it experienced over 99.9% call blocking from AT&T’s network to T-Mobile’s network from 2:00 PM EDT to 6:00 PM EDT, and over 90% call blocking from AT&T’s network to T-Mobile’s network from approximately 6:00 PM EDT to 11:00 PM EDT. AT&T further reports that 8.2 million calls from AT&T Mobility’s wireless network successfully completed into T-Mobile’s network during the outage, as compared to 30.5 million during the same period of the preceding Monday, a 73% reduction in call success. Further, AT&T estimates that 940,000 calls were successfully completed from T-Mobile to AT&T Mobility during the outage period as compared to 19.6 million during the same period on the preceding Monday, a 95% reduction in call success. 34. Verizon. Verizon reports that approximately 11,800,000 Verizon Wireless calls and 373,460 wireline calls were blocked from delivery to T-Mobile’s network during the outage, as compared to less than 10 per hour on an average Monday.
Verizon states that, to the best of its knowledge, all wireline calls to the T-Mobile network were blocked for the entirety of the event. With respect to wireless calls, Verizon states that, at 12:33 PM EDT, the failure rate to T-Mobile was 18%; at 2:48 PM EDT, it oscillated between 70% and 95%; at 5:25 PM EDT, it fell to around 50%; and at 10:31 PM EDT, it fell to 8%. By 11:45 PM EDT, the call failure rate fell to Verizon’s baseline call failure rates for a typical Monday. C. Impact on 911 Calls Handled by T-Mobile 35. According to T-Mobile, the outage prevented 23,621 of the 134,874 calls to 911 that reached T-Mobile’s network (17.5%) from reaching PSAPs. 911 calls made over T-Mobile’s VoLTE and Voice over Wi-Fi networks completed at a higher rate than other calls made during the outage because T-Mobile’s 911 infrastructure did not require 911 callers to register using the congested registration nodes. The 911 calls that failed on T-Mobile’s network did so for several reasons:
· 7,469 calls to 911 failed due to congestion after reaching the part of T-Mobile’s IP Multimedia Subsystem that sends calls to the 911 network.
· 16,152 calls to 911 failed due to congestion after reaching T-Mobile’s 3G and 2G circuit-switched networks.
36. Within this count of failed 911 calls due to congestion, 2,501 calls to 911 failed because of congestion on PSAP administrative lines, which would have resulted in 911 callers receiving a busy signal. T-Mobile states that these calls failed due to either a lack of availability of 911 call-takers or the fact that the administrative line may have been congested by users testing their ability to reach 911. In addition, 1,128 calls to 911 failed because of congestion at a national emergency call center used by T-Mobile to deliver calls that cannot otherwise be routed. T-Mobile also counts within this total 572 calls to California PSAPs that failed because they were not default routed, which T-Mobile asserts to be consistent with state requirements. Cal. Gov't Code § 8592.8-.9 (2017) (stating that a 911 call may be routed to the California Highway Patrol call center or local PSAP after an annual review assesses the appropriate call routing to “maximize the efficiency of the 911 system” based on where the 911 call originates, whether routing is “economically and technologically feasible,” and whether routing “benefit[s] public safety”). T-Mobile delivered location information for 134,524 emergency calls during the outage. Of these, it provided Phase II location information for 111,454 calls, suggesting that only Phase I location was delivered for 23,070 calls. 37. As with its analysis of failed calls during T-Mobile’s outage, PSHSB expects that T-Mobile’s records as to the number of attempted 911 calls are more likely to be complete than T-Mobile’s overall calling records because the 911 calls that did fail during the outage failed because of congestion, not because of an inability to register. PSHSB expects that the average customer may have needed to make multiple call attempts to reach 911. T-Mobile states that it saw a 30% increase in 911 call attempts on June 15 as compared to the average number of 911 calls attempted during the previous two weeks. Some of this increase may have been due to “911 Hang-up/Checks,” where the initial caller called 911 just to confirm whether they would be able to reach 911 call-takers in the event of an emergency, but then hung up, prompting the PSAP to call back.
Jefferson County Communications Center Authority July 2, 2020, Comments at 1. Jefferson County, Colorado states that this outage generated a higher-than-normal number of “911 Hang-up/Checks,” which may have delayed needed emergency services because of the need for the PSAP to call back. Id. D. Public Impact 38. Comments confirm PSHSB’s findings that the public lost data service in some areas during this outage, and lost calling and texting services nationwide. Due to the congestion in the IP Multimedia Subsystem, T-Mobile customers would have experienced random call success as the network would have blocked normal public calls indiscriminately. This would mean that, while some callers might have been able to connect a call on the first or second try, many other callers would have needed many more call attempts to complete a call. The Bureau highlights the following themes extracted from these public comments to illustrate this outage’s impact. 39. Based on the record, the June 15 outage on T-Mobile’s networks prevented some consumers from summoning the help that they needed during emergencies. Not only were some consumers unable to reach PSAPs by dialing 911, but they also were unable to reach roadside-service providers, medical professionals, and family. One commenter noted that his mother, who has dementia, could not reach him after her car would not start and her roadside-assistance provider could not call her to clarify her location; she was stranded for seven hours but eventually contacted her son via a friend’s WhatsApp. Kevin Fuhr June 23, 2020, Comments at 1. One medical professional said that this outage prevented him from connecting to his patients for telephone and video appointments, exacerbating the stress induced by the COVID-19 pandemic. Jake Walsh June 29, 2020, Comments at 1. Another medical professional said he felt lucky that he was not on-call at his hospital on June 15 because he was unreachable due to his cellphone being his primary link to the hospital. Vincent Romanelli June 24, 2020, Comments at 1. Two commenters described being unable to alert family about surgery complications while alone in the hospital due to COVID-19-related restrictive guest policies. See, e.g., Dawn Allen June 25, 2020, Comments at 1 (“I ended up being hospitalized for what was suppose[d] to be an outpatient procedure. I was unable to be reached by my family[,] . . . trying to get a status on my care[,] nor could I reach them. Being in pain and hospitalized while alone with no communication or explanation as to why [telecommunication] services were disrupted compounded…my situation.”); Doug Bass June 25, 2020, Comments at 1 (“I had my knee replaced at 8:45 . . . When I came out of surgery at 11:30, I was unable to get in touch with my mother (the single visitor I was allowed to have for the entire day…due to COVID). I had to wait until almost 2 pm when the nurse was able to get a phone (landline handset) to bring to my room for me to use.”). Fortunately, the Bureau did not receive any comments suggesting that individuals experienced physical harm as a direct result of this outage. 40. The outage likely produced a large financial impact for individuals, employees, and businesses. The record suggests that this outage resulted in a lost day of productivity for many who rely on communications networks to do their jobs. For example, one commenter, a social worker, could not communicate with at-risk children and families.
Xiomara Cosme June 24, 2020, Comments at 1 (“I am a Children and Youth Social Worker where most of my job duties are done from my phone. Due to the outage, I was not able to assess the safety of children or follow up with families that where in need.”). Another commenter stated that he missed client phone calls and text messages, which cost him more than $3,000 in billable hours. Jordan June 24, 2020, Comments at 1. Others could not accomplish basic work-related tasks such as scanning packages or using ride-sharing services to commute to work. Brian Elsman June 23, 2020, Comments at 1; Denae Jones June 24, 2020, Comments at 1. Others expressed frustration at paying for T-Mobile’s wireless service, and then not being compensated by T-Mobile when that service became unavailable. Denae Jones June 24, 2020, Comments at 1 ("T[-M]obile never sent a message, nor an apology[, n]or compensated/credited me for that day.”); Mackenzie Rouse and Jake Rouse June 26, 2020, Comments at 1 ("T-Mobile has stated they won’t credit our account because of the outage and I think that’s wrong.”); Franco Eulogio Mau June 24, 2020, Comments at 1(“No refunds.”). The record does show, however, that some individuals who contacted T-Mobile to complain about the outage received $5 or $10. Menachem R. June 24, 2020, Comments at 1 (“A lot of business lost . . . For my issues over two business days, I was given $10 . . . [D]isgusting!”); Scott Sprague June 23, 2020, Comments at 1 (“I reached out to T-Mobile that evening to see what was going on and if I could get some sort of credit from them. I fought with them for over two hours because all they wanted to credit me was five dollars when I had missed out on a couple of hundred dollar jobs.”); Nikki Gilbert July 15, 2020, Ex Parte at 1 (“You charge [$]75.00 or more for family plans but yet you offer me [$]5.00 in a crisis?  Sad.”). 41. Based on the record, the effects of the COVID-19 pandemic on consumers’ employment circumstances appear to have worsened this outage’s disruption of people’s work lives. One commenter stated that he was unable to remotely log in to his workplace because it requires two-factor authentication, which did not function properly due to the outage. George Rasko June 24, 2020, Comments at 1 (“My employer requires two-factor authorization for many on-line activities. Logging-in with a password isn't enough . . . During the T[-]Mobile outage, the IT Help Desk spent two hours trying to figure out why . . . I was not getting a call to finalize my computer login.”). Several commenters stated that the outage caused them to miss job opportunities, including phone interviews, Rachel Church June 24, 2020, Comments at 1; Jordan Abad June 23, 2020, Comments at 1; Michael Farough June 24, 2020, Comments at 1; Samantha Dixon June 24, 2020, Comments at 1. and frustrated furloughed or laid-off employees’ ability to file for unemployment. Martin Jamison-LeGere June 24, 2020, Comments at 1. 42. PSHSB received more than 60 comments reporting a lack of communication from T-Mobile. One commenter states that he thought the issue was his phone, so he bought a new one. James Parziale July 1, 2020, Comments at 1. Another commenter states that he drove to multiple T-Mobile stores during the pandemic to identify the issue. David L. Risdon June 24, 2020, Comments at 1. One commenter states that he tried calling T-Mobile for more information, but T-Mobile did not answer, Brian Elsman June 23, 2020, Comments at 1 (stating that he eventually reached customer service via email). 
and another commenter states that he called T-Mobile, but the person he spoke to said it was an issue impacting all service providers. Victor Burns June 30, 2020, Comments at 1. Commenters also highlighted opportunities that T-Mobile could have taken to communicate more effectively with the public about this outage. Several commenters noted that T-Mobile could have alerted its customers about the outage via email or text message because LTE data service was available for many consumers. Michael Thaler June 24, 2020, Comments at 1 (“There is NO REASON for T-M not sending texts to ALL CUSTOMERS who might have been affected by the outage and to let us know they are working on it. SMS was working as was data—meaning they could have sent emails.”); Stephanie Christmas June 24, 2020, Comments at 1 (“No information about the outage was available on their website or Mobile App.”); Nathaniel Leandro June 23, 2020, Comments at 1 (“My data services were not interrupted and I have T-Mobile's carrier app installed on my phone. They should have sent a push notification…[or] text alerts.”); Kirk Ealy June 24, 2020, Comments at 1 (“No text from T-Mobile about the issue and at this time my ability to send and receive a SMS message was working! . . . The customer app showed no alert.”); but see J Bibi June 25, 2020, Comments at 1 (stating “I expect consumers to be knowledgeable enough . . . to perform a simple Google search to learn about the outage.”). V. CORRECTIVE ACTIONS BY T-MOBILE 43. After T-Mobile resolved the outage, it took steps to prevent a recurrence of a similar event. Specifically, T-Mobile:
· Optimized Open Shortest Path First weights on links connecting to the routers in the Atlanta market;
· Created a separate communications channel to enable T-Mobile to manage the affected router even during an outage condition so that, in the case of a recurrence of a similar event, T-Mobile would be able to restore the affected router to a working state more quickly after intentionally taking it offline;
· Augmented processes regarding the phased integration of new devices into the network to include additional potential failure scenarios like those seen in this outage;
· Activated additional IP Multimedia Subsystem registration nodes to increase capacity;
· Revised IP Multimedia Subsystem registration nodes’ overload settings for better management of overload conditions;
· Corrected the software error in the IP Multimedia Subsystem;
· Introduced additional dedicated 911 nodes to enhance resiliency;
· Reduced the number of retries allowed by the IP Multimedia Subsystem registration nodes responsible for managing the secure connection with the mobile device from 4 to 2;
· Improved the clarity and specificity of the error message generated on nodes that interconnect with external networks when IP Multimedia Subsystem services are impacted to facilitate future troubleshooting;
· Improved call distribution logic for Voice over Wi-Fi services to allow regional containment during potential future outages;
· Deployed new vendor software updates to improve IP Multimedia Subsystem node robustness and resiliency; and
· Audited multiple systems across the circuit-switch, IP Multimedia Subsystem, and transport networks for potential enhancements.
44.
While fiber link failures are common, PSHSB finds that these steps, taken together, will reduce the likelihood that a fiber link failure could result in the recurrence of a similar event in T-Mobile’s network because traffic would be routed to an alternative path that could handle it. Moreover, if such an event recurred on T-Mobile’s network, it would not cause such a large service disruption because T-Mobile would have improved its networks’ ability to manage congestion in the case of a similar event and would have increased network capacity to maintain the network in a working state even with an increased volume of traffic. VI. NEXT STEPS 45. The Bureau plans to engage in stakeholder outreach and guidance regarding industry-accepted, recommended network reliability best practices to protect against similar outages in the future. T-Mobile did not follow several network reliability best practices that could have prevented the outage, or at least mitigated its effects: · Network operators should periodically audit the physical and logical diversity called for by the design of their network segment(s) and take appropriate measures as needed. Communications Security, Reliability and Interoperability Council, Best Practice 12-9-0532 (2011), https://opendata.fcc.gov/Public-Safety/CSRIC-Best-Practices/qb45-rw2t/data; see also FCC, March 8, 2017 AT&T VoLTE 911 Outage Report and Recommendations, PS Docket No. 17-68 (PSHSB May 2017), https://apps.fcc.gov/edocs_public/attachmatch/DOC-344941A1.pdf (recommending the same). The router that dropped signaling traffic and precipitated this outage could never have provided functional diversity for the link that failed because the router was not provisioned to process the signaling traffic that the failed link carried. Further, T-Mobile could have prevented the outage if it had audited its network during the new router integration to ensure that the traffic destined for the failed link would redirect to a router that was able to pass it. If the backup route had operated as it was designed, a nationwide outage would likely not have occurred. · Network operators and service providers should consider validating upgrades, new procedures and commands in a lab or other test environment that simulates the target network and load prior to the first application in the field. Communications Security, Reliability and Interoperability Council, Best Practice 12-10-0559 (2011), https://opendata.fcc.gov/Public-Safety/CSRIC-Best-Practices/qb45-rw2t/data; see also Communications Security, Reliability and Interoperability Council, Best Practice 12-9-8748 (2011), https://opendata.fcc.gov/Public-Safety/CSRIC-Best-Practices/qb45-rw2t/data (stating that network operators, service providers, and equipment suppliers “should test new devices to identify unnecessary services, outdated software versions, missing patches, and misconfigurations, and validate compliance with or deviations from an organization’ s security policy prior to being placed on a network”); Communications Security, Reliability and Interoperability Council, Best Practice 12-9-8035 (2011), https://opendata.fcc.gov/Public-Safety/CSRIC-Best-Practices/qb45-rw2t/data (stating that network operators and service providers should include steps to appropriately test all patches and fixes in a test environment prior to distribution into the production environment in their patch/fix policy and process guidelines). 
T-Mobile had a latent software error in its network that it failed to identify and address before it had a catastrophic impact. Had T-Mobile validated its IP Multimedia Subsystem registration node software and router integration in a test environment that simulated the relevant network segment, it could have discovered the software flaw and routing misconfiguration before they could impact live calls. · Service providers should use virtual interfaces for routing protocols and network management to maintain connectivity to network elements in the presence of physical interface outages. Communications Security, Reliability and Interoperability Council, Best Practice 12-10-0409 (2011), https://opendata.fcc.gov/Public-Safety/CSRIC-Best-Practices/qb45-rw2t/data. The most severe impact on calling that this outage caused occurred when T-Mobile engineers intentionally took down a link in the course of troubleshooting and then were unable to restore it for an hour. Had T-Mobile maintained a separate communications channel to enable it to manage the affected router even when they took the suspected link down during troubleshooting, they could have maintained superior visibility into the network and potentially resolved the outage more quickly. T-Mobile implemented this best practice as a corrective action to prevent a recurrence of this event. CenturyLink’s failure to implement this best practice also contributed to their December 2018 outage. Public Safety and Homeland Security Bureau, December 27, 2018 CenturyLink Network Outage Report (2019), https://docs.fcc.gov/public/attachments/DOC-359134A1.pdf. PSHSB issued a Public Notice to remind industry of the importance of implementing it. Public Safety and Homeland Security Bureau Encourages Communications Service Providers to Implement Important Network Reliability Best Practices, Public Notice, 34 FCC Rcd 9453 (PSHSB Oct. 15, 2019). · Network operators and service providers should actively monitor and manage 911 network components using network management controls, where available, to quickly restore 911 service and provide priority repair during network failure events. Communications Security, Reliability and Interoperability Council, Best Practice 12-9-0574 (2011), https://opendata.fcc.gov/Public-Safety/CSRIC-Best-Practices/qb45-rw2t/data. Reasonable 911 network monitoring would have revealed to T-Mobile in real time that the outage was causing call blocking on PSAP administrative lines, but the content of T-Mobile’s PSAP notification manifests that it likely did not understand the extent of its outage’s 911 impact while it was occurring. Had T-Mobile actively monitored its 911 network components, it might have been able to provide more accurate PSAP notification. 46. As a result of its investigation, PSHSB has also identified network reliability issues that network reliability standards bodies could study: · Whether VoLTE providers should prioritize redundancy for links that provide transport for signaling and registration traffic between IP Multimedia Subsystem cores and other networks; and · Whether, during any provisioning or rearrangement of IP Multimedia Subsystem routes, a VoLTE provider should prioritize audits of all signaling and registration traffic that would need to be rerouted in the event that the IP Multimedia Subsystem becomes unavailable. T-Mobile could have potentially prevented this outage if it had conducted such an audit as part of the rearrangement and provisioning process. 
Note that the audit would not have been timely if scheduled for a later date because the failure occurred almost immediately after backup facilities were rearranged. 47. In keeping with past practice, the Bureau plans to release a Public Notice, based on its analysis of this and other recent outages, reminding companies of industry-accepted best practices, including those recommended by the Communications Security, Reliability and Interoperability Council, and their importance. See, e.g., Public Safety and Homeland Security Bureau Encourages Communications Service Providers to Follow Best Practices to Help Ensure Network Reliability, Public Notice, 33 FCC Rcd 3776 (PSHSB 2018). In addition, the Bureau will contact other major transport providers to discuss their network practices and will offer its assistance to smaller providers to help ensure that our nation’s communications networks remain robust, reliable, and resilient.

APPENDIX A: Outage Timeline

Time (EDT) | Event
12:33 PM | A fiber transport link in the southeast region of T-Mobile’s VoLTE network failed.
12:45 PM | The fiber transport link was repaired without intervention.
| T-Mobile manually shut down the link to the new router in an attempt to transfer traffic away from the link it suspected was responsible for the ongoing outage.
2:41 PM | T-Mobile began notifying PSAPs.
~3:00 PM | IP Multimedia Subsystem VoLTE and Voice over Wi-Fi registrations began to fail nationwide as all IP Multimedia Subsystem regional registration nodes became increasingly congested.
3:06 PM | T-Mobile filed a notification in NORS.
| T-Mobile reduced the number of registration retries allowed by the IP Multimedia Subsystem registration nodes.
4:18 PM | Neville Ray, T-Mobile President of Technology, first confirmed the existence of the outage for T-Mobile’s subscribers and the public on Twitter by describing it as a “voice and data issue that has been affecting customers around the country.”
| T-Mobile attempted to mitigate the outage by requesting that its wholesale transport provider, Inteliquent, lock inbound local and long-distance traffic.
| T-Mobile increased the capacity of the registration system by activating additional IP Multimedia Subsystem registration nodes.
6:25 PM | T-Mobile tweeted in Spanish to confirm the existence of the outage.
7:00 PM | T-Mobile posted an outage statement as a splash screen across all of its digital properties.
| T-Mobile turned off the IP Multimedia Subsystem registration nodes’ overload controls.
| T-Mobile restarted some of the nodes that choose gateways for IP Multimedia Subsystem calls destined for 3G, 2G, and 911 networks in order to clear congestion from them.
June 16, 12:46 AM | T-Mobile network restored to a normal working state.
| T-Mobile began sending emails and/or phone calls to the same PSAPs to inform them that the outage was resolved.