
Cause of FAA Outage

Ok ...

Some of this is what I do for a living. Couple of things here ... the airspace in the US wasn't closed, there was a ground stop. Big difference. The infrastructure the FAA uses for NOTAM message delivery is very old. Lastly, I don't mean to scare people, but failures of a primary system/application and its backup happen ... a lot ... maybe not so much on critical systems ... but I've been in the commercial aviation industry for 23 years (as of yesterday), and the FAA RFO doesn't surprise me at all. I'm just happy it wasn't our systems that failed ...
 
It was stated that all air traffic was grounded for the first time since 9/11, so that sounds big. What do you think it was due to?
 
Tucker with an interesting idea on what occurred.

Those that you love, tell them so every day.

If you have questions about your eternal destination, ask someone.
Please. Your soul will survive forever. Somewhere.

There is NO chance the main system AND the backup system went down at the same time.

What?
If next time they don't want money, just carnage?

Think it can't happen?
It can.
And I believe such will happen in the next 5-10 years, if not before.
 
It was stated that all air traffic was grounded for the first time since 9/11, so that sounds big. What do you think it was due to?
Simple: a combination of painfully antiquated IT infrastructure for a key backend system and good ole human error. The more antiquated your IT infrastructure is, the more complex/onerous it is to patch because it has to be manually patched and almost certainly has loads of configuration drift. There is a high probability of human error occurring at some point or another. Oh, and a system as old and complex as this one takes a while to fully reboot.

As someone who works in government IT on the infrastructure side, this news didn’t shock me in the least. There’s nothing sexy about prioritizing the modernization of aging IT infrastructure when that takes focus and dollars off of shiny new application features/functionality or general non-IT initiatives. Those kinds of things are quick wins that’ll make leadership look good, whereas legacy system modernization is complex, expensive, and can often take years to implement. Shareholders and most stakeholders don’t care one iota about infrastructure until it actually breaks and causes a work stoppage. And even then, they won’t really care until that happens multiple times. This goes for large non-government organizations as well; just ask Southwest Airlines or Yahoo and their crappy backend servers for Rivals.com.
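For anyone wondering what "configuration drift" looks like in practice, here is a minimal Python sketch. It assumes each server's settings can be dumped to a simple dictionary; the host names, setting names, and the `find_drift` helper are all made up for illustration and have nothing to do with the FAA's actual systems.

```python
def find_drift(baseline, hosts):
    """Return {host: {setting: (expected, actual)}} for settings that differ from the baseline."""
    drift = {}
    for host, config in hosts.items():
        diffs = {}
        # Check every setting that appears in either the baseline or the host.
        for key in sorted(set(baseline) | set(config)):
            expected = baseline.get(key)
            actual = config.get(key)
            if expected != actual:
                diffs[key] = (expected, actual)
        if diffs:
            drift[host] = diffs
    return drift


# Hypothetical settings: the backup was hand-patched and no longer matches.
baseline = {"db_version": "11.2", "max_connections": 200, "tls": True}
hosts = {
    "primary": {"db_version": "11.2", "max_connections": 200, "tls": True},
    "backup": {"db_version": "11.1", "max_connections": 200, "tls": True, "debug": True},
}

report = find_drift(baseline, hosts)
print(report)
```

The more servers that have been patched by hand over the years, the longer a report like this gets, and every entry in it is a place where a maintenance procedure written for one box can go wrong on another.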
 
It was stated that all air traffic was grounded for the first time since 9/11, so that sounds big. What do you think it was due to?
Oh, it was a big deal, but it was a ground stop. A ground stop is different than closing the airspace, which is what happened on 9/11. On 9/11, en route aircraft had to land at the nearest airport. For example, at the airport in Gander (the first airport in North America that is en route for most aircraft coming from Europe), 42 aircraft were forced to land. It was crazy because no one was allowed to deplane until all passengers were processed and threats assessed. That meant there were aircraft on the tarmac that were running out of fuel and whose lavatories were overflowing.

So, no, Tucker ... it wasn't the same as 9/11, because with a ground stop en route aircraft can still continue to their destination airport. Also ... Tucker ... to say that the FAA and CAA systems are completely separate ... that is a gross oversimplification and not really true.

Based on the info we have received from the FAA, the outage was due to a cascading database failure on both the primary and redundant platforms. The reason for that could be ... any number of things, but it was probably aging infrastructure and/or human error around a planned maintenance period. That makes sense to me.
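To show how a primary and its backup can go down together without anyone attacking anything, here is a toy Python model. It is only a sketch under stated assumptions: the `Database` class, the `corrupt` flag, and the naive `replicate` function are invented for illustration and are not a description of how the NOTAM platform is actually built.

```python
class Database:
    """Toy database that falls over when it ingests a malformed record."""

    def __init__(self, name):
        self.name = name
        self.records = []
        self.healthy = True

    def apply(self, record):
        self.records.append(record)
        # Assumed failure mode: one bad record takes the whole instance down.
        if record.get("corrupt"):
            self.healthy = False


def replicate(primary, backup):
    """Naive sync: copy whatever the backup is missing, with no validation."""
    for record in primary.records[len(backup.records):]:
        backup.apply(record)


primary = Database("primary")
backup = Database("backup")

primary.apply({"id": 1, "text": "RWY 09 closed"})
replicate(primary, backup)

# A bad record is written during maintenance and then faithfully copied over.
primary.apply({"id": 2, "corrupt": True})
replicate(primary, backup)

status = (primary.healthy, backup.healthy)
print(status)  # (False, False)
```

The backup protects against the primary's hardware dying, but if it trusts the primary's data it inherits the primary's data problems, which is one plausible way a failure cascades across both.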

Could it have been a cyber attack and they just aren't telling us? Possibly ... can't rule it out, but I seriously doubt it. My company (Collins Aerospace/Raytheon) would've had to be informed by the FAA so that we could conduct an investigation of our systems that support the FAA, and that hasn't been requested or proactively done (that I can say for certain).

You need to understand that these systems mostly operate on closed infrastructure, so they aren't open to the internet or any outside networks. Any cyber attack would have to be done locally, and I can't even begin to stress how ridiculous the security is around these facilities. We bid on a recent FAA contract (current incumbent is Harris) where we would've had to build an Operations Center in our Annapolis facility. The security component was extensive. The FAA program employees couldn't even share break room/bathroom facilities with non-program employees.


I'm babbling now ... so what do I think happened? I think it was planned maintenance that went terribly wrong for some reason. I fully expect we will see a NOTAM RFP in the next 12 to 18 months to completely replace/update that system.
 
Simple: a combination of painfully antiquated IT infrastructure for a key backend system and good ole human error. The more antiquated your IT infrastructure is, the more complex/onerous it is to patch because it has to be manually patched and almost certainly has loads of configuration drift. There is a high probability of human error occurring at some point or another. Oh, and a system as old and complex as this one takes a while to fully reboot.

As someone who works in government IT on the infrastructure side, this news didn’t shock me in the least. There’s nothing sexy about prioritizing the modernization of aging IT infrastructure when that takes focus and dollars off of shiny new application features/functionality or general non-IT initiatives. Those kinds of things are quick wins that’ll make leadership look good, whereas legacy system modernization is complex, expensive, and can often take years to implement. Shareholders and most stakeholders don’t care one iota about infrastructure until it actually breaks and causes a work stoppage. And even then, they won’t really care until that happens multiple times. This goes for large non-government organizations as well; just ask Southwest Airlines or Yahoo and their crappy backend servers for Rivals.com.
^ all these things!!!!!

Couldn't agree more.

Years ago Google had a bug in their ass that they were going to "revolutionize" the aviation industry. They failed miserably and ended up wasting hundreds of millions of dollars (a drop in the ocean for them), and the reason was that the airlines' response to their ideas was essentially "Ehhhhh, if it ain't broke, don't fix it".

Unless it's labor or fuel, airlines don't spend money on anything unless they absolutely have to.
 