It’s been more than two months since the
ransomware attack on KQED in San Francisco on June 15. Since then, every work
day feels like I am trying to run underwater. That's because life without
normal network services in a modern large broadcast operation (about 350
employees) is, to say the least, a major challenge.
Many of my daily activities are impossible,
while others require re-inventing the wheel or doing things the old way BC — before computers.
After consulting with expert security consultants and the FBI, KQED's initial decision not to pay the ransom and to make a full recovery on our own was confirmed as the best approach. Without going into too many details of
exactly how the attack was able to succeed, I will try to give you an idea of
what was affected and how we got through the problems.
My first clue that something was wrong was a
call from the Burk remote control informing me our Sacramento station had no
audio. The KQEI(FM) site audio is fed over an Intraplex via an MPLS data line.
We occasionally get short line drop-outs, so my first response was to call our master
control to get them to connect via ISDN, if they had not already done so.
I also turned on my little radio to make sure
our main San Francisco station had audio. It did. I could not get through on
the MCR hot line and quickly discovered I could not call any phones at KQED. We
have a VOIP phone system, and it was down hard.
OK, I thought, that all made sense: The MPLS
and the VOIP phone system share IP services. I reasoned there was a network
issue at the studio.
About this time, my colleague Steve Pinch,
our FM engineering IT expert, called me to let me know there was a virus attack
on the network, and he was headed back to work. He wouldn’t be able to go home
for a full night’s sleep for the next several days.
Not fully trusting VOIP, we have a second hot
line into MCR that provides a dial tone from the telco central office. Once I
got through, I found the ISDN was connected and audio had returned to our
Sacramento station. I could now assist the announcer in getting traffic reports on
the air by using our Telos phone system, which was connected to the telco
central office with a PRI and was working normally.
KQED(FM)'s Steve Pinch worked almost around the clock for days to develop work-arounds for every problem and showed the production and news people how to make them work.
Our Comrex BRIC-Link for getting traffic reports via AoIP was dead. The traffic reports would have to come from one of
our talk show lines for the next three days. It didn’t sound great, but it
worked. Our multiple streaming audio feeds to various streaming providers were
down. These would not be back until we cautiously began restoring the most
critical network services, 12 hours after the shutdown.
Steve and the IT staff immediately threw themselves into the problem, but I didn't have to be at work until the next
morning. It gave me time to ponder what the changes would be and what
work-arounds we would need to keep going.
The next morning I was met with hand-written
signs scattered throughout the building, warning people not to use their
computers or phones, and that information updates could be found on the white
board in the central atrium. Except for the fact we were on the air with normal
programming, it felt like we were back in the 19th century.
The IT staff, assisted by Steve from radio
engineering, worked at a feverish pace to keep what wasn’t infected safe and to
restore services. Virtually all ports on the network switches were immediately
shut off when the infection was detected. Until we knew exactly what happened
and how it spread through the network, “disconnect everything” was the
philosophy, in the hope the infection could be kept from doing any more damage.
This meant several things were turned off
that weren’t infected and couldn’t be infected. But as we all know, it’s better
to be safe than sorry.
That is why we lost the traffic report
BRIC-Link and program audio to the Sacramento transmitter and a lot of other
things. The Intraplex uses IP for its connection and the ports were shut off.
That’s also why ISDN was able to keep going for the next three days, no IP
needed. Higher priorities came first and then we could start getting other
things going again, port by port. The highest priority was staying on the air
with normal programming. On-air and news came first, and finding ways to keep
them going was a challenge for both the engineers and the news people.
We hired outside consultants to assist us in
protecting what was not infected, while at the same time getting services going
again. The FBI was on site to collect evidence of a crime. They copied the
entire contents of several PCs, including both those that had been left on and
those that had been power cycled. Other PCs, like mine, had been “touched”
earlier by the malware, but had been turned off before the actual attack. All
of these details helped determine just how the bad guys worked and perhaps gave clues as to who the culprit was.
In many ways, we were lucky. We have a great
IT department, good backups and the resources, both human and financial, to take the crisis head-on and keep going.
Our main bit of luck was that our on-air
broadcast systems for both TV and radio were not hit by the attack. More about
this later, but for now let’s just say it could have been a lot worse, were it
not for several good decisions made to keep critical systems isolated.
Dalet Galaxy, our news and production system, was not so lucky. It was hit, and in the end, the servers and clients needed to be completely reloaded. The Dalet database and file storage, including all audio, stories and metadata, were not hit, but they would not be accessible for weeks.
We were also lucky that our Public Radio
Satellite System equipment was unaffected and on the same network as our on-air
Dalet system. We were still receiving both live and non-real time programming.
Also, our Telos phone system was still working for the two-hour daily talk show
and news interviews. The assistant producer call screening program was not
working. It needed to link to the Telos on a different VLAN, and that link was
disconnected. Communication with the hosts became primitive and the next event
countdown clock was gone.
Why were some computers affected and not others? The infection agent gleaned a system password and found its way into the Active Directory server. From there, it found and attacked every PC that was on Active Directory and was powered up. All of this we would figure out later.
How did our on-air Dalet keep going? Dalet Radio Suite, our on-air system, was not on the KQED Active Directory. By design, Radio Suite had a separate domain and its own network switch. The ports on the network switch for Dalet Radio Suite were never turned off, but its link to the main KQED network was removed. It truly became an island with no outside connections.
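That kind of isolation is easy to spot-check. Below is a minimal sketch, in Python with hypothetical hostnames, of the sort of test an engineer might run from an "island" machine to confirm it really cannot reach the main corporate network:

```python
import socket

# Hypothetical corporate hosts that an isolated on-air machine should
# NOT be able to reach; the hostnames are made up for illustration.
CORPORATE_HOSTS = [
    ("dc01.corp.example.internal", 389),  # Active Directory (LDAP)
    ("mail.corp.example.internal", 25),   # corporate mail server
]

for host, port in CORPORATE_HOSTS:
    try:
        # If this connection succeeds, the island has a path it shouldn't.
        socket.create_connection((host, port), timeout=2.0).close()
        print(f"WARNING: island machine can reach {host}:{port}")
    except OSError:
        print(f"OK: {host}:{port} unreachable; isolation holding")
```

Run from the Radio Suite side, every line should come back "OK"; any "WARNING" means there is a bridge somewhere that defeats the island.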
Most production and news work was done with Dalet Galaxy, which was on the KQED Active Directory. On Thursday morning before the infection, we had about 50 working Galaxy servers and clients. By late Thursday afternoon, we had zero. However, each production room had one on-air Radio Suite computer mostly used by producers as a utility PC and call screener. Despite their reduced capacity, these Radio Suite PCs became the new main production workstations.
The SADiE computers we use for craft editing were not infected, but with the network disconnected, getting audio to and from
the computers became a real challenge. Dalet Galaxy computers in the news edit
rooms were replaced with Radio Suite PCs, and several were added in the
newsroom for editors and other news use.
HERE’S WHAT ELSE WAS AFFECTED
The Nautel Importer; the Arctic Palm HD and RDS scrolling information manager; the main shared production utility computer; the on-air utility computer; the main VOIP phone system computer; the computer at the KQED transmitter; most desktop computers on the network; and the building security system. The last is not a broadcast PC, but access rights could not be changed to get people into areas they needed to reach in order to fix things.
There were other computers and services that were not infected, but since we could no longer connect them to the KQED network, we could no longer use them. Those included our new Telos VX phone system; the PC that received Associated Press and Bay City News wires and passed them on to Dalet; audio file converters and file transfer between systems; the Burk remote control and the Burk Autopilot application; the EAS CAP network connection; and the transmitter status and control via IP. Our NTP server couldn't be reached by 50+ devices. The Comrex and Tieline devices were off for several weeks until we turned their network ports back on.
We quickly rebuilt the Nautel Importer, since we are part of the Broadcaster Traffic Consortium. It was a bit of a challenge to get it going on 64-bit Windows 7, but after several days we had it back in use. In the meantime, we discovered we could feed the Harris Importer to the Nautel Exporter, and it worked fine.
THE SLOW ROAD TO THE NEW NORMAL
IT set up a full-time help desk in the main atrium, and this is where staff could go to get laptops connected to the internet-only Wi-Fi.
At first, smartphones were also connected to
the Wi-Fi, but the Wi-Fi slowed down so much that they had to be removed. The
LTE at our building is ultra-fast and a better choice for the phones.
There were also several printers set up, and
requests for important files to be retrieved from the backups could be made
there, as well. After four weeks, the network printers were added to the Wi-Fi,
and people could print directly from their laptops.
Almost immediately after the attack, most people installed Slack, an instant messenger for business, on their smartphones, and that became the message service in place of email in the days after the attack. Although the main Exchange email server was down, we had a backup email service through Mimecast, which people were able to access in the first week.
Phones returned after two weeks.
After three weeks, most files could be
retrieved from the network by request and placed in Google Drive.
Before the attack, reporters could already get audio into Dalet from the field by using FTP. This became the main source for audio imports. Since the network connection was off for protection, a wireless dongle was added to the PC that runs the audio importing application so it could reach the FTP site over the internet. That Wi-Fi internet connection was shared with all of the KQED staff, and at times it ran so slowly the app would need to be restarted. Even when it was running, the change from a direct gigabit connection to a Wi-Fi link that would drop to 1 Mbps, combined with the huge increase in use, caused many extreme slowdowns.
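The reporter-side half of that workflow is simple enough to script. Here is a minimal sketch using Python's standard ftplib, with hypothetical host and credentials, wrapping the upload in a retry loop to cope with the kind of dropped connections an overloaded Wi-Fi link causes:

```python
import ftplib
import pathlib
import time

def upload_with_retry(host, user, password, local_path, retries=3, delay=10):
    """Upload one audio file over FTP, retrying if the link drops.

    The host and credentials are placeholders, not KQED's actual setup.
    """
    local = pathlib.Path(local_path)
    for attempt in range(1, retries + 1):
        try:
            with ftplib.FTP(host, user, password, timeout=30) as ftp:
                with local.open("rb") as f:
                    # STOR uploads the file under its original name
                    ftp.storbinary(f"STOR {local.name}", f)
            return True
        except ftplib.all_errors as exc:
            print(f"Attempt {attempt} failed: {exc}")
            time.sleep(delay)
    return False

# e.g. upload_with_retry("ftp.example.org", "reporter", "secret", "cut_0615.wav")
```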
One work-around was for the production people to copy files in real time over an audio cable between PCs in order to meet a deadline. At first, the use of USB drives to transfer files was prohibited; they can be used to spread viruses, and we weren't taking any chances. Later, some USB drives were used, but they had to be scanned by the IT department first.
NETWORK SERVICES RETURN
After patching network routers and switches, new filters could be created to limit which devices could be seen on the network. No PC needs to be able to see every other PC on the network, even if it is password protected; as in our case, network accounts can be compromised. To be sure nothing stopped working, the implementation of these filters had to be a careful process.
Needless to say, there were a few surprises when some device would quit working. The filter would be removed, the device with the issue would be examined to see what it needed to work, and the filter would be modified and reapplied.
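That remove-examine-reapply loop goes faster with an automated check. Here is a sketch of what one might look like in Python, with hypothetical hostnames: a table of connections that should and should not work, walked after each filter change, so surprises show up in a test run instead of on the air:

```python
import socket

# Hypothetical reachability matrix: (host, port, should_connect).
CHECKS = [
    ("dalet-db.example.internal", 1433, True),     # production DB must stay reachable
    ("desktop-042.example.internal", 445, False),  # PC-to-PC SMB should now be blocked
]

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host, port, expected in CHECKS:
    actual = can_connect(host, port)
    status = "OK" if actual == expected else "SURPRISE"
    print(f"{status}: {host}:{port} reachable={actual}, expected={expected}")
```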
File importing and exporting were reconfigured and placed back on the high-speed network, which made life easier for News and Production. The call screening software was re-enabled for our talk show. The rollout of rebuilt PCs started in early August, and people could once again log into a network. The completely rebuilt Dalet Galaxy system returned, and at
that point everybody breathed a collective sigh of relief.
This story is by no means over. It will
probably be several months before all the noncritical services and utilities
return. (I hope to be able to provide an update soon to contrast what “normal”
was before the attack and after.)
With the benefit of this experience, there
are certain things I’d like you to know and consider:
- Takeaway # 1. Have a reliable way of communicating with your MCR. It could be a landline. It could be every operator's cell phone number. It could be a good old radio two-way; we use them at KQED and having one in our MCR is now a priority.
- Takeaway # 2. Have a way to communicate with staff, such as
a general voicemail box off site, or a business instant message program, like Slack.
Make sure all staff members know how to access it. A cloud-based station wiki
could contain lots of “what to do if” documents. Keep it updated.
- Takeaway # 3. Have hard copies of critical documents on
file, and a copy on an engineering laptop not normally on the network. My
network documentation, including all IP addresses and passwords, was not available for three weeks, and neither were transmitter wiring diagrams, shipping forms, EAS log masters, time sheets, etc. Don't forget to keep the hard copies updated. I
had a copy of all important network, engineering and transmitter files on my
home PC, but they were nine years old.
- Takeaway # 4. Have work-arounds in place for everything.
How do your news people get interviews into their computers and edit them
without a network? And then how do they get those files into your on-air
system? If any equipment in your air chain requires a network connection to
function, have a non-network dependent device as a backup.
- Takeaway # 5. Some staff will be more understanding and adaptable than others. Keeping people up to date about the crisis will go a long way. So will sweet treats; there was a large increase in them throughout the building in the weeks following the attack.
- Takeaway # 6. Keep your critical systems in protected
islands! Our radio listeners and TV viewers could never tell there was a
problem, as our regular programming kept going.
- Takeaway # 7. The staff working on the crisis will work longer and harder than anyone could expect. Keep them fed when they are here because they won't take the time out to feed themselves, as they should. Don't get in their way and make sure all staff requests go through managers. Recognize their efforts, thank them regularly.
While we don’t know who the villains in this
story are, we do know who the heroes were.
First, the IT department led by Michael
Kadel. They were the real saviors of the day and the weeks that followed.
Steve Pinch in FM engineering is truly responsible for ensuring that KQED(FM) stayed on the air with normal programming. He worked almost around the clock for days to develop work-arounds for every problem and showed the production and news people how to make them work.
His counterparts in TV, Jay Strauss and Larry
Bursten, kept TV on the air.
I also think the production people, the news
people and all the staff at KQED deserve credit for their dedication and
perseverance in difficult times. We often talk about how we would respond to a
disaster like an earthquake. This was a disaster of a different kind and we got
through it intact and still going strong. We are a news organization with a
mission to serve our listeners, our viewers and our internet audience. We lived up to our mission, and we will apply what we learned to future disaster planning.