New Research Points to Hidden Vulnerabilities Within Machine Learning Systems
Agencies need to give machine learning and artificial intelligence-based systems extra attention and security, beyond the normal level of cybersecurity protection.
Government agencies collect a lot of data, and have access to even more of it in their archives. The trick has always been trying to tap into that store of information to improve decision-making, which is a major focus in government these days. The President’s Management Agenda, for example, emphasizes the importance of data-driven decision-making to improve federal services.
The volume of data that most agencies are working with is such that humans can’t easily tap into it to support that decision-making. Even when they can search the data, the process is slow, and because no human can take in the entire pool at once, many of the interrelationships, causalities and influences hidden within it are lost. As such, government agencies at all levels are increasingly turning to artificial intelligence, machine learning, robotic process automation and similar tools to help sort, classify and mine their data and produce actionable results.
For the most part, these programs have proven to be extremely successful for the agencies that deploy them. Artificial intelligence is very good at analyzing data, and tends to get even more accurate as more data is added to a system.
The future certainly looks bright for AI and related technologies within government. However, a new study conducted by NCC Group, a global security consulting firm, cautions that there are quite a few hidden dangers in employing AI and machine learning that agencies should know about. NextGov talked with NCC Group Chief Scientist Chris Anley about the study’s findings, the specific risks associated with AI and machine learning, and the ways agencies can protect their data and their users from exploits that target those systems.
NextGov: Can you first tell us a little bit about your background and the NCC Group?
Anley: I'm the Chief Scientist at NCC Group, which means I collaborate with colleagues on projects and conduct my own research. We look into attacks and defenses for IT systems, networks and computing devices of all kinds, and publish research in these areas. NCC Group is one of the largest and most respected security consultancies in the world, with over 2,000 employees, 35 offices and 14,000 clients.
My own background is in IT security and software development. In 2001 I co-founded a company, NGS Software, which was later acquired by NCC Group, and I've been associated with NCC ever since.
NextGov: And what led you to research the specific vulnerabilities associated with AI and machine learning systems?
Anley: We started to notice ML applications becoming much more prevalent around five years ago. They present a whole new set of security challenges, so we've been actively researching attacks and defenses since then. In terms of applications, ML used to be a fairly niche activity, but we are increasingly seeing it used for routine tasks like suggesting actions to users in web applications, handling customer support queries and so on.
And attackers are starting to exploit those situations.
NextGov: Are the kinds of attacks being made against AI and ML systems different from the typical kinds of attacks made against government agencies and their networks?
Anley: Yes. There is a range of new attack types that apply specifically to ML systems, and those are what the bulk of our paper is about. At the same time, the traditional security issues still apply: patching, credential management and the application security flaws that lead to conventional data breaches.
NextGov: The paper you produced details dozens of real-world attacks and successful attack techniques made against AI and ML systems. I want to talk about those, but one of the most striking findings is that you said training ML systems with sensitive or secret information should be considered an especially dangerous practice. Can you explain why you made that statement?
Anley: ML systems perform better when trained on larger amounts of data, so it follows that if the training data is sensitive in some way—say, financial, medical or other types of personal data—then there's an increased potential for security and even regulatory issues. Curating training data can be difficult and time consuming even without the security challenges of privacy, access control and complex configurations.
NextGov: And the kinds of attacks you demonstrated were sometimes able to gather the information that was used to train the system, so loading it up with sensitive information makes the situation worse. Can you talk about some of the other kinds of attacks that are made against AI systems?
Anley: Privacy attacks allow criminals to retrieve fragments of training data from the trained model by submitting inputs in the “normal” way; if the model was trained on sensitive data, some portion of this sensitive training data can be retrieved.
Poisoning attacks allow an attacker to modify the behavior of a model during training, to change the decisions it makes. For instance, if the model was involved in financial decisions, this might allow the attacker a financial advantage, or if the model was making security decisions—perhaps a facial recognition system—then it might allow the attacker to bypass the security check. In some cases the attacker can insert malicious code into the model itself, which could then do anything the attacker wants—install ransomware, mine cryptocurrencies or provide backdoor access.
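Here is a minimal sketch of the data-poisoning idea in Python; the approve/deny scenario, data and model are illustrative assumptions, not taken from the paper.

```python
# Minimal data-poisoning sketch (illustrative; not from the NCC Group paper).
# The attacker slips a handful of mislabelled records into the training data
# so the model "approves" an input it should reject.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)                    # stand-in for approve (1) / deny (0)
attacker_input = np.array([[-1.5, -1.5, 0.0, 0.0, 0.0]])   # should clearly be denied

clean = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print("clean model   :", clean.predict(attacker_input)[0])

# Poison: inject 25 records that closely resemble the attacker's profile but
# carry the "approve" label. For a local model like k-NN these injected points
# dominate the neighbourhood of the attacker's input.
X_poison = attacker_input + 0.05 * rng.normal(size=(25, 5))
y_poison = np.ones(25, dtype=int)
poisoned = KNeighborsClassifier(n_neighbors=5).fit(
    np.vstack([X, X_poison]), np.concatenate([y, y_poison])
)
print("poisoned model:", poisoned.predict(attacker_input)[0])
```

Tampering with the serialized model file goes further still: many common model formats, such as Python pickles, can execute arbitrary code when they are loaded, which is the route to the ransomware and backdoor scenarios Anley describes.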
Adversarial perturbation attacks allow an attacker to change the decision a system makes through small, carefully chosen changes to its inputs. Image classification systems can now be a matter of life and death, so it's important to ensure they're robust. For example, issues have been found relating to road signs and other physical objects in the real world, and researchers have demonstrated a 3D-printed turtle that is mistaken for a gun and a 3D-printed baseball that's mistaken for a cup of coffee.
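To illustrate the mechanics, here is a minimal perturbation sketch in Python against a simple linear classifier; the data and model are illustrative assumptions, not examples from the NCC Group paper. For a linear model the attack has a closed form, which keeps the demonstration short.

```python
# Minimal adversarial-perturbation sketch (illustrative; not from the NCC Group
# paper). For a linear classifier, nudging every feature slightly in the
# direction that most moves the decision score is enough to flip the prediction.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 20))
w_true = rng.normal(size=20)
y = (X @ w_true > 0).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
w = model.coef_[0]

x = X[0]
score = model.decision_function([x])[0]
print("original prediction :", int(score > 0))

# The smallest max-per-feature (L-infinity) change that crosses the boundary is
# |score| / ||w||_1, spread across all features via sign(w); step just past it.
eps = 1.1 * abs(score) / np.abs(w).sum()
x_adv = x - np.sign(score) * eps * np.sign(w)
print("perturbed prediction:", int(model.decision_function([x_adv])[0] > 0))
print("change per feature  :", round(float(eps), 3))
```

Deep image classifiers are attacked on the same principle, with the perturbation found by gradient methods rather than a formula, which is how carefully crafted stickers or textures on physical objects can flip a model's decision.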
NextGov: Your report is a fascinating read about how those kinds of attacks can happen. But what about defenses? Is there anything that agencies can do to help protect their AI systems?
Anley: For each attack, we’ve suggested mitigations in the taxonomy section of the paper. Additionally, each attack is cross-referenced in the categorized references section, which lists the academic papers relating to it.
There are no silver bullets for defending against these attacks, but the traditional precautions, like vigilance, authentication, access controls, rate limiting, careful handling of sensitive data and periodic review by external security professionals, are, as always, the best ways to avoid unpleasant surprises.
NextGov: Thanks for your time today. Because defenses are so important, can you explain some specific actions that agencies should take? And given what you learned while researching the paper, do you think that government agencies can safely deploy AI and ML systems without taking on significant additional risks?
Anley: Like any new technology, machine learning brings new opportunities and new risks. There are certainly things that organizations can do when developing and deploying ML systems that will help reduce their risk. Some good general advice includes:
- Patch systems.
- Authenticate, and use multifactor authentication wherever possible.
- Control access.
- Audit.
- Use web application firewalls and whatever other security features are offered by your cloud platform.
- Run NCC Group's ScoutSuite (a multi-cloud security auditing tool) against your cloud estate.
- Implement automated dependency checking and updating as part of your CI/CD pipeline, along with automated scanning of your code for credentials.
- Use a credential vault to store credentials and have your applications retrieve them at runtime, rather than storing them in or alongside the code (a minimal retrieval sketch follows this list).
- Have the security of your systems reviewed by external professionals, and give those professionals access to your source code to help with their review. While keeping a clear focus on your areas of concern, also give them a broad scope to investigate any security weaknesses they find in your organization.
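On the credential vault point above, here is a minimal sketch of retrieving a secret at runtime. It uses AWS Secrets Manager via boto3 purely as one example; the secret name, region and JSON layout are assumptions, and HashiCorp Vault, Azure Key Vault and similar services follow the same pattern.

```python
# Minimal runtime secret-retrieval sketch using AWS Secrets Manager via boto3
# (one of several vault options; the secret name, region and JSON layout below
# are hypothetical). The application fetches the credential when it starts
# instead of shipping it in code, config files or the repository.
import json
import boto3

def get_db_credentials(secret_id: str, region: str = "us-east-1") -> dict:
    client = boto3.client("secretsmanager", region_name=region)
    response = client.get_secret_value(SecretId=secret_id)
    # Secrets are commonly stored as JSON key/value pairs.
    return json.loads(response["SecretString"])

if __name__ == "__main__":
    creds = get_db_credentials("prod/ml-service/db")  # hypothetical secret name
    print("retrieved credential for user:", creds.get("username"))
```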
And specifically in terms of machine learning or AI systems, do all of the aforementioned activities plus the following:
- If your model is trained on sensitive data, consider refactoring your application so that you don't need to train it on sensitive data.
- If you absolutely have to train on sensitive data, consider differential privacy techniques, anonymization or tokenization of the sensitive data.
- Apply the same supply chain controls to external models that you would to external code.
- Carefully curate your training data and apply controls to ensure that it can't be maliciously modified.
- Authenticate, rate limit and audit access to all models.
- If your model makes sensitive decisions that could be affected by adversarial perturbation, consider taking advice on implementing a training method that makes the model more resistant to these attacks (a sketch of one such method follows).
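One widely used training method of that kind is adversarial training, in which perturbed examples are generated during training so the model learns to classify them correctly. The sketch below is a minimal illustration in PyTorch on a tiny synthetic dataset and model; it is not drawn from the NCC Group paper.

```python
# Minimal adversarial-training sketch in PyTorch (illustrative data and model;
# not from the NCC Group paper). FGSM examples are generated on the fly and
# included in every training step so the model learns to resist them.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(1000, 20)
y = (X[:, 0] + X[:, 1] > 0).long()

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
eps = 0.2  # largest per-feature perturbation the defence should tolerate

def fgsm(model, x, y, eps):
    # One signed gradient step on the input: the fast gradient sign method.
    x = x.clone().requires_grad_(True)
    loss_fn(model(x), y).backward()
    return (x + eps * x.grad.sign()).detach()

for _ in range(100):
    x_adv = fgsm(model, X, y, eps)
    opt.zero_grad()
    # Train on clean and adversarial examples together.
    loss = loss_fn(model(X), y) + loss_fn(model(x_adv), y)
    loss.backward()
    opt.step()

with torch.no_grad():
    clean_acc = (model(X).argmax(dim=1) == y).float().mean().item()
adv_acc = (model(fgsm(model, X, y, eps)).argmax(dim=1) == y).float().mean().item()
print(f"accuracy on clean inputs: {clean_acc:.2f}, under FGSM attack: {adv_acc:.2f}")
```

Hardened models typically trade a little clean accuracy for substantially better behavior under attack, which is exactly the kind of trade-off worth getting specialist advice on.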
John Breeden II is an award-winning journalist and reviewer with over 20 years of experience covering technology. He is the CEO of the Tech Writers Bureau, a group that creates technological thought leadership content for organizations of all sizes. Twitter: @LabGuys