NIST releases finalized guidelines on protecting AI from attacks

The final guidance for defending against adversarial machine learning offers specific solutions for different attacks but warns that current mitigation techniques are still developing.
The final version of the National Institute of Standards and Technology’s guide to combating cyberattacks against artificial intelligence systems was released on Monday, featuring updated definitions of attack and mitigation terms as well as recent developments in threat mitigation methods.
By differentiating attacks on predictive and generative AI systems, the report brings standardization to the emerging adversarial machine learning threat landscape.
“AI is useful but vulnerable to adversarial attacks. All models are vulnerable in all stages of their development, deployment, and use,” NIST’s Apostol Vassilev, a research team supervisor and one of the authors of the adversarial machine learning publication, told Nextgov/FCW. “At this stage with the existing technology paradigms, the number and power of attacks are greater than the available mitigation techniques.”
Substantial changes in the final guidelines from the initial version released in January 2024 include an overview of generative AI models’ stages of learning, ongoing open problems in the field and an index of the classes of attacks on different AI systems.
The report lists three distinct threat types for each type of AI system.
For predictive AI systems, or programs that leverage data to offer forecasts and predictions, the NIST guidelines review evasion attacks, which manipulate inputs to mislead a deployed model; data poisoning attacks, which corrupt the underlying data powering AI models; and privacy attacks, which seek to extract sensitive information about a model or its training data.
For generative AI models, which are systems that create new outputs in response to a given input, the three listed attacks are supply chain, direct prompting and indirect prompt injection attacks.
Direct and indirect prompting attacks use different methods to insert harmful instructions into a model’s inputs: direct attacks place them in the prompt itself, while indirect attacks hide them in outside content the model ingests, potentially corrupting future output.
Supply chain attacks, alternatively, aim to insert malicious content into an AI model by targeting components that may be developed by a third-party entity and have access to that model.
“The statistical, data-based nature of [machine learning] systems opens up new potential vectors for attacks against these systems’ security, privacy, and safety, beyond the threats faced by traditional software systems,” the authors write.
Despite the many specific mitigation efforts the report tailors to each type of attack, Vassilev noted that there are theoretical limits on the general strength of current mitigation techniques, like data sanitization and aligning large language models with moral guardrails.
“[This] means organizations need to apply traditional cybersecurity measures to harden the model and the platform it runs on,” he said. “Bottom line, residual risks remain and organizations deploying AI must develop a risk budget they are willing to live with and prepare a plan for recovery from a breach.”