Skip to content

Instantly share code, notes, and snippets.

@ZapDos7
Last active April 14, 2024 22:51
Show Gist options
  • Save ZapDos7/0ad5b765ea5ca2dd0ed0517a754fa789 to your computer and use it in GitHub Desktop.
Save ZapDos7/0ad5b765ea5ca2dd0ed0517a754fa789 to your computer and use it in GitHub Desktop.

Lakera's 10 day email AI Security course

Day 1

Topics: the evolving landscape of AI security threats, complemented by real-world examples of LLM breaches

As AI becomes increasingly integrated into business operations, it brings with it a variety of risks. Let's explore the most prevalent threats:

  • Model-based attacks: These are designed to manipulate AI models into producing undesired outputs. Common techniques include data poisoning, prompt injection attacks, or gradient-based attacks.
  • Data Security Breaches: These risks involve data exposure, confidentiality breaches, and data loss, potentially leading to identity theft, legal issues, and significant financial and reputational harm.
  • AI Supply Chain Attacks: Targeting AI model development phases, these attacks can manipulate data collection and training, or plant backdoors during development and distribution.
  • DoS Attacks on AI: Denial-of-Service attacks overload AI systems with traffic, disrupting service availability and effectiveness.
  • Social Engineering Attacks: With the rise of tools like ChatGPT, these attacks, which exploit human psychology to breach security or acquire sensitive information, have become more frequent.

You can also have a look at this handy infographic listing some of the most common LLM vulnerabilities, including: prompt injection, phishing, data & prompt leakage, toxic content, hallucinations, command injection or LLM plugins compromise. The list of vulnerabilities is much longer and we'll explore a few of them in our next lesson.

image

If you'd like to get access to the full list, check out our LLM Security Playbook.

LLM Vulnerabilities One Pager. Real-world examples of LLM breaches

Finally, as promised, here are the real-life examples of LLM security breaches identified by Lakera's internal Red Team.

Exploit 1: Prompt injection in Google’s Bard extension. In this case, our engineer injected a prompt directing the Bard extension to summarize an email with "Raccoons are in the store". And so it did ;-)

image

Google's Bard extension exploit. Exploit 2: XSS in a Hosted Agent UI. In this case, Lakera Red team created a simple payload that uses a prompt injection to get the agent to render HTML. Unfortunately, the service rendered the HTML without any sanitization and executed some embedded JavaScript, resulting in a Cross Site Scripting (XSS) attack

👉 Read more and watch the video.

Exploit 3: Data Poisoning an OpenAI Assistant.

Our team leveraged an underlying system that the Assistant pulls data from via a custom function calling tool to bypass the Assistant’s desired behavior. While this example uses a manually-provided function response for demonstration purposes, in a real world application anyone who can modify the data in a downstream system that your LLM application relies on could potentially poison the system.

👉 Read more and watch the video.

Additional Resources:

Real-world LLM Exploits List

Navigating AI Security: Risks, Strategies, and Tools.

Day 2

Topics: OWASP Top 10 for LLM Applications and MITRE ATLAS™, two frameworks pivotal in AI security.

Intro

  • The OWASP Top 10 for Large Language Models focuses on identifying and addressing the most critical security risks specifically for applications that use Large Language Models, like AI chatbots or automated content generators.
  • The ATLAS™ framework by MITRE, on the other hand, is a broader cyber threat matrix. It categorizes and describes various tactics, techniques, and procedures used in cyber threats across different stages of an attack.

Understanding OWASP Top 10 for LLM Applications

The OWASP Top 10 for Large Language Model Applications is a list that outlines the most critical security risks associated with deploying and managing LLMs. This list aims to educate developers, designers, architects, managers, and organizations about potential vulnerabilities in LLM applications.

The top 10 vulnerabilities identified are listed below. (Source: OWASP) image

These vulnerabilities highlight the importance of careful management and security considerations in the deployment of LLMs. For a detailed understanding and further information on these vulnerabilities, you can refer to the official OWASP pages here and here.

Delving into MITRE ATLAS

The ATLAS™ framework by MITRE is a comprehensive matrix that categorizes and outlines various tactics, techniques, and procedures (TTPs) used in adversarial threats, particularly in the cyber domain. It provides a structured representation of how cyber threats operate, assisting in threat modeling, cybersecurity analysis, and defensive strategy development:

  • Reconnaissance: Gathering information to plan an attack.
  • Resource Development: Establishing resources (like accounts or tools) for conducting attacks.
  • Initial Access: The methods adversaries use to gain entry into AI systems.
  • ML Model Access: An attempt to gain some level of access to a machine learning model.
  • Execution: Techniques that result in the adversary-controlled execution of malicious operations.
  • Persistence: Ensuring continued control within an AI system.
  • Defense Evasion: Avoiding detection or blocking defensive measures.
  • Discovery: Understanding the AI environment and operations.
  • Collection: Gathering data of interest for future operations.
  • ML Attack Staging: Moving through an environment to gain more control or information.
  • Exfiltration: Stealing data.
  • Impact: Techniques to disrupt, destroy, or manipulate AI systems or data.

(Source: MITRE)

image

ATLAS™ is valuable for cybersecurity professionals and organizations to understand and mitigate cyber threats effectively. For a detailed overview and in-depth information, you can visit their website here.

Additional Resources:

Day 3

Topics: prompt injections – an increasingly relevant topic for those building with or using LLMs in their day-to-day.

What is a Prompt Injection?

Drawing from OWASP’s definition, a prompt injection is a vulnerability in Large Language Models (LLMs) where attackers use carefully crafted prompts to make the model ignore its original instructions or perform unintended actions.

There are two main types: direct prompt injections, which override system prompts, and indirect prompt injections, which manipulate inputs from external sources.

Prompt injections, listed at the top of OWASP’s Top 10 for LLM Applications, present significant risks in real-world applications. One famous instance involved a prompt injection used to manipulate Bing Chat into revealing its original instructions.

Prompt Injection Attacks in Practice

Early last year, we identified prompt injections as a growing threat. To raise awareness, we launched our AI education game, Gandalf, where players use prompts to trick an LLM into revealing a password. Gandalf has been played by millions of people around the globe, and we were able to analyze resulting prompt injection data, identifying attack patterns and types.

Types of Prompt Injection Attacks (Direct)

Here are key types of prompt injection attacks identified by Lakera’s Red Team:

  • Direct Attacks: Simple instructions directly telling the model to perform a specific action.
  • Jailbreaks: 'Hiding' malicious questions within prompts to provoke inappropriate responses. Example: The "DAN" jailbreak. Keep in mind, that in recent months, 'jailbreaks' have become the overarching term for most attacks described here.
  • Sidestepping Attacks: Circumventing direct instructions by asking indirect questions. Instead of confronting the model's restrictions head-on, they "sidestep" them by posing questions or prompts that indirectly achieve the desired outcome.
  • Multi-language Attacks: Leveraging non-English languages to bypass security checks.
  • Role-playing (Persuasion): Asking the LLM to assume a character's traits to achieve specific actions. Example: Grandma Exploit.
  • Multi-prompt Attacks: Incrementally extracting information through a series of innocuous prompts, instead of directly asking the model for confidential data.
  • Obfuscation (Token Smuggling): Altering outputs so they’re presented in a format that is not immediately recognizable to automated systems and flagged, but can be interpreted or decoded by a human or another system.
  • Accidental Context Leakage: Inadvertent disclosure of training data or previous interactions. This can occur due to the model's eagerness to provide relevant and comprehensive answers.
  • Code Injection: Manipulating the LLM to execute arbitrary code.
  • Prompt Leaking/Extraction: Revealing the model's internal prompt or sensitive information.

For an in-depth exploration and more real-life examples, check out our ELI5 Guide to Prompt Injections and Prompt Injection Attacks Handbook, both available for free.

image

Additional Resources:

Day 4

Topics: the diverse and evolving landscape of traditional and AI cyber security

Traditional Cybersecurity Overview

Traditional cybersecurity focuses on ensuring the integrity, confidentiality, and availability of information.

The evolution of cybersecurity reflects the changing nature of threats over the decades.

Starting from basic malware in the 1980s, the 1990s saw a rise in viruses targeting household computers. The 2000s brought about more sophisticated attacks like credit-card breaches and hacktivism, while the 2010s saw the emergence of nation-state attacks and Advanced Persistent Threats (APTs).

The increase in smart devices has expanded the threat landscape further.

At a very basic level, cybersecurity can be broken down into:

  • Critical infrastructure security
  • Application security
  • Network security
  • Cloud security
  • Internet of Things (IoT) security

It also emphasizes the importance of people, processes, and technology in an organization's security posture.

image

AI in Cybersecurity

AI has significantly enhanced cybersecurity by automating processes, detecting anomalies, and recognizing behavior patterns to quickly identify threats. Unlike traditional methods, AI in cybersecurity can adapt in real-time to complex threats, process large amounts of data, and reduce human error.

AI cybersecurity solutions include (but aren't limited to):

  • Intrusion detection systems (IDS)
  • Data loss prevention (DLP),
  • Security Information and Event Management (SIEM) tools.

Advantages Over Traditional Methods

AI solutions offer flexibility and speed, which are crucial in the rapidly evolving cyber threat landscape. AI's real-time processing and ability to adapt to new threats provide a more dynamic and proactive approach to cybersecurity compared to traditional static models​​.

Securing AI Applications Against Cyber Threats

With the increasing use of AI for critical functions and services, there is a growing need to secure AI systems themselves.

Some of the threats to AI systems include vulnerabilities listed in the OWASP Top 10 for LLM Applications and MITRE ATLAS™ that we explored earlier this week.

Threats such as adversarial machine learning attacks, data security breaches, AI supply chain attacks, DoS attacks, or social engineering attacks can have far-reaching consequences, so it’s essential to understand and protect against them.

image

Some of the best practices for protecting AI systems include:

  1. Implement a Robust AI Security Program: Develop and maintain a comprehensive security strategy, complete with updated AI asset records and clearly designated risk management responsibilities.
  2. Involve Stakeholders Actively: Engage AI experts for security insights and provide specialized training to AI teams to enhance threat identification and prevention.
  3. Establish Advanced Technical Safeguards: Protect data integrity through encryption, enforce strict access controls, and utilize advanced monitoring tools to detect potential threats promptly.
  4. Conduct Regular Security Assessments: Actively perform penetration testing and vulnerability scanning to proactively identify and mitigate security risks.
  5. Adhere to Legal and Regulatory Standards: Stay updated with and comply with regulations like GDPR and CCPA, as well as upcoming AI regulations to ensure data privacy and user trust.
  6. Develop an Incident Response Protocol: Create a detailed plan for immediate action in response to security breaches, including communication strategies and remediation steps.

Day 5

Topics: AI application security, a crucial layer in safeguarding the entire AI system

Intro

AI security can be broadly categorized into three levels:

  • Application security
  • Stack security
  • Infrastructure security

Understanding AI Application Security

In previous emails, we've covered various threats and vulnerabilities inherent to LLM-powered applications. Below is an image depicting a simplified architecture of an LLM application, highlighting various OWASP Top 10 vulnerabilities within the LLM application ecosystem.

image

As illustrated, end-user interactions with the LLM model or agents represent just a fraction of the total LLM ecosystem. With technological advancements, we'll increasingly see LLMs integrated into much more complex systems, connected with plugins and other applications, and tasked with autonomous execution.

This introduces new security challenges, especially since LLMs can be exploited by virtually anyone using plain English prompts.

image

Reactive vs. Proactive Security Approaches

When considering AI application security, it's helpful to look at how we traditionally protect software applications. At the most basic level, we can distinguish between:

  • Reactive Security: This involves responding to threats as they occur, especially critical in LLM applications due to their accessibility and vulnerability to attacks.
  • Proactive Security: Anticipating future risks and taking preventive measures. For LLMs, this includes activities like penetration-testing and red teaming to identify vulnerabilities before deployment.

Securing AI Applications: Best Practices

When building LLM-powered applications, it's critical to implement security from the beginning to protect both your users and your apps. Below, we've highlighted key initial steps to consider when securing your AI:

Before Deployment:

  • Assess your application against OWASP risks specific to LLMs.
  • Conduct red team exercises to pinpoint and address vulnerabilities.
  • Secure your supply chain by evaluating data sources and suppliers.

In-Operation:

  • Implement reactive measures, such as limiting LLM actions on downstream systems and ensuring robust input validation.
  • Integrate AI security tools for real-time threat monitoring and detection.
  • Continuously educate your team on AI security risks and stay updated with the latest developments.

Additional Resources:

Day 6

Topics: AI/LLM red teaming, a crucial practice for ensuring the safety and reliability of AI systems

What is AI/LLM Red Teaming?

The term “red teaming” originated in the Cold War, where the red team’s task was to simulate the enemy’s offensive strategies so the blue team could develop robust defenses.

Red teaming in LLMs involves rigorous testing to identify vulnerabilities, biases, and areas where performance or ethical responses might be lacking. This simulates adversarial attacks or creates challenging scenarios for the model.

By identifying issues, red teaming helps make LLMs robust against misuse and better aligned with ethical standards. Below are some of the common types of attacks on AI systems.

image

How to Carry Out AI/LLM Red Teaming?

There is no uniform approach to effective red teaming. This mostly results from the fact that AI models have unique vulnerabilities and deployment environments, which almost always calls for tailored red teaming approaches.

The best results can be achieved by combining creativity with systematic analysis.

The first step should involve setting clear objectives, such as:

  • Assigning risk levels to AI models to decide the extent of red teaming needed.
  • Deciding what potentially harmful behaviors to target: bias, toxicity, privacy breaches, etc.

Developing attack strategies is where creativity comes into play. The attacks can include:

  • Manual and automated attacks: Usually using a mix of both methods.
  • Employing multiple techniques: Code injection, hypotheticals, pros and cons discussions, role-playing, etc.
  • Scenario development: Creating realistic and extreme situations to test LLM responses.
  • Targeted prompting: Developing prompts to expose biases or unethical responses.
  • Feedback analysis: Analyzing responses for inconsistencies or problematic outputs.

Some of the best practices for effective and ethical red teaming:

  • Diverse teams: Assemble varied teams for different vulnerabilities.
  • Comprehensive planning: Develop detailed testing plans.
  • Iterative testing: Refine strategies based on findings.
  • Ethical consideration: Prioritize ethics in testing.
  • Data recording and analysis: Keep detailed records of attack strategies and outcomes.

image

Ok, but who should carry out red teaming exercises?

Internal vs. External Red Teams

The choice between internal and external red teams depends on the AI system's unique needs and context.

  • Internal red teams offer deep knowledge of their company's AI systems and continuous improvement but may have biases and resource limitations.
  • External red teams provide fresh perspectives and specialized expertise, reducing bias and demonstrating due diligence, but they might lack system familiarity and incur higher costs.

As we’ve established earlier, there’s no universally accepted standard and red teaming exercise calls for a bespoke approach.

In a nutshell—

  • Red teaming helps in creating safer, more reliable LLMs by identifying and mitigating potential harms.
  • It involves a combination of technical expertise, creative thinking, and rigorous testing to ensure LLMs adhere to high ethical and safety standards.
  • As AI continues to evolve, red teaming remains a critical practice for ensuring the responsible deployment of LLMs.

Day 7

Topics: the basic architecture of a modern AI tech stack and how to evaluate security solutions

Intro

The architecture of modern AI technology stack is multi-layered, encompassing a range of components from applications to infrastructure. Here’s a quick glance at the key layers:

  1. AI Applications

These are applications of AI technology, which can be categorized into consumer applications, enterprise applications, industry-specific applications (for specific sectors like healthcare or finance), and departmental applications (for specific departments within an organization, like HR or marketing). This is the part of the stack the application end user interfaces with. It will likely also include functionality powered by non-AI, traditional software.

  1. Autonomous Agents

This layer includes AI systems that operate independently, receiving external input from end users or other systems, making decisions and taking actions. They can be either open source (freely available and modifiable) or closed (proprietary and controlled by specific entities). This layer also includes agent management systems, which are tools for overseeing and controlling these autonomous agents.

  1. AI Models / Foundational Models

At this level, we have the core AI models that power applications and agents. These can be proprietary models (developed and owned by specific companies or entities) or open-source models (available for use and modification by anyone).

  1. AI Infrastructure

This is the backbone of AI technology, encompassing cloud services (for computing and storage), software management tools, optimization algorithms, security tools, repositories (for code and data storage), hardware (like GPUs and specialized AI processors), data centers (where the physical infrastructure is housed), and energy considerations (to power and cool the infrastructure).

  1. Data

The fuel for AI models, data can be public (freely available), proprietary (owned by specific entities), or synthetic (artificially generated).

image

One of the core components of the AI infrastructure layer comprises security solutions.

Last year as many as 75% of security professionals witnessed an increase in attacks, with 85% attributing this rise to bad actors using generative AI.

In other words, the AI stack needs robust AI security solutions to get protection against AI-powered attacks.

To help you pick the best defenses, we prepared a handy set of questions that you can use as a checklist to see how much of an overlap there is between your expectations, your organization’s requirements, and the tool’s features.

The checklist we compiled offers a framework to assess and choose AI security tools that prioritize data protection and system integrity.

image

To make the most of the checklist, first determine what answers you’d expect and score them on importance. Then, while scoping solutions, search for the one with the largest overlap on your most vital requirements.

In the rapidly changing realm of AI, the tool and vendor’s adaptability make all the difference!

Additional Resources:

Day 8

Topics: the implications of AI governance, including the impact of the EU AI Act and US regulations on the AI landscape

The EU AI Act

The EU AI Act is a comprehensive legal framework proposed by the European Commission to regulate the use of AI across all sectors except the military.

It adopts a risk-based approach to classify and regulate AI systems according to their potential impact on human rights and values. The Act proposes classifying AI tools into different risk levels, ranging from low to unacceptable, with corresponding obligations for governments and companies using these tools. You can read the Act in full here.

image

Categories of AI Risk in the EU AI Act

  • Unacceptable Risk: Certain uses of AI are banned due to their high potential for harm. This includes AI for social scoring leading to rights denial, manipulative AI targeting vulnerable populations, mass surveillance with biometric identification in public spaces, and harm-inducing AI like dangerous toys.
  • High Risk: These AI applications have significant implications for public safety, fairness, and rights. Examples include AI in critical infrastructure, educational tools, employment management systems, public service applications, law enforcement, migration control, and judicial decision-making. These applications must adhere to strict safety, transparency, and nondiscrimination standards.
  • Limited Risk: This category includes AI applications like chatbots or worker evaluation tools where risks are moderate but still require oversight, such as ensuring users are aware they are interacting with AI.
  • Minimal Risk: Inconsequential AI applications, such as spam filters or basic assistant software, are subject to minimal regulation.

The Act enforces transparency, particularly in high-risk applications, ensuring users are aware when they are interacting with AI systems. Oversight is managed by national authorities and the European Artificial Intelligence Board, reinforcing accountability and public trust in AI technologies.

You can watch the video to learn more.

The White House's AI Bill of Rights

In contrast to the EU AI Act, the White House's AI Bill of Rights is not a binding legal document but rather a set of principles aimed at guiding the ethical use of AI and automated systems.

It emphasizes safeguarding civil rights and democratic values in AI deployment. Key elements include safety and effectiveness of AI systems, protection against algorithmic discrimination, data privacy, clear information about AI use, and ensuring human alternatives and fallbacks. You can read the full text here.

The AI Bill of Rights focuses more on guiding principles for ethical AI use, whereas the EU AI Act is a binding legislative proposal with specific classifications, obligations, and penalties for AI systems and their providers.

Key elements of the "Blueprint for an AI Bill of Rights"

  • Safe and Effective Systems: Protection from unsafe or ineffective automated systems, ensuring safety and effectiveness in their design and deployment.
  • Algorithmic Discrimination Protections: Prevention of discrimination by algorithms and promotion of equitable system design and use.
  • Data Privacy: Protection from abusive data practices, ensuring privacy and user control over personal data.
  • Notice and Explanation: Providing clear, accessible information about the use and impact of automated systems.
  • Human Alternatives, Consideration, and Fallback: Ensuring options to opt out of automated systems in favor of human alternatives and providing means to address system failures or disputes.

image

The framework emphasizes the overlapping nature of these principles to form a comprehensive approach against potential harms from automated systems.

Additional Resources:

Day 9

Topics: the evolving role of the Chief Information Security Officer (CISO) in the era of advanced cybersecurity and AI

The traditional role of a CISO focused on developing and implementing IT security strategies, managing cybersecurity teams, and ensuring compliance with data protection laws and regulations. It was heavily centered on technical aspects and less on integrating security into the broader business strategy.

With the advent of AI, the CISO's role is witnessing an unprecedented transformation.

CISOs are no longer confined to the technicalities of IT security—they are adopting a strategic, holistic approach.

image

Today's CISO is a multifaceted leader.

They are not just managing IT security but are at the forefront of creating a resilient, AI-aware organizational culture. A culture where AI is seen as both a tool for boosting productivity but also a potential vector of attack for malicious actors.

A recent survey from Splunk revealed that growing numbers of CISOs are in fact incorporating AI solutions into their work toolbox.

As many as 35% of CISOs report using AI, either extensively or somewhat, for positive cybersecurity functions. Another 61% express that they either have plans to use it in the next 12 months, or are interested in doing so.

This role is dynamic, strategically important, and central to the success of modern business operations in the face of evolving cyber threats and technological advancements.

To learn more about how GenAI is impacting the role of a CISO, read this report from Gartner.

image

Day 10

Topics: places worth visiting online to deepen your understanding and follow the latest developments

Lakera’s Resources

AI/LLM Safety & Security Frameworks

AI Regulations (Proposed)

Guidelines

  • Adopting AI Responsibly – World Economics Forum’s guidelines for procurement of AI solutions by the private sector.

Reports

Databases

Resource Collections

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment