Skip to content

Instantly share code, notes, and snippets.

@brianray
Last active November 9, 2023 21:45
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save brianray/56f1dc33413274ddb4e4a148d6706b81 to your computer and use it in GitHub Desktop.
Save brianray/56f1dc33413274ddb4e4a148d6706b81 to your computer and use it in GitHub Desktop.

Scenarios

  1. Jailbreaking for Data Leaked from Hacker Actor: the hacker gets data out of the model by pure prompt manipulation,
  2. Intrusion from Hacker Actor: The malicious instructions pass through the application and can instruct the extension services to do bad things.
  3. Data poisoning for Training data: The source data used to train the LLM has malicious content before it is trained.
  4. Prompt Poising: hidden content or injected content unintentional from an innocent bystander.
graph RL

    A1(Active Hacker Actor) -.1 melicious prompt.-> C1[LLM]
    subgraph sub1["1. Jailbreaking for Data Leaked from Hacker Actor"]
        C1[LLM]
    end
    C1[LLM] -.2 unauthrized data.-> A1(Hacker Actor)
    
    A2(Active Hacker Actor) -.1 melicious.->  C2[LLM]
    subgraph sub2["2. Intrusion from Hacker Actor"]
        C2[LLM] -.2 melicious instructions.-> D2[Extension Services] 
    end
    
    A3(Passive Hacker Actor) -.1 insert bad data.-> E3[Data Store]
    D3(User Actor) -.3 good prompt.-> C3[LLM] 
    E3[Data Store] -.2 training.-> C3[LLM]
    C3[LLM] -.4 bad response.-> D3(User Actor)
    subgraph sub3["3. Data poisoning for Training data"]
        E3[Data Store]
        C3[LLM]
    end

    
    D4(User Actor) -.1 cut .-> E4[data or source repository]
    D4(User Actor) -.3 prompt .-> C4[LLM]
    C4[LLM] -.4 bad content .-> D4(User Actor)
    E4[data or source repository] -.2 paste .-> D4(User Actor) 
    subgraph sub4["4. Prompt Poising"]
        C4[LLM]
        
    end
    
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment