@nicholasjackson
Last active June 29, 2023 12:41
Vault Honeypots

I spoke at an event in Oslo where Vesselin gave a talk about how honeypots could be used to automate network security by automatically updating Cloud Armor rules to block the attacker.

At the time I was giving a talk on Terraform and Vault, but as I watched Vesselin's talk I wanted to build a system that could automate firewall rules using Terraform.

Introduction

Before we continue, let's take a quick look at what a honeypot is. The best way to find any definition is to google it.

Not quite what I was looking for, so let's check out the second listing.

A honeypot is a network-attached system set up as a decoy to lure cyber attackers 
and detect, deflect and study hacking attempts to gain unauthorized access 
to information systems. 

The function of a honeypot is to represent itself on the internet as a
potential target for attackers, usually a server or other high-value asset,
and to gather information and notify defenders of any attempts to access
the honeypot by unauthorized users.

Better. So we have a fake server, in our instance a fake Vault server. We want to report any malicious activity seen by this server and use it to configure our firewall.

This is the flow that Vesselin proposed, and we are going to build something very similar. Before we do, let's look at the last part of this puzzle.

What is infrastructure as code?

So now that we have defined what things are, let's see the system in action. We are going to play you a video first, because we are going to live code this and there is a very good chance things will go horribly wrong.

Honeypot in Action

Show Video

Building the system

This is what our system is going to look like. It is fairly simple for demo purposes.

The flow, which is very similar to Vesselin's, looks like this.

Explain Flow

Managing firewall rules using infrastructure as code

So let's take a look at our first part: how are we going to update our Cloud Armor rules?

We are going to use some Terraform. Let's build that up.

First we need to define a variable to store the banned IP addresses as a comma-separated string. We will use this to dynamically generate our firewall rules.

1_variable variables.tf

variable "deny_list" {
  default     = ""
  description = "Deny list for application security policy"
}

Next, let's create a local value that takes our Terraform input variable and converts it into a list.

2_local loadbalancer.tf

locals {
  # compact drops empty entries, so an empty deny_list yields no rules
  deny_list = compact(split(",", trim(var.deny_list, ",")))
}

Then we can define our Cloud Armor policy.

3_rule loadbalancer.tf

resource "google_compute_security_policy" "security-policy-1" {
  name        = "armor-security-policy"
  description = "security policy for cloud run"

  # By default allow all traffic
  rule {
    action   = "allow"
    priority = "2147483647"

    match {
      versioned_expr = "SRC_IPS_V1"

      config {
        src_ip_ranges = ["*"]
      }
    }
    
    description = "Lowest priority rule, all other rules will be evaluated first"
  }
}

What we need to do is add further rules to this policy that are built up from our variable, one deny rule for each blocked IP address. That looks like this.

4_dynamic loadbalancer.tf

  # Create a Deny rule for all ips in the deny_list variable
  dynamic "rule" {
    for_each = local.deny_list
    
    content {
      action   = "deny(403)"
      priority = index(local.deny_list, rule.value)
      
      match {
        versioned_expr = "SRC_IPS_V1"
        config {
          src_ip_ranges = [rule.value]
        }
      }
    }
  }
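To make the dynamic block a little more concrete, here is a minimal Go sketch of the expansion it performs: one deny rule per address, with the position in the list used as the priority. Cloud Armor evaluates lower priority numbers first, so these rules are checked before the default allow rule. The `denyRule` type and `expandDenyList` function are hypothetical names used only for illustration, and unlike a plain `split`, the sketch skips empty entries.

```go
package main

import (
	"fmt"
	"strings"
)

// denyRule is a hypothetical stand-in for one generated Cloud Armor rule.
type denyRule struct {
	Priority int
	SrcIP    string
}

// expandDenyList mirrors the local value and the dynamic block: it trims
// stray commas, splits the string, and creates one deny rule per address
// using the position in the list as the priority.
func expandDenyList(denyList string) []denyRule {
	rules := []denyRule{}
	for _, ip := range strings.Split(strings.Trim(denyList, ","), ",") {
		if ip == "" {
			continue // an empty variable should produce no rules
		}
		rules = append(rules, denyRule{Priority: len(rules), SrcIP: ip})
	}
	return rules
}

func main() {
	for _, r := range expandDenyList("203.0.113.7,198.51.100.23,") {
		fmt.Printf("priority=%d action=deny(403) src=%s\n", r.Priority, r.SrcIP)
	}
}
```

Because the priority is derived from the list position, the order of addresses in the variable determines the order of the rules.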

Once all that is done, we can assign the policy to the backend service.

5_policy loadbalancer.tf

  security_policy = google_compute_security_policy.security-policy-1.self_link

Now that all that is done, let's run a terraform plan to check that things are working, and then push the code. You can see that Terraform Cloud is running a plan and we can see the changes, so let's press apply and our rules get updated.

Automating Terraform from Honeypot Events

This is fine, but what we really want to do is automate this so that an event from the honeypot automatically triggers this update.

Let's create a new cloud function that will update the variables in Terraform whenever the honeypot detects a new threat.

New Attack

We are going to write this in Go but you could use any language that is supported by Cloud Functions.

6_attack_func new-attack/main.go

func init() {
	// Register a CloudEvent function with the Functions Framework
	functions.CloudEvent("NewAttackFunction", newAttackFunction)
}

// Function newAttackFunction accepts and handles a CloudEvent object
// and updates terraform with the details of the new attacker.
func newAttackFunction(ctx context.Context, e event.Event) error {
}

The first thing we need to do is to grab some settings that we are passing as environment variables to the function

7_env_vars new-attack/main.go

  log.Println("New Attack Event called with data", string(e.Data()))

  tfeAddress := os.Getenv("TFE_ADDRESS")
  tfeToken := os.Getenv("TFE_TOKEN")
  tfeWorkspace := os.Getenv("TFE_WORKSPACE")
  tfeVariable := os.Getenv("TFE_VARIABLE")

The payload that Vault sends is a simple JSON document containing the IP address. We need to parse this from the payload sent in the event.

8_decode new-attack/main.go

  // Decode the message
  message, err := decodeMessage(e)
  if err != nil {
    log.Println("Unable to decode message", err)

    // Do not attempt re-delivery as the message is invalid
    return nil
  }

9_decode_func new-attack/main.go

type pubSubData struct {
  Message pubSubMessage `json:"message"`
}

type pubSubMessage struct {
  Data string `json:"data"`
}

type attackMessage struct {
  IP string `json:"ip"`
}

type applyMessage struct {
  Type string `json:"type"`
}

// decodeMessage decodes the message from the event
func decodeMessage(e event.Event) (*attackMessage, error) {
  var data pubSubData
  var message attackMessage

  // decode the data
  err := json.Unmarshal(e.Data(), &data)
  if err != nil {
    return nil, fmt.Errorf("unable to deserialize data: %s", err)
  }

  // parse the message
  md, err := base64.StdEncoding.DecodeString(data.Message.Data)
  if err != nil {
    return nil, fmt.Errorf("unable to decode message data: %s", err)
  }

  err = json.Unmarshal(md, &message)
  if err != nil {
    return nil, fmt.Errorf("unable to deserialize message: %s", err)
  }

  return &message, nil
}
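To see the two layers of encoding that decodeMessage unwraps, here is a small self-contained sketch that builds a sample envelope and decodes it again. It works on raw bytes rather than an event.Event so that it runs without the CloudEvents SDK. The `sampleEvent` and `decodeAttack` names and the example address are assumptions made for illustration.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

type pubSubData struct {
	Message pubSubMessage `json:"message"`
}

type pubSubMessage struct {
	Data string `json:"data"`
}

type attackMessage struct {
	IP string `json:"ip"`
}

// sampleEvent builds an envelope shaped like the one decoded above:
// the attack JSON is base64 encoded and wrapped in a message object.
func sampleEvent(ip string) []byte {
	inner, _ := json.Marshal(attackMessage{IP: ip})
	outer, _ := json.Marshal(pubSubData{
		Message: pubSubMessage{Data: base64.StdEncoding.EncodeToString(inner)},
	})
	return outer
}

// decodeAttack unwraps the envelope JSON, then the base64 data,
// then the inner attack JSON.
func decodeAttack(raw []byte) (*attackMessage, error) {
	var data pubSubData
	if err := json.Unmarshal(raw, &data); err != nil {
		return nil, fmt.Errorf("unable to deserialize data: %w", err)
	}

	md, err := base64.StdEncoding.DecodeString(data.Message.Data)
	if err != nil {
		return nil, fmt.Errorf("unable to decode message data: %w", err)
	}

	var message attackMessage
	if err := json.Unmarshal(md, &message); err != nil {
		return nil, fmt.Errorf("unable to deserialize message: %w", err)
	}

	return &message, nil
}

func main() {
	raw := sampleEvent("203.0.113.7")
	fmt.Println("envelope:", string(raw))

	msg, err := decodeAttack(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}

	fmt.Println("attacker ip:", msg.IP)
}
```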

Once the message is decoded, we can get the attacker's IP.

10_check_ip new-attack/main.go

  // If the message has no payload do nothing
  if message.IP == "" {
    log.Println("Message does not contain an IP address, skipping")

    return nil
  }

Then we can construct a tfe client and get the existing value of the variable

11_tfe_client new-attack/main.go

  // Create a tfe client
  config := &tfe.Config{
    Address:           tfeAddress,
    Token:             tfeToken,
    RetryServerErrors: true,
  }

  // Create a TFE client
  tfeClient, err := tfe.NewClient(config)
  if err != nil {
    log.Println("Unable to create TFE client", err)

    return err
  }

  denyList, err := getVariableFromTFE(tfeClient, tfeWorkspace, tfeVariable)
  if err != nil {
    log.Println("Unable to get variable from TFE", err)

    return err
  }

12_get_func new-attack/main.go

// getVariableFromTFE gets the given variable from a workspace
func getVariableFromTFE(client *tfe.Client, workspace, variable string) (string, error) {
  ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
  defer cancel()

  vars, err := client.Variables.List(ctx, workspace, &tfe.VariableListOptions{})
  if err != nil {
    return "", fmt.Errorf("unable to list variables for workspace %s: %s", workspace, err)
  }

  for _, v := range vars.Items {
    if v.Key == variable {
      return v.Value, nil
    }
  }

  return "", nil
}

Once we have the variable, let's check that the attacker's IP is not already on our list.

13_update_ip new-attack/main.go

  // Add the IP to the variable if not present
  log.Printf("Updating variable '%s', current value: '%s', adding '%s'\n", tfeVariable, denyList, message.IP)
  newDenyList, err := addIPIfNotPresent(denyList, message.IP)
  if err != nil {
    log.Println("Address already exists in list, quitting")

    return nil
  }

14_update_func new-attack/main.go

func addIPIfNotPresent(variable, messageIP string) (string, error) {
  // the variable is stored as a comma separated list, build an array
  // skipping empty entries so an empty variable does not produce a leading comma
  ips := []string{}
  for _, ip := range strings.Split(variable, ",") {
    if ip == "" {
      continue
    }

    if messageIP == ip {
      return "", fmt.Errorf("ip already exists in collection")
    }

    ips = append(ips, ip)
  }

  // add the ip to the collection and return the updated list
  ips = append(ips, messageIP)
  ipsString := strings.Join(ips, ",")

  return ipsString, nil
}
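The edge cases are worth a quick look: an empty variable, a trailing comma, and an address that is already on the list. The sketch below is a standalone variant written for illustration, not the function from the talk. The `addIP` name and the `errAlreadyPresent` sentinel error are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errAlreadyPresent is a hypothetical sentinel error, so callers can tell
// "nothing to do" apart from a real failure.
var errAlreadyPresent = errors.New("ip already exists in collection")

// addIP appends ip to a comma separated list, ignoring empty entries.
func addIP(list, ip string) (string, error) {
	ips := []string{}
	for _, existing := range strings.Split(list, ",") {
		if existing == "" {
			continue // skip empty entries from "" or a trailing comma
		}

		if existing == ip {
			return "", errAlreadyPresent
		}

		ips = append(ips, existing)
	}

	return strings.Join(append(ips, ip), ","), nil
}

func main() {
	for _, list := range []string{"", "198.51.100.23,", "198.51.100.23,203.0.113.7"} {
		updated, err := addIP(list, "203.0.113.7")
		fmt.Printf("%q -> %q (err: %v)\n", list, updated, err)
	}
}
```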

If it is not already on the list, let's update TFE with the new value of the variable.

15_update_tfe new-attack/main.go

	err = updateVariableInTFE(tfeClient, tfeWorkspace, tfeVariable, newDenyList)
	if err != nil {
		log.Println("Unable to update variable", err)

		return err
	}

16_tfe_update_func new-attack/main.go

// updateVariableInTFE updates the given variable with the new value
func updateVariableInTFE(client *tfe.Client, workspace, variable, value string) error {
  ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
  defer cancel()

  vars, err := client.Variables.List(ctx, workspace, &tfe.VariableListOptions{})
  if err != nil {
    return fmt.Errorf("unable to list variables for workspace %s: %s", workspace, err)
  }

  for _, v := range vars.Items {
    if v.Key == variable {
      _, err := client.Variables.Update(ctx, workspace, v.ID, tfe.VariableUpdateOptions{Value: &value})
      if err != nil {
        return fmt.Errorf("unable to update variable: %s", err)
      }

      return nil
    }
  }

  return nil
}

Finally, once TFE has been updated, we can trigger the update by publishing a new message to a different Pub/Sub topic.

17_trigger_plan new-attack/main.go

  log.Println("Triggering plan")

  // trigger a cloud update
  err = triggerPlan()
  if err != nil {
    log.Println("Unable to trigger pubsub update", err)
    return err
  }

  return nil

17_trigger_plan new-attack/main.go

// triggerPlan publishes a message to the apply topic, this in turn
// starts a new plan in TFE
func triggerPlan() error {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Sets your Google Cloud Platform project ID.
	projectID := os.Getenv("GCP_PROJECT_ID")
	if projectID == "" {
		return fmt.Errorf("GCP_PROJECT_ID is not set")
	}

	// Creates a client.
	client, err := pubsub.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("failed to create pubsub client: %s", err)
	}
	defer client.Close()

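	// Get a handle to the existing apply topic.
	topic := client.Topic("tfe-apply-topic")
	defer topic.Stop()

	message := &applyMessage{
		Type: "new_attack",
	}

	data, err := json.Marshal(message)
	if err != nil {
		return fmt.Errorf("unable to serialize message: %s", err)
	}

	// Publish is asynchronous, wait for the result so that failures are reported
	result := topic.Publish(ctx, &pubsub.Message{Data: data})
	if _, err := result.Get(ctx); err != nil {
		return fmt.Errorf("unable to publish message: %s", err)
	}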

	return nil
}

Run Apply

Let's now look at the apply function. Like the last function, we need to fetch some variables and get the payload.

18_decode run-apply/main.go

	tfeAddress := os.Getenv("TFE_ADDRESS")
	tfeToken := os.Getenv("TFE_TOKEN")
	tfeWorkspace := os.Getenv("TFE_WORKSPACE")

	log.Println("New Apply Event called with data", string(e.Data()))

	// decode the data
	data, err := decodeMessage(e)
	if err != nil {
		log.Println("Unable to deserialize data", err)

		// Do not attempt re-delivery as the message is invalid
		return nil
	}

19_decode run-apply/main.go

type pubSubData struct {
  Message pubSubMessage `json:"message"`
}

type pubSubMessage struct {
  PublishTime time.Time `json:"publishTime"`
  Data        string    `json:"data"`
}

// decodeMessage decodes the message from the event
func decodeMessage(e event.Event) (*pubSubData, error) {
  var data pubSubData

  // decode the data
  err := json.Unmarshal(e.Data(), &data)
  if err != nil {
    return nil, fmt.Errorf("unable to deserialize data: %s", err)
  }

  return &data, nil
}

Then like the other function we create the client

20_create_client run-apply/main.go

  // create a tfe client
  config := &tfe.Config{
    Address:           tfeAddress,
    Token:             tfeToken,
    RetryServerErrors: true,
  }

  // Create a TFE client
  tfeClient, err := tfe.NewClient(config)
  if err != nil {
    log.Println("Unable to create TFE client", err)

    return err
  }

Then let's get the latest run status

21_get_status run-apply/main.go

  ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
  defer cancel()

  status, createdTime, err := getLatestRunStatus(tfeClient, tfeWorkspace)
  if err != nil {
    log.Println("Unable to get run status")

    return fmt.Errorf("unable to get run status: %s", err)
  }

22_get_status_func run-apply/main.go

// getLatestRunStatus gets the latest run status for the workspace
func getLatestRunStatus(client *tfe.Client, workspace string) (tfe.RunStatus, time.Time, error) {
  runs, err := client.Runs.List(context.Background(), workspace, &tfe.RunListOptions{})
  if err != nil {
    return "", time.Time{}, fmt.Errorf("unable to query runs: %s", err)
  }

  if len(runs.Items) > 0 {
    return runs.Items[0].Status, runs.Items[0].CreatedAt, nil
  }

  return "", time.Time{}, nil
}

23_check run-apply/main.go

  // if the last run was after the publish time for this message ignore
  if createdTime.Sub(data.Message.PublishTime) > 0 {
    log.Println("Ignore message, a run has been created after this message was published")

    return nil
  }

  // if the current run status is planning or applying, redeliver the message later
  if status == tfe.RunPlanning || status == tfe.RunApplying {
    log.Printf("Current run status is %s, retry later\n", status)

    return fmt.Errorf("retry message")
  }
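The two checks above boil down to a small decision: ignore the message, ask for redelivery, or go ahead and apply. Here is a minimal sketch of that decision as a pure function. The `decide` function, the `action` type, and the `at` helper are hypothetical, and the plain strings "planning" and "applying" stand in for tfe.RunPlanning and tfe.RunApplying.

```go
package main

import (
	"fmt"
	"time"
)

// action is a hypothetical type describing what the handler should do.
type action string

const (
	actionIgnore action = "ignore" // a newer run already covers this message
	actionRetry  action = "retry"  // return an error so the message is redelivered
	actionApply  action = "apply"  // safe to start a new run
)

// decide mirrors the two checks: first compare the time of the latest run
// with the publish time of the message, then look at the run status.
func decide(status string, runCreated, published time.Time) action {
	if runCreated.After(published) {
		return actionIgnore
	}

	if status == "planning" || status == "applying" {
		return actionRetry
	}

	return actionApply
}

// at is a small helper that builds a time from a number of seconds.
func at(sec int64) time.Time {
	return time.Unix(sec, 0)
}

func main() {
	fmt.Println(decide("applied", at(200), at(100)))  // run is newer than the message
	fmt.Println(decide("planning", at(50), at(100)))  // run in progress
	fmt.Println(decide("applied", at(50), at(100)))   // nothing in the way
}
```

Returning an error for the retry case relies on the trigger being configured to redeliver failed messages.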

Finally we can run an apply

24_check run-apply/main.go

  // Create the plan
  err = applyConfig(tfeClient, tfeWorkspace)
  if err != nil {
    log.Printf("Error creating plan, %s\n", err)

    return nil
  }

  // Return nil if no error occurred
  return nil

25_apply_func run-apply/main.go

// applyConfig creates a new plan and apply in TFE
func applyConfig(client *tfe.Client, workspace string) error {
  ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
  defer cancel()

  ws, err := client.Workspaces.ReadByID(ctx, workspace)
  if err != nil {

    return fmt.Errorf("unable to read workspace: %s", err)
  }

  log.Println("Creating Plan")
  run, err := client.Runs.Create(context.Background(), tfe.RunCreateOptions{Workspace: ws, Type: "runs"})
  if err != nil {

    return fmt.Errorf("unable to create plan: %s", err)
  }

  // Runs are async, check the state and when done apply
  for {
    if ctx.Err() != nil {
      return fmt.Errorf("timeout waiting for run")
    }

    // Check the state of the run
    run, err := client.Runs.Read(ctx, run.ID)
    if err != nil {

      return fmt.Errorf("unable to check plan status: %s", err)
    }

    if run.Status == tfe.RunPlannedAndFinished {
      log.Println("No changes to make")

      return nil
    }

    if run.Status == tfe.RunErrored {

      return fmt.Errorf("run failed: %s", run.Message)
    }

    if run.Status == tfe.RunPlanned {
      log.Println("Applying Plan")
      // Run the apply
      err := client.Runs.Apply(ctx, run.ID, tfe.RunApplyOptions{})
      if err != nil {

        return fmt.Errorf("unable to apply plan: %s", err)
      }
    }

    if run.Status == tfe.RunApplied {
      log.Println("Apply complete")

      return nil
    }

    time.Sleep(10 * time.Second)
  }
}
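The polling loop in applyConfig can be sketched on its own, without TFE. The version below is an assumption-laden illustration rather than the talk's code: `pollUntil`, `fakeRun`, and `runFor` are hypothetical names, and the status strings are labels chosen for this sketch. It waits with a ticker and a select, so the loop stops as soon as the context expires instead of sleeping past the deadline.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pollUntil calls check immediately and then once per interval until check
// reports done, check returns an error, or the context expires.
func pollUntil(ctx context.Context, interval time.Duration, check func() (bool, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		done, err := check()
		if err != nil {
			return err
		}
		if done {
			return nil
		}

		select {
		case <-ctx.Done():
			return fmt.Errorf("timeout waiting for run: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

// fakeRun returns a check function that steps through the given statuses,
// one per call, and then stays on the last one. It needs at least one status.
func fakeRun(statuses ...string) func() (bool, error) {
	i := 0
	return func() (bool, error) {
		s := statuses[i]
		if i < len(statuses)-1 {
			i++
		}

		switch s {
		case "applied", "planned_and_finished":
			return true, nil
		case "errored":
			return false, errors.New("run failed")
		}

		return false, nil
	}
}

// runFor polls a fake run with a timeout given in milliseconds.
func runFor(ms int, statuses ...string) error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(ms)*time.Millisecond)
	defer cancel()

	return pollUntil(ctx, 5*time.Millisecond, fakeRun(statuses...))
}

func main() {
	fmt.Println("normal run:", runFor(500, "planning", "planned", "applying", "applied"))
	fmt.Println("failed run:", runFor(500, "planning", "errored"))
	fmt.Println("stuck run:", runFor(30, "planning"))
}
```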

Let's now deploy this

We can now test it
