@kmott
Last active November 10, 2023 18:41
Migrating from PX-Developer to PX-Essentials or PX-Enterprise on HashiCorp Nomad

Summary

These are the steps I took to migrate an existing pxd volume from PX-Developer to PX-Essentials / PX-Enterprise running on HashiCorp Nomad. I am not responsible for any data loss you incur if you follow these steps!

Prerequisites

If you are running PX-Developer via the portworx/px-dev Docker container, you MUST stop it first before continuing:

  • nomad stop --purge storage

Additionally, it's probably safest to stop and remove any running jobs that will be using volumes managed by px-dev during the migration:

  • nomad stop --purge <job_1>
  • nomad stop --purge <job_2>
  • nomad stop --purge <job_N>
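If you have several such jobs, the stop commands can be scripted. A minimal sketch (the job names are hypothetical placeholders, and the RUN variable defaults to echo so the commands are only printed until you clear it):

```shell
# Hypothetical job names -- replace with the jobs in your cluster
# that mount px-dev managed volumes.
JOBS="redis grafana postgres"

# Dry-run guard: commands are only printed. Set RUN="" to execute.
RUN="echo"

for job in $JOBS; do
  $RUN nomad stop --purge "$job"
done
```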

PortWorx CSI Job

The PortWorx CSI job is pretty straightforward. The big things to make sure of:

  • You point it at the same KVDB instance (-k) that PX-Developer was using
  • You use the same block devices (-s) as your previous PX-Developer installation
  • You use the same oci-monitor version as px-dev
  • The cluster name (-c) matches the previous px-dev instance

portworx.csi.hcl

job "portworx" {
  datacenters = ["localdev"]
  type        = "system"

  update {
    max_parallel = 1
    min_healthy_time = "2m"
    healthy_deadline = "10m"
    progress_deadline = "15m"
    auto_revert = false
    canary = 0
  }

  group "portworx" {
    count = 1

    restart {
      attempts = 30
      interval = "30m"
      delay    = "1m"
      mode     = "delay"
    }

    ephemeral_disk {
      size = 128
    }

    network {
      mode = "host"

      // For OpenStorage API SDK
      port "portworx-api" {
        static = 9021
        to = 9021
      }

      // For Prometheus Exporter Metrics
      port "portworx-metrics" {
        static = 9001
        to = 9001
      }

      port "portworx" {
        static = 9015
        to = 9015
      }
    }

    task "px-node" {
      driver = "docker"
      kill_timeout = "120s"   # allow portworx 2 min to gracefully shut down
      kill_signal = "SIGTERM" # use SIGTERM to shut down the nodes

      # setup environment variables for px-nodes
      env {
        AUTO_NODE_RECOVERY_TIMEOUT_IN_SECS = "1500"
        PX_TEMPLATE_VERSION                = "V4"
        CSI_ENDPOINT                       = "unix://var/lib/csi/csi.sock"
      }

      # CSI Driver config
      csi_plugin {
        id                     = "portworx"
        type                   = "monolith"
        mount_dir              = "/var/lib/csi"
        health_timeout         = "30m"                  # Nomad 1.3.2 and later only
        stage_publish_base_dir = "/var/lib/csi/publish" # Nomad 1.3.4 and later only
      }

      # container config
      config {
        image        = "portworx/oci-monitor:2.13.10"  # Should match the version PX-Developer is running
        ipc_mode     = "host"
        privileged   = true

        # configure your parameters below
        # do not remove the last parameter (needed for health check)
        args = [
          "-c", "localdev",                     # Same as px-dev deployment
          "-k", "consul://127.0.0.1:8500",      # Same as px-dev deployment
          "-s", "/dev/sdb",                     # Same as px-dev deployment
          "--endpoint", "0.0.0.0:9015"
        ]

        volumes = [
          "/run/docker/plugins:/run/docker/plugins",
          "/var/cores:/var/cores",
          "/var/run/docker.sock:/var/run/docker.sock",
          "/run/containerd:/run/containerd",
          "/etc/pwx:/etc/pwx",
          "/opt/pwx:/opt/pwx",
          "/proc:/host_proc",
          "/etc/systemd/system:/etc/systemd/system",
          "/var/run/log:/var/run/log",
          "/var/log:/var/log",
          "/var/run/dbus:/var/run/dbus",
          "/usr/src:/usr/src",
        ]

        network_mode = "host"
        ports = ["portworx", "portworx-api", "portworx-metrics"]
      }

      # resource config
      resources {
        cpu    = 1024
        memory = 2048
      }

      service {
        name = "storage-metrics"
        tags = [
          "portworx",
          "storage",
          "cluster"
        ]

        port = "portworx-metrics"

        check {
          type     = "script"
          command = "/opt/pwx/bin/pxctl"
          args = ["status"]
          interval = "10s"
          timeout = "3s"
        }
      }

      service {
        name = "storage"
        tags = [
          "pwx",
          "portworx",
          "storage",
          "api",
        ]

        port = "portworx-api"

        check {
          name     = "check_portworx-api"
          type     = "tcp"
          port     = "portworx-api"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}

Run CSI Job

Once you have created the portworx.csi.hcl file, run the job:

nomad run --detach portworx.csi.hcl

It will take a little while to bootstrap and start up, but once it is done, you should see a list of your existing volumes, and nomad plugin status should show that the new CSI plugin is registered and available:

nomad status

# Review nomad plugin status
root@localdev:/vagrant/localdev# nomad plugin status
Container Storage Interface
ID        Provider          Controllers Healthy/Expected  Nodes Healthy/Expected
portworx  pxd.portworx.com  1/1                           1/1

# Review nomad volume status
root@localdev:/vagrant/localdev# nomad volume status
Container Storage Interface
No CSI volumes

pxctl volume status

root@localdev:/vagrant/localdev# pxctl volume list | grep 'redis\|NAME'
ID                      NAME                    SIZE    HA      SHARED  ENCRYPTED       PROXY-VOLUME    IO_PRIORITY     STATUS                          SNAP-ENABLED    
1107963972586667525     redis-cache             1 GiB   1       no      no              no              MEDIUM          up - attached on 10.0.2.15      no

NOTE: These are NOT yet CSI volumes; we will migrate them later.

Summary

Volume migration for a job that previously used PX-Developer and now runs PX-Essentials or PX-Enterprise must be done while the job is stopped.

For this example, I am going to use a simple Redis container and migrate its /data volume to a CSI volume in Nomad.

NOTE: It is important that the 'name' value matches the NAME column and the 'id' value matches the ID column in the pxctl volume list output:

root@localdev:/vagrant/localdev# pxctl volume list | grep 'redis\|NAME'
ID                      NAME                    SIZE    HA      SHARED  ENCRYPTED       PROXY-VOLUME    IO_PRIORITY     STATUS                          SNAP-ENABLED    
1107963972586667525     redis-cache             1 GiB   1       no      no              no              MEDIUM          up - attached on 10.0.2.15      no
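If you would rather pull those two values out programmatically than copy them by hand, something like the following works against the pxctl volume list output (the sample text below is hard-coded from the listing above so the snippet is self-contained):

```shell
# Hard-coded sample of 'pxctl volume list' output; in practice pipe the
# real command instead: pxctl volume list | awk 'NR > 1 { print $1, $2 }'
sample='ID                      NAME                    SIZE
1107963972586667525     redis-cache             1 GiB'

# Skip the header row and grab the ID and NAME columns.
vol_id=$(echo "$sample" | awk 'NR > 1 { print $1 }')
vol_name=$(echo "$sample" | awk 'NR > 1 { print $2 }')

echo "id=$vol_id name=$vol_name"
```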

Prerequisites

Before we do anything, assuming we have data in our Redis container at /data, let's verify that's the case:

# Exec in to our redis container's alloc
root@localdev:/vagrant/localdev# nomad exec -i 94f5c /bin/bash
root@79ca3843e4f6:/data#

# Create a file we want to make sure exists post-migration
root@79ca3843e4f6:/data# date --rfc-3339=ns > date.txt
root@79ca3843e4f6:/data# ls -lah
total 12K
drwxr-xr-x. 2 redis redis 4.0K Oct 19 02:08 .
drwxr-xr-x  1 root  root  4.0K Oct 19 02:04 ..
-rw-r--r--  1 root  root    36 Oct 19 02:08 date.txt

# Exit out of our container, and stop the redis job
root@localdev:/vagrant/localdev# nomad stop --purge redis

# pxctl volume status should show our redis vol as detached
root@localdev:/vagrant/localdev# pxctl volume list | grep 'redis\|NAME'
ID                      NAME                    SIZE    HA      SHARED  ENCRYPTED       PROXY-VOLUME    IO_PRIORITY     STATUS                          SNAP-ENABLED    
1054097411774934751     redis-cache             1 GiB   1       no      no              no              MEDIUM          up - detached                   no

redis.volume.hcl

This is the new CSI volume definition that we will register in Nomad using nomad volume register:

id           = "1107963972586667525"  # Must match the ID column from 'pxctl volume list'
name         = "redis-cache"          # Must match the NAME column from 'pxctl volume list'
type         = "csi"
plugin_id    = "portworx"

capability {
  access_mode     = "single-node-reader-only"
  attachment_mode = "file-system"
}

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

mount_options {
  fs_type     = "ext4"
  mount_flags = ["noatime"]
}
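If you are migrating more than one volume, the definition file can be templated from the id/name pair. A minimal sketch (file name and variable names are illustrative; only the single-node-writer capability from above is included for brevity):

```shell
# Values copied from 'pxctl volume list'; adjust per volume.
vol_id="1107963972586667525"
vol_name="redis-cache"

# Render the volume definition. The heredoc delimiter is unquoted, so
# the shell expands ${vol_id} and ${vol_name} into the file.
cat > "${vol_name}.volume.hcl" <<EOF
id        = "${vol_id}"
name      = "${vol_name}"
type      = "csi"
plugin_id = "portworx"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

mount_options {
  fs_type     = "ext4"
  mount_flags = ["noatime"]
}
EOF
```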

Register Volume

Register the volume like this (NOTE: the ID in the output MUST match the ID column in pxctl volume list; otherwise a new volume was created, rather than an existing one registered):

# Create redis volume
root@localdev:/vagrant/localdev# nomad volume register redis.volume.hcl 
Created external volume 1107963972586667525 with ID 1107963972586667525

# View Nomad volume status
root@localdev:/vagrant/localdev# nomad volume status
Container Storage Interface
ID                   Name         Plugin ID  Schedulable  Access Mode
1107963972586667525  redis-cache  portworx   true         <none>

# View pxctl volume status
root@localdev:/vagrant/localdev# pxctl volume list | grep 'redis\|NAME'
ID                      NAME                    SIZE    HA      SHARED  ENCRYPTED       PROXY-VOLUME    IO_PRIORITY     STATUS                          SNAP-ENABLED    
1107963972586667525     redis-cache             1 GiB   1       no      no              no              MEDIUM          up - detached                   no

Everything looks great, so we should be able to update our Nomad Job now.
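Because a mismatched ID silently means a brand-new empty volume was created, it is worth comparing the two IDs explicitly. A sketch (both values are hard-coded from the outputs above; when scripting this, capture them from nomad volume status and pxctl volume list):

```shell
# IDs copied from the outputs above; capture them from the real
# commands when scripting this check.
nomad_id="1107963972586667525"   # from 'nomad volume status'
pxctl_id="1107963972586667525"   # from 'pxctl volume list'

if [ "$nomad_id" = "$pxctl_id" ]; then
  echo "OK: registered the existing volume $pxctl_id"
else
  echo "WARNING: IDs differ, a new empty volume may have been created" >&2
fi
```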

Update Nomad Job

Update your Nomad job HCL (an example is below):

  • Remove the 'mounts' field from job.group.task.config
  • Add a new volume block at job.group.volume
  • Add a new volume_mount block at job.group.task.volume_mount

job "redis" {
  datacenters = ["localdev"]

  group "cache" {
    count = 1

    # ADD For CSI Volume of this task (see redis.volume.hcl)
    volume "redis-cache" {
      type            = "csi"
      source          = "1107963972586667525"
      attachment_mode = "file-system"
      access_mode     = "single-node-writer"
    }

    network {
      port "db" {
        to = 6379
      }
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:7"
#        mounts = [                       # REMOVE START
#          {
#            type = "volume"
#            target = "/test"
#            source = "redis-cache"
#            readonly = false
#            volume_options = [{
#              driver_config = {
#                name = "pxd"
#                options = [{
#                  name = "redis-cache"
#                  size = "1"
#                  repl = "1"
#                  io_priority = "medium"
#                  io_profile = "auto"
#                }]
#              }
#            }]
#          }
#        ]                                # REMOVE END

        ports = ["db"]
      }

      resources {
        cpu    = 500
        memory = 256
      }

      # ADD For CSI Volume mount of this task (see redis.volume.hcl)
      volume_mount {
        volume      = "redis-cache"
        destination = "/data"
      }
    }
  }
}

Run Nomad Job

You can now re-run your Nomad job, and it should automatically use the newly registered redis-cache volume:

root@localdev:/vagrant/localdev# nomad run --detach redis.hcl
Job registration successful
Evaluation ID: b98e4f3a-c3c2-5502-6c93-d8f3962c66e4

Summary

We now have the new Nomad CSI plugin via PortWorx installed and running, a pre-existing volume registered with it, and our redis job should be running again and attached to the CSI volume. Let's verify.

Nomad Volume Status

root@localdev:/vagrant/localdev# nomad volume status
Container Storage Interface
ID                   Name         Plugin ID  Schedulable  Access Mode
1107963972586667525  redis-cache  portworx   true         single-node-writer

pxctl Status

root@localdev:/vagrant/localdev# pxctl volume list | grep 'redis\|NAME'
ID                      NAME                    SIZE    HA      SHARED  ENCRYPTED       PROXY-VOLUME    IO_PRIORITY     STATUS                          SNAP-ENABLED    
1107963972586667525     redis-cache             1 GiB   1       no      no              no              MEDIUM          up - attached on 10.0.2.15      no

Data Verification

Exec into our redis alloc again, and check that we have the same file and data as before:

root@localdev:/vagrant/localdev# nomad exec -i 6e35d /bin/bash
root@fe4125921fa2:/data# ls -lah
total 16K
drwxr-xr-x. 2 redis redis 4.0K Oct 19 02:08 .
drwxr-xr-x  1 root  root  4.0K Oct 19 02:12 ..
-rw-r--r--  1 redis root    36 Oct 19 02:08 date.txt
-rw-------  1 redis redis   88 Oct 19 02:08 dump.rdb
root@fe4125921fa2:/data# cat date.txt 
2023-10-19 02:08:40.860611956+00:00
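For anything more important than a test file, a checksum comparison before and after the migration is a stronger check than eyeballing ls. A self-contained sketch (it uses a local date.txt as a stand-in for the file inside the alloc; in practice run sha256sum through nomad exec before stopping the job and again once the new job is up):

```shell
# Stand-in for the file inside the alloc (contents from the example above).
printf '2023-10-19 02:08:40.860611956+00:00\n' > date.txt

# Checksum before the migration...
pre_sum=$(sha256sum date.txt | awk '{ print $1 }')

# ...migration happens here...

# ...and after, against the file in the new CSI-backed alloc.
post_sum=$(sha256sum date.txt | awk '{ print $1 }')

[ "$pre_sum" = "$post_sum" ] && echo "data intact" || echo "data changed!"
```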