note

trying it out

There are two ways of trying this out:

local build

As long as you have access to a Kubernetes cluster, you can get Concourse to run workloads there.

However, you first need to get Concourse itself running.

You can do so using the Makefile included here:

# run postgres in docker
#
make db


# start a kubernetes cluster using `kind`, with its CRI (containerd)
# configured to whitelist `http`-based registries.
#
make cluster


# build, install, and run Concourse (make sure you run `yarn build` 
# first to build the UI assets)
#
make run

That done, you can now access Concourse on http://localhost:8080.
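
With the web node up, a quick way to exercise it is through fly. A minimal sketch, assuming local dev defaults (the target name, credentials, and pipeline.yml below are placeholders for whatever you actually use):

# log in to the locally running web node (credentials depend on how the web
# node was configured - `test`/`test` is a common choice in dev setups)
#
fly -t local login -c http://localhost:8080 -u test -p test


# set and unpause a pipeline to get some kubernetes-backed steps scheduled
#
fly -t local set-pipeline -p sample -c ./pipeline.yml
fly -t local unpause-pipeline -p sample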

using the helm chart

I've uploaded an image (cirocosta/concourse:k8s-iter-1) to DockerHub for those wanting to try out the "in-cluster" experience, and repurposed "the Helm chart of today" to support the "runtime of tomorrow".

# clone the branch of the chart where it includes the necessary RBAC configs for
# setting up a service account that's powerful enough
#
git clone \
        --branch k8s-iter-1 \
        https://github.com/concourse/concourse-chart


# clone the branch of this PR
#
git clone \
        --branch k8s-iter-1 \
        https://github.com/concourse/concourse


# get the `init` configmap into the cluster so that we can hold containers alive
# until processes are meant to be executed in them.
#
NAMESPACE=default make -C ./concourse init


# install the chart / render the templates with the sample values under
# `hack/k8s/values.yaml` from this PR's branch
#
helm dependency update ./concourse-chart
helm upgrade \
        --install \
        --values=./concourse/hack/k8s/values.yaml \
        test \
        ./concourse-chart

With the chart installed, you can now port-forward the ATC (http://localhost:8080) and get going.
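
For instance, assuming the chart follows its usual `<release>-web` naming for the web deployment (double-check with `kubectl get deployments`):

# forward local port 8080 to the web node created by the `test` release
#
kubectl port-forward deployment/test-web 8080:8080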

ps.: your cluster must support pulling images from http-based registries served as pods (you can get a local cluster by running make -C concourse cluster if you'd like to use kind).

how it works

The execution of each step follows the same pattern we're accustomed to when using Garden: we set the sandbox up, and then run the desired process inside it (db.Creating() -> container.Create() -> db.Created()... -> container.Run()).

Because a build plan can be seen as a directed acyclic graph when it comes to dependencies, we can rely on that fact to dictate how each step either gathers inputs or supplies outputs.

For each step, as long as its dependencies (including transitive ones) were able to fulfill what they should (e.g., a get successfully running /opt/resource/in), it'll be able to retrieve any artifact that it might need.

When running a step, the ATC communicates with Kubernetes to create a pod that represents it.

Each step pod can be seen as a potential permutation of two configurations:

  • i. having an "inputs fetcher" init container, responsible for fetching artifacts retrieved by dependencies
  • ii. having an "output streamer" sidecar container, responsible for providing artifacts to those who might depend on this step
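
One way to see which of those permutations a given step pod ended up with is to inspect its spec (the pod name below is a placeholder):

# list the init containers of a step pod (the "inputs fetcher", if present)
#
kubectl get pod <step-pod> \
        -o jsonpath='{.spec.initContainers[*].name}'


# list its regular containers (the main container, plus the "output streamer"
# sidecar, if present)
#
kubectl get pod <step-pod> \
        -o jsonpath='{.spec.containers[*].name}'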

For instance, task in the example above would look like:

  • having an "input fetcher" capable of retrieving data from repository and image
  • having an "output streamer" capable of streaming the artifacts it produces to the "bucket" step ("bucket" would pull from the output streamer endpoint)

In the end, we have this form of "peer-to-peer" communication between the pods themselves:

(arrow direction indicating "depends on")

Once the build is done, the regular internal Concourse container lifecycle takes care of moving the containers in our DB from the CREATED state to DESTROYING; the Kubernetes implementation of a worker then notices that certain pods are no longer desired, and proceeds with deleting them.
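
In kubectl terms, the convergence the worker performs on our behalf boils down to something like the following (the label key and handle are placeholders; the actual selection mechanism lives in the worker code):

# find the pod backing a container whose desired state is DESTROYING ...
#
kubectl get pods -l <concourse-handle-label>=<handle>


# ... and get rid of it, which is what the worker's garbage collection
# effectively does for us
#
kubectl delete pods -l <concourse-handle-label>=<handle>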

concerns about the current design

The approach presented here is very much focused on not changing the constructs in our codebase as they stand today.

Despite demonstrating that it is possible to run Concourse on Kubernetes, the current design might raise some eyebrows.

insecure registry whitelisting

Given that the container runtimes the kubelets communicate with need to trust the registries they interact with, we have to rely on whitelisting registries served under the internal pod domain as trusted, insecure registries.
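
As an illustration of what that whitelisting could look like for a local kind cluster (the registry address here is illustrative, and the exact TOML keys and apiVersion vary across containerd and kind versions - `make cluster` does something along these lines):

# hypothetical sketch: create a `kind` cluster whose containerd treats an
# in-cluster, http-only registry address as a trusted mirror
#
cat > kind-config.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.default.svc.cluster.local:5000"]
    endpoint = ["http://registry.default.svc.cluster.local:5000"]
EOF

kind create cluster --config kind-config.yaml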

Interestingly, this is already done by default on GKE.

next iterations

To work around this, we can explore the avenue of having the images pushed to a central registry in the cluster that's trusted by the kubelets.

As long as we make the process of getting "inputs in" and "outputs out" pluggable enough, we could support either approach.

execing via apiserver

It's very convenient to be able to just exec (or attach to) a process via the apiserver's exec endpoints, making Concourse's current container lifecycle and process execution work with pretty much no changes needed in our code flow.

It might be that this does not scale though, with the apiserver being hit so hard by a system like Concourse.

There can also be concerns about keeping that connection open for a long time (e.g., steps that take a long time to finish their main executions), or about the sheer throughput necessary (steps that log tons to stderr).
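
Roughly, what the ATC does through the apiserver here is the equivalent of the following (pod and container names are placeholders; resource scripts read their request from stdin and write versions to stdout):

# exec the check script inside a step pod's main container, feeding the
# resource request through stdin
#
echo '{"source": {"uri": "https://github.com/concourse/concourse"}}' \
        | kubectl exec -i <check-step-pod> -c <main-container> -- \
                /opt/resource/check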

next iterations

If we don't want to change the imperative nature of the ATC making the requests to execute, we could have some form of "shim" mounted into every main container in the step pods; the ATC would reach out to that shim, which would take care of exec'ing processes there and dealing with the log streams / re-attaching.

syncing handles by fetching all concourse pods

Having a pod per Concourse container will necessarily mean that we'd have a pod for each resource scope id (assuming no use of ephemeral check containers).

That means that an installation with 5000 of those would be fetching 5000 pods on every worker tick.

next iterations

This could be improved by leveraging the same mechanisms that controllers use - perhaps even making this a controller itself that gets informed of changes to pods that match the label that we have?
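
As a rough illustration of the difference (label key and value are placeholders): today's behavior is equivalent to a full list on every tick, whereas an informer-style controller is closer to listing once and then only being told about changes:

# what the worker effectively does today, on every single tick
#
kubectl get pods -l <concourse-label>=<worker-name>


# what a controller / informer boils down to: list once, then watch
#
kubectl get pods -l <concourse-label>=<worker-name> --watch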

@startuml
autoactivate on
participant Concourse
participant Database
participant Kubernetes
participant CheckStep
... "sandbox" setup ...
Concourse -> Database : container CREATING
Database --> Concourse : ok
Concourse -> Kubernetes : CheckStep Pod Definition
Kubernetes --> Concourse: Ok
loop until ready or timeout
Concourse -> Kubernetes : Pod status
Kubernetes --> Concourse : Ok
end
Concourse -> Database : container CREATED
Database --> Concourse : ok
... process execution ...
Concourse -> Kubernetes : exec /opt/resource/check
Kubernetes -> CheckStep : stream
CheckStep --> Kubernetes : stream
Kubernetes --> Concourse : resource versions
@enduml
@startuml
object repository
repository : type = "get"
object image
image : type = "get"
object task
task : type = "task"
object bucket
bucket : type = "put"
task <|-- repository
task <|-- image
bucket <|-- task
@enduml
@startuml
title dependency graph
node "repository" {
package "repository containers" {
[repository output streamer]
[repository main container]
}
}
node "image" {
package "image containers" {
[image output streamer]
[image main container]
}
}
node "task" {
package "task init containers" {
[task input fetcher] ..> [repository output streamer]
[task input fetcher] ..> [image output streamer]
}
package "task containers" {
[task main container]
[task output streamer]
}
}
node "bucket" {
package "bucket init containers" {
[bucket input fetcher] ..> [task output streamer]
}
package "bucket containers" {
[bucket main container]
}
}
@enduml
@startuml
node "step pod" {
package "init containers" {
[input fetcher]
}
package "containers" {
[output streamer]
[main container]
}
database "volume" {
[input fetcher] ..> input
output ..> [output streamer]
input ..> [main container]
[main container] ..> output
}
}
@enduml
@startuml
title "worker's pod reporting & gc"
control WorkerTicker
box "web node" #LightBlue
participant Worker
participant Web
end box
participant Kubernetes
participant Db
autoactivate on
...
WorkerTicker -> Worker : tick
Worker -> Kubernetes : list pods
note left
retrieve "current state"
end note
Kubernetes --> Worker : pods
Worker -> Db : update containers state
Db --> Worker: ok
Worker -> Db : find containers in DESTROYING state (desired state)
note left
retrieve "desired state"
end note
Db --> Worker: list
Worker -> Kubernetes : destroy pods from list (converge)
note left
converge
end note
Kubernetes --> Worker : ok
Worker --> WorkerTicker : Done
@enduml