note

trying it out

There are two ways of trying this out:

local build

As long as you have access to a Kubernetes cluster, you can get Concourse to run workloads there.

However, you first need to get Concourse itself running.

You can do so using the Makefile included here:

# run postgres in docker
#
make db


# start a kubernetes cluster using `kind`, with its CRI (containerd)
# configured to whitelist `http`-based registries.
#
make cluster


# build, install, and run Concourse (make sure you run `yarn build` 
# first to build the UI assets)
#
make run

That done, you can now access Concourse on http://localhost:8080.
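
With the web node up, a quick way to exercise it is through fly. A minimal sketch, assuming local dev defaults (the target name, credentials, and pipeline.yml below are placeholders for whatever you actually use):

# log in to the locally running web node (credentials depend on how the web
# node was configured - `test`/`test` is a common choice in dev setups)
#
fly -t local login -c http://localhost:8080 -u test -p test


# set and unpause a pipeline to get some kubernetes-backed steps scheduled
#
fly -t local set-pipeline -p sample -c ./pipeline.yml
fly -t local unpause-pipeline -p sample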

using the helm chart

I've uploaded an image (cirocosta/concourse:k8s-iter-1) to DockerHub for those wanting to try out the "in-cluster" experience, and repurposed "the Helm chart of today" to support the "runtime of tomorrow".

# clone the branch of the chart where it includes the necessary RBAC configs for
# setting up a service account that's powerful enough
#
git clone \
        --branch k8s-iter-1 \
        https://github.com/concourse/concourse-chart


# clone the branch of this PR
#
git clone \
        --branch k8s-iter-1 \
        https://github.com/concourse/concourse


# get the `init` configmap into the cluster so that we can hold containers alive
# until processes are meant to be executed in them.
#
NAMESPACE=default make -C ./concourse init


# install the chart / render the templates with the sample values under
# `hack/k8s/values.yaml` from this PR's branch
#
helm dependency update ./concourse-chart
helm upgrade \
        --install \
        --values=./concourse/hack/k8s/values.yaml \
        test \
        ./concourse-chart

With the chart installed, you can now port-forward the ATC (http://localhost:8080) and get going.
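
For instance, assuming the chart follows its usual `<release>-web` naming for the web deployment (double-check with `kubectl get deployments`):

# forward local port 8080 to the web node created by the `test` release
#
kubectl port-forward deployment/test-web 8080:8080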

ps.: your cluster must support pulling images from http-based registries served as pods (you can get a local cluster by running make -C concourse cluster if you'd like to use kind).

how it works

The execution of each step follows the same pattern we're accustomed to when using Garden: we set the sandbox up, and then run the desired process inside it (db.Creating() -> container.Create() -> db.Created()... -> container.Run()).

Because a build plan can be seen as a directed acyclic graph when it comes to dependencies, we can rely on that fact to dictate how each step either gathers inputs or supplies outputs.

For each step, as long as its dependencies (including transitive ones) were able to fulfill what they should (e.g., a get successfully running /opt/resource/in), it'll be able to retrieve any artifact that it might need.

When running a step, the ATC communicates with Kubernetes to create a pod that represents it.

Each step pod can be seen as a potential permutation of two configurations:

  • i. having an "inputs fetcher" init container, responsible for fetching artifacts retrieved by dependencies
  • ii. having an "output streamer" sidecar container, responsible for providing artifacts to those who might depend on this step
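
One way to see which of those permutations a given step pod ended up with is to inspect its spec (the pod name below is a placeholder):

# list the init containers of a step pod (the "inputs fetcher", if present)
#
kubectl get pod <step-pod> \
        -o jsonpath='{.spec.initContainers[*].name}'


# list its regular containers (the main container, plus the "output streamer"
# sidecar, if present)
#
kubectl get pod <step-pod> \
        -o jsonpath='{.spec.containers[*].name}'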

For instance, task in the example above would look like:

  • having an "input fetcher" capable of retrieving data from repository and image
  • having an "output streamer" capable of streaming the artifacts it produces to the "bucket" step ("bucket" would pull from the output streamer endpoint)

In the end, we have this form of "peer-to-peer" communication between the pods themselves:

(arrow direction indicating "depends on")

Once the build is done, the regular internal Concourse container lifecycle takes care of moving the containers in our DB from the CREATED state to DESTROYING; the Kubernetes implementation of a worker then notices that certain pods are no longer desired, and proceeds with deleting them.
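
In kubectl terms, the convergence the worker performs on our behalf boils down to something like the following (the label key and handle are placeholders; the actual selection mechanism lives in the worker code):

# find the pod backing a container whose desired state is DESTROYING ...
#
kubectl get pods -l <concourse-handle-label>=<handle>


# ... and get rid of it, which is what the worker's garbage collection
# effectively does for us
#
kubectl delete pods -l <concourse-handle-label>=<handle>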

concerns about the current design

The approach presented here is very much focused on not changing the constructs in our codebase as they stand today.

Despite demonstrating that it is possible to run Concourse on Kubernetes, the current design might raise some eyebrows.

insecure registry whitelisting

Given that the container runtimes the kubelets communicate with need to trust the registries they interact with, we have to rely on whitelisting registries served under the internal pod domain as trusted, insecure registries.
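
As an illustration of what that whitelisting could look like for a local kind cluster (the registry address here is illustrative, and the exact TOML keys and apiVersion vary across containerd and kind versions - `make cluster` does something along these lines):

# hypothetical sketch: create a `kind` cluster whose containerd treats an
# in-cluster, http-only registry address as a trusted mirror
#
cat > kind-config.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.default.svc.cluster.local:5000"]
    endpoint = ["http://registry.default.svc.cluster.local:5000"]
EOF

kind create cluster --config kind-config.yaml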

Interestingly, this is already done by default on GKE.

next iterations

To work around this, we can explore the avenue of having the images pushed to a central registry in the cluster that's trusted by the kubelets.

As long as we make the process of getting "inputs in" and "outputs out" pluggable enough, we could support either approach.

execing via apiserver

It's very convenient to be able to just exec (or attach to) a process via the apiserver's exec endpoints, making Concourse's current container lifecycle and process execution work with pretty much no changes needed in our code flow.

It might be that this does not scale though, with the apiserver being hit so hard by a system like Concourse.

There can also be concerns about keeping that connection open for a long time (e.g., steps that take a long time to finish their main executions), or about the sheer throughput necessary (steps that log tons to stderr).
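
Roughly, what the ATC does through the apiserver here is the equivalent of the following (pod and container names are placeholders; resource scripts read their request from stdin and write versions to stdout):

# exec the check script inside a step pod's main container, feeding the
# resource request through stdin
#
echo '{"source": {"uri": "https://github.com/concourse/concourse"}}' \
        | kubectl exec -i <check-step-pod> -c <main-container> -- \
                /opt/resource/check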

next iterations

If we don't want to change the imperative nature of the ATC making the requests to execute, we could have some form of "shim" mounted into every main container in the step pods; the ATC would reach out to that shim, which would take care of exec'ing processes there and dealing with the log streams / re-attaching.

syncing handles by fetching all concourse pods

Having a pod per Concourse container will necessarily mean that we'd have a pod for each resource scope id (assuming no use of ephemeral check containers).

That means that an installation with 5000 of those would be fetching 5000 pods on every worker tick.

next iterations

This could be improved by leveraging the same mechanisms that controllers use - perhaps even making this a controller itself that gets informed of changes to pods that match the label that we have?
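
As a rough illustration of the difference (label key and value are placeholders): today's behavior is equivalent to a full list on every tick, whereas an informer-style controller is closer to listing once and then only being told about changes:

# what the worker effectively does today, on every single tick
#
kubectl get pods -l <concourse-label>=<worker-name>


# what a controller / informer boils down to: list once, then watch
#
kubectl get pods -l <concourse-label>=<worker-name> --watch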

@startuml
autoactivate on
participant Concourse
participant Database
participant Kubernetes
participant CheckStep
... "sandbox" setup ...
Concourse -> Database : container CREATING
Database --> Concourse : ok
Concourse -> Kubernetes : CheckStep Pod Definition
Kubernetes --> Concourse: Ok
loop until ready or timeout
Concourse -> Kubernetes : Pod status
Kubernetes --> Concourse : Ok
end
Concourse -> Database : container CREATED
Database --> Concourse : ok
... process execution ...
Concourse -> Kubernetes : exec /opt/resource/check
Kubernetes -> CheckStep : stream
CheckStep --> Kubernetes : stream
Kubernetes --> Concourse : resource versions
@enduml
@startuml
object repository
repository : type = "get"
object image
image : type = "get"
object task
task : type = "task"
object bucket
bucket : type = "put"
task <|-- repository
task <|-- image
bucket <|-- task
@enduml
@startuml
title dependency graph
node "repository" {
package "repository containers" {
[repository output streamer]
[repository main container]
}
}
node "image" {
package "image containers" {
[image output streamer]
[image main container]
}
}
node "task" {
package "task init containers" {
[task input fetcher] ..> [repository output streamer]
[task input fetcher] ..> [image output streamer]
}
package "task containers" {
[task main container]
[task output streamer]
}
}
node "bucket" {
package "bucket init containers" {
[bucket input fetcher] ..> [task output streamer]
}
package "bucket containers" {
[bucket main container]
}
}
@enduml
@startuml
node "step pod" {
package "init containers" {
[input fetcher]
}
package "containers" {
[output streamer]
[main container]
}
database "volume" {
[input fetcher] ..> input
output ..> [output streamer]
input ..> [main container]
[main container] ..> output
}
}
@enduml
@startuml
title "worker's pod reporting & gc"
control WorkerTicker
box "web node" #LightBlue
participant Worker
participant Web
end box
participant Kubernetes
participant Db
autoactivate on
...
WorkerTicker -> Worker : tick
Worker -> Kubernetes : list pods
note left
retrieve "current state"
end note
Kubernetes --> Worker : pods
Worker -> Db : update containers state
Db --> Worker: ok
Worker -> Db : find containers in DESTROYING state (desired state)
note left
retrieve "desired state"
end note
Db --> Worker: list
Worker -> Kubernetes : destroy pods from list (converge)
note left
converge
end note
Kubernetes --> Worker : ok
Worker --> WorkerTicker : Done
@enduml