Why Service Mesh

Security

Let's take a look at how the service mesh protects your applications.

We already have an application deployed; by default, the service mesh protects it.

Basic Example

Let's look at this example:

k apply -f ./security/basic.yaml 

This is the most basic example; it allows any connection to reach the api service.
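
The exact contents of basic.yaml are not reproduced here, but with Consul on Kubernetes an allow-everything rule is expressed as a ServiceIntentions resource; a minimal sketch would look roughly like this:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: api
spec:
  destination:
    name: api
  sources:
    # allow any service in the mesh to call the api service
    - name: "*"
      action: allow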

L7

We can also enforce intentions at L7. Let's add the rest of the basic examples so that you can see things working.

First the payments service

k apply -f ./security/payments.yaml 
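
The actual payments.yaml may differ, but as a sketch, an L7 intention restricts which HTTP paths and methods the api service may call on payments:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: payments
spec:
  destination:
    name: payments
  sources:
    - name: api
      permissions:
        # only allow api to call payments with GET or POST requests
        - action: allow
          http:
            pathPrefix: "/"
            methods: ["GET", "POST"]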

Then the gRPC currency service

k apply -f ./security/currency.yaml 

How the security works

Let's examine how this works. I am going to grab a shell on a service-mesh-enabled pod.
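
For example (the deployment and container names here are assumptions), a shell can be opened with kubectl exec:

# open an interactive shell in the application container of the api pod
kubectl exec -it deploy/api -c api -- sh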

If I curl the local service, you can see that everything works as expected: the local service makes an outbound call to its local data plane, which redirects the request to the data plane at the other end.

curl localhost:9090

If I try to make that call directly,

curl payments.default.svc

it resolves correctly; however, if I try to go directly to the pod, it will fail.

Let me just disable that rule in the control plane,

you can see the request failing,

and if we add it back, it works again.

Granular security

Because we are using a software-defined network, we can be quite smart about what is allowed to access our services.

Consider this example: what if we only want to allow access to certain paths? Let's apply the following configuration and see what happens.

k apply -f ./security/payments_deny.yaml
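
The payments_deny.yaml file isn't shown here, but the idea is a permission list that allows only specific paths and rejects everything else; a hypothetical sketch:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: payments
spec:
  destination:
    name: payments
  sources:
    - name: api
      permissions:
        # only the root path is allowed
        - action: allow
          http:
            pathExact: "/"
        # any other path from api is denied
        - action: deny
          http:
            pathPrefix: "/"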

This also works with gRPC. Let's see it in action: the currency service is a gRPC service, so the RBAC looks like the following example.
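
A gRPC call is just an HTTP/2 request to a path of the form /Service/Method, so the intention can match on that path. A hedged sketch of what currency.yaml might contain (the source service name is an assumption):

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: currency
spec:
  destination:
    name: currency
  sources:
    - name: api
      permissions:
        # gRPC methods are matched as HTTP/2 paths
        - action: allow
          http:
            pathExact: "/FakeService/Handle"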

This allows the endpoint to be accessed, but grpcurl fails because it tries to use the reflection API.

grpcurl -plaintext currency.ingress.shipyard.run:18443 FakeService.Handle

To allow this, we can enable access to the ServerReflection API.

k apply -f ./security/currency_with_reflection.yaml
grpcurl -plaintext currency.ingress.shipyard.run:18443 FakeService.Handle
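
The currency_with_reflection.yaml file isn't reproduced here, but it most likely extends the intention with an extra permission covering the gRPC reflection service paths; a sketch:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: currency
spec:
  destination:
    name: currency
  sources:
    - name: api
      permissions:
        - action: allow
          http:
            pathExact: "/FakeService/Handle"
        # allow grpcurl to discover the service via the reflection API
        - action: allow
          http:
            pathPrefix: "/grpc.reflection.v1alpha.ServerReflection"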

Just to sanity check this, let's disable the Handle method and enable only the reflection API.

grpcurl -plaintext currency.ingress.shipyard.run:18443 FakeService.Handle
grpcurl -plaintext currency.ingress.shipyard.run:18443 list

Observability

By default, when you configure a service with the service mesh, it will expose some default metrics based on the configured service type. Let's take a look at an HTTP service.

http://localhost:8080/explore

Let's build a quick dashboard for the first service in our chain, the API.

Metrics are specific to your service mesh, but any mesh built on Envoy (Consul, Istio, Kong, and others) should produce metrics that look like this:

envoy_listener_http_downstream_rq_xx{local_cluster="api", envoy_http_conn_manager_prefix="public_listener"}

Let's create a new dashboard.

This metric counts the responses sent by the service, grouped by status class. It is a counter, so we need to apply the rate function to it.

rate(envoy_listener_http_downstream_rq_xx{local_cluster="api", envoy_http_conn_manager_prefix="public_listener"}[$__rate_interval])

We can also report the upstream calls from the API to the Payments service

rate(envoy_cluster_external_upstream_rq{consul_source_service="api", envoy_cluster_name="payments"}[$__rate_interval])

Let's add a few more metrics; we can report the duration of the requests and also show the duration of the service calls.

histogram_quantile(0.5, rate(envoy_cluster_upstream_rq_time_bucket{consul_destination_service="api"}[$__rate_interval]))

Dynamic Charts

Now that we have the charts, let's see how they can be made generic

First we add a variable

envoy_cluster_upstream_rq_time_bucket

Then we add a Regex for the values we would like to extract

/consul_destination_service="([^"]*).*/

Then we can make the chart dynamic

rate(envoy_listener_http_downstream_rq_xx{local_cluster="$service", envoy_http_conn_manager_prefix="public_listener"}[$__rate_interval])

Deployment Reliability

Now that we have some basic metrics for our service, let's look at some common deployment reliability problems and see what we can do about them with the service mesh.

First let's modify one of our payment service versions to fail intermittently

kubectl apply -f ./reliability/failing_v2.yaml
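
The failing_v2.yaml file isn't shown; as an illustration only, fake-service style applications can be made to fail intermittently with environment variables along these lines (the exact names and values are assumptions):

# excerpt from the Deployment spec for the payments v2 pods
env:
  - name: ERROR_RATE    # fraction of requests that return an error
    value: "0.5"
  - name: ERROR_CODE    # status code returned for the injected errors
    value: "501"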

Adding retries

Let's see what we can do about this. Because the service mesh gives you control over a software-defined network, you can apply patterns such as retries without needing to change the application code.

kubectl apply -f ./reliability/retry.yaml
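
In Consul, retries are configured on the service router; retry.yaml probably contains something along these lines (a sketch, not the exact file):

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceRouter
metadata:
  name: payments
spec:
  routes:
    - match:
        http:
          pathPrefix: "/"
      destination:
        # retry failed requests up to 3 times before surfacing the error
        numRetries: 3
        retryOnStatusCodes: [501, 503]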

What you can immediately see is that the 501s from the downstream requests have disappeared. You can also see the latency increase; the increase is caused by the retries.

Now, since we have default metrics, let's see how we can add those retries to our chart

rate(envoy_cluster_retry_upstream_rq_xx{consul_source_service="$service", consul_destination_service="$upstream"}[$__rate_interval])

Isolating the failing service

First let's remove that retry

k delete -f ./reliability/retry.yaml 

Now let's see how we can isolate the failing version of the service so we can test it

k apply -f ./reliability/isolate.yaml 
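
The isolate.yaml file likely defines service subsets keyed on version metadata and points the default traffic at v1; a hedged sketch:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceResolver
metadata:
  name: payments
spec:
  # all traffic goes to v1 unless a route explicitly selects another subset
  defaultSubset: v1
  subsets:
    v1:
      filter: "Service.Meta.version == v1"
    v2:
      filter: "Service.Meta.version == v2"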

We can now see that only the v1 service is being hit, as the errors have disappeared.

How can we test this manually? We can use the mesh to route to this specific version.

k apply -f ./reliability/isolate.yaml 

Next we add some specific routing:

k apply -f ./reliability/router.yaml 
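
A sketch of the kind of rule router.yaml might define, selecting a subset based on a version query parameter (the parameter name is an assumption based on the curl examples below):

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceRouter
metadata:
  name: payments
spec:
  routes:
    # ?version=1 routes to the v1 subset, ?version=2 to the v2 subset
    - match:
        http:
          queryParam:
            - name: version
              exact: "1"
      destination:
        serviceSubset: v1
    - match:
        http:
          queryParam:
            - name: version
              exact: "2"
      destination:
        serviceSubset: v2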

We can select the individual versions:

curl "payments.ingress.shipyard.run:18080/?version=1"
curl "payments.ingress.shipyard.run:18080/?version=2"

Splitting traffic

But what if you want to control the traffic split? For example, you might want to do a canary deployment.

k apply -f ./reliability/splitter.yaml 
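
A canary split is expressed as a service splitter; splitter.yaml probably looks roughly like this (the weights are illustrative):

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceSplitter
metadata:
  name: payments
spec:
  splits:
    # send most traffic to the stable version and a small share to the canary
    - weight: 90
      serviceSubset: v1
    - weight: 10
      serviceSubset: v2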

Automated release

Let's put all of this together now and start to see how we can use these techniques to automate a deployment

First let's clean up

k delete -f ./reliability

We are using a release controller, which automatically creates the configuration you have just seen. Tools like Flagger, Argo Rollouts, the release controller for Consul, and other open source tools work in this way.

First we create the release

k apply -f ./reliability/release.yaml 

Then we create a new deployment

k apply -f ./reliability/working_v2.yaml 

What if the deployment was broken, though? Well, first we can create our retry

k apply -f ./reliability/retry.yaml

Now let's apply our broken service

k apply -f ./reliability/failing_v2.yaml

You can see the traffic is being split, but it is not raising errors to the end user.
