@aiwantaozi
Last active May 3, 2018 09:16
Audit log solution

Phase 1

Features

  • support writing the audit log to a file and shipping it to a remote database - Audit logging can be enabled at the global level; the user configures audit logging and the remote database for shipping on the UI. Once the user enables audit logging and configures the remote target, we write the audit log to a file inside the container and ship it to the remote target. In Phase 1 the audit log includes metadata only.
    • local file - the audit log is generated inside the rancher server container.
    • remote database - we use fluentd to ship logs; the user can configure a remote database endpoint in the UI, e.g. mysql, elasticsearch, hadoop, kafka, and so on.
  • simple audit log listing and querying in the UI - if the user configures an external mysql as the audit log storage, they can list audit logs and run simple queries. The global level shows audit logs from all clusters; the cluster level shows only the related cluster's audit log.

Storage

why not store in etcd

  • we may store several days of audit logs, and that much data could put memory pressure on etcd and slow down its main function. Kubernetes stores events in etcd and adds logic to reduce the probability of an etcd OOM, but the second mitigation below cannot be applied to audit logs:
    • TTL: remove events outside the TTL time range.
    • no duplicate records: k8s adds a count field to compress duplicated events instead of writing each duplicate; an audit log must record every request, so it cannot be deduplicated this way.

storage we choose and reason

  • we save the audit log to a file inside the rancher server container, apply a rolling logger config, and use fluentd to ship the log to the target the user configured.
    • reason
      • fluentd can collect the log and ship it to whatever target the user deploys.
      • using fluentd means we don't have to implement per-target delivery, retry, and error recovery logic ourselves.

Log Collection

what info will we collect, and what will the log format look like?

  • Metadata - log request metadata but not request or response body.
    • cluster
    • requesting user
    • timestamp
    • resource
    • verb
    • request id
    • stage
    • requestURI
    • ip
    • group
    • version
    • action
    • project
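As an illustration, a single Metadata-level entry could serialize as one JSON line like the following (field names and values here are assumptions for discussion, not a fixed schema):

```json
{
  "request id": "aa7f8b1c-0f6a-4b2e-9c51-1d2e3f4a5b6c",
  "timestamp": "2018-05-03T09:16:00Z",
  "cluster": "c-abc123",
  "project": "p-xyz789",
  "requesting user": "user-alice",
  "verb": "create",
  "group": "apps",
  "version": "v1",
  "resource": "deployments",
  "action": "create",
  "stage": "ResponseComplete",
  "requestURI": "/v3/projects/p-xyz789/workloads",
  "ip": "192.168.1.23"
}
```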

collect ways

  • how to collect api request audit logs
    • add a filter after the auth filter, so we can get the user info.
      • rancher/server/server.go
    • the request and response are available from the APIContext
      • rancher/norman/api/server.go
      • rancher/norman/types/server_types.go
  • how to collect kubectl audit logs
    • kube config file from the rancher UI
      • the kube config generated by the rancher UI connects to https://192.168.1.100:8443/k8s/clusters/cluster_id; paths with the /k8s/clusters prefix also go through auth in rancher server, and rancher server dials to the related cluster. Different users get different kubectl files generated in the UI: the user field differs, representing the currently logged-in user.
    • if the user imports a cluster and uses a kube config file not generated by rancher, we cannot audit those requests.

implementation

configuration

  • audit log path
  • max size - the maximum size in megabytes of the log file before it gets rotated. It defaults to 100 megabytes.
  • max age - the maximum number of days to retain old log files, based on the timestamp encoded in their filename.
  • max backups - the maximum number of old log files to retain. The default is to retain all old log files (though max age may still cause them to be deleted).
  • target type - mysql, elasticsearch, splunk, kafka, syslog, hadoop, mongodb
  • target endpoint
  • target secret
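A hedged sketch of how these settings might serialize (the field names and values are illustrative, not a final schema):

```yaml
auditLog:
  path: /var/log/auditlog/rancher-api-audit.log
  maxSize: 100      # megabytes before rotation
  maxAge: 10        # days to keep rotated files
  maxBackups: 10    # rotated files to keep
  target:
    type: elasticsearch
    endpoint: https://es.example.com:9200
    secretName: audit-target-secret
```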

enable configuration

collection

  • after the fluentd configuration is generated, start fluentd, get the metadata from the rancher apiContext, and write it to the audit log file

update configuration

  • if the user updates the log config, regenerate the fluentd config and send signals to reload fluentd and lumberjack.

ship log

  • by default the audit log is saved inside the rancher server container, and fluentd ships the audit log file to the user's target, so we can support multiple targets: mysql, elasticsearch, splunk, hadoop, mongoDB, S3, kafka, AMQP, and so on.
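For illustration, the generated fluentd config could tail the local file and forward it, roughly like this (the paths, tag, and elasticsearch target are assumptions; each target type would get its own match section):

```
<source>
  @type tail
  path /var/log/auditlog/rancher-api-audit.log
  pos_file /var/log/auditlog/rancher-api-audit.pos
  tag audit.rancher
  format json
</source>

<match audit.**>
  @type elasticsearch
  host es.example.com
  port 9200
  logstash_format true
</match>
```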

UI show

  • query from mysql - the user configures the mysql target, and we use a mysql driver to connect, run the query, and send the result to the UI.
  • the global level shows all audit logs
  • the cluster level queries by cluster id
  • the log query API needs to integrate with auth to make sure users can only access audit logs they have permission for.
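For example, the cluster-level query could look something like this (the table and column names are assumptions about whatever schema the fluentd mysql output produces):

```sql
SELECT timestamp, requesting_user, verb, resource, request_uri
FROM audit_log
WHERE cluster_id = 'c-abc123'
ORDER BY timestamp DESC
LIMIT 100;
```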

Open questions

  • Should we support enabling the original k8s audit log? Our metadata now includes all the metadata the original k8s log has, but we cannot configure a policy yet.
  • The original k8s audit log is per user cluster; if we support it, should we also support it in our tools - logging config?
  • Collecting audit logs and shipping them are two separate questions: should we allow the user to enable audit logging without configuring a target? The log would then be lost if, in an HA deployment, the rancher server fails over to another host.

Reference

Phase 2

Suggested Feature

  • webhook - the user can configure a webhook, and we send a request to the configured webhook server.
  • audit policy like k8s - we could support configuring an audit policy; different groups or resources can get different policies. For example, for pods the audit log could include the request and response body, for configmaps only metadata, and for secrets nothing.

Details

audit level

  • None - don’t log events that match this rule.
  • Metadata - log request metadata but not request or response body.
    • cluster
    • requesting user
    • timestamp
    • resource
    • verb
    • auditID
    • stage
    • requestURI
    • ip
    • group
    • version
  • Request - log event metadata and request body but not response body. This does not apply for non-resource requests.
  • RequestResponse - log event metadata, request and response bodies. This does not apply for non-resource requests.

audit policy

learning from k8s, we provide a default policy and let the user configure the different policies they want in the UI, for example:

apiVersion: xxxx
kind: Policy
rules:
# Log pod changes at RequestResponse level
- level: RequestResponse
  resources:
  - group: ""
    # Resource "pods" doesn't match requests to any subresource of pods,
    # which is consistent with the RBAC policy.
    resources: ["pods"]
# Log "pods/log", "pods/status" at Metadata level
- level: Metadata
  resources:
  - group: ""
    resources: ["configmaps"]

webhook - we will save the webhook config in a CRD and post the request to the configured server.
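A rough sketch of what such a CRD object might look like (the kind and field names are illustrative, not an existing Rancher CRD):

```yaml
apiVersion: management.cattle.io/v3
kind: AuditWebhook        # illustrative kind
metadata:
  name: audit-webhook
spec:
  url: https://hooks.example.com/audit
  secretName: audit-webhook-token   # bearer token used when posting entries
```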
