@aiwantaozi
Last active May 3, 2018 09:16
Audit log solution

Phase 1

Features

  • support writing the audit log to a file and shipping it to a remote database - Audit logging can be enabled at the global level; the user configures audit logging and the remote database for shipping on the UI. Once the user enables audit logging and configures the remote target, we write the audit log to a file inside the container and ship it to the remote target. In Phase 1 the audit log includes metadata only.
    • local file - the audit log is generated inside the rancher server container.
    • remote database - we use fluentd to ship logs; the user can configure a remote database endpoint in the UI, e.g. mysql, elasticsearch, hadoop, kafka, and so on.
  • simple audit log listing and querying in the UI - if the user configures an external mysql as the audit log storage, they can list audit logs and run simple queries. The global level shows audit logs from all clusters; the cluster level shows only the related cluster's audit log.

Storage

why not store in etcd

  • we may store several days of audit logs, and that much data could put memory pressure on etcd and slow down its main function. Kubernetes stores events in etcd and adds logic to reduce the probability of an etcd OOM, but the second mitigation below cannot be applied to audit logs:
    • TTL: remove events outside the TTL time range.
    • no duplicate records: k8s adds a count field to compress duplicated events instead of writing each duplicate; an audit log must record every request, so it cannot be deduplicated this way.

storage we choose and reason

  • we save the audit log to a file inside the rancher server container, apply a rolling logger config, and use fluentd to ship the log to the target the user configured.
    • reason
      • fluentd can collect the log and ship it to whatever target the user deploys.
      • using fluentd means we don't have to implement per-target delivery, retry, and error recovery logic ourselves.

Log Collection

what info will we collect, and what will the log format look like?

  • Metadata - log request metadata but not request or response body.
    • cluster
    • requesting user
    • timestamp
    • resource
    • verb
    • request id
    • stage
    • requestURI
    • ip
    • group
    • version
    • action
    • project
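As an illustration, a single Metadata-level entry could serialize as one JSON line like the following (field names and values here are assumptions for discussion, not a fixed schema):

```json
{
  "request id": "aa7f8b1c-0f6a-4b2e-9c51-1d2e3f4a5b6c",
  "timestamp": "2018-05-03T09:16:00Z",
  "cluster": "c-abc123",
  "project": "p-xyz789",
  "requesting user": "user-alice",
  "verb": "create",
  "group": "apps",
  "version": "v1",
  "resource": "deployments",
  "action": "create",
  "stage": "ResponseComplete",
  "requestURI": "/v3/projects/p-xyz789/workloads",
  "ip": "192.168.1.23"
}
```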

collect ways

  • how to collect api request audit logs
    • add a filter after the auth filter, so we can get the user info.
      • rancher/server/server.go
    • the request and response are available from the APIContext
      • rancher/norman/api/server.go
      • rancher/norman/types/server_types.go
  • how to collect kubectl audit logs
    • kube config file from the rancher UI
      • the kube config generated by the rancher UI connects to https://192.168.1.100:8443/k8s/clusters/cluster_id; paths with the /k8s/clusters prefix also go through auth in rancher server, and rancher server dials to the related cluster. Different users get different kubectl files generated in the UI: the user field differs, representing the currently logged-in user.
    • if the user imports a cluster and uses a kube config file not generated by rancher, we cannot audit those requests.

implementation

configuration

  • audit log path
  • max size - the maximum size in megabytes of the log file before it gets rotated. It defaults to 100 megabytes.
  • max age - the maximum number of days to retain old log files, based on the timestamp encoded in their filename.
  • max backups - the maximum number of old log files to retain. The default is to retain all old log files (though max age may still cause them to be deleted).
  • target type - mysql, elasticsearch, splunk, kafka, syslog, hadoop, mongodb
  • target endpoint
  • target secret
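A hedged sketch of how these settings might serialize (the field names and values are illustrative, not a final schema):

```yaml
auditLog:
  path: /var/log/auditlog/rancher-api-audit.log
  maxSize: 100      # megabytes before rotation
  maxAge: 10        # days to keep rotated files
  maxBackups: 10    # rotated files to keep
  target:
    type: elasticsearch
    endpoint: https://es.example.com:9200
    secretName: audit-target-secret
```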

enable configuration

collection

  • after the fluentd configuration is generated, start fluentd, get the metadata from the rancher apiContext, and write it to the audit log file

update configuration

  • if the user updates the log config, regenerate the fluentd config and send signals to reload fluentd and lumberjack.

ship log

  • by default the audit log is saved inside the rancher server container, and fluentd ships the audit log file to the user's target, so we can support multiple targets: mysql, elasticsearch, splunk, hadoop, mongoDB, S3, kafka, AMQP, and so on.
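For illustration, the generated fluentd config could tail the local file and forward it, roughly like this (the paths, tag, and elasticsearch target are assumptions; each target type would get its own match section):

```
<source>
  @type tail
  path /var/log/auditlog/rancher-api-audit.log
  pos_file /var/log/auditlog/rancher-api-audit.pos
  tag audit.rancher
  format json
</source>

<match audit.**>
  @type elasticsearch
  host es.example.com
  port 9200
  logstash_format true
</match>
```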

UI show

  • query from mysql - the user configures the mysql target, and we use a mysql driver to connect, run the query, and send the result to the UI.
  • the global level shows all audit logs
  • the cluster level queries by cluster id
  • the log query API needs to integrate with auth to make sure users can only access audit logs they have permission for.
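For example, the cluster-level query could look something like this (the table and column names are assumptions about whatever schema the fluentd mysql output produces):

```sql
SELECT timestamp, requesting_user, verb, resource, request_uri
FROM audit_log
WHERE cluster_id = 'c-abc123'
ORDER BY timestamp DESC
LIMIT 100;
```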

Open questions

  • Should we support enabling the original k8s audit log? Our metadata now includes all the metadata the original k8s log has, but we cannot configure a policy yet.
  • The original k8s audit log is per user cluster; if we support it, should we also support it in our tools - logging config?
  • Collecting audit logs and shipping them are two separate questions: should we allow the user to enable audit logging without configuring a target? The log would then be lost if, in an HA deployment, the rancher server fails over to another host.

Reference

Phase 2

Suggested Feature

  • webhook - the user can configure a webhook, and we send a request to the configured webhook server.
  • audit policy like k8s - we could support configuring an audit policy; different groups or resources can get different policies. For example, for pods the audit log could include the request and response body, for configmaps only metadata, and for secrets nothing.

Details

audit level

  • None - don’t log events that match this rule.
  • Metadata - log request metadata but not request or response body.
    • cluster
    • requesting user
    • timestamp
    • resource
    • verb
    • auditID
    • stage
    • requestURI
    • ip
    • group
    • version
  • Request - log event metadata and request body but not response body. This does not apply for non-resource requests.
  • RequestResponse - log event metadata, request and response bodies. This does not apply for non-resource requests.

audit policy

learning from k8s, we provide a default policy and let the user configure the different policies they want in the UI, for example:

apiVersion: xxxx
kind: Policy
rules:
# Log pod changes at RequestResponse level
- level: RequestResponse
  resources:
  - group: ""
    # Resource "pods" doesn't match requests to any subresource of pods,
    # which is consistent with the RBAC policy.
    resources: ["pods"]
# Log "pods/log", "pods/status" at Metadata level
- level: Metadata
  resources:
  - group: ""
    resources: ["configmaps"]

webhook - we will save the webhook config in a CRD and post the request to the configured server.
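A rough sketch of what such a CRD object might look like (the kind and field names are illustrative, not an existing Rancher CRD):

```yaml
apiVersion: management.cattle.io/v3
kind: AuditWebhook        # illustrative kind
metadata:
  name: audit-webhook
spec:
  url: https://hooks.example.com/audit
  secretName: audit-webhook-token   # bearer token used when posting entries
```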
