@jbzdak
Last active February 20, 2019 12:05
crawler.yml
# Note: batch/v1beta1 CronJob was removed in Kubernetes 1.25; use batch/v1 on 1.21+.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: scraper-cron-uk
spec:
  # Run once a day at 03:00.
  schedule: "0 3 * * *"
  # Skip a scheduled run if the previous one is still in progress.
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      backoffLimit: 3
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: scraper-job
              image: eu.gcr.io/example-project/example-scraper
              command: ['scrapy']
              args: ['crawl_qualcheck', 'example_scraper', '-a', 'domain_to_scrape=example.org.uk']
              env:
                - name: CRAWLERA_CONCURRENCY
                  # Env var values must be strings, so the number is quoted.
                  value: "50"
                - name: CRAWLERA_API_KEY
                  valueFrom:
                    secretKeyRef:
                      name: crawlera-api-key
                      key: apikey
              resources:
                requests:
                  cpu: 500m
                  memory: 512Mi
                limits:
                  cpu: 1000m
                  memory: 1024Mi
              volumeMounts:
                # Scratch space backed by the emptyDir volume below.
                - mountPath: /tmp
                  name: tmp-volume
                - name: service-account
                  mountPath: "/etc/service-account"
                  readOnly: true
          volumes:
            - name: tmp-volume
              emptyDir: {}
            - name: service-account
              secret:
                secretName: service-account