- Write blog posts describing how to use Taskcluster (TC)
- For developers and community:
- ! easy to install (simple defaults that work right out of the box)
- ! easy to deploy (including external services/ingress/certbot) (FCP-50, FCP-51) (terraform)
- ! easy to bootstrap a new repo integration: quickstart/sample configs
- ! simple steps to run it locally (docker-compose/kind), with documentation
- ! describe how to easily integrate with GitHub: writing .taskcluster.yml / decisiongraph
- easy to use your own tools: bring-your-own cluster/S3/DB/RabbitMQ
- simple workers to run payloads (locally/dev/stage) (FCP-54)
- Documentation:
- add/improve “How To” section in docs
- Improve search (e.g. use full-text search)
- switch to MkDocs Material? https://squidfunk.github.io/mkdocs-material/
- document how to use interactive tasks (taskcluster #3826)
- Integration / GitHub:
- Validate .taskcluster.yml - online validator/linter
- For rel ops:
- !! provide easy-to-install defaults and a config generator
- Deployment, testing & QA (avoid broken releases)
- !! custom ingress (taskcluster #3295, taskcluster #4913) or a Kubernetes operator
- ! introduce proper CD with Cloud Build (or an alternative?) so that the latest environment is deployed immediately
- ! Run E2E tests against dev/staging and produce reports
- add db wipe command (taskcluster #2922)
- add command to generate migrations (taskcluster #2923)
- simplify service configuration process (taskcluster #3296)
- deploy custom images (taskcluster #5041)
- ! introduce TypeScript (ES modules, types) (taskcluster #4028, taskcluster #4260)
- ! get rid of old/unsupported packages
- neutrino (taskcluster #4001)
- Review linting rules (FCP-52)
- decide the future of GraphQL: would switching to REST be better, and what would it cost?
- review security alerts and advisories (FCP-12)
- Platform artifact integrity / CoT (FCP-23)
- Docker image vulnerabilities (FCP-33)
- Remove deprecated azure endpoints (tables) (FCP-93)
- migrate process.hrtime (taskcluster #5319)
- Python client tests not running in CI (taskcluster #4968)
- support release branches (taskcluster #3469)
- !! Visualise and reprioritize tasks in queues taskcluster/taskcluster-rfcs#172 (FCP-16, taskcluster #2939)
- Show number of pending tasks (taskcluster #5849)
- Queue monitoring: what can be improved for sheriffs/relops? (FCP-35)
- Cancel task groups & tasks (taskcluster #3652)
- Make decision-task-generated tasks depend on a breakpoint task (bugzilla #1373013)
- ! Save filters (persist in query string/storage) (taskcluster #5313)
- Responsive tables (taskcluster #5379)
- Auth
- scopes layout improvements (taskcluster #1539)
- make scopes & roles easier to understand: tree-like view; dependencies (who depends on this role, who assumes it, etc.); expand/collapse all; jump to/filter
- prevent creation of roles with empty scopes (taskcluster #5065)
- missing navigation: breadcrumbs, back button
- Tasks view:
- !! display logs right away (something similar to https://travis-ci.org/github/taskcluster/generic-worker/jobs/634409403)
- ! Track timings: display task timing information (taskcluster #222, taskcluster #1305, taskcluster #1955)
- organise information better (can we involve UX/UI experts within Mozilla?)
- Add sibling task selector to switch quickly between tasks
- Add hierarchy view for task dependencies (1604234) (libs: sigma.js, elkjs, elj-react)
- ! task visibility: (customer request: sheriffs)
- ! 600 Windows tests are running: what are they?
- ! what’s running and what’s waiting
- kill tasks manually
- dependency chain visualised - who depends on who
- all tasks that are xx days old
- Data tables: add server-side search
- Worker-manager
- Display total capacity across all pools, with graphs showing running/existing/pending
- group by provider, add filtering capability
- hide pools with zero tasks
- filter by owner
- display available worker providers (taskcluster #1586)
- Workers
- One-click button to quarantine/unquarantine workers (taskcluster #5214)
- show pending tasks (taskcluster #2939)
- ! build generic containers with pre-configured workers that are easy to run and connect to the instance
- ! self-updating worker binaries with worker-runner (taskcluster #3059)
- ! Change registerWorker schema to include the worker version, avoiding manual querying (taskcluster #2982, taskcluster #5306)
- deprecate docker-worker (FCP-17, taskcluster #5321)
- ! health metrics: (customer request: cloudops) (FCP-9)
- (inspect the last tasks completed by a worker, compute a health percentage, and alert if there are too many exceptions; we need to prevent bad workers from failing a large number of jobs)
- see the fitness.py script in https://github.com/mozilla-platform-ops/android-tools/tree/master/worker_health
- Per-project/repo/team billing: how many resources are spent, by category (FCP-24)
- Support Win11 (FCP-26)
- Support macOS Monterey (FCP-27)
- Generic worker imageset for Ubuntu 20 (FCP-86)
- Generic worker decision task vs taskgraph (taskcluster #2915)
- Turn off Stateless DNS (bugzilla 1547358)
- use S3 lifecycle policies where possible (taskcluster #3949)
- Object service: keep or remove? (FCP-22)
- ! provide visibility into who made changes and how (related: taskcluster #4343)
- add timestamps for updated_at, created_at
- add client information: created_by, updated_by
- possibly a separate entity to track the history of changes
- ! show history of changes for secrets (taskcluster #5438)
- how can we detect that something is stuck before a human notices?
- ! Azure limits per location with worker-manager (taskcluster #4938)
- ! Focus: reduce idling time & cloud costs:
- ! estimator to include historical data & run times (taskcluster #3377)
- ! background scans to detect orphaned/dangling workers (taskcluster #3378, taskcluster #3379, taskcluster #3380)
- ! use serverless execution for quick tasks (taskcluster #4580)
- improve provisioner estimator (taskcluster #3061)
- choose regions/specs based on costs (taskcluster #3063)
- Kubernetes workers? (worker-manager spins up new Kubernetes pods)
- Azure reports incorrect number of workers: https://bugzilla.mozilla.org/show_bug.cgi?id=1723789
- find instances that are still in the requested state but were provisioned a long time ago (i.e. stuck)
- combine similar errors for pools (taskcluster #3064)
- Azure workers not being stopped on time https://bugzilla.mozilla.org/show_bug.cgi?id=1779815
- ! Rerun related:
- task rerun doesn’t update GitHub status taskcluster/taskcluster#5085 (!!)
- rerun intermittent PR checks (taskcluster #4950)
- Taskcluster doesn’t report back the status of a rerun (taskcluster #5437)
- when rerunning tasks, dependencies are ignored (taskcluster #5442)
- status not updated without page refresh (taskcluster #5086, taskcluster #3780, taskcluster #5046)
- ! UI: Rerun: find out which scopes are needed (taskcluster #5300)
- ! Skip tasks if commit message includes “ci skip” (taskcluster #5311)
- ! Embed the last lines of the log in the output if a task fails (taskcluster #5217)
- ! restore caches in runs (taskcluster #2916)
- ! log links expire (taskcluster #5033)
- Run tasks from github comment (FCP-18) (taskcluster #40)
- Different roles for releases (taskcluster #5518)
- drop support for tc-github.v0 (taskcluster #4098)
- Evaluate GitLab integration (FCP-19)
- Index service (is it a VCS-agnostic commit history?)
- timeline of tasks
- search by attributes: repo/owner/branch/etc
- relate similar tasks (i.e. specific service test for different task groups)
- cancel running tasks on new push (in PR)
- Manually run taskcluster-github for any given repo:
taskcluster run-github https://github.com/some/repo/.taskcluster.yml
- ? specify replyTo (taskcluster #1233)
- better all-completed or failed resolutions (taskcluster #3244, taskcluster #3684)
- change status of checks to in_progress (taskcluster #5165)
- Improved badges (taskcluster #5521)
- watch command (taskcluster #5207)
- Engage more internal services/projects to use it and avoid fragmentation (FCP-20)
- Take ownership of taskgraph (FCP-21)
- Per-project billing? (FCP-24): resources and time spent per project/owner/org/etc.
- work with cloudops to replace constant polling scripts with efficient endpoints that return the data they need
- design extension architecture (or workflows)
- deploy-service workflow (takes care of cluster/resources, adds deployment steps)
- taskgraph extension - generate tasks based on your needs dynamically
- language-specific workflow: best practices for tests, builds, lints, etc., e.g. “include tc-python”, which performs the generic steps
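The worker health metric item in the list above (FCP-9: inspect a worker's last tasks, compute a health percentage, alert on too many exceptions) could start as a small pure function. This is a minimal sketch under stated assumptions, not an existing Taskcluster API. It assumes the checker has already collected the worker's recent resolved runs; fetching them from the queue is out of scope. The `WorkerRun` type, the `workerHealth` function, the default window of 20 runs and the alert threshold are all hypothetical. It also assumes that only `exception` runs count against the worker, because `failed` usually means the task's own commands failed.

```typescript
// Hypothetical input shape: one resolved task run taken by a worker.
// Taskcluster run states include "completed", "failed" and "exception";
// only resolved runs are considered here.
type ResolvedState = "completed" | "failed" | "exception";

interface WorkerRun {
  taskId: string;
  state: ResolvedState;
}

interface WorkerHealth {
  workerId: string;
  sampled: number;       // number of recent runs inspected
  exceptions: number;    // runs resolved as "exception"
  healthPercent: number; // share of sampled runs that were not exceptions
  alert: boolean;        // true when exceptions reach the threshold
}

// Assumption: "failed" is the task's fault (e.g. a test failure), so only
// "exception" runs lower the worker's health.
function workerHealth(
  workerId: string,
  runs: WorkerRun[],  // ordered oldest to newest
  windowSize = 20,    // hypothetical: how many recent runs to inspect (>= 1)
  maxExceptions = 5,  // hypothetical: alert threshold
): WorkerHealth {
  const recent = runs.slice(-windowSize);
  const exceptions = recent.filter((r) => r.state === "exception").length;
  // A worker with no history yet is treated as healthy.
  const healthPercent =
    recent.length === 0
      ? 100
      : Math.round((100 * (recent.length - exceptions)) / recent.length);
  return {
    workerId,
    sampled: recent.length,
    exceptions,
    healthPercent,
    alert: exceptions >= maxExceptions,
  };
}

// Usage: two exceptions out of four runs, with a threshold of 2.
const runs: WorkerRun[] = [
  { taskId: "a", state: "completed" },
  { taskId: "b", state: "exception" },
  { taskId: "c", state: "failed" },
  { taskId: "d", state: "exception" },
];
const h = workerHealth("worker-1", runs, 20, 2);
console.log(h.healthPercent, h.alert); // prints "50 true"
```

Counting a fixed window of recent runs, rather than all history, lets a worker that was fixed recover its score. The alert threshold is separate from the percentage, so a worker can be flagged (and, for example, quarantined) before its long-run average looks bad.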