Certification Exam Guide
- future advances in data technology
- changes to business requirements
- awareness of current state and how to migrate the design to a future state
- data modeling
- tradeoffs
- distributed systems
- schema design
- future advances in data technology
- changes to business requirements
- awareness of current state and how to migrate the design to a future state
- data modeling
- tradeoffs
- system availability
- distributed systems
- schema design
- common sources of error (e.g., removing selection bias)
- future advances in data technology
- changes to business requirements
- awareness of current state and how to migrate the design to a future state
- data modeling
- tradeoffs
- system availability
- distributed systems
- schema design
- capacity planning
- different types of architectures: message brokers, message queues, middleware, service-oriented
- data cleansing
- batch and streaming
- transformation
- acquire and import data
- testing and quality control
- connecting to new data sources
- provisioning resources
- monitoring pipelines
- adjusting pipelines
- testing and quality control
- data collection and labeling
- data visualization
- dimensionality reduction
- data cleaning/normalization
- defining success metrics
- feature selection/engineering
- algorithm selection
- debugging a model
- performance/cost optimization
- online/dynamic learning
- working with business users
- gathering business requirements
4.2 Optimizing data representations, data infrastructure performance and cost. Considerations include:
- resizing and scaling resources
- data cleansing
- distributed systems
- high performance algorithms
- common sources of error (e.g., removing selection bias)
- verification
- building and running test suites
- pipeline monitoring
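
As an illustration of the verification and test-suite items above, the following is a minimal sketch of batch-level checks on pipeline output. It assumes rows arrive as Python dictionaries; the names `CheckResult`, `verify_batch`, and `failed_checks` are hypothetical and are not part of the exam guide or of any particular product.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one verification check (hypothetical helper type)."""
    name: str
    passed: bool
    detail: str


def verify_batch(rows, required_fields, min_rows=1):
    """Run two basic checks on one batch of pipeline output.

    Assumes each row is a dict. Checks that the batch has at least
    `min_rows` rows and that no required field is missing or None.
    """
    results = []

    # Check 1: the batch is not unexpectedly small.
    results.append(CheckResult(
        "row_count",
        len(rows) >= min_rows,
        f"{len(rows)} rows (minimum {min_rows})",
    ))

    # Check 2: count required fields that are absent or None.
    missing = sum(
        1 for row in rows for field in required_fields
        if row.get(field) is None
    )
    results.append(CheckResult(
        "required_fields",
        missing == 0,
        f"{missing} missing values",
    ))
    return results


def failed_checks(results):
    """Return the names of checks that did not pass."""
    return [r.name for r in results if not r.passed]
```

A monitoring job could call `failed_checks(verify_batch(rows, ["id", "ts"]))` after each run and raise an alert when the returned list is not empty.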
5.2 Assessing, troubleshooting, and improving data representations and data processing infrastructure.
- planning (e.g., fault tolerance)
- executing (e.g., rerunning failed jobs, performing retrospective re-analysis)
- stress testing data recovery plans and processes
- automation
- decision support
- data summarization (e.g., translation up the chain, fidelity, trackability, integrity)
- Identity and Access Management (IAM)
- data security
- penetration testing
- Separation of Duties (SoD)
- security control
- legislation (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA), etc.)
- audits