@dmussaku
Last active March 26, 2024 16:53
Assignment: Building a Data Pipeline for SpaceX Launch Data

Objective: Design and document a robust data pipeline that fetches launch data from the SpaceX API (https://api.spacexdata.com/v5/launches/).

Requirements:

  1. Develop a high-level design document outlining the architecture and components of the proposed data pipeline.
  2. Identify and describe a strategy for interacting with the SpaceX API (https://api.spacexdata.com/v5/launches/) to fetch launch data.
  3. Propose a strategy for periodic data polling, specifying the intervals and justifying your choice based on efficiency and relevance.
  4. Devise a conceptual schema for storing launch data, emphasizing optimization for storage and retrieval efficiency.
  5. Define the logical flow of the data pipeline, outlining steps for data retrieval, potential transformations, and storage in the chosen database.
  6. Outline the infrastructure needed to ingest, transform, validate, and serve the data.
  7. Outline the procedures for contributions and data access that enable everyone in the organization to collaborate securely and efficiently.
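
As a point of reference for requirements 2, 4, and 5, the fetch, transform, and store steps could be prototyped roughly as follows. This is only a minimal sketch, not a prescribed solution: the function names, the SQLite table layout, and the response field names (`id`, `name`, `date_utc`, `success`) are assumptions made for illustration and should be checked against the actual API response. The upsert keeps repeated polls idempotent, so re-fetching the same launch updates its row instead of duplicating it.

```python
import json
import sqlite3
import urllib.request

API_URL = "https://api.spacexdata.com/v5/launches/"  # endpoint named in the assignment


def fetch_launches(url=API_URL, timeout=30):
    """Fetch the launch list and parse it as JSON (performs a network call)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def to_row(launch):
    """Flatten one launch record into a table row.

    The field names are assumptions about the response shape;
    optional fields that are missing become None.
    """
    success = launch.get("success")
    return (
        launch["id"],
        launch.get("name"),
        launch.get("date_utc"),
        None if success is None else int(success),
        json.dumps(launch, sort_keys=True),  # keep the raw payload for reprocessing
    )


def upsert_launches(conn, launches):
    """Insert new launches and update existing ones, keyed on the launch id."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS launches (
               id TEXT PRIMARY KEY,
               name TEXT,
               date_utc TEXT,
               success INTEGER,
               raw_json TEXT
           )"""
    )
    conn.executemany(
        """INSERT INTO launches (id, name, date_utc, success, raw_json)
           VALUES (?, ?, ?, ?, ?)
           ON CONFLICT(id) DO UPDATE SET
               name = excluded.name,
               date_utc = excluded.date_utc,
               success = excluded.success,
               raw_json = excluded.raw_json""",
        [to_row(launch) for launch in launches],
    )
    conn.commit()
```

A scheduler of any kind could then call `upsert_launches(conn, fetch_launches())` at the chosen polling interval; SQLite stands in here for whichever database the design selects.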

Deliverables:

  1. Document the design covering each aspect of the pipeline, including API interaction, polling mechanism, database schema, and data processing.
  2. Diagrams that visually represent key components of the pipeline (optional: prototypes or mockups).

Evaluation Criteria:

  1. Creativity: Innovative and thoughtful design elements that enhance the data pipeline's efficiency and functionality.
  2. Clarity: Clear and concise presentation of the design, ensuring that the audience can easily grasp the proposed solution.
  3. Feasibility: The proposed design should be realistic and consider practical aspects of implementation.
  4. Justification: Well-founded explanations for design choices, including polling intervals, database selection, and potential enhancements.
  5. Insight: Demonstration of understanding and consideration of best practices in system design, data storage, and retrieval.

Note: This design challenge focuses on the conceptualization and presentation of the data pipeline rather than the actual implementation. The goal is to encourage strategic thinking, creativity, and effective communication of design principles.
