
@arielshaqed
Last active October 26, 2022 12:25
Remaining tasks for LakeFSOutputCommitter. This list is the basis for the execution plan, which is what you _really_ want to read.
  • Integrate the committer for writing a single format (text?) in one save mode (append?) M1
  • Implement abort M1
  • Testing (component test?) M1
  • Decide whether we need one branch per task M1
  • Write all formats M2
    • Text
    • CSV
    • Parquet *
    • ORC *?
    • JSON
  • Check remaining interface methods M2
    • Support task recovery
  • Support all modes M2
    • ErrorIfExists (default)
    • Append
    • Overwrite *
    • Ignore
  • Multiwriter support M3
    • For overwrite save mode
    • For other save modes
  • Easier configuration: default to using this with lakeFSFS? M?
  • Improve configuration for Spark 3: Spark 3 offers better configuration options (setting the default output committer per filesystem). Use them! M??
  • Enhance metadata M?
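Supporting all save modes means the committer has to act differently when the target path already holds data. The sketch below is a minimal, hypothetical illustration of that decision table, written in Java because Spark and Hadoop committers are JVM code. The class `SaveModeSketch` and the enums `Mode` and `Action` are illustrative names only, not part of Spark or lakeFS; `Mode` merely mirrors the four values of Spark's `SaveMode`. How the committer actually carries out each action (for example, replacing data for Overwrite) is not shown.

```java
// Hypothetical sketch: the action each Spark save mode implies for a target path.
// None of these names belong to Spark or lakeFS; they only illustrate the semantics.
public final class SaveModeSketch {
    /** Mirrors the four values of Spark's SaveMode (names adapted for illustration). */
    enum Mode { ERROR_IF_EXISTS, APPEND, OVERWRITE, IGNORE }

    /** What the write should do to the target path. */
    enum Action { WRITE_NEW, APPEND, REPLACE, SKIP, FAIL }

    /** Decides the action for a save mode, given whether the target already has data. */
    static Action decide(Mode mode, boolean targetExists) {
        if (!targetExists) {
            // Every mode simply writes when the target does not exist yet.
            return Action.WRITE_NEW;
        }
        switch (mode) {
            case ERROR_IF_EXISTS: return Action.FAIL;    // default mode: refuse to touch existing data
            case APPEND:          return Action.APPEND;  // add new files next to existing ones
            case OVERWRITE:       return Action.REPLACE; // existing data is replaced by the new output
            case IGNORE:          return Action.SKIP;    // silently leave existing data untouched
            default: throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        // Print the decision for each mode when the target already exists.
        for (Mode m : Mode.values()) {
            System.out.println(m + " -> " + decide(m, true));
        }
    }
}
```

Running `main` prints one line per mode: `ERROR_IF_EXISTS -> FAIL`, `APPEND -> APPEND`, `OVERWRITE -> REPLACE`, `IGNORE -> SKIP`. When the target is absent, every mode resolves to `WRITE_NEW`.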