Denis Neustroev denstern
denstern / spark_spline.md
Last active April 2, 2024 08:22
Getting Spark sources from Spline

The Spline module collects run-time data lineage information from Spark jobs.

The Spline agent is a Scala library that runs inside the Spark driver. It listens to Spark query execution events and captures the logical execution plan of each job. The captured metadata is then handed to a lineage dispatcher, which either sends it to the Spline server (for example, over the REST API or through Kafka) or handles it some other way, depending on which dispatcher is configured.
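To make the agent-to-dispatcher flow concrete, here is a minimal Python model of it, assuming a heavily simplified plan shape. Everything in it is hypothetical: `PlanNode`, `source_datasets`, the dispatcher classes and the `"console"`/`"memory"` names are illustrative stand-ins, not Spline's or Spark's API. The real agent is Scala code running in the driver and works on Spark's own plan objects; this sketch only shows the shape of the idea: capture a plan, pull out the datasets it reads, and hand the result to whichever dispatcher the configuration selects.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


@dataclass
class PlanNode:
    """Hypothetical, simplified logical-plan node (not Spark's or Spline's type)."""
    op: str                                   # e.g. "Read", "Join", "Write"
    dataset: str = ""                         # set only on Read / Write nodes
    children: List["PlanNode"] = field(default_factory=list)


def source_datasets(plan: PlanNode) -> List[str]:
    """Walk the plan and return the datasets read by "Read" nodes, sorted."""
    found: List[str] = []
    stack = [plan]
    while stack:
        node = stack.pop()
        if node.op == "Read":
            found.append(node.dataset)
        stack.extend(node.children)
    return sorted(found)


class Dispatcher(Protocol):
    def send(self, plan: PlanNode) -> None: ...


class ConsoleDispatcher:
    """Prints lineage to stdout; a stand-in for a console-style dispatcher."""

    def send(self, plan: PlanNode) -> None:
        print(f"{plan.dataset} <- {source_datasets(plan)}")


class InMemoryDispatcher:
    """Keeps lineage records in a list; a stand-in for an HTTP or Kafka sink."""

    def __init__(self) -> None:
        self.records: List[Dict[str, object]] = []

    def send(self, plan: PlanNode) -> None:
        self.records.append({"target": plan.dataset, "sources": source_datasets(plan)})


# Hypothetical registry: configuration picks the dispatcher by name.
DISPATCHERS: Dict[str, Callable[[], Dispatcher]] = {
    "console": ConsoleDispatcher,
    "memory": InMemoryDispatcher,
}


class LineageAgent:
    """Plays the role of the driver-side listener, called once per finished job."""

    def __init__(self, dispatcher_name: str) -> None:
        # The dispatcher comes from configuration, not from the job's own code.
        self.dispatcher = DISPATCHERS[dispatcher_name]()

    def on_success(self, plan: PlanNode) -> None:
        # The agent only forwards the captured plan; the dispatcher decides where it goes.
        self.dispatcher.send(plan)


# Hypothetical job: join two tables and write the result.
plan = PlanNode("Write", "warehouse.daily_report", [
    PlanNode("Join", children=[
        PlanNode("Read", "raw.orders"),
        PlanNode("Read", "raw.customers"),
    ]),
])
agent = LineageAgent("memory")
agent.on_success(plan)
print(agent.dispatcher.records)
# [{'target': 'warehouse.daily_report', 'sources': ['raw.customers', 'raw.orders']}]
```

Switching `LineageAgent("memory")` to `LineageAgent("console")` changes where the lineage ends up without touching the job itself, which is the point of keeping the dispatcher pluggable.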

Add the module to spark-submit

  1. Add configs