Max Konstantinov MaxNevermind

MaxNevermind / FastParquetTransformer.scala
Last active April 19, 2024 20:47
A Spark Parquet utility that enables much faster modification or addition of a field in an extremely large dataset.
import org.slf4j.{Logger, LoggerFactory}
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.parquet.hadoop.rewrite.{RewriteOptions, ParquetRewriter}
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, element_at, expr, monotonically_increasing_id}
import java.net.URI
import scala.collection.JavaConverters._
// You need Maven installed to run it.
lazy val mavenDependencyTree = taskKey[Unit]("Prints a Maven dependency tree")
mavenDependencyTree := {
  val scalaReleaseSuffix = "_" + scalaVersion.value.split('.').take(2).mkString(".")
  val pomXml =
    <project>
      <modelVersion>4.0.0</modelVersion>
      <groupId>groupId</groupId>
      <artifactId>artifactId</artifactId>
      <version>1.0</version>
MaxNevermind / maven_2_sbt_dependencies.scala
Last active June 29, 2018 11:43 — forked from mslinn/PomToSbt.scala
Convert pom.xml to build.sbt
// Read every <dependency> under <dependencies> in pom.xml and render it as an sbt setting.
val lines = (scala.xml.XML.load("pom.xml") \\ "dependencies") \ "dependency" map { dependency =>
  val groupId = (dependency \ "groupId").text
  val artifactId = (dependency \ "artifactId").text
  val version = (dependency \ "version").text
  // A missing <scope> yields an empty string; otherwise append it as an sbt configuration.
  val scope = (dependency \ "scope").text match {
    case "" => ""
    case x => s""" % "$x""""
  }
  s"""libraryDependencies += "$groupId" % "$artifactId" % "$version"$scope"""
}