@kmader
kmader / README.md
Last active October 31, 2023 14:21
Beating Serialization in Spark

Serialization

Because every object used in an RDD operation in Spark must be Serializable, it can be difficult to work with libraries whose classes do not implement that interface.

Java Solutions

Simple Classes

For simple classes, the easiest approach is a wrapper interface that extends Serializable. Even though UnserializableObject itself cannot be serialized, an implementation of the following interface can be passed in without any issue, because it only describes how to create the object:

public interface UnserializableWrapper extends Serializable {
    // Builds the non-serializable object on demand, so the object itself is never serialized.
    public UnserializableObject create(String parm1, String parm2);
}
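As a rough sketch of how such a wrapper might be used, the example below round-trips an implementation through Java serialization (approximately what happens when Spark ships a closure to an executor) and only then creates the object. The `UnserializableObject` body, `DefaultWrapper`, and `WrapperDemo` are hypothetical stand-ins introduced for illustration; they are not part of Spark or of the original gist.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a library class that does not implement Serializable.
class UnserializableObject {
    private final String label;
    UnserializableObject(String parm1, String parm2) { this.label = parm1 + ":" + parm2; }
    String label() { return label; }
}

// The wrapper from the text: serializable itself, it only knows how to build the object.
interface UnserializableWrapper extends Serializable {
    UnserializableObject create(String parm1, String parm2);
}

// Hypothetical stateless implementation; it holds no UnserializableObject field,
// so default Java serialization has nothing problematic to write.
class DefaultWrapper implements UnserializableWrapper {
    private static final long serialVersionUID = 1L;

    @Override
    public UnserializableObject create(String parm1, String parm2) {
        return new UnserializableObject(parm1, parm2);
    }
}

class WrapperDemo {
    // Serialize and deserialize the wrapper in memory, as a stand-in for sending it to a worker.
    static UnserializableWrapper roundTrip(UnserializableWrapper wrapper) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(wrapper);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (UnserializableWrapper) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("wrapper could not be serialized", e);
        }
    }

    public static void main(String[] args) {
        UnserializableWrapper shipped = roundTrip(new DefaultWrapper());
        // The non-serializable object is created only after the wrapper has "arrived".
        UnserializableObject obj = shipped.create("a", "b");
        System.out.println(obj.label());
    }
}
```

The design point is that the wrapper must stay free of non-serializable fields: if `DefaultWrapper` cached an `UnserializableObject` in a non-transient field, the round trip would be expected to fail with a `NotSerializableException`.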
@un33k
un33k / sed cheatsheet
Created August 22, 2011 13:28
magic of sed -- find and replace "text" in a string or a file
FILE SPACING:
# double space a file
sed G
# double space a file which already has blank lines in it. Output file
# should contain no more than one blank line between lines of text.
sed '/^$/d;G'