Timothy C. Arland tcarland

@tcarland
tcarland / mysql-repl.md
Last active June 29, 2018 02:24
Configuring MySQL Replication

Configuring MySQL for Replication

Configure MySQL:

Additional MySQL options common to all instances are provided below. The following options are required to enable the binary log format needed for replication. 'server-id' must be unique across all MySQL instances.

server-id=1
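
Because every instance needs its own 'server-id', it can help to check the configuration files before starting replication. The following is a minimal sketch, not part of the original notes: the function names and the sample `my.cnf` contents are hypothetical, and it assumes each file has a plain `[mysqld]` section that Python's `configparser` can read (no `!include` directives).

```python
import configparser


def server_ids(configs):
    """Map each instance name to the server-id found in its [mysqld] section."""
    ids = {}
    for name, text in configs.items():
        # allow_no_value handles bare flags; interpolation=None keeps '%' literal.
        parser = configparser.ConfigParser(
            allow_no_value=True, strict=False, interpolation=None
        )
        parser.read_string(text)
        ids[name] = parser.getint("mysqld", "server-id")
    return ids


def duplicate_server_ids(configs):
    """Return {server_id: [instance names]} for every id used more than once."""
    seen = {}
    for name, sid in server_ids(configs).items():
        seen.setdefault(sid, []).append(name)
    return {sid: names for sid, names in seen.items() if len(names) > 1}


# Hypothetical my.cnf contents for three instances; db3 reuses db2's id.
configs = {
    "db1": "[mysqld]\nserver-id=1\nlog-bin=mysql-bin\n",
    "db2": "[mysqld]\nserver-id=2\nlog-bin=mysql-bin\n",
    "db3": "[mysqld]\nserver-id=2\nlog-bin=mysql-bin\n",
}

print(duplicate_server_ids(configs))  # {2: ['db2', 'db3']}
```

An empty result means no two instances share a 'server-id'.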
@tcarland
tcarland / hadoop-psdm.md
Last active June 5, 2020 15:19
Running Hadoop in pseudo-distributed mode.

Pseudo-Distributed Hadoop Environment

A document describing the configuration of a local, Apache-based Hadoop distribution running in pseudo-distributed mode. While various Hadoop vendors provide useful VMs, running natively provides better performance and more control over the environment for testing purposes (such as running multiple versions). For developers interested in the underlying details of the Hadoop stack, a native version built from compiled Apache projects is much clearer than trying to make sense of Cloudera's internal versions.

@tcarland
tcarland / hadoop-prereq.md
Last active November 26, 2020 16:03
Hadoop cluster prerequisites

Hadoop Node Prerequisites

Configuring root ssh

There are various methods of automation for applying these node prerequisites, including distributing CDH agents, but it is still very useful to have an administration tool that allows interaction with all nodes with proper feedback and diff capabilities. ClusterShell works brilliantly for this and is a must for managing clusters without opening too many windows. If not just for ClusterShell, configuring root SSH is also useful

@oza
oza / SparkOnYARN.md
Last active October 9, 2022 08:53
How to run Spark on YARN with dynamic resource allocation

YARN

  1. General resource management layer on HDFS
  2. A part of Hadoop

Spark

  1. In memory processing framework

Spark on YARN