### Tested with:

- Spark 2.0.0 pre-built for Hadoop 2.7
- Mac OS X 10.11
- Python 3.5.2

Use S3 within PySpark with minimal hassle.
# Install R + RStudio on Ubuntu 14.04
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
# Ubuntu 12.04: precise
# Ubuntu 14.04: trusty
# Ubuntu 16.04: xenial
# Basic format of next line: deb https://<my.favorite.cran.mirror>/bin/linux/ubuntu <your ubuntu version>/
sudo add-apt-repository 'deb https://ftp.ussg.iu.edu/CRAN/bin/linux/ubuntu trusty/'
sudo apt-get update
(1) What is the difference between Secondary NameNode, Checkpoint NameNode & Backup Node? (The Secondary NameNode is a poorly named component of Hadoop.)
(2) What are the side data distribution techniques?
(3) What is shuffling in MapReduce?
(4) What is partitioning?
(5) Can we change the file cached by the Distributed Cache?
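Question (4) can be made concrete with a toy version of Hadoop's default hash partitioner. This is a sketch in Python, not Hadoop code; `num_reducers` and `get_partition` are illustrative names:

```python
def get_partition(key, num_reducers):
    # Mirrors Hadoop's default HashPartitioner:
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
    # The mask keeps the hash non-negative, and every record
    # with the same key lands on the same reducer.
    return (hash(key) & 0x7FFFFFFF) % num_reducers
```

During the shuffle (question 3), each map output record is routed to the reducer that this function selects for its key, which is what guarantees that a reducer sees all values for a given key.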
// Find the minimum path sum (from root to leaf)
public static int minPathSum(TreeNode root) {
    if (root == null) return 0;
    int sum = root.val;
    int leftSum = minPathSum(root.left);
    int rightSum = minPathSum(root.right);
    if (leftSum < rightSum) {
        sum += leftSum;
    } else {
        sum += rightSum;
    }
    // Note: a null child contributes 0, so a node with a single
    // child takes the missing-child branch; handle that case
    // separately if a true root-to-leaf path is required.
    return sum;
}
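For comparison, a minimal Python sketch of the same recursion (the `TreeNode` class here is a stand-in, not from the original). This version terminates only at real leaves, so a node with a single child follows that child instead of counting the missing side as 0:

```python
class TreeNode:
    # Stand-in binary tree node for this sketch.
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def min_path_sum(root):
    # Minimum sum over all root-to-leaf paths.
    if root is None:
        return 0
    if root.left is None and root.right is None:
        return root.val                       # a true leaf
    if root.left is None:
        return root.val + min_path_sum(root.right)
    if root.right is None:
        return root.val + min_path_sum(root.left)
    return root.val + min(min_path_sum(root.left),
                          min_path_sum(root.right))
```

On a skewed tree such as `1 -> 2 -> 4`, this returns 7 (the only root-to-leaf path), whereas the "null child contributes 0" variant above would return 1.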
# MWS API docs at http://docs.developer.amazonservices.com/en_US/orders-2013-09-01/Orders_Datatypes.html#Order
# MWS Scratchpad at https://mws.amazonservices.com/scratchpad/index.html
# Boto docs at http://docs.pythonboto.org/en/latest/ref/mws.html?#module-boto.mws
from boto.mws.connection import MWSConnection

...

# Provide your credentials (the names below are placeholders).
conn = MWSConnection(
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    Merchant=MERCHANT_ID)
Picking the right architecture = Picking the right battles + Managing trade-offs
People
:bowtie: | 😄 :smile: | 😆 :laughing: |
---|---|---|
😊 :blush: | 😃 :smiley: | ☺️ :relaxed: |
😏 :smirk: | 😍 :heart_eyes: | 😘 :kissing_heart: |
😚 :kissing_closed_eyes: | 😳 :flushed: | 😌 :relieved: |
😆 :satisfied: | 😁 :grin: | 😉 :wink: |
😜 :stuck_out_tongue_winking_eye: | 😝 :stuck_out_tongue_closed_eyes: | 😀 :grinning: |
😗 :kissing: | 😙 :kissing_smiling_eyes: | 😛 :stuck_out_tongue: |
/!\ Be very careful with your setup: any misconfiguration makes the whole Git config fail silently! Go through this guide step by step and you should be fine 😉
In `~/.ssh/config`, set each SSH key for each repository as in this example:

import multiprocessing  # :)
def do_this(number):
    print(number)
    return number * 2

# Create a list to iterate over.
# (Note: multiprocessing hands workers one item at a time.)
some_list = range(0, 10)
import copy

# Write a DataFrame to a path using the Hudi format.
def hudi_write(df, schema, table, path, mode, hudi_options):
    # The values below are placeholders; substitute your
    # table's actual column names and write settings.
    hudi_options = {
        "hoodie.datasource.write.recordkey.field": "recordkey",
        "hoodie.datasource.write.precombine.field": "precombine_field",
        "hoodie.datasource.write.partitionpath.field": "partitionpath_field",
        "hoodie.datasource.write.operation": "write_operation",
        "hoodie.datasource.write.table.type": "table_type",