Anjaiah Methuku (anjijava16)

Scala environment setup:
1. Install Scala
1.1 Download Scala (latest version)
1.2 Uncompress it
1.3 Add the Scala bin folder to the PATH variable
2. Install Eclipse Mars or Luna
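The steps above can be sketched as a short shell session. The version number, download URL pattern, and install path here are assumptions; substitute whatever release you actually download (the download and extract commands are left commented so the PATH setup can be tried on its own):

```shell
# Assumed version and install location -- adjust to the release you downloaded
SCALA_VERSION=2.12.18
SCALA_HOME=/opt/scala-$SCALA_VERSION

# 1.1 Download the release tarball (URL pattern assumed; check the official
#     Scala download page for the current link):
# wget https://downloads.lightbend.com/scala/$SCALA_VERSION/scala-$SCALA_VERSION.tgz

# 1.2 Uncompress it:
# tar -xzf scala-$SCALA_VERSION.tgz -C /opt

# 1.3 Add the scala bin folder to the PATH variable:
export SCALA_HOME
export PATH="$SCALA_HOME/bin:$PATH"
```

After this, `scala -version` should work from any directory in a new or current shell.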
anjijava16 / (forked from amalgjose)
Created December 26, 2016
MapReduce program for removing stop words from the given text files. Hadoop Distributed Cache and counters are used in this program.
package com.hadoop.skipper;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
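The snippet above only shows the package and import header. The core filtering logic the mapper applies can be sketched in plain Java, with the Hadoop classes omitted so it runs standalone; the stop-word set here is illustrative (in the real job it would be loaded from the Distributed Cache file), and the static counter stands in for a Hadoop custom counter:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

class StopWordFilter {
    // In the real mapper this set is populated from the Distributed Cache file.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "is", "on"));

    // Stands in for a Hadoop counter incremented via context.getCounter(...).
    static long skippedCounter = 0;

    // Tokenize one input line and drop every stop word, counting the drops.
    static List<String> filter(String line) {
        List<String> kept = new ArrayList<>();
        StringTokenizer tok = new StringTokenizer(line);
        while (tok.hasMoreTokens()) {
            String word = tok.nextToken().toLowerCase();
            if (STOP_WORDS.contains(word)) {
                skippedCounter++; // the mapper would bump its custom counter here
            } else {
                kept.add(word);   // the mapper would emit this word here
            }
        }
        return kept;
    }
}
```

In the actual job, `filter` corresponds to the body of the mapper's `map()` method, with the kept words written to the context instead of returned.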
A Git project is composed of three parts:
- Working Directory: where editing and deleting happen
- Staging Area: where we add the files to be committed
- Repository: where the commit happens and the latest version is stored.
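The three parts can be seen by walking one file through them in a scratch repository (the directory, file name, and commit identity below are just for illustration):

```shell
# Throwaway repository to walk a file through the three parts
rm -rf /tmp/git-areas-demo
mkdir -p /tmp/git-areas-demo
cd /tmp/git-areas-demo
git init -q .

echo "hello" > notes.txt          # Working Directory: the file is created/edited
git add notes.txt                 # Staging Area: the file is marked for commit
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "first commit"   # Repository: the commit is stored
```

`git status` after each command shows which of the three areas the file currently sits in.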
• DISTINCT and GROUP BY - Use them only when necessary; avoid them where possible, as they degrade performance.
• PARTITION - Try to partition the table; using the partition column in a filter improves performance.
• Rewrite - Do not reuse a query exactly as written for an RDBMS; rewrite it for Hive to improve performance.
• Map Split Size - Try reducing the map split size; more, smaller splits increase parallelism and can reduce query time.
• Map Join - Map-join small tables so that joining them with a large table takes less time.
• Memory - Tune memory settings based on the queries being run.
• ORC Format - Keep tables in ORC format, which improves queries on those tables.
• Parallel Execution - Enable Hive parallel execution so that independent job stages run in parallel.
• CTAS - Prefer creating managed tables (e.g. via CREATE TABLE AS SELECT) over external tables.
• Data Explosion - Fetch a filtered data set before joining, and make sure there is no cross join between large data sets.
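Several of these tips map directly onto Hive session settings and DDL; a sketch, using standard Hive property names (the table names are illustrative):

```sql
-- Parallel Execution: run independent job stages in parallel
SET hive.exec.parallel=true;

-- Map Join: let Hive convert joins against small tables into map joins
SET hive.auto.convert.join=true;

-- CTAS + ORC: create a managed, ORC-backed copy of a table,
-- filtering first to avoid joining the full data set later
CREATE TABLE sales_orc STORED AS ORC AS
SELECT * FROM sales WHERE sale_date >= '2016-01-01';
```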
Procedure to install Google Chrome (stable) on RHEL/CentOS/Fedora Linux:
Here is how to install and use Google Chrome in five easy steps:
1. Open the Terminal application.
2. Grab the 64-bit Google Chrome package.
3. Type the following command to download the 64-bit version of Google Chrome:
wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
4. Install Google Chrome and its dependencies on CentOS/RHEL:
sudo yum install ./google-chrome-stable_current_*.rpm
5. Start Google Chrome from the CLI:
google-chrome &
i) Copy a file into HDFS with a 64 MB block size (67108864 = 64 × 1024 × 1024 bytes) and a replication factor of 4:
hadoop fs -Ddfs.block.size=67108864 -Ddfs.replication=4 -copyFromLocal pom.xml /app/data
ii) Check the blocks, files, and block locations of the copied file:
hdfs fsck -blocks -files -locations /app/data/pom.xml
Output:
Connecting to namenode via http://localhost:50070
FSCK started by hadoop (auth:SIMPLE) from / for path /app/data/pom.xml at Mon Aug 21 23:36:58 IST 2017
/app/data/pom.xml 2617 bytes, 1 block(s): Under replicated BP-806356112- Target Replicas is 4 but found 1 replica(s).
0. BP-806356112- len=2617 repl=1 []
i) Run the MapReduce job:
hadoop jar mapReduceUtils-0.1.jar com.iwinner.m_techlearn.hadoop.mapreduce.custom1.TempuratureJob /data/OutputCust/
ii) List the running YARN applications:
yarn application -list
iii) Kill a YARN application by its ID:
yarn application -kill <<Application_ID>>
HDFS Tutorial
Posted by pramod narayana, Wednesday, March 11, 2015, 6:27 AM