@showyou
Created August 2, 2012 07:21
Problem: sockets are not closed in Hive
Environment:
CentOS 5.8 x64
CDH3u4
hadoop-0.20-0.20.2+923.256-1
hadoop-0.20-{namenode,secondarynamenode,jobtracker,tasktracker,datanode}-0.20.2+923.256-1
hadoop-0.20-conf-pseudo-0.20.2+923.256-1 (the same error also occurred in a non-pseudo-distributed environment)
Apache Hive 0.8.1 (the same error also occurred with Hive 0.9)
Steps to reproduce:
1. Set up Hadoop.
2. Prepare the data file and link.txt:
data:
$ hadoop fs -cat /path/to/data/2012-07-01/20120701.csv
1, 20120701 00:00:00
2, 20120701 00:00:01
3, 20120701 01:12:45
link.txt
$ cat link.txt
/path/to/data/2012-07-01//*
3. In Hive, create a table as follows:
CREATE TABLE user_logs(id INT, created_at STRING)
row format delimited fields terminated by ',' lines terminated by '\n'
stored as inputformat 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
4. Put link.txt into /user/hive/warehouse/user_logs.
5. Open another session (session A) and check the sockets:
$ netstat -a | grep CLOSE_WAIT
tcp 1 0 localhost:48121 localhost:50010 CLOSE_WAIT
tcp 1 0 localhost:48124 localhost:50010 CLOSE_WAIT
$
6. Return to the Hive session and run this query:
hive> select * from user_logs;
7. Return to session A and check the sockets again:
$ netstat -a | grep CLOSE_WAIT
tcp 1 0 localhost:48121 localhost:50010 CLOSE_WAIT
tcp 1 0 localhost:48124 localhost:50010 CLOSE_WAIT
tcp 1 0 localhost:48166 localhost:50010 CLOSE_WAIT
If the table has partitions, each query leaves as many unclosed sockets as there are partitions.
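To make the before/after comparison less error-prone than reading the netstat output by eye, the count can be taken with a small helper. This is only a sketch: the function name count_close_wait is made up for illustration, and it assumes netstat prints the state in the last column and the remote address in the second-to-last column, as in the output above. The port 50010 is the remote port seen in that output.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop or Hive): counts sockets in
# CLOSE_WAIT whose remote address ends with the given port.
# Reads netstat-style lines on stdin; assumes the last field is the state
# and the second-to-last field is the remote address.
count_close_wait() {
    port="$1"
    awk -v port="$port" '
        $NF == "CLOSE_WAIT" && $(NF-1) ~ (":" port "$") { n++ }
        END { print n + 0 }   # prints 0 when nothing matched
    '
}

# Example use in session A (run once before and once after the query):
#   netstat -an | count_close_wait 50010
```

Running it before and after the select in the Hive session and comparing the two numbers should show the count growing by one per query (or by the number of partitions). Using netstat -an instead of netstat -a keeps ports numeric, so the match on 50010 does not depend on service-name lookup.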