@abhisheyke
Created July 19, 2018 01:51
The job runs on two executor pods, and both pods show similar statistics.
The detailed statistics below were captured from one of the two executors.
File descriptors (executor JVM, PID 14):
# lsof -p 14 | tail -10
java 14 root *621u a_inode 0,10 0 8838 [eventpoll]
java 14 root *622r FIFO 0,9 0t0 250666725 pipe
java 14 root *623w FIFO 0,9 0t0 250666725 pipe
java 14 root *624u a_inode 0,10 0 8838 [eventpoll]
java 14 root *625r FIFO 0,9 0t0 250505447 pipe
java 14 root *626w FIFO 0,9 0t0 250505447 pipe
java 14 root *627u a_inode 0,10 0 8838 [eventpoll]
java 14 root *635r FIFO 0,9 0t0 250187352 pipe
java 14 root *636w FIFO 0,9 0t0 250187352 pipe
java 14 root *637u a_inode 0,10 0 8838 [eventpoll]
# lsof -p 14 | grep -E 'pipe|eventpoll' | wc -l
51648
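A minimal sketch, assuming a Linux host with /proc and bash, that breaks a process's descriptors down by type by reading /proc/<pid>/fd directly instead of grepping lsof output. The function name `fd_summary` is hypothetical. One possible reading of the repeating pattern in the lsof tail (a read/write pipe pair followed by an eventpoll descriptor) is the epoll-based Java NIO Selector as implemented in JDK 8, which holds one wakeup pipe pair plus one epoll descriptor per Selector. If that is the cause, the pipe count should be about twice the eventpoll count. The 51648 pipe/eventpoll lines above divide evenly by three, which is consistent with, but does not prove, Selectors being opened and never closed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise one process's open descriptors by type,
# reading /proc/<pid>/fd (run as root or as the process owner).
# Output lines: "total N", "pipe N", "eventpoll N", "socket N", "other N".
fd_summary() {
  local pid="$1" total=0 pipe=0 epoll=0 sock=0 other=0 target fd
  for fd in /proc/"$pid"/fd/*; do
    # readlink prints e.g. "pipe:[250666725]", "anon_inode:[eventpoll]", "socket:[123]";
    # skip entries that vanish or cannot be read (also covers a missing PID)
    target=$(readlink "$fd" 2>/dev/null) || continue
    total=$((total + 1))
    case "$target" in
      pipe:*)                   pipe=$((pipe + 1)) ;;
      'anon_inode:[eventpoll]') epoll=$((epoll + 1)) ;;
      socket:*)                 sock=$((sock + 1)) ;;
      *)                        other=$((other + 1)) ;;
    esac
  done
  printf 'total %d\npipe %d\neventpoll %d\nsocket %d\nother %d\n' \
    "$total" "$pipe" "$epoll" "$sock" "$other"
}

# Summarise the PID given as the first argument (defaults to this shell).
fd_summary "${1:-$$}"
```

For the executor above this would be invoked as `fd_summary 14`; comparing the `pipe` and `eventpoll` lines across a few runs shows whether both grow in the same 2:1 proportion.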
# lsof -p 14 | wc -l
52174 // This number keeps growing up to 85k (the raised ulimit maximum), at which point the executor crashed. /proc/14/fd gives a similar number.
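Because the count climbs until it hits the raised ulimit, it can help to sample it over time next to the process's own soft limit. A minimal sketch, assuming Linux /proc; `fd_watch` and its arguments are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a timestamped open-fd count for PID, together with
# its "Max open files" soft limit, every INTERVAL seconds.
# Usage: fd_watch PID [INTERVAL_SECONDS] [SAMPLES]   (SAMPLES=0: until PID exits)
fd_watch() {
  local pid="$1" interval="${2:-60}" samples="${3:-0}" i=0 n limit
  # /proc/PID/limits line: "Max open files  <soft>  <hard>  files"; field 4 is the soft limit
  limit=$(awk '/^Max open files/ {print $4}' /proc/"$pid"/limits 2>/dev/null)
  while [ -d /proc/"$pid" ]; do
    n=$(ls /proc/"$pid"/fd 2>/dev/null | wc -l)
    printf '%s pid=%s fds=%s limit=%s\n' "$(date +%FT%T)" "$pid" "$n" "${limit:-?}"
    i=$((i + 1))
    # stop after the requested number of samples, if one was given
    [ "$samples" -gt 0 ] && [ "$i" -ge "$samples" ] && break
    sleep "$interval"
  done
}
```

For example, `fd_watch 14 60` would print one line per minute for PID 14 until the process exits, which shows the growth rate and how close the count is to the limit before the crash.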
Netstat details (most connections are in TIME_WAIT):
# netstat -plant | tail -10
tcp6 0 0 10.244.1.109:40732 192.168.9.181:50010 TIME_WAIT -
tcp6 0 0 10.244.1.109:46012 192.168.9.181:50010 TIME_WAIT -
tcp6 0 0 10.244.1.109:58378 10.97.110.19:9092 TIME_WAIT -
tcp6 0 0 10.244.1.109:41290 192.168.9.181:50010 TIME_WAIT -
tcp6 0 0 10.244.1.109:44240 192.168.9.181:50010 TIME_WAIT -
tcp6 0 0 10.244.1.109:38238 192.168.9.182:50010 TIME_WAIT -
tcp6 0 0 10.244.1.109:44884 192.168.9.181:50010 TIME_WAIT -
tcp6 0 0 10.244.1.109:38312 192.168.9.181:50010 TIME_WAIT -
tcp6 0 0 10.244.1.109:46352 192.168.9.181:50010 TIME_WAIT -
tcp6 0 0 10.244.1.109:44698 192.168.9.182:50010 ESTABLISHED -
# netstat -plant | wc -l
1340
# netstat -plant | grep ESTABLISHED | wc -l
6
# netstat -plant | grep TIME_WAIT | wc -l
1096
// Most of the TIME_WAIT connections are to the IPs of the pods hosting HDFS (port 50010).
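To see which peers account for the TIME_WAIT sockets without reading the `tail` output by eye, the netstat output can be grouped by foreign address. A minimal sketch that assumes the seven-column `netstat -plant` layout shown above (foreign address in column 5, state in column 6); `count_by_remote` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `netstat -plant`-style lines on stdin and count
# connections in the given TCP state per foreign address (IP:port),
# busiest endpoint first. Header lines are skipped because column 1 is not tcp*.
count_by_remote() {
  local state="${1:-TIME_WAIT}"
  awk -v st="$state" '$1 ~ /^tcp/ && $6 == st { n[$5]++ }
       END { for (r in n) print n[r], r }' | sort -rn
}
```

Usage: `netstat -plant | count_by_remote TIME_WAIT | head` would list the remote endpoints with the most TIME_WAIT sockets first, e.g. to confirm that port 50010 on the HDFS pods dominates.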