Vansika Pareek vansika

## gist:b57877e58e2eb1fa6389dc237ec81436
Live datanodes (5):

Name: 10.0.0.76:9866 (10.0.0.76)
Hostname: ce771da755d5
Decommission Status : Normal
Configured Capacity: 241924083712 (225.31 GB)
DFS Used: 8241152 (7.86 MB)
Non DFS Used: 92889190400 (86.51 GB)
DFS Remaining: 139151388672 (129.59 GB)
DFS Used%: 0.00%
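The sizes in parentheses appear to be the raw byte counts expressed in binary units (1 GB = 2^30 bytes). A minimal sketch that reproduces the figures above; `human_size` is a hypothetical helper written for this illustration and is not part of Hadoop:

```python
def human_size(num_bytes):
    """Format a byte count in binary units with two decimals."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit we know about.
        if size < 1024 or unit == units[-1]:
            return "{:.2f} {}".format(size, unit)
        size /= 1024


# Values copied from the datanode report above.
configured = 241924083712
dfs_used = 8241152

print(human_size(configured))
print(human_size(dfs_used))
# DFS Used% is DFS Used divided by Configured Capacity.
print("{:.2f}%".format(100 * dfs_used / configured))
```

The used fraction is about 0.003%, which is why the report rounds it to 0.00%.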

## gist:c1e305cdfb713339dd43f6960ecbb691

      
Last active August 19, 2019 11:56
There is a file called path.py (https://github.com/metabrainz/listenbrainz-labs/blob/master/listenbrainz_spark/path.py)
that contains the paths to the directories we need in HDFS.
An example path:
DATAFRAME_DIR = os.path.join('/', 'recommendation', 'dataframe')
One use of path.py is here: https://github.com/metabrainz/listenbrainz-labs/blob/master/manage.py#L57
There is another file, create_dataframes.py, which also needs the path information, here:
https://github.com/metabrainz/listenbrainz-labs/blob/master/listenbrainz_spark/recommendations/create_dataframes.py#L86
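One way to read the arrangement described above: a single module holds the HDFS directory paths as constants, and any script that needs a path imports the constant instead of repeating the string. A minimal sketch under that reading. Only `DATAFRAME_DIR` comes from the text; `dataframe_path`, the `.parquet` suffix, and the name `'listens'` are hypothetical. The sketch uses `posixpath` so that the result has forward slashes on any operating system, whereas the quoted line uses `os.path.join`.

```python
import posixpath

# Mirrors the example constant quoted from path.py.
DATAFRAME_DIR = posixpath.join('/', 'recommendation', 'dataframe')


def dataframe_path(name):
    """Return the HDFS path of a named dataframe (hypothetical helper)."""
    # The '.parquet' suffix is an assumption made for this illustration.
    return posixpath.join(DATAFRAME_DIR, name + '.parquet')


print(DATAFRAME_DIR)
print(dataframe_path('listens'))
```

Keeping the directory names in one place means a renamed directory is changed once rather than in every script that reads or writes it.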

  
## dataframes.md

      
Last active August 19, 2019 20:42
I don't intend to ask about any specific piece of code in this file, but about the general flow. I would like to know whether I have followed good Pythonic coding practices. If not, what changes should I make?
File here. This file fetches some dataframes (tables) from HDFS, performs a join query, and saves the result back to HDFS.
Here
we are calling a function from this file.
I wanted to know how acceptable the exception handling is here. Should I remove the try/except from init.py and catch the exception only
in create_dataframes.py, or leave it as it is? Note that this function is used in many files and I have followed the same pattern
everywhere. Also, how good are the error messages? Since
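On the question of where to catch the error, one common option is to let the low-level helper raise and to handle the failure once, at the script's entry point. A minimal sketch of that option, offered as an illustration rather than a verdict on the existing code. `SparkJobError` stands in for `Py4JJavaError` so that the example runs without Spark, and every name in it is hypothetical.

```python
import logging


class SparkJobError(Exception):
    """Stand-in for Py4JJavaError so the sketch runs without Spark."""


def register_dataframe(views, df, table_name):
    # Low-level helper: no try/except here, failures propagate to the caller.
    if df is None:
        raise SparkJobError(
            'cannot register {!r}: dataframe is missing'.format(table_name))
    views[table_name] = df


def main(views, df, table_name):
    # Entry point: the single place that logs the error and picks the exit code.
    try:
        register_dataframe(views, df, table_name)
    except SparkJobError as err:
        logging.error('%s\nAborting...', err)
        return 1
    return 0


views = {}
print(main(views, ['row'], 'listens'))  # succeeds
print(main(views, None, 'users'))       # fails and is logged once
```

A real script would pass the return value of `main` to `sys.exit`. The trade-off is that helpers stay reusable and testable, while the message logged at the top has to carry enough context to locate the failing step.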

  
## overview.md

      
Created August 23, 2019 08:55
This is an overview of the music recommendation project. Any input or reviews are more than welcome.

Let us take a look inside HDFS:

/data/listenbrainz
/data/listenbrainz/1.parquet


## overview.md

      
Last active August 27, 2019 06:44
This is an overview of the music recommendation project. Any input or reviews are more than welcome.

Let us take a look inside HDFS:
|___/data/listenbrainz
|   └───2002
|   │     │   1.parquet
|   │     │   2.parquet
|   │     │   .


## gist:b7501771e18c1d4316ca8f252ed4a7c5
#!/usr/bin/python3

#code
t = int(input())

def get_len(s, k):
    freq, q, char, max_len = {}, [], set(), 1

    for i in s:
        char.add(i)

## gist:60110c864200cc481fa69867a4e0fb10
# Python3 program to find the length of the
# longest common substring using recursion

# Returns the length of the longest
# common substring of X[0..m-1] and Y[0..n-1]
def lcs(i, j, count) :

    if (i == 0 or j == 0) :
        return count
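The preview above stops after the base case. For reference, here is a minimal, self-contained sketch of the usual recursive formulation of longest common substring. It is an assumption about where the snippet is heading, not the rest of the gist: the wrapper `longest_common_substring` and its parameters `x` and `y` are hypothetical names, and the inner function keeps the `lcs(i, j, count)` signature from the preview.

```python
def longest_common_substring(x, y):
    """Length of the longest common substring of x and y (plain recursion)."""

    def lcs(i, j, count):
        # count is the length of the matching run that ends just after
        # x[i] and y[j]; it is returned once either prefix is exhausted.
        if i == 0 or j == 0:
            return count
        if x[i - 1] == y[j - 1]:
            # Extend the current run diagonally.
            count = lcs(i - 1, j - 1, count + 1)
        # Also try starting a fresh run after dropping one character
        # from either string, and keep the best result.
        return max(count, lcs(i, j - 1, 0), lcs(i - 1, j, 0))

    return lcs(len(x), len(y), 0)


print(longest_common_substring("abcdxyz", "xyzabcd"))
```

This version takes exponential time, so it suits short inputs only; a table-based dynamic programming version computes the same result in time proportional to `len(x) * len(y)`.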


## gist:5fea41da137159c960192ef46e2ae151
#code

t = int(input())

def get_max(s1, n, s2, m, matrix):
    if n == 0 and m == 0:
        if s1[n] == s2[m]:
            matrix[n][m] = 1
        return matrix[n][m]


## gist:61ca5e6a31c8158115de6a390b17399d
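## utils.py
def register_dataframe(df, table_name):
    # Register the dataframe as a temporary view so it can be queried with SQL.
    df.createOrReplaceTempView(table_name)

## calling code
import logging
import sys

from py4j.protocol import Py4JJavaError

try:
    utils.register_dataframe(df, table)
except Py4JJavaError as err:
    # Log both the Python-side message and the underlying Java exception.
    logging.error('{}\n{}\nAborting...'.format(str(err), err.java_exception))
    sys.exit(-1)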

## gist:a97c81727f355b294f583cd6b333e648

def convert_metadata_to_row(model_metadata):
    meta = model_metadata
    return Row(
        alpha=meta['alpha'],
        created=meta['created'],
        deleted=meta['deleted'],
        from_date=meta['from_date'],
        lambda=
    )