
Live datanodes (5):
Name: 10.0.0.76:9866 (10.0.0.76)
Hostname: ce771da755d5
Decommission Status : Normal
Configured Capacity: 241924083712 (225.31 GB)
DFS Used: 8241152 (7.86 MB)
Non DFS Used: 92889190400 (86.51 GB)
DFS Remaining: 139151388672 (129.59 GB)
DFS Used%: 0.00%

There is a file called path.py (https://github.com/metabrainz/listenbrainz-labs/blob/master/listenbrainz_spark/path.py) that contains the paths to the directories we need in HDFS. An example path: DATAFRAME_DIR = os.path.join('/', 'recommendation', 'dataframe')
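As a rough illustration of how such a path module can be organised, here is a minimal sketch. It uses posixpath.join instead of os.path.join, because HDFS paths always use forward slashes (on Linux the two behave identically). HDFS_ROOT, dataframe_path and the '.parquet' suffix are hypothetical names for this example and are not taken from the repository.

```python
import posixpath

# Root of the HDFS namespace (hypothetical constant for this sketch).
HDFS_ROOT = '/'

# Same directory as the DATAFRAME_DIR example in the text.
DATAFRAME_DIR = posixpath.join(HDFS_ROOT, 'recommendation', 'dataframe')


def dataframe_path(name):
    """Return the HDFS path of a parquet file stored under DATAFRAME_DIR.

    Hypothetical helper: the real project may name its files differently.
    """
    return posixpath.join(DATAFRAME_DIR, name + '.parquet')


print(DATAFRAME_DIR)
print(dataframe_path('listens'))
```

Keeping every directory in one module means callers such as manage.py only import constants and never build path strings by hand.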

One use of path.py is here: https://github.com/metabrainz/listenbrainz-labs/blob/master/manage.py#L57

Now, there is another file called create_dataframes.py which needs the path info, here: https://github.com/metabrainz/listenbrainz-labs/blob/master/listenbrainz_spark/recommendations/create_dataframes.py#L86

I don't intend to ask about any specific piece of code in this file, but about the general flow. I would like to know whether I have followed good Pythonic coding practices. If not, what changes should I make?

File here. This file fetches some dataframes (tables) from HDFS, performs a join query, and saves the result back to HDFS.

Here we are calling a function from this file. I wanted to know how good the exception handling is here. Should I remove the try/except from __init__.py and catch the exception only in create_dataframes.py, or keep it as it is? Note that this function is used in many files and I have followed the same pattern everywhere. Also, how good are the error messages? Since

This is an overview of the music recommendation project. Any inputs or reviews are more than welcome.

Let us take a look inside HDFS:

/data/listenbrainz /data/listenbrainz/1.parquet

|___/data/listenbrainz
|   └───2002
|   │     │   1.parquet
|   │     │   2.parquet
|   │     │   .
#!/usr/bin/python3
#code
t = int(input())
def get_len(s, k):
    freq, q, char, max_len = {}, [], set(), 1
    for i in s:
        char.add(i)
# Python3 program to find the length of
# the longest common substring using recursion
# Returns the length of the longest
# common substring of X[0..m-1] and Y[0..n-1]
def lcs(i, j, count) :
    if (i == 0 or j == 0) :
        return count
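For reference, the recursion described in the comments above can be sketched in full as follows. This is a minimal version of the standard recursive approach, not necessarily the code in the gist: it passes the two strings as explicit arguments (x and y are names chosen for this sketch), and it runs in exponential time, so it is only suitable for short inputs.

```python
def lcs(x, y, i, j, count):
    """Length of the longest common substring of x[0..i-1] and y[0..j-1].

    `count` is the length of the common run that ends just after
    positions i and j in the two strings.
    """
    if i == 0 or j == 0:
        return count
    if x[i - 1] == y[j - 1]:
        # The current characters match, so the run can grow by one.
        count = lcs(x, y, i - 1, j - 1, count + 1)
    # Otherwise (or in addition) restart the run from either shorter prefix.
    return max(count,
               lcs(x, y, i, j - 1, 0),
               lcs(x, y, i - 1, j, 0))


a, b = "abcde", "xbcdy"
print(lcs(a, b, len(a), len(b), 0))
```

A memoised or bottom-up table version brings this down to O(m * n) time and is the better choice for longer strings.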
#code
t = int(input())
def get_max(s1, n, s2, m, matrix):
    if n == 0 and m == 0:
        if s1[n] == s2[m]:
            matrix[n][m] = 1
        return matrix[n][m]
## utils.py
def register_dataframe(df, table_name):
    df.createOrReplaceTempView(table_name)
try:
    utils.register_dataframe(df, table)
except Py4JJavaError as err:
    logging.error('{}\n{}\nAborting...'.format(str(err), err.java_exception))
    sys.exit(-1)
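One possible alternative to repeating the try/except and sys.exit at every call site is to let the helper raise a project-level exception and decide about exiting in a single place. The sketch below only illustrates that idea under some assumptions: DataFrameRegistrationError and run are hypothetical names, and the helper catches a plain Exception as a stand-in for Py4JJavaError so that the example runs without Spark installed.

```python
import logging


class DataFrameRegistrationError(Exception):
    """Hypothetical project-level error raised when a temp view cannot be created."""


def register_dataframe(df, table_name):
    """Register df as a temporary view, translating low-level failures."""
    try:
        df.createOrReplaceTempView(table_name)
    except Exception as err:  # stand-in for Py4JJavaError in this sketch
        raise DataFrameRegistrationError(
            'Could not register table "{}": {}'.format(table_name, err)
        ) from err


def run(df, table_name):
    """Entry-point style wrapper: log once and return an exit code."""
    try:
        register_dataframe(df, table_name)
    except DataFrameRegistrationError as err:
        logging.error('%s\nAborting...', err)
        return 1
    return 0
```

With this layout only the script entry point calls sys.exit(run(df, table)), and library code stays importable and testable because it never terminates the interpreter itself.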
def convert_metadata_to_row(model_metadata):
    meta = model_metadata
    return Row(
        alpha=meta['alpha'],
        created=meta['created'],
        deleted=meta['deleted'],
        from_date=meta['from_date'],
        lambda=
    )