@vansika
Last active August 19, 2019 11:56
There is a file called path.py (https://github.com/metabrainz/listenbrainz-labs/blob/master/listenbrainz_spark/path.py) that contains the paths to directories we need in HDFS, e.g.: DATAFRAME_DIR = os.path.join('/', 'recommendation', 'dataframe')

One use of path.py is here: https://github.com/metabrainz/listenbrainz-labs/blob/master/manage.py#L57

Now, there is another file called create_dataframes.py which needs the path info, here: https://github.com/metabrainz/listenbrainz-labs/blob/master/listenbrainz_spark/recommendations/create_dataframes.py#L86

The path needed in create_dataframes.py should be something like this: hdfs://hadoop-master:9000/recommendation/dataframe/user.py

os.path.join discards everything before a component that starts with '/', so I built the path as 'hdfs://hadoop-master:9000' + path.DATAFRAME_DIR + '/user.py',

which looks very weird to me.
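
For illustration, here is a minimal sketch of that behaviour and of one possible helper; HDFS_CLUSTER_URI and hdfs_path() are hypothetical names I made up for the example, not existing code in the repo:

    import os

    # os.path.join drops every component before one that starts with '/':
    os.path.join('hdfs://hadoop-master:9000', '/recommendation/dataframe', 'user.py')
    # -> '/recommendation/dataframe/user.py'  (the namenode prefix is lost)

    # One possible workaround: keep the cluster URI separate and prepend it once.
    HDFS_CLUSTER_URI = 'hdfs://hadoop-master:9000'  # assumed constant name

    def hdfs_path(*parts):
        """Join path parts under '/' and prefix the HDFS cluster URI."""
        return HDFS_CLUSTER_URI + os.path.join('/', *parts)

    hdfs_path('recommendation', 'dataframe', 'user.py')
    # -> 'hdfs://hadoop-master:9000/recommendation/dataframe/user.py'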

Also, is it better to create all the required directories beforehand from a single file (manage.py), or should each directory be created by the script that requires it?
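
For the first option, a minimal sketch of what the up-front creation could look like, assuming the hdfs Python client; the WebHDFS host/port and the directory list are assumptions for illustration, not the project's actual configuration:

    from hdfs import InsecureClient

    # Assumed WebHDFS endpoint; the real cluster config may differ.
    client = InsecureClient('http://hadoop-master:9870')

    # Create every directory the scripts rely on in one place (e.g. manage.py).
    for directory in ['/recommendation/dataframe', '/recommendation/model']:
        client.makedirs(directory)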

I specifically wanted your review of this PR: https://github.com/metabrainz/listenbrainz-labs/pull/46/commits/80fb5fe22c0cd77544a467ddb52d3bda9c206137
