There is a file called path.py (https://github.com/metabrainz/listenbrainz-labs/blob/master/listenbrainz_spark/path.py) that contains the paths to directories we need in HDFS, e.g.: DATAFRAME_DIR = os.path.join('/', 'recommendation', 'dataframe')
One use of path.py is here: https://github.com/metabrainz/listenbrainz-labs/blob/master/manage.py#L57
Now, there is another file, create_dataframes.py, which needs the path info, here: https://github.com/metabrainz/listenbrainz-labs/blob/master/listenbrainz_spark/recommendations/create_dataframes.py#L86
The path needed in create_dataframes.py should look something like this: hdfs://hadoop-master:9000/recommendation/dataframe/user.py
os.path.join discards everything before a component that starts with '/', so I built the path as 'hdfs://hadoop-master:9000' + path.DATAFRAME_DIR + '/user.py',
which looks very weird to me.
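To make the behaviour concrete, here is a minimal sketch of what os.path.join does with an absolute component, plus one way to keep the concatenation tidy (the variable names HDFS_CLUSTER_URI and full_path are illustrative, not taken from the repo):

```python
import os

# os.path.join restarts whenever a component is absolute, so everything
# before a component beginning with '/' is thrown away:
joined = os.path.join('hdfs://hadoop-master:9000', '/recommendation', 'dataframe')
print(joined)  # '/recommendation/dataframe' -- the scheme and host are gone

# One alternative: keep the namenode URI separate and only ever
# concatenate it in front of the already-absolute HDFS path.
HDFS_CLUSTER_URI = 'hdfs://hadoop-master:9000'
DATAFRAME_DIR = os.path.join('/', 'recommendation', 'dataframe')
full_path = HDFS_CLUSTER_URI + os.path.join(DATAFRAME_DIR, 'user.py')
print(full_path)  # 'hdfs://hadoop-master:9000/recommendation/dataframe/user.py'
```

This is still string concatenation under the hood, so it is essentially the same approach, just with the cluster URI factored out into one constant instead of being repeated at every call site.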
Also, is it better to create all the required directories beforehand from a single file (manage.py), or should each directory be created by the script that requires it?
I would specifically like your review of this PR: https://github.com/metabrainz/listenbrainz-labs/pull/46/commits/80fb5fe22c0cd77544a467ddb52d3bda9c206137