Skip to content

Instantly share code, notes, and snippets.

@drj11
Last active November 21, 2019 16:29
Show Gist options
  • Save drj11/9926eaba9eb3a9440c6d42688fdc2337 to your computer and use it in GitHub Desktop.
Save drj11/9926eaba9eb3a9440c6d42688fdc2337 to your computer and use it in GitHub Desktop.
comments on google-pol-collector

media_saver is empty.

The command line args

The command line args are weird. Perhaps do something like:

main --data uk --ads uk --data all

annoying because it makes me repeat uk, and is slightly non-standard because it repeats the --data argument. But I think we could make the argparse module understand that. Alternatively, investigate click for writing CLI interfaces.

small issues:

what you call data is kind of metadata. But metadata is also data, so ok.

Mostly advert_collector.py

To remove the first 4 characters of a string advert, do: advert[4:]; see function clean_up_json in advert_handler.py. Aside, what are the first 4 characters?

Following PEP-8, sort your imports alphabetically, put 3rd party imports like import youtube_dl into a second import section. (all files)

in is_good_response in advert_collector.py, the .lower() is (now) not necessary. In the end all you test is whether this value was present or not. Do you intend to check for empty string as well?

Not convinced that get_media needs to be in here (could be in advert_handler?)

Mostly data_collector.py

Can put the os.environ assignment outside the function.

Mostly file_management.py

I'm skeptical that the os.path.join in get_directory_for_location_today is as simple as it could be. Why does it call os_getcwd() twice?

Mostly main

main is quite a good place to configure logging actually.

import datetime as date my personal preference for avoiding import thingpy as tp is not usually strong enough to stop people doing it. But datetime as date would surprise most python programmers (specifically, in main).

Just for surprise value, main should probably be renamed main.py.

main should begin with the hash-pling line: #!/usr/bin/env python so that you can also execute the script directly: ./main

Much as a dislike import thing_collector as tc, ac and dc are kind of acceptable names. asv and fmgmt not so much.

in the main() function, you can use tuple unpacking to do:

location, get_ads = arg.split(":")
scope[location] = bool(get_ads)
# the bool() here, avoids the int(collect_ads) inside the collect_adverts function

in the advert loop inside collect_adverts, you can replace most occurences of adverts[i] with line; consider renaming line to advert_row or something.

In the collect_adverts function you can avoid one level of nesting by inverting the if condition and using continue as an early exit:

if not collect_ads:
    continue
# rest of code here, at same indent as if
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment