Last active November 19, 2015 18:30
YouTube Data Checker - Parses two CSV files and outputs the emails of the rows with discrepancies. *Note*: I did not know whether I was allowed to ask questions about the challenge. I noticed that "UC" was prepended to some channel_ownership strings. I did not know whether this was a data entry error, so I processed it as if it were not, BUT added the necessary code to take…
require 'csv'

class YouTubeDataParser
  # args: [path to file1.csv, path to file2.csv, optional concern]
  def initialize(args)
    raise "Missing 'file1.csv'" if args[0].nil?
    raise "Missing 'file2.csv'" if args[1].nil?
    @file1 = CSV.read(args[0], headers: true)
    @file2 = CSV.read(args[1], headers: true)
    @concern = args[2]
    yt_data_checker(@file1, @file2, @concern)
  end

  # Checks the concern and runs the matching comparison(s).
  # Any other value (including no concern at all) runs both.
  def yt_data_checker(file1, file2, concern)
    if concern == "channel_ownership"
      sanitize_channels(file1, file2)
      calculate_differences(@file_1_yt_channels, @file_2_yt_channels)
    elsif concern == "subscriber_count"
      sanitize_subscriber_count(file1, file2)
      calculate_differences(@file_1_subscriber_count, @file_2_subscriber_count)
    else
      sanitize_channels(file1, file2)
      sanitize_subscriber_count(file1, file2)
      calculate_differences(@file_1_yt_channels, @file_2_yt_channels)
      calculate_differences(@file_1_subscriber_count, @file_2_subscriber_count)
    end
    print_emails(@total_difference)
  end

  # Normalizes channels: maps email (column 0) to the last path segment
  # of the channel value (column 1).
  def sanitize_channels(file1, file2)
    @file_1_yt_channels = {}
    @file_2_yt_channels = {}
    file1.each do |row|
      # .to_s guards against an empty channel cell
      @file_1_yt_channels[row[0]] = row[1].to_s.split('/').last #.gsub(/^UC/, "") -> Insert if UC is error in input
    end
    file2.each do |row|
      @file_2_yt_channels[row[0]] = row[1].to_s.split('/').last #.gsub(/^UC/, "") -> Insert if UC is error in input
    end
  end

  # Normalizes subscriber count: maps email (column 0) to the count
  # (column 2) with all non-word characters, such as commas, removed.
  def sanitize_subscriber_count(file1, file2)
    @file_1_subscriber_count = {}
    @file_2_subscriber_count = {}
    file1.each do |row|
      @file_1_subscriber_count[row[0]] = row[2].to_s.gsub(/\W/, "")
    end
    file2.each do |row|
      @file_2_subscriber_count[row[0]] = row[2].to_s.gsub(/\W/, "")
    end
  end

  # Calculates differences for the supplied channel_ownership data,
  # subscriber_count data, or both (results accumulate across calls).
  # Only pairs from data_set1 that have no identical pair in data_set2
  # are reported; emails that appear only in data_set2 are not.
  def calculate_differences(data_set1, data_set2)
    @differences ||= []
    @differences += data_set1.to_a - data_set2.to_a
    @total_difference = @differences
  end

  # Iterates through the differences, collects the emails and prints them.
  def print_emails(differences)
    emails = differences.map { |difference| difference[0] }
    puts "-------Emails With Discrepancies------"
    puts emails.uniq
    puts "--------------------------------------"
  end
end

YouTubeDataParser.new(ARGV)
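The description raises the question of whether the "UC" prefix on some channel_ownership values is an input error, and sanitize_channels leaves the stripping step as a commented-out gsub. A minimal sketch of how that choice could be a switch rather than a comment; normalize_channel and its strip_uc: keyword are hypothetical names that are not part of the gist, and the sample URL is made up:

```ruby
# Hypothetical helper (not in the gist): take the last path segment of a
# channel value and, only if asked, drop a leading "UC".
def normalize_channel(raw, strip_uc: false)
  # to_s covers nil cells; split('/').last is nil for an empty string
  id = raw.to_s.strip.split('/').last.to_s
  strip_uc ? id.sub(/\AUC/, '') : id
end

url = 'http://www.youtube.com/channel/UCabc123' # made-up sample value

puts normalize_channel(url)                 # prints "UCabc123"
puts normalize_channel(url, strip_uc: true) # prints "abc123"
```

Whether "UC" should be stripped at all is still an open question about the data, so the default here leaves it in place, matching what the gist does.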
@BrentPalmer - Pro-tip: if you rename your file YouTube Data Checker.rb (note the .rb at the end), you get syntax highlighting :)
I made a couple changes, including making a new variable for total differences and changing it from a hash to an array for a better output.
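To illustrate the hash-to-array change: Ruby's Hash has no `-` operator, while Array#- on the [email, value] pairs drops every pair that appears identically in the other file. A small sketch with made-up data; the both_ways variant is only an assumption about what might be wanted, since the gist itself compares in one direction:

```ruby
# Made-up sample data: email => normalized subscriber count
file1 = { 'a@example.com' => '100', 'b@example.com' => '200' }
file2 = { 'a@example.com' => '100', 'b@example.com' => '250', 'c@example.com' => '5' }

# What calculate_differences does: pairs in file1 with no identical pair in file2
one_way = file1.to_a - file2.to_a
# => [["b@example.com", "200"]]

# Assumed extension: also catch emails that exist only in file2
both_ways = (file1.to_a - file2.to_a) | (file2.to_a - file1.to_a)
emails = both_ways.map(&:first).uniq
# => ["b@example.com", "c@example.com"]

puts emails
```

Because each element is an [email, value] pair, taking the first item of every pair and calling uniq gives the same kind of list that print_emails outputs.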