@BrentPalmer
Last active November 19, 2015 18:30
YouTube Data Checker - Parses two CSV files and outputs the emails of the rows with discrepancies. *Note* I did not know whether I was allowed to ask questions about the challenge. I noticed that "UC" was prepended to some channel_ownership strings. I did not know whether this was a data entry error, so I processed it as if it were not, BUT added the necessary code to take…
require 'csv'

class YouTubeDataParser
  def initialize(args)
    raise "Missing 'file1.csv'" if args[0].nil?
    raise "Missing 'file2.csv'" if args[1].nil?

    @file1 = CSV.read(args[0], headers: true)
    @file2 = CSV.read(args[1], headers: true)
    @concern = args[2]
    yt_data_checker(@file1, @file2, @concern)
  end

  # Checks the concern and runs the matching comparison.
  # Any other value (including none) compares both columns.
  def yt_data_checker(file1, file2, concern)
    if concern == "channel_ownership"
      sanitize_channels(file1, file2)
      calculate_differences(@file_1_yt_channels, @file_2_yt_channels)
      print_emails(@total_difference)
    elsif concern == "subscriber_count"
      sanitize_subscriber_count(file1, file2)
      calculate_differences(@file_1_subscriber_count, @file_2_subscriber_count)
      print_emails(@total_difference)
    else
      sanitize_channels(file1, file2)
      sanitize_subscriber_count(file1, file2)
      calculate_differences(@file_1_yt_channels, @file_2_yt_channels)
      calculate_differences(@file_1_subscriber_count, @file_2_subscriber_count)
      print_emails(@total_difference)
    end
  end

  # Normalizes channels: keeps only the last path segment of column 1,
  # keyed by column 0 (the email).
  def sanitize_channels(file1, file2)
    @file_1_yt_channels = {}
    @file_2_yt_channels = {}
    file1.each do |row|
      @file_1_yt_channels[row[0]] = row[1].split('/').last # .gsub(/^UC/, "") -> insert if the UC prefix is an input error
    end
    file2.each do |row|
      @file_2_yt_channels[row[0]] = row[1].split('/').last # .gsub(/^UC/, "") -> insert if the UC prefix is an input error
    end
  end

  # Normalizes subscriber counts: strips non-word characters (such as commas)
  # so that "1,000" and "1000" compare as equal strings.
  def sanitize_subscriber_count(file1, file2)
    @file_1_subscriber_count = {}
    @file_2_subscriber_count = {}
    file1.each do |row|
      @file_1_subscriber_count[row[0]] = row[2].gsub(/\W/, "")
    end
    file2.each do |row|
      @file_2_subscriber_count[row[0]] = row[2].gsub(/\W/, "")
    end
  end

  # Calculates differences for the supplied channel_ownership data,
  # subscriber_count data, or both (results accumulate across calls).
  # Only [email, value] pairs from data_set1 that have no exact match
  # in data_set2 are collected.
  def calculate_differences(data_set1, data_set2)
    @differences ||= []
    @differences = @differences + (data_set1.to_a - data_set2.to_a)
    @total_difference = @differences
  end

  # Iterates through the differences, collects the emails, and prints them.
  def print_emails(differences)
    emails = []
    differences.each do |difference|
      emails << difference[0]
    end
    puts "-------Emails With Discrepancies------"
    puts emails.uniq
    puts "--------------------------------------"
  end
end

YouTubeDataParser.new(ARGV)
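
The description and the commented-out `gsub` leave open whether the "UC" prefix is a data entry error. One way to keep both options available is to make the stripping a switch instead of a commented-out call. The following is only a sketch: the helper name `normalize_channel`, the `strip_uc` flag, and the sample URL are assumptions for illustration and are not part of the gist.

```ruby
# Hypothetical helper (not in the original gist): extracts the last path
# segment of a channel URL and optionally drops a leading "UC".
def normalize_channel(url, strip_uc: false)
  # to_s guards against a nil cell; an empty string yields "".
  id = url.to_s.strip.split('/').last.to_s
  # \A anchors at the start of the string, so only a leading "UC" is removed.
  strip_uc ? id.sub(/\AUC/, '') : id
end

# Made-up sample value, used only to show the two modes.
sample = "http://www.youtube.com/channel/UCabc123"
puts normalize_channel(sample)                 # keeps the prefix
puts normalize_channel(sample, strip_uc: true) # drops the prefix
```

With a helper like this, `sanitize_channels` could call it for both files and the decision about the prefix would live in one place.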
@BrentPalmer
Author

I made a couple changes, including making a new variable for total differences and changing it from a hash to an array for a better output.
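
To illustrate the array-based comparison mentioned in this comment, here is a small sketch with made-up data. `Hash#to_a` turns each entry into an `[email, value]` pair and `Array#-` keeps the pairs from the first array that have no exact match in the second. The sample emails and the two-way variant are assumptions added for illustration; the gist itself only compares in one direction.

```ruby
# Made-up sample data: email => normalized subscriber count.
file1 = { "a@example.com" => "100", "b@example.com" => "200" }
file2 = { "a@example.com" => "100", "b@example.com" => "250", "c@example.com" => "5" }

# One-way difference, as in calculate_differences:
# pairs from file1 with no identical pair in file2.
one_way = file1.to_a - file2.to_a

# Possible extension (not in the gist): also catch rows that only
# appear in file2 by taking the union of both directions.
both_ways = (file1.to_a - file2.to_a) | (file2.to_a - file1.to_a)

# The email is the first element of each pair.
emails = both_ways.map(&:first).uniq

p one_way # [["b@example.com", "200"]]
p emails  # ["b@example.com", "c@example.com"]
```

Whether rows that exist only in the second file count as discrepancies depends on the challenge's requirements, which are not stated here.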

@8bitDesigner

@BrentPalmer - Pro-tip, if you rename your file YouTube Data Checker.rb (note that .rb at the end), you get syntax highlighting :)
