@andrepcg
Created October 17, 2020 11:29
Extracts voting results from the Assembleia da República
Proposal = Struct.new(:title, :votes_favor, :votes_against, :votes_abstention, :conclusion)
CONCLUSIONS = ['Aprovad', 'Rejeitad']
status = 'FINDING_START'
PARTIES_INDICES = nil
PARTIES_LINE_START = ' PS'
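# Example header line:
# PS PSD BE PCP CDS-PP PAN PEV CH IL NiJKM NiCR
# Maps the column where each party name starts to the party name.
# String#index returns the first occurrence, so this assumes no party name
# also appears as a substring earlier in the line.
def extract_parties_index(line)
  obj = {}
  parties = line.split(/[\s.]+/).reject(&:empty?)
  parties.each do |party|
    obj[line.index(party)] = party
  end
  obj
end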
# Returns the party column nearest to character position i.
def find_closest_index(i)
  PARTIES_INDICES.keys.min_by { |x| (i - x).abs }
end
# Example vote line:
# FAVOR X X X X X X X
# Returns the parties whose column is closest to each 'X' on the line.
def extract_votes(line)
  (0...line.length).find_all { |i| line[i, 1] == 'X' }
                   .map { |i| PARTIES_INDICES[find_closest_index(i)] }
                   .compact
end
def contains_conclusion(line)
  CONCLUSIONS.any? { |word| line.include?(word) }
end
cur_proposal = nil
proposals = []
File.readlines(ARGV[0]).each do |line|
  status = 'TITLE_READ' if line.start_with?(PARTIES_LINE_START)

  # Accumulate the (possibly multi-line) title until a line ends with ';'.
  if line.start_with?('') || status == 'READING_TITLE'
    cur_proposal = Proposal.new('') if line.start_with?('')
    status = 'READING_TITLE'
    cur_proposal.title << line.strip + " "
    status = 'TITLE_READ' if line.strip.end_with?(';')
  end

  # The party header is parsed once and reused for every proposal.
  if line.start_with?(PARTIES_LINE_START) && !PARTIES_INDICES
    PARTIES_INDICES = extract_parties_index(line)
  end

  if status == 'TITLE_READ'
    status = 'FINDING_VOTES' if line.include?(PARTIES_LINE_START)
    status = 'READING_CONCLUSION' if contains_conclusion(line)
  end

  status = 'READING_VOTES' if line.start_with?('FAVOR')

  if status == 'READING_VOTES'
    if line.start_with?('FAVOR')
      cur_proposal.votes_favor = extract_votes(line)
    elsif line.start_with?('CONTRA')
      cur_proposal.votes_against = extract_votes(line)
    elsif line.start_with?('ABSTEN')
      # The abstention row is the last vote row, so the conclusion follows.
      cur_proposal.votes_abstention = extract_votes(line)
      status = 'READING_CONCLUSION'
    end
  end

  if status == 'READING_CONCLUSION'
    if contains_conclusion(line)
      cur_proposal.conclusion = line.strip
      proposals << cur_proposal
      status = 'FINDING_START'
    end
  end
end
proposals.each do |prop|
  p prop
end
andrepcg commented Oct 17, 2020

Example document: https://app.parlamento.pt/WebUtils/docs/doc.pdf?Path=6148523063446f764c304653546d56304c334e706447567a4c31684a566b786c5a79394e52564e424c30464f52566850553046485255354551564e42636e463161585a764c7a4c43716942545a584e7a77364e764945786c5a326c7a6247463061585a684c31684a566c3879587a4531587a49774d6a41744d5441744d545a664d6a41794d4330784d4330784e6935775a47593d&Fich=XIV_2_15_2020-10-16_2020-10-16.pdf&Inline=true

pdftotext votos.pdf -layout
ruby test.rb votos.txt

Result:
#<struct Proposal title="Projeto de Resolução n.º 690/XIV/2.ª (IL) – Portal online de transparência e monitorização do processo de execução dos Fundos Europeus; ", votes_favor=["PSD", "CDS-PP", "PAN", "CH", "IL", "NiJKM", "NiCR"], votes_against=["PS"], votes_abstention=["BE", "PCP", "PEV"], conclusion="Rejeitado">
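
The script depends on `pdftotext -layout` preserving the table columns: each `X` on a FAVOR/CONTRA/ABSTEN line is attributed to the party whose header column is nearest. Below is a minimal, self-contained sketch of that idea, not the gist's exact code. The sample lines and the helper names `party_columns` and `parties_marked` are made up for illustration, and the sketch reads column offsets from `String#scan` matches instead of `String#index`.

```ruby
# Hypothetical sample lines, loosely modelled on `pdftotext -layout` output.
HEADER = '          PS   PSD   BE   PCP'
FAVOR  = 'FAVOR      X    X'
CONTRA = 'CONTRA                X'

# Map the starting column of each party name to the name itself.
# Match offsets are used so every token is located independently.
def party_columns(header)
  columns = {}
  header.scan(/\S+/) { |party| columns[Regexp.last_match.begin(0)] = party }
  columns
end

# For every 'X' on a vote line, pick the party whose column is nearest.
def parties_marked(line, columns)
  line.each_char.with_index
      .select { |char, _| char == 'X' }
      .map { |_, i| columns[columns.keys.min_by { |col| (i - col).abs }] }
end

columns = party_columns(HEADER)
p parties_marked(FAVOR, columns)  # => ["PS", "PSD"]
p parties_marked(CONTRA, columns) # => ["BE"]
```

Nearest-column matching tolerates an `X` that is a character or two off from the header, but it can misattribute votes if the PDF layout shifts columns by more than half the gap between two parties.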
