@mangar
Created November 10, 2016 17:34
# Starting from a directory (ARGV[0]), finds all PDF files and generates
# an image for each page inside another directory (ARGV[1]).
# During the process, a new single-page PDF is generated for each page.
# The third param (ARGV[2]) can be set to 'true' to remove this intermediate PDF.
FROM mangar/rails-pg:5.0.0
# docker build -t mangar/rails-pg:5.0.0.1 .
LABEL maintainer="Marcio Mangar <marcio.mangar@gmail.com>"
RUN apt-get update && apt-get install -y ghostscript
RUN gem install rails -v 5.0.0.1
RUN gem install sqlite3
RUN gem install puma -v 3.0
RUN gem install sass-rails -v 5.0
RUN gem install uglifier -v 1.3.0
RUN gem install coffee-rails -v 4.2
RUN gem install jquery-rails
RUN gem install turbolinks -v 5
RUN gem install jbuilder -v 2.5
RUN gem install carrierwave -v 0.10.0
RUN gem install devise_token_auth -v 0.1.38
RUN gem install fog -v 1.38
RUN gem install omniauth -v 1.3.1
RUN gem install omniauth-facebook -v 4.0.0
RUN gem install pg -v 0.15
RUN gem install active_model_serializers -v 0.10.2
RUN gem install acts_as_list -v 0.8.0
RUN gem install dotenv-rails
RUN gem install pry-rails
RUN gem install pry-rescue
RUN gem install web-console
RUN gem install listen -v 3.0.5
RUN gem install spring
RUN gem install spring-watcher-listen -v 2.0.0
RUN gem install better_errors
RUN gem install rack-cors
RUN gem install awesome_print
RUN gem install ap
# Gemfile equivalents for the gems above:
#   gem 'rack-cors', require: 'rack/cors'
#   gem 'awesome_print', require: 'ap'
RUN gem install le -v 2.7.2
RUN gem install redis
RUN gem install kaminari
RUN gem install paranoia -v 2.2.0.pre
RUN gem install tzinfo-data
RUN gem install CFPropertyList -v 2.3.2
RUN gem install arel -v 7.1.1
RUN gem install turbolinks -v 5.0.1
RUN gem install globalid -v 0.3.7
RUN gem install jsonapi -v 0.1.1.beta2
RUN gem install sprockets-rails -v 3.2.0
RUN gem install jwt -v 1.5.4
RUN gem install uglifier -v 3.0.2
RUN gem install hashie -v 3.4.4
RUN gem install rbvmomi -v 1.8.2
RUN gem install faker -v 1.6.6
RUN gem install acts_as_list -v 0.8.1
RUN gem install omniauth-oauth2 -v 1.4.0
RUN gem install excon -v 0.52.0
RUN gem install fog-core -v 1.42.0
RUN gem install fog-profitbricks -v 0.0.5
RUN gem install fog-vsphere -v 1.0.1
RUN gem install fog-openstack -v 0.1.12
RUN gem install fog-aws -v 0.11.0
RUN gem install fog-local -v 0.3.0
RUN gem install cmdparse -v 3.0.1
RUN gem install puma -v 3.6.0
RUN gem install sqlite3 -v 1.3.11
RUN gem install rmagick
# RUN gem install hexapdf -v 0.1.0
EXPOSE 3000
#!/usr/local/bin/ruby
#
# Description
#
# Starting from a directory (ARGV[0]), finds all PDF files and generates
# an image for each page inside another directory (ARGV[1]).
# During the process, a new single-page PDF is generated for each page.
# The third param (ARGV[2]) can be set to 'true' to remove this intermediate PDF.
#
#
# Usage:
#
# ./split.rb INPUT_DIR_WITH_PDF_FILES OUTPUT_DIR REMOVE_MIDDLE_PDF_GEN
# ./split.rb ../../public/pdf/upload/document_temp /app/public/pdf/files true
#
require 'hexapdf'
require 'rmagick'
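
# Strips the extension from +file+. When +index+ is given, it is appended as a
# zero-padded five-digit suffix, e.g. ("doc.pdf", 3) gives "doc_00003".
def removeExt(file = "", index = nil)
  filename = file[0, file.rindex(".")]
  index ? "#{filename}_#{index.to_s.rjust(5, "0")}" : filename
end
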
puts "-" * 100
puts "[pdf.split.start] #{Time.now}"
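if ARGV.length < 2
  abort "Usage: #{$PROGRAM_NAME} INPUT_DIR_WITH_PDF_FILES OUTPUT_DIR [REMOVE_MIDDLE_PDF_GEN]"
end

_dir = ARGV[0]
Dir.chdir(_dir)
_files = Dir.glob("*.pdf")
puts "Files: #{_files}"

# Note: the working directory is now the input directory, so a relative
# OUTPUT_DIR is resolved relative to it.
_output_dir = ARGV[1]
puts "Output: #{_output_dir}"
_remove_middle_pdf = ARGV[2]
puts "Remove Middle PDF: #{_remove_middle_pdf}"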
_files.each do |file|
  puts "---"
  puts "> Processing File: #{file}"

  # Input files are expected to be named "<document_code>__<rest>.pdf".
  separator = file.rindex("__")
  if separator.nil?
    puts "> Skipping (no '__' in file name): #{file}"
    next
  end
  document_code = file[0, separator]
  # puts "> Document Code: #{document_code}"

  output_dir = "#{_output_dir}/#{document_code}"
  puts "> Output: #{output_dir}"
  Dir.mkdir(output_dir) unless File.exist?(output_dir)

  # The page-tree calls below are kept as written for the HexaPDF version
  # this script targeted (0.1.0 in the Dockerfile).
  pdf = HexaPDF::Document.open(file)
  i = 0
  pdf.pages.each_page do |page|
    # Write the current page to its own single-page PDF.
    target = HexaPDF::Document.new
    target.pages.add_page(target.import(page))
    new_filename = removeExt(file, i)
    new_file = "#{output_dir}/#{new_filename}.pdf"
    target.write(new_file, optimize: true)
    # puts "outputing: #{new_file}"

    # Convert the single-page PDF to a JPG (PDF reading relies on Ghostscript).
    image = Magick::ImageList.new(new_file)
    jpg_file = "#{removeExt(new_file)}.jpg"
    image.write(jpg_file)
    puts " - #{jpg_file}"

    File.delete(new_file) if _remove_middle_pdf == "true"
    i += 1
  end
end
puts "-" * 50
puts "[pdf.split.finished] #{Time.now}"
puts "-" * 100
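
The naming scheme used by the script can be checked without HexaPDF or RMagick installed. The sketch below is not part of the original gist: `planned_outputs` is a hypothetical helper, and the file name `INV42__scan.pdf` is an invented example. It assumes input files are named `<document_code>__<rest>.pdf`, as the `rindex("__")` call in the script implies, and only computes the paths the script would write for each page.

```ruby
# Hypothetical helper (not in the original script): lists the per-page PDF and
# JPG paths the split script would produce for one input file, without
# reading or writing anything.
def planned_outputs(file, output_root, page_count)
  separator = file.rindex("__")
  raise ArgumentError, "expected '__' in #{file}" unless separator

  # Everything before the last "__" is the document code, which also names
  # the per-document output directory.
  document_code = file[0, separator]
  base = File.basename(file, File.extname(file))

  (0...page_count).map do |i|
    # Pages are numbered from 0 and zero-padded to five digits.
    stem = "#{output_root}/#{document_code}/#{base}_#{i.to_s.rjust(5, '0')}"
    { pdf: "#{stem}.pdf", jpg: "#{stem}.jpg" }
  end
end

planned_outputs("INV42__scan.pdf", "/app/public/pdf/files", 2).each do |paths|
  puts paths[:jpg]
end
```

With `REMOVE_MIDDLE_PDF_GEN` set to `true`, only the `:jpg` paths would remain on disk after a run; otherwise both files are kept for each page.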