1c7/extract base64 <img> to image file

Last active August 31, 2018 11:06

Fix base64 <img> data that was accidentally stored in the database

extract base64 <img> to image file

http://1c7.me/2018-8-31-fix-base64-img-in-database-to-url/

Author

1c7 commented Aug 31, 2018

SQL

Which articles are large? Sort by content length:

select id, title, length(content_html), status from topics order by length(content_html) desc

Author

1c7 commented Aug 31, 2018

Ruby

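Save each article to an .html file

  def clean
    id_array = [434]
    id_array.each do |one|
      filename = "./topic_id_#{one}.html"
      # unscoped bypasses any default scope on Topic
      t = Topic.unscoped.find(one)
      # File.open with a block closes the file when the block returns
      File.open(filename, 'w') { |f| f << t.content_html }
    end
    render json: 'done'
  end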

Author

1c7 commented Aug 31, 2018

Python 3 (the key part: extract the base64 data into image files)

Filename: decode_base64_to_image.py
Modify the following code to fit your needs.

# Python 3
import re
import base64

# Matches the src value of every <img> tag with a double-quoted src attribute
IMG_SRC = re.compile(r'<img [^>]*src="([^"]+)')

# Called once per exported article (see the loop at the bottom)
def handle_file(filename, topic_id):
  # encoding is an assumption: the exported .html file is taken to be UTF-8
  with open(filename, 'r', encoding='utf-8') as infile:
    content = infile.read()

  images = IMG_SRC.findall(content) # list of src values

  for index, url in enumerate(images):
    # only handle inline images: 'data:image/jpeg;base64,...', 'data:image/png;base64,...'
    if url.startswith('data:image'):
      # split the 'data:image/png;base64' header from the payload at the first comma
      header, base64_naked = url.split(',', 1)
      # get file suffix like "jpeg" "png"
      suffix = header.split('/')[1].split(';')[0]
      print(suffix)
      imgdata = base64.b64decode(base64_naked)
      save_filename = 'topic_id_' + str(topic_id) + '_' + str(index) + '.' + suffix
      # save image
      with open(save_filename, 'wb') as f:
        f.write(imgdata)
      # replace the inline data with the URL the image will be served from
      img_url = 'https://img.example.com/' + save_filename
      content = content.replace(url, img_url)

  # save the rewritten content to a new file
  with open(filename + ".txt", "w", encoding='utf-8') as final_file:
    final_file.write(content)

id_array = [434]
# loop and call function
for topic_id in id_array:
  filename = "./topic_id_{one}.html".format(one=topic_id) # filename like "topic_id_434.html"
  handle_file(filename, topic_id)
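
The string splitting in the script above works for the common `data:image/png;base64,...` case. As a minimal sketch (not part of the original script), the same parsing could be pulled into one helper that also rejects non-data URLs and trims a subtype such as `svg+xml` down to `svg`. The names `DATA_URI` and `split_data_uri` are hypothetical and introduced only for this illustration.

```python
import base64
import re

# Hypothetical pattern: 'data:image/<subtype>;base64,<payload>'
DATA_URI = re.compile(r'^data:image/([\w.+-]+);base64,(.*)$', re.DOTALL)

def split_data_uri(uri):
    """Return (suffix, raw_bytes) for a base64 image data URI, else None."""
    match = DATA_URI.match(uri)
    if match is None:
        return None
    subtype, payload = match.groups()
    # "svg+xml" becomes "svg"; "png" and "jpeg" are unchanged
    suffix = subtype.split('+')[0]
    return suffix, base64.b64decode(payload)

if __name__ == "__main__":
    # Build a sample data URI from known bytes and split it again
    sample = 'data:image/png;base64,' + base64.b64encode(b'hello').decode('ascii')
    print(split_data_uri(sample))
    print(split_data_uri('https://img.example.com/a.png'))
```

Returning `None` for anything that is not a base64 image data URI lets the caller skip ordinary `http(s)` image URLs without a separate `startswith` check.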

Author

1c7 commented Aug 31, 2018

Ruby

Write the cleaned content back into the database

def insert_into_db
  id = '434'
  content = '[paste content here]'

  t = Topic.find(id)
  t.content_html = content
  t.save
  render json: "Successfully updated content of article id #{id}"
end
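
Before pasting the rewritten content back, it may be worth confirming that no inline images remain. The following is a minimal sketch of such a check, assuming the same double-quoted `src` pattern as the extraction script; `leftover_data_uris` is a hypothetical helper name, not something from the original gist.

```python
import re

# Same pattern as the extraction script: double-quoted src attributes only
IMG_SRC = re.compile(r'<img [^>]*src="([^"]+)')

def leftover_data_uris(html):
    """Return the <img> src values that are still inline data: URIs."""
    return [src for src in IMG_SRC.findall(html) if src.startswith('data:image')]

if __name__ == "__main__":
    cleaned = '<p><img src="https://img.example.com/topic_id_434_0.png"></p>'
    dirty = '<p><img alt="x" src="data:image/png;base64,AAAA"></p>'
    print(len(leftover_data_uris(cleaned)))
    print(len(leftover_data_uris(dirty)))
```

In practice the function would be called on the contents of the rewritten file (for example `topic_id_434.html.txt`); an empty list suggests every matched inline image was replaced.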

Author

1c7 commented Aug 31, 2018

Blog: http://1c7.me/2018-8-31-fix-base64-img-in-database-to-url/
