taylor224/README.md

## README.md

      
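### Backing up a Tistory blog and moving to WordPress

Tistory removed its blog backup feature. This gist provides a backup script for those who can no longer back up their blog, along with a guide to moving the posts to WordPress.

- First, change your Tistory skin to TickTalk ver.1.0 (listed in the skin manager as [사용중TickTalk(사용자 수정/업로드) ver.1.0(사용자 수정/업로드)]). The script and this guide are built around that skin.
- Under 환경설정 - 기본정보 - 블로그 정보 - 주소설정 (Settings - Basic Info - Blog Info - Address Settings), change the address format to [숫자 (http://notice.tistory.com/123)], the numeric format.
- Install WordPress beforehand and finish configuring the WordPress blog's URL and other settings.
- If you skip these steps, the script may not work.
- Read the whole guide before you start. Some steps require preparation in advance.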
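#### Backing up Tistory

1. Open tistory_backup.py and change base_url at the top to the URL of the blog you want to back up.
2. The script rewrites image URLs to WordPress URLs in advance. Set up the WordPress folder that will hold the photos backed up from Tistory, then enter that folder's relative path (image_content_url in the script). In my case I planned to upload to a /wp-content/uploads/tistory folder inside the WordPress folder, so I entered that relative path. On a freshly created blog the uploads folder may not exist yet; create it if it is missing. It is the directory WordPress uses for attachments from then on.
3. Run the script with Python. It was written for Python 2.7. `python tistory_backup.py`
4. Keep the resulting data.xml file in a safe place.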
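
To make step 2 concrete, here is a minimal sketch of the image URL rewrite, modeled on the logic in tistory_backup.py. It is written for Python 3 (the script itself targets Python 2.7). The function name `wordpress_image_src` and the sample URLs are hypothetical and only serve as illustration.

```python
# Sketch of how an <img> src is mapped to its WordPress path.
# Assumption: mirrors the rules in tistory_backup.py; names here are illustrative.
MIME_TO_EXT = {
    "image/jpeg": "jpg",
    "image/jpg": "jpg",
    "image/png": "png",
    "image/gif": "gif",
}


def wordpress_image_src(src, filename_attr, filemime,
                        image_content_url="/wp-content/uploads/tistory/"):
    """Return the rewritten src, or the original src for non-Tistory images."""
    if "//cfile" not in src:
        return src  # only images hosted on Tistory's cfile servers are rewritten

    name = src.split("/")[-1]
    if "itistory-photo" in filename_attr:
        # Mobile uploads carry no usable extension, so fall back to the MIME type.
        ext = MIME_TO_EXT.get(filemime, "")
    else:
        ext = filename_attr.split(".")[-1].lower()
    return image_content_url + name + "." + ext


# Hypothetical sample values.
print(wordpress_image_src("http://cfile1.uf.tistory.com/image/ABC123",
                          "photo.JPG", "image/jpeg"))
```

The image files saved by the script use the same name plus extension, which is why uploading the image folder to the chosen WordPress folder makes the rewritten paths resolve.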

#### Importing posts into WordPress

1. Upload every image in the image folder of the Tistory backup to the server folder you chose earlier. In my case I uploaded them to /wp-content/uploads/tistory.
2. Download, install, and activate the WP All Import plugin. https://wordpress.org/plugins/wp-all-import/
3. In the All Import menu at the bottom left of the admin page, select New Import.
4. Click Upload a file, then select and upload the data.xml file created earlier.
5. On the screen shown after the upload, select the New Items box, choose Create new [Posts], and click [Continue to Step 2] below.
6. On the next screen, click item in the list of box buttons on the left, then click Continue to Step 3 below.
7. On the next screen, drag `<title>` from the XML tree on the right and drop it into the title field on the left.
8. In the same way, drag content into the body field on the left.
9. Open the Advanced Options panel directly below the body field, uncheck everything, and enable only Decode HTML entities with html_entity_decode.
10. Open the Images dropdown below and select the Use images currently in Media Library radio button.
11. Open the Taxonomies, Categories, Tags dropdown below and set the categories or tags. You can assign one category or tag such as Tistory_backup to every post, or take the original category from the backup file. To keep the original category, drag category from the XML tree on the right into the input field.
12. Click Other Post Options below, delete now from the Post Dates input field, and drag date from the XML tree on the right into it.
13. Click Continue to Step 4 at the bottom.
14. Drag id from the XML tree on the right into the Unique Identifier field, click Continue at the bottom, then click the final import button to run the import.

Depending on the number of posts, the import may take some time.
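
Before uploading data.xml, it can help to confirm that every item carries the fields the steps above map (id, title, date, category, content). The following Python 3 sketch assumes the layout the backup script appears to produce, an `<article>` root with `<item>` children, and a date shaped like 2015-03-01T12:30:00. The function name `check_items` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

REQUIRED_FIELDS = ("id", "title", "date", "category", "content")


def check_items(xml_text):
    """Return a list of (post id, problem) tuples; empty means nothing was flagged."""
    problems = []
    root = ET.fromstring(xml_text)
    for item in root.findall("item"):
        post_id = item.findtext("id", default="?")
        for field in REQUIRED_FIELDS:
            if item.find(field) is None:
                problems.append((post_id, "missing <%s>" % field))
        date = item.findtext("date")
        if date is not None:
            try:
                # Assumed format, based on how the script builds the date string.
                datetime.strptime(date, "%Y-%m-%dT%H:%M:%S")
            except ValueError:
                problems.append((post_id, "unexpected date: %s" % date))
    return problems


# Hypothetical sample: the second item lacks content and has a malformed date.
SAMPLE = """<article>
  <item><id>42</id><title>Hello</title><date>2015-03-01T12:30:00</date>
        <category>Dev</category><content>&lt;p&gt;Hi&lt;/p&gt;</content></item>
  <item><id>43</id><title>Broken</title><date>2015.03.01</date>
        <category>Dev</category></item>
</article>"""

for post_id, problem in check_items(SAMPLE):
    print(post_id, problem)
```

For a real backup, pass the file contents read in binary mode, for example `check_items(open("data.xml", "rb").read())`.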

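#### Fixing post IDs

Imported posts are registered with IDs that simply increase sequentially, so they may not match the numbers in the original Tistory URLs. You can correct this by editing the database. If you do not need to redirect from the old blog, you can skip this step.
For this step, WordPress must also use numeric permalinks. In WordPress, go to Settings - Permalinks - Common Settings, select Custom Structure, and enter "/%post_id%" (without the quotes).
Before you begin, dump and back up your WordPress database. I take no responsibility for any data loss caused by this step. Running it against an empty WordPress database is recommended.
If WordPress already contains posts, the ID values may collide and cause errors. In that case, do not perform this step.

1. Open your database admin tool and, in wp_posts (the wp prefix may differ depending on your setup), find the guid of a newly imported post and copy it.
2. Remove only the trailing number from that guid, then replace the blog URL in the lower statement of change_index.sql with the result.
3. Change wp_posts and wp_pmxi_posts to match your WordPress table names, then run the SQL.
4. Check that the SQL ran successfully. If it did, the IDs now match those of the original Tistory blog.
5. Visit the blog URL to confirm.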
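
The guid edit in step 2 can be sketched in Python 3 as follows. The helper names and the sample guid are hypothetical, and the generated statement simply mirrors the second statement of change_index.sql, including its `id > 40` condition, so adjust it to your own data before running anything.

```python
import re


def guid_prefix(guid):
    """Strip the trailing post number from a guid, keeping everything before it."""
    return re.sub(r"\d+$", "", guid)


def guid_update_statement(prefix, table="wp_posts", min_id=40):
    """Build the guid UPDATE statement for a given prefix (illustrative only)."""
    return ('UPDATE {t} SET guid = CONCAT("{p}", CAST(id AS CHAR)) WHERE id > {m};'
            .format(t=table, p=prefix, m=min_id))


# Hypothetical guid copied from wp_posts.
prefix = guid_prefix("https://blog.domain.com/?p=123")
print(prefix)
print(guid_update_statement(prefix))
```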

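#### Redirecting Tistory to the new blog automatically

1. In the Tistory admin page, go to 꾸미기 - HTML/CSS 편집 (Design - Edit HTML/CSS) and paste the code from tistory_auto_redirect.js just above `</head>`.
2. When pasting, change the BLOG_SITE_URL variable at the top of the script to the URL of your new blog.
3. Once the script is added and saved, it checks whether the new blog has a post with the same number and redirects only when one exists.

I do not take separate inquiries about this script.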
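
The redirect rule described above can be expressed as a small Python 3 sketch. It is not the script itself: `redirect_target` is a hypothetical name, and the `exists` callable stands in for the remote lookup the JavaScript performs, so the logic can be tried without network access.

```python
def redirect_target(pathname, new_blog_url, exists):
    """Return the URL to redirect to, or None to stay on the Tistory page."""
    if len(pathname) <= 1:
        return None  # "/" is the front page; only post pages are checked
    candidate = new_blog_url + pathname
    return candidate if exists(candidate) else None


# Hypothetical set of posts already present on the new blog.
migrated = {"https://blog.domain.com/42"}

print(redirect_target("/42", "https://blog.domain.com", migrated.__contains__))
print(redirect_target("/43", "https://blog.domain.com", migrated.__contains__))
```

Because the redirect only fires when the same path exists on the new blog, this scheme depends on the post IDs and the numeric permalink structure matching on both sides.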

by. Taylor Starfield (https://blog.kloa.kr)


## change_index.sql
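-- Set each imported post's id to the unique key WP All Import stored for it
-- (the Tistory post number mapped to Unique Identifier during import).
UPDATE wp_posts AS t
   INNER JOIN wp_pmxi_posts AS tr
   ON t.id = tr.post_id
   SET t.id = tr.unique_key;

-- Rebuild guid from the new id. Replace the URL with your own guid prefix.
-- The "id > 40" condition presumably limits the change to imported posts; adjust as needed.
UPDATE wp_posts SET guid = CONCAT("https://blog.domain.com/archives/", CAST(id AS CHAR)) WHERE id > 40;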

## tistory_auto_redirect.js
<script>
// Uses jQuery's $.getJSON, so jQuery must be loaded on the page.
var BLOG_SITE_URL = 'https://blog.domain.com';

// Ask Yahoo's public YQL endpoint to fetch `url`; redirect only if it returns results.
function isUrlExists(url, page_url)
{
    $.getJSON("http://query.yahooapis.com/v1/public/yql?"+
      "q=select%20*%20from%20html%20where%20url%3D%22"+
      encodeURIComponent(url)+"%22&format=json'&callback=?", function(data){
          if(data.results[0]){
              window.location.replace(BLOG_SITE_URL + page_url);
          }
      }
    );
}

var page_url = window.location.pathname;

// Skip the front page ("/"); only post pages are checked.
if (page_url.length > 1) {
    isUrlExists(BLOG_SITE_URL + page_url, page_url);
}
</script>

## tistory_backup.py
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import urllib2
import BeautifulSoup
import time
import dicttoxml
import sys
import urllib

base_url = 'yourblog.tistory.com'

image_content_url = '/wp-content/uploads/tistory/'

article_data = []

error_count = 0
error_max_count = 50

try:
    # Post numbers to crawl; adjust the range to cover your blog's posts.
    for index in range(42, 1000):
        u = None

        try:
            u = urllib2.urlopen('http://' + base_url + '/' + str(index))
        except urllib2.HTTPError as e:
            if e.code == 404:
                print('Post ' + str(index) + ' - Not Found')

                error_count += 1

                if error_count > error_max_count:
                    print('Error Count is over the max value. Stopping the crawler')
                    break

                time.sleep(3)
                continue
            else:
                raise

        error_count = 0

        b = BeautifulSoup.BeautifulSoup(u.read())

        # Check 404 also
        absent_post = b.find('div', {'class': 'absent_post'})

        if absent_post:
            print('Post ' + str(index) + ' - Not Found')
            time.sleep(3)
            continue

        # Get Title of article
        title = b.find('div', {'class': 'tit'}).find('a', {'class': 'link'}).contents[0].encode('utf-8')
        category = b.find('div', {'class': 'tit'}).find('a', {'class': 'category'}).contents[0].encode('utf-8')
        date = b.find('div', {'class': 'tit'}).find('span', {'class': 'date'}).contents[0].replace('.', '-').replace(' ', 'T') + ':00'
        # Get Article from page
        article = b.find('div', {'class': 'desc'})
        # Delete another category article element
        another_category = article.find('div', {'class': 'another_category another_category_color_gray'})

        if another_category:
            another_category.decompose()
        # Delete Copyright element (skip posts that do not have one)
        copyright_box = article.find('div', {'style': 'width:100%;margin-top:30px;clear:both;height:30px'})

        if copyright_box:
            copyright_box.decompose()

        imagelist = article.findAll('img')

        for image in imagelist:
            if '//cfile' in image['src']:
                filename = image['src'].split('/')[-1]

                filetype = ''

                if 'itistory-photo' in image['filename']:
                    if image['filemime'] == 'image/jpeg' or image['filemime'] == 'image/jpg':
                        filetype = 'jpg'
                    elif image['filemime'] == 'image/png':
                        filetype = 'png'
                    elif image['filemime'] == 'image/gif':
                        filetype = 'gif'
                else:
                    filetype = image['filename'].split('.')[-1].lower()

                print(filename + '.' + filetype)

                # Save into the local image/ folder, which must already exist.
                try:
                    urllib.urlretrieve(image['src'], 'image/' + filename + '.' + filetype)
                except Exception:
                    print('File retrieve error - ' + image['src'])

                # image is a reference into article, so this edit changes the article content itself.
                image['src'] = image_content_url + filename + '.' + filetype

                image['onclick'] = None

        print('Post ' + str(index) + ' - [' + category.decode('utf-8') + ']' + title.decode('utf-8') + ' - ' + date)

        article_data.append({
                'id': index,
                'title': title,
                'date': date,
                'category': category,
                'content': str(article)
            })

        time.sleep(3)
except KeyboardInterrupt:
    # Ctrl+C stops the crawl early; the posts collected so far are still written below.
    pass

xml = dicttoxml.dicttoxml(article_data, custom_root='article', attr_type=False)

with open('data.xml', 'w') as f:
    f.write(xml)