@LenKIM
Last active September 18, 2018 01:29
[Python] Multithreading vs. multiprocessing for building a shared output list
#!/usr/bin/python3
import glob
import multiprocessing
import os
import time
from typing import List

from tqdm import tqdm

from function.function_01 import Function01Impl
from helpers.log_parser_helper import LogParserHelper

file_paths = glob.glob("/Users/len/log-analyer-assignment/logdata/20180824/*.txt")


def get_filename_with_ext(filepath):
    return os.path.basename(filepath)


def read_log_files(file_paths):
    # One worker process per file; every worker appends to one manager-backed list.
    jobs = []
    manager = multiprocessing.Manager()
    return_list = manager.list()
    # return_tuple = manager.Queue()
    # return_dict = manager.dict()
    for file_path in file_paths:
        p = multiprocessing.Process(target=read_log_worker, args=(file_path, return_list))
        jobs.append(p)
        p.start()
    for proc in jobs:
        proc.join()
    # Copy into a plain list while the manager is still alive; the proxy
    # stops working once the manager is shut down.
    return return_list[:]


def read_log_worker(file_path, return_list) -> List:
    with open(file_path, 'r', encoding='utf8') as f:
        lines = f.readlines()
    for line in tqdm(lines):
        a = LogParserHelper.raw_log_parser(line)
        # Each append is forwarded to the manager process.
        return_list.append(a)
    return return_list


# The guard keeps child processes from re-running the benchmark on import.
if __name__ == '__main__':
    s = time.time()
    function_01 = Function01Impl()
    parsed_logs = read_log_files(file_paths)  # renamed from `list`, which shadowed the builtin
    e = time.time()
    print(e - s)
    # for key in dic.keys():
    #     print(key, ":", dic[key])

#!/usr/bin/python3
import threading
import time

from tqdm import tqdm

from function.function_01 import Function01Impl
from helpers.log_parser_helper import LogParserHelper

exitFlag = 0
_all = []  # shared by all threads


class myThread(threading.Thread):
    def __init__(self, threadID, file_path):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.file_path = file_path

    def run(self):
        with open(file=self.file_path, mode='r', encoding='utf8') as f:
            lines = f.readlines()
        for line in tqdm(lines):
            a = LogParserHelper.raw_log_parser(line)
            _all.append(a)


s = time.time()
thread1 = myThread(1, "/Users/len/log-analyer-assignment/logdata/20180824/ap1.daouoffice.com_access_2018-08-24.txt")
thread2 = myThread(2, "/Users/len/log-analyer-assignment/logdata/20180824/ap2.daouoffice.com_access_2018-08-24.txt")
thread3 = myThread(3, "/Users/len/log-analyer-assignment/logdata/20180824/ap3.daouoffice.com_access_2018-08-24.txt")
# Start new Threads
thread1.start()
thread2.start()
thread3.start()
thread1.join()
thread2.join()
thread3.join()
# Write the combined result once, after every thread has finished. The original
# opened this file in 'w' mode inside each thread's run(), so the threads
# truncated and overwrote each other's output.
with open('/Users/len/log-analyer-assignment/out/test2-1.csv', 'w', encoding='utf8') as f_write:
    for line in _all:
        f_write.write('|'.join(line) + '\n')
e = time.time()
print(len(_all))
function_01 = Function01Impl()
# abc = function_01.extract_the_longest_response_time_request_api(_all)
# print(abc)
print("Exiting Main Thread")
print(e - s)

Multithreading vs. multiprocessing

Multiprocessing (note: Hangouts was running during the measurements)

it/s => iterations per second (here, lines parsed per second, as reported by tqdm)
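
As a rough cross-check of the figures in this note, throughput can be recomputed from the item count and elapsed time on a tqdm line. The helper name `lines_per_second` is an assumption made for illustration. Because tqdm displays elapsed time in whole seconds, the result only approximates the it/s value that tqdm prints.

```python
def lines_per_second(line_count: int, elapsed: str) -> float:
    """Return throughput for a tqdm-style elapsed string such as '02:54' or '1:02:54'."""
    seconds = 0
    for part in elapsed.split(":"):
        # Fold hours/minutes/seconds into a single number of seconds.
        seconds = seconds * 60 + int(part)
    return line_count / seconds


# Item count and elapsed time taken from the final tqdm line of a run.
rate = lines_per_second(2805010, "02:54")
print(f"{rate:.0f} it/s")
```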

Common

Each line is parsed, and the output is a single synchronized list.

Without appending to the list

     99%|█████████▉| 2789592/2805010 [02:54<00:00, 16029.65it/s]
    100%|█████████▉| 2794039/2805010 [02:54<00:00, 16043.91it/s]
    100%|█████████▉| 2799177/2805010 [02:54<00:00, 16064.19it/s]
    100%|█████████▉| 2804653/2805010 [02:54<00:00, 16086.38it/s]
    100%|██████████| 2805010/2805010 [02:54<00:00, 16085.95it/s]
    178.66

Appending to the list

    Run 1
    100%|█████████▉| 2802522/2805010 [22:29<00:01, 2075.98it/s]
    100%|█████████▉| 2802970/2805010 [22:30<00:00, 2076.16it/s]
    100%|█████████▉| 2803382/2805010 [22:30<00:00, 2076.23it/s]
    100%|█████████▉| 2803823/2805010 [22:30<00:00, 2076.41it/s]
    100%|█████████▉| 2804254/2805010 [22:30<00:00, 2076.57it/s]
    100%|█████████▉| 2804657/2805010 [22:30<00:00, 2076.69it/s]
    100%|██████████| 2805010/2805010 [22:30<00:00, 2076.84it/s]
    
    Run 2
    100%|█████████▉| 2801841/2805010 [18:29<00:01, 2524.32it/s]
    100%|█████████▉| 2802551/2805010 [18:30<00:00, 2524.69it/s]
    100%|█████████▉| 2803225/2805010 [18:30<00:00, 2525.05it/s]
    100%|█████████▉| 2803920/2805010 [18:30<00:00, 2525.45it/s]
    100%|█████████▉| 2804593/2805010 [18:30<00:00, 2525.74it/s]
    100%|██████████| 2805010/2805010 [18:30<00:00, 2525.88it/s]
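
A likely cause of this slowdown is that every `append` on a `manager.list()` proxy is forwarded to the manager process, so each parsed line costs one round trip between processes. One way to avoid that, sketched below under stated assumptions, is to let each worker build a plain local list and return it, so that results cross the process boundary once per file. `parse_line` is a hypothetical stand-in for `LogParserHelper.raw_log_parser`, and `parse_file` and the `workers` parameter are names introduced here; this is not the approach that was measured above.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List


def parse_line(line: str) -> List[str]:
    # Hypothetical stand-in for LogParserHelper.raw_log_parser.
    return line.split()


def parse_file(file_path: str) -> List[List[str]]:
    """Parse one file into a plain local list (no shared proxy involved)."""
    with open(file_path, "r", encoding="utf8") as f:
        return [parse_line(line) for line in f]


def read_log_files(file_paths: List[str], workers: int = 3) -> List[List[str]]:
    """Parse files in worker processes and merge the results in the parent."""
    merged: List[List[str]] = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Each worker returns its whole list in one message per file.
        for rows in pool.map(parse_file, file_paths):
            merged.extend(rows)
    return merged
```

When used in a script, `read_log_files(file_paths)` should be called under an `if __name__ == "__main__":` guard so that worker processes do not re-run it on import. Whether this is actually faster on these log files would have to be measured.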

Multithreading (3 threads)

Without using the list

     96%|█████████▌| 2664284/2789569 [00:07<00:00, 377145.39it/s]
    100%|██████████| 2690228/2690228 [00:11<00:00, 226615.12it/s]
    100%|██████████| 2805010/2805010 [00:09<00:00, 284780.62it/s]
     98%|█████████▊| 2745558/2789569 [00:07<00:00, 383870.05it/s]
    100%|██████████| 2789569/2789569 [00:08<00:00, 345637.54it/s]
    36.15
    
    In debug mode:
     99%|█████████▉| 2761917/2789569 [00:45<00:00, 60586.04it/s]
     99%|█████████▉| 2770462/2789569 [00:45<00:00, 60626.48it/s]
    100%|█████████▉| 2779615/2789569 [00:45<00:00, 60693.44it/s]
    100%|██████████| 2789569/2789569 [00:45<00:00, 60784.47it/s]
    100%|██████████| 2805010/2805010 [00:45<00:00, 61459.50it/s]
    Exiting Main Thread

Using the list and appending

     99%|█████████▉| 2788969/2805010 [06:27<00:02, 7198.46it/s]
    100%|█████████▉| 2793281/2805010 [06:27<00:01, 7207.73it/s]
    100%|█████████▉| 2797397/2805010 [06:27<00:01, 7215.99it/s]
    100%|█████████▉| 2802001/2805010 [06:27<00:00, 7225.95it/s]
    100%|██████████| 2805010/2805010 [06:29<00:00, 7200.06it/s]
    
    In debug mode:
    100%|█████████▉| 2786107/2789569 [11:40<00:00, 3977.83it/s]
    100%|█████████▉| 2787206/2789569 [11:40<00:00, 3978.83it/s]
    100%|█████████▉| 2788291/2789569 [11:40<00:00, 3979.75it/s]
    100%|██████████| 2789569/2789569 [11:40<00:00, 3980.53it/s]
    100%|██████████| 2805010/2805010 [11:43<00:00, 3985.64it/s]
    726.7834770679474

One process with 3 threads

Using the list and appending

     99%|█████████▉| 2672578/2690228 [22:10<00:08, 2009.24it/s]
     96%|█████████▌| 2683970/2789569 [22:11<00:52, 2015.98it/s]
     97%|█████████▋| 2716164/2805010 [23:41<00:46, 1910.37it/s]
     97%|█████████▋| 2716582/2805010 [23:41<00:46, 1910.53it/s]
     97%|█████████▋| 2716988/2805010 [23:42<00:46, 1910.66it/s]
     97%|█████████▋| 2717419/2805010 [23:42<00:45, 1910.83it/s]
     97%|█████████▋| 2717903/2805010 [23:42<00:45, 1911.04it/s]
    100%|█████████▉| 2788819/2789569 [23:41<00:00, 1961.70it/s]
     97%|█████████▋| 2718331/2805010 [23:42<00:45, 1911.19it/s]
    100%|██████████| 2805010/2805010 [24:03<00:00, 1943.40it/s]

2 processes, 3 threads

Using the list and appending

      1%|          | 8514/1380802 [00:08<21:39, 1056.05it/s]
      0%|          | 3627/1380802 [00:03<20:05, 1142.17it/s]
      0%|          | 3263/1380802 [00:03<21:20, 1076.09it/s]
      0%|          | 6067/1380797 [00:04<16:50, 1360.77it/s]
      0%|          | 3268/1380802 [00:02<20:50, 1101.25it/s]
      1%|          | 8652/1380802 [00:08<21:38, 1056.50it/s]

REFERENCE_CODE

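  1. The same code performs differently in a debug run than in a normal run; in the measurements above, the debug runs were noticeably slower.
  2. Adding a dict insert or a list append for every line slows processing markedly. The original note puts this at up to 2x, but in the multiprocessing runs above throughput dropped from about 16,000 it/s to about 2,000 to 2,500 it/s, because the processes share the list through a manager.
  3. Using more processes and threads does not necessarily make the job faster.
  4. When there is a single shared resource, threads are the wiser choice.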
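
The last point can be sketched as follows. Threads share memory, so each thread can parse into its own local list and publish it with a single locked `extend`, rather than touching the shared list once per line. This is a minimal sketch, not the measured script: `parse_line` is a hypothetical stand-in for `LogParserHelper.raw_log_parser`, and `results`, `results_lock`, `parse_lines`, and `run_threads` are names introduced here for illustration.

```python
import threading
from typing import List

results: List[List[str]] = []      # the single shared output list
results_lock = threading.Lock()    # guards writes to `results`


def parse_line(line: str) -> List[str]:
    # Hypothetical stand-in for LogParserHelper.raw_log_parser.
    return line.rstrip("\n").split(" ")


def parse_lines(lines: List[str]) -> None:
    """Parse into a thread-local list, then publish it with one locked extend."""
    local = [parse_line(line) for line in lines]
    with results_lock:
        results.extend(local)


def run_threads(chunks: List[List[str]]) -> None:
    """Start one thread per chunk of lines and wait for all of them."""
    threads = [threading.Thread(target=parse_lines, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


run_threads([["a b", "c d"], ["e f"], ["g h"]])
print(len(results))  # 4 parsed lines in total
```

The order of chunks in `results` depends on which thread finishes first, so code that needs a stable order should sort the list or tag each row with its source.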