Last active
October 17, 2017 03:12
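Background: a fixed directory periodically receives a large number of small files. Each file contains one Chinese personal name per line; for some reason, a name may contain stray English characters. Requirement: monitor the directory for newly created files, process the files with multiple threads, clean and analyze the names in each file, and periodically output how many times each name appeared over a given period.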
import os
import re
import sys
import time


def detector(directory, sec):
    """Scan the same directory twice, `sec` seconds apart, and return the
    names of the files that appeared in between.

    Multithreading could be used here."""
    origin = set(next(os.walk(directory), (None, [], []))[2])
    time.sleep(sec)
    final = set(next(os.walk(directory), (None, [], []))[2])
    return final.difference(origin)


def handler(file_names):
    """Count how often each cleaned name occurs in the given files.

    `file_names` must be openable paths; detector() returns bare file
    names, so join them with the directory first."""
    name_dict = {}
    for file_name in file_names:
        try:
            # Assumption: the files are UTF-8 encoded.
            with open(file_name, encoding="utf-8") as file:
                for row in file:
                    # Remove every character that is not Chinese, e.g.
                    # u"中文bab#$%$#%#$" becomes u"中文".
                    row_filter = re.sub(r"[^\u4e00-\u9fff]", "", row)
                    if row_filter:
                        # Add one to this name's count in the dictionary.
                        name_dict[row_filter] = name_dict.get(row_filter, 0) + 1
        except (OSError, UnicodeError) as e:
            print("something wrong: %s: %s" % (file_name, e), file=sys.stderr)
    return name_dict
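
The requirement in the description (watch for new files, process them with several threads, report counts periodically) is not covered by the code above. What follows is only a minimal sketch of one way to approach it, not the author's implementation. `NameMonitor`, `clean_name`, `count_names`, and `run` are hypothetical names, and the sketch assumes UTF-8 files and that a name consists only of characters in the CJK Unified Ideographs range.

```python
import os
import re
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Assumption: a valid name contains only CJK Unified Ideographs.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]")


def clean_name(line):
    """Drop every character that is not a Chinese character."""
    return NON_CHINESE.sub("", line)


def count_names(path):
    """Return a Counter of the cleaned names in one file (UTF-8 assumed)."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            name = clean_name(line)
            if name:
                counts[name] += 1
    return counts


class NameMonitor:
    """Hypothetical helper: polls a directory and aggregates name counts."""

    def __init__(self, directory, workers=4):
        self.directory = directory
        self.seen = set()        # paths that were already processed
        self.totals = Counter()  # counts since the last report
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def poll(self):
        """Process every file not seen before, using the thread pool."""
        new_paths = []
        for name in os.listdir(self.directory):
            path = os.path.join(self.directory, name)
            if path not in self.seen and os.path.isfile(path):
                self.seen.add(path)
                new_paths.append(path)
        # Files are read in worker threads; results are merged in this thread.
        for counts in self.pool.map(count_names, new_paths):
            self.totals.update(counts)

    def report(self):
        """Return the counts gathered since the last report and reset them."""
        snapshot, self.totals = self.totals, Counter()
        return snapshot


def run(directory, poll_sec=1.0, report_sec=60.0):
    """Poll forever and print the per-name counts every `report_sec` seconds."""
    monitor = NameMonitor(directory)
    next_report = time.monotonic() + report_sec
    while True:
        monitor.poll()
        if time.monotonic() >= next_report:
            print(dict(monitor.report()))
            next_report += report_sec
        time.sleep(poll_sec)
```

Merging the per-file counters in the polling thread avoids the need for a lock around `totals`. The sketch does not handle files that are still being written when they are picked up, and an unreadable file raises out of `poll()`; both would need handling in real use.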