🎯
Focusing on learning Python and NLP

Arnaldo arnaldoleandro

  • Personal Projects
  • Sao Paulo - SP - Brasil
@arnaldoleandro
arnaldoleandro / buffering_until
Created April 3, 2023 11:36
Buffers until size limit is reached
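for line in file:
    stripped_line = line.strip()
    lines_buffer.append(line)
    # Count the bytes of every buffered line, so size reflects the whole buffer
    size += len(line.encode('utf-8'))
    # last_element_close (not defined in these snippets) is presumably the closing
    # tag that ends one record; splitting only there keeps each chunk well-formed
    if last_element_close in stripped_line:
        if size > size_limit:
            output_file = os.path.join(output_dir, f"{output_file_prefix}_{file_number}.xml")
            with open(output_file, 'w', encoding='utf-8') as out_file:
                out_file.write(header)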
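The preview above stops right after the header is written. Below is a minimal end-to-end sketch of the same buffering idea, assuming line 1 is the XML declaration, line 2 is the root opening tag, and every record ends on a line containing last_element_close (for example '</item>'). The names split_xml_by_size and write_chunk are hypothetical and not taken from the gist, whose remaining code may differ.

```python
import os


def write_chunk(output_dir, output_file_prefix, file_number, header, lines, footer):
    """Write one chunk (header, buffered records, closing root tag); return its path."""
    path = os.path.join(output_dir, f"{output_file_prefix}_{file_number}.xml")
    with open(path, 'w', encoding='utf-8') as out_file:
        out_file.write(header)
        out_file.writelines(lines)
        out_file.write(footer)
    return path


def split_xml_by_size(input_file, output_dir, output_file_prefix,
                      size_limit, last_element_close):
    """Split input_file into chunks of roughly size_limit bytes each (hypothetical helper).

    Assumes line 1 is the XML declaration, line 2 the root opening tag, and that
    every record ends on a line containing last_element_close.
    """
    written = []
    with open(input_file, 'r', encoding='utf-8') as file:
        first_line = file.readline()
        second_line = file.readline()
        header = f"{first_line}{second_line}"
        # Same rule as get_root_tag: drop '<', cut at the first space, drop a trailing '>'
        root_tag = second_line.strip()[1:].split(' ', 1)[0].rstrip('>')
        footer = f'</{root_tag}>\n'

        file_number = 1
        size = 0
        lines_buffer = []
        for line in file:
            stripped_line = line.strip()
            if stripped_line == footer.strip():
                continue  # the original closing root tag; every chunk gets its own footer
            lines_buffer.append(line)
            size += len(line.encode('utf-8'))
            # Close a chunk only at a record boundary, once it has passed the limit
            if last_element_close in stripped_line and size > size_limit:
                written.append(write_chunk(output_dir, output_file_prefix,
                                           file_number, header, lines_buffer, footer))
                file_number += 1
                size = 0
                lines_buffer = []

        if lines_buffer:  # leftover records that never reached size_limit
            written.append(write_chunk(output_dir, output_file_prefix,
                                       file_number, header, lines_buffer, footer))
    return written
```

Because a chunk is closed only after it passes size_limit, each file's body overshoots the limit by at most one record. Skipping the original closing root line avoids writing it twice in the last chunk.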
arnaldoleandro / parameters
Created April 3, 2023 10:53
Setting up the parameters for your needs
def main():
    filename = 'your_input_file.xml'
    input_file = f'path/to/your/input/folder/{filename}'
    size_limit = 200  # megabytes
    size_limit = size_limit * 1024 * 1024  # convert to bytes (1 MB = 1,048,576 bytes)
    output_dir = 'path/to/your/output/folder/'
    output_file_prefix = 'output_file_prefix'  # chunks become output_file_prefix_1.xml, ...
    root_tag = get_root_tag(input_file)
    print(root_tag)
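Instead of editing main() before every run, the same parameters could be read from the command line. A sketch with the standard argparse module; the positional arguments and the --size-limit-mb and --prefix flags are hypothetical names, not part of the gist.

```python
import argparse

BYTES_PER_MB = 1024 * 1024  # 1 MB = 1,048,576 bytes


def parse_args(argv=None):
    """Hypothetical command-line version of the parameters hard-coded in main()."""
    parser = argparse.ArgumentParser(description="Split a large XML file by size.")
    parser.add_argument("input_file", help="path to the XML file to split")
    parser.add_argument("output_dir", help="folder where the chunks are written")
    parser.add_argument("--size-limit-mb", type=int, default=200,
                        help="approximate chunk size in megabytes (default: 200)")
    parser.add_argument("--prefix", default="output_file_prefix",
                        help="file name prefix; chunks become <prefix>_<n>.xml")
    args = parser.parse_args(argv)
    args.size_limit = args.size_limit_mb * BYTES_PER_MB  # megabytes -> bytes
    return args
```

For example, parse_args(['big.xml', 'out/', '--size-limit-mb', '50']) gives a size_limit of 52428800 bytes.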
arnaldoleandro / reading
Created April 3, 2023 10:49
Initial steps: read the header and prepare the counters
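with open(input_file, 'r', encoding='utf-8') as file:
    first_line = file.readline()   # XML declaration
    second_line = file.readline()  # root opening tag
    header = f"{first_line}{second_line}"  # written at the top of every chunk
    footer = f'</{root_tag}>'               # closes the root element in every chunk
    file_number = 1
    size = 0
    lines_buffer = []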
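The header is simply the first two lines, so the script depends on a specific file layout. A sketch that builds the header and footer the same way but fails loudly when that layout is not there; read_header_and_footer is a hypothetical helper name, not part of the gist.

```python
def read_header_and_footer(input_file, root_tag):
    """Return (header, footer), checking the two-line layout the script assumes."""
    with open(input_file, 'r', encoding='utf-8') as file:
        first_line = file.readline()
        second_line = file.readline()
    if not first_line.lstrip().startswith('<?xml'):
        raise ValueError("line 1 is not an XML declaration")
    opening = second_line.strip()
    # The character after '<root_tag' must end the tag name ('>', whitespace or '/')
    after_name = opening[len(root_tag) + 1:][:1]
    if not (opening.startswith('<' + root_tag) and after_name in ('>', ' ', '\t', '/')):
        raise ValueError(f"line 2 does not open <{root_tag}>")
    header = f"{first_line}{second_line}"
    footer = f'</{root_tag}>'
    return header, footer
```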
arnaldoleandro / root_tag
Created April 3, 2023 10:39
Stores the XML file's root tag
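def get_root_tag(input_file):
    # Assumes line 1 is the XML declaration and line 2 opens the root element
    with open(input_file, 'r', encoding='utf-8') as file:
        file.readline()  # skip the XML declaration
        second_line = file.readline().strip()
        # '<root attr="x">' -> 'root'; '<root>' -> 'root>' -> 'root'
        root_tag = second_line[1:].split(' ', 1)[0]
        return root_tag[:-1] if root_tag.endswith('>') else root_tag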
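get_root_tag only works when the root's opening tag sits on line 2; a comment, a DOCTYPE, or a start tag split across lines breaks it. A sketch of an alternative using the standard library's xml.etree.ElementTree.iterparse, which returns at the first 'start' event and so does not need to read the whole file. One caveat: ElementTree reports a namespaced root as '{namespace-uri}name', while the footer needs the raw name as written (e.g. 'ns:root'), so this helps for plain tags only. get_root_tag_iterparse is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def get_root_tag_iterparse(input_file):
    """Return the tag of the first element that starts, i.e. the root element."""
    with open(input_file, 'rb') as f:
        for _event, elem in ET.iterparse(f, events=('start',)):
            return elem.tag  # the first 'start' event always belongs to the root
    raise ValueError(f"no root element found in {input_file}")
```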