Last active February 27, 2019 22:48
Gist: miraculixx/26aeb6d614c8adde95aff3719d5c4119
Read an arbitrarily formatted text file into a pandas DataFrame
```python
import re
from io import StringIO

import pandas as pd

# Fields are separated by three or more spaces; single spaces inside a
# value (e.g. "super men", "Energy (J)", "A2 C") are kept.
text = """
Men   super men   size   Energy (J)   type   num   g
----------------------------------------------------------------------
50   1   1   1.0234E+03   A   abcd   12.1
20   7   4   5.0211E+02   A2 C   agcd   14.1
10   2   3   -1.0347E+02   B2   abkd   72.1
"""

def read_text(fin):
    # local state, so repeated calls don't accumulate rows
    columns = []
    data = []
    for i, line in enumerate(fin):
        line = line.rstrip('\n')
        values = [val.strip() for val in re.split(r'\s{3,}', line) if val.strip()]
        if i == 0:
            # first line holds the column names
            columns = values
            continue
        if i == 1:
            # second line is the ----- rule under the header
            continue
        data.append(dict(zip(columns, values)))
    return pd.DataFrame(data, columns=columns)

read_text(StringIO(text.strip()))
```
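For comparison, pandas can often do this parsing itself. The sketch below swaps in `pd.read_csv` with a regex separator (which requires `engine="python"`) and `skiprows=[1]` to drop the dashed rule; it assumes every field is separated by at least three spaces, as in the sample above. Unlike `read_text`, it also infers numeric dtypes.

```python
from io import StringIO

import pandas as pd

# Same sample as above: fields separated by three or more spaces.
text = """
Men   super men   size   Energy (J)   type   num   g
----------------------------------------------------------------------
50   1   1   1.0234E+03   A   abcd   12.1
20   7   4   5.0211E+02   A2 C   agcd   14.1
10   2   3   -1.0347E+02   B2   abkd   72.1
"""

df = pd.read_csv(
    StringIO(text.strip()),
    sep=r"\s{3,}",    # regex separator: 3+ whitespace characters between fields
    engine="python",  # regex separators need the Python parser engine
    skiprows=[1],     # skip the "-----" rule under the header (0-indexed line 1)
)
print(df)
print(df.dtypes)
```

If a real file mixes two-space and three-space gaps, the `\s{3,}` assumption breaks and the separator has to be adjusted to the actual layout.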
Result
i | Men | super men | size | Energy (J) | type | num | g |
---|---|---|---|---|---|---|---|
0 | 50 | 1 | 1 | 1.0234E+03 | A | abcd | 12.1 |
1 | 20 | 7 | 4 | 5.0211E+02 | A2 C | agcd | 14.1 |
2 | 10 | 2 | 3 | -1.0347E+02 | B2 | abkd | 72.1 |
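Although the table looks numeric, `read_text` leaves every cell as a string, because the values come straight out of `re.split`. A minimal sketch for recovering numeric dtypes afterwards, using a hypothetical helper `coerce_numeric` that tries `pd.to_numeric` column by column and leaves text columns alone:

```python
import pandas as pd

def coerce_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: convert each column that parses cleanly as numbers."""
    out = df.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            pass  # e.g. "type" and "num" hold text, so they stay strings
    return out

# Cells as read_text produces them: all strings.
raw = pd.DataFrame(
    [["50", "1", "1", "1.0234E+03", "A", "abcd", "12.1"],
     ["20", "7", "4", "5.0211E+02", "A2 C", "agcd", "14.1"],
     ["10", "2", "3", "-1.0347E+02", "B2", "abkd", "72.1"]],
    columns=["Men", "super men", "size", "Energy (J)", "type", "num", "g"],
)
typed = coerce_numeric(raw)
print(typed.dtypes)
```

A column is converted all-or-nothing: one unparsable cell keeps the whole column as strings, which avoids silently turning bad values into `NaN`.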
Here's a two-liner that does the same thing (assuming `re` and `pandas as pd` are imported as above):
```python
rows = [[val.strip() for val in re.split(r"\s{3,}", line) if val.strip() and not val.startswith('---')] for line in text if line]
pd.DataFrame([row for row in rows[1:] if row], columns=rows[0])
```
Note that the input has to be a list of lines, e.g.:
```python
text = """Men   super men   size   Energy (J)   type   num   g
----------------------------------------------------------------------
50   1   1   1.0234E+03   A   abcd   12.1
20   7   4   5.0211E+02   A2 C   agcd   14.1
10   2   3   -1.0347E+02   B2   abkd   72.1
""".split('\n')
```
https://stackoverflow.com/questions/54914841/how-to-do-a-formatted-pandas-read