@miraculixx miraculixx/sample.py
Last active Feb 27, 2019

Read arbitrary formatted file into pandas dataframe
import re
from io import StringIO

import pandas as pd

text = """
Men      super men        size       Energy (J)    type    num      g
----------------------------------------------------------------------
50          1             1          1.0234E+03    A      abcd   12.1
20          7             4          5.0211E+02    A2 C   agcd   14.1
10          2             3         -1.0347E+02    B2     abkd   72.1
"""


def read_text(fin):
    """Parse a whitespace-aligned text table into a DataFrame of strings."""
    columns = []
    data = []
    for i, line in enumerate(fin):
        line = line.rstrip('\n')
        if i == 0:
            # header row: fields are separated by runs of 3+ whitespace chars,
            # so a single space inside a name such as "super men" is kept
            columns = re.split(r'\s{3,}', line.strip())
            continue
        if i == 1:
            # skip the dashed separator line
            continue
        values = re.split(r'\s{3,}', line.strip())
        data.append(dict(zip(columns, values)))
    return pd.DataFrame(data, columns=columns)


read_text(StringIO(text.strip()))
@miraculixx commented Feb 27, 2019

Result

i   Men   super men   size   Energy (J)    type   num    g
0   50    1           1      1.0234E+03    A      abcd   12.1
1   20    7           4      5.0211E+02    A2 C   agcd   14.1
2   10    2           3      -1.0347E+02   B2     abkd   72.1
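
Every value in this result is still a string, because `re.split` only returns text. If numeric columns are needed, one possible follow-up step is `pd.to_numeric`. The sketch below is not part of the gist: it rebuilds the same three rows by hand, and the `numeric_cols` list is an assumption about which columns should hold numbers.

```python
import pandas as pd

# Same rows as the result above, all stored as strings.
df = pd.DataFrame(
    [
        ["50", "1", "1", "1.0234E+03", "A", "abcd", "12.1"],
        ["20", "7", "4", "5.0211E+02", "A2 C", "agcd", "14.1"],
        ["10", "2", "3", "-1.0347E+02", "B2", "abkd", "72.1"],
    ],
    columns=["Men", "super men", "size", "Energy (J)", "type", "num", "g"],
)

# Assumption: these are the columns that should become numbers.
numeric_cols = ["Men", "super men", "size", "Energy (J)", "g"]
for col in numeric_cols:
    # to_numeric raises on unparseable values by default,
    # which surfaces bad input instead of hiding it
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
```

The text columns (`type`, `num`) are left untouched and keep their string values.
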
@miraculixx commented Feb 27, 2019

Here's a two-liner that does the same thing:

rows = [[val for val in re.split(r"\s{3,}", line.strip()) if val and not val.startswith('---')] for line in text if line]
pd.DataFrame((row for row in rows[1:] if row), columns=rows[0])

Note that the input has to be a list of lines, e.g.

text = """Men      super men        size       Energy (J)    type    num      g
----------------------------------------------------------------------
50          1             1          1.0234E+03    A      abcd   12.1
20          7             4          5.0211E+02    A2 C   agcd   14.1
10          2             3         -1.0347E+02    B2     abkd   72.1
""".split('\n')
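
As an alternative sketch (not the approach used in this gist), `pd.read_csv` can apply the same splitting rule itself: with `engine="python"` the `sep` argument is treated as a regular expression, and `skiprows=[1]` drops the dashed line. This assumes, as above, that fields are always separated by at least three whitespace characters and that no field contains such a run.

```python
from io import StringIO

import pandas as pd

text = """Men      super men        size       Energy (J)    type    num      g
----------------------------------------------------------------------
50          1             1          1.0234E+03    A      abcd   12.1
20          7             4          5.0211E+02    A2 C   agcd   14.1
10          2             3         -1.0347E+02    B2     abkd   72.1
"""

df = pd.read_csv(
    StringIO(text),   # any file-like object or path works here
    sep=r"\s{3,}",    # regex separator: runs of 3+ whitespace characters
    engine="python",  # regex separators need the Python parsing engine
    skiprows=[1],     # line index 1 is the dashed separator
)
print(df.dtypes)
```

Unlike the hand-written parser, this version also infers numeric dtypes for the columns that contain only numbers.
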