@jtemporal
Created January 26, 2017 12:50
Traceback for the memory error when running Rosie
$ python rosie.py run
2017-01-26 00:04:03 Creating the CSV file
2017-01-26 00:04:03 Reading the XML file
2017-01-26 00:04:04 Writing record #2,609 to the CSV
2017-01-26 00:04:04 Done!
2017-01-26 00:04:04 Creating the CSV file
2017-01-26 00:04:04 Reading the XML file
2017-01-26 00:05:36 Writing record #341,938 to the CSV
2017-01-26 00:05:36 Done!
2017-01-26 00:05:36 Creating the CSV file
2017-01-26 00:05:36 Reading the XML file
2017-01-26 00:16:41 Writing record #2,404,938 to the CSV
2017-01-26 00:16:41 Done!
Merging all datasets…
Loading current-year.xz…
Loading last-year.xz…
Loading previous-years.xz…
Dropping rows without document_value or reimbursement_number…
Grouping dataset by applicant_id, document_id and year…
Gathering all reimbursement numbers together…
Summing all net values together…
Summing all reimbursement values together…
Generating the new dataset…
Casting changes to a new DataFrame…
Writing it to file…
Done.
/home/temporal/Documents/Serenata/rosie/rosie/dataset.py:52: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  dataset['cnpj'] = dataset['cnpj'].str.replace(r'\D', '')
Traceback (most recent call last):
  File "rosie.py", line 36, in <module>
    command()
  File "rosie.py", line 23, in run
    rosie.main(target_directory)
  File "/home/temporal/Documents/Serenata/rosie/rosie/__init__.py", line 65, in main
    Rosie(dataset, target_directory).run_classifiers()
  File "/home/temporal/Documents/Serenata/rosie/rosie/__init__.py", line 25, in __init__
    self.irregularities = self.dataset[self.DATASET_KEYS].copy()
  File "/home/temporal/anaconda3/envs/serenata_rosie/lib/python3.5/site-packages/pandas/core/frame.py", line 2053, in __getitem__
    return self._getitem_array(key)
  File "/home/temporal/anaconda3/envs/serenata_rosie/lib/python3.5/site-packages/pandas/core/frame.py", line 2098, in _getitem_array
    return self.take(indexer, axis=1, convert=True)
  File "/home/temporal/anaconda3/envs/serenata_rosie/lib/python3.5/site-packages/pandas/core/generic.py", line 1666, in take
    self._consolidate_inplace()
  File "/home/temporal/anaconda3/envs/serenata_rosie/lib/python3.5/site-packages/pandas/core/generic.py", line 2801, in _consolidate_inplace
    self._protect_consolidate(f)
  File "/home/temporal/anaconda3/envs/serenata_rosie/lib/python3.5/site-packages/pandas/core/generic.py", line 2790, in _protect_consolidate
    result = f()
  File "/home/temporal/anaconda3/envs/serenata_rosie/lib/python3.5/site-packages/pandas/core/generic.py", line 2799, in f
    self._data = self._data.consolidate()
  File "/home/temporal/anaconda3/envs/serenata_rosie/lib/python3.5/site-packages/pandas/core/internals.py", line 3526, in consolidate
    bm._consolidate_inplace()
  File "/home/temporal/anaconda3/envs/serenata_rosie/lib/python3.5/site-packages/pandas/core/internals.py", line 3531, in _consolidate_inplace
    self.blocks = tuple(_consolidate(self.blocks))
  File "/home/temporal/anaconda3/envs/serenata_rosie/lib/python3.5/site-packages/pandas/core/internals.py", line 4523, in _consolidate
    _can_consolidate=_can_consolidate)
  File "/home/temporal/anaconda3/envs/serenata_rosie/lib/python3.5/site-packages/pandas/core/internals.py", line 4546, in _merge_blocks
    new_values = new_values[argsort]
MemoryError
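The traceback ends inside pandas' block consolidation: selecting `self.DATASET_KEYS` goes through `take`, which first consolidates same-dtype blocks, and the failing line `new_values = new_values[argsort]` allocates a reordered copy of the merged block while the original data is still in memory. One common way to lower that peak is to shrink the frame's dtypes right after loading it. The sketch below is a hypothetical helper (`shrink_frame` and its `max_category_ratio` threshold are illustrative names, not part of Rosie or pandas): it downcasts integer columns with `pd.to_numeric(..., downcast='integer')` and turns low-cardinality text columns into `category`. It converts columns in place, so each wide array can be released as it is replaced rather than keeping a full second copy of the frame. Float columns are deliberately left alone, since downcasting monetary values to float32 would lose precision.

```python
import pandas as pd


def shrink_frame(df, max_category_ratio=0.5):
    """Downcast df's columns in place and return it (hypothetical helper, not part of Rosie)."""
    for column in df.columns:
        series = df[column]
        if pd.api.types.is_integer_dtype(series):
            # Picks the smallest signed integer type that holds every value.
            df[column] = pd.to_numeric(series, downcast='integer')
        elif pd.api.types.is_object_dtype(series) or pd.api.types.is_string_dtype(series):
            # Text repeated across many rows is stored once per distinct value,
            # plus small integer codes, when the column is categorical.
            distinct = series.nunique(dropna=False)
            if distinct <= max_category_ratio * len(series):
                df[column] = series.astype('category')
    return df
```

Comparing `df.memory_usage(deep=True).sum()` before and after calling `shrink_frame` on each loaded dataset shows how much it saves; whether that is enough to avoid this particular `MemoryError` is untested.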
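The `SettingWithCopyWarning` printed before the traceback points at line 52 of `rosie/dataset.py`, where `dataset['cnpj']` is assigned on a frame pandas believes may be a slice of another one. A minimal sketch of the usual fix, assuming `dataset` at that point can be a filtered view: take an explicit `.copy()` before assigning. `clean_cnpj` is a hypothetical name used only for illustration. The `regex=True` argument is spelled out because pandas 2.0 changed the default of `Series.str.replace` to literal matching, under which `r'\D'` would only match the literal text `\D`; the argument itself needs pandas 0.23 or later.

```python
import pandas as pd


def clean_cnpj(dataset):
    """Return a copy of `dataset` whose 'cnpj' column keeps only digits (hypothetical helper)."""
    # An explicit copy detaches the frame from any parent it was sliced from,
    # so the assignment below is not flagged by SettingWithCopyWarning.
    dataset = dataset.copy()
    # regex=True keeps r'\D' (any non-digit character) working as a pattern on pandas >= 2.0.
    dataset['cnpj'] = dataset['cnpj'].str.replace(r'\D', '', regex=True)
    return dataset
```

The copy itself costs memory, so it is cheapest when done after the frame has already been narrowed to the rows and columns that are actually needed.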