dask notes (@detrout, created July 21, 2017)

Dask doesn't support the following arguments:
* buf
* columns
* col_space
* header
* index
* na_rep
* formatters
* float_format
* sparsify
* index_names
* justify
* bold_rows
* classes
* escape
* max_cols
* show_dimensions
* notebook
* decimal
* border
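These formatting keywords are still available by materializing the collection to pandas first. A minimal sketch, assuming the data fits in memory (the file pattern and column name are hypothetical):
>>> import dask.dataframe as dd  # doctest: +SKIP
>>> df = dd.read_csv('data-*.csv')  # doctest: +SKIP
>>> df.compute().to_html(columns=['x'], na_rep='-')  # doctest: +SKIP
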
.. py:method:: DataFrame.to_parquet(path, *args, **kwargs)
:module: dask.dataframe
Store Dask.dataframe to Parquet files
:Parameters:
**path** : string
Destination directory for data. Prepend with protocol like ``s3://``
or ``hdfs://`` for remote data.
**df** : Dask.dataframe
**compression** : string or dict
Either a string like "SNAPPY" or a dictionary mapping column names to
compressors like ``{"name": "GZIP", "values": "SNAPPY"}``
**write_index** : boolean
Whether or not to write the index. Defaults to True *if* divisions are
known.
**has_nulls** : bool, list or 'infer'
Specifies whether to write NULLs information for columns. If bools,
apply to all columns, if list, use for only the named columns, if
'infer', use only for columns which don't have a sentinel NULL marker
(currently object columns only).
**fixed_text** : dict {col: int}
For column types that are written as bytes (bytes, utf8 strings, or
json and bson-encoded objects), if a column is included here, the
data will be written in fixed-length format, which should be faster
but can potentially result in truncation.
**object_encoding** : dict {col: bytes|utf8|json|bson} or str
For object columns, specify how to encode to bytes. If a str, same
encoding is applied to all object columns.
**storage_options** : dict
Key/value pairs to be passed on to the file-system backend, if any.
**append** : bool (False)
If False, construct the data-set from scratch; if True, add new
row-group(s) to an existing data-set. In the latter case, the data-set
must exist, and the schema must match the input data.
**ignore_divisions** : bool (False)
If False, raise an error when previous divisions overlap with the newly
appended divisions. Ignored if append=False.
**partition_on** : list
Construct directory-based partitioning by splitting on these fields'
values. Each dask partition will result in one or more data files;
there will be no global groupby.
**compute** : bool (True)
If True (default), compute immediately.
If False, return a dask.delayed object for future computation.
This uses the fastparquet project:
http://fastparquet.readthedocs.io/en/latest
.. seealso::
:obj:`read_parquet`
Read parquet data to dask.dataframe
.. rubric:: Notes
Each partition will be written to a separate file.
.. rubric:: Examples
>>> df = dd.read_csv(...) # doctest: +SKIP
>>> to_parquet('/path/to/output/', df, compression='SNAPPY') # doctest: +SKIP
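A sketch of the append and partitioned variants described above (the output path and the ``'name'`` column are hypothetical):
>>> to_parquet('/path/to/output/', df, partition_on=['name'])  # doctest: +SKIP
>>> out = to_parquet('/path/to/output/', df, append=True, compute=False)  # doctest: +SKIP
>>> out.compute()  # doctest: +SKIP
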
.. py:method:: DataFrame.to_records(index=False)
:module: dask.dataframe
Create Dask Array from a Dask Dataframe
Warning: This creates a dask.array without precise shape information.
Operations that depend on shape information, like slicing or reshaping,
will not work.
.. seealso::
:obj:`dask.dataframe._Frame.values`, :obj:`dask.dataframe.from_dask_array`
.. rubric:: Examples
>>> df.to_records() # doctest: +SKIP
dask.array<shape=(nan,), dtype=(numpy.record, [('ind', '<f8'), ('x', 'O'), ('y', '<i8')]), chunksize=(nan,)>
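Because the chunk sizes are unknown (``nan``), shape-dependent operations on the result will fail, but computing it still yields a concrete numpy record array. A minimal sketch, assuming a dask DataFrame ``df`` as above:
>>> records = df.to_records()  # doctest: +SKIP
>>> records.compute()  # doctest: +SKIP
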
.. py:method:: DataFrame.to_string(max_rows=5)
:module: dask.dataframe
Render a DataFrame to a console-friendly tabular output.
:Parameters:
**buf** : StringIO-like, optional
buffer to write to
**columns** : sequence, optional
the subset of columns to write; default None writes all columns
**col_space** : int, optional
the minimum width of each column
**header** : bool, optional
Write out column names. If a list of strings is given, they are assumed to be aliases for the column names
**index** : bool, optional
whether to print index (row) labels, default True
**na_rep** : string, optional
string representation of NaN to use, default 'NaN'
**formatters** : list or dict of one-parameter functions, optional
formatter functions to apply to columns' elements by position or name,
default None. The result of each function must be a unicode string.
List must be of length equal to the number of columns.
**float_format** : one-parameter function, optional
formatter function to apply to columns' elements if they are floats,
default None. The result of this function must be a unicode string.
**sparsify** : bool, optional
Set to False for a DataFrame with a hierarchical index to print every
multiindex key at each row, default True
**index_names** : bool, optional
Prints the names of the indexes, default True
**line_width** : int, optional
Width to wrap a line in characters, default no wrap
**justify** : {'left', 'right'}, default None
Left or right-justify the column labels. If None uses the option from
the print configuration (controlled by set_option), 'right' out
of the box.
:Returns:
**formatted** : string (or unicode, depending on data and options)
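Note that only ``max_rows`` is exposed in the dask signature above; the remaining pandas parameters are listed for reference. A minimal usage sketch, assuming a dask DataFrame ``df``:
.. rubric:: Examples
>>> print(df.to_string(max_rows=10))  # doctest: +SKIP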