Created July 21, 2017 22:38
dask notes[0]
=============
Dask doesn't support the following argument(s):

* buf
* columns
* col_space
* header
* index
* na_rep
* formatters
* float_format
* sparsify
* index_names
* justify
* bold_rows
* classes
* escape
* max_cols
* show_dimensions
* notebook
* decimal
* border
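Passing any of the arguments above raises a ``TypeError``, because the dask method's signature only exposes ``max_rows``. A minimal sketch with a hypothetical stand-in function (not the real dask code):

```python
# Hypothetical stand-in mirroring dask's reduced to_string signature;
# the real method lives on dask.dataframe.DataFrame.
def to_string(max_rows=5):
    return "..."

# Any of the unsupported pandas keywords is rejected at call time.
try:
    to_string(buf=None)
except TypeError as exc:
    print("rejected:", exc)
```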
.. py:method:: DataFrame.to_parquet(path, *args, **kwargs)
   :module: dask.dataframe

   Store Dask.dataframe to Parquet files

   :Parameters:
       **path** : string
           Destination directory for data. Prepend with protocol like
           ``s3://`` or ``hdfs://`` for remote data.
       **df** : Dask.dataframe
           The dataframe to write.
       **compression** : string or dict
           Either a string like ``"SNAPPY"`` or a dictionary mapping column
           names to compressors like ``{"name": "GZIP", "values": "SNAPPY"}``.
       **write_index** : boolean
           Whether or not to write the index. Defaults to True *if* divisions
           are known.
       **has_nulls** : bool, list or 'infer'
           Specifies whether to write NULLs information for columns. If bool,
           applies to all columns; if list, applies only to the named columns;
           if 'infer', applies only to columns which don't have a sentinel
           NULL marker (currently object columns only).
       **fixed_text** : dict {col: int}
           For column types that are written as bytes (bytes, utf8 strings,
           or json- and bson-encoded objects), if a column is included here
           the data will be written in fixed-length format, which should be
           faster but can potentially result in truncation.
       **object_encoding** : dict {col: bytes|utf8|json|bson} or str
           For object columns, specify how to encode to bytes. If a str, the
           same encoding is applied to all object columns.
       **storage_options** : dict
           Key/value pairs to be passed on to the file-system backend, if any.
       **append** : bool (False)
           If False, construct the data-set from scratch; if True, add new
           row-group(s) to an existing data-set. In the latter case, the
           data-set must exist, and the schema must match the input data.
       **ignore_divisions** : bool (False)
           If False, raise an error when previous divisions overlap with the
           newly appended divisions. Ignored if append=False.
       **partition_on** : list
           Construct directory-based partitioning by splitting on these
           fields' values. Each dask partition will result in one or more
           data files; there is no global groupby.
       **compute** : bool (True)
           If True (default), compute immediately. If False, return a
           dask.delayed object for later computation.

   This uses the fastparquet project:
   http://fastparquet.readthedocs.io/en/latest

   .. seealso::

      :obj:`read_parquet`
          Read parquet data to dask.dataframe

   .. rubric:: Notes

   Each partition will be written to a separate file.

   .. rubric:: Examples

   >>> df = dd.read_csv(...)  # doctest: +SKIP
   >>> to_parquet('/path/to/output/', df, compression='SNAPPY')  # doctest: +SKIP
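The layout produced by ``partition_on`` can be pictured with plain Python: one hive-style ``field=value`` directory per distinct value of each partition field. This sketch only mimics the directory structure (the ``year``/``value`` fields and file names are invented for illustration; real files would be Parquet row-groups written by fastparquet):

```python
import os
import tempfile

# Invented sample rows; "year" plays the role of a partition_on field.
rows = [{"year": 2016, "value": 1},
        {"year": 2017, "value": 2},
        {"year": 2017, "value": 3}]

root = tempfile.mkdtemp()
for row in rows:
    # One "field=value" directory per distinct partition-field value.
    part_dir = os.path.join(root, "year={}".format(row["year"]))
    os.makedirs(part_dir, exist_ok=True)
    # Stand-in for a Parquet data file inside the partition directory.
    with open(os.path.join(part_dir, "part.0.parquet"), "a") as fh:
        fh.write("{}\n".format(row["value"]))

print(sorted(os.listdir(root)))  # ['year=2016', 'year=2017']
```

Rows sharing a partition value land in the same directory, which is why there is no global groupby: each dask partition writes its own files under the matching directories.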
.. py:method:: DataFrame.to_records(index=False)
   :module: dask.dataframe

   Create a Dask Array from a Dask DataFrame.

   Warning: this creates a dask.array without precise shape information.
   Operations that depend on shape information, like slicing or reshaping,
   will not work.

   .. seealso::

      :obj:`dask.dataframe._Frame.values`, :obj:`dask.dataframe.from_dask_array`

   .. rubric:: Examples

   >>> df.to_records()  # doctest: +SKIP
   dask.array<shape=(nan,), dtype=(numpy.record, [('ind', '<f8'), ('x', 'O'), ('y', '<i8')]), chunksize=(nan,)>
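The record dtype in the repr above is the same shape of result pandas produces for a single in-memory frame. A small pandas-only illustration (the column names and data here are invented):

```python
import pandas as pd

df = pd.DataFrame({"x": ["a", "b"], "y": [1, 2]}, index=[1.0, 2.0])
rec = df.to_records()

# The index becomes the first field of the record dtype.
print(rec.dtype.names)  # ('index', 'x', 'y')
print(rec[0])
```

Each element of the resulting array is one row packed into a structured record, which is why per-chunk lengths (and hence the overall shape) aren't known until computed in the dask case.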
.. py:method:: DataFrame.to_string(max_rows=5)
   :module: dask.dataframe

   Render a DataFrame to a console-friendly tabular output.

   :Parameters:
       **buf** : StringIO-like, optional
           Buffer to write to.
       **columns** : sequence, optional
           The subset of columns to write; default None writes all columns.
       **col_space** : int, optional
           The minimum width of each column.
       **header** : bool, optional
           Write out column names. If a list of strings is given, it is
           assumed to be aliases for the column names.
       **index** : bool, optional
           Whether to print index (row) labels, default True.
       **na_rep** : string, optional
           String representation of NaN to use, default 'NaN'.
       **formatters** : list or dict of one-parameter functions, optional
           Formatter functions to apply to columns' elements by position or
           name, default None. The result of each function must be a unicode
           string. A list must be of length equal to the number of columns.
       **float_format** : one-parameter function, optional
           Formatter function to apply to columns' elements if they are
           floats, default None. The result of this function must be a
           unicode string.
       **sparsify** : bool, optional
           Set to False for a DataFrame with a hierarchical index to print
           every multiindex key at each row, default True.
       **index_names** : bool, optional
           Prints the names of the indexes, default True.
       **line_width** : int, optional
           Width to wrap a line in characters, default no wrap.
       **justify** : {'left', 'right'}, default None
           Left- or right-justify the column labels. If None, uses the option
           from the print configuration (controlled by set_option), 'right'
           out of the box.

   :Returns:
       **formatted** : string (or unicode, depending on data and options)
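The formatting parameters behave as in pandas; for example, ``na_rep`` and ``float_format`` combine like this (the data here is invented):

```python
import pandas as pd

df = pd.DataFrame({"a": [1.5, None], "b": [3, 4]})

# Missing values render as "-", floats through the custom formatter.
out = df.to_string(na_rep="-", float_format=lambda v: "%.1f" % v)
print(out)
```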