import io.netty.buffer.ArrowBuf;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.file.ArrowWriter;
import org.apache.arrow.vector.schema.ArrowFieldNode;
import org.apache.arrow.vector.schema.ArrowRecordBatch;
import org.apache.arrow.vector.types.pojo.Field;

class DataFrame(object):
    ...

    def asPandas(self):
        return ArrowDataFrame(self)


class ArrowDataFrame(object):
    """
    Wraps a Python DataFrame to group/window then apply using ``pandas.DataFrame``
    """

import socket


def serve_csv_data(ip_addr, port_num, directory):
    """
    Create a socket and serve Arrow record batches as a stream read from the
    given directory containing CSV files.
    """
    # Create the socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((ip_addr, port_num))
    sock.listen(1)

import os


def read_and_process_dir(directory):
    """Read a directory of CSV files and yield processed Arrow batches."""
    for f in os.listdir(directory):
        if f.endswith(".csv"):
            filename = os.path.join(directory, f)
            for batch in read_and_process(filename):
                yield batch

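The helper `read_and_process` called above is not shown in this listing. As a rough illustration of the same generator pattern, the sketch below reads CSV rows and yields them in fixed-size batches. It is an assumption-laden stand-in: the name `read_in_batches`, the `batch_size` parameter, and the sample data are hypothetical, and it yields plain Python lists of dicts rather than Arrow record batches.

```python
import csv
import io


def read_in_batches(csv_file, batch_size=2):
    """Yield lists of parsed rows, at most batch_size rows per list.

    Hypothetical stand-in for read_and_process: it yields plain Python
    lists of dicts, not Arrow record batches.
    """
    reader = csv.DictReader(csv_file)
    batch = []
    for row in reader:
        # Convert every field to float; assumes all columns are numeric.
        batch.append({name: float(value) for name, value in row.items()})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


# Made-up sample data using the column names from this listing.
sample = io.StringIO("label,x0,x1\n1,5.2,4.9\n0,0.1,-0.3\n1,4.8,5.5\n")
batches = list(read_in_batches(sample, batch_size=2))
```

Yielding batches lazily, as `read_and_process_dir` does, means a consumer can start working on the first batch before the remaining files have been read.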
ds = make_local_dataset(filename)
model = model_fit(ds)

print("Fit model with weights: {}".format(model.get_weights()))
# Fit model with weights:
# [array([[0.7793554 ], [0.61216295]], dtype=float32),
#  array([0.03328196], dtype=float32)]

import tensorflow as tf


def model_fit(ds):
    """Create and fit a Keras logistic regression model."""
    # Build the Keras model
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(1, input_shape=(2,),
                                    activation='sigmoid'))
    model.compile(optimizer='sgd', loss='mean_squared_error',
                  metrics=['accuracy'])

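The model above is a single dense unit with a sigmoid activation over two inputs, so its prediction is `sigmoid(w0*x0 + w1*x1 + b)`. The sketch below evaluates that expression in plain Python using the weights and bias printed in the example output elsewhere in this listing. The function name `predict` and the two test points are illustrative assumptions, not part of the original code, and a different training run would produce different weights.

```python
import math

# Weights and bias copied from the example output in this listing.
WEIGHTS = (0.7793554, 0.61216295)
BIAS = 0.03328196


def predict(x0, x1, weights=WEIGHTS, bias=BIAS):
    """Return sigmoid(w0*x0 + w1*x1 + b), the output of one sigmoid unit."""
    z = weights[0] * x0 + weights[1] * x1 + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs at the centres the sample data is drawn around:
# label 0 near (0, 0) and label 1 near (5, 5).
p_origin = predict(0.0, 0.0)
p_shifted = predict(5.0, 5.0)
```

With these weights, the point near the label-1 centre scores close to 1, while the point at the origin scores only slightly above 0.5.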
import tensorflow_io.arrow as arrow_io

ds = arrow_io.ArrowStreamDataset.from_pandas(
    df,
    batch_size=2,
    preserve_index=False)

import tensorflow_io.arrow as arrow_io
from pyarrow.feather import write_feather

# Write the Pandas DataFrame to a Feather file
write_feather(df, '/path/to/df.feather')

# Create the dataset with one or more filenames
ds = arrow_io.ArrowFeatherDataset(
    ['/path/to/df.feather'],
    columns=(0, 1, 2),

import numpy as np
import pandas as pd

data = {'label': np.random.binomial(1, 0.5, 10)}
data['x0'] = np.random.randn(10) + 5 * data['label']
data['x1'] = np.random.randn(10) + 5 * data['label']

df = pd.DataFrame(data)
print(df.head())