@TejaGollapudi
Created April 5, 2023 00:50
Convert the Alpaca dataset into CSV format
import json

import pandas as pd

# Load the Alpaca records: a JSON list of dicts with
# 'instruction', 'input' and 'output' keys.
with open('alpaca_data.json', encoding='utf-8') as f:
    data = json.load(f)

new_format = []
for point in data:
    # Shared preamble for every prompt.
    inputt = "Below is an instruction that describes a task.\n "
    inputt += "Write a response that appropriately completes the request.\n\n"
    if len(point['input']) == 0:
        # No input: the prompt contains only the instruction.
        inputt += f"### Instruction:\n{point['instruction']}\n\n### Response:"
    else:
        # With input: add an "### Input:" section after the instruction.
        inputt += f"### Instruction:\n{point['instruction']}\n\n### Input:\n{point['input']}\n\n### Response:"
    item = {'input': inputt, 'output': str(point['output'])}
    new_format.append(item)

df = pd.DataFrame(new_format)
df = df.dropna()
# index=False keeps pandas' row index out of the CSV, leaving just the
# 'input' and 'output' columns.
df.to_csv('alpaca_data.csv', index=False)
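As a hedged sketch, the prompt template can be factored into a hypothetical `to_prompt` helper and checked on a couple of made-up records. The sketch also writes the frame to an in-memory buffer and reads it back, to show that `to_csv` quotes the multi-line prompts (and any commas) so they survive a round trip. `to_prompt`, `HEADER` and the sample records are illustrative assumptions, not part of the original gist.

```python
import io

import pandas as pd

# Hypothetical helper (not in the original gist): the same prompt template
# as the script above, factored into a function that handles one record.
HEADER = (
    "Below is an instruction that describes a task.\n "
    "Write a response that appropriately completes the request.\n\n"
)


def to_prompt(record: dict) -> str:
    """Build the prompt for one Alpaca-style record."""
    prompt = HEADER + f"### Instruction:\n{record['instruction']}\n\n"
    if record['input']:
        # Only records with a non-empty input get an "### Input:" section.
        prompt += f"### Input:\n{record['input']}\n\n"
    return prompt + "### Response:"


# Made-up sample records, not taken from alpaca_data.json.
records = [
    {'instruction': 'Name a primary color.', 'input': '', 'output': 'Red'},
    {'instruction': 'Add the two numbers.', 'input': '2, 3', 'output': '5'},
]

df = pd.DataFrame(
    [{'input': to_prompt(r), 'output': str(r['output'])} for r in records]
)

# Round trip through an in-memory buffer instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
# dtype=str and keep_default_na=False stop pandas from turning '5' into
# an integer or empty strings into NaN.
restored = pd.read_csv(buf, dtype=str, keep_default_na=False)

# Embedded newlines and commas are quoted by to_csv, so the prompts survive.
assert restored.to_dict(orient='records') == df.to_dict(orient='records')
```

The same `read_csv` options are a reasonable choice when loading the real CSV for training, since some Alpaca outputs are purely numeric and would otherwise be parsed as numbers.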