Wanna create and play with an AI clone of yourself or someone else (my lawyer says please don't)[1] like this one? You're in luck, because it's super easy!
This step really varies depending on your data sources, but the end goal is to turn some of real-you's conversations (from your platforms of choice) into a ShareGPT-format dataset with you as the "gpt" speaker. Here's what your file (JSON Lines: one conversation object per line) should end up looking like:
{"conversations": [{"from": "human", "value": "Hi"}, {"from": "gpt", "value": "Hello"}]}
{"conversations": [{"from": "human", "value": "What's up "}, {"from": "gpt", "value": "not much, you?"}, {"from": "human", "value": "Just thinking, what if you're a robot and I don't realize it?"}, {"from": "gpt", "value": "hahaha don't be crazy"}]}
...
NOTE: Make sure every line starts with a message from the other person ("human")
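If you want to sanity-check the file before uploading, something like this rough sketch could help. It only checks what's described above (one JSON object per line, a "conversations" list, "human" or "gpt" speakers, and a "human" message first); the function names check_line and check_file are made up for this example.

```python
import json


def check_line(line: str) -> list[str]:
    """Return a list of problems found in one dataset line (empty if fine)."""
    problems = []
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    turns = record.get("conversations") if isinstance(record, dict) else None
    if not isinstance(turns, list) or not turns:
        return ["missing or empty 'conversations' list"]
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict) or turn.get("from") not in ("human", "gpt"):
            problems.append(f"turn {i}: 'from' must be 'human' or 'gpt'")
        elif not isinstance(turn.get("value"), str):
            problems.append(f"turn {i}: 'value' must be a string")
    # Every line has to open with the other person's message.
    if isinstance(turns[0], dict) and turns[0].get("from") != "human":
        problems.append("first message is not from 'human'")
    return problems


def check_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to their problems, skipping blank lines."""
    report = {}
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if line.strip():
                problems = check_line(line)
                if problems:
                    report[number] = problems
    return report
```

An empty dict from check_file means every line passed these particular checks; it says nothing about whether the conversations themselves are any good.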
You'll probably want to split long (more-than-a-few-messages) conversations into several lines. I've tried a few different things, but 20-ish messages (10-ish them-you pairs) per line seems to work well.
If you sometimes have one person send multiple messages in a row, I recommend merging them together with newlines and putting them in the dataset as one message that way.
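Here's a minimal sketch of the merging and splitting described above, assuming each conversation is already a list of {"from": ..., "value": ...} dicts. The helper names and the max_messages parameter are hypothetical, and dropping a leading "gpt" message or a chunk with no reply is just one way to handle the edges.

```python
import json


def merge_consecutive(turns):
    """Join back-to-back messages from the same speaker with newlines."""
    merged = []
    for turn in turns:
        if merged and merged[-1]["from"] == turn["from"]:
            merged[-1]["value"] += "\n" + turn["value"]
        else:
            merged.append({"from": turn["from"], "value": turn["value"]})
    return merged


def split_conversation(turns, max_messages=20):
    """Yield chunks of at most max_messages, each starting with 'human'."""
    turns = merge_consecutive(turns)
    start = 0
    while start < len(turns):
        # Skip ahead so the chunk opens with the other person's message.
        if turns[start]["from"] != "human":
            start += 1
            continue
        chunk = turns[start:start + max_messages]
        # After merging, speakers alternate, so two or more messages
        # means the chunk contains at least one "gpt" reply.
        if len(chunk) >= 2:
            yield chunk
        start += max_messages


def to_jsonl(conversations, max_messages=20):
    """Serialize conversations as one JSON object per line."""
    lines = []
    for turns in conversations:
        for chunk in split_conversation(turns, max_messages):
            lines.append(json.dumps({"conversations": chunk}, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)
```

Keeping max_messages even means a chunk boundary always lands right before a "human" message, so nothing gets skipped between chunks.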
Finally, upload the file to a new huggingface dataset.
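You can do the upload through the website, but if you'd rather script it, here's a sketch using the huggingface_hub package. The function name upload_dataset, the train.jsonl file name inside the repo, and the private-by-default choice are all assumptions for this example; it also assumes you've already logged in with a write token.

```python
def upload_dataset(jsonl_path, repo_id, private=True):
    """Create (or reuse) a dataset repo and upload one JSONL file to it."""
    # Imported here so the rest of a script still runs without the package.
    from huggingface_hub import HfApi

    api = HfApi()  # uses the token saved by `huggingface-cli login`
    api.create_repo(
        repo_id=repo_id,
        repo_type="dataset",
        private=private,
        exist_ok=True,  # don't fail if the repo is already there
    )
    api.upload_file(
        path_or_fileobj=jsonl_path,
        path_in_repo="train.jsonl",  # assumed name, pick whatever you like
        repo_id=repo_id,
        repo_type="dataset",
    )


# Example call (placeholder file and repo names):
# upload_dataset("dataset.jsonl", "your-username/my-clone-data")
```

Since this is your real chat history, a private repo seems like the safer default; whatever trains on it then needs a token that can read the repo.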
I used only discord messages for my clone. Here's what I did to get my dataset:
- Used Discord Chat Exporter to export the conversations I wanted
- Put the resulting file names into the filenames variable in this script and ran it (also set myname to your Discord username as it appears in the exported chats)
Boom, that easy. Now upload the result to huggingface.
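The linked script isn't reproduced here, but the core of the conversion might look something like the sketch below. It assumes a JSON-format export with a top-level "messages" list where each message has a "content" string and an "author" object with a "name" field; check that against your own export before trusting it. The function names are made up; myname plays the same role as in the steps above.

```python
import json


def messages_to_turns(messages, myname):
    """Convert exported messages into ShareGPT-style turns."""
    turns = []
    for message in messages:
        text = (message.get("content") or "").strip()
        if not text:
            continue  # nothing to learn from empty-text messages
        # You become "gpt"; everyone else is lumped together as "human".
        speaker = "gpt" if message["author"]["name"] == myname else "human"
        turns.append({"from": speaker, "value": text})
    return turns


def load_discord_export(path, myname):
    """Read one exported chat file and return its list of turns."""
    with open(path, encoding="utf-8") as f:
        export = json.load(f)
    return messages_to_turns(export["messages"], myname)
```

The result still needs consecutive messages merged and long conversations split before it matches the format shown earlier. In group chats, treating every other participant as a single "human" is a simplification you may or may not want.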
This is the easy part! Make a copy of this Colab notebook and edit the config. I think the options are self-explanatory. It uses Llama 3 8B for the MODEL, but you should be able to use any base model. I'd be curious to see how other base models do.
Then run all cells, and your model should show up on huggingface in TRAINED_REPO.
This is the easy-est part. Just duplicate this space, swap out model_id for your new model's repo, and it should deploy.
Have fun with your new mini-you!
Footnotes
1. Seriously, don't be that guy.