@thomwolf
Last active February 28, 2023 07:25
Download and load persona-chat json dataset
import json
from pytorch_pretrained_bert import cached_path

url = "https://s3.amazonaws.com/datasets.huggingface.co/personachat/personachat_self_original.json"

# Download and load JSON dataset
personachat_file = cached_path(url)
with open(personachat_file, "r", encoding="utf-8") as f:
    dataset = json.loads(f.read())

# Tokenize and encode the dataset using our loaded GPT tokenizer
# (`tokenizer` is not defined in this snippet; it must already be loaded)
def tokenize(obj):
    # A string is converted to a list of token ids
    if isinstance(obj, str):
        return tokenizer.convert_tokens_to_ids(tokenizer.tokenize(obj))
    # A dict keeps its keys; its values are tokenized recursively
    if isinstance(obj, dict):
        return dict((n, tokenize(o)) for n, o in obj.items())
    # Anything else is treated as an iterable of items to tokenize
    return list(tokenize(o) for o in obj)

dataset = tokenize(dataset)
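
To make the recursion in tokenize easier to follow, here is a minimal sketch that runs the same function on a small nested structure. ToyTokenizer is a hypothetical stand-in for the GPT tokenizer (it only splits on whitespace and assigns ids in order of first appearance), and the keys in the sample dict are illustrative assumptions, not a statement about the real dataset layout.

```python
class ToyTokenizer:
    """Hypothetical stand-in for the GPT tokenizer (assumption, not the real API)."""

    def __init__(self):
        self.vocab = {}

    def tokenize(self, text):
        # Lowercase and split on whitespace
        return text.lower().split()

    def convert_tokens_to_ids(self, tokens):
        # Assign each new token the next free integer id
        return [self.vocab.setdefault(t, len(self.vocab)) for t in tokens]


tokenizer = ToyTokenizer()


def tokenize(obj):
    # Same logic as the gist: strings become id lists, containers are walked recursively
    if isinstance(obj, str):
        return tokenizer.convert_tokens_to_ids(tokenizer.tokenize(obj))
    if isinstance(obj, dict):
        return dict((n, tokenize(o)) for n, o in obj.items())
    return list(tokenize(o) for o in obj)


# Illustrative nested sample; the key names are assumptions
sample = {
    "personality": ["i like cats", "i like tea"],
    "utterances": [{"history": ["hello there"]}],
}
encoded = tokenize(sample)
print(encoded)
```

The nesting and the dict keys are preserved; only the string leaves are replaced by lists of integer ids.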
@oltip

oltip commented May 16, 2019

Hi, I am trying to download the file from the S3 bucket you have indicated in the link, but it raises an error:
NoCredentialsError: Unable to locate credentials
This happens at the function s3_etag(url)

It seems that some kind of credentials are needed. Any help would be welcome.

@mandar1010

getting the same error

@Pranav-Goel

same error here too

@thomwolf
Author

Should be fixed now

@sashank06

@thomwolf the error still persists. Unable to download the json dataset due to that issue.

@sashank06

> @thomwolf the error still persists. Unable to download the json dataset due to that issue.

I fixed the error. It was an error on my end. I had to reconfigure the AWS credentials.

@ShivaShanmuganathan

ShivaShanmuganathan commented Jul 31, 2019

> Should be fixed now
>
> @thomwolf the error still persists. Unable to download the json dataset due to that issue.
>
> I fixed the error. It was an error on my end. I had to reconfigure the AWS credentials.

I am still getting the same error. Please help.

@naveentvelu

> @thomwolf the error still persists. Unable to download the json dataset due to that issue.
>
> I fixed the error. It was an error on my end. I had to reconfigure the AWS credentials.

@sashank06 I am still getting the error. Can you please share how you fixed it?
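
For readers who hit the NoCredentialsError discussed in this thread: the thread does not record what the fix was beyond reconfiguring AWS credentials. One possible workaround, assuming the file is publicly readable over HTTPS at the URL given in the gist, is to skip cached_path and fetch the file with the Python standard library, which does not consult AWS credentials. The helper name load_json_from_url and the local file name are hypothetical, chosen only for this sketch.

```python
import json
import os
import urllib.request


def load_json_from_url(url, local_path):
    """Download `url` to `local_path` (unless it already exists) and parse it as JSON.

    Hypothetical helper for illustration; it is not part of pytorch_pretrained_bert.
    """
    if not os.path.exists(local_path):
        # Plain URL fetch through urllib; no boto3 or AWS credentials involved
        urllib.request.urlretrieve(url, local_path)
    with open(local_path, "r", encoding="utf-8") as f:
        return json.load(f)
```

With the gist's URL this would be called as load_json_from_url(url, "personachat_self_original.json"), and the result can then be passed to tokenize as before. The file is only downloaded once; later calls read the local copy.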

@Khaled-Abdelhamid

@CatarauCorina

@Houssem96
