@nat
Last active June 4, 2024 20:18

Thank you for registering for the Vesuvius Challenge!

IMPORTANT REMINDER: Please do not share these links or this page without permission. To access this data, we require that you register for the Vesuvius Challenge and accept the license. Thank you for understanding!

Download instructions

The Vesuvius Challenge data files can be found here:

http://dl.ash2txt.org/

username: registeredusers, password: only

As described on the data page, the data is very large.

You can use wget to download this data recursively like this:

wget --no-parent -r --user=registeredusers --password=only http://dl.ash2txt.org/fragments

The data for the full scrolls is so large that you may want to download only part of it at first.

Here is a command to download 1cm of scan data from the center of Scroll 1:

for i in `seq 6000 7250`; do wget --user=registeredusers --password=only http://dl.ash2txt.org/full-scrolls/Scroll1/PHercParis4.volpkg/volumes/20230205180739/0$i.tif; done
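The loop above fetches slices 6000 through 7250, which is 1251 files, and starts over from the beginning if it is interrupted. A minimal sketch of a resumable variant follows. It assumes the slice files are named with five zero-padded digits, which matches the 0$i pattern above for this range; the function names slice_urls and download_slices are hypothetical helpers, not part of any Vesuvius Challenge tooling.

```shell
# Assumption: slices are named with five zero-padded digits (e.g. 06000.tif),
# which matches the 0$i pattern used in the loop above for 6000..7250.
VOLUME_URL="http://dl.ash2txt.org/full-scrolls/Scroll1/PHercParis4.volpkg/volumes/20230205180739"

# Print one slice URL per line for the inclusive range START..END.
slice_urls() {
  i=$1
  while [ "$i" -le "$2" ]; do
    printf '%s/%05d.tif\n' "$VOLUME_URL" "$i"
    i=$((i + 1))
  done
}

# Download the range. --continue resumes partially downloaded files, and
# --input-file=- makes wget read the URL list from standard input.
download_slices() {
  slice_urls "$1" "$2" |
    wget --continue --user=registeredusers --password=only --input-file=-
}

# Usage (commented out so that sourcing this file downloads nothing):
# download_slices 6000 7250
```

Running download_slices again after an interruption picks up where the previous run stopped instead of fetching every slice again.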

Faster downloads

For faster downloads, use rclone:

rclone copy :http:/full-scrolls/ ./dl.ash2txt.org/full-scrolls/ --http-url http://registeredusers:only@dl.ash2txt.org/ --progress --multi-thread-streams=32 --transfers=32 --size-only
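The rclone command above copies everything under full-scrolls. To fetch a smaller subtree with the same flags, something like the sketch below may help. The helper names mirror_dir and copy_subtree are hypothetical, and the local layout under ./dl.ash2txt.org/ simply mirrors the destination used in the command above.

```shell
# Same credentials and host as the rclone command above.
BASE_URL="http://registeredusers:only@dl.ash2txt.org/"

# Map a remote path to the local mirror directory, dropping one leading slash.
mirror_dir() {
  printf './dl.ash2txt.org/%s\n' "${1#/}"
}

# Copy one remote subtree using the flags from the command above.
# Any extra arguments (for example --dry-run) are passed through to rclone.
copy_subtree() {
  remote_path="${1#/}"
  shift
  rclone copy ":http:/${remote_path}" "$(mirror_dir "$remote_path")" \
    --http-url "$BASE_URL" --progress \
    --multi-thread-streams=32 --transfers=32 --size-only "$@"
}

# Usage (commented out so that sourcing this file transfers nothing):
# copy_subtree fragments/ --dry-run
# copy_subtree full-scrolls/Scroll1/PHercParis4.volpkg/volumes/20230205180739/
```

Passing --dry-run first lists what would be transferred without downloading anything, which is a cheap check before committing to a large copy.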

On Linux, follow these instructions to make downloads much faster without needing rclone.

On Windows, follow these instructions.

LICENSE

By downloading this data, you agree to license the data from Vesuvius Challenge under the following licensing terms:

  • You will not redistribute the data without the written approval of Vesuvius Challenge.
  • Vesuvius Challenge reserves the right to use in any way, including in an academic or other publication, all submissions or results produced from this dataset.
  • You will not make public any revelation of hidden text (or associated code) without the written approval of Vesuvius Challenge.
  • You agree all publications and presentations resulting from any use of the EduceLab-Scrolls Dataset must cite use of the EduceLab-Scrolls Dataset as follows:
      • In any published abstract, you will cite “EduceLab-Scrolls” as the source of the data in the abstract.
      • In any published manuscripts using data from EduceLab-Scrolls, you will reference the data paper linked from scrollprize.org.
      • You will include language similar to the following in the methods section of your manuscripts in order to accurately acknowledge the data source: “Data used in the preparation of this article were obtained from the EduceLab-Scrolls dataset [Stephen Parsons, C. Seth Parker, Christy Chapman, Mami Hayashida, and W. Brent Seales. EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT. 2023].”
  • You understand that all submissions will be reviewed by the Vesuvius Challenge Review Team, and that prizes will be awarded at the sole discretion of Vesuvius Challenge.

All EduceLab-Scrolls data is copyrighted by EduceLab/The University of Kentucky. Permission to use the data linked herein according to the terms outlined above is granted to Vesuvius Challenge.

@filipebuba

I am not able to access the files because of my username. But I already signed the term

@15m43lk4155y

> I am not able to access the files because of my username. But I already signed the term

There is no custom username. The username is "registeredusers" and the password is "only", as mentioned above.

@pythonbdfl

I'm late to the game. Where can I find out about challenges like this when they start?

@blakesturges

> I'm late to the game. Where can I find out about challenges like this when they start?

I feel similarly. From what I understand a lot of challenges are posted on Kaggle.com