Lecture 1: Introduction to Research
Lecture 2: Introduction to Python
Lecture 3: Introduction to NumPy
Lecture 4: Introduction to pandas
Lecture 5: Plotting Data
/*
 * An implementation of the Wavelet Tree data structure.
 * It is similar to a binary search tree, but each node stores a bit vector
 * instead of keys: bit i records whether the i-th symbol at that node falls
 * in the lower or upper half of the node's alphabet range.
 *
 * More info here:
 * http://siganakis.com/challenge-design-a-data-structure-thats-small
 * http://www.alexbowe.com/wavelet-trees
 */
#include <stdio.h>
#include <stdlib.h>
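The header comment above only names the structure, and the C file is truncated after its includes. As a rough illustration of how a wavelet tree answers `access` (which symbol is at position i) and `rank` (how often symbol c occurs before position i), here is a minimal sketch in Python. The class name `WaveletTree` and its methods are hypothetical, not taken from the C file, and each node keeps a plain list of prefix zero-counts where a real implementation would use a compact bit vector with a rank directory.

```python
class WaveletTree:
    """Hypothetical minimal wavelet tree over integers in [lo, hi].

    Each internal node splits its alphabet range at mid. zeros[i] counts
    how many of the node's first i symbols went left (bit 0, symbol <= mid).
    """

    def __init__(self, seq, lo=None, hi=None):
        seq = list(seq)
        # The root infers its alphabet range; children receive it explicitly.
        self.lo = min(seq) if lo is None else lo
        self.hi = max(seq) if hi is None else hi
        self.n = len(seq)
        self.left = self.right = None
        if self.lo == self.hi or not seq:
            return  # leaf (single symbol) or empty node: no bits needed
        mid = (self.lo + self.hi) // 2
        self.zeros = [0]
        for x in seq:
            self.zeros.append(self.zeros[-1] + (x <= mid))
        self.left = WaveletTree([x for x in seq if x <= mid], self.lo, mid)
        self.right = WaveletTree([x for x in seq if x > mid], mid + 1, self.hi)

    def access(self, i):
        """Return the symbol at 0-based position i."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        node = self
        while node.lo != node.hi:
            if node.zeros[i + 1] > node.zeros[i]:  # bit 0: symbol went left
                i, node = node.zeros[i], node.left
            else:                                  # bit 1: symbol went right
                i, node = i - node.zeros[i], node.right
        return node.lo

    def rank(self, c, i):
        """Count occurrences of symbol c among the first i positions."""
        node = self
        while i > 0 and node.lo <= c <= node.hi:
            if node.lo == node.hi:
                return i  # every remaining symbol at this leaf is c
            mid = (node.lo + node.hi) // 2
            if c <= mid:
                i, node = node.zeros[i], node.left
            else:
                i, node = i - node.zeros[i], node.right
        return 0


wt = WaveletTree([3, 1, 4, 1, 5, 2, 6, 5, 3, 5])
print(wt.access(4))    # 5
print(wt.rank(5, 10))  # 3
print(wt.rank(1, 3))   # 1
```

Splitting at the midpoint of the value range keeps the tree depth near log2(hi - lo + 1), and each query walks a single root-to-leaf path, remapping the position at every level from the stored counts.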
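inline item_bound sequential_SSE(int *sentence_num, unsigned long *bit_index, int shortlen, int short_start, int short_end, int longlen, int long_start, int long_end, int distance, int *res_sentence_num, unsigned long *res_bit_index, int *globalIndexStart)
{
    int sind = short_start, lind = long_start;
    // Round each length down to a multiple of 4 (a 128-bit SSE register holds
    // four 32-bit ints), then step back one block: this is the offset of the
    // last full 4-wide block in each list.
    int shortLenReg = ((shortlen / 4)) * 4 - 4;
    int longLenReg = ((longlen / 4)) * 4 - 4;
    bool flager = true;
    // A negative offset means that list has fewer than 4 elements,
    // so it contains no full 4-wide block.
    if (shortLenReg < 0 || longLenReg < 0) {
        flager = false;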
''' Script for downloading all GLUE data.
Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data from https://download.microsoft.com/download/D/4/6/D46FF87A-F6B9-4252-AA8B-3604ED519838/MSRParaphraseCorpus.msi and extract the data from it manually.
Windows users can run the .msi file directly. Mac and Linux users can use an external tool such as 'cabextract' (see below for an example).
You should then rename specific files and place them in a folder (see below for an example).
mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC
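The two shell commands above can be wrapped in a small Python helper, sketched here under the assumption that `cabextract` is installed and on `PATH`. The function name `extract_mrpc_msi` is hypothetical and not part of the download script; it stops after extraction and does not attempt the rename-and-place step, whose details are not given here.

```python
import os
import shutil
import subprocess


def extract_mrpc_msi(msi_path, out_dir="MRPC"):
    """Hypothetical helper: `mkdir MRPC` + `cabextract <msi> -d MRPC`.

    Returns the sorted names of the extracted entries in out_dir.
    """
    if not os.path.isfile(msi_path):
        raise FileNotFoundError(msi_path)
    if shutil.which("cabextract") is None:
        raise RuntimeError("cabextract not found on PATH; install it first")
    os.makedirs(out_dir, exist_ok=True)  # equivalent of `mkdir MRPC`
    # Options before the file name; check=True raises CalledProcessError
    # if cabextract exits with a non-zero status.
    subprocess.run(["cabextract", "-d", out_dir, msi_path], check=True)
    return sorted(os.listdir(out_dir))
```

Checking for the input file and the tool before creating the output directory means a failed call leaves nothing behind to clean up.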
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software