OpenScholar is a retrieval-augmented language model that assists researchers in synthesizing scientific literature. The system uses a database of 45 million open-access papers to provide citation-backed responses to queries, accurately identifying relevant passages and generating reliable answers across multiple scientific domains. This approach addresses the growing challenge of keeping up with rapidly expanding scientific literature.
The researchers developed ScholarQABench, a multi-domain benchmark for evaluating literature search capabilities, with 2,967 expert-written queries and 208 detailed answers across computer science, physics, neuroscience, and biomedicine. In testing, OpenScholar-8B outperformed GPT-4o by 5% and PaperQA2 by 7% in correctness metrics, despite being a smaller, open model.
Citation accuracy stands as a key strength of OpenScholar. While GPT-4o shows concerning citation hallucination rates of 78-90%, OpenScholar matches human expert-level accuracy in citation verification. The syst
# NOTE:
# You can find an updated, more robust and feature-rich implementation
# in Zeno Build
# - Zeno Build: https://github.com/zeno-ml/zeno-build/
# - Implementation: https://github.com/zeno-ml/zeno-build/blob/main/zeno_build/models/providers/openai_utils.py
import openai
import asyncio
from typing import Any
import requests
import sys
import time
sleep_time = 20
def query_api(url, session):
    global sleep_time
    time.sleep(sleep_time / 1000.0)
    r = session.get(url)
    while r.status_code == 429:
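The preview above cuts off at the retry loop, so what the original does on an HTTP 429 response is not shown. As a minimal sketch of one common way to handle rate limiting, the block below doubles the delay after each 429 and tries again. The function name `get_with_backoff`, the doubling rule, the retry cap, and the injectable `fetch` and `sleep` callables are all assumptions made for illustration, not the author's code.

```python
import time
from typing import Callable, Tuple


def get_with_backoff(
    fetch: Callable[[], int],
    initial_delay_ms: float = 20.0,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, float]:
    """Call fetch() until it stops returning 429 or retries run out.

    fetch is a hypothetical stand-in for a request that returns an HTTP
    status code. Returns the final status and the final delay in ms.
    """
    delay_ms = initial_delay_ms
    status = fetch()
    retries = 0
    while status == 429 and retries < max_retries:
        delay_ms *= 2  # assumed policy: double the wait after each 429
        sleep(delay_ms / 1000.0)
        status = fetch()
        retries += 1
    return status, delay_ms


# Usage with a fake endpoint that is rate limited twice, then succeeds.
responses = iter([429, 429, 200])
result = get_with_backoff(lambda: next(responses), sleep=lambda seconds: None)
print(result)  # (200, 80.0)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; with a real `requests.Session`, `fetch` could wrap `session.get(url).status_code`.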
import openreview
import argparse
import requests
import time
import sys
import csv
import json
from tqdm import tqdm  # Progress bar
# This is a utility script to get a CSV of papers from Semantic Scholar given OpenReview IDs
package edu.cmu.empty;
import com.intellij.openapi.actionSystem.AnAction;
import com.intellij.openapi.actionSystem.AnActionEvent;
import com.intellij.openapi.project.Project;
import com.intellij.openapi.ui.Messages;
import com.intellij.openapi.ui.popup.JBPopupFactory;
import com.intellij.openapi.ui.popup.ListPopup;
import com.intellij.openapi.ui.popup.PopupStep;
import com.intellij.openapi.ui.popup.util.BaseListPopupStep;
import sys
import re
from collections import defaultdict
# This is a script to identify pronouns in Japanese
# It requires data segmented by KyTea (http://www.phontron.com/kytea/)
#
# If you have raw Japanese text (with no spaces), use this script like:
# cat japanese.txt | kytea | python identify_japanese_pronouns.py > japanese_with_pronouns.txt
#
#### Script to calculate the best paper deadline from the world's population distribution, under some not-completely-arbitrary assumptions
# by Graham Neubig
# Results are:
# UTC 8:00 deadline, utility is 1476.1150000000002
# UTC 9:00 deadline, utility is 1438.7800000000002
# UTC 14:00 deadline, utility is 1385.2949999999998
# UTC 15:00 deadline, utility is 1345.945
# UTC 13:00 deadline, utility is 1291.4950000000003
# UTC 7:00 deadline, utility is 1287.1649999999997
openapi: "3.0.0"
info:
  version: 1.0.0
  title: Swagger Petstore
  license:
    name: MIT
servers:
  - url: http://petstore.swagger.io/v1
paths:
  /pets:
"""
DyNet implementation of a sequence labeler (POS tagger).
This is a translation of this tagger in PyTorch: https://gist.github.com/hal3/8c170c4400576eb8d0a8bd94ab231232
Basic architecture:
- take words
- run through bidirectional GRU
- predict labels one word at a time (left to right), using a recurrent neural network "decoder"
The decoder updates hidden state based on:
- most recent word
import numpy as np
import sys
################# Explanation ##################
# This is a function to calculate house prices h(x) = -40 + 0.25x
# The first term (-40) is the base price, and "x" is the number of square feet in the house
################################################
# Set up the function
my_function = np.array([-40, 0.25])
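The rest of this script is not shown, so the following is only a sketch of how a coefficient vector like `my_function` could be evaluated. It treats h(x) = -40 + 0.25x as a dot product between the coefficients and the feature vector [1, x]. The helper name `predict_price` is hypothetical and does not come from the original.

```python
import numpy as np

# Coefficients of h(x) = -40 + 0.25x, copied from the snippet above.
my_function = np.array([-40, 0.25])


def predict_price(square_feet: float) -> float:
    """Evaluate h(x) as a dot product of the coefficients with [1, x]."""
    # The leading 1 multiplies the base term (-40); x multiplies the slope.
    features = np.array([1.0, square_feet])
    return float(my_function @ features)


print(predict_price(2000))  # -40 + 0.25 * 2000 = 460.0
```

Writing the function as a dot product means the same code would work unchanged if more features, each with its own coefficient, were appended to both vectors.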