import time

import evo_researcher.benchmark.benchmark as bm
from evo_researcher.benchmark.utils import get_manifold_markets

# Benchmark two agents on 24 Manifold markets.
benchmarker = bm.Benchmarker(
    markets=get_manifold_markets(number=24),
    agents=[
        bm.EvoAgent(model="gpt-4-1106-preview"),
        bm.OlasAgent(model="gpt-3.5-turbo"),
    ],
    cache_path="./.cache.json",
)
benchmarker.run_agents()
md = benchmarker.generate_markdown_report()

# Write the report to a timestamped Markdown file.
output = f"./benchmark_report.{int(time.time())}.md"
with open(output, "w") as f:
    print(f"Writing benchmark report to: {output}")
    f.write(md)
@evangriffiths

Comparison Report

Summary Statistics

| Agents | MSE for p_yes | Mean confidence | Mean info_utility | Mean cost ($) | Mean time (s) |
| --- | --- | --- | --- | --- | --- |
| evo | 0.088326 | 0.8 | 0.902083 | 0.119027 | 104.738 |
| olas | 0.102439 | 0.779167 | 0.870833 | 0.000566854 | 38.388 |
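
For readers unfamiliar with the "MSE for p_yes" column, here is a minimal sketch of how such a figure could be computed. It assumes the metric is the mean squared difference between an agent's `p_yes` and the Manifold `p_yes` for the same market. The report does not state this, and `mean_squared_error` is a hypothetical helper written for illustration, not part of `evo_researcher`. The three rows are copied from the top of the Markets table.

```python
from typing import Sequence


def mean_squared_error(predictions: Sequence[float], references: Sequence[float]) -> float:
    """Mean of squared differences between two equal-length sequences.

    Hypothetical helper for illustration; not the benchmark's own code.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    return sum((p - r) ** 2 for p, r in zip(predictions, references)) / len(predictions)


# First three rows of the Markets table: (evo p_yes, olas p_yes, manifold p_yes).
rows = [
    (0.35, 0.35, 0.04),
    (0.95, 0.15, 0.94),
    (0.8, 0.85, 0.516853),
]
evo, olas, manifold = zip(*rows)

# Assumption: the Manifold probability is the reference value.
evo_mse = mean_squared_error(evo, manifold)
olas_mse = mean_squared_error(olas, manifold)
print(f"evo MSE on 3 markets: {evo_mse:.6f}")
print(f"olas MSE on 3 markets: {olas_mse:.6f}")
```

The report does not say which reference probabilities it used, so this sketch is not guaranteed to reproduce the figures in the summary table.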

Markets

| Market Question | evo p_yes | olas p_yes | manifold p_yes |
| --- | --- | --- | --- |
| Will the LK-99 room temp, ambient pressure superconductivity pre-print replicate before 2025? | 0.35 | 0.35 | 0.04 |
| Will Vladimir Putin be the President of Russia at the end of 2024? | 0.95 | 0.15 | 0.94 |
| Will Joe Biden win the 2024 US Presidential Election? | 0.8 | 0.85 | 0.516853 |
| By the end of 2026, will we have transparency into any useful internal pattern within a Large Language Model whose semantics would have been unfamiliar to AI and cognitive science in 2006? | 0.4 | 0.65 | 0.592878 |
| Will an AI get gold on any International Math Olympiad by 2025? | 0.45 | 0.6 | 0.24 |
| Will there be an AI language model that surpasses ChatGPT and other OpenAI models before the end of 2024? | 0.7 | 0.35 | 0.425776 |
| Will the five most valuable members of the S&P 500 be more than 21% of the index's value in 2025? | 0.9 | 0.85 | 0.748911 |
| Will a reliable and general household robot be developed before January 1st, 2030? | 0.8 | 0.6 | 0.335119 |
| Will China launch a full-scale invasion of Taiwan before 2030? | 0.1 | 0.2 | 0.288374 |
| Will @firstuserhere coauthor a NeurIPS or ICML conference publication before end of 2024? (10,000 Mana subsidy) | 0.25 | 0.6 | 0.437247 |
| Will @firstuserhere coauthor a publication in AIstats, AAAI, ICLR or JMLR before end of 2024? ($11,000M subsidy) | 0.4 | 0.3 | 0.649128 |
| Will over 100,000 people be conceived with the help of advanced embryo selection techniques by 2030? | 0.6 | 0.65 | 0.361456 |
| Will there be a >0 value liquidity event for me, a former Consensys Software Inc. employee, on my shares of the company? | 0.85 | 0.15 | 0.637982 |
| World gdp growth from 2023 to 2100? (nominal USD, annualized, 10x amplified) (M10,000 subsidy) | 0.4 | 0.7 | 0.560417 |
| Will this market have an odd number of traders by the end of 2024? | 0.35 | 0.2 | 0.5 |
| Will Binance collapse before the end of 2024? | 0.35 | 0.25 | 0.151545 |
| Will Donald Trump win the 2024 presidential election? | 0.55 | 0.75 | 0.426377 |
| Will there be a >0 value liquidity event for me, a former Consensys employee, on my shares of the company by 2025? | 0.65 | 0.2 | 0.243298 |
| Will we find something showing equal or greater architectural advancement to Gobekli Tepe, from before 11,000 BC? | 0.7 | 0 | 0.333948 |
| Will @Mira cause human extinction by 2030? | 0.05 | 0.6 | 0.03 |
| Will OpenAI hint at or claim to have AGI by 2025 end? | 0.15 | 0.65 | 0.204877 |
| Will AI be a major topic during the 2024 presidential debates in the United States? | 0.8 | 0.6 | 0.334091 |
| Will Bougainville be a country by end of 2027? [Showcase: M$10,000 subsidy] | 0.3 | 0.15 | 0.696564 |
| Will a room-temperature, atmospheric pressure superconductor be discovered before 2030? | 0.6 | 0.35 | 0.0834603 |
