@seanjensengrey
Last active August 31, 2024 17:21

arxiv-ml

I have a tuple of 3 numbers that I would like to predict a final value. The first row is the one I would like to complete with the value ? the other two rows are training data. Could you find the missing value or a way to approximate it?

(2607, 2671, 1975) = ?
(2495, 2488, 1879) = 28644
(2269, 2263, 1597) = 26513

ChatGPT-4 produced the following estimates of the total number of papers for 2023, one per method:

  • hacky linear regression, 30786
  • average of absolute differences, 30522
  • average of ratios, 31206
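The exact calculations behind these three figures are not recorded here, so the following is only a sketch of one plausible ratio-based approach, not a reconstruction of what ChatGPT did. It assumes the estimate is formed by taking, for each training year, the ratio of the full-year total to the sum of the three monthly counts, averaging those ratios, and scaling the 2023 three-month sum by that average. The function name `estimate_total` is hypothetical. With the tuples above it lands near 30.8k, in the same range as the estimates listed but not identical to any of them.

```python
from statistics import mean

# Training rows from the notes: (three monthly counts, full-year total).
TRAINING = [
    ((2495, 2488, 1879), 28644),  # 2022
    ((2269, 2263, 1597), 26513),  # 2021
]

# The row to complete (2023).
TARGET = (2607, 2671, 1975)


def estimate_total(target, training):
    """Scale the target's three-month sum by the average total/three-month ratio.

    Assumption: the first three months are a roughly constant share of the
    full year. Two training rows are far too few to validate that.
    """
    ratios = [total / sum(months) for months, total in training]
    return sum(target) * mean(ratios)


if __name__ == "__main__":
    print(round(estimate_total(TARGET, TRAINING)))
```

With only two training years, any of these methods is closer to a back-of-the-envelope extrapolation than a fitted model.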

Monthly counts for the first three months of each year, collected to look for a correlation between those months and the total number of papers posted to arXiv in that year. Each tuple above lists the counts in the order (March, February, January).

2023

https://arxiv.org/list/cs.LG/2303 2607
https://arxiv.org/list/cs.LG/2302 2671
https://arxiv.org/list/cs.LG/2301 1975

2022

https://arxiv.org/list/cs.LG/2203 2495
https://arxiv.org/list/cs.LG/2202 2488
https://arxiv.org/list/cs.LG/2201 1879

2021

https://arxiv.org/list/cs.LG/2103 2269
https://arxiv.org/list/cs.LG/2102 2263
https://arxiv.org/list/cs.LG/2101 1597

Total number of "Machine Learning" (cs.LG) papers posted to arxiv.org in each year.

https://arxiv.org/list/cs.LG/22 28644
https://arxiv.org/list/cs.LG/21 26513
https://arxiv.org/list/cs.LG/20 25887
https://arxiv.org/list/cs.LG/19 19253
https://arxiv.org/list/cs.LG/18 10491
https://arxiv.org/list/cs.LG/17 5232
https://arxiv.org/list/cs.LG/16 3569
https://arxiv.org/list/cs.LG/15 2488
https://arxiv.org/list/cs.LG/14 1743
https://arxiv.org/list/cs.LG/13 1419
https://arxiv.org/list/cs.LG/12 1223
https://arxiv.org/list/cs.LG/11 469
https://arxiv.org/list/cs.LG/10 333
https://arxiv.org/list/cs.LG/09 231
https://arxiv.org/list/cs.LG/08 126
https://arxiv.org/list/cs.LG/07 84
https://arxiv.org/list/cs.LG/06 77
https://arxiv.org/list/cs.LG/05 67
https://arxiv.org/list/cs.LG/04 38
https://arxiv.org/list/cs.LG/03 30
https://arxiv.org/list/cs.LG/02 44
https://arxiv.org/list/cs.LG/01 23
https://arxiv.org/list/cs.LG/00 36
https://arxiv.org/list/cs.LG/99 25
https://arxiv.org/list/cs.LG/98 7
https://arxiv.org/list/cs.LG/97 1

ChatGPT generated the chart.

prompt

I am going to paste a block of URLs followed by a space and a number. The last part of the URL is the year and then space followed by the number is the number of papers in that year. Please extract the data and then order by year, soonest to latest.

Take that data and plot in a seaborn line plot with the following axis, x: by year ascending, title "year posted to arxiv" the y axis, "papers posted to arxiv" with a title "growth in cs.LG papers posted to arxiv by year"

output

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

data = {
    'year': [1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022],
    'papers': [1, 7, 25, 36, 23, 44, 30, 38, 67, 77, 84, 126, 231, 333, 469, 1223, 1419, 1743, 2488, 3569, 5232, 10491, 19253, 25887, 26513, 28644]
}

df = pd.DataFrame(data)

sns.set(style="whitegrid")
plt.figure(figsize=(12, 6))
ax = sns.lineplot(x='year', y='papers', data=df)
ax.set(xlabel='Year Posted to arXiv', ylabel='Papers Posted to arXiv', title='Growth in cs.LG Papers Posted to arXiv by Year')

plt.show()
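
The output above hard-codes the extracted values. As a rough sketch of how the extraction step described in the prompt could be done in code instead, the following parses lines of the form "URL count" and sorts them by year. The function name `parse_counts` and the rule for expanding two-digit years (90 to 99 become 19xx, everything else 20xx) are assumptions made for this example; lines that do not end in a two-digit year, such as the monthly URLs, are skipped.

```python
import re

# Matches e.g. "https://arxiv.org/list/cs.LG/22 28644" and captures ("22", "28644").
LINE_RE = re.compile(r"/list/cs\.LG/(\d{2})\s+(\d+)\s*$")


def parse_counts(block):
    """Return a list of (year, papers) tuples sorted by year, earliest first."""
    rows = []
    for line in block.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a yearly "URL count" line
        yy, papers = int(match.group(1)), int(match.group(2))
        # Assumed century rule: the listing starts in 1997.
        year = 1900 + yy if yy >= 90 else 2000 + yy
        rows.append((year, papers))
    return sorted(rows)


if __name__ == "__main__":
    sample = """\
https://arxiv.org/list/cs.LG/22 28644
https://arxiv.org/list/cs.LG/21 26513
https://arxiv.org/list/cs.LG/97 1
"""
    for year, papers in parse_counts(sample):
        print(year, papers)
```

The resulting tuples can be passed to `pd.DataFrame(rows, columns=['year', 'papers'])` in place of the hand-written `data` dictionary.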