@michaelosthege
Last active May 28, 2024 18:36
Hudson Bay company Lynx-Hare dataset from Leigh 1968, parsed from paper copy http://katalog.ub.uni-heidelberg.de/titel/66489211 (This is the entire Table II)
year hare lynx
1847 21000 49000
1848 12000 21000
1849 24000 9000
1850 50000 7000
1851 80000 5000
1852 80000 5000
1853 90000 11000
1854 69000 22000
1855 80000 33000
1856 93000 33000
1857 72000 27000
1858 27000 18000
1859 14000 8000
1860 16000 4000
1861 38000 4000
1862 5000 4000
1863 153000 20000
1864 145000 35000
1865 106000 68000
1866 46000 70000
1867 23000 40000
1868 2000 22000
1869 4000 9000
1870 8000 5000
1871 7000 4000
1872 60000 10000
1873 46000 18000
1874 50000 19000
1875 103000 43000
1876 87000 37000
1877 68000 22000
1878 17000 15000
1879 10000 10000
1880 17000 8000
1881 16000 8000
1882 15000 30000
1883 46000 52000
1884 55000 75000
1885 137000 80000
1886 137000 33000
1887 95000 20000
1888 37000 13000
1889 22000 7000
1890 50000 6000
1891 54000 10000
1892 65000 20000
1893 60000 35000
1894 81000 55000
1895 95000 40000
1896 56000 28000
1897 18000 16000
1898 5000 5000
1899 2000 6000
1900 15000 10000
1901 2000 21000
1902 6000 35000
1903 45000 50000
@AdriaCoding

Dear Mr. Osthege,

I believe there is a typo in this dataset. In 1862 the hare population should be in a growth period, but it suddenly drops to 5000. We believe the 5000 should be 50000.

I cannot open a pull request, but I forked your gist and made the change in my fork. We were using your dataset for an ML project at our university, and we needed this change for our models to make sense.

Best wishes,
Adrià Lisa.

@michaelosthege
Author

hi @AdriaCoding, thanks for commenting.
I checked it again and it turns out that I do have a typo, but it's not the one you pointed at: it's Table III, not Table II.

The scan available at https://archive.org/details/somemathematical0000symp_o1x7 clearly shows "5,000" hare in 1862.

I do feel sorry for your ML model, but the dataset says 5,000 and if that was a typo, it happened more than half a century ago.

You might want to try a Student-t for the loss or likelihood function to get some robustness against outliers.
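
To illustrate why that helps, here is a minimal sketch in plain Python comparing how a Gaussian and a Student-t likelihood penalize one large residual. Everything specific in it is an assumption for illustration: the helper names `normal_logpdf` and `student_t_logpdf`, working on the log scale, the values of `SIGMA` and `NU`, and the hypothetical model prediction of 50,000 hares for 1862.

```python
import math


def normal_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z


def student_t_logpdf(x, nu, mu=0.0, sigma=1.0):
    """Log-density of a location-scale Student-t with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1.0) / 2.0 * math.log1p(z * z / nu)
    )


# Assumed values, chosen only for illustration.
SIGMA = 0.5  # noise scale on the log scale (assumption)
NU = 4.0     # degrees of freedom; smaller means heavier tails (assumption)

# Residual between the recorded 5,000 hares in 1862 and a
# hypothetical model prediction of 50,000, on the log scale.
residual = math.log(5_000) - math.log(50_000)

nll_normal = -normal_logpdf(residual, sigma=SIGMA)
nll_student = -student_t_logpdf(residual, nu=NU, sigma=SIGMA)

print(f"Gaussian  negative log-likelihood: {nll_normal:.2f}")
print(f"Student-t negative log-likelihood: {nll_student:.2f}")
```

The Gaussian penalty grows quadratically with the residual, while the Student-t penalty grows only logarithmically, so a single outlying year pulls the fit around far less.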

best
Michael

@AdriaCoding

Thanks for your time, Michael.

Indeed, I suspected that the typo came from the original authors. Maybe it is not a typo at all, as the hare population is quite noisy throughout the dataset, possibly due to external factors that do not affect the lynx population as much.

Thanks for your suggestions, I will try them out. 😄
