More detailed thoughts about data extraction

This gist contains some ideas about using LLMs to extract data from papers (specifically related to biology, aging research and the like).

Just to quickly expand a bit on what I was trying to say when our meeting was cut off:

I think LLM data extraction can be viewed as a problem that is tractable at three different layers:
1. Purely text-based, e.g. use `pdftotext` to turn a PDF into a plain-text document, then use LLMs to summarize, extract, tag, ... papers in order to obtain machine-readable data.
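The text-based layer can be sketched roughly as follows. This is a minimal illustration, not the approach actually used: it assumes the Poppler `pdftotext` command-line tool is installed, and the helper names `pdf_to_text` and `chunk_text` as well as the 4000-character chunk size are made up for this example. The LLM call itself is left out, since it depends on the provider.

```python
import subprocess


def pdf_to_text(pdf_path: str) -> str:
    """Run the `pdftotext` CLI and return the extracted text."""
    # Passing "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Group blank-line separated paragraphs into chunks of at most
    `max_chars` characters (hypothetical limit, tune per model).

    A single paragraph longer than `max_chars` is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk returned by `chunk_text(pdf_to_text("paper.pdf"))` could then be sent to an LLM with a summarize, extract or tag prompt, and the per-chunk answers merged into one machine-readable record per paper.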
@Vindaar
Vindaar / neural_spike_plot.org
Last active May 29, 2024 15:34
Raster plots for neuroscience in ggplotnim
Vindaar / dplyr_pandas_comparison_to_nim.org
Last active October 10, 2023 17:32
A comparison of dplyr, Pandas and data frames in Nim (using ggplotnim)

Dplyr (R), Pandas (Python) & Nim data frame comparison

This comparison is inspired by the comparison here: https://gist.github.com/conormm/fd8b1980c28dd21cfaf6975c86c74d07

The Nim data frame implementation we use here is the Datamancer data frame.

Note that due to UFCS in Nim we can write the commands we present either similar to the Python notation as if the function were a method of the
Vindaar / dynlib_based_nim_repl_clean.nim
Created September 22, 2023 10:20
Dynlib based Nim REPL using compiler API, clean up a bit
import std / [strutils, strformat, tables, dynlib, os]
import noise, shell
import compiler/[llstream, renderer, types, magicsys, ast,
                 transf, # for code transformation (for -> while etc)
                 injectdestructors, # destructor injection
                 pathutils, # AbsoluteDir
                 modulegraphs] # getBody
import ./nimeval_dynlib_clean
Vindaar / dynlib_based_nim_repl_v2.nim
Last active September 22, 2023 00:29
Dynlib based Nim REPL using compiler API
import std / [strutils, strformat, tables, dynlib, os]
import noise, shell
import compiler/[llstream, renderer, types, magicsys, ast,
                 transf, # for code transformation (for -> while etc)
                 injectdestructors, # destructor injection
                 pathutils, # AbsoluteDir
                 modulegraphs] # getBody
import ./nimeval_dynlib
Vindaar / dynlib_based_nim_repl.nim
Created September 22, 2023 00:25
Toy dynlib based Nim REPL
import noise, strutils, strformat, shell, tables, dynlib
proc printHelp() = echo ""
const procTmpl = """
{.push cdecl, exportc, dynlib.}
$#
{.pop.}
"""
const exprTmpl = """
Vindaar / mandelbrot.nim
Last active September 9, 2023 16:14
Embedding ggplotnim in SDL2
import datamancer
import std / [math, complex]
const xn = 960
const yn = 960
const xmin = -2.0
const xmax = 0.6
const ymin = -1.5
const ymax = 1.5
const MAX_ITERS = 200
Vindaar / weather.json
Last active December 22, 2022 16:39
Wind speed and angles linear interpolation
{"type":"Feature","geometry":{"type":"Point","coordinates":[18.9276,69.69,100]},"properties":{"meta":{"updated_at":"2022-12-22T14:42:02Z","units":{"air_pressure_at_sea_level":"hPa","air_temperature":"celsius","cloud_area_fraction":"%","precipitation_amount":"mm","relative_humidity":"%","wind_from_direction":"degrees","wind_speed":"m/s"}},"timeseries":[{"time":"2022-12-22T14:00:00Z","data":{"instant":{"details":{"air_pressure_at_sea_level":986.2,"air_temperature":-2.1,"cloud_area_fraction":86.7,"relative_humidity":81.6,"wind_from_direction":184.4,"wind_speed":6.1}},"next_12_hours":{"summary":{"symbol_code":"lightsnowshowers_day"}},"next_1_hours":{"summary":{"symbol_code":"lightsnow"},"details":{"precipitation_amount":0.2}},"next_6_hours":{"summary":{"symbol_code":"snow"},"details":{"precipitation_amount":2.2}}}},{"time":"2022-12-22T15:00:00Z","data":{"instant":{"details":{"air_pressure_at_sea_level":985.9,"air_temperature":-1.9,"cloud_area_fraction":84.9,"relative_humidity":80.7,"wind_from_direction":187.0,"wi
import ggplotnim, math
import arraymancer
const ε = 3
proc φ(r: float): float =
  result = exp(-pow((ε.float * r), 2.0))
proc toMatrix(n: int, start, stop: float): Tensor[float] =
  result = zeros[float]([n, n])
  let xs = linspace(start, stop, n)