paco xander nathan ceteri

## normcore-llm.md

      
veekaybee / normcore-llm.md
1 file, 218 forks, 38 comments, 2780 stars. Last active July 26, 2024 01:10.

Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.
Foundational Concepts


Pre-Transformer Models


## Jupyter_vs_Mathematica.ipynb

      
jakevdp / Jupyter_vs_Mathematica.ipynb
2 files, 0 forks, 0 comments, 6 stars. Created April 8, 2018 05:01.

Jupyter vs Mathematica Google Trends
    
## hello_mesos.py
import logging
import uuid
import time

from mesos.interface import Scheduler
from mesos.native import MesosSchedulerDriver
from mesos.interface import mesos_pb2

logging.basicConfig(level=logging.INFO)

## spark-svd.scala
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg._
import org.apache.spark.{SparkConf, SparkContext}

// To use the latest sparse SVD implementation, please build your spark-assembly after this
// change: https://github.com/apache/spark/pull/1378

// Input tsv with 3 fields: rowIndex(Long), columnIndex(Long), weight(Double), indices start with 0
// Assume the number of rows is larger than the number of columns, and the number of columns is
// smaller than Int.MaxValue

## Vagrantfile
Vagrant.configure("2") do |config|

  config.vm.box = "dummy"

  config.vm.provider :aws do |aws, override|
    aws.access_key_id = "..."
    aws.secret_access_key = "..."
    # you'll need to create the EC2 keypair used here -- I called it vagrant for easy tracking
    aws.keypair_name = "vagrant"
    # you'll want to use a group that has at least SSH open

## ProgrammaticNotebook.ipynb

      
fperez / ProgrammaticNotebook.ipynb
1 file, 22 forks, 8 comments, 114 stars. Last active May 2, 2024 19:14.

Creating an IPython Notebook programatically
    
## gist:8961994

      
johnynek / gist:8961994
1 file, 0 forks, 1 comment, 9 stars. Last active August 29, 2015 13:56.

Some Questions with Sketch Monoids

Unifying Sketch Monoids

As I discussed in Algebra for Analytics, many sketch monoids, such as Bloom filters, HyperLogLog, and Count-min sketch, can be described as a hashing (projection) of items into a sparse space, followed by two different commutative monoids used to read and write respectively. The read monoids always have the property that (a + b) <= a, b, and the write monoids have the property that (a + b) >= a, b.
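To make the read/write split concrete, here is a minimal Python sketch of a Bloom filter and a Count-min sketch written in that style. It is an illustration under stated assumptions, not the code from Algebra for Analytics: the class names, the `_hash` helper, the SHA-256 based hashing, and the default sizes are all hypothetical choices. The write monoid is `max` for the Bloom filter and `+` on non-negative counts for the Count-min sketch (both satisfy (a + b) >= a, b); the read monoid is `min` in both cases (which satisfies (a + b) <= a, b).

```python
import hashlib


def _hash(item: str, seed: int, width: int) -> int:
    """Hypothetical helper: deterministic hash of (seed, item) into [0, width)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % width


class BloomFilter:
    """Write monoid: max over bits (bitwise OR). Read monoid: min over bits (AND)."""

    def __init__(self, width: int = 64, k: int = 3):
        self.width, self.k = width, k
        self.bits = [0] * width

    def _cells(self, item: str):
        # k hashes that all land in the same shared space
        return [_hash(item, seed, self.width) for seed in range(self.k)]

    def add(self, item: str) -> None:
        for i in self._cells(item):
            self.bits[i] = max(self.bits[i], 1)  # write: result >= both inputs

    def merge(self, other: "BloomFilter") -> "BloomFilter":
        merged = BloomFilter(self.width, self.k)
        merged.bits = [max(a, b) for a, b in zip(self.bits, other.bits)]
        return merged

    def contains(self, item: str) -> bool:
        # read: result <= every cell it combines
        return min(self.bits[i] for i in self._cells(item)) == 1


class CountMinSketch:
    """Write monoid: + on non-negative counts. Read monoid: min over rows."""

    def __init__(self, width: int = 64, depth: int = 3):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def add(self, item: str, count: int = 1) -> None:
        # one hash per row: each hash writes into its own disjoint subspace
        for row in range(self.depth):
            self.table[row][_hash(item, row, self.width)] += count

    def merge(self, other: "CountMinSketch") -> "CountMinSketch":
        merged = CountMinSketch(self.width, self.depth)
        merged.table = [
            [a + b for a, b in zip(r1, r2)]
            for r1, r2 in zip(self.table, other.table)
        ]
        return merged

    def estimate(self, item: str) -> int:
        # read: the smallest cell is the least-contaminated estimate
        return min(
            self.table[row][_hash(item, row, self.width)]
            for row in range(self.depth)
        )
```

In this sketch the only structural difference between the two classes is where the k hashes land: the Bloom filter shares one array across all hashes, while the Count-min sketch gives each hash its own row.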
### Some questions

- Note how similar CMS and Bloom filters are. The difference: Bloom hashes k times onto the same space, while CMS hashes k times onto k orthogonal subspaces. Why the difference? Imagine a fixed-space Bloom filter that hashes onto k orthogonal spaces, or an overlapping CMS that hashes onto a space of length k * m. How do the error asymptotics change?
- CMS has many query modes (dot product, etc.). Can those generalize to other sketches (HLL, Bloom)?
- What other sketch or non-sketch algorithms can be expressed in this dual mo


## lm.pmml.xml
<?xml version="1.0"?>
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_1 http://www.dmg.org/v4-1/pmml-4-1.xsd">
 <Header copyright="Copyright (c) 2014 lanenga" description="Linear Regression Model">
  <Extension name="user" value="lanenga" extender="Rattle/PMML"/>
  <Application name="Rattle/PMML" version="1.4"/>
  <Timestamp>2014-01-07 15:33:34</Timestamp>
 </Header>
 <DataDictionary numberOfFields="4">
  <DataField name="sepal_width" optype="continuous" dataType="double"/>
  <DataField name="sepal_length" optype="continuous" dataType="double"/>

## cascalog_build.log
bash-3.2$ lein do sub install, deps, compile, repl
Could not find artifact lein-newnew:lein-newnew:pom:0.3.5 in central (http://repo1.maven.org/maven2)
Retrieving lein-newnew/lein-newnew/0.3.5/lein-newnew-0.3.5.pom (3k)
    from https://clojars.org/repo/
Could not find artifact stencil:stencil:pom:0.3.0 in central (http://repo1.maven.org/maven2)
Retrieving stencil/stencil/0.3.0/stencil-0.3.0.pom (3k)
    from https://clojars.org/repo/
Retrieving org/clojure/clojure/1.3.0/clojure-1.3.0.pom (5k)
    from http://repo1.maven.org/maven2/
Retrieving org/sonatype/oss/oss-parent/5/oss-parent-5.pom (4k)

## Cascalog.log
bash-3.2$ lein version
Leiningen 2.0.0-preview10 on Java 1.6.0_43 Java HotSpot(TM) 64-Bit Server VM
bash-3.2$ hadoop version
Warning: $HADOOP_HOME is deprecated.

Hadoop 1.0.3
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192
Compiled by hortonfo on Tue May  8 20:31:25 UTC 2012
From source with checksum e6b0c1e23dcf76907c5fecb4b832f3be
bash-3.2$ lein clean