Martin Laprise mlaprise

## more-llms.md

      
Created April 24, 2023 01:45 (1 file, 2 forks, 1 comment, 14 stars)

Training open-source LLMs on ChatGPT output is a really bad idea.

Everyone is now racing to create open-source alternatives to compete with GPT-3.5/GPT-4. A common shortcut some teams use to bootstrap their effort is to fine-tune their model on ChatGPT output. I used to think this was a good idea and totally fair play. Actually, I still think it's fair play. OpenAI effectively distilled the entire web into its models. They say themselves that they use (mostly) publicly accessible information. So distilling their model is, in effect, distilling the public open web; small Terms of Service details aside, I don't see major ethical problems with that. Right? Well, that's not entirely true, and I now realize that, even ignoring the ethical considerations, using their output is a really bad idea.
First of all, from a purely technical point of view, as @yoavgo explains beautifully in his recent post, there is no way to align LLMs correctly without the RLHF component. I encourag

  
## stablediffusionwalk.py
"""
draws many samples from a diffusion model by slerp'ing around
the noise space, and dumps frames to a directory. You can then
stitch up the frames with e.g.:

$ ffmpeg -r 10 -f image2 -s 512x512 -i out/frame%04d.jpg -vcodec libx264 -crf 10 -pix_fmt yuv420p test.mp4

THIS FILE IS HACKY AND NOT CONFIGURABLE READ THE CODE, MAKE EDITS TO PATHS AND SETTINGS YOU LIKE
THIS FILE IS HACKY AND NOT CONFIGURABLE READ THE CODE, MAKE EDITS TO PATHS AND SETTINGS YOU LIKE
THIS FILE IS HACKY AND NOT CONFIGURABLE READ THE CODE, MAKE EDITS TO PATHS AND SETTINGS YOU LIKE
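
The docstring above mentions slerp'ing around the noise space. As a rough illustration only (the preview is cut off before the file's own implementation, so this is not the author's code), spherical linear interpolation between two vectors could look like the sketch below. The function name, the `dot_threshold` parameter, and the plain-list representation are assumptions made for this example.

```python
import math


def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherical linear interpolation between two vectors given as lists of floats.

    Hypothetical helper for illustration; `dot_threshold` is an assumed cutoff
    below which the spherical formula is considered numerically safe.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    # Nearly parallel vectors: fall back to plain linear interpolation
    # to avoid dividing by a vanishing sin(theta).
    if abs(dot) > dot_threshold:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Five evenly spaced points on the arc between two toy "noise" vectors.
start, end = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
frames = [slerp(i / 4, start, end) for i in range(5)]
```

Unlike straight linear interpolation, the intermediate points keep the same norm when the endpoints do, which is the usual reason to prefer slerp when walking between Gaussian noise samples.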

## TDA_resources.md

      
Created November 5, 2016 01:30, forked from calstad/TDA_resources.md (1 file, 0 forks, 0 comments, 0 stars)

List of resources for TDA

Quick List of Resources for Topological Data Analysis with Emphasis on Machine Learning

This is just a quick list of resources on TDA that I put together for @rickasaurus after he asked on Twitter for links to papers, books, etc. It is by no means an exhaustive list.

Survey Papers

Both Carlsson's and Ghrist's survey papers offer a very good introduction to the subject.

Topology and Data by Gunnar Carlsson
Barcodes: The Persistent Topology of Data by Robert Ghrist

Other Papers and Web Resources


Extracting insights from the shape of complex data using topology: a good introductory paper in Nature on the Mapper algorithm.


## gist:3feba17d2aa27dfe91716b42b30d355e
{u'anonymousId': u'9c382115-1b4e-4d54-9b91-515eb627b284',
 u'category': None,
 u'channel': u'client',
 u'context.ip': u'82.11.127.141',
 u'context.library.name': u'analytics.js',
 u'context.library.version': u'2.11.1',
 u'context.page.path': u'/inbox',
 u'context.page.referrer': u'',
 u'context.page.search': u'',
 u'context.page.title': u'Klood',

## gist:cb07eb96c21cd8338a059104694c381c
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA8vqEl2gcm8gG8o7Jopv8l2MSpKiqp/QZJRIqeVVGzo8fsQoP+aYxtfLXAuuAfwgIwbRBD7h5Fb4AI7GUsIFSGSP/n5sMW01hhU2mEun3e+EO4qW1O95U95tVLE/fZu4N6if4Ep/sY9ilBIjkoWDxdZNajwT6BTGdvqLHRTx41KzvE08J7xNqC/B27GECPnUvuRhQD/CofiRwun1sJKXQ2TtZ937HoF5TyP7NbpsVLVZYSFI7HXK4ij6bbJgxlVK1EWyxh5Dcjnh7CRv/0yD1ldJsWHnapSeDDM0mHFBboHkSwdqlQNUcPRUE76ou3sVih8DD+RRsh6z13CaE2CQgOQ== mlaprise@mlaprise-laptop

## sorting by relevance
### Sort news by authority
GET documents-2016-04-12/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "viagra",
            "fields": ["content.title", "content.text"]
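
The query preview above is cut off before any sort clause appears. As a hedged sketch of how a request body with the same `multi_match` clause plus a sort might be assembled, the Python below builds it as a dictionary. The `authority` field name is an assumption taken only from the heading "Sort news by authority"; the real mapping is not shown in the preview.

```python
import json


def build_search_body(text, sort_field="authority"):
    """Build a search body: multi_match query plus a descending sort.

    `sort_field` is a hypothetical numeric field; the actual index mapping
    is not visible in the truncated preview.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["content.title", "content.text"],
                        }
                    }
                ]
            }
        },
        # Sort by the assumed field first, then break ties by relevance score.
        "sort": [{sort_field: {"order": "desc"}}, "_score"],
    }


body = build_search_body("viagra")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after a `GET documents-2016-04-12/_search` line in the same console style as the preview.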

## gist:54feda33e5f58e79188f
Backend error message
---------------------
java.lang.IllegalArgumentException: Invalid format: "31bf8798-5495-42f4-8eab-76c906ef5470" is malformed at "bf8798-5495-42f4-8eab-76c906ef5470"
	at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:873)
	at org.apache.pig.piggybank.evaluation.datetime.truncate.ISOHelper.parseDateTime(ISOHelper.java:65)
	at org.apache.pig.piggybank.evaluation.datetime.truncate.ISOToHour.exec(ISOToHour.java:91)
	at org.apache.pig.piggybank.evaluation.datetime.truncate.ISOToHour.exec(ISOToHour.java:83)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:434)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:340)
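
The trace above shows `ISOToHour` being handed a UUID-shaped string where an ISO 8601 timestamp was expected, so the date parser rejects it. One common way to handle this kind of failure is to validate the value before truncating it. The sketch below illustrates that idea in Python rather than Pig; `iso_to_hour` is a hypothetical stand-in, not the piggybank UDF.

```python
from datetime import datetime


def iso_to_hour(value):
    """Truncate an ISO 8601 timestamp string to the hour.

    Returns None instead of raising when the value does not parse,
    so a stray ID in the timestamp column does not abort the whole job.
    """
    try:
        parsed = datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None
    return parsed.replace(minute=0, second=0, microsecond=0).isoformat()


print(iso_to_hour("2016-04-12T13:45:10"))                    # 2016-04-12T13:00:00
print(iso_to_hour("31bf8798-5495-42f4-8eab-76c906ef5470"))   # None
```

Rows that come back as `None` can then be filtered out or logged, which usually points to the column mix-up that caused the error.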

## gist:74948d5e351238ffe934
Traceback (most recent call last):
  File "/home/cogtree/extract_post_cycle.py", line 207, in <module>
    get_post_cycles(apikey)
  File "/home/cogtree/extract_post_cycle.py", line 183, in get_post_cycles
    .saveAsTextFile("s3n://parsely-pyspark/output/post_cycle_logs/{}.gz".format(apikey)))
  File "/opt/spark/python/pyspark/rdd.py", line 1288, in saveAsTextFile
    keyed._jrdd.map(self.ctx._jvm.BytesToString()).saveAsTextFile(path)
  File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o9885.saveAsTextFile.

## lorenz3d.py
#!/usr/bin/env python

import vtk
from scipy import *
from scipy import integrate
from pylab import *
from vtk.util.colors import tomato, banana

nbrPoints = 5000
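
The preview stops before the integration code. Since the file imports `scipy.integrate`, it presumably integrates the Lorenz system with a SciPy solver; the sketch below instead uses a hand-written fixed-step RK4 so that it runs with no dependencies. The parameter values (sigma = 10, rho = 28, beta = 8/3), the step size, and the starting point are the classic textbook choices and are assumptions here, not values taken from the file.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations for state (x, y, z)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, state, dt):
    """Advance `state` by one classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def trajectory(n_points=5000, dt=0.01, start=(1.0, 1.0, 1.0)):
    """Return a list of n_points (x, y, z) tuples along the attractor."""
    points = [start]
    for _ in range(n_points - 1):
        points.append(rk4_step(lorenz, points[-1], dt))
    return points


points = trajectory()
```

The resulting list of points is the kind of data a 3D renderer such as VTK would then turn into a polyline.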


## mandelbrot.py
from numpy import *
import matplotlib.pylab as pl

maxIteration = 128
z_min = -2-1j
z_max = 1+1j

# Set the image size here
imageSize = [512,512]
image = zeros(imageSize,int)
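
The preview ends right after the image buffer is allocated. As a minimal sketch of the standard escape-time iteration such a script would typically run next (an assumption, since the rest of the file is not shown), the code below fills a small grid using plain Python lists. The function names and the reduced default grid size are choices made for this example; the bounds and iteration cap mirror `z_min`, `z_max`, and `maxIteration` above.

```python
def escape_count(c, max_iteration=128):
    """Iterations of z -> z*z + c before |z| exceeds 2, capped at max_iteration."""
    z = 0j
    for n in range(max_iteration):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iteration


def render(z_min=-2 - 1j, z_max=1 + 1j, size=(64, 64), max_iteration=128):
    """Return a rows x cols grid of escape counts over the given rectangle."""
    rows, cols = size
    image = []
    for r in range(rows):
        # Map the row index linearly onto the imaginary axis.
        im = z_min.imag + (z_max.imag - z_min.imag) * r / (rows - 1)
        row = []
        for k in range(cols):
            # Map the column index linearly onto the real axis.
            re = z_min.real + (z_max.real - z_min.real) * k / (cols - 1)
            row.append(escape_count(complex(re, im), max_iteration))
        image.append(row)
    return image


image = render()
```

Points that never escape keep the cap value, so they form the familiar solid interior when the grid is plotted as an image.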