
Mediation model in GenomicSEM

As part of a genome-wide association study (GWAS) it has become common practice to identify traits genetically correlated with your trait of interest. The goal of these analyses is generally to better understand the etiology of a trait. This is done, for example, in this GWAS of social outcomes by [Hill et al.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5130721/)

Their paper reports GWAS of social deprivation and of income, and uses LD score regression to correlate these traits with several other traits. High genetic correlations exist between income and educational attainment, and between income and ADHD.

A very obvious follow-up question is whether ADHD affects income directly, or whether the relation between income and ADHD is mediated by education. Perhaps the effect of ADHD on income is entirely attributable to a reduction in educational attainment caused by ADHD. We are going to fit a model to try and answer this question.

**You can use GenomicSEM to answer this question: you can fit a mediation model in which educational attainment mediates the effect of ADHD on income.**
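A minimal sketch of such a model, using lavaan-style syntax passed to GenomicSEM's `usermodel()`: here `LDSCoutput` is assumed to be the output of `ldsc()` run on munged summary statistics for ADHD, educational attainment, and income, and the trait names (`ADHD`, `EA`, `Income`) are placeholders for whatever names your own `ldsc()` call produced.

```r
library(GenomicSEM)

# mediation model: ADHD -> EA -> Income, plus a direct ADHD -> Income path
mediation.model <- "
  EA ~ a*ADHD
  Income ~ b*EA + c*ADHD

  indirect := a*b      # effect of ADHD on income running through education
  total    := c + a*b  # total effect of ADHD on income
"

fit <- usermodel(LDSCoutput, estimation = "DWLS", model = mediation.model)
fit$results
```

If the direct path `c` shrinks towards zero while `indirect` remains substantial, that pattern is consistent with the effect of ADHD on income running largely through educational attainment.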

```r
# I used this online tool to extract the data from the scatter plot:
# https://apps.automeris.io/wpd/
# need Hmisc:
install.packages("Hmisc")
library(Hmisc)
# read the data from my mac into R:
ssgac <- read.csv("ssgac.csv", header = FALSE)
```

Minor update: Genetic correlations and Genomic Control (GC) in GenomicSEM

This document describes a minor update to GenomicSEM that gives the user the option to control how the LD score regression intercept is used to apply genomic control to GenomicSEM GWAS, and provides code to get quick initial genetic correlations, and the standard errors of those genetic correlations, from the `ldsc()` function.
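As a minimal sketch of that second piece, assuming `LDSCoutput` is the list returned by `ldsc()` (with `S` the genetic covariance matrix and `V` its sampling covariance matrix), the genetic correlation and a rough standard error can be pulled out as follows; the rescaling used for the SE is a quick approximation that ignores uncertainty in the heritabilities:

```r
S <- LDSCoutput$S  # genetic covariance matrix from ldsc()
V <- LDSCoutput$V  # sampling covariance matrix of the elements of S

# genetic correlation between the first two traits
rg <- S[1, 2] / sqrt(S[1, 1] * S[2, 2])

# the diagonal of V holds the sampling variances of the elements of S,
# in the order of the lower triangle of S (including the diagonal)
k  <- nrow(S)
SE <- matrix(0, k, k)
SE[lower.tri(SE, diag = TRUE)] <- sqrt(diag(V))

# quick-and-dirty SE of rg: rescale the covariance SE like the covariance itself
rg_SE <- SE[2, 1] / sqrt(S[1, 1] * S[2, 2])
```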

Better documentation and options for Genomic Control.

Behind the scenes, and poorly documented (there were some comments in the code, that's it), GenomicSEM was applying genomic control. The LD score regression intercept provides an expectation for the mean chi-square statistic under the null. As a chi-square distribution with 1 degree of freedom has a mean of 1.0, an LDSC intercept greater than 1.0 can be used as an index of inflation of the test statistic attributable to uncontrolled confounding (Bulik-Sullivan et al. 2015). Specifically, we estimate the univariate LD score intercept and inflate the SE of the estimated SNP-trait covariance by the square root of that intercept.
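As a minimal sketch of what that correction amounts to (the helper name `apply_gc` and the floor at 1 are illustrative assumptions, not GenomicSEM internals):

```r
# hypothetical helper, not a GenomicSEM function: inflating SNP-effect SEs by the
# square root of the univariate LDSC intercept deflates the chi-square statistic
# (beta / SE)^2 by the intercept itself
apply_gc <- function(se, ldsc_intercept) {
  se * sqrt(max(ldsc_intercept, 1))  # only inflate when the intercept exceeds 1
}
```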

```csv
ArticleId,Text,Category
1833,worldcom ex-boss launches defence lawyers defending former worldcom chief bernie ebbers against a battery of fraud charges have called a company whistleblower as their first witness. cynthia cooper worldcom s ex-head of internal accounting alerted directors to irregular accounting practices at the us telecoms giant in 2002. her warnings led to the collapse of the firm following the discovery of an $11bn (£5.7bn) accounting fraud. mr ebbers has pleaded not guilty to charges of fraud and conspiracy. prosecution lawyers have argued that mr ebbers orchestrated a series of accounting tricks at worldcom ordering employees to hide expenses and inflate revenues to meet wall street earnings estimates. but ms cooper who now runs her own consulting business told a jury in new york on wednesday that external auditors arthur andersen had approved worldcom s accounting in early 2001 and 2002. she said andersen had given a green light to the procedures and practices used by worldcom. mr
```
```r
Sys.setenv( # get an API key here: https://platform.openai.com/account/api-keys
  OPENAI_API_KEY = 'YOUR_API_KEY_HERE'
)

### Make a text "database" to search:
library(tm)
library(dplyr)
library(corpus)
library(rjson)
```
```python
import tkinter
import customtkinter
from bs4 import BeautifulSoup

# Langchain loads:
from langchain.document_loaders import DirectoryLoader, PagedPDFSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import ElasticVectorSearch, Pinecone, Weaviate, FAISS, Qdrant
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
```
```python
import torch
from transformers import pipeline, GPTJForCausalLM, AutoTokenizer

# load GPT-J-6B in half precision
model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

prompt = (
    "Monday Diary Entry: Had a busy weekend and started the day tired and empty, "
    "the good weather helps lift the mood. Worked from home and spent (too much) time "
    "on learning about language models. Had 2 or 3 productive calls, tired and prob "
    "still a bit sick today, which put me in a somewhat somber mood. Had a long bath "
    "which maybe helped?"
)
```
```sh
cat author_manuscript_txt.incr.2022-12-19/*/*.txt > merged-file.txt
```
```python
from datasets import load_dataset

dataset = load_dataset('text', data_files="merged-file.txt")
print(dataset)

# keep only lines longer than 500 characters
dataset2 = dataset.filter(lambda x: len(x["text"]) > 500)
print(dataset2)
```
Example long training data

Speaker 0:

You wrote a piece a follow-up piece to your oral history titled, there is no replacement for black Twitter. I think back in November, What do you think we lose if we lose black Twitter? Tell

Speaker 1:

me not to meet your Mac, but we lose everything. I'm John Favreau. Welcome to offline.

Speaker 0:

```python
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import (
    GPT2Tokenizer, GPT2LMHeadModel, TextDataset,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)

# Set the path to the text file to fine-tune on
path_to_file = "path/to/text/file.txt"

# Load the tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
```