Russell Jurney rjurney

## conversion.py
import numpy as np
import pk
import seaborn as sns


drug = pk.Drug(hl=8, t_max=1)
# 5 day simulation
conc = drug.concentration(
    60,
    1,

## AREADME.md

      
Last active December 14, 2023 17:42

Excellent name similarity results between sentence encoders 'sentence-transformers/all-MiniLM-L12-v2' and 'paraphrase-multilingual-MiniLM-L12-v2'

All vs Paraphrase Mini-LM Model Comparisons

This experiment compares multiple methods of sentence encoding on people's names - including across character sets - using the following models:

sentence-transformers/all-MiniLM-L12-v2
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

Notes

Compared to the bare names, JSON records tend to compress similarity scores together, owing to overlapping text in the formatting: field names, quotes and brackets. In the name pairs, you can see that name length is a source of error. The dates behave well in the JSON records.
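
A comparison along these lines can be sketched as follows. This is a minimal sketch, not the code behind the experiment: the helper names (`load_encoder`, `cosine_similarity`, `score_pairs`, `compare_models`) are assumptions made for illustration, and only the two model names come from the list above. It encodes both sides of each pair and reports their cosine similarity.

```python
import numpy as np

# The two models named above.
MODEL_NAMES = [
    "sentence-transformers/all-MiniLM-L12-v2",
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
]


def load_encoder(model_name):
    """Return a function mapping a list of strings to a 2D array of embeddings."""
    # Imported lazily so the scoring helpers work without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda texts: np.asarray(model.encode(texts))


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero 1D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def score_pairs(encode, pairs):
    """Score each (left, right) pair of strings with the given encoder."""
    left = encode([pair[0] for pair in pairs])
    right = encode([pair[1] for pair in pairs])
    return [cosine_similarity(l, r) for l, r in zip(left, right)]


def compare_models(pairs, model_names=MODEL_NAMES):
    """Return {model name: list of scores}, one score per pair (downloads models)."""
    return {name: score_pairs(load_encoder(name), pairs) for name in model_names}
```

Calling `compare_models` with a list of name pairs, or with pairs of JSON-serialized records, gives one list of scores per model, which makes the score compression described above easy to inspect side by side.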

  
## gspread-gpt.py
import gspread
from gspread_dataframe import set_with_dataframe
import pandas as pd

# Assume df_users and df_companies are your DataFrames
df_users = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Profile': ['alice123', 'bob456']})
df_companies = pd.DataFrame({'Company': ['TechCorp', 'BizInc'], 'Industry': ['Tech', 'Finance']})

# Step 1: Authenticate to Google Sheets API
# (You'll need to follow the gspread authentication steps which involve creating a service account and obtaining a JSON credentials file)

## make_graphframes_nodes.py
from pyspark.sql import functions as F  # needed for F.lit below
from pyspark.sql.types import StructField, IntegerType, LongType, StringType, TimestampType


def add_missing_columns(df, all_columns):
    """Add any missing columns from any DataFrame among several we want to merge."""
    for col_name, schema_field in all_columns:
        if col_name not in df.columns:
            df = df.withColumn(col_name, F.lit(None).cast(schema_field.dataType))
    return df


## docker-compose.yml
version: "3.8"

services:

  opensearch-node1: # This is also the hostname of the container within the Docker network (i.e. https://opensearch-node1/)
    image: opensearchproject/opensearch:latest # Specifying the latest available image - modify if you want a specific version
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster # Name the cluster
      - node.name=opensearch-node1 # Name the node that will run in this container

## ChatGPT-4-prompt.md

      
Created October 3, 2023 18:14

Seeking feedback on my ChatGPT prompting. What can I do to improve this result?

I have run the following code to compute dimension reduction with unlabeled UMAP, followed by DBSCAN clustering, to group dissimilar names for the same academic journals into clusters representing each journal.
The UMAP code is:
# Step 2: Dimension Reduction with UMAP
reducer = umap.UMAP()
reduced_embeddings = reducer.fit_transform(scaled_embeddings)
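
For context, the pipeline described above could look roughly like the sketch below. It is not the original code: the `StandardScaler` step is an assumption inferred from the variable name `scaled_embeddings`, and the function names and the `eps`, `min_samples` and `random_state` values are placeholders chosen for illustration.

```python
from collections import defaultdict


def cluster_embeddings(embeddings, eps=0.5, min_samples=2, random_state=42):
    """Scale, reduce with UMAP, then cluster with DBSCAN; returns one label per row."""
    # Imported lazily; requires scikit-learn and umap-learn.
    import umap
    from sklearn.cluster import DBSCAN
    from sklearn.preprocessing import StandardScaler

    # Step 1 (assumed): standardize the embeddings.
    scaled_embeddings = StandardScaler().fit_transform(embeddings)

    # Step 2: dimension reduction with UMAP (unsupervised, no labels passed).
    reducer = umap.UMAP(random_state=random_state)
    reduced_embeddings = reducer.fit_transform(scaled_embeddings)

    # Step 3: density-based clustering; eps and min_samples need tuning per dataset.
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(reduced_embeddings)


def group_by_cluster(names, labels):
    """Map each cluster label to its names; DBSCAN marks noise points with -1."""
    clusters = defaultdict(list)
    for name, label in zip(names, labels):
        clusters[int(label)].append(name)
    return dict(clusters)
```

Passing the journal names and the labels from `cluster_embeddings` to `group_by_cluster` gives a quick view of which name variants landed together and which were left as noise under label -1.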

  
## keyboardshortcuts.json
// Place your key bindings in this file to override the defaults
[
    {
        "key": "cmd+l",
        "command": "workbench.action.gotoLine"
    },
    {
        "key": "ctrl+g",
        "command": "-workbench.action.gotoLine"
    },

## cluster_to_label.py
    #
    # Create a pd.DataFrame of the nodes for analysis in a notebook
    #

    # Extract nodes and their attributes into a list of dictionaries
    node_data = [{**{"node": node}, **attr} for node, attr in G.nodes(data=True)]

    # Convert the list of dictionaries into a DataFrame
    node_df = pd.DataFrame(node_data)

## networkx_matches_dgl.py
# The DGL network has sentence encoded JSON for node features.
In [26]: g
Out[26]:
[Graph(num_nodes=27770, num_edges=352807,
       ndata_schemes={'x': Scheme(shape=(384,), dtype=torch.float64)}
       edata_schemes={})]

# The networkx network parsed the records and has individual fields for analysis
In [27]: G
Out[27]: <networkx.classes.digraph.DiGraph at 0x14fe5c4c0>

## dmesg
# ChatGPT says:

1) The NVIDIA module is having trouble retrieving NUMA (Non-Uniform Memory Access) node information for a specific device. NUMA is a method for setting up memory in multiprocessor systems to better match memory access times with processor cores. This might be related to the multi-GPU setup you mentioned.
2) The NVIDIA kernel module for your graphics cards is being loaded. The version is 530.30.02, and it was compiled on Wed Feb 22 04:11:39 UTC 2023.
3) This is a warning from the ACPI subsystem in the kernel, related to device configuration and power management. The specific warning pertains to an argument type mismatch for a given method. Such ACPI warnings can result from inconsistencies between the computer's BIOS/UEFI firmware and the OS's expectations.
4) Lastly, the nvidia-drm, nvidia_uvm, and other NVIDIA related entries refer to the NVIDIA graphics driver's components being initialized. The Direct Rendering Manager (drm) is used in graphics pipelines, while nvidia_uvm refers to NVIDIA'