Russell Jurney rjurney

## academic.py
import logging
import os

from langchain.chains import ConversationalRetrievalChain
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.vectorstores import Chroma

## centrality.groovy
// Use this to start up a session
conf = new BaseConfiguration()
conf.setProperty("storage.directory", "/Users/rjurney/Software/marketing/titan/data")
conf.setProperty("storage.backend", "berkeleyje")
graph = TitanFactory.open(conf)

// Get a graph traverser
g = graph.traversal()

// Various centralities to use as features - JSONize and save

## Dockerfile
# Start from a Jupyter Docker Stacks version
FROM jupyter/scipy-notebook:python-3.10.11

# Work in the jovyan user's home directory
WORKDIR "/home/${NB_USER}"

# Needed for poetry package management: no venv, pinned poetry version. GRANT_SUDO doesn't work :(
ENV POETRY_VIRTUALENVS_CREATE=false \
    POETRY_VERSION=1.4.2 \
    GRANT_SUDO=yes

## assortativity.md

      
Created July 6, 2023 06:35

What is wrong with this Markdown? Why won't Jupyter parse it?

Assortativity


Assortativity in networks refers to a correlation pattern observed in real-world networks where nodes are preferentially connected to other nodes that are like (or unlike) them in some way. This is essentially a bias in connection preference.

--ChatGPT4

A related term is assortative mixing:

In the study of complex networks, assortative mixing, or assortativity, is a bias in favor of connections between network nodes with similar characteristics. In the specific case of social networks, assortative mixing is also known as homophily. The rarer disassortative mixing is a bias in favor of connections between dissimilar nodes.
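
One way to make these definitions concrete is to measure assortativity by node degree: take the degrees at the two ends of every edge and compute their Pearson correlation. A positive value suggests assortative mixing, a negative value disassortative mixing. The sketch below is not from the original note; the function name `degree_assortativity` and the toy graphs are illustrative assumptions, and it assumes a non-empty, undirected edge list without self-loops.

```python
import math
from collections import Counter


def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge."""
    # Count the degree of every node in the undirected edge list.
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Each undirected edge contributes (u, v) and (v, u) so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    n = len(xs)
    mean = sum(xs) / n  # xs and ys share the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return math.nan  # all nodes have the same degree, so the correlation is undefined
    return cov / var


# Hypothetical toy graphs: a star (hub connected to three leaves) and a 4-node path.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
path = [("a", "b"), ("b", "c"), ("c", "d")]

print(degree_assortativity(star))  # -1.0: the hub only connects to low-degree leaves
print(degree_assortativity(path))  # negative as well, but weaker than the star
```

Degree is only one possible attribute; the same correlation idea applies to any numeric node property.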


## poetry.toml
[virtualenvs]
create = false

## random_np_id.md

      
Last active February 4, 2023 00:58

Add a random ID column to a pandas DataFrame using NumPy

I needed to generate random IDs to partition some data for Dask when writing a Parquet file from pandas, for a less expensive operation that did not require multiple cores. I didn't like any of the answers I found, so I decided to hack this recipe together myself, partly to remind myself that I can still work from API docs :)

For efficiency, I think you want to generate the IDs with [numpy.random.randint][1] and then make a column out of them via a [pandas.Series][2], since a Series is just a [numpy.ndarray][3] with some dressing added.

One-dimensional ndarray with axis labels (including time series).
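
The approach described above could look roughly like the following. This is a minimal sketch rather than the original recipe: the function name `add_random_id`, the `n_partitions` parameter, and the `random_id` column name are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_random_id(df: pd.DataFrame, n_partitions: int, column: str = "random_id") -> pd.DataFrame:
    """Return a copy of df with a column of random integers in [0, n_partitions)."""
    # numpy.random.randint draws from the half-open interval [low, high).
    ids = np.random.randint(low=0, high=n_partitions, size=len(df))

    out = df.copy()
    # Wrap the ndarray in a Series that reuses df's index so rows align by label.
    out[column] = pd.Series(ids, index=df.index)
    return out


# Hypothetical example data.
df = pd.DataFrame({"name": ["a", "b", "c", "d", "e"]}, index=[10, 11, 12, 13, 14])
partitioned = add_random_id(df, n_partitions=3)
print(partitioned)
```

Passing `index=df.index` matters when the DataFrame does not use a default 0..n-1 index; without it, pandas aligns the new Series by label and can leave rows as NaN.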

import random

  
## first_try.py
poetry add "modin[ray]"
Using version ^0.15.2 for modin

Updating dependencies
Resolving dependencies... (60.3s)

Writing lock file

Package operations: 31 installs, 0 updates, 0 removals

## datasets.js
{
  "entity_id": "<UUID4>",
  "entity_type": "node",
  "entity_class": "",
  "@key": "conf\/www\/Ericsson07",
  "@cdate": "2021-01-01",
  "@mdate": "2022-08-31",
  "@publtype": NaN,
  "address": "",
  // Note: there is another form where author is just a string - must ETL

## 00README.md

      
Last active August 25, 2022 10:41

DBLP Types, Schemas and Example Records

DBLP Training Data

I need to create a network whose edges include a SAME_AS edge type and a NOT_SAME_AS edge type for entity resolution. This network will serve as training data so that @tanmoyio can proceed with training an entity resolution model in #3.
DBLP Datasets

DBLP is a database of scholarly research in computer science.
The datasets we use are the actual DBLP data and a set of labels for entity resolution of authors.
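
As a rough illustration of how author labels could be turned into SAME_AS and NOT_SAME_AS edges, here is a minimal sketch. It assumes the labels can be reduced to a mapping from record ID to true author ID; that mapping, the function name `label_edges`, and the sample IDs are all hypothetical and are not the actual DBLP schema.

```python
from itertools import combinations


def label_edges(record_to_author):
    """Build SAME_AS / NOT_SAME_AS edges from a record -> true-author mapping."""
    edges = []
    # Compare every unordered pair of records exactly once.
    for left, right in combinations(sorted(record_to_author), 2):
        same = record_to_author[left] == record_to_author[right]
        edges.append((left, right, "SAME_AS" if same else "NOT_SAME_AS"))
    return edges


# Hypothetical labels: two records for one author, one record for another.
labels = {"rec1": "author_A", "rec2": "author_A", "rec3": "author_B"}
training_edges = label_edges(labels)
for edge in training_edges:
    print(edge)
```

Enumerating all pairs grows quadratically with the number of records, so a real dataset would probably need to sample the NOT_SAME_AS pairs rather than emit every one.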

  
## test_etl.py
def test_graphlet_etl(spark_session_context) -> None:
    """Test the classes with Spark UDFs."""

    spark, sc = spark_session_context

    @F.pandas_udf("long")
    def text_runtime_to_minutes_pandas_udf(x: pd.Series) -> pd.Series:
        """text_runtime_to_minutes_pandas_udf PySpark pandas_udf to run text_runtime_to_minutes.

        Parameters