@ljnmedium
ljnmedium / add_data.py
Created September 29, 2023 07:29
add_data.py
# Encode each chunk's text into dense and sparse vectors
values = embedd_model.encode([b['content'] for b in batch])
sparse_values = sparsed_model.encode([b['content'] for b in batch])
# Create unique IDs
ids = [str(b['metadata']['id']) for b in batch]
# Collect per-record metadata (assumption: `metas` is not defined in the visible snippet)
metas = [b['metadata'] for b in batch]
# Add all to upsert list
to_upsert = [
    {'id': i, 'values': v, 'metadata': m, 'sparse_values': sv}
    for (i, v, m, sv) in zip(ids, values, metas, sparse_values)
]
# Upsert/insert these records to pinecone
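The record-building step above depends on a `batch` list and two encoders defined elsewhere. As a rough, self-contained sketch of the same idea, the block below uses two hypothetical stand-in encoders (`fake_dense` and `fake_sparse`, not real models) and writes each sparse vector in the `indices`/`values` dictionary form that Pinecone expects for `sparse_values`. The helper name `build_records` and the sample documents are also made up for illustration.

```python
from typing import Dict, List


def fake_dense(texts: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for a dense encoder: a 3-dim vector per text."""
    return [[float(len(t)), float(len(t.split())), 1.0] for t in texts]


def fake_sparse(texts: List[str]) -> List[Dict[str, list]]:
    """Hypothetical stand-in for a sparse encoder: counts of word lengths."""
    out = []
    for t in texts:
        counts: Dict[int, float] = {}
        for word in t.split():
            counts[len(word)] = counts.get(len(word), 0.0) + 1.0
        indices = sorted(counts)
        out.append({"indices": indices, "values": [counts[i] for i in indices]})
    return out


def build_records(batch: List[dict]) -> List[dict]:
    """Zip ids, dense vectors, metadata and sparse vectors into upsert records."""
    texts = [b["content"] for b in batch]
    ids = [str(b["metadata"]["id"]) for b in batch]
    metas = [b["metadata"] for b in batch]
    return [
        {"id": i, "values": v, "metadata": m, "sparse_values": sv}
        for i, v, m, sv in zip(ids, fake_dense(texts), metas, fake_sparse(texts))
    ]


# Made-up sample batch with the same shape the snippet above assumes.
batch = [
    {"content": "carbon emissions report", "metadata": {"id": 1, "source": "a.pdf"}},
    {"content": "board diversity", "metadata": {"id": 2, "source": "b.pdf"}},
]
records = build_records(batch)
```

With real encoders, the only parts that should need to change are the two stand-in functions; the record layout stays the same.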
ljnmedium / managing_index.py
Created September 29, 2023 07:29
managing_index.py
index.describe_index_stats()
ljnmedium / pinecone_setup.py
Created September 29, 2023 07:28
pinecone_setup.py
import pinecone
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
index = pinecone.Index("projet_esg")
ljnmedium / table_qa_models.md
Created September 29, 2023 07:23
table_qa_models.md
| | Direct query | Indirect query |
| --- | --- | --- |
| | Information appearing in the text (entity extraction, summarization, finding relevant paragraphs, etc.). | Inferred information (mathematical calculation, comparison, conclusion, etc.). |
| Simple text<br>Text containing descriptions, excluding tables. | Complexity: +<br>Accuracy: +++ | Complexity: ++<br>Accuracy: +++ |
| Complex text<br>Text | | |
ljnmedium / llm_size.md
Created September 28, 2023 09:45
llm_size.md
| Provider | Model | Number of parameters |
| --- | --- | --- |
| Meta with Microsoft | Llama 2 | 7B, 13B, 70B |
| Meta | LLaMA | 7B, 13B, 32.5B, 65.2B |
| Technology Innovation Institute (UAE) | Falcon LLM | 7B, 40B |
| Stanford's CRFM | Alpaca | 7B |
| Google | Flan-T5 | 80M, 250M, 780M, 3B, 11B |
| MosaicML | MPT | 7B, 30B |
ljnmedium / providers_llm.md
Last active September 28, 2023 09:43
providers_llm.md
| Provider | Model | Cost for input | Cost for output | Cost per request |
| --- | --- | --- | --- | --- |
| OpenAI | text-davinci-004 | $0.03 / 1K tokens | $0.06 / 1K tokens | 0 |
| OpenAI | text-davinci-003 | $0.02 / 1K tokens | $0.02 / 1K tokens | 0 |
| OpenAI | text-davinci-002 | $0.002 / 1K tokens | $0.002 / 1K tokens | 0 |
| OpenAI | gpt-3.5-turbo | $0.002 / 1K tokens | $0.002 / 1K tokens | 0 |
[Cohere](https://cohere.com/pri
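
To make the per-token prices above concrete, here is a small sketch that turns them into the cost of a single request. The prices are copied from two rows of the table as listed. The function name `request_cost` and the token counts in the example are hypothetical, chosen only for illustration.

```python
# Prices in USD per 1K tokens, copied from the table above.
PRICES = {
    "text-davinci-003": {"input": 0.02, "output": 0.02},
    "gpt-3.5-turbo": {"input": 0.002, "output": 0.002},
}


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 per_request: float = 0.0) -> float:
    """Cost of one call: token costs for input and output plus any flat fee."""
    price = PRICES[model]
    return (
        input_tokens / 1000 * price["input"]
        + output_tokens / 1000 * price["output"]
        + per_request
    )


# Hypothetical request: 1,500 prompt tokens and 500 completion tokens.
cost_turbo = request_cost("gpt-3.5-turbo", 1500, 500)
cost_davinci = request_cost("text-davinci-003", 1500, 500)
```

For the same hypothetical request, the listed prices make `text-davinci-003` ten times more expensive than `gpt-3.5-turbo`.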
ljnmedium / alex_key_feature.md
Created September 28, 2023 09:34
alex_key_feature.md
API access solution - 3rd party model. On-premise solution - open source model.
R&D development The low initial cost, both in terms of time and money, allows us to quickly reach a Minimum Viable Product (MVP). The procedure for model parameter optimization and MLOps is overseen by a third-party e
ljnmedium / tab.md
Created July 13, 2023 15:56
tab.md

| | start | length | label | text

ljnmedium / pipeline.md
Created July 12, 2023 13:15
pipeline.md
| Task | Model version | Comments |
| --- | --- | --- |
| Voice Activity Detection | Multilingual MarbleNet | Other versions exist, trained on telephonic conversations or on English data only. |
| Speaker Embeddings | TitaNet Large | A smaller version of the model exists. |
| Multiscale Clustering Diarization | MSDD Telephonic | Trained specifically on telephonic conversations, which makes it suitable for similar use cases. |
ljnmedium / conclu.md
Created July 12, 2023 13:12
conclu.md
| Model | Parameter Name | Value |
| --- | --- | --- |
| General | Input sample rate | 16 000 |
| | Batch size | 16 |
| VAD | Window length | 0.8 |
| | Shift length | 0.04 |
| | Pad onset | 0.1 |
| | Pad offset | -0.05 |
| Speaker embedding | Window length | [1.5, 1.25, 1.0, 0.75, 0.5] |
| | Shift length | [0.75, 0.625, 0.5, 0.375, 0.25] |
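
The speaker-embedding rows above describe five scales, each with its own window and shift. The sketch below shows what those values imply: it checks that every shift is half its window, and counts how many full windows fit into a clip at each scale. Reading the lengths as seconds is an assumption, and so are the helper name `segments_per_scale` and the 6-second clip. The diarization toolkit itself may treat the trailing partial window differently.

```python
import math

# Values copied from the table above; treating them as seconds is an assumption.
WINDOW_LENGTHS = [1.5, 1.25, 1.0, 0.75, 0.5]
SHIFT_LENGTHS = [0.75, 0.625, 0.5, 0.375, 0.25]


def segments_per_scale(duration, windows, shifts):
    """Count sliding windows that fit entirely inside `duration`, per scale."""
    counts = []
    for window, shift in zip(windows, shifts):
        if duration < window:
            counts.append(0)
        else:
            counts.append(math.floor((duration - window) / shift) + 1)
    return counts


# Each shift is half of its window, i.e. consecutive segments overlap by 50%.
half_overlap = all(s == w / 2 for w, s in zip(WINDOW_LENGTHS, SHIFT_LENGTHS))

# Hypothetical 6-second clip.
counts = segments_per_scale(6.0, WINDOW_LENGTHS, SHIFT_LENGTHS)
```

The shorter scales give many more segments for the same audio. That is the trade-off behind a multiscale setup: finer temporal resolution, with less speech per embedding.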