Skip to content

Instantly share code, notes, and snippets.

View oneryalcin's full-sized avatar

Mehmet Öner Yalçın oneryalcin

View GitHub Profile
import openai
import pinecone
from sentence_transformers import SentenceTransformer
class GPTConversationManager:
def __init__(self, api_key, pinecone_api_key, index_name):
self.api_key = api_key
openai.api_key = self.api_key
self.conversation_history = []
self.pinecone_api_key = pinecone_api_key
@akhan619
akhan619 / tokenizers.md
Last active October 31, 2023 10:22
Exploring Tokenizers from Hugging Face

Exploring Tokenizers from Hugging Face

Hugging Face (HF) has made NLP (Natural Language Processing) a breeze. In this post, we are going to take a look at tokenization using a hands on approach with the help of the Tokenizers library. We are going to load a real world dataset containing 10-K filings of public firms and see how to train a tokenizer from scratch based on the BERT tokenization scheme. In the process we will understand tokenization in detail and some gotchas to keep an eye out for.

Background on NLP (Optional)

If you already have an understanding of the NLP pipeline, you can safely skip this section.

For any NLP task, one of the first steps is pre-processing the data so that it can be fed into our NLP models. For those new to NLP, the general pipeline for any NLP task (text classification, question answering, etc.) is as follows:

@stettix
stettix / things-i-believe.md
Last active March 20, 2024 17:45
Things I believe

Things I believe

This is a collection of the things I believe about software development. I have worked for years building backend and data processing systems, so read the below within that context.

Agree? Disagree? Feel free to let me know at @JanStette. See also my blog at www.janvsmachine.net.

Fundamentals

Keep it simple, stupid. You ain't gonna need it.

@tarbaig
tarbaig / backend-architectures.md
Created April 13, 2018 08:53 — forked from ngocphamm/backend-architectures.md
Backend Architectures
@ipmb
ipmb / 0_default_tree.md
Last active May 17, 2022 00:37
Django Logging Variations

Default Django Logging Tree

app.py

#!/usr/bin/env python
import os

import django
import logging_tree
@kevin-smets
kevin-smets / 1_kubernetes_on_macOS.md
Last active May 5, 2024 10:12
Local Kubernetes setup on macOS with minikube on VirtualBox and local Docker registry

Requirements

Minikube requires that VT-x/AMD-v virtualization is enabled in BIOS. To check that this is enabled on OSX / macOS run:

sysctl -a | grep machdep.cpu.features | grep VMX

If there's output, you're good!

Prerequisites

@bhtucker
bhtucker / upsert.py
Last active April 3, 2024 15:56
A demonstration of Postgres upserts in SQLAlchemy
"""
Upsert gist
Requires at least postgres 9.5 and sqlalchemy 1.1
Initial state:
[]
Initial upsert:
@ivanleoncz
ivanleoncz / flask_app_logging.py
Last active May 23, 2021 07:25
Demonstration of logging feature for a Flask App.
#/usr/bin/python3
""" Demonstration of logging feature for a Flask App. """
from logging.handlers import RotatingFileHandler
from flask import Flask, request, jsonify
from time import strftime
__author__ = "@ivanleoncz"
import logging
@conormm
conormm / r-to-python-data-wrangling-basics.md
Last active April 24, 2024 18:22
R to Python: Data wrangling with dplyr and pandas

R to python data wrangling snippets

The dplyr package in R makes data wrangling significantly easier. The beauty of dplyr is that, by design, the options available are limited. Specifically, a set of key verbs form the core of the package. Using these verbs you can solve a wide range of data problems effectively in a shorter timeframe. Whilse transitioning to Python I have greatly missed the ease with which I can think through and solve problems using dplyr in R. The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).

dplyr is organised around six key verbs:

@vasanthk
vasanthk / System Design.md
Last active May 16, 2024 20:21
System Design Cheatsheet

System Design Cheatsheet

Picking the right architecture = Picking the right battles + Managing trade-offs

Basic Steps

  1. Clarify and agree on the scope of the system
  • User cases (description of sequences of events that, taken together, lead to a system doing something useful)
    • Who is going to use it?
    • How are they going to use it?