Skip to content

Instantly share code, notes, and snippets.

View faustusdotbe's full-sized avatar

Simon Hengchen faustusdotbe

View GitHub Profile
@younesbelkada
younesbelkada / finetune_llama_v2.py
Last active May 14, 2024 05:46
Fine tune Llama v2 models on Guanaco Dataset
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
@sneakers-the-rat
sneakers-the-rat / clean_pdf.sh
Last active February 14, 2024 16:52
Strip PDF Metadata
# --------------------------------------------------------------------
# Recursively find pdfs from the directory given as the first argument,
# otherwise search the current directory.
# Use exiftool and qpdf (both must be installed and locatable on $PATH)
# to strip all top-level metadata from PDFs.
#
# Note - This only removes file-level metadata, not any metadata
# in embedded images, etc.
#
# Code is provided as-is, I take no responsibility for its use,
@quadrismegistus
quadrismegistus / gensim_word2vec_measure_semantic_shift_by_local_neighborhood.py
Last active January 24, 2023 14:52
This function measures the amount of semantic shift of a given word between two gensim word2vec models. It is a basic implementation of William Hamilton (@williamleif) et al's measure of semantic change proposed in their paper "Cultural Shift or Linguistic Drift?" (https://arxiv.org/abs/1606.02821), which they call the "local neighborhood measure."
def measure_semantic_shift_by_neighborhood(model1,model2,word,k=25,verbose=False):
"""
Basic implementation of William Hamilton (@williamleif) et al's measure of semantic change
proposed in their paper "Cultural Shift or Linguistic Drift?" (https://arxiv.org/abs/1606.02821),
which they call the "local neighborhood measure." They find this measure better suited to understand
the semantic change of nouns owing to "cultural shift," or changes in meaning "local" to that word,
rather than global changes in language ("linguistic drift") use that are better suited to a
Procrustes-alignment method (also described in the same paper.)
Arguments are:
@aschmu
aschmu / pypi-mirror.md
Last active May 2, 2024 12:34
Setting up a PyPi mirror

I) Introduction

Taken from: [https://groups.google.com/forum/#!topic/devpi-dev/S-3ioWILTiY]).

We wanted to do some python developpement but we had a problem : our working network is completely isolated from any internet access, only having servers providing services like distribution packages repository mirror and web servers. The question then was how to make a full pypi mirror for usig pip install and pip search.

This is possible to be done using devpi, a web server, bandersnatch and an external hard drive. The idea is to dump all pypi packages with bandersnatch, transfer them to the server with the hard drive,

@jcberthon
jcberthon / networkmanager-wifi-powersave.md
Last active April 22, 2024 14:33
NetworkManager Wi-Fi powersaving configuration

NetworkManager WiFi Power Saving

NetworkManager supports WiFi powersaving but the function is rather undocumented.

From the source code: wifi.powersave can have the following value:

  • NM_SETTING_WIRELESS_POWERSAVE_DEFAULT (0): use the default value
  • NM_SETTING_WIRELESS_POWERSAVE_IGNORE (1): don't touch existing setting
  • NM_SETTING_WIRELESS_POWERSAVE_DISABLE (2): disable powersave
@quadrismegistus
quadrismegistus / gensim_word2vec_procrustes_align.py
Last active November 16, 2023 01:57
Code for aligning two gensim word2vec models using Procrustes matrix alignment. Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <wleif@stanford.edu>. [NOTE: This code is DEPRECATED for latest versions of gensim. Please see instead this updated version of the code <https://gist.github.com/zhicongchen/9e23…
def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
"""Procrustes align two gensim word2vec models (to allow for comparison between same word across models).
Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <wleif@stanford.edu>.
(With help from William. Thank you!)
First, intersect the vocabularies (see `intersection_align_gensim` documentation).
Then do the alignment on the other_embed model.
Replace the other_embed model's syn0 and syn0norm numpy matrices with the aligned version.
Return other_embed.
@pyguerder
pyguerder / .htaccess
Last active December 3, 2023 23:24
Installation de Flask sur un hébergement mutualisé OVH
Options +ExecCGI
AddHandler cgi-script .cgi
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ flask.cgi/$1 [QSA,L]
@steve-jansen
steve-jansen / README.md
Last active February 23, 2024 22:38
Stop and start Symantec Endpoint Protection on OS X

This script enables you stop and start Symantec Endpoint Protection on OS X

Installation

sudo curl https://gist.githubusercontent.com/steve-jansen/61a189b6ab961a517f68/raw/sep -o /usr/local/bin/sep
sudo chmod 755 /usr/local/bin/sep
sudo chown root:staff /usr/local/bin/sep

Screen Quick Reference

Basic

Description Command
Start a new session with session name screen -S <session_name>
List running sessions / screens screen -ls
Attach to a running session screen -x
Attach to a running session with name screen -r
@KartikTalwar
KartikTalwar / Documentation.md
Last active May 9, 2024 12:59
Rsync over SSH - (40MB/s over 1GB NICs)

The fastest remote directory rsync over ssh archival I can muster (40MB/s over 1gb NICs)

This creates an archive that does the following:

rsync (Everyone seems to like -z, but it is much slower for me)

  • a: archive mode - rescursive, preserves owner, preserves permissions, preserves modification times, preserves group, copies symlinks as symlinks, preserves device files.
  • H: preserves hard-links
  • A: preserves ACLs