Shahbaz Syed shahbazsyed

  • NEC Laboratories Europe
  • Heidelberg, Germany
@shahbazsyed
shahbazsyed / apple-silicon.preset.json
Created February 21, 2024 20:32 — forked from ingridstevens/apple-silicon.preset.json
Apple Metal (GPU Acceleration on) + use_mlock = off
{
  "name": "Apple Silicon",
  "load_params": {
    "n_ctx": 2048,
    "n_batch": 512,
    "rope_freq_base": 10000,
    "rope_freq_scale": 1,
    "n_gpu_layers": 1,
    "use_mlock": false,
    "main_gpu": 0,

Notes on Automatic Summarizing: factors and directions

This position paper outlines the context factors that must be considered in order to develop effective methods for summarization and for its evaluation. A key argument is that we cannot develop useful summarization systems unless we pay close attention to both the context (where summarization is applied) and the purpose (why it is done).

The paper analyses three key factors: (1) the input to the summarization model, (2) the purpose of the output summaries, and (3) the output format of the summaries.

What is a summary?

A summary is loosely defined as a reductive transformation of source text to summary text through content reduction by selection and/or generalization on what is important in the source. A possible three-step model for achieving this is:

  • I : source text interpretation (to source text representation)
  • T : source representation transformation (to summary text representation)
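As a toy illustration of this decomposition, the sketch below instantiates each step with the simplest possible choice, completing the pipeline with a generation step (G) that produces summary text from the summary representation. Interpretation builds sentences plus word counts, transformation keeps the highest-scoring sentences (reduction by selection only, no generalization), and generation emits them in source order. The function names and the frequency-based scoring are assumptions for illustration, not a method from the paper.

```python
import re
from collections import Counter


def interpret(source: str) -> tuple[list[str], Counter]:
    """I: source text -> source representation (sentences plus word counts)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", source) if s.strip()]
    counts = Counter(w for s in sentences for w in re.findall(r"[a-z]+", s.lower()))
    return sentences, counts


def transform(rep: tuple[list[str], Counter], k: int) -> list[int]:
    """T: source representation -> summary representation (indices of kept sentences)."""
    sentences, counts = rep

    def score(sentence: str) -> float:
        # Average corpus frequency of the sentence's words.
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(counts[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return sorted(ranked[:k])  # restore source order


def generate(rep: tuple[list[str], Counter], selected: list[int]) -> str:
    """G: summary representation -> summary text."""
    sentences, _ = rep
    return " ".join(sentences[i] for i in selected)


def summarize(source: str, k: int = 1) -> str:
    rep = interpret(source)
    return generate(rep, transform(rep, k))
```

A realistic system would replace each stage independently, for example a parser for I or an abstractive generator for G, which is the point of keeping the three stages separate.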

Notes on Summarizing Information

This book by Brigitte Endres-Niggemeyer (1998) covers the concept of summarizing information, its connection to cognitive psychology, how professionals summarize information, and some computational approaches to automatic summarization.

Communication and Cognition

At its core, summarizing is the process of reducing textual information to its most essential parts. It is a situationally and communicatively bound cognitive task where three principal components of human communication are employed: the storage of knowledge in memory, understanding or learning knowledge from the environment, and the generation of utterances (imparting the learnt knowledge).

Communication is tied to the principle of relevance, i.e., each communication partner expects the other's statements to influence their cognitive state in the current situation. This forms the communicative function of a discourse. Frequent functions are to

shahbazsyed / An overview of multi-task learning in NLP.md
Created May 8, 2023 12:20
Notes on multi-task learning for NLP

Notes on Multi-task learning for NLP

Multi-task learning (MTL) tackles the overfitting and data scarcity problems of deep learning methods by introducing useful information from related tasks to achieve simultaneous performance improvement on multiple related tasks.

MTL either trains machine learning models on multiple related tasks simultaneously or enhances the model for a specific task using auxiliary tasks. Learning from multiple tasks allows models to capture generalized and complementary knowledge from the tasks at hand in addition to task-specific features. MTL architectures used in NLP are categorized into four classes: parallel, hierarchical, modular, and generative adversarial architectures.

The parallel architecture shares the bulk of the model among multiple tasks, while each task has its own task-specific output layer. The hierarchical architecture models the hierarchical relationships between tasks. Such an architecture can hierarchically combine features from differe
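A minimal pure-Python sketch of the parallel architecture (often called hard parameter sharing): one shared encoder feeds a separate output head per task, and training minimizes a weighted sum of per-task losses. The class name, layer sizes, task names and task weights are hypothetical; only the forward pass and the joint loss are shown, whereas a real system would use a deep-learning framework and an optimizer.

```python
import math
import random


def linear(weights: list[list[float]], x: list[float]) -> list[float]:
    """Dense layer without bias: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class ParallelMTL:
    """Hard parameter sharing: one shared encoder, one output head per task."""

    def __init__(self, in_dim: int, hidden: int, task_out_dims: dict[str, int], seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> list[list[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.shared = init(hidden, in_dim)  # used by every task
        self.heads = {task: init(out_dim, hidden) for task, out_dim in task_out_dims.items()}

    def forward(self, x: list[float], task: str) -> list[float]:
        h = [math.tanh(v) for v in linear(self.shared, x)]  # shared representation
        return linear(self.heads[task], h)  # task-specific logits


def cross_entropy(logits: list[float], label: int) -> float:
    """Softmax cross-entropy computed with a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def joint_loss(model: ParallelMTL, batches: dict, task_weights: dict[str, float]) -> float:
    """Weighted sum of mean per-task losses (no optimizer step in this sketch)."""
    total = 0.0
    for task, examples in batches.items():
        task_loss = sum(cross_entropy(model.forward(x, task), y) for x, y in examples) / len(examples)
        total += task_weights[task] * task_loss
    return total
```

Because every task's loss depends on `shared`, minimizing the joint loss trains the shared encoder with signal from all tasks, while each head only receives signal from its own task.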

shahbazsyed / SOTA in Summarization according to HELM benchmark.md
Last active May 8, 2023 12:17
Notes on SOTA in Summarization according to HELM benchmark

SOTA in Summarization according to the HELM benchmark

Listed here are some key points relevant to the task of text summarization by large language models and their evaluation as per the HELM benchmark.

Problem setting

Text summarization is formulated as an unstructured sequence-to-sequence problem, where a document is the input and the LM is tasked with generating a summary resembling the reference summary.

Automatic Evaluation

  • ROUGE-2 correlated with more accurate models; in particular, a strong correlation was found with model size.
  • The relationship between model quality and abstraction was highly variable.
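For reference, ROUGE-2 measures bigram overlap between a generated summary and a reference summary. The sketch below is a simplified version assuming lowercase whitespace tokenization, a single reference and no stemming; standard implementations (and HELM's own configuration) apply additional preprocessing, so scores will not match them exactly.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Counts of adjacent token pairs after lowercase whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge_2(candidate: str, reference: str) -> dict[str, float]:
    """Clipped bigram overlap between a candidate summary and one reference."""
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())  # each bigram counted min(cand, ref) times
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, "the cat sat on the mat" and "the cat lay on the mat" share 3 of 5 bigrams in each direction, giving precision, recall and F1 of 0.6.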
shahbazsyed / Argument and Argumentation Theory.md
Last active May 8, 2023 12:17
Notes from the SEP article on Argument and Argumentation Theory

Argument and Argumentation Theory

Terminology

  1. An argument can be defined as a complex symbolic structure where some parts, known as the premises, offer support to another part, the conclusion.
  2. The relation of support between premises and conclusion can be cashed out in different ways: the premises may guarantee the truth of the conclusion, or make its truth more probable; the premises may imply the conclusion; the premises may make the conclusion more acceptable (or assertible).
  3. Argumentation is the exchange of arguments.
  4. The study of arguments and argumentation is also closely connected to the study of reasoning, understood as the process of reaching conclusions on the basis of careful, reflective consideration of the available information, i.e., by an examination of reasons.
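The strongest support relation in item 2, where the premises guarantee the truth of the conclusion, can be made concrete for propositional arguments: an argument is deductively valid if no assignment of truth values makes all premises true and the conclusion false. The sketch below checks this with brute-force truth tables; the helper names and examples are illustrative, and the probabilistic and acceptability-based readings of support are not modeled.

```python
from itertools import product
from typing import Callable

Valuation = dict[str, bool]
Formula = Callable[[Valuation], bool]


def is_valid(premises: list[Formula], conclusion: Formula, atoms: list[str]) -> bool:
    """Valid iff no valuation makes every premise true and the conclusion false."""
    for values in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False  # counterexample found
    return True


# Modus ponens: if it rains, the street is wet; it rains; so the street is wet.
modus_ponens = is_valid(
    [lambda v: (not v["rain"]) or v["wet"], lambda v: v["rain"]],
    lambda v: v["wet"],
    ["rain", "wet"],
)

# Affirming the consequent: if it rains, the street is wet; the street is wet; so it rains.
affirming_consequent = is_valid(
    [lambda v: (not v["rain"]) or v["wet"], lambda v: v["wet"]],
    lambda v: v["rain"],
    ["rain", "wet"],
)
```

Here `modus_ponens` evaluates to `True`, while `affirming_consequent` is `False` because the valuation with no rain but a wet street is a counterexample.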

Types of Arguments

shahbazsyed / LLM Zoo.md
Last active May 8, 2023 09:36
Ongoing list of LLMs
shahbazsyed / RLHF.md
Last active May 8, 2023 12:18
Running notes on reinforcement learning from human feedback

Notes on RLHF

Three stages of training an LLM

  1. Pretraining: an LLM is pretrained on indiscriminate web data
  2. Supervised finetuning (SFT): the pretrained language model (PLM) is then finetuned on higher quality data
  3. RLHF: the finetuned model is further polished using RLHF to make it appropriate for a broad audience
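The notes do not say how stage 3 is implemented. One widely used ingredient (for example in InstructGPT) is a reward model trained on human comparisons between two responses to the same prompt, using the pairwise loss -log sigmoid(r_chosen - r_rejected). The sketch below computes only that loss and assumes scalar rewards have already been produced by some reward model; the model itself and the subsequent policy optimization (e.g. PPO) are out of scope.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def pairwise_reward_loss(pairs: list[tuple[float, float]]) -> float:
    """Mean of -log sigmoid(r_chosen - r_rejected) over human preference pairs."""
    return -sum(log_sigmoid(chosen - rejected) for chosen, rejected in pairs) / len(pairs)
```

The loss is log 2 when the reward model cannot tell the two responses apart, and it shrinks as the preferred response is scored increasingly higher than the rejected one.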

Pretraining is the most resource-intensive phase; SFT and RLHF can be seen as unlocking capabilities that the pretrained model already has but that users would find hard to elicit via prompting alone.

There are two types of data required besides the scraped web data used for pretraining:

shahbazsyed / LLM.md
Created March 29, 2023 10:34 — forked from rain-1/LLM.md
LLM Introduction: Learn Language Models

Purpose

Bootstrap knowledge of LLMs ASAP, with a bias/focus toward GPT.

Avoid being a link dump. Try to provide only valuable, well-tuned information.

Prelude

Neural network links before starting with transformers.

source ~/miniconda3/bin/activate allen
LANG=en
TASK=qa_en_small
for SPLIT in train valid
do
    python -m examples.roberta.multiprocessing_bpe_encoder \
        --encoder-json encoder.json \
        --vocab-bpe vocab.bpe \
        --inputs "$TASK/$SPLIT.$LANG" \