@yuinchien
Last active February 3, 2024 16:31
AI Glossary
TERM DEFINITION SOURCE LINK
AGI. Artificial General Intelligence An AGI could learn to accomplish any intellectual task that human beings or animals can perform. Alternatively, AGI has been defined as an autonomous system that surpasses human capabilities in the majority of economically valuable tasks. Source
Adversarial suffix A string of random-seeming characters appended to a prompt that makes the LLM significantly more likely to return an unfiltered response. Source Demo
AI. Artificial Intelligence Artificial intelligence (AI) is the intelligence of machines or software, as opposed to the intelligence of humans or animals. It is a field of study in computer science which develops and studies intelligent machines. Source
AI Safety An interdisciplinary field concerned with preventing accidents, misuse, or other harmful consequences that could result from artificial intelligence (AI) systems. It encompasses machine ethics and AI alignment, which aim to make AI systems moral and beneficial, as well as technical problems such as monitoring systems for risks and making them highly reliable. Beyond AI research, it involves developing norms and policies that promote safety. Source
Attention A mechanism used in a neural network that indicates the importance of a particular word or part of a word. Attention compresses the amount of information a model needs to predict the next token/word. A typical attention mechanism might consist of a weighted sum over a set of inputs, where the weight for each input is computed by another part of the neural network. Source
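A minimal sketch of the weighted-sum view of attention, assuming NumPy; the scaled dot-product form and the shapes below are illustrative, not a specific model's implementation.

```python
# Minimal scaled dot-product attention as a weighted sum over inputs (NumPy).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # how relevant each key is to each query
    weights = softmax(scores, axis=-1)        # attention weights computed from the inputs
    return weights @ V                        # weighted sum over the inputs (values)

Q = np.random.randn(4, 8)   # 4 query positions, dimension 8
K = np.random.randn(6, 8)   # 6 key/value positions
V = np.random.randn(6, 8)
out = attention(Q, K, V)    # shape (4, 8)
```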
Alignment AI alignment research aims to steer AI systems towards humans' intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues some objectives, but not the intended ones. Source
Prompt Injection Attack An attack that uses carefully crafted prompts to make the model ignore previous instructions or perform unintended actions. Source
Backpropagation A crucial step in a common method used to iteratively train a neural network model. It is used to calculate the parameter adjustments needed to gradually minimize error. Source
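A toy sketch of the underlying idea, assuming NumPy: run a forward pass, propagate the error back to a gradient for each parameter, and adjust the parameters to reduce the error. The single linear neuron and the values below are illustrative.

```python
# Toy backpropagation for a single linear neuron with a squared-error loss (NumPy).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))          # inputs
true_w = np.array([1.5, -2.0, 0.5])
y = x @ true_w                         # targets

w = np.zeros(3)                        # parameters to learn
lr = 0.1
for _ in range(200):
    pred = x @ w                       # forward pass
    err = pred - y
    grad_w = x.T @ err / len(x)        # backward pass: gradient of the error w.r.t. w
    w -= lr * grad_w                   # adjust parameters to reduce the error
```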
Bias The idea that machine learning algorithms can be biased when carrying out their programmed tasks, such as analyzing data or producing content. AI is typically biased in ways that uphold harmful beliefs, like race and gender stereotypes. Source
Context Window The “context window” refers to how much text a language model can look back on and reference when attempting to generate text. This is different from the large corpus of data the language model was trained on, and instead represents more of a “working memory” for the model. Source
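An illustrative sketch in plain Python of how an application might keep only the most recent messages that fit inside a fixed context window; the word-count token counter is a stand-in for a real tokenizer.

```python
# Keep only the most recent messages that fit in a fixed context window.
# count_tokens is a stand-in; real systems use the model's own tokenizer.
def fit_to_context(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    kept, total = [], 0
    for msg in reversed(messages):              # walk from newest to oldest
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break                               # older history falls out of "working memory"
        kept.append(msg)
        total += cost
    return list(reversed(kept))                 # restore chronological order
```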
Data poisoning An artificial intelligence poisoning attack occurs when an AI model's training data is intentionally tampered with, affecting the outcomes of the model's decision-making processes. Despite the black-box nature of AI models, these attacks seek to deceive the AI system into making incorrect or harmful decisions. Source
Deep Learning A method in artificial intelligence (AI) that teaches computers to process data in a way that is inspired by the human brain. Deep learning models can recognize complex patterns in pictures, text, sounds, and other data to produce accurate insights and predictions. Source
Dictionary Learning A method for learning, from training data, a dictionary of basis elements such that each data point can be represented as a sparse combination of those elements. Source
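For a concrete feel, a hedged sketch using scikit-learn's DictionaryLearning on toy data; the data and parameter choices below are illustrative.

```python
# Learn a dictionary of atoms so each sample is a sparse combination of them (scikit-learn).
import numpy as np
from sklearn.decomposition import DictionaryLearning

X = np.random.randn(200, 20)                 # toy data: 200 samples, 20 features
dl = DictionaryLearning(n_components=10, transform_algorithm="lasso_lars", random_state=0)
codes = dl.fit_transform(X)                  # sparse codes, shape (200, 10)
atoms = dl.components_                       # learned dictionary, shape (10, 20)
```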
Feature Engineering The process of using domain knowledge to select and transform the most relevant variables from raw data when creating a predictive model using machine learning or statistical modeling. The goal of feature engineering and selection is to improve the performance of machine learning (ML) algorithms. Source
Generative AI Generative AI enables users to quickly generate new content based on a variety of inputs. Inputs and outputs to these models can include text, images, sounds, animation, 3D models, or other types of data. Source
GAN. Generative Adversarial Network A generative adversarial network (GAN) has two parts: The generator learns to generate plausible data. The generated instances become negative training examples for the discriminator. The discriminator learns to distinguish the generator's fake data from real data. The discriminator penalizes the generator for producing implausible results. When training begins, the generator produces obviously fake data, and the discriminator quickly learns to tell that it's fake. Source
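A compact, hypothetical sketch of the generator/discriminator training loop, assuming PyTorch and a one-dimensional toy data distribution.

```python
# Generator vs. discriminator on a 1-D toy distribution (PyTorch).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                 # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())   # discriminator
g_opt = torch.optim.Adam(G.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(32, 1) * 2 + 5                      # "real" data drawn from N(5, 2)
    fake = G(torch.randn(32, 8))                           # generator's attempt at plausible data

    # Discriminator learns to label real as 1 and fake as 0
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator is penalized when the discriminator spots its fakes
    g_loss = bce(D(G(torch.randn(32, 8))), torch.ones(32, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```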
Hallucination AI hallucinations are incorrect or misleading results that AI models generate. These errors can be caused by a variety of factors, including insufficient training data, incorrect assumptions made by the model, or biases in the data used to train the model. AI hallucinations can be a problem for AI systems that are used to make important decisions, such as medical diagnoses or financial trading. Source
Interpretability Models are interpretable when humans can readily understand the reasoning behind predictions and decisions made by the model. The more interpretable the models are, the easier it is for someone to comprehend and trust the model. Models such as deep learning and gradient boosting are not interpretable and are referred to as black-box models because they are too complex for human understanding. It is impossible for a human to comprehend the entire model at once and understand the reasoning behind each decision. Source
LLM. Large Language Model A large language model (LLM) is a type of artificial intelligence (AI) algorithm that uses deep learning techniques and massively large data sets to understand, summarize, generate and predict new content. Source
Machine Learning The study of computer algorithms that improve automatically through experience and by the use of data. Key concepts include supervised, unsupervised, and reinforcement learning. Source
Multimodal Multimodal AI is artificial intelligence that combines multiple types, or modes, of data to create more accurate determinations, draw insightful conclusions or make more precise predictions about real-world problems. Multimodal AI systems train with and use video, audio, speech, images, text and a range of traditional numerical data sets. Most importantly, multimodal AI means numerous data types are used in tandem to help AI establish content and better interpret context, something missing in earlier AI. Source
Neural Networks Neural networks (NNs) or neural nets are a branch of machine learning models that are built using principles of neuronal organization discovered by connectionism in the biological neural networks constituting animal brains. Source
NLP. Natural Language Processing A machine learning technology that gives computers the ability to interpret, manipulate, and comprehend human language. Source
Prompt Engineering The process of structuring text that can be interpreted and understood by a generative AI model. A prompt is natural language text describing the task that an AI should perform. Source
Pre-training The initial phase of training a machine learning model where the model learns general features, patterns, and representations from the data without specific knowledge of the task it will later be applied to. This unsupervised or semi-supervised learning process enables the model to develop a foundational understanding of the underlying data distribution and extract meaningful features that can be leveraged for subsequent fine-tuning on specific tasks. Source
RAG. Retrieval-Augmented Generation A technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources. Source
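A hedged sketch of the retrieve-then-generate pattern in plain Python; the word-overlap retriever and the commented-out generate() call are illustrative placeholders, not a specific framework's API.

```python
# Retrieve relevant documents, then prepend them to the prompt before generation.
def retrieve(query, documents, top_k=2):
    query_words = set(query.lower().split())
    return sorted(documents,
                  key=lambda d: len(query_words & set(d.lower().split())),
                  reverse=True)[:top_k]

def build_prompt(query, documents):
    context = "\n".join(retrieve(query, documents))
    return f"Use the context below to answer.\n\nContext:\n{context}\n\nQuestion: {query}"

# answer = generate(build_prompt("When was the telescope launched?", corpus))
```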
RLHF. Reinforcement Learning from Human Feedback Reinforcement Learning from Human Feedback is a means to take a pretrained language model and encourage it to behave in ways that are consistent with what humans prefer. This can include “helping it to follow instructions” or “helping it to act more like a chat bot”. The human feedback consists of humans ranking sets of two or more example texts, and the reinforcement learning encourages the model to prefer outputs that are similar to the higher-ranked ones. Source
Singularity In the context of AI, the singularity (also known as the technological singularity) refers to a hypothetical future point in time when technological growth becomes uncontrollable and irreversible, leading to unforeseeable changes to human civilization. Source
Transformer A transformer model is a neural network that learns context and thus meaning by tracking relationships in sequential data like the words in this sentence. Source
Temperature Temperature is a parameter that controls the randomness of a model's predictions during generation. Higher temperature leads to more creative samples that enable multiple variations in phrasing (and in the case of fiction, variation in answers as well), while lower temperature leads to more conservative samples that stick to the most-probable phrasing and answer. Adjusting the temperature is a way to encourage a language model to explore rare, uncommon, or surprising next words or sequences, rather than only selecting the most likely predictions. Source
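An illustrative sketch, assuming NumPy, of how temperature rescales the model's next-token logits before sampling; the logit values are made up.

```python
# Temperature-scaled sampling over next-token logits (NumPy; values are illustrative).
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=np.random.default_rng()):
    scaled = np.asarray(logits, dtype=float) / temperature   # low T sharpens, high T flattens
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.1]
# temperature=0.2 almost always returns index 0; temperature=2.0 spreads probability
# mass across the less likely tokens, giving more varied output.
```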
Token In the context of AI, tokens are the basic units of text or code that AI models use to process and generate language. These tokens can be characters, words, subwords, or other segments of text or code, depending on the chosen tokenization method or scheme. Source
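A deliberately simplified sketch in plain Python; production tokenizers use learned subword vocabularies (for example, byte-pair encoding) rather than whitespace splitting.

```python
# Simplified tokenization: a whitespace split illustrates mapping text to the
# integer IDs a model consumes, in place of a real subword tokenizer.
text = "Tokens are the basic units of text."
tokens = text.lower().split()                          # ['tokens', 'are', 'the', ...]
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[tok] for tok in tokens]             # what the model actually processes
```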