jaigouk/recap.md

## recap.md

      
marp: true
theme: gaia
_class: lead
paginate: true
backgroundColor:
backgroundImage: url('https://marp.app/assets/hero-background.svg')
  
  
GTC 2024

The AI Conference from NVIDIA

Agenda


Why NVIDIA GTC (GPU Technology Conference)?
What's new from NVIDIA?
NVIDIA Inference Microservices
GTC sessions that are interesting
Triton server
Architecting for the New Language Model Stack [S62702]
Summaries of other talks (OpenAI, Together, AI careers)


Why NVIDIA GTC (GPU Technology Conference)?


For AI, what are the hardware options?
Google TPU, Groq LPU, AMD ROCm™

Ollama supports AMD GPUs
AMD firmware is not open, and tinygrad is struggling with it
Groq does not offer a way to run fine-tuned LLMs, so it is not model-agnostic yet
NVIDIA GPUs hold a dominant position


What's new from NVIDIA?


NVIDIA is getting serious about cloud services (https://build.nvidia.com/explore/discover)
Pricing: $4500 per GPU for 1 year, or $1 per GPU for 1 hour of use
Docker & Kubernetes: for maximum performance you can't ignore NVIDIA's firmware layer
Benchmarking LLMs with Performance Analyzer
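
To put the two prices quoted above ($4500 per GPU for 1 year, $1 for 1 hour) side by side, here is a minimal break-even sketch. It assumes the hourly rate is billed only for hours actually used and that both prices cover the same offering; the function names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours in a non-leap year


def break_even_hours(annual_price: float, hourly_price: float) -> float:
    """GPU-hours per year at which hourly billing costs the same as the annual price."""
    return annual_price / hourly_price


def cheaper_option(hours_used: float, annual_price: float = 4500.0,
                   hourly_price: float = 1.0) -> str:
    """Return which plan is cheaper for a given number of GPU-hours per year."""
    return "annual" if hours_used * hourly_price > annual_price else "hourly"


hours = break_even_hours(4500.0, 1.0)
utilization = hours / HOURS_PER_YEAR
print(f"break-even: {hours:.0f} GPU-hours/year ({utilization:.0%} utilization)")
```

Under these assumptions, the annual price only pays off when a GPU is busy for more than roughly half of the year.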


NIM (NVIDIA Inference Microservices)

import os

import streamlit as st  # `st` below is assumed to be Streamlit
from dotenv import load_dotenv
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

# Load NVIDIA_API_KEY from a local .env file and fail early if it is missing
load_dotenv()
if not os.getenv("NVIDIA_API_KEY"):
    raise RuntimeError("NVIDIA_API_KEY is not set")

# Embedding: separate embedders for passages (documents) and queries
document_embedder = NVIDIAEmbeddings(model="nvolveqa_40k", model_type="passage")
query_embedder = NVIDIAEmbeddings(model="nvolveqa_40k", model_type="query")

# LLM
prompt_template = ChatPromptTemplate.from_messages(
    [("system", "You are a helpful AI assistant"), ("user", "{input}")]
)
user_input = st.chat_input("Can you tell me what NVIDIA is known for?")
llm = ChatNVIDIA(model="mixtral_8x7b")

# Pipe the prompt into the model, then parse the reply into a plain string
chain = prompt_template | llm | StrOutputParser()
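
The `prompt_template | llm | StrOutputParser()` line composes three steps into one chain. The sketch below mimics that idea with a toy `Step` class so the data flow is visible without an API key. `Step`, `fake_llm`, and the echo behaviour are hypothetical stand-ins, not LangChain classes or its actual implementation.

```python
from typing import Any, Callable


class Step:
    """Toy composable pipeline step (hypothetical, not a LangChain class)."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its output to b
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


# Stand-ins for the prompt template, the model, and the output parser
prompt = Step(lambda d: f"system: You are a helpful AI assistant\nuser: {d['input']}")
fake_llm = Step(lambda text: {"content": f"(echo) {text.splitlines()[-1]}"})
parser = Step(lambda message: message["content"])

toy_chain = prompt | fake_llm | parser
print(toy_chain.invoke({"input": "What is NVIDIA known for?"}))
```

Each step only needs to accept the previous step's output, which is why a model or parser can be swapped without touching the rest of the chain.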


| 18th | 19th | 20th/21st |
| --- | --- | --- |
| GTC 2024 Keynote [S62542] | Generally Capable Agents in Open-Ended Worlds [S62816] | ⭐Transforming AI [S63046] |
| ⭐Deploying, Optimizing, and Benchmarking Large Language Models With Triton Inference Server [S62531] | ⭐Retrieval Augmented Generation: Overview of Design Systems, Data, and Customization [S62744] | ⭐Architecting for the New Language Model Stack [S62702] |


Check triton-inference-server tutorials


Triton server

# Dockerfile: extend the Triton image with the Python packages the model needs
FROM nvcr.io/nvidia/tritonserver:23.10-py3
RUN pip install transformers==4.34.0 protobuf==3.20.3 sentencepiece==0.1.99 accelerate==0.23.0 einops==0.6.1

# Copy the model into the model repository, then build the image
mkdir -p model_repository
cp -r hermes_2_pro/ model_repository/
docker build -t triton_transformer_server .

# Start Triton with the model repository mounted into the container
docker run --gpus all -it --rm --net=host \
--shm-size=1G --ulimit memlock=-1 \
--ulimit stack=67108864 \
-v ${PWD}/model_repository:/opt/tritonserver/model_repository \
triton_transformer_server tritonserver --model-repository=model_repository

# Note that the request path includes the model name under /v2/models/
curl -X POST localhost:8000/v2/models/hermes_2_pro/infer \
-d '{"inputs": [{"name":"text_input","datatype":"BYTES","shape":[1],"data":["I am going"]}]}'
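
The same request can be built from Python with only the standard library. This is a sketch that mirrors the curl call above; the helper names `build_infer_request` and `infer` are made up here, and `infer` needs the Triton container from the previous step to be running on `localhost:8000`.

```python
import json
import urllib.request


def build_infer_request(model_name: str, text: str, host: str = "localhost:8000"):
    """Return the URL and JSON body for the inference request shown in the curl call."""
    url = f"http://{host}/v2/models/{model_name}/infer"
    payload = {
        "inputs": [
            {"name": "text_input", "datatype": "BYTES", "shape": [1], "data": [text]}
        ]
    }
    return url, json.dumps(payload).encode("utf-8")


def infer(model_name: str, text: str, host: str = "localhost:8000") -> dict:
    """POST the request to a running Triton server and return the parsed JSON reply."""
    url, body = build_infer_request(model_name, text, host)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())


# Building the request does not need a server; calling infer() does.
url, body = build_infer_request("hermes_2_pro", "I am going")
print(url)
```

With the server up, `infer("hermes_2_pro", "I am going")` would send the request; the shape of the reply depends on how the model's outputs are configured.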


Language Model is the new stack


Example of Single vs Double loop


Language Model Stack Old vs New


Self reflection with code


Foundation Model for Robots: GR00T


Transforming AI panel talk

Speakers: Jensen Huang (NVIDIA), Ashish Vaswani (Essential AI), Noam Shazeer (Character AI), Aidan Gomez (Cohere), and others

LLMs let software understand textual prompts and generate images from them, marking the beginning of a new Industrial Revolution.
Future: adaptive computation and universal transformers. Once models have reasoning capability, they no longer need lots of data, and the quality of the data matters instead.
Evals: measuring progress matters, and so does observing whether the task was actually finished.


OpenAI

Speaker: Brad Lightcap, Chief Operating Officer, OpenAI

Start small, then tackle bigger issues
Use various sizes of language models for different tasks
Monitor and swap models and agents as needed
Aim for reasoning agents for complex actions
Example: AI for patient care - from data to treatment
Adapt interfaces for changing user interactions


Together

Speaker: Percy Liang, Co-Founder, Together AI

We can think of foundation models as infrastructure.
The Center for Research on Foundation Models (CRFM)

site: https://crfm.stanford.edu/blog.html
HELM (Holistic Evaluation of Language Models): https://crfm.stanford.edu/helm/lite/latest/


Anything new? KTO (Kahneman-Tversky Optimization) training


Navigating AI Careers in Europe [SE62721]

AI is democratizing tech. You don't need to know the details of LLMs.
Everything will have some AI in it. Dev jobs are not secure.
All employees should be upskilled for AI, whether they are technical or not, so that they have the latest skills. If their skills are outdated, the company will soon be outdated too, and its performance will suffer compared to other companies.
Regarding coding: we are not at AGI yet, but the way coding is done is changing. The role will not stay the same; it will look more like orchestration. Validating business requirements may still matter.