@jaigouk
Last active March 23, 2024 21:30
Recap GTC 2024
marp: true
theme: gaia
_class: lead
paginate: true
backgroundColor:
backgroundImage:

bg left:40% 80% invert

GTC 2024

The AI Conference from NVIDIA


Agenda

  1. Why NVIDIA GTC (GPU Tech Conference)?
  2. What's new from NVIDIA?
  3. NVIDIA Inference Microservices
  4. GTC sessions that are interesting
  5. Tritonserver
  6. Architecting for the New Language Model Stack [S62702]
  7. Summaries of other talks (OpenAI, Together, job topic)

Why NVIDIA GTC (GPU Tech Conference)?

bg left:30% 90%

For AI, what are the options? Google TPU, Groq LPU, AMD ROCm™


bg 100%


What's new from NVIDIA?

bg right:35% 90%


NIM (NVIDIA Inference Microservices)

import os

import streamlit as st  # assumption: `st` in the original snippet is Streamlit
from dotenv import load_dotenv
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

# Load NVIDIA_API_KEY from a local .env file into the environment
load_dotenv()
if not os.getenv("NVIDIA_API_KEY"):
    raise RuntimeError("NVIDIA_API_KEY is not set")

# Embedding: separate embedders for passages (documents) and for queries
document_embedder = NVIDIAEmbeddings(model="nvolveqa_40k", model_type="passage")
query_embedder = NVIDIAEmbeddings(model="nvolveqa_40k", model_type="query")

# LLM
llm = ChatNVIDIA(model="mixtral_8x7b")

prompt_template = ChatPromptTemplate.from_messages(
    [("system", "You are a helpful AI assistant"), ("user", "{input}")]
)
chain = prompt_template | llm | StrOutputParser()

# Read the user's message from the chat box and answer it with the chain
user_input = st.chat_input("Can you tell me what NVIDIA is known for?")
if user_input:
    st.write(chain.invoke({"input": user_input}))
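
The snippet above creates a passage embedder and a query embedder but never uses them. A minimal sketch of what they are typically for, ranking passages against a query by cosine similarity, might look like the following. The vectors and passages here are made-up stand-ins; in a real setup they would presumably come from `document_embedder.embed_documents(...)` and `query_embedder.embed_query(...)`, the standard LangChain embeddings methods.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query_vector, passage_vectors, passages, top_k=2):
    """Return the top_k passages, most similar to the query first."""
    scored = [
        (cosine_similarity(query_vector, vector), passage)
        for vector, passage in zip(passage_vectors, passages)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_k]]


# Hypothetical toy data standing in for real embedder output.
passages = ["NVIDIA designs GPUs.", "Triton serves models.", "GTC is a conference."]
passage_vectors = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.2, 0.9]]
query_vector = [0.8, 0.2, 0.1]

top_passages = rank_passages(query_vector, passage_vectors, passages, top_k=2)
print(top_passages)
```

The retrieved passages could then be placed into the prompt before calling the chain, which is the basic shape of retrieval augmented generation.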

| 18th | 19th | 20th/21st |
| --- | --- | --- |
| GTC 2024 Keynote [S62542] | Generally Capable Agents in Open-Ended Worlds [S62816] | Transforming AI [S63046] |
| Deploying, Optimizing, and Benchmarking Large Language Models With Triton Inference Server [S62531] | Retrieval Augmented Generation: Overview of Design Systems, Data, and Customization [S62744] | Architecting for the New Language Model Stack [S62702] |

Check triton-inference-server tutorials

tritonserver2
triton_features
no-vendor-lockin

Triton server

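# Dockerfile: Triton base image plus the Python packages the model needs
FROM nvcr.io/nvidia/tritonserver:23.10-py3
RUN pip install transformers==4.34.0 protobuf==3.20.3 sentencepiece==0.1.99 accelerate==0.23.0 einops==0.6.1

# Shell: copy the model into the model repository, then build the image
mkdir -p model_repository
cp -r hermes_2_pro/ model_repository/
docker build -t triton_transformer_server .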

docker run --gpus all -it --rm --net=host \
--shm-size=1G --ulimit memlock=-1 \
--ulimit stack=67108864 \
-v ${PWD}/model_repository:/opt/tritonserver/model_repository \
triton_transformer_server tritonserver --model-repository=model_repository

# Note that the model name (hermes_2_pro) is part of the request path
curl -X POST localhost:8000/v2/models/hermes_2_pro/infer \
-d '{"inputs": [{"name":"text_input","datatype":"BYTES","shape":[1],"data":["I am going"]}]}'
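
The same request can be sent from Python. The sketch below only mirrors the `curl` call above using the standard library; the helper names (`build_infer_request`, `infer_url`, `infer`) are hypothetical, and the host, port, input name, and model name are taken from that command rather than from any particular client library.

```python
import json
import urllib.request


def build_infer_request(prompt):
    """Build the JSON body used in the curl example: one BYTES input named text_input."""
    return {
        "inputs": [
            {"name": "text_input", "datatype": "BYTES", "shape": [1], "data": [prompt]}
        ]
    }


def infer_url(model_name, host="localhost", port=8000):
    """The model name is part of the path: /v2/models/<model_name>/infer."""
    return f"http://{host}:{port}/v2/models/{model_name}/infer"


def infer(model_name, prompt, timeout=30.0):
    """POST the request to a running Triton server and return the parsed JSON response."""
    body = json.dumps(build_infer_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        infer_url(model_name),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Printing the URL and body needs no server; infer() requires the container to be running.
print(infer_url("hermes_2_pro"))
print(json.dumps(build_infer_request("I am going")))
```

With the container from the `docker run` step up, `infer("hermes_2_pro", "I am going")` should return the server's JSON response; the exact output fields depend on how the model is configured.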

software2
single_vs_double_1

Example of Single vs Double loop

single_vs_double_2

Language Model Stack Old vs New

mlstack_old_vs_new
groot-coding groot-0

Foundation Model for Robots: GR00T

groot-1 groot-2

Speakers: Jensen Huang(NVIDIA), Ashish Vaswani(Essential AI), Noam Shazeer(Character AI), Aidan Gomez(Cohere), etc

  • LLMs allow software to understand and generate images from textual prompts, marking the beginning of a new Industrial Revolution.
  • Future directions: adaptive computation and universal transformers. With reasoning capability, models will not need as much data, so the quality of the data will matter more.
  • Evals: measuring progress matters, and so does observing whether the task actually got finished.

OpenAI

Speaker: Brad Lightcap, Chief Operating Officer, OpenAI

  • Start small, then tackle bigger issues
  • Use various sizes of language models for different tasks
  • Monitor and swap models and agents as needed
  • Aim for reasoning agents for complex actions
  • Example: AI for patient care - from data to treatment
  • Adapt interfaces for changing user interactions
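
One way to read "use various sizes of language models for different tasks" and "swap models as needed" is a small routing table that maps task complexity to a model tier. The sketch below is purely illustrative: the tier names, the 1 to 10 complexity scale, and the thresholds are all assumptions and were not part of the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str            # hypothetical model identifier
    max_complexity: int  # highest task complexity this tier should handle


# Hypothetical tiers; swapping a model means replacing one entry in this list.
TIERS = [
    ModelTier("small-model", 3),
    ModelTier("medium-model", 7),
    ModelTier("large-model", 10),
]


def pick_model(complexity, tiers=TIERS):
    """Return the smallest tier that can handle the task, else the largest tier."""
    ordered = sorted(tiers, key=lambda tier: tier.max_complexity)
    for tier in ordered:
        if complexity <= tier.max_complexity:
            return tier.name
    return ordered[-1].name


for task_complexity in (2, 5, 10):
    print(task_complexity, pick_model(task_complexity))
```

Keeping the mapping in data rather than in code makes it easy to monitor each tier separately and replace one model without touching callers.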

Together

Speaker: Percy Liang, Co-Founder, Together AI


Navigating AI Careers in Europe [SE62721]

  • AI is democratizing tech. You don't need to know the details of LLMs.
  • Everything will have some AI in it. Dev jobs are not secure.
  • All employees should be upskilled for AI, whether they are technical or not, so that their skills stay current. If employees' skills are outdated, the company will soon be outdated too, and its performance will suffer compared to other companies.
  • Regarding coding: we're not at AGI yet, but the way coding is done is changing. The role will not stay the same; it will be more like orchestration. Validating business requirements may still matter.