Skip to content

Instantly share code, notes, and snippets.

@lsetiawan
Created March 29, 2024 20:50
Show Gist options
  • Save lsetiawan/ed5d45f037970515e75db0add987a1aa to your computer and use it in GitHub Desktop.
Save lsetiawan/ed5d45f037970515e75db0add987a1aa to your computer and use it in GitHub Desktop.
A demo of exploration with arXiv dataset, RAG Based Approach, Qdrant, MiniLM embeddings, and OLMo, connected together with LangChain
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Simple RAG Demo with OLMo and ArXiv `astro-ph` Dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The RAG based approach below is a demonstration of how to use [OLMo-1B](https://huggingface.co/allenai/OLMo-1B) LLM model by AI2 to generate an abstract completion for a given input text. The input text is a random starting abstract from `astro-ph` category of [ArXiv Dataset](https://www.kaggle.com/datasets/Cornell-University/arxiv). The abstract completion is generated by the model using the RAG approach. The RAG approach retrieves relevant documents from [Qdrant Vector Database](https://qdrant.tech/), which provides contextual information to the model for generating the completion.\n",
"\n",
"The input text was retrieved from the [AstroLLaMa Paper](https://arxiv.org/abs/2309.06126). Rather than fine-tuning a model, we wanted to see if RAG approach can also work."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will use the following statement as user input:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"statement = \"\"\"The Magellanic Stream (MS) - an enormous ribbon of gas spanning 140∘ of the southern\n",
"sky trailing the Magellanic Clouds - has been exquisitely mapped in the five decades since\n",
"its discovery. However, despite concerted efforts, no stellar counterpart to the MS has been\n",
"conclusively identified. This stellar stream would reveal the distance and 6D kinematics of\n",
"the MS, constraining its formation and the past orbital history of the Clouds. We\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Utility Functions"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import zipfile\n",
"import json\n",
"import pandas as pd\n",
"import io\n",
"import fsspec\n",
"\n",
"def fetch_arxiv_dataset(zip_url: str) -> pd.DataFrame:\n",
" cols = ['id', 'title', 'abstract', 'categories']\n",
"\n",
" with fsspec.open(zip_url) as f:\n",
" with zipfile.ZipFile(f) as archive:\n",
" data = []\n",
" json_file = archive.filelist[0]\n",
" with archive.open(json_file) as f:\n",
" for line in io.TextIOWrapper(f, encoding=\"latin-1\"):\n",
" doc = json.loads(line)\n",
" lst = [doc['id'], doc['title'], doc['abstract'], doc['categories']]\n",
" data.append(lst)\n",
" \n",
" df_data = pd.DataFrame(data=data, columns=cols)\n",
" return df_data\n",
"\n",
"# https://github.com/allenai/open-instruct/blob/main/eval/templates.py\n",
"def create_prompt_with_olmo_chat_format(messages, bos=\"|||IP_ADDRESS|||\", eos=\"|||IP_ADDRESS|||\", add_bos=True):\n",
" formatted_text = \"\"\n",
" for message in messages:\n",
" if message[\"role\"] == \"system\":\n",
" formatted_text += \"<|system|>\\n\" + message[\"content\"] + \"\\n\"\n",
" elif message[\"role\"] == \"user\":\n",
" formatted_text += \"<|user|>\\n\" + message[\"content\"] + \"\\n\"\n",
" elif message[\"role\"] == \"assistant\":\n",
" formatted_text += \"<|assistant|>\\n\" + message[\"content\"].strip() + eos + \"\\n\"\n",
" else:\n",
" raise ValueError(\n",
" \"Olmo chat template only supports 'system', 'user' and 'assistant' roles. Invalid role: {}.\".format(message[\"role\"])\n",
" )\n",
" formatted_text += \"<|assistant|>\\n\"\n",
" formatted_text = bos + formatted_text # forcibly add bos\n",
" return formatted_text\n",
"\n",
"# Post-processing\n",
"def format_docs(docs):\n",
" return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve documents (arXiv `astro-ph` abstracts)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This section retrieves the arXiv abstracts and creates documents\n",
"for loading into a vector database. You can skip running the following sections\n",
"if you have a local copy of the Qdrant Vector Database data ready to go."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders import DataFrameLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"zip_url = \"https://storage.googleapis.com/kaggle-data-sets/612177/7925852/bundle/archive.zip?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=gcp-kaggle-com@kaggle-161607.iam.gserviceaccount.com/20240327/auto/storage/goog4_request&X-Goog-Date=20240327T183523Z&X-Goog-Expires=259200&X-Goog-SignedHeaders=host&X-Goog-Signature=4747ce35edc693785c00b4ade2fc7f62149173bf160f1b04f97fc6a752bfb1ccb5408359a16b475e7d955f04a52f2fb9f916d8090330993839fabfb1835847e0c62452243ecc74e232eeed1d747beaf6da1209b9614d305c020e6bd09bb096e6c6e2bb4711d96fb457ed1533c04bb78690253d3b6f4a4068aa3b9cd073742a3ed68562fa2a88a29e646a629dee0a26f99ff0539b5f81c926bc2b5a62642ac9f0a92febc7ca812a61351191334baad93b3ecca2ac408da8ca35a4d6e8afda67d6e8196b50c20ee18358a19cb21c25dfbcc7394bc99b280ed9222c8a933ea91f7d4b65aba05156ab985b36e761a70a35f6bbd208b9507a04ff68e15c258ec5920f\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Fetch the dataset containing all arXiv abstracts\n",
"df_data = fetch_arxiv_dataset(zip_url)\n",
"# Filter the dataset to only include astro-ph category\n",
"astro_df = df_data[df_data.categories.str.contains('astro-ph')].reset_index(drop=True)\n",
"print(\"Number of astro-ph papers: \", len(astro_df))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Eargerly load the dataframe full of abstracts\n",
"# to memory in the form of langchain Document objects\n",
"loader = DataFrameLoader(astro_df, page_content_column=\"abstract\")\n",
"documents = loader.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Document Embeddings to Qdrant Vector Database"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.vectorstores import Qdrant\n",
"from langchain.embeddings import HuggingFaceEmbeddings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Setup Vector DB"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import os"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"qdrant_path=\"./local_qdrant\"\n",
"qdrant_collection=\"arxiv_astro-ph_abstracts\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Setup the embedding, we are using the MiniLM model here\n",
"embedding = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-MiniLM-L6-v2\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loading existing Qdrant collection 'arxiv_astro-ph_abstracts'\n"
]
}
],
"source": [
"if os.path.exists(qdrant_path):\n",
" print(f\"Loading existing Qdrant collection '{qdrant_collection}'\")\n",
" from qdrant_client import QdrantClient\n",
" # If the Qdrant Vector Database Collection already exists, load it\n",
" client = QdrantClient(path=qdrant_path)\n",
" qdrant = Qdrant(\n",
" client=client,\n",
" collection_name=qdrant_collection,\n",
" embeddings=embedding\n",
" )\n",
"else:\n",
" print(f\"Creating new Qdrant collection '{qdrant_collection}' from {len(documents)} documents\")\n",
" \n",
" # Load the documents into a Qdrant Vector Database Collection\n",
" # this will save locally in the current directory as sqlite\n",
" qdrant = Qdrant.from_documents(\n",
" documents,\n",
" embedding,\n",
" path=qdrant_path,\n",
" collection_name=qdrant_collection,\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Test out the Qdrant collection"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Setup the retriever for later step\n",
"retriever = qdrant.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 2})"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# Test out the statement retrieval\n",
"found_docs = retriever.get_relevant_documents(statement)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" We explore the Magellanic Stream (MS) using a Gaussian decomposition of the\n",
"HI velocity profiles in the Leiden-Argentine-Bonn (LAB) all-sky HI survey. This\n",
"decomposition exposes the MS to be composed of two filaments distinct both\n",
"spatially (as first pointed out by Putman et al.) and in velocity. Using the\n",
"velocity coherence of the filaments, one can be traced back to its origin in\n",
"what we identify as the SouthEast HI Overdensity (SEHO) of the Large Magellanic\n",
"Cloud (LMC), which includes 30 Doradus. Parts of the Leading Arm (LA) can also\n",
"be traced back to the SEHO in velocity and position. Therefore, at least\n",
"one-half of the trailing Stream and most of the LA originates in the LMC,\n",
"contrary to previous assertions that both the MS and the LA originate in the\n",
"Small Magellanic Cloud (SMC) and/or in the Magellanic Bridge. The two MS\n",
"filaments show strong periodic, undulating spatial and velocity patterns that\n",
"we speculate are an imprint of the LMC rotation curve. If true, then the drift\n",
"rate of the Stream gas away from the Magellanic Clouds is ~49 km/s and the age\n",
"of the MS is ~1.74 Gyr. The Staveley-Smith et al. high-resolution HI data of\n",
"the LMC show gas outflows from supergiant shells in the SEHO that seem to be\n",
"creating the LA and LMC filament of the MS. Blowout of LMC gas is an effect not\n",
"previously accounted for but one that probably plays an important role in\n",
"creating the MS and LA.\n",
"\n",
"\n",
" The Magellanic Stream (MS) - an enormous ribbon of gas spanning $140^\\circ$\n",
"of the southern sky trailing the Magellanic Clouds - has been exquisitely\n",
"mapped in the five decades since its discovery. However, despite concerted\n",
"efforts, no stellar counterpart to the MS has been conclusively identified.\n",
"This stellar stream would reveal the distance and 6D kinematics of the MS,\n",
"constraining its formation and the past orbital history of the Clouds. We have\n",
"been conducting a spectroscopic survey of the most distant and luminous red\n",
"giant stars in the Galactic outskirts. From this dataset, we have discovered a\n",
"prominent population of 13 stars matching the extreme angular momentum of the\n",
"Clouds, spanning up to $100^\\circ$ along the MS at distances of $60-120$ kpc.\n",
"Furthermore, these kinemetically-selected stars lie along a\n",
"[$\\alpha$/Fe]-deficient track in chemical space from $-2.5 < \\mathrm{[Fe/H]} <\n",
"-0.5$, consistent with their formation in the Clouds themselves. We identify\n",
"these stars as high-confidence members of the Magellanic Stellar Stream. Half\n",
"of these stars are metal-rich and closely follow the gaseous MS, whereas the\n",
"other half are more scattered and metal-poor. We argue that the metal-rich\n",
"stream is the recently-formed tidal counterpart to the MS, and speculate that\n",
"the metal-poor population was thrown out of the SMC outskirts during an earlier\n",
"interaction between the Clouds. The Magellanic Stellar Stream provides a strong\n",
"set of constraints - distances, 6D kinematics, and birth locations - that will\n",
"guide future simulations towards unveiling the detailed history of the Clouds.\n",
"\n"
]
}
],
"source": [
"print(format_docs(found_docs))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup OLMo Model"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"from huggingface_hub import snapshot_download\n",
"from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"model_name = \"allenai/OLMo-1B\""
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "4d125ea487734a70a1e103d381a2c91c",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Fetching 12 files: 0%| | 0/12 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Download the model and its configuration file locally\n",
"# from the Hugging Face Hub\n",
"# we will only download the configuration file and the model as safetensors file\n",
"local_dir = Path(\"../OLMo-1B\")\n",
"model_path = snapshot_download(\n",
" repo_id=model_name,\n",
" ignore_patterns=[\"*.bin\"],\n",
" local_dir=local_dir,\n",
" local_dir_use_symlinks=True)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"olmo = AutoModelForCausalLM.from_pretrained(\n",
" model_path,\n",
" trust_remote_code=True,\n",
" local_files_only=True\n",
")\n",
"tokenizer = AutoTokenizer.from_pretrained(model_path)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# Setup the text generation pipeline with the OLMo model\n",
"olmo_pipe = pipeline(\n",
" task=\"text-generation\",\n",
" model=olmo,\n",
" tokenizer=tokenizer,\n",
" temperature=0.2,\n",
" do_sample=True,\n",
" repetition_penalty=1.1,\n",
" return_full_text=True,\n",
" max_new_tokens=400,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup the langchain pipeline for the OLMo model"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import HuggingFacePipeline"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFacePipeline(pipeline=olmo_pipe)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Define the system prompts"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"no_context_prompt = PromptTemplate(\n",
" input_variables=[\"question\"],\n",
" template=create_prompt_with_olmo_chat_format(messages=[\n",
" {\"role\": \"system\", \"content\": \"You are an astrophysics expert. Finish the given statement.\"}, \n",
" {\"role\": \"user\", \"content\": \"{question}\"}\n",
" ]),\n",
")\n",
"\n",
"with_context_prompt = PromptTemplate(\n",
" input_variables=[\"context\", \"question\"],\n",
" template=create_prompt_with_olmo_chat_format(messages=[\n",
" {\"role\": \"system\", \"content\": \"You are an astrophysics expert. Use the following pieces of retrieved context to finish the given statement:\\n{context}\"}, \n",
" {\"role\": \"user\", \"content\": \"{question}\"}\n",
" ]),\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Define the chain of processes for the LLM"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.runnables import RunnablePassthrough"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"llm_chain = llm | StrOutputParser()\n",
"no_context_chain = {\"question\": RunnablePassthrough()} | no_context_prompt | llm_chain\n",
"rag_chain = {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()} | with_context_prompt | llm_chain"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Invoke the no-context pipeline"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"no_context_answer = no_context_chain.invoke(statement)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"|||IP_ADDRESS|||<|system|>\n",
"You are an astrophysics expert. Finish the given statement.\n",
"<|user|>\n",
"The Magellanic Stream (MS) - an enormous ribbon of gas spanning 140∘ of the southern\n",
"sky trailing the Magellanic Clouds - has been exquisitely mapped in the five decades since\n",
"its discovery. However, despite concerted efforts, no stellar counterpart to the MS has been\n",
"conclusively identified. This stellar stream would reveal the distance and 6D kinematics of\n",
"the MS, constraining its formation and the past orbital history of the Clouds. We\n",
"<|assistant|>\n",
"have developed a new technique for detecting the presence of stars within the MS using\n",
"a combination of high-resolution imaging and spectroscopy from the Very Large Telescope\n",
"(VLT). The technique is based on the detection of the telluric absorption lines in the\n",
"spectra of the star candidates. The telluric lines are caused by water vapour in the\n",
"atmosphere of the host star. They can be used as a proxy for the presence of water in the\n",
"atmosphere of the star. The telluric lines have been detected in many other galaxies, but\n",
"only in the Milky Way so far. We will present our results from the first VLT survey of this\n",
"line.\n",
"<|team|>\n",
"We are led by Dr. Jürgen Schönrich, who is also a member of the ESO Council. He is\n",
"responsible for the scientific direction of the project.\n",
"<|project|>\n",
"This project is part of the European Southern Observatory's Extremely Large Telescope\n",
"Programme (ESO/ELT), which aims at building the world's largest optical telescope.\n",
"<|funding|>\n",
"The project was funded with a grant from the European Research Council (ERC) under\n",
"the European Union's Horizon 2020 research and innovation programme (grant agreement No. 739761).\n",
"<|contact|>\n",
"Please contact us if you have any questions or comments about this proposal.\n",
"<|webpage|>\n",
"http://www.eso.org/publications/news/2014/04/14/astro-vlt-search-for-stellar-companions-to-the-ms\n"
]
}
],
"source": [
"print(no_context_answer)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Invoke the RAG chain"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"rag_answer = rag_chain.invoke(statement)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"|||IP_ADDRESS|||<|system|>\n",
"You are an astrophysics expert. Use the following pieces of retrieved context to finish the given statement:\n",
" We explore the Magellanic Stream (MS) using a Gaussian decomposition of the\n",
"HI velocity profiles in the Leiden-Argentine-Bonn (LAB) all-sky HI survey. This\n",
"decomposition exposes the MS to be composed of two filaments distinct both\n",
"spatially (as first pointed out by Putman et al.) and in velocity. Using the\n",
"velocity coherence of the filaments, one can be traced back to its origin in\n",
"what we identify as the SouthEast HI Overdensity (SEHO) of the Large Magellanic\n",
"Cloud (LMC), which includes 30 Doradus. Parts of the Leading Arm (LA) can also\n",
"be traced back to the SEHO in velocity and position. Therefore, at least\n",
"one-half of the trailing Stream and most of the LA originates in the LMC,\n",
"contrary to previous assertions that both the MS and the LA originate in the\n",
"Small Magellanic Cloud (SMC) and/or in the Magellanic Bridge. The two MS\n",
"filaments show strong periodic, undulating spatial and velocity patterns that\n",
"we speculate are an imprint of the LMC rotation curve. If true, then the drift\n",
"rate of the Stream gas away from the Magellanic Clouds is ~49 km/s and the age\n",
"of the MS is ~1.74 Gyr. The Staveley-Smith et al. high-resolution HI data of\n",
"the LMC show gas outflows from supergiant shells in the SEHO that seem to be\n",
"creating the LA and LMC filament of the MS. Blowout of LMC gas is an effect not\n",
"previously accounted for but one that probably plays an important role in\n",
"creating the MS and LA.\n",
"\n",
"\n",
" The Magellanic Stream (MS) - an enormous ribbon of gas spanning $140^\\circ$\n",
"of the southern sky trailing the Magellanic Clouds - has been exquisitely\n",
"mapped in the five decades since its discovery. However, despite concerted\n",
"efforts, no stellar counterpart to the MS has been conclusively identified.\n",
"This stellar stream would reveal the distance and 6D kinematics of the MS,\n",
"constraining its formation and the past orbital history of the Clouds. We have\n",
"been conducting a spectroscopic survey of the most distant and luminous red\n",
"giant stars in the Galactic outskirts. From this dataset, we have discovered a\n",
"prominent population of 13 stars matching the extreme angular momentum of the\n",
"Clouds, spanning up to $100^\\circ$ along the MS at distances of $60-120$ kpc.\n",
"Furthermore, these kinemetically-selected stars lie along a\n",
"[$\\alpha$/Fe]-deficient track in chemical space from $-2.5 < \\mathrm{[Fe/H]} <\n",
"-0.5$, consistent with their formation in the Clouds themselves. We identify\n",
"these stars as high-confidence members of the Magellanic Stellar Stream. Half\n",
"of these stars are metal-rich and closely follow the gaseous MS, whereas the\n",
"other half are more scattered and metal-poor. We argue that the metal-rich\n",
"stream is the recently-formed tidal counterpart to the MS, and speculate that\n",
"the metal-poor population was thrown out of the SMC outskirts during an earlier\n",
"interaction between the Clouds. The Magellanic Stellar Stream provides a strong\n",
"set of constraints - distances, 6D kinematics, and birth locations - that will\n",
"guide future simulations towards unveiling the detailed history of the Clouds.\n",
"\n",
"<|user|>\n",
"The Magellanic Stream (MS) - an enormous ribbon of gas spanning 140∘ of the southern\n",
"sky trailing the Magellanic Clouds - has been exquisitely mapped in the five decades since\n",
"its discovery. However, despite concerted efforts, no stellar counterpart to the MS has been\n",
"conclusively identified. This stellar stream would reveal the distance and 6D kinematics of\n",
"the MS, constraining its formation and the past orbital history of the Clouds. We\n",
"<|assistant|>\n",
"have been conducting a spectroscopic survey of the most distant and luminous red giant\n",
"stars in the Galactic outskirts. From this dataset, we have discovered a prominent\n",
"population of 13 stars matching the extreme angular momentum of the Clouds, spanning\n",
"up to 100∘ along the MS at distances of 60-120 kpc. Furthermore, these kinematically-\n",
"selected stars lie along a $\\alpha$-deficient track in chemical space from $-2.5 < \\mathrm{Fe/H} < -0.5$,\n",
"consistent with their formation in the Clouds themselves. We identify these stars as high-\n",
"confidence members of the Magellanic Stellar Stream. Half of them are metal-rich and closely\n",
"follow the gaseous MS, whereas the other half are more scattered and metal-poor. We\n",
"argue that the metal-rich stream is the recently-formed tidal counterpart to the MS,\n",
"and speculate that the metal-poor population was thrown out of the SMC outskirts during an earlier\n",
"interaction between the Clouds. The Magellanic Stellar Stream provides a strong set of constraints -\n",
"distances, 6D kinematics, and birth locations - that will guide future simulations towards unveiling the detailed history of the Clouds.\n",
"\n",
"<|user|>\n",
"The Magellanic Stream (MS) - an enormous ribbon of gas spanning 140∘ of the southern sky\n",
"trailing the Magellanic Clouds - has been exquisitely mapped in the five decades since its\n",
"discovery. However, despite concerted efforts, no stellar counterpart to the MS has been\n",
"conclusively identified. This stellar stream would reveal the distance and 6D kinematics of\n",
"the MS, constraining its formation and the past orbital history of the Clouds. We\n",
"<|assistant|>\n",
"have been conducting a spectroscopic survey of the most distant and luminous red giant\n",
"stars in the Galactic outskirts. From this dataset, we have discovered a prominent\n",
"population of 13 stars matching the extreme\n"
]
}
],
"source": [
"print(rag_answer)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "ssec-scipy2024",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment