@creatorrr
Last active March 10, 2025 17:48
TIRA Beauty AI Assistant (Julep)

An intelligent beauty advisor built by Julep AI that helps users discover beauty products, understand ingredients, and get personalized beauty advice.

Overview

The TIRA Beauty AI Assistant is a proof-of-concept chatbot that demonstrates:

  • Product recommendations from TIRA's catalog
  • Ingredient explanations and beauty advice
  • Personalized skincare/beauty routines
  • Real-time product availability checks

Key Features

  • Smart Search: Uses hybrid search combining vector and text-based approaches.
  • RAG Pipeline: Leverages Retrieval-Augmented Generation to provide accurate, factual responses.
  • Product Knowledge: The currently deployed agent indexes ~15K products across the Hair, Skin, and Make-up categories.
  • Beauty Expertise: Can explain ingredients, suggest routines, and compare products.
  • Real-time Data: Integrates with TIRA's systems to check stock and availability.

Technical Implementation

  • Built using Julep AI
  • Uses Claude 3.7 Sonnet as the base model for the chatbot.
  • Uses Claude 3.5 Haiku / GPT-4o mini for contextualization.
  • Uses OpenAI text-embedding-3-large for embeddings.
  • Implements hybrid RAG search (vector search + BM25 + trigram search) with MMR for better result diversity.
  • Automated product indexing and FAQ generation.
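
To illustrate the MMR (maximal marginal relevance) re-ranking mentioned above, here is a minimal sketch in plain Python. It is not Julep's implementation: the function names, the `lambda_` trade-off parameter, and the toy vectors are assumptions made for illustration. The idea is to repeatedly pick the document that is relevant to the query while being dissimilar to documents already selected.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=3, lambda_=0.5):
    """Return indices of up to k documents chosen by maximal marginal relevance.

    lambda_ = 1.0 ranks purely by relevance; lower values penalize
    documents that resemble ones already selected.
    """
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_ * relevance - (1 - lambda_) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy example (hypothetical 2-d embeddings): documents 0 and 1 are near-duplicates.
query = [1.0, 0.1]
docs = [[1.0, 0.0], [0.99, 0.14], [0.6, 0.8]]
relevance_only = mmr(query, docs, k=2, lambda_=1.0)
diverse = mmr(query, docs, k=2, lambda_=0.3)
```

With `lambda_=1.0` the two near-duplicates are returned together; with a lower `lambda_` the second slot goes to the less similar document, which is the diversity effect MMR is used for.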

Try It Out

🔗 Chat with TIRA Beauty Assistant

[Image: TIRA demo screenshot]

Development

To run this project locally:

  1. Clone the repository
  2. Run the notebook to populate the document store.
  3. Chat with the session that is created in the notebook.

Setup

  1. Install dependencies

    pip install -r requirements.txt
  2. Configure the .env file

  3. Run the notebook to populate the document store.

  4. Chat with the session that is created in the notebook.

    Run the cells after Create a Julep Session
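
Before running the notebook, it can help to confirm that the .env file defines the keys the project expects. The snippet below is a small dependency-free sketch; the helper names `parse_env` and `missing_keys` are hypothetical, and the key names are taken from the sample .env in this gist. In the project itself, python-dotenv (listed in requirements.txt) would normally do the loading.

```python
def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env, required):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


# Sample content mirroring the layout of the .env file in this gist.
sample = """
# For tira-beauty-assistant
api_key=your_julep_dev_api_key
# For benchmark
JULEP_API_KEY=your_julep_api_key
OPENAI_API_KEY=
"""
env = parse_env(sample)
missing = missing_keys(env, ["api_key", "JULEP_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY"])
```

Here `missing` lists the keys that still need values, so a misconfigured environment is caught before any API call is made.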

name: Tia
about: You are a helpful beautician and friend who is designed to assist customers with their queries about products from the Tira Beauty website. Your goal is to provide clear and detailed responses.
instructions: |-
  **Guidelines**:
  1. Assume the user is unfamiliar with the company and products.
  2. Thoroughly read and comprehend the user's question.
  3. Use the provided context documents to find relevant information.
  4. Craft a detailed response based on the context and your understanding of the company and products.
  **Response format**:
  - Use simple, clear language. Keep it concise and casual. Remember, you are having a chat with the user.
  - Use markdown formatting, though only when listing products, mentioning the product name in bold, adding links, etc.
  - Always try to include relevant products or website links (URLs). However, when they don't exist, do not come up with links.
  **Important**:
  - For questions related to the business, only use the information that is explicitly given in the documents above.
  - If the user asks about the business, and it's not given in the documents above, respond with an answer that states that you don't know.
  - Use the most recent and relevant data from context documents.
  - Be proactive in helping users find solutions.
  - Always mention product URLs (if they exist in the context documents), and in markdown format, i.e. surround the product name with the URL in markdown format. e.g. **[Product Name](url)**
  - Always try to mention the price of the product in your response if it's given in the context documents. However, only get the price from the relevant context product document, and always mention "(Price may vary)" next to the price itself.
  - Ask for clarification if the query is unclear.
  - You can assume some basic attributes of the customer, so avoid asking too many questions, **BUT** whenever you're unsure, please feel free to ask/clarify so that you can assist them better. e.g. if a user asks for summer shorts, ask if they are looking for men or women
  - Inform users if their query is unrelated to the given website.
  - Avoid using the following in your response: Based on the provided documents, based on the provided information, based on the documentation... etc.
  - Do not provide any URL in your response unless it is explicitly given in the context documents.
  - If the user asks about a product, and it's not given in the context documents, do not answer the question, and state that you don't have information about that product.
model: claude-3.5-haiku
{%- if agent.name %}
You are {{ agent.name }}.
{% endif %}
{%- if agent.about %}
{{ agent.about }} {{NEWLINE}}
{% endif %}
{%- if docs -%}
{{NEWLINE}}
Relevant documents (based on website search):
{%- for i, doc in enumerate(docs) -%}
{{NEWLINE}}
<product>
{{NEWLINE}}**Product Name:** {{ doc.title }}{{NEWLINE}}
{%- if doc.metadata -%}
**Product Basic Details (important):** {{NEWLINE}}{{NEWLINE}}
{%- if doc.metadata.brand.name -%}
Brand: {{ doc.metadata.brand.name }} {{NEWLINE}}
{%- endif -%}
{%- if doc.metadata.slug -%}
URL: https://www.tirabeauty.com/product/{{ doc.metadata.slug }} {{NEWLINE}}
{%- endif -%}
{%- if doc.metadata.price and doc.metadata.price.min and doc.metadata.price.currency -%}
Price: {{ doc.metadata.price.min }} {{ doc.metadata.price.currency }} {{NEWLINE}}
{%- elif doc.metadata.price.effective.min and doc.metadata.price.effective.currency_symbol-%}
Price: {{ doc.metadata.price.effective.min }} {{ doc.metadata.price.effective.currency_symbol }} {{NEWLINE}}
{%- endif -%}
{%- if doc.metadata.rating and doc.metadata.rating != 0 -%}
Rating: {{ doc.metadata.rating }} {{NEWLINE}}
{%- endif -%}
{%- if doc.metadata.categories and doc.metadata.categories[0].name -%}
Categories: {{ doc.metadata.categories[0].name }} {{NEWLINE}}
{%- endif -%}
{%- if doc.metadata.country_of_origin -%}
Country of Origin: {{ doc.metadata.country_of_origin }} {{NEWLINE}}
{%- endif -%}
{%- if doc.metadata.tags -%}
Tags: {{ doc.metadata.tags | join(', ') }} {{NEWLINE}}
{%- endif -%}
{%- if doc.metadata.attributes.skin_type -%}
Skin Type: {{ doc.metadata.attributes.skin_type }} {{NEWLINE}}
{%- endif -%}
{%- if doc.metadata.attributes.gender -%}
Gender: {{ doc.metadata.attributes.gender }} {{NEWLINE}}
{%- endif -%}
{%- if doc.metadata.attributes.discount -%}
Discount: {{ doc.metadata.attributes.discount }} {{NEWLINE}}
{%- endif -%}
{%- if doc.metadata.attributes.benefits -%}
Benefits: {{ doc.metadata.attributes.benefits | join(', ') }} {{NEWLINE}}
{%- endif -%}
{%- if doc.metadata.attributes.concern -%}
Concerns: {{ doc.metadata.attributes.concern | join(', ') }} {{NEWLINE}}
{%- endif -%}
{%- if doc.metadata.attributes.formulation -%}
Formulation: {{ doc.metadata.attributes.formulation }} {{NEWLINE}}
{%- endif -%}
{%- if doc.metadata.attributes['super-ingredients'] -%}
Super-ingredients: {{ doc.metadata.attributes['super-ingredients'] | join(', ') }} {{NEWLINE}}
{%- endif -%}
{%- if doc.metadata.attributes.preference -%}
Preferences: {{ doc.metadata.attributes.preference | join(', ') }} {{NEWLINE}}
{%- endif -%}
{%- if doc.metadata.attributes['shelf-life-in-months'] -%}
Shelf Life: {{ doc.metadata.attributes['shelf-life-in-months'] }} months
{%- endif -%}
{%- endif -%}
{%- if doc.content is string -%}
{{NEWLINE}}**Description:**{{NEWLINE}}
{{ doc.content }}
{{NEWLINE}}
{%- else -%}
{%- for snippet in doc.content -%}
{{NEWLINE}}**Product Description:** {{ snippet }} {{NEWLINE}}
{%- endfor -%}
{%- endif -%}
{{NEWLINE}}</product>
{%- endfor -%}
{%- endif -%}
{{NEWLINE}}Here are your instructions that you should **strictly** follow:
{%- if agent.instructions %}
{{ agent.instructions[0] }} {{NEWLINE}}
{% endif %}
Current Date & Time: {{time.strftime('%d-%m-%Y %H:%M')}}
You are talking to a customer. They are chatting with you on the Tira website. Begin!
## yaml-language-server: $schema=https://raw.githubusercontent.com/julep-ai/julep/refs/heads/dev/schemas/create_task_request.json
# Define the needed tools for the workflow
tools:
  # Tool to make an API call to get the products for a given collection
  - name: get_collection_products
    type: api_call
    api_call:
      method: GET
      # Placeholder link, can change when calling the tool
      url: https://www.tirabeauty.com/ext/plpoffers/application/api/v1.0/collections/skin/items
  # Tool to create a document for the agent
  - name: create_agent_doc
    type: system
    system:
      resource: agent
      subresource: doc
      operation: create
# The main workflow definition
main:
  # Get the products for the given collection (the collection is embedded within the link)
  - tool: get_collection_products
    arguments:
      url: $ f'https://www.tirabeauty.com/ext/plpoffers/application/api/v1.0/collections/{steps[0].input.collection}/items'
      params:
        filters: $ False
        page_id: $ steps[0].input.page_id
        page_size: $ steps[0].input.n_products
  # Unwrap the JSON response from the tool and get the products
  - evaluate:
      products: $ _.get('json').get('items')
  # Iterate over the products and index each one in parallel
  - over: $ _.products
    parallelism: 15
    map:
      workflow: index_product
      arguments:
        name: $ _.get('name')
        description: $ html_to_markdown(_.get('attributes', {}).get('description', ''))
        data: $ _
## yaml-language-server: $schema=https://raw.githubusercontent.com/julep-ai/julep/refs/heads/dev/schemas/create_task_request.json
# Subworkflow that, given product data, will generate FAQs for the product
# and add them along with other product details to the agent's doc store
index_product:
  # Extract useful information from metadata
  # (price defaults to {} so a missing price does not raise on the chained .get)
  - evaluate:
      filtered_metadata: |
        $ {
          'name': steps[0].input.data.get('name', None), # Product name
          'brand': steps[0].input.data.get('brand', {}).get('name', None), # Brand information
          'price': {'min': steps[0].input.data.get('price', {}).get('effective', {}).get('min', None),
                    'max': steps[0].input.data.get('price', {}).get('effective', {}).get('max', None),
                    'currency': steps[0].input.data.get('price', {}).get('effective', {}).get('currency_code', None)}, # Price details
          'categories': steps[0].input.data.get('categories', None), # Product categories
          'country_of_origin': steps[0].input.data.get('country_of_origin', None), # Manufacturing country
          'tags': steps[0].input.data.get('tags', None), # Product tags
          'item_type': steps[0].input.data.get('item_type', None), # Type of item
          'teaser_tag': steps[0].input.data.get('teaser_tag', None), # Promotional tag
          'attributes': {
            'skin_type': steps[0].input.data.get('attributes', {}).get('skin-type', None),
            'gender': steps[0].input.data.get('attributes', {}).get('gender', None),
            'discount': steps[0].input.data.get('attributes', {}).get('discount', None),
            'preference': steps[0].input.data.get('attributes', {}).get('preference', None),
            'category-l1': steps[0].input.data.get('attributes', {}).get('category-l1', None),
            'category-l2': steps[0].input.data.get('attributes', {}).get('category-l2', None),
            'benefits': steps[0].input.data.get('attributes', {}).get('benefits', None),
            'category-l3': steps[0].input.data.get('attributes', {}).get('category-l3', None),
            'productaffluence': steps[0].input.data.get('attributes', {}).get('productaffluence', None),
            'concern': steps[0].input.data.get('attributes', {}).get('concern', None),
            'formulation': steps[0].input.data.get('attributes', {}).get('formulation', None),
            'super-ingredients': steps[0].input.data.get('attributes', {}).get('super-ingredients', None),
            'variants': steps[0].input.data.get('variants', None)}
        }
  # A prompt step that generates FAQs based on the scraped content
  - prompt:
      # A system prompt that instructs the LLM to generate FAQs based on the scraped content
      - role: system
        content: |-
          You are a helpful assistant that is tasked with indexing the products from the Tira Beauty website.
          You will be given:
          - product name.
          - markdown content that represents a description for a beauty product found on an e-commerce website.
          - data dump of other information and metadata about the product.
          <task>
          - Read the markdown content very carefully.
          - Pay attention to information listed in the data dump.
          - Come up with a FAQ for the product based on the content.
          - Answer the FAQ in a structured format.
          - Return your response in JSON format with the following fields for each FAQ:
            - question: The question of the FAQ.
            - answer: The answer of the FAQ.
          </task>
          <Important>
          - Only come up with FAQs that are relevant to the product, and that can be answered based on the provided content.
          - Make sure your response is a valid JSON list of objects that have `question` and `answer` fields.
          - Make sure to surround the JSON response with ```json``` tags.
          </Important>
          Example of FAQ:
          - What are the benefits of using the product?
          - What are the ingredients of the product?
          - Does the product have any side effects?
          - What are the chemicals in the product?
          - What kind of skin type is the product for?
          - What are the uses of the product?
          - What are the precautions of using the product?
          - What are the contraindications of using the product?
          - What are the interactions of the product?
          - What are the storage instructions of the product?
          - What is the expiration date of the product?
      # A user prompt that provides the product data to the LLM
      - role: user
        content: |-
          $ f"""
          <product_name>
          {steps[0].input.name}
          </product_name>
          <product_description>
          {steps[0].input.description}
          </product_description>
          <other_product_data>
          {steps[0].output.filtered_metadata}
          </other_product_data>
          """
    # Unwrap (response.choices[0].message.content) and provide the response text to the next step
    unwrap: true
  # Extract the FAQs from the LLM's JSON response
  - evaluate:
      faqs: $ extract_json(_)
  # Prompt step to rewrite everything as points (will be used as content for the document that will be created)
  - prompt:
      # A system prompt that instructs the LLM to rewrite the document, FAQs and context as points.
      - role: system
        content: |-
          You are an agent who works for the Tira Beauty company and is tasked with rewriting the product description, FAQs, and metadata as points.
          You will be given:
          - markdown content that represents a description for a beauty product found on an e-commerce website.
          - a list of FAQs for the product.
          - metadata about the product including details like name, brand, price, rating, categories, etc.
          <task>
          - Rewrite the document, FAQs, metadata as points. Optimize the final document for search retrieval.
          - Include relevant metadata in your points to enhance searchability.
          </task>
      # A user prompt that provides the product data to the LLM
      - role: user
        content: |-
          $ f"""
          <document>
          {steps[0].input.description}
          </document>
          <faqs>
          {'\\n\\n'.join([f'## {faq["question"]}\\n{faq["answer"]}' for faq in _.faqs])}
          </faqs>
          <metadata>
          {steps[0].output.filtered_metadata}
          </metadata>
          """
    # Unwrap `response.choices[0].message.content` and provide the response text to the next step
    unwrap: true
  # Create the document
  - tool: create_agent_doc
    arguments:
      agent_id: $ str(agent.id)
      data:
        metadata: $ steps[0].input.data
        title: $ steps[0].input.name
        content: $ _
julep
python-dotenv
PyYAML
requests
notebook
ipykernel
pandas
matplotlib
seaborn
fire
litellm
tqdm
joblib
markdownify
filelock
# For tira-beauty-assistant
api_key=your_julep_dev_api_key
# For benchmark
JULEP_API_KEY=your_julep_api_key
JULEP_ENV=local_multi_tenant_or_dev_or_production
ANTHROPIC_API_KEY=your_anthropic_api_key
OPENAI_API_KEY=your_openai_api_key

Architecture

[Architecture diagrams: Tira Diw, Frames 3, 2, 1, and 5]

Search

A use case like Tira has to handle non-standard spelling mistakes (ones a spell checker cannot fix) and proper nouns such as brand names, so we need hybrid search.

Challenges

  • Special domain

  • Lots of proper nouns (unique brand names)

  • Spelling mistakes in queries (non-standard)

  • Basic multilingual (Hinglish)

  • Up-to-date index

  • Deep integrations

Hybrid Search

  • Combine:

    • Full-text search (BM25)

      • Standard keywords-based search

      • Matches Loreal, blak ...

      • Does not match spelling mistakes in queries

      • Does not match "meaning" e.g., something that's light on skin

      • Restricted to English

    • Semantic (vector) search

      • Search by embedding vector

      • Can match "meaning"

      • Robust to formatting issues in the documents indexed

      • Can match multilingual queries

      • Cannot match brand names & proper nouns

      • Inexact and noisy without contextualization

    • Trigram search

      • Search by sub-words

      • Can match spelling mistakes

      • Can match multilingual queries

      • Otherwise poor precision
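
To make the trigram idea concrete, the sketch below computes a trigram similarity between two strings. It is loosely modeled on how PostgreSQL's pg_trgm pads words and compares trigram sets, but it is a simplified illustration under that assumption, not the code Julep runs; the function names are hypothetical.

```python
def trigrams(text):
    """Return the set of 3-character windows for each padded, lowercased word."""
    grams = set()
    for word in text.lower().split():
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a, b):
    """Jaccard overlap of the two trigram sets, in the range 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


misspelled = trigram_similarity("loreal", "lorial")
unrelated = trigram_similarity("loreal", "maybelline")
```

A misspelling such as `lorial` still shares several trigrams with `loreal`, so it scores well above an unrelated brand name. The same property explains the poor precision noted above: any string with overlapping sub-words gets a non-zero score.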

Julep supports all three and we use:

  • pgvector + Timescale for storage

  • OpenAI embedding models
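
The notes here do not say how the three result lists are merged into one ranking. One common option is reciprocal rank fusion (RRF), sketched below as a stand-in; it is not claimed to be the method Julep uses. The function name, the constant `k=60`, and the document IDs are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) per list it appears in, so items
    ranked well by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score (descending), breaking ties by document ID for stability.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the three retrievers for one query.
bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "c", "a"]
trigram_hits = ["b", "a", "d"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits, trigram_hits])
```

Rank-based fusion avoids having to normalize BM25 scores, cosine similarities, and trigram similarities onto a common scale, which is why it is a frequent default for hybrid search.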

Preprocessing

  1. Parsing (HTML/PDF)

    • Using LlamaParse/Jina/OCR etc.
  2. Cleaning

    • Simple text cleaning (Unicode normalization, emoji removal, etc.)

    • Using Julep workflow

  3. Contextualizing

    • Use LLM to rewrite doc in a way that's optimized for search (Optional but strongly recommended)
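
As an illustration of the cleaning step, the sketch below normalizes Unicode, strips common emoji, and collapses whitespace using only the standard library. The function name and the emoji ranges are assumptions chosen for the example; the actual cleaning in this project runs inside a Julep workflow and may differ.

```python
import re
import unicodedata

# Rough emoji coverage: pictographs, misc symbols/dingbats, variation selector.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")


def clean_text(text):
    """Normalize Unicode, remove emoji, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)  # e.g. non-breaking space -> space
    text = EMOJI_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_text("Glow\u00a0Serum \U0001F60A  30ml")
```

NFKC normalization also folds compatibility characters such as ligatures into their plain forms, which helps full-text and trigram matching treat visually identical strings as equal.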

Benchmark Results

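| Trial | Mode | MMR | Contextualization | Accuracy (correct match) |
|-------|--------|-----|-------------------|--------------------------|
| 1 | Vector | off | off | 0.55 |
| 2 | Hybrid | on | off | 0.73 |
| 3 | Hybrid | on | on | 0.91 |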
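
The benchmark script is not shown here, so the exact definition of "correct match" is unknown. One plausible reading, assumed purely for illustration, is top-1 accuracy: the share of queries whose first search result is the expected product. The function name and the sample IDs below are hypothetical.

```python
def top1_accuracy(results, expected):
    """Fraction of queries whose top-ranked result equals the expected ID.

    results:  dict mapping query -> ranked list of document IDs
    expected: dict mapping query -> the correct document ID
    """
    if not results:
        return 0.0
    correct = sum(
        1
        for query, ranked in results.items()
        if ranked and ranked[0] == expected.get(query)
    )
    return correct / len(results)


# Hypothetical search output for three queries.
results = {
    "loreal shampoo": ["p1", "p7"],
    "light moisturizer": ["p9", "p3"],
    "blak eyeliner": [],
}
expected = {
    "loreal shampoo": "p1",
    "light moisturizer": "p3",
    "blak eyeliner": "p5",
}
accuracy = top1_accuracy(results, expected)
```

Running the same metric over a fixed query set for each configuration (vector only, hybrid with MMR, hybrid with MMR and contextualization) would yield a comparison of the kind shown in the table.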