@LindaLawton
Created April 30, 2024 08:25
A Python application that uses Google's Generative AI API to summarize text and extract key insights from documents, text and webpages.

Corporate Synthesizer

A Python application that uses Google's Generative AI API to summarize text and extract key insights from documents.

Get started with the Corporate Synthesizer API

Requirements:

  • Python 3.10+ (the script's `str | int` type annotations fail on earlier versions)

  • Google Cloud Platform account with API key and project ID

  • Install required libraries:

    pip install google-generativeai python-dotenv requests

Set up your environment:

  1. Create a .env file in the root of the project and add the following variables:

    API_KEY=YOUR_API_KEY
    MODEL_NAME_LATEST=text-bison-001

  2. Replace YOUR_API_KEY with your actual Google Cloud Platform API key.
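
Before running the script, it can help to confirm that both variables are actually visible to Python. The following is a minimal sketch using only the standard library; `missing_settings` and `REQUIRED_SETTINGS` are hypothetical names introduced for illustration and are not part of the gist. It assumes the `.env` values have already been loaded into the environment (the script does this with `load_dotenv()`).

```python
import os
from typing import Mapping, Sequence

# Variable names taken from the .env example above.
REQUIRED_SETTINGS = ("API_KEY", "MODEL_NAME_LATEST")


def missing_settings(env: Mapping[str, str],
                     required: Sequence[str] = REQUIRED_SETTINGS) -> list[str]:
    """Return the required names that are absent or blank in env (hypothetical helper)."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    # os.environ only contains the .env values after they have been loaded.
    missing = missing_settings(os.environ)
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```

Taking the mapping as a parameter rather than reading `os.environ` directly keeps the check easy to try against a plain dictionary.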

Run the application:

python corporate_synthesizer.py

Samples

# gemini_service and get_document are defined in corporate_synthesizer.py
file_path = r"testData\test.txt"
url = "https://en.wikipedia.org/wiki/Pacific_parrotlet"

# Summarize a local file
doc = get_document(file_path)
answer = gemini_service.single_answer("Give me five bullet point facts", doc)
print(answer)

# Summarize a web page
doc = get_document(url)
answer = gemini_service.single_answer("Give me five bullet point facts", doc)
print(answer)

# Summarize plain text
doc = get_document("Just some text")
answer = gemini_service.single_answer("Give me five bullet point facts", doc)
print(answer)
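
The three samples work because `get_document` decides how to treat its argument: first as a URL, then as an existing path, and otherwise as plain text. The sketch below mirrors that order of checks without calling the API; `classify_source` and `URL_PATTERN` are hypothetical names used only for illustration, and the regular expression is copied from the script's `is_valid_url`. As in the script, any existing path (including a directory) counts as a file.

```python
import os
import re

# Same simple URL pattern the script uses in is_valid_url.
URL_PATTERN = re.compile(r"^(http|https)://[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+([/?].*)?$")


def classify_source(value: str) -> str:
    """Return "url", "file", or "text" for the given input (hypothetical helper)."""
    if URL_PATTERN.match(value):
        return "url"
    if os.path.exists(value):
        return "file"
    # Anything that is neither a URL nor an existing path is treated as plain text.
    return "text"


if __name__ == "__main__":
    for sample in ("https://en.wikipedia.org/wiki/Pacific_parrotlet", "Just some text"):
        print(sample, "->", classify_source(sample))
```

One consequence of this ordering is that a mistyped file path is silently treated as plain text, so the model receives the path string itself rather than the file contents.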

Documentation

License

The contents of this repository are licensed under the Apache License, version 2.0.

import base64
import os
import re
from io import BytesIO
from typing import Iterable, Mapping

import google
import google.ai.generativelanguage as glm
import google.api_core.exceptions  # imported explicitly; used in GeminiService error handling
import google.generativeai as genai
import requests
from dotenv import load_dotenv
from google.ai.generativelanguage_v1beta import HarmCategory
from google.generativeai import GenerationConfig
from google.generativeai.types import HarmBlockThreshold
from google.generativeai.types.safety_types import LooseSafetySettingDict

# Load API_KEY and MODEL_NAME_LATEST from the .env file into the environment
load_dotenv()

def is_valid_url(url):
    """
    Check if a given URL is valid.

    Args:
        url (str): The URL to be validated.

    Returns:
        bool: True if the URL is valid, False otherwise.
    """
    # Regular expression pattern for a simple URL
    url_pattern = re.compile(r"^(http|https)://[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+([/?].*)?$")
    # Use the match method to check if the URL matches the pattern
    # Return True if it matches, False otherwise
    return bool(url_pattern.match(url))

def read_url_page(url):
    """
    Retrieve the content of a web page from a given URL.

    Args:
        url (str): The URL of the web page to retrieve.

    Returns:
        str or None: The text content of the web page if retrieval was successful,
        None if an error occurred.
    """
    try:
        # Send a GET request; the timeout keeps a stalled server from hanging the script
        response = requests.get(url, timeout=30)
        # Check if the request was successful (status code 200)
        if response.status_code == 200:
            # Return the text content of the page
            return response.text
        # Print an error message if the request was not successful
        print("Failed to retrieve page. Status code:", response.status_code)
        return None
    except requests.RequestException as e:
        # Handle any request exceptions (e.g., network errors)
        print("Error fetching page:", e)
        return None

def read_file(file_path, encodings=('utf-8', 'latin-1', 'cp1252')):
    """
    Read the content of a file using various encodings until successful decoding.

    Args:
        file_path (str): The path to the file to be read.
        encodings (sequence, optional): Encodings to try for decoding, in order.
            Defaults to ('utf-8', 'latin-1', 'cp1252').

    Returns:
        str: The content of the file if decoding was successful.

    Raises:
        ValueError: If the file cannot be decoded using any of the specified encodings.
    """
    # Iterate through each encoding in the list of encodings
    for encoding in encodings:
        try:
            # Open the file with the specified encoding
            with open(file_path, "r", encoding=encoding) as file:
                # Return the content if decoding was successful
                return file.read()
        except UnicodeDecodeError:
            # If decoding fails with the current encoding, continue to the next encoding
            continue
    # UnicodeDecodeError cannot be constructed from a message alone, so raise ValueError
    raise ValueError("Unable to decode the file using the specified encodings.")

def built_blob_part(data):
    """
    Create a MIME part containing binary data encoded as Base64.

    Args:
        data (bytes): The binary data to be encoded.

    Returns:
        glm.Part: A MIME part containing the Base64-encoded data.
    """
    # Create a BytesIO object to handle the binary data
    data = BytesIO(data)
    # Get the bytes from the BytesIO object
    data_bytes = data.getvalue()
    # Encode the binary data as Base64
    base64_bytes = base64.b64encode(data_bytes).decode("utf-8")
    # Create a Blob with the Base64-encoded data
    blob = glm.Blob(mime_type="text/basic", data=base64_bytes)
    # Create a Part with the Blob as inline data
    return glm.Part(inline_data=blob)

def get_document(path_or_url):
    """
    Get the content of a document, either from a URL, a file, or as plain text.

    Args:
        path_or_url (str): The path to a local file, a URL, or plain text.

    Returns:
        glm.Part: A MIME part containing the content of the document.

    Raises:
        ValueError: If the input is a URL and the page could not be retrieved.
    """
    # Check if the input is a valid URL
    if is_valid_url(path_or_url):
        # If it's a URL, read the web page
        data = read_url_page(path_or_url)
        if data is None:
            # read_url_page returns None on failure; fail clearly instead of on None.encode()
            raise ValueError(f"Could not retrieve document from {path_or_url}")
        # Encode the data as a MIME part and return
        return built_blob_part(data.encode())
    # Check if the input is a valid file path
    elif os.path.exists(path_or_url):
        # If it's a file, read the file content
        with open(path_or_url, 'rb') as file:
            file_content = file.read()
        # Encode the file content as a MIME part and return
        return built_blob_part(file_content)
    else:
        # If it's neither a URL nor a file, treat it as plain text
        # Encode the plain text as a MIME part and return
        return built_blob_part(path_or_url.encode())

class GeminiService:
    generation_config: GenerationConfig = {
        'temperature': 0.9,
        'top_p': 1,
        'top_k': 40,
        'max_output_tokens': 2048,
        'stop_sequences': [],
    }
    safety_settings: Mapping[str | int | HarmCategory, str | int | HarmBlockThreshold] | Iterable[
        LooseSafetySettingDict] = [
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
    ]

    def __init__(self, system_instruction):
        # API_KEY and MODEL_NAME_LATEST come from the .env file
        genai.configure(api_key=os.getenv("API_KEY"))
        self.model = genai.GenerativeModel(model_name=os.getenv("MODEL_NAME_LATEST"),
                                           system_instruction=system_instruction,
                                           generation_config=self.generation_config,
                                           safety_settings=self.safety_settings)

    def single_completion(self, message: str,
                          uploaded_document: google.ai.generativelanguage_v1beta.types.content.Part = None) -> str:
        try:
            content = message
            if uploaded_document:
                # Send the document first so the prompt that follows refers to its content
                content = [
                    uploaded_document,
                    message,
                ]
            response = self.model.generate_content(content)
            return response.text
        except google.api_core.exceptions.FailedPrecondition as e:
            if "User location" in str(e):
                return "Error: User location not accepted"
            return str(e)

class GeminiSingleCompletion:
    def __init__(self, system_instruction) -> None:
        self.service = GeminiService(system_instruction)

    def single_answer(self, message: str,
                      uploaded_document: google.ai.generativelanguage_v1beta.types.content.Part = None) -> str:
        return self.service.single_completion(message=message, uploaded_document=uploaded_document)

def main():
    # Each fragment ends with a space so the concatenated sentences do not run together
    gemini_service = GeminiSingleCompletion("You are 'Corporate Synthesizer' AI: Your expert alchemist of information. "
                                            "Trained to distill complex texts into concise summaries for management. "
                                            "Analyzes, identifies key insights, and crafts actionable summaries. "
                                            "Ensures executives grasp crucial details swiftly. Your strategic partner "
                                            "in navigating the sea of data for informed decisions. "
                                            "I will give you a text and your job is to digest it down to at most a few "
                                            "paragraphs.")
    file_path = "test.txt"
    url = "https://en.wikipedia.org/wiki/Pacific_parrotlet"

    # Summarize a local file
    doc = get_document(file_path)
    answer = gemini_service.single_answer("Give me five bullet point facts", doc)
    print(answer)

    # Summarize a web page
    doc = get_document(url)
    answer = gemini_service.single_answer("Give me five bullet point facts", doc)
    print(answer)

    # Summarize plain text passed inline
    doc = get_document("Just some text")
    answer = gemini_service.single_answer("Give me five bullet point facts", doc)
    print(answer)


if __name__ == "__main__":
    main()

The Pacific parrotlet (Forpus coelestis) is a small green parrot originating from South America. A typical specimen is 11–14 centimetres (4.3–5.5 in) long and typically weighs 30 grams or more. Wild Pacific parrotlets are green with a dusty grey cast over the body, a bright green mask and a pinkish beak. Legs and feet are pinkish-grey. Pacific parrotlets are sexually dimorphic: males possess shades of blue on their wings. Blue can vary in intensity from a bright cobalt blue to a pale, almost lavender shade of blue on American birds; the blue is almost non-existent on marbled birds, only being visible on the underside of the wing right on the joint. Male parrotlets also have blue streaks behind the eyes which is often referred to as "eyeshadow;" as well as blue rumps. Female parrotlets have no blue on the wings whatsoever but can have blue eye streaks as well as a blue rump.
Parrotlets are often referred to as pocket parrots because of their size; but they are known for their larger-than-life personalities and feisty attitudes. They are solitary birds in captivity due to their aggressive behavior towards other birds when confined, so it is not suggested to house one with conspecifics unless ample space is available. Pet parrotlets should never be kept with other bird species due to the likelihood of aggression.
Although the base, or wild-type color of parrotlets is green, they also come in a rainbow of mutations: American Yellow, Green Marbled, American Yellow Marbled, Green Fallow, Green Fallow Marbled, American Yellow Fallow, American Yellow Marbled Fallow, Blue, American White, Blue Marbled, American White Marbled, Blue Fallow, American White Fallow, Blue Marbled Fallow, American White Marbled Fallow, Turquoise, American Turquoise, Turquoise Marbled, American Turquoise Marbled, Turquoise Fallow, American Turquoise Fallow, Turquoise Fallow Marbled, American Turquoise Fallow Marbled, Grey, American Grey, Grey Marbled, American Grey Marbled, Grey Fallow, American Grey Fallow, Grey Fallow Marbled, American Grey Fallow Marbled, Albino, Lutino Creamino, Cinnamon, Misty, Dark Factor, and Pied. "Dilute" is a term not used in parrotlets. "Pastel" was previously a term used for edging on the feathers and wings; however, the preferred term is Marbled.