@oneryalcin
Created May 18, 2023 12:47
Extracting Entities from Articles using KOR
from langchain.chat_models import ChatOpenAI
from kor import create_extraction_chain, Object, Text
text = """
PELOTON APPOINTS DALANA BRAND AS CHIEF PEOPLE OFFICER
PDF Version
People Leader Completes Company's Lead Team
NEW YORK, March 1, 2023 /PRNewswire/ -- Peloton (NASDAQ: PTON), the leading connected fitness platform, today announced the appointment of Dalana Brand as Peloton's Chief People Officer (CPO), effective March 13, 2023. As a seasoned executive with significant global leadership experience in multiple industries, Brand joins the team with a strong reputation for organizational transformation. She will report to CEO Barry McCarthy and serve as a member of the leadership team, leading the company's Global People Team.
Dalana Brand
"Talent density has been a top priority for me at Peloton. Dalana's addition is the culmination of that strategy, rounding out and completing the leadership team," said McCarthy. "As we continue Peloton's transformation and pivot to growth, her vision and leadership will be critical to our success."
Brand previously served as Chief People and Diversity Officer at Twitter, where she led the company's global workforce and accelerated the progress to create an inclusive people experience. She joined the team in 2018, serving as Vice President of People Experience and Head of Inclusion & Diversity. Before joining Twitter, Brand was Vice President of Total Rewards for Electronic Arts and previously held senior leadership positions at the Whirlpool Corporation.
"I've made a career out of fostering inclusive employee experiences and leading with transparency and accountability," said Brand. "I'm thrilled to join the team at Peloton as the company continues striving to make fitness accessible for all."
Brand is relocating from California and will be based in New York City.
About Peloton:
Peloton (NASDAQ: PTON) is the leading connected fitness platform with a highly engaged community of nearly 7 million Members worldwide. A category innovator at the nexus of fitness, technology, and media, Peloton's first-of-its-kind subscription platform seamlessly combines innovative hardware, distinctive software, and exclusive content. Its world-renowned instructors, coach and motivate Members to be the best version of themselves anytime, anywhere. Founded in 2012 and headquartered in New York City, Peloton continues to scale across the US, UK, Canada, Germany, and Australia. For more information, visit www.onepeloton.com.
PRESS CONTACT:
Ben Boyd
press@onepeloton.com
"""
# Deterministic settings (temperature=0) so repeated runs extract the same fields.
llm = ChatOpenAI(
    model_name="gpt-4",
    temperature=0,
    max_tokens=2000,
    frequency_penalty=0,
    presence_penalty=0,
    top_p=1.0,
)
# Schema for the role the person is moving into (a single object per person).
new_company = Object(
    id="new_role",
    description="New role information of the person",
    attributes=[
        Text(
            id="company_name",
            description="Name of the new company",
            examples=[("TEG appoints new CTO Cameron Stone, he was previously working for Microsoft", "TEG")],
        ),
        Text(
            id="industry",
            description="Industry of the new company",
            examples=[("Tesla appoints new CTO John Smith, who worked many years at Google", "Automotive")],
        ),
        Text(
            id="web_page",
            description="Web page of the new company; it may not be available in the article. If you are confident about the company name, populate it",
            examples=[("Tesla appoints new CTO John Smith", "tesla.com")],
        ),
        Text(
            id="parent_company",
            description="Parent company of the new company. If you are confident about the company name, populate it",
            examples=[("TEG appoints new CTO Cameron Stone, he was previously working for Microsoft", "Silver Lake")],
        ),
        Text(
            id="title",
            description="New role of the person",
            examples=[("John Smith is appointed as CFO of Acme Industries, he was previously Director at Tesla",
                       "Chief Financial Officer")],
        ),
    ],
)
# Schema for the roles the person is leaving; many=True allows several previous roles.
old_company = Object(
    id="old_role",
    description="Old role information of the person; the person may have worked in multiple roles at different companies",
    attributes=[
        Text(
            id="company_name",
            description="Name of the old company",
            examples=[
                ("TEG appoints new CTO Cameron Stone, he was previously working for Microsoft and before for Twitter", "Microsoft"),
                ("TEG appoints new CTO Cameron Stone, he was previously working for Microsoft and before for Twitter", "Twitter"),
            ],
        ),
        Text(
            id="industry",
            description="Industry of the old company",
            examples=[("Tesla appoints new CTO John Smith, who worked many years at Google", "Technology")],
        ),
        Text(
            id="web_page",
            description="Web page of the old company; if it is not in the article, guess it from the name",
            examples=[("Tesla appoints new CTO John Smith, who was director at British Airways", "www.ba.com")],
        ),
        Text(
            id="parent_company",
            description="Parent company of the old company",
            examples=[
                ("TEG appoints new CTO Cameron Stone, he was previously working for Sky UK", "Comcast"),
                ("TEG appoints new CTO Cameron Stone, he was previously working for Microsoft", ""),
                ("Air France CEO Jens Bischof has been appointed as CIO for rival LOT Polish Airlines, he was previously working for Eurowings", "Lufthansa Group"),
            ],
        ),
        Text(
            id="title",
            description="Old role of the person",
            examples=[("John Smith is appointed as CFO of Acme Industries, he was previously Director at Tesla",
                       "Director")],
        ),
    ],
    many=True,
)
# Top-level schema: one entry per person mentioned in the article.
schema = Object(
    id="people",
    description=(
        "The user is extracting entities related to company and people information from an article; "
        "if information is not available, enrich the article with the missing information"
    ),
    attributes=[
        Text(
            id="first_name",
            description="First and middle name of the person",
            examples=[("John Smith", "John"), ("John Bob Smith", "John Bob")],
        ),
        Text(
            id="last_name",
            description="Last name of the person",
            examples=[],
        ),
        new_company,
        old_company,
        Text(
            id="previous_role",
            description="Previous role of the person",
            examples=[("John Smith is appointed as CFO of Acme Industries, he was previously Director at Tesla",
                       "Director")],
        ),
    ],
    many=True,
)
chain = create_extraction_chain(llm, schema, encoder_or_encoder_class='json')
# print(chain.prompt.format_prompt(text="[user input]").to_string())
print('-------------------------------------------------------------\n')
res = chain.predict_and_parse(text=text)
import json
print(json.dumps(res, indent=2))
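
A minimal sketch of how the result could be unpacked before further use. It assumes `res` has the shape shown in the response posted under this gist (top-level `data` and `errors` keys, with `data["people"]` holding the extracted entries). The helper name `extract_people` is hypothetical and is not part of Kor or LangChain.

```python
from typing import Any, Dict, List


def extract_people(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the extracted people, or raise if the parser reported errors.

    Assumes the result dictionary carries "data" and "errors" keys, as in
    the response recorded for this gist.
    """
    errors = result.get("errors") or []
    if errors:
        raise ValueError(f"extraction reported errors: {errors}")

    data = result.get("data") or {}
    people = data.get("people", [])

    # The schema is declared with many=True, so a list is expected here;
    # a single object is tolerated and wrapped for uniform handling.
    return people if isinstance(people, list) else [people]
```

With the script above, this would be called as `extract_people(res)` right after `chain.predict_and_parse(text=text)`.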
@oneryalcin

Response is:

{
  "data": {
    "people": [
      {
        "first_name": "Dalana",
        "last_name": "Brand",
        "new_role": {
          "company_name": "Peloton",
          "industry": "Fitness",
          "web_page": "www.onepeloton.com",
          "title": "Chief People Officer"
        },
        "old_role": [
          {
            "company_name": "Twitter",
            "industry": "Technology",
            "web_page": "www.twitter.com",
            "title": "Chief People and Diversity Officer"
          }
        ]
      }
    ]
  },
  "raw": "<json>{\"people\": [{\"first_name\": \"Dalana\", \"last_name\": \"Brand\", \"new_role\": {\"company_name\": \"Peloton\", \"industry\": \"Fitness\", \"web_page\": \"www.onepeloton.com\", \"title\": \"Chief People Officer\"}, \"old_role\": [{\"company_name\": \"Twitter\", \"industry\": \"Technology\", \"web_page\": \"www.twitter.com\", \"title\": \"Chief People and Diversity Officer\"}]}]}</json>",
  "errors": [],
  "validated_data": {}
}
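
Since `old_role` comes back as a list, it can be convenient to flatten the response into one row per (person, previous role) pair, for example before loading it into a table. The sketch below works on the `people` list from the response above. The function name `flatten_moves` and the row keys are illustrative assumptions. Fields are read with `.get` because the model may omit attributes, as it did with `parent_company` here.

```python
from typing import Any, Dict, List


def flatten_moves(people: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Produce one flat row per person and previous role."""
    rows = []
    for person in people:
        new_role = person.get("new_role") or {}
        # Fall back to a single empty role so the person still yields a row.
        old_roles = person.get("old_role") or [{}]
        full_name = f'{person.get("first_name", "")} {person.get("last_name", "")}'.strip()
        for old in old_roles:
            rows.append({
                "name": full_name,
                "new_company": new_role.get("company_name"),
                "new_title": new_role.get("title"),
                "old_company": old.get("company_name"),
                "old_title": old.get("title"),
            })
    return rows


# Sample input copied from the "people" list in the response above.
sample_people = [
    {
        "first_name": "Dalana",
        "last_name": "Brand",
        "new_role": {
            "company_name": "Peloton",
            "industry": "Fitness",
            "web_page": "www.onepeloton.com",
            "title": "Chief People Officer",
        },
        "old_role": [
            {
                "company_name": "Twitter",
                "industry": "Technology",
                "web_page": "www.twitter.com",
                "title": "Chief People and Diversity Officer",
            }
        ],
    }
]

rows = flatten_moves(sample_people)
```

Here `rows` holds a single entry, because only one previous role (Twitter) was extracted for Dalana Brand.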