You will receive two inputs:
- A JSON Schema that defines the exact structure, field names, data types, required fields, and allowed values for a single contact‑information object.
- An unstructured text block (e.g., business card, email signature, profile) that may contain one or more persons’ details.
Your job is to extract every piece of information you can confidently identify from the text and output one JSON object that validates against the supplied schema. Follow every rule below exactly; the output must be a single JSON object without any surrounding markdown, whitespace, or explanatory text.
- Output ONLY the JSON object – no code fences, no comments, no extra whitespace.
- Use only keys that exist in the schema.
- For `social_media`, allowed keys are `linkedin`, `twitter`, `github`, `facebook`, `instagram`. Any other platform must go into `social_media.other` as an array of objects `{ "platform": "...", "handle": "..." }`.
- Never output `null`, empty strings, or empty arrays/objects unless the schema explicitly permits them.
- All string values must be trimmed (no leading/trailing spaces) and preserve the original case.
- Preserve the order of items in every array exactly as they appear in the source text.
- The final JSON must validate against the provided schema (you may test it mentally).
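The trimming and no-empty-values rules above can be sketched in Python as a final cleanup pass. This is only an illustration under the assumption that the schema permits no empty values at all; the names `clean` and `_DROP` are hypothetical and not part of the schema.

```python
from typing import Any

_DROP = object()  # sentinel marking a value that should be omitted


def clean(value: Any) -> Any:
    """Trim strings and drop None, empty strings, and empty containers."""
    if value is None:
        return _DROP
    if isinstance(value, str):
        trimmed = value.strip()  # case is left untouched
        return trimmed if trimmed else _DROP
    if isinstance(value, dict):
        cleaned = {key: clean(item) for key, item in value.items()}
        cleaned = {key: item for key, item in cleaned.items() if item is not _DROP}
        return cleaned if cleaned else _DROP
    if isinstance(value, list):
        # list order is preserved; only dropped items disappear
        kept = [item for item in (clean(v) for v in value) if item is not _DROP]
        return kept if kept else _DROP
    return value  # booleans and numbers pass through unchanged
```

Applied to a draft object, `clean` returns the same structure with whitespace trimmed and empty members removed, while `false` flags such as `primary: false` survive because they are neither `null` nor empty.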
If the text mentions multiple distinct individuals, extract the one that:
- Is explicitly labeled as the primary contact (e.g., “Primary Email”, “Primary Phone”, “Contact:” etc.), or
- Has the most complete set of fields (most distinct schema properties filled).
If you cannot decide, choose the first person described in the text. Only one contact object is to be returned.
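One way to picture this selection order is the sketch below. It assumes each candidate has already been extracted into a hypothetical wrapper `{"contact": {...}, "is_primary": bool}` listed in source order, and it reads "most complete" as the largest number of top-level schema properties; both are assumptions made for illustration.

```python
from typing import Any


def choose_contact(candidates: list[dict[str, Any]]) -> dict[str, Any]:
    """Pick one contact: explicit primary, else most fields, else first in text."""
    if not candidates:
        raise ValueError("no candidates extracted")
    flagged = [c for c in candidates if c["is_primary"]]
    if flagged:
        return flagged[0]["contact"]
    # max() returns the first maximal element, so ties fall back to text order
    best = max(candidates, key=lambda c: len(c["contact"]))
    return best["contact"]
```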
- `name.full_name` must be present if any name information is extracted.
- All other top‑level fields are optional, but every object/array entry you include must satisfy its required sub‑fields (e.g., each `email_addresses` entry must contain `email`).
| Schema Path | Extraction Details | Formatting / Inference |
|---|---|---|
| `name.full_name` | Build the complete name from any available parts (prefix, first, middle, last, suffix). | Prefix before the name, suffix after. |
| `name.first_name`, `name.middle_name`, `name.last_name`, `name.prefix`, `name.suffix` | Extract individual components when explicitly labeled (`First Name:`, `Middle:`, `Last Name:`, `Title:`, `Prefix:`, `Suffix:`) or infer from a full‑name line. Omit any component you cannot locate. | |
| `job_title` | Professional title / position. | Use a line labeled “Title”, “Position”, “Job Title”, or the line immediately after the name if it clearly looks like a title. |
| `company.name` | Company or organization name. | Usually on its own line or after a label like “Company:”. |
| `company.department` | Department or division. | Look for a label “Department:” or a line directly under the company name that appears to be a department. |
| `company.industry` | Industry / sector. | Include only if explicitly labeled (e.g., “Industry: Healthcare”). |
| `email_addresses[]` | All email addresses. For each object: • `email` – exact address as in source (must match email format). • `type` – infer: if the domain matches the company domain → “work”; if the domain is a common personal provider (gmail.com, yahoo.com, outlook.com, hotmail.com, etc.) → “personal”; otherwise → “other”. • `primary` – `true` for the email explicitly marked “Primary Email”, “Primary”, or similar. If no explicit primary, set `true` for the first email encountered and `false` for all others. | Every email entry must contain `primary` (`true`/`false`). |
| `phone_numbers[]` | All phone numbers. For each object: • `number` – keep formatting exactly as in source (including parentheses, dashes, spaces). • `type` – infer from preceding label: “Mobile”, “Cell” → “mobile”; “Work”, “Direct”, “Office” → “work”; “Home”, “Residence” → “home”; “Fax” → “fax”; otherwise “other”. • `extension` – if the line contains “ext”, “extension”, “x”, capture the following digits (as a string). • `primary` – only include this field and set to `true` if the phone is explicitly labeled “Primary Phone”, “Primary”, or similar. If no explicit primary, omit the `primary` field entirely (do not add `primary: false`). | |
| `addresses[]` | All physical address blocks. For each object: • `type` – infer from preceding label (“Work Address”, “Home Address”, “Mailing Address”, etc.). If unclear, default to “other”. • `full_address` – concatenate all address lines exactly as they appear, separated by commas. • `street`, `city`, `state`, `postal_code`, `country` – include only if you can parse them confidently (e.g., US “Boston, MA 02115”). Omit any component you cannot determine. | If the source says “Same as work address”, duplicate the `full_address` (and any parsed components) for the new type. |
| `websites[]` | All URLs that are not email addresses or social‑media links. For each object: • `url` – raw URL text (keep “http://”/“https://” only if present). • `type` – infer: if the domain matches the company domain → “company”; if the URL appears under a personal label (“Personal Site”, “Portfolio”) → “personal”; otherwise “other”. | |
| `social_media` | Social‑media handles. Include only the keys defined in the schema (`linkedin`, `twitter`, `github`, `facebook`, `instagram`). For each present key, output the exact string after the label (URL or handle). If additional platforms appear, place them in `social_media.other` as an array of objects `{ "platform": "...", "handle": "..." }`. | |
| `instant_messaging[]` | Instant‑messaging handles. Include an entry only if the source explicitly lists an IM service (e.g., “Skype: live:john”). Each entry: `{ "platform": "...", "handle": "..." }`. | |
| `preferred_contact_method` | Preferred method of contact. Output the exact value from the input only if it matches one of the allowed enum values (`email`, `phone`, `text`, `social_media`, `other`). | |
| `timezone` | Time zone. Copy the string exactly as shown (e.g., “Eastern (ET)”). | |
| `language` | Preferred language(s). Copy the string exactly as shown (e.g., “English, Spanish”). | |
| `assistant` | Assistant’s contact info. Populate `name`, `email`, `phone` using the same inference rules as for the main contact. | |
| `notes` | Free‑form notes. Include any text explicitly labeled “Notes:” (or similar). | |
| `tags` | Tags array. Include any comma‑separated list explicitly labeled “Tags:” (or similar). | |
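As a rough illustration of the `phone_numbers[]` row, the sketch below parses a single line of the assumed form `Label: number [ext N]`. The helper name `parse_phone_line`, the label table, and the regular expression are hypothetical choices for this example, not something the schema prescribes.

```python
import re

# Label → type mapping taken from the phone_numbers[] row above
PHONE_TYPES = {
    "mobile": "mobile", "cell": "mobile",
    "work": "work", "direct": "work", "office": "work",
    "home": "home", "residence": "home",
    "fax": "fax",
}

# "ext", "ext.", "extension", or an "x" that follows a digit or space, then digits
_EXT = re.compile(r"(?:\bext(?:ension)?\.?|(?<=[\d\s])x)\s*[:.]?\s*(\d+)", re.IGNORECASE)


def parse_phone_line(line: str) -> dict[str, object]:
    """Build one phone entry from a line such as 'Office: (617) 555-0100 ext. 204'."""
    label, sep, rest = line.partition(":")
    if not sep:  # no label present, treat the whole line as the number
        label, rest = "", line
    match = _EXT.search(rest)
    number = rest[:match.start()] if match else rest
    entry: dict[str, object] = {
        "number": number.strip(" ,;"),  # inner formatting is kept as in the source
        "type": PHONE_TYPES.get(label.strip().lower(), "other"),
    }
    if match:
        entry["extension"] = match.group(1)  # digits only, kept as a string
    if "primary" in label.lower():
        entry["primary"] = True  # never emit primary: false for phones
    return entry
```

For a line such as `Office: (617) 555-0100 ext. 204`, the sketch keeps the number formatting intact, maps the label to `work`, and stores `204` as a string extension.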
- Email Primary Flag – If the input contains “Primary Email”, “Primary”, or similar, that email gets `primary: true`. Otherwise, the first email encountered gets `primary: true`; all other emails must have `primary: false`.
- Phone Primary Flag – Only set `primary: true` when the source explicitly marks a phone as primary. If no explicit primary, do not include the `primary` field for any phone.
- Domain Matching – For email and website type inference, extract the domain part (after `@` for an email; after `//` and before the first `/` for a URL) and compare it to the company’s domain (if the company email domain is visible). If you cannot determine the company domain, default to “other”.
- Country Names – Use the exact wording from the source (e.g., “USA”, “United States”). Do not expand or change abbreviations.
- Missing Sub‑fields – If a parent object is required (e.g., `name`) but some sub‑fields are missing, include only the sub‑fields you have. Do not add empty strings or placeholders.
- Ambiguous Data – When you cannot confidently determine a value (e.g., you see a number but cannot tell if it’s a fax), omit that field rather than guessing.
- Array Uniqueness – Do not deduplicate entries unless they are exact duplicates; keep the order as found.
- Extension Extraction – Capture only the numeric part after “ext”, “extension”, or “x”. Do not include surrounding text.
- Address Parsing – If you cannot reliably split an address into its components, still provide `full_address` and omit the missing components.
- Phone Number Formatting – Preserve the exact characters (including leading `+`, parentheses, spaces, dashes) as they appear in the source.
- Social Media Handles – If only the platform name is given (e.g., “LinkedIn” with no URL/handle), omit that platform from the output.
- Website vs. Social Media – URLs that point to known social‑media domains (linkedin.com, twitter.com, github.com, facebook.com, instagram.com) belong in `social_media`, not in `websites`.
- Preferred Contact Method Enum – The schema restricts this field to `["email","phone","text","social_media","other"]`. Output only if the source value matches exactly (case‑sensitive).
- ✅ Single JSON object, no extra commas or trailing punctuation.
- ✅ All keys exactly as defined in the schema (case‑sensitive).
- ✅ No `null`, empty strings, or empty arrays/objects unless explicitly allowed.
- ✅ `name.full_name` present if any name data was extracted.
- ✅ Every `email_addresses` entry includes `email`, `type`, and `primary` (true/false).
- ✅ Every `phone_numbers` entry includes `number` and `type`; `extension` only if present; `primary` only if explicitly marked.
- ✅ Every `addresses` entry includes `full_address`; other components only if confidently parsed.
- ✅ `social_media` contains only allowed keys; extra platforms go to `social_media.other`.
- ✅ `preferred_contact_method` only if it matches an allowed enum value.
- ✅ The object validates against the supplied JSON Schema.
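Several of the checklist items are mechanical enough to express as code. The sketch below is a hypothetical helper, `checklist_problems`, that covers only the structural items listed here; it is not a substitute for validating against the supplied JSON Schema.

```python
ALLOWED_SOCIAL = {"linkedin", "twitter", "github", "facebook", "instagram", "other"}
ALLOWED_METHODS = {"email", "phone", "text", "social_media", "other"}


def checklist_problems(contact: dict) -> list[str]:
    """Return a list of checklist violations; an empty list means none were found."""
    problems: list[str] = []
    name = contact.get("name")
    if name is not None and "full_name" not in name:
        problems.append("name.full_name missing")
    for i, entry in enumerate(contact.get("email_addresses", [])):
        if not {"email", "type", "primary"} <= set(entry):
            problems.append(f"email_addresses[{i}] lacks email/type/primary")
        elif not isinstance(entry["primary"], bool):
            problems.append(f"email_addresses[{i}].primary must be true or false")
    for i, entry in enumerate(contact.get("phone_numbers", [])):
        if not {"number", "type"} <= set(entry):
            problems.append(f"phone_numbers[{i}] lacks number/type")
        if entry.get("primary", True) is not True:
            problems.append(f"phone_numbers[{i}].primary must be omitted or true")
    for i, entry in enumerate(contact.get("addresses", [])):
        if "full_address" not in entry:
            problems.append(f"addresses[{i}] lacks full_address")
    extra = set(contact.get("social_media", {})) - ALLOWED_SOCIAL
    if extra:
        problems.append(f"social_media has disallowed keys: {sorted(extra)}")
    method = contact.get("preferred_contact_method")
    if method is not None and method not in ALLOWED_METHODS:
        problems.append("preferred_contact_method is not an allowed enum value")
    return problems
```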
Follow these rules meticulously to produce a correct, schema‑compliant JSON output. Good luck!