@kapilgarg
Created June 17, 2024 07:31
spacy_ner_client
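import spacy

# Custom spaCy NER pipeline; 'movie-name-sanitizer' must be installed as a
# package or be resolvable as a model path for spacy.load() to find it.
_sanitizer = spacy.load('movie-name-sanitizer')


def sanitize(name: str) -> dict:
    """
    Sanitize the torrent name of a movie and return a dictionary
    containing the details extracted from it.

    Example:
        name: [TorrentCounter.to].Tumbbad.2018.Hindi.1080p.WEB-DL.x264.[1.5GB].[MP4]
        returns:
            {
                'movie': 'tumbbad',
                'year': '2018',
                'resolution': '1080p',
                'quality': 'web-dl',
            }
    """
    # Replace dots with spaces and lowercase the name before running the model.
    name = name.replace(".", " ").lower()
    doc = _sanitizer(name)
    # Map each entity label to its text; a repeated label keeps the last match.
    data = {}
    for ent in doc.ents:
        data[ent.label_] = ent.text
    return data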
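
The two steps inside `sanitize` (preprocessing the name, then collecting entities into a dictionary) can be tried without the trained model. The sketch below is only an illustration under stated assumptions: `FakeEnt`, `normalize`, and `collect` are hypothetical names that do not appear in the gist, and the entity list is hand-written to match the docstring example, not produced by `movie-name-sanitizer`.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy entity span: it models only the two
# attributes that sanitize() reads, label_ and text.
FakeEnt = namedtuple("FakeEnt", ["label_", "text"])


def normalize(name: str) -> str:
    """Apply the same preprocessing as sanitize(): dots to spaces, lowercase."""
    return name.replace(".", " ").lower()


def collect(ents) -> dict:
    """Build the label -> text mapping the same way sanitize() does."""
    data = {}
    for ent in ents:
        # A repeated label overwrites the value stored earlier.
        data[ent.label_] = ent.text
    return data


raw = "[TorrentCounter.to].Tumbbad.2018.Hindi.1080p.WEB-DL.x264.[1.5GB].[MP4]"
print(normalize(raw))
# [torrentcounter to] tumbbad 2018 hindi 1080p web-dl x264 [1 5gb] [mp4]

# Assumed entities, written by hand for illustration; a real model may differ.
ents = [
    FakeEnt("movie", "tumbbad"),
    FakeEnt("year", "2018"),
    FakeEnt("resolution", "1080p"),
    FakeEnt("quality", "web-dl"),
]
print(collect(ents))
```

Two things stand out from this sketch: replacing every dot also splits `1.5GB` into `1 5gb`, and if the model ever emits the same label twice, only the last entity text survives in the returned dictionary.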