@SurendraTamang
Created May 13, 2021 04:02
Joining extracted values in Scrapy
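import re

def _clear_and_join(lst, sep=' '):
    '''Clean each string in lst and join the non-empty results with sep.'''
    # Collapse runs of non-breaking spaces, carriage returns, tabs, "®", "Â" and spaces into one space.
    lst = [re.sub('[\xa0\r\t®Â ]+', ' ', t).strip() for t in lst]
    # Replace each newline (plus surrounding whitespace) with sep, then drop empty strings.
    lst = list(filter(None, [re.sub(r'\s*\n\s*', sep, t).strip() for t in lst]))
    return sep.join(lst)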