Use re (regular expression) to split text
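re.split(r'\W+|_', x)
We use | to separate the alternative patterns that we want to use as splitters. Here \W+ matches one or more non-word characters, that is, anything other than letters, digits, and the underscore. Because _ counts as a word character, it has to be listed as its own alternative if we also want to split on it. The following example splits on \W+ alone, so the underscores are kept: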
>>> import re
>>> text = 'I:\\Textual Analysis Data\\19950131_10-K_edgar_data_69970_0000950152-95-000069_1.txt'
>>> re.split(r'\W+', text)
['I',
'Textual',
'Analysis',
'Data',
'19950131_10',
'K_edgar_data_69970_0000950152',
'95',
'000069_1',
'txt']
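
For comparison, here is a minimal sketch of the same split with the underscore added as a second alternative, as in the pattern r'\W+|_' introduced at the top. The variable names tokens and tokens_class are only illustrative, and the character-class form r'[\W_]+' is an alternative added here, not part of the original note.

```python
import re

# Same sample path as in the example above.
text = 'I:\\Textual Analysis Data\\19950131_10-K_edgar_data_69970_0000950152-95-000069_1.txt'

# Split on a run of non-word characters OR on a single underscore.
tokens = re.split(r'\W+|_', text)
print(tokens)

# Alternative (an assumption, not from the original note): put both in one
# character class so that a mixed run such as '_-' counts as one separator.
tokens_class = re.split(r'[\W_]+', text)
print(tokens_class)
```

Both calls break this path into the same pieces, now without any underscores. The two patterns differ only when an underscore sits directly next to another separator: r'\W+|_' then matches them one at a time and leaves an empty string between them, while r'[\W_]+' treats the whole run as a single separator.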