
## return bigram python

You can do this with a positive lookahead:

```python
>>> import re
>>> s = "My name is really nice. This is so awesome."
>>> m = re.findall(r'(?=(\b\w+\b \S+))', s)
>>> m
['My name', 'name is', 'is really', 'really nice.', 'This is', 'is so', 'so awesome.']
```
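If you need this in more than one place, the same call can be wrapped in a small function. This is only a sketch: the helper name `bigrams` and the constant `BIGRAM_RE` are hypothetical, and the pattern is exactly the one used above.

```python
import re

# Same pattern as above: a word, one literal space, then the next run of
# non-space characters, all inside a lookahead so matches can overlap.
BIGRAM_RE = re.compile(r'(?=(\b\w+\b \S+))')


def bigrams(text):
    """Return overlapping two-token pairs (hypothetical helper name)."""
    # findall returns group 1 of every successful lookahead.
    return BIGRAM_RE.findall(text)


if __name__ == "__main__":
    print(bigrams("My name is really nice. This is so awesome."))
    print(bigrams("one two three"))
```

Note that punctuation stays attached to the second token (`'really nice.'`), and no pair such as `'nice. This'` is produced, because the first token must be a whole word followed directly by a space.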
Pattern explanation:

- `(?=...)` Positive lookahead. Like the `^`/`$` line anchors and the `\b` word boundary, a lookahead is a zero-length assertion: it consumes no characters and only checks whether the enclosed pattern can match at the current position.
- `(...)` Capturing group. It captures the text matched by the pattern inside the parentheses.
- `\b` Word boundary. It matches the position between a word character and a non-word character (or the start or end of the string next to a word character).
- `\w+` Matches one or more word characters.
- ` ` (a literal space) Matches exactly one space character.
- `\S+` Matches one or more non-whitespace characters, i.e. the next word together with any attached punctuation such as `.`.
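To make the pieces listed above easier to map onto the pattern, here is a sketch of the same regex written with `re.VERBOSE`, so each part carries its own comment. The name `VERBOSE_BIGRAM` is just for illustration. In verbose mode unescaped whitespace is ignored outside character classes, so the literal space is written as `[ ]`.

```python
import re

# Same pattern as the one-liner, written in verbose mode with comments.
VERBOSE_BIGRAM = re.compile(r"""
    (?=              # lookahead: assert, but do not consume
        (            # group 1: the text findall returns
            \b\w+\b  # a whole word
            [ ]      # exactly one literal space
            \S+      # the next run of non-whitespace characters
        )
    )
""", re.VERBOSE)

s = "My name is really nice. This is so awesome."
print(VERBOSE_BIGRAM.findall(s))
```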
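`findall` returns the text captured by the groups when the pattern has any capturing groups (a list of tuples if there is more than one); with no groups it returns the whole matches. Here there is exactly one group, so it returns the text captured by group 1. To get overlapping matches, the pattern has to go inside a lookahead: because the lookahead consumes nothing, the search advances one position after each match instead of jumping past the matched text.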
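The effect of the group and the lookahead can be checked by running three variants of the pattern side by side. This sketch only uses `re.findall` on the example string from the answer.

```python
import re

s = "My name is really nice. This is so awesome."

# No capturing group: findall returns the whole match of each attempt.
# A bare lookahead matches the empty string, so every result is ''.
no_group = re.findall(r'(?=\b\w+\b \S+)', s)

# Capturing group but no lookahead: each match consumes its text, so the
# next search starts after the second word and pairs cannot overlap.
consuming = re.findall(r'(\b\w+\b \S+)', s)

# Capturing group inside a lookahead: nothing is consumed, the scan moves
# forward one position at a time, and all overlapping pairs are found.
overlapping = re.findall(r'(?=(\b\w+\b \S+))', s)

print(no_group)
print(consuming)
print(overlapping)
```

Without the lookahead you only get `['My name', 'is really', 'This is', 'so awesome.']`, and without the group you get seven empty strings, one per position where the lookahead succeeded.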
