Last active
March 8, 2018 15:06
Regex to improve Beli's captures
From https://github.com/SOBotics/Belisarius/blob/433477414b13d516ed6de369d1ef1880be404f81/ini/BlackListedAnswerWords.txt converted to txt
'help me':
(?i)help\W?me[^\w]
'posted (working) solution', enhanced to also catch 'ed' -> 'posted':
(?i)(post|posted)\W?(a|)\W?(working|)\W?solution
'solution':
(?i)solution
'have another problem' in multiple forms:
(?i)(have|had|got)\W?(another|other)\W?(new|fresh|)\W?(problem|issue)
============
TESTS
============
RegEx: (?i)help\W?me[^\w]
Test Lines:
pls help me - match
please help me fix this - match
help method - no match
assume help was not - no match
Help me, I'm stuck - match
PlEaSE HeLP mE - match
just a common text that's containing help or me - no match
waffle - no match
Test lines after replacing matches with a * (using the regex replace function of Notepad++):
pls *
please *fix this
help method
assume help was not
* I'm stuck
PlEaSE *
just a common text that's containing help or me
waffle
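The test lines above can also be checked from Java instead of Notepad++. This is only a minimal sketch: it assumes every test line is followed by a line break, as it would be inside a file, and the class name `HelpMeRegexDemo` and the helpers `matches` and `mask` are made up for illustration.

```java
import java.util.regex.Pattern;

class HelpMeRegexDemo {
    // Same 'help me' pattern as in the list above
    static final Pattern HELP_ME = Pattern.compile("(?i)help\\W?me[^\\w]");

    // True when the pattern occurs anywhere in the text
    static boolean matches(String text) {
        return HELP_ME.matcher(text).find();
    }

    // Replaces every match with "*", like the Notepad++ replace step
    static String mask(String text) {
        return HELP_ME.matcher(text).replaceAll("*");
    }

    public static void main(String[] args) {
        String[] lines = {
            "pls help me",
            "please help me fix this",
            "help method",
            "assume help was not",
            "Help me, I'm stuck",
            "PlEaSE HeLP mE",
            "just a common text that's containing help or me",
            "waffle"
        };
        for (String line : lines) {
            // Assumption: append the line break the line would have in a file
            String verdict = matches(line + "\n") ? "match" : "no match";
            System.out.println(line + " - " + verdict);
        }
        System.out.println(mask("please help me fix this"));
    }
}
```

Because `[^\w]` has to consume one character after `me`, a bare `pls help me` with nothing following it does not match; it only matches here because of the appended line break.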
package org.sobotics.belisarius;

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        try {
            // Read the test strings from the test file
            String string = readFile("teststrings.txt", StandardCharsets.UTF_8).toLowerCase();

            List<Matcher> matcherList = new ArrayList<>();

            // Add one matcher per regex to the matcher list
            matcherList.add(Pattern.compile("(?i)(have|had|got)\\W?(?:an)?other\\W?(new|fresh|)\\W?(problem|issue)").matcher(string));
            matcherList.add(Pattern.compile("(?i)help\\W?me\\W").matcher(string));

            // Find and print every match of each matcher
            for (int i = 0; i < matcherList.size(); i++) {
                System.out.println("\nMatcher " + (i + 1) + "\n");
                Matcher matcher = matcherList.get(i);
                while (matcher.find()) {
                    System.out.println(matcher.group());
                }
            }
        } catch (IOException e) {
            // Print the stack trace itself, not the array reference
            e.printStackTrace();
        }
    }

    // Reads the whole file into a String using the given encoding
    static String readFile(String path, Charset encoding) throws IOException {
        byte[] encoded = Files.readAllBytes(Paths.get(path));
        return new String(encoded, encoding);
    }
}
but now have another problem
but had another problem
got another issue
have another new problem
I have another issue
I have other issue
please help me
help me
help method
assume help was not
HeLP mE
waffle
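To see which of these strings the 'have another problem' regex catches, each line can also be checked on its own instead of scanning the whole file at once. The following is a rough sketch under that assumption; the class name `AnotherProblemRegexDemo` and the helper `firstMatch` are hypothetical and not part of Belisarius.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnotherProblemRegexDemo {
    // Same pattern as in the Java file above
    static final Pattern ANOTHER_PROBLEM = Pattern.compile(
            "(?i)(have|had|got)\\W?(?:an)?other\\W?(new|fresh|)\\W?(problem|issue)");

    // Returns the matched text, or null when the line does not match
    static String firstMatch(String line) {
        Matcher matcher = ANOTHER_PROBLEM.matcher(line);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "but now have another problem",
            "but had another problem",
            "got another issue",
            "have another new problem",
            "I have another issue",
            "I have other issue",
            "please help me",
            "help method",
            "waffle"
        };
        for (String line : lines) {
            String match = firstMatch(line);
            System.out.println(line + " -> " + (match == null ? "no match" : match));
        }
    }
}
```

Checking line by line keeps the original casing in the output and makes it obvious which test string produced which match.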
@tripleee Thanks for the correction. It looks like Java does support the PCRE-style syntax we are using so far.
@Jinx88909 Thanks for the strings, I'll add them to my regex file.
@adeak I'll test that, thanks for the regex.
It works fine, everything matches. Thank you!
I added a Java file and the list of strings I'll use to test the regexes.