Gist Filnor/d08dc362e05768bf31d4995628213dbe, last active March 8, 2018 15:06
Regex to improve Beli's captures
From https://github.com/SOBotics/Belisarius/blob/433477414b13d516ed6de369d1ef1880be404f81/ini/BlackListedAnswerWords.txt converted to txt:
'help me':
(?i)help\W?me[^\w]
'posted (working) solution', extended to also catch the '-ed' form 'posted':
(?i)(post|posted)\W?(a|)\W?(working|)\W?solution
'solution':
(?i)solution
'have another problem' in multiple forms:
(?i)(have|had|got)\W?(another|other)\W?(new|fresh|)\W?(problem|issue)
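The four patterns above can be tried together on a single answer body. This is a minimal Java sketch and not Belisarius's actual matching code. The class name `BlacklistSketch`, the labels and the `matchingLabels` helper are hypothetical; the patterns are copied verbatim from the list.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch: report which of the listed blacklist regexes hit an answer body.
// Names are hypothetical; this is not Belisarius's implementation.
final class BlacklistSketch {

    // Patterns copied from the list; LinkedHashMap keeps them in list order.
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("help me", Pattern.compile("(?i)help\\W?me[^\\w]"));
        PATTERNS.put("posted (working) solution",
                Pattern.compile("(?i)(post|posted)\\W?(a|)\\W?(working|)\\W?solution"));
        PATTERNS.put("solution", Pattern.compile("(?i)solution"));
        PATTERNS.put("have another problem",
                Pattern.compile("(?i)(have|had|got)\\W?(another|other)\\W?(new|fresh|)\\W?(problem|issue)"));
    }

    // Returns the label of every pattern that finds a match anywhere in the body.
    static List<String> matchingLabels(String body) {
        return PATTERNS.entrySet().stream()
                .filter(entry -> entry.getValue().matcher(body).find())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(matchingLabels("I posted a working solution, please help me!"));
    }
}
```

The plain `solution` pattern matches whenever the 'posted (working) solution' pattern does, because every hit of the longer pattern contains the word "solution".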
============
TESTS
============
RegEx: (?i)help\W?me[^\w]
Test Lines:
pls help me - match
please help me fix this - match
help method - no match
assume help was not - no match
Help me, I'm stuck - match
PlEaSE HeLP mE - match
just a common text that's containing help or me - no match
waffle - no match
Test lines after replacing matches with a * (using the regex replace function of Notepad++):
pls *
please *fix this
help method
assume help was not
* I'm stuck
PlEaSE *
just a common text that's containing help or me
waffle
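In the Notepad++ run, `pls help me` and `PlEaSE HeLP mE` matched because `[^\w]` consumed the line break after `me`. If a line is tested on its own, or `me` is the last character of the checked text, nothing follows `me` and those lines no longer match. The sketch below compares the original pattern with a `\b` (word boundary) variant. The variant and the class name `HelpMeBoundary` are our own assumptions and are not part of Belisarius.

```java
import java.util.regex.Pattern;

// Sketch: original "help me" pattern vs. a hypothetical \b variant,
// with each test line checked on its own (no trailing newline).
final class HelpMeBoundary {

    // Original pattern: [^\w] must consume one more character after "me".
    static final Pattern ORIGINAL = Pattern.compile("(?i)help\\W?me[^\\w]");
    // Assumed alternative: \b only asserts a word boundary and consumes nothing.
    static final Pattern BOUNDARY = Pattern.compile("(?i)help\\W?me\\b");

    static boolean matches(Pattern pattern, String line) {
        return pattern.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "pls help me", "please help me fix this", "help method",
            "assume help was not", "Help me, I'm stuck", "PlEaSE HeLP mE",
            "just a common text that's containing help or me", "waffle"
        };
        for (String line : lines) {
            System.out.printf("%-50s original=%-5b boundary=%b%n",
                    line, matches(ORIGINAL, line), matches(BOUNDARY, line));
        }
    }
}
```

A word boundary asserts a position instead of consuming a character. So `help me` at the very end of the text matches, while `help method` still does not.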
package org.sobotics.belisarius;

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        try {
            // Read the test strings (not the regexes) from the test file.
            // Lower-casing is redundant with (?i) but harmless.
            String string = readFile("teststrings.txt", StandardCharsets.UTF_8).toLowerCase();
            List<Matcher> matcherList = new ArrayList<>();
            // Add one matcher per regex under test
            matcherList.add(Pattern.compile("(?i)(have|had|got)\\W?(?:an)?other\\W?(new|fresh|)\\W?(problem|issue)").matcher(string));
            matcherList.add(Pattern.compile("(?i)help\\W?me\\W").matcher(string));
            // Print every match found by each matcher
            for (int i = 0; i < matcherList.size(); i++) {
                System.out.println("\nMatcher " + (i + 1) + "\n");
                Matcher matcher = matcherList.get(i);
                while (matcher.find()) {
                    System.out.println(matcher.group());
                }
            }
        } catch (IOException e) {
            // Print the full stack trace of the read failure
            e.printStackTrace();
        }
    }

    static String readFile(String path, Charset encoding) throws IOException {
        byte[] encoded = Files.readAllBytes(Paths.get(path));
        return new String(encoded, encoding);
    }
}
but now have another problem
but had another problem
got another issue
have another new problem
I have another issue
I have other issue
please help me
help me
help method
assume help was not
HeLP mE
waffle
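`Main` prints every match, but the reader still has to compare the output with what was expected. The following is a minimal sketch that turns the test strings into a self-checking table for the "another problem" pattern. It assumes expected results that we read off the list by hand; they are not stored in the original file. The class name `AnotherProblemCheck` is hypothetical. Each line is checked separately, so a match cannot span two lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch: self-checking version of the teststrings.txt run for one pattern.
// The expected values are hand-written assumptions, not part of the original file.
final class AnotherProblemCheck {

    // Same "another problem" pattern as in Main.
    static final Pattern ANOTHER_PROBLEM = Pattern.compile(
            "(?i)(have|had|got)\\W?(?:an)?other\\W?(new|fresh|)\\W?(problem|issue)");

    // Test line -> whether ANOTHER_PROBLEM should find a match in it.
    static Map<String, Boolean> expectations() {
        Map<String, Boolean> expected = new LinkedHashMap<>();
        expected.put("but now have another problem", true);
        expected.put("but had another problem", true);
        expected.put("got another issue", true);
        expected.put("have another new problem", true);
        expected.put("I have another issue", true);
        expected.put("I have other issue", true);
        expected.put("please help me", false);
        expected.put("help me", false);
        expected.put("help method", false);
        expected.put("assume help was not", false);
        expected.put("HeLP mE", false);
        expected.put("waffle", false);
        return expected;
    }

    // Checks each line on its own and prints any line whose result differs from the expectation.
    static int countMismatches() {
        int mismatches = 0;
        for (Map.Entry<String, Boolean> entry : expectations().entrySet()) {
            boolean actual = ANOTHER_PROBLEM.matcher(entry.getKey()).find();
            if (actual != entry.getValue()) {
                mismatches++;
                System.out.println("Mismatch: \"" + entry.getKey() + "\" matched=" + actual);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        int mismatches = countMismatches();
        System.out.println(mismatches == 0 ? "All lines as expected" : mismatches + " mismatch(es)");
    }
}
```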
After _someone's_ ninja edit: `"(problem|error)?\s*(now\s+)?(re)?solved:?"` should catch (though untested):
* `"problem solved"`
* `"problem now solved"`
* `"error solved"`
* `"error now solved"`
* `"now resolved"`
* `"resolved:"`
But then again, it also catches a bare `"solved"`, so we'll have to make it a bit more complicated to exclude that...
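The pattern above is marked as untested. The sketch below runs it next to one possible stricter variant. The stricter pattern and the class name `SolvedPatterns` are our own assumptions. The variant accepts `solved` only after `problem` or `error` (optionally followed by `now`) and otherwise requires the `resolved` form. The leading `\b` keeps it from matching inside words such as `unresolved`, and it adds `(?i)` like the other blacklist entries.

```java
import java.util.regex.Pattern;

// Sketch: the pattern from the comment vs. a hypothetical stricter variant.
final class SolvedPatterns {

    // The pattern as posted (case-sensitive, no word boundaries).
    static final Pattern FROM_COMMENT =
            Pattern.compile("(problem|error)?\\s*(now\\s+)?(re)?solved:?");

    // Assumed variant: bare "solved" needs a "problem"/"error" prefix;
    // otherwise only "resolved" (optionally after "now") is accepted.
    static final Pattern STRICTER = Pattern.compile(
            "(?i)\\b(?:(?:problem|error)\\s+(?:now\\s+)?(?:re)?solved|(?:now\\s+)?resolved):?");

    static boolean matches(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "problem solved", "problem now solved", "error solved", "error now solved",
            "now resolved", "resolved:", "solved", "I solved it"
        };
        for (String sample : samples) {
            System.out.printf("%-20s comment=%-5b stricter=%b%n",
                    sample, matches(FROM_COMMENT, sample), matches(STRICTER, sample));
        }
    }
}
```

Both patterns match all six listed phrases. The comment's pattern also matches `solved` and `I solved it`, which the stricter variant rejects.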
@tripleee Thanks for the correction. It looks like Java's regex engine does support the PCRE-style syntax we use so far.
@Jinx88909 Thanks for the strings, I'll add them to my regex file.
@adeak I'll test that, thanks for the regex. It works fine, everything matches. Thank you!
I added a Java file and the list of strings I'll use to test the regexes.
I've now created a repository for this: https://github.com/pbdevch/BeliRegex
This is what we have so far to check the body. We need a regex to cover these and obviously any others we can think of:
* `"approval overridden"`
* `"added solution"`
* `"included solution"`
* `"problem solved"`
* `"problem now solved"`
* `"problem fixed"`
* `"problem now fixed"`
* `"error solved"`
* `"error now solved"`
* `"error fixed"`
* `"error now fixed"`
* `"found my answer"`
* `"my resolution"`
* `"now resolved"`
* `"resolved:"`
* `"resolution:"`
* `"my fix"`
* `"my solution:"`
* `"answer -"`
* `"answer:"`
* `"here is how you do it"`
* `"i found a solution"`
* `"finally it works"`
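The list consists of fixed phrases, so one way to cover it is to quote each phrase and join them into a single case-insensitive alternation. This is a sketch, not the regex eventually used. `Pattern.quote` escapes regex metacharacters, so the `:` and `-` in entries like `"answer:"` and `"answer -"` are matched literally. The class name `PhraseListPattern` and the `firstHit` helper are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch: one case-insensitive alternation built from the phrase list.
final class PhraseListPattern {

    // Phrases copied from the list, matched literally.
    static final List<String> PHRASES = List.of(
            "approval overridden", "added solution", "included solution",
            "problem solved", "problem now solved", "problem fixed", "problem now fixed",
            "error solved", "error now solved", "error fixed", "error now fixed",
            "found my answer", "my resolution", "now resolved", "resolved:", "resolution:",
            "my fix", "my solution:", "answer -", "answer:",
            "here is how you do it", "i found a solution", "finally it works");

    // Quote every phrase so punctuation is literal, then join them with '|'.
    static final Pattern ANY_PHRASE = Pattern.compile(
            PHRASES.stream().map(Pattern::quote).collect(Collectors.joining("|")),
            Pattern.CASE_INSENSITIVE);

    // Returns the first (leftmost) phrase found in the body, if any.
    static Optional<String> firstHit(String body) {
        Matcher matcher = ANY_PHRASE.matcher(body);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstHit("Problem now fixed, thanks everyone!"));
        System.out.println(firstHit("help method"));
    }
}
```

This matches plain substrings, so `"my fix"` would also fire on `"my fixture"`. Wrapping the alternation in `\b` would prevent that for phrases ending in a letter. It would not work for phrases ending in `:` or `-`, because there is no word boundary after them when a space follows.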