From https://github.com/SOBotics/Belisarius/blob/433477414b13d516ed6de369d1ef1880be404f81/ini/BlackListedAnswerWords.txt, converted to .txt:
'help me':
(?i)help\W?me[^\w]
'posted (working) solution', extended to also catch the past tense 'posted':
(?i)(post|posted)\W?(a|)\W?(working|)\W?solution
'solution':
(?i)solution
'have another problem' in multiple forms:
(?i)(have|had|got)\W?(another|other)\W?(new|fresh|)\W?(problem|issue)
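For trying these outside Notepad++, here is a minimal Java sketch (the same language as the Main class further down) that compiles the four patterns above verbatim and reports which of them occur in a piece of text. The class `BlacklistSketch` and its `hits` method are hypothetical names for illustration, not part of Belisarius.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: compiles the blacklisted-word regexes listed above
// and reports which of them occur somewhere in a piece of answer text.
class BlacklistSketch {
    // Patterns copied verbatim from the list (Java string escaping doubles the backslashes).
    // LinkedHashMap keeps the insertion order, so results come back in list order.
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("help me", Pattern.compile("(?i)help\\W?me[^\\w]"));
        PATTERNS.put("posted (working) solution",
                Pattern.compile("(?i)(post|posted)\\W?(a|)\\W?(working|)\\W?solution"));
        PATTERNS.put("solution", Pattern.compile("(?i)solution"));
        PATTERNS.put("have another problem",
                Pattern.compile("(?i)(have|had|got)\\W?(another|other)\\W?(new|fresh|)\\W?(problem|issue)"));
    }

    // Returns the labels of all patterns found anywhere in the text.
    static List<String> hits(String text) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            if (entry.getValue().matcher(text).find()) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(hits("Got another issue, please help me!"));
        System.out.println(hits("I posted a working solution below."));
    }
}
```

Note that `[^\w]` has to consume a character after `me`, so "pls help me" at the very end of the input does not match; `\b` would not have that limitation.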
============
TESTS
============
RegEx: (?i)help\W?me[^\w]
Test lines:
pls help me - match
please help me fix this - match
help method - no match
assume help was not - no match
Help me, I'm stuck - match
PlEaSE HeLP mE - match
just a common text that's containing help or me - no match
waffle - no match
Test lines after replacing matches with * (using the regex replace function of Notepad++):
pls *
please *fix this
help method
assume help was not
* I'm stuck
PlEaSE *
just a common text that's containing help or me
waffle
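The same test lines can be re-run in Java instead of Notepad++. This sketch assumes each test line is followed by a newline, as it would be inside a multi-line test file: `[^\w]` has to consume one character after `me`, so a bare "pls help me" with nothing after it would not match. `HelpMeTest`, `matches` and `replaced` are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical re-run of the "help me" tests above.
// Assumption: every test line ends with a newline, as in a multi-line file.
class HelpMeTest {
    static final Pattern HELP_ME = Pattern.compile("(?i)help\\W?me[^\\w]");

    // True if the pattern occurs in the line (including its trailing newline).
    static boolean matches(String line) {
        return HELP_ME.matcher(line + "\n").find();
    }

    // Mimics the Notepad++ "replace matches with *" column for one line;
    // stripTrailing() drops the newline again if the match did not consume it.
    static String replaced(String line) {
        return HELP_ME.matcher(line + "\n").replaceAll("*").stripTrailing();
    }

    public static void main(String[] args) {
        String[] lines = {
            "pls help me", "please help me fix this", "help method",
            "assume help was not", "Help me, I'm stuck", "PlEaSE HeLP mE",
            "just a common text that's containing help or me", "waffle"
        };
        for (String line : lines) {
            System.out.println((matches(line) ? "match    | " : "no match | ") + replaced(line));
        }
    }
}
```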
package org.sobotics.belisarius;

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        try {
            // Read the test strings (not the regexes) from the test file
            String string = readFile("teststrings.txt", StandardCharsets.UTF_8).toLowerCase();
            List<Matcher> matcherList = new ArrayList<>();
            // Add one Matcher per regex, all bound to the same input string
            matcherList.add(Pattern.compile("(?i)(have|had|got)\\W?(?:an)?other\\W?(new|fresh|)\\W?(problem|issue)").matcher(string));
            matcherList.add(Pattern.compile("(?i)help\\W?me\\W").matcher(string));
            // Print every match found by each matcher
            for (int i = 0; i < matcherList.size(); i++) {
                System.out.println("\nMatcher " + (i + 1) + "\n");
                Matcher matcher = matcherList.get(i);
                while (matcher.find()) {
                    System.out.println(matcher.group());
                }
            }
        } catch (IOException e) {
            // Print the full trace; printing getStackTrace() only shows an array reference
            e.printStackTrace();
        }
    }

    static String readFile(String path, Charset encoding) throws IOException {
        byte[] encoded = Files.readAllBytes(Paths.get(path));
        return new String(encoded, encoding);
    }
}
but now have another problem
but had another problem
got another issue
have another new problem
I have another issue
I have other issue
please help me
help me
help method
assume help was not
HeLP mE
waffle
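A per-line variant of `Main`, sketched under a few assumptions: it keeps `Pattern`s rather than `Matcher`s in the list so they can be reused for any input, it checks each test line on its own (so `\W?` can no longer match across a line break), and it swaps the trailing `\W` of the "help me" regex for `\b` so that a line ending in "help me" still matches without a newline after it. `PerLineMain` and `report` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical per-line variant of Main: reusable Patterns, one line at a time.
class PerLineMain {
    static final List<Pattern> PATTERNS = List.of(
            Pattern.compile("(?i)(have|had|got)\\W?(?:an)?other\\W?(new|fresh|)\\W?(problem|issue)"),
            // Assumption: \b instead of \W, so "help me" at the end of a line matches
            Pattern.compile("(?i)help\\W?me\\b"));

    // Returns "line N, pattern M: text" for every (line, pattern) hit.
    static List<String> report(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            for (int p = 0; p < PATTERNS.size(); p++) {
                if (PATTERNS.get(p).matcher(lines.get(i)).find()) {
                    hits.add("line " + (i + 1) + ", pattern " + (p + 1) + ": " + lines.get(i));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        try {
            List<String> lines = Files.readAllLines(Paths.get("teststrings.txt"), StandardCharsets.UTF_8);
            report(lines).forEach(System.out::println);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```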
This is what we have so far to check the body. We need a regex to cover these and obviously any others we can think of:
"approval overridden"
"added solution"
"included solution"
"problem solved"
"problem now solved"
"problem fixed"
"problem now fixed"
"error solved"
"error now solved"
"error fixed"
"error now fixed"
"found my answer"
"my resolution"
"now resolved"
"resolved:"
"resolution:"
"my fix"
"my solution:"
"answer -"
"answer:"
"here is how you do it"
"i found a solution"
"finally it works"
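One low-effort way to cover the literal list above is to quote every phrase and join them into a single case-insensitive alternation; `Pattern.quote` makes sure any regex metacharacters in a phrase are matched literally. The sketch below does that, and also shows one possible hand-written generalisation of the eight problem/error phrases. Neither is tested against real posts, and `PhraseRegex`, `build` and `PROBLEM_ERROR` are hypothetical names.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: one regex for the whole phrase list above.
class PhraseRegex {
    static final List<String> PHRASES = List.of(
            "approval overridden", "added solution", "included solution",
            "problem solved", "problem now solved", "problem fixed", "problem now fixed",
            "error solved", "error now solved", "error fixed", "error now fixed",
            "found my answer", "my resolution", "now resolved", "resolved:",
            "resolution:", "my fix", "my solution:", "answer -", "answer:",
            "here is how you do it", "i found a solution", "finally it works");

    // Quote each phrase literally and join them with | into one case-insensitive pattern.
    static Pattern build(List<String> phrases) {
        String alternation = phrases.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?i)(?:" + alternation + ")");
    }

    // Generalises the eight problem/error phrases:
    // "problem" or "error", optional "now", then "solved" or "fixed".
    static final Pattern PROBLEM_ERROR =
            Pattern.compile("(?i)\\b(?:problem|error)\\s+(?:now\\s+)?(?:solved|fixed)\\b");

    public static void main(String[] args) {
        Pattern all = build(PHRASES);
        System.out.println(all.matcher("EDIT: Problem now solved, thanks!").find());
        System.out.println(PROBLEM_ERROR.matcher("error   now fixed").find());
    }
}
```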
After _someone's_ ninja edit: `"(problem|error)?\s*(now\s+)?(re)?solved:?"` should catch (though untested):
* `"problem solved"`
* `"problem now solved"`
* `"error solved"`
* `"error now solved"`
* `"now resolved"`
* `"resolved:"`
But then again, it also catches a bare `"solved"`, so we'll have to make this part a bit more complicated to exclude that...
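A quick way to check the ninja-edit regex, next to one possible stricter variant that drops the bare "solved" case by requiring "problem"/"error" in front, "now re" in front, or the colon of "resolved:". The stricter pattern is only a suggestion and has not been tried on real posts; `SolvedRegex`, `NINJA` and `STRICT` are hypothetical names.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical comparison of the ninja-edit regex with a stricter variant.
class SolvedRegex {
    // The regex from the comment above, unchanged (inputs are assumed lowercase).
    static final Pattern NINJA = Pattern.compile("(problem|error)?\\s*(now\\s+)?(re)?solved:?");

    // Stricter suggestion: "problem"/"error" (optionally + "now", optionally "re")
    // before "solved", or "now resolved", or "resolved:" with the colon.
    static final Pattern STRICT = Pattern.compile(
            "(?i)\\b(?:(?:problem|error)\\s+(?:now\\s+)?(?:re)?|now\\s+re)solved\\b|\\bresolved:");

    static final List<String> SHOULD_MATCH = List.of(
            "problem solved", "problem now solved", "error solved",
            "error now solved", "now resolved", "resolved:");

    public static void main(String[] args) {
        for (String s : SHOULD_MATCH) {
            System.out.println(s + " -> ninja=" + NINJA.matcher(s).find()
                    + ", strict=" + STRICT.matcher(s).find());
        }
        // The false positive mentioned above: a bare "solved"
        String bare = "i solved it";
        System.out.println(bare + " -> ninja=" + NINJA.matcher(bare).find()
                + ", strict=" + STRICT.matcher(bare).find());
    }
}
```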
@tripleee Thanks for the correction; it looks like Java supports the PCRE features we use so far.
@Jinx88909 Thanks for the strings, I'll add them to my regex file.
@adeak Thanks for the regex, I've tested it: it works fine and everything matches. Thank you!
I added a java file and the list of strings I'll use to test the regexes.
I've now created a repository for this: https://github.com/pbdevch/BeliRegex
`\W?(thing|other|)\W?` should probably be refactored to `(\W?(?:thing|other))?\W?`.
Phrases like `(another|other)` and `(post|posted)` should perhaps be refactored to `(?:an)?other` and `post(?:ed)?`.
Your regexes look like PCRE; not sure if Java supports the full spec (maybe change `(?:thing)` to `(thing)` if not?). `[^\w]` is an obscure way to say `\W`.
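A small Java check of the suggested refactorings: it compiles each original construct and its replacement and compares `find()` results on a few sample strings. Agreement on samples is a sanity check rather than a proof; for these three the equivalence also follows directly, since `(?:an)?other` matches exactly "another" or "other", `post(?:ed)?` exactly "post" or "posted", and `\W` is by definition `[^\w]`. Java's `java.util.regex` supports non-capturing groups, so the refactored forms compile as written. `RefactorCheck` and `sameMatches` are hypothetical names.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical check that an original construct and its refactoring agree.
class RefactorCheck {
    // True if both patterns give the same find() result on every sample.
    static boolean sameMatches(String original, String refactored, List<String> samples) {
        Pattern a = Pattern.compile(original);
        Pattern b = Pattern.compile(refactored);
        for (String s : samples) {
            if (a.matcher(s).find() != b.matcher(s).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> samples = List.of(
                "another", "other", "mother", "post", "posted", "apost", "help me!", "help me");
        System.out.println(sameMatches("(another|other)", "(?:an)?other", samples));
        System.out.println(sameMatches("(post|posted)", "post(?:ed)?", samples));
        System.out.println(sameMatches("help me[^\\w]", "help me\\W", samples));
    }
}
```

One thing the refactoring does change: `(another|other)` and `(post|posted)` are capturing groups while the replacements are not, so group numbers shift if any code reads `group(n)`.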