@Filnor
Last active March 8, 2018 15:06
Regex to improve Beli's captures
From https://github.com/SOBotics/Belisarius/blob/433477414b13d516ed6de369d1ef1880be404f81/ini/BlackListedAnswerWords.txt converted to txt
'help me':
(?i)help\W?me[^\w]
'posted (working) solution', enhanced to also catch 'ed' -> 'posted'
(?i)(post|posted)\W?(a|)\W?(working|)\W?solution
'solution':
(?i)solution
'have another problem' in multiple forms:
(?i)(have|had|got)\W?(another|other)\W?(new|fresh|)\W?(problem|issue)
============
TESTS
============
RegEx: (?i)help\W?me[^\w]
Test Lines:
pls help me - match
please help me fix this - match
help method - no match
assume help was not - no match
Help me, I'm stuck - match
PlEaSE HeLP mE - match
just a common text that's containing help or me - no match
waffle - no match
Test lines after replacing matches with a * (using the regex replace function of Notepad++):
pls *
please *fix this
help method
assume help was not
* I'm stuck
PlEaSE *
just a common text that's containing help or me
waffle
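
The same check can be reproduced outside Notepad++. This is only a minimal sketch: it assumes each test line is followed by a line break, as it would be in a file, because the trailing `[^\w]` needs one more character to consume. The class name `HelpMeReplaceTest` and its helper methods are made up for illustration.

```java
import java.util.regex.Pattern;

class HelpMeReplaceTest {
    // Regex under test, copied from the 'help me' entry above.
    static final Pattern HELP_ME = Pattern.compile("(?i)help\\W?me[^\\w]");

    // True if the line matches once a line break is appended.
    // Assumption: the line break stands in for the end of line in the test file.
    static boolean matches(String line) {
        return HELP_ME.matcher(line + "\n").find();
    }

    // Replaces every match with '*', then drops the line break if it was not consumed.
    static String replace(String line) {
        String replaced = HELP_ME.matcher(line + "\n").replaceAll("*");
        return replaced.endsWith("\n")
                ? replaced.substring(0, replaced.length() - 1)
                : replaced;
    }

    public static void main(String[] args) {
        String[] lines = {
            "pls help me",
            "please help me fix this",
            "help method",
            "assume help was not",
            "Help me, I'm stuck",
            "PlEaSE HeLP mE",
            "just a common text that's containing help or me",
            "waffle"
        };
        for (String line : lines) {
            System.out.println((matches(line) ? "match    | " : "no match | ") + replace(line));
        }
    }
}
```

Without the appended line break, a line that ends in "help me" would not match, since nothing is left for `[^\w]` to consume.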
package org.sobotics.belisarius;

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        try {
            // Read the test strings from the test file
            String string = readFile("teststrings.txt", StandardCharsets.UTF_8).toLowerCase();
            ArrayList<Matcher> matcherList = new ArrayList<>();

            // Add one matcher per regex to the matcher list
            matcherList.add(Pattern.compile("(?i)(have|had|got)\\W?(?:an)?other\\W?(new|fresh|)\\W?(problem|issue)").matcher(string));
            matcherList.add(Pattern.compile("(?i)help\\W?me\\W").matcher(string));

            // Find and print every match of each matcher
            for (int i = 0; i < matcherList.size(); i++) {
                System.out.println("\nMatcher " + (i + 1) + "\n");
                while (matcherList.get(i).find()) {
                    System.out.println(matcherList.get(i).group());
                }
            }
        } catch (IOException e) {
            // Print the full stack trace instead of the array reference
            e.printStackTrace();
        }
    }

    // Reads the whole file into a String using the given encoding
    static String readFile(String path, Charset encoding) throws IOException {
        byte[] encoded = Files.readAllBytes(Paths.get(path));
        return new String(encoded, encoding);
    }
}
but now have another problem
but had another problem
got another issue
have another new problem
I have another issue
I have other issue
please help me
help me
help method
assume help was not
HeLP mE
waffle
@tripleee

tripleee commented Mar 8, 2018

\W?(thing|other|)\W? should probably be refactored to (\W?(?:thing|other))?\W?

Phrases like (another|other) and (post|posted) should perhaps be refactored to (?:an)?other and post(?:ed)?

Your regexes look like PCRE, not sure if Java supports the full spec (maybe change (?:thing) to (thing) if not?)

[^\w] is an obscure way to say \W
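
A quick way to check these suggestions is to compile both forms and compare them on a few phrases. The following is only a sketch: the refactored patterns are one reading of the suggestions above, and the class name `RefactorCheck` and the `matches` helper are made up for illustration.

```java
import java.util.regex.Pattern;

class RefactorCheck {
    // Original patterns from the gist.
    static final Pattern POSTED_OLD =
            Pattern.compile("(?i)(post|posted)\\W?(a|)\\W?(working|)\\W?solution");
    static final Pattern PROBLEM_OLD =
            Pattern.compile("(?i)(have|had|got)\\W?(another|other)\\W?(new|fresh|)\\W?(problem|issue)");
    static final Pattern HELP_OLD =
            Pattern.compile("(?i)help\\W?me[^\\w]");

    // Refactored forms (an interpretation of the review comments, not the author's final regexes).
    static final Pattern POSTED_NEW =
            Pattern.compile("(?i)post(?:ed)?(?:\\W?a)?(?:\\W?working)?\\W?solution");
    static final Pattern PROBLEM_NEW =
            Pattern.compile("(?i)(?:have|had|got)\\W?(?:an)?other(?:\\W?(?:new|fresh))?\\W?(?:problem|issue)");
    static final Pattern HELP_NEW =
            Pattern.compile("(?i)help\\W?me\\W");

    // True if the pattern is found anywhere in the text.
    static boolean matches(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "posted a working solution",
            "post a solution",
            "posted solution",
            "but now have another problem",
            "I have other issue",
            "have another new problem",
            "Help me, I'm stuck",
            "help method"
        };
        Pattern[][] pairs = {
            {POSTED_OLD, POSTED_NEW},
            {PROBLEM_OLD, PROBLEM_NEW},
            {HELP_OLD, HELP_NEW}
        };
        for (String sample : samples) {
            for (Pattern[] pair : pairs) {
                boolean before = matches(pair[0], sample);
                boolean after = matches(pair[1], sample);
                System.out.println(sample + " | old=" + before + " new=" + after);
            }
        }
    }
}
```

The two forms are not strictly equivalent: the original patterns can consume up to three separator characters in a row where an optional word is missing (for example several spaces between "posted" and "solution"), while the refactored ones accept only one.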

@graemeberry

graemeberry commented Mar 8, 2018

This is what we have so far to check the body. We need a regex to cover these and obviously any others we can think of:

"approval overridden"
"added solution"
"included solution"
"problem solved"
"problem now solved"
"problem fixed"
"problem now fixed"
"error solved"
"error now solved"
"error fixed"
"error now fixed"
"found my answer"
"my resolution"
"now resolved"
"resolved:"
"resolution:"
"my fix"
"my solution:"
"answer -"
"answer:"
"here is how you do it"
"i found a solution"
"finally it works"
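
As a starting point before hand-tuning individual regexes, a list like this can be turned into a single case-insensitive alternation. This is a rough sketch that assumes plain literal matching is acceptable as a first pass; the class name `PhraseBlacklist` and the methods `build` and `flags` are made up for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PhraseBlacklist {
    // Phrases copied from the list above.
    static final List<String> PHRASES = List.of(
            "approval overridden", "added solution", "included solution",
            "problem solved", "problem now solved", "problem fixed", "problem now fixed",
            "error solved", "error now solved", "error fixed", "error now fixed",
            "found my answer", "my resolution", "now resolved", "resolved:",
            "resolution:", "my fix", "my solution:", "answer -", "answer:",
            "here is how you do it", "i found a solution", "finally it works");

    static final Pattern BLACKLIST = build(PHRASES);

    // Quotes every phrase so it is matched literally, then joins them with '|'.
    static Pattern build(List<String> phrases) {
        String alternation = phrases.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?:" + alternation + ")", Pattern.CASE_INSENSITIVE);
    }

    // True if the body contains any of the phrases.
    static boolean flags(String body) {
        return BLACKLIST.matcher(body).find();
    }

    public static void main(String[] args) {
        System.out.println(flags("Problem now fixed, thanks!"));
        System.out.println(flags("Answer: use a loop"));
        System.out.println(flags("Refactored the loop for readability"));
    }
}
```

Literal matching does not tolerate extra whitespace or small variations, so the grouped forms such as "problem/error (now) solved/fixed" are probably still better served by a hand-written regex.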

@adeak

adeak commented Mar 8, 2018


After _someone's_ ninja edit: `"(problem|error)?\s*(now\s+)?(re)?solved:?"` should catch (though untested):
 * `"problem solved"`
 * `"problem now solved"`
 * `"error solved"`
 * `"error now solved"`
 * `"now resolved"`
 * `"resolved:"`

But then again it also catches a bare `"solved"`, so we'll have to make this a bit more complicated to exclude that...
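
One possible way to exclude the bare `"solved"` is to require a leading "problem", "error" or "now", and to handle `"resolved:"` as its own alternative. The sketch below compares both variants. The stricter pattern is an untested-in-production assumption rather than an agreed fix, and the class name `SolvedCheck` and the names `LOOSE` and `STRICT` are made up for illustration.

```java
import java.util.regex.Pattern;

class SolvedCheck {
    // Regex as posted above: every prefix is optional, so a bare "solved" matches too.
    static final Pattern LOOSE =
            Pattern.compile("(problem|error)?\\s*(now\\s+)?(re)?solved:?");

    // Hypothetical stricter variant: needs "problem"/"error"/"now" in front,
    // or the literal "resolved:" with its colon.
    static final Pattern STRICT =
            Pattern.compile("(?:(?:problem|error)\\s+(?:now\\s+)?|now\\s+)(?:re)?solved|resolved:");

    // True if the pattern is found anywhere in the text.
    static boolean matches(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "problem solved",
            "problem now solved",
            "error solved",
            "error now solved",
            "now resolved",
            "resolved:",
            "solved",
            "I solved it yesterday"
        };
        for (String sample : samples) {
            System.out.println(sample
                    + " | loose=" + matches(LOOSE, sample)
                    + " strict=" + matches(STRICT, sample));
        }
    }
}
```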

@Filnor
Author

Filnor commented Mar 8, 2018

@tripleee Thanks for the correction. It looks like Java supports the PCRE-style syntax we have used so far.

@Jinx88909 Thanks for the strings, I'll add them to my regex file.

@adeak I'll test that, thanks for the regex. It works fine, everything matches. Thank you!

I added a Java file and the list of strings I'll use to test the regexes.

@Filnor
Author

Filnor commented Mar 8, 2018

I've now created a Repository for this: https://github.com/pbdevch/BeliRegex
