Filnor/all-regexes.txt

Last active Mar 8, 2018
Regex to improve Beli's captures
From https://github.com/SOBotics/Belisarius/blob/433477414b13d516ed6de369d1ef1880be404f81/ini/BlackListedAnswerWords.txt converted to txt
'help me':
(?i)help\W?me[^\w]
'post (a) (working) solution', enhanced to also catch 'posted':
(?i)(post|posted)\W?(a|)\W?(working|)\W?solution
'solution':
(?i)solution
'have another problem' in multiple forms:
(?i)(have|had|got)\W?(another|other)\W?(new|fresh|)\W?(problem|issue)
============
TESTS
============
RegEx: (?i)help\W?me[^\w]
Test Lines:
pls help me - match
please help me fix this - match
help method - no match
assume help was not - no match
Help me, I'm stuck - match
PlEaSE HeLP mE - match
just a common text that's containing help or me - no match
waffle - no match
Test lines after replacing matches with a * (using the regex replace function of Notepad++):
pls *
please *fix this
help method
assume help was not
* I'm stuck
PlEaSE *
just a common text that's containing help or me
waffle
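The replacement test above can be reproduced in Java with a small sketch like the one below. It also shows one caveat: `[^\w]` has to consume a character, so when each line is tested as a separate string without its line break, a line that ends in "me" does not match. The `LOOKAHEAD` pattern, the class name and the `mask` helper are assumptions added for illustration, not part of the original regex list.

```java
import java.util.regex.Pattern;

class HelpMeCheck {
    // Pattern from the list above: requires one non-word character after "me".
    static final Pattern ORIGINAL = Pattern.compile("(?i)help\\W?me[^\\w]");

    // Hypothetical variant (not in the original list): a negative lookahead
    // only checks that no word character follows, without consuming anything.
    static final Pattern LOOKAHEAD = Pattern.compile("(?i)help\\W?me(?!\\w)");

    // Replace every match with "*", like the Notepad++ replace test.
    static String mask(Pattern pattern, String line) {
        return pattern.matcher(line).replaceAll("*");
    }

    public static void main(String[] args) {
        String[] lines = {
            "pls help me",
            "please help me fix this",
            "help method",
            "Help me, I'm stuck",
            "waffle"
        };
        for (String line : lines) {
            System.out.println(mask(ORIGINAL, line) + " | " + mask(LOOKAHEAD, line));
        }
    }
}
```

With the lookahead variant the character after "me" stays in the text, so "Help me, I'm stuck" keeps its comma.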
package org.sobotics.belisarius;

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        try {
            // Read the test strings from the test file.
            // toLowerCase() is redundant with the (?i) flag, but it also lowercases the printed matches.
            String string = readFile("teststrings.txt", StandardCharsets.UTF_8).toLowerCase();

            // Add one matcher per regex to the matcher list
            List<Matcher> matcherList = new ArrayList<>();
            matcherList.add(Pattern.compile("(?i)(have|had|got)\\W?(?:an)?other\\W?(new|fresh|)\\W?(problem|issue)").matcher(string));
            matcherList.add(Pattern.compile("(?i)help\\W?me\\W").matcher(string));

            // Find and print every match of each matcher
            for (int i = 0; i < matcherList.size(); i++) {
                System.out.println("\nMatcher " + (i + 1) + "\n");
                Matcher matcher = matcherList.get(i);
                while (matcher.find()) {
                    System.out.println(matcher.group());
                }
            }
        } catch (IOException e) {
            // println(e.getStackTrace()) would only print the array reference
            e.printStackTrace();
        }
    }

    static String readFile(String path, Charset encoding) throws IOException {
        byte[] encoded = Files.readAllBytes(Paths.get(path));
        return new String(encoded, encoding);
    }
}
teststrings.txt
but now have another problem
but had another problem
got another issue
have another new problem
I have another issue
I have other issue
please help me
help me
help method
assume help was not
HeLP mE
waffle

@tripleee tripleee commented Mar 8, 2018

\W?(thing|other|)\W? should probably be refactored to (\W?(?:thing|other))?\W?

Phrases like (another|other) and (post|posted) should perhaps be refactored to (?:an)?other and post(?:ed)?

Your regexes look like PCRE, not sure if Java supports the full spec (maybe change (?:thing) to (thing) if not?)

[^\w] is an obscure way to say \W
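A quick way to try the suggested refactoring in Java is to run the original and a refactored pattern side by side on a few phrases. The sketch below is only an illustration: the refactored pattern is one possible reading of the suggestions above, and the class name and sample phrases are assumptions. Agreement on these samples is not a proof that the two patterns are equivalent; for instance, the original tolerates more separator characters between words than the refactored form.

```java
import java.util.regex.Pattern;

class RefactorCheck {
    // Original pattern from the regex list.
    static final Pattern ORIGINAL =
        Pattern.compile("(?i)(post|posted)\\W?(a|)\\W?(working|)\\W?solution");

    // One possible refactoring along the suggested lines (an assumption, not
    // the author's final pattern): optional non-capturing groups replace the
    // empty alternatives, and post(?:ed)? replaces (post|posted).
    static final Pattern REFACTORED =
        Pattern.compile("(?i)post(?:ed)?(?:\\W?a)?(?:\\W?working)?\\W?solution");

    // True when both patterns give the same find() result for the text.
    static boolean agree(String text) {
        return ORIGINAL.matcher(text).find() == REFACTORED.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "posted a working solution",
            "post solution",
            "posted working solution",
            "posted the solution"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> original: " + ORIGINAL.matcher(sample).find()
                + ", refactored: " + REFACTORED.matcher(sample).find());
        }
    }
}
```

Java's `java.util.regex` accepts non-capturing groups `(?:...)`, so there is no need to fall back to capturing groups here.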

@Jinx88909 Jinx88909 commented Mar 8, 2018

This is what we have so far to check the body. We need a regex to cover these and obviously any others we can think of:

"approval overridden"
"added solution"
"included solution"
"problem solved"
"problem now solved"
"problem fixed"
"problem now fixed"
"error solved"
"error now solved"
"error fixed"
"error now fixed"
"found my answer"
"my resolution"
"now resolved"
"resolved:"
"resolution:"
"my fix"
"my solution:"
"answer -"
"answer:"
"here is how you do it"
"i found a solution"
"finally it works"


@adeak adeak commented Mar 8, 2018


After _someone's_ ninja edit: `"(problem|error)?\s*(now\s+)?(re)?solved:?"` should catch (though untested):
 * `"problem solved"`
 * `"problem now solved"`
 * `"error solved"`
 * `"error now solved"`
 * `"now resolved"`
 * `"resolved:"`

But then again, it also catches a bare `"solved"`, so we'll have to make this a bit more complicated to exclude that...
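The bare `"solved"` problem can be checked in Java with a sketch like the following. The `STRICTER` pattern is only one possible way to exclude it (an assumption, not a pattern anyone in the thread proposed), and the class name is made up for the example.

```java
import java.util.regex.Pattern;

class SolvedCheck {
    // Pattern suggested above, with case-insensitive matching added.
    static final Pattern SUGGESTED = Pattern.compile(
        "(problem|error)?\\s*(now\\s+)?(re)?solved:?", Pattern.CASE_INSENSITIVE);

    // Hypothetical stricter variant: "solved" only counts after "problem" or
    // "error", and "resolved" only after "now" or when followed by a colon.
    static final Pattern STRICTER = Pattern.compile(
        "(?:problem|error)\\s+(?:now\\s+)?solved|now\\s+resolved|resolved:",
        Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String[] samples = {
            "problem solved",
            "problem now solved",
            "error solved",
            "error now solved",
            "now resolved",
            "resolved:",
            "solved"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> suggested: " + SUGGESTED.matcher(sample).find()
                + ", stricter: " + STRICTER.matcher(sample).find());
        }
    }
}
```

The stricter variant gives up the compactness of the single optional-group pattern in exchange for spelling out each accepted form.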

@Filnor Filnor commented Mar 8, 2018

@tripleee Thanks for the correction. It looks like Java supports the PCRE-style syntax we have used so far.

@Jinx88909 Thanks for the strings, I'll add them to my regex file.

@adeak I'll test that, thanks for the regex. It works fine, everything matches. Thank you!

I added a java file and the list of strings I'll use to test the regexes.

@Filnor Filnor commented Mar 8, 2018

I've now created a repository for this: https://github.com/pbdevch/BeliRegex
