galderz/gist:3563d1b23b5d50f80d82

## gistfile1.txt
[15:05]  <galderz> dberindei, sannegrinovero, pruivo, we all need to work together on this
[15:05]  <galderz> it does not help at all that we allow random tests to fail and have them carry on running
[15:06]  <galderz> dberindei, sannegrinovero, pruivo, irrespective of the cause, a failing test in core could be the reason the other tests fail
[15:06]  * jholusa has quit (Quit: Leaving)
[15:06]  <dberindei> galderz: and a test that doesn't run in core could also be the reason why other tests fail...
[15:07]  * sjacobs (~sjacobs@99-46-236-219.lightspeed.stlsmo.sbcglobal.net) has joined #infinispan
[15:07]  * sjacobs has quit (Changing host)
[15:07]  * sjacobs (~sjacobs@redhat/jboss/sjacobs) has joined #infinispan
[15:07]  <dberindei> galderz: I don't see how hiding a failure fixes anything
[15:07]  <dberindei> galderz: I meant in the general case, in this case it does look like it prevents the server tests from running
[15:07]  <dberindei> galderz: but that's just a build issue
[15:08]  <pruivo> dberindei: I think that by default, if a test fails in Maven it does not create/install the jar
[15:08]  <pruivo> dberindei: and other modules may use an outdated jar or not run at all
[15:09]  <dberindei> pruivo: ok, if we can't change that then I'm ok with using fail-fast
[15:09]  <pruivo> dberindei: is the current CI run already using -Dmaven.test.failure.ignore?
[15:09]  <dberindei> pruivo: only for jdk6, jdk7 uses -fn
[15:10]  <dberindei> pruivo: I've changed that and I've triggered some new builds
[15:10]  <pruivo> dberindei: why not in jdk7?
[15:10]  <dberindei> pruivo: no idea
[15:10]  <pruivo> dberindei: ok... let's wait for jdk6
[15:11]  <galderz> dberindei, yes, but then it's the responsibility of the lead of the module to decide what to do
[15:11]  * jbossbot (~JBossBot@redhat/jbossbot) has joined #infinispan
[15:11]  <galderz> dberindei, if the module has disabled tests, the lead either removes them or fixes them
[15:11]  <dberindei> pruivo: I meant I've changed jdk7 to also use -Dmaven.test.failure.ignore and I've triggered a new build
[15:11]  <galderz> dberindei, that will solve the immediate problem, but not the underlying problem
[15:12]  <pruivo> galderz: dberindei: if -Dmaven.test.failure.ignore fixes the problems (i.e. ignores the failures and builds the jar) and the other modules run normally, I prefer this solution to disabling random failures
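The difference between the three behaviours discussed here (Maven's default fail-fast, `-fn`/`--fail-never`, and `-Dmaven.test.failure.ignore=true`) can be illustrated with a toy model. This is a sketch in plain Java, not Maven's code; the module names `core` and `server` are illustrative, and the model assumes modules run in reactor order and ignores dependency edges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model of a multi-module Maven build, NOT Maven's implementation.
// Assumptions: modules run in the given reactor order; dependency edges are ignored.
final class ReactorModel {

    enum Mode {
        FAIL_FAST,            // Maven's default (-ff)
        FAIL_NEVER,           // -fn / --fail-never
        IGNORE_TEST_FAILURES  // -Dmaven.test.failure.ignore=true
    }

    // Returns the modules whose freshly built jar would reach the local repository.
    static List<String> installed(List<String> reactorOrder, Set<String> failingTests, Mode mode) {
        List<String> result = new ArrayList<>();
        for (String module : reactorOrder) {
            boolean testsPass = !failingTests.contains(module);
            if (testsPass || mode == Mode.IGNORE_TEST_FAILURES) {
                // Surefire passed, or reported the failures without failing the module,
                // so package/install still run and a fresh jar is installed.
                result.add(module);
            } else if (mode == Mode.FAIL_FAST) {
                // The first failing module stops the whole build.
                break;
            }
            // FAIL_NEVER: no fresh jar for this module, but the reactor carries on, so
            // later modules resolve whatever (possibly outdated) jar is already installed.
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> order = List.of("core", "server");
        Set<String> failing = Set.of("core");
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + installed(order, failing, mode));
        }
    }
}
```

Running `main` prints `FAIL_FAST -> []`, `FAIL_NEVER -> [server]` and `IGNORE_TEST_FAILURES -> [core, server]`: only ignoring test failures installs a fresh `core` jar for the server tests, which is the point pruivo makes above.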
[15:13]  <galderz> pruivo, dberindei, i don't prefer it
[15:14]  <galderz> pruivo, dberindei, we're just pushing a problem around
[15:15]  <dberindei> galderz: why not? my motivation is that I can look at a randomly failing test's history in CI and see how it fails... if the test was disabled 1 month ago, then all logs and stack traces are going to be outdated
[15:16]  <galderz> dberindei, if that's what you are after, there needs to be a separate build that runs only these randomly failing tests
[15:16]  <galderz> dberindei, it should not pollute the master builds
[15:16]  <galderz> nor the PR builds
[15:16]  <pruivo> galderz: dberindei: can't we move the random failures to a test group / profile disabled by default?
[15:17]  <dberindei> galderz: ok, then we need something other than disabling tests, 'cause you can't even run disabled tests from IDEA
[15:17]  <galderz> pruivo, i like that better
[15:17]  <pruivo> galderz: dberindei: my concern is that we are going to forget about the tests
[15:17]  <galderz> pruivo, no we won't for a reason:
[15:17]  <galderz> pruivo, we now have module leads
[15:17]  <pruivo> galderz: dberindei: because we have a lot of them disabled...
[15:17]  <galderz> pruivo, sannegrinovero, dberindei and I are module leads
[15:18]  <dberindei> galderz: I think I could get used to having a separate build with unstable tests
[15:18]  <galderz> pruivo, and it's our responsibility to track these
[15:18]  * GiovanniMeo_Away is now known as GiovanniMeo
[15:19]  <dberindei> pruivo: if we still run the unstable tests once a day we'll keep getting a reminder from TeamCity that we need to fix them
[15:19]  <galderz> pruivo, dberindei, sannegrinovero, but let me be clear on something, everyone needs to play ball here
[15:19]  <dberindei> pruivo: so I think that would be a good middle ground
[15:19]  <sannegrinovero> +1 if a test under my responsibility has a failure it will either be resolved within the hour, or disabled.
[15:19]  <pruivo> dberindei: +1
[15:20]  <pruivo> dberindei: because often we have PRs that fix other ones... and if the test is disabled we never know :)
[15:20]  <galderz> pruivo, dberindei, +1
[15:20]  <sannegrinovero> if it's a regression, your last commit will be REVERTED and I'm not sending a PR
[15:20]  <dberindei> sannegrinovero: with random failures, it's usually not the last commit :)
[15:21]  <sannegrinovero> dberindei: that's why it's important to kill all the random failures.
[15:21]  <sannegrinovero> dberindei: otherwise you'll see me reverting random stuff :-D
[15:21]  <pruivo> dberindei: sannegrinovero: you can have a PR that causes random failures and you may not detect it until some time after it is pushed
[15:21]  <pruivo> sannegrinovero: lol
[15:22]  <dberindei> sannegrinovero: http://ci.infinispan.org/viewLog.html?buildId=5909&tab=buildResultsDiv&buildTypeId=bt2
[15:22]  <sannegrinovero> pruivo: that's a good point and I think we can only fight that by making better tests.
[15:22]  <dberindei> sannegrinovero: go :)
[15:22]  <sannegrinovero> dberindei: don't tease me :) as I said, I won't work on this project until it's green.
[15:23]  <pruivo> :D
[15:23]  <pruivo> sannegrinovero: I never had a course on how to make better tests :(
[15:23]  * pil-zZzzzZZ is now known as pilhuhn
[15:24]  <pruivo> sannegrinovero: the only thing I had was: test border cases
[15:24]  <sannegrinovero> pruivo: that's sad actually. It's a very important skill, but np, it's something we need to develop ourselves and it represents a large part of the value of such a project
[15:24]  <sannegrinovero> always keep in mind that our implementations are short-lived and constantly being replaced with better ideas
[15:25]  <pruivo> sannegrinovero: galderz: dberindei: anyway.... move on, so are we going to move the random failures to a particular group? and run them once a day?
[15:25]  <sannegrinovero> but the test is the long-term contract of how the overall project is meant to work
[15:25]  <pruivo> sannegrinovero: yep
[15:25]  <galderz> pruivo, yes, i think that's a good middle ground to start with
[15:25]  <galderz> pruivo, can you take this on?
[15:26]  <pruivo> galderz: ok
[15:26]  <galderz> thanks pruivo
[15:26]  <pruivo> galderz: suggestion for group name? random-failure is good enough?
[15:27]  <galderz> pruivo, dberindei, sannegrinovero, i'll make a summary of our discussion and reply to the dev thread?
[15:27]  <galderz> pruivo, unstable ?
[15:27]  <pruivo> galderz: +1
[15:27]  <pruivo> galderz: +2
[15:27]  <dberindei> galderz: +1
[15:27]  <pruivo> galderz: I'm going to create a JIRA, ok?
[15:27]  <galderz> pruivo, sure, thanks
[15:27]  <sannegrinovero> galderz: +1 thanks a lot
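One way to picture the agreed `unstable` group: tests tagged with it are excluded from master and PR builds and run only in a separate daily build. In TestNG this would normally be `@Test(groups = "unstable")` combined with an excluded group (for example Surefire's `excludedGroups`). To keep the sketch self-contained, the `@Groups` annotation, the test names and the `GroupSelector` helper below are hypothetical stand-ins that only mimic that include/exclude rule, where an excluded group always wins.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for TestNG's @Test(groups = ...), so the sketch needs only the JDK.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Groups {
    String[] value();
}

// Hypothetical test class: one stable test and one that was moved to the "unstable" group.
final class SampleTests {
    @Groups({"functional"})
    public void putGet() { }

    @Groups({"functional", "unstable"})
    public void rehashUnderLoad() { }
}

final class GroupSelector {
    static final String UNSTABLE = "unstable";

    // A test runs if it belongs to an included group and to no excluded group.
    static List<String> select(Class<?> testClass, Set<String> include, Set<String> exclude) {
        List<String> selected = new ArrayList<>();
        for (Method m : testClass.getDeclaredMethods()) {
            Groups g = m.getAnnotation(Groups.class);
            if (g == null) {
                continue; // not a test in this sketch
            }
            List<String> groups = Arrays.asList(g.value());
            boolean included = groups.stream().anyMatch(include::contains);
            boolean excluded = groups.stream().anyMatch(exclude::contains);
            if (included && !excluded) {
                selected.add(m.getName());
            }
        }
        selected.sort(null); // getDeclaredMethods() has no guaranteed order
        return selected;
    }

    // Master and PR builds: everything except the unstable group.
    static List<String> masterOrPrBuild(Class<?> testClass) {
        return select(testClass, Set.of("functional"), Set.of(UNSTABLE));
    }

    // Separate daily build: only the unstable group.
    static List<String> dailyUnstableBuild(Class<?> testClass) {
        return select(testClass, Set.of(UNSTABLE), Set.of());
    }
}
```

Because exclusion wins over inclusion, moving a test into `unstable` takes it out of the master and PR builds without removing it from the `functional` group it already belongs to, and the daily run still exercises it.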
[15:28]  <dberindei> pruivo galderz: we also need to make sure each random failure has a JIRA and the JIRA has a log file or at least a stack trace
[15:28]  <pruivo> dberindei: that is going to be difficult to get
[15:28]  <dberindei> pruivo: well, if it's a random failure then it means it must have failed at least once
[15:28]  <galderz> dberindei, didn't you process log files?
[15:28]  <galderz> dberindei, even in CI?
[15:28]  <dberindei> pruivo: CI always gives you at least a stack trace
[15:28]  <pruivo> dberindei: ya, but it will be an old stack
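A small helper along these lines could turn a failure into text for the JIRA dberindei asks for, stamped with the time it happened so a fresh trace is easy to tell from an old one. It is a hypothetical sketch using only the JDK; the `FailureReport` name and the report layout are assumptions, not an existing Infinispan utility.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.time.Instant;

// Hypothetical helper, not part of Infinispan or TestNG.
final class FailureReport {

    // Full stack trace as text, including any "Caused by:" sections.
    static String stackTraceOf(Throwable t) {
        StringWriter out = new StringWriter();
        try (PrintWriter writer = new PrintWriter(out)) {
            t.printStackTrace(writer);
        }
        return out.toString();
    }

    // Test name, failure time and stack trace, ready to paste into or attach to a JIRA.
    static String report(String testName, Throwable failure, Instant failedAt) {
        return "Test: " + testName + System.lineSeparator()
                + "Failed at: " + failedAt + System.lineSeparator()
                + stackTraceOf(failure);
    }

    public static void main(String[] args) {
        Throwable failure = new AssertionError("expected 2 owners",
                new IllegalStateException("topology changed"));
        System.out.print(report("SampleTests.rehashUnderLoad", failure, Instant.now()));
    }
}
```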
[15:29]  <dberindei> galderz: for the JDK6 build you also get filtered logs
[15:29]  <dberindei> galderz: not very useful for the servers, I'm afraid
[15:29]  <galderz> dberindei, why not for JDK7?
[15:29]  <galderz> dberindei, yeah, for servers i need to add something to have TRACE logs enabled when running tests
[15:30]  <dberindei> galderz: the idea was to make the JDK7 build shorter, since it's a dup of the JDK6 build
[15:30]  <galderz> dberindei, ah
[15:30]  <dberindei> galderz: so we disabled trace logs there
[15:30]  <jbossbot> new jira [ISPN-3964] Move unstable tests to different group [Open (Unresolved) Enhancement, Major, Pedro Ruivo] https://issues.jboss.org/browse/ISPN-3964
[15:30]  <galderz> dberindei, ok, makes sense
[15:30]  <galderz> dberindei, but, if i add trace logging by default to server, could you process them too?
[15:30]  <galderz> dberindei, actually, thinking about it
[15:30]  <pruivo> galderz: dberindei: sannegrinovero: ^^^ jira created
[15:31]  <galderz> dberindei, pruivo, sannegrinovero, we'd only need TRACE logs in the unstable group
[15:31]  <galderz> dberindei, pruivo, sannegrinovero, master and PR builds should not have TRACE
[15:31]  <pruivo> galderz: it makes sense
[15:31]  <sannegrinovero> galderz: +1 I always disable logging locally
[15:31]  <dberindei> galderz: good point, that will motivate us to move tests to the unstable group sooner
[15:31]  <galderz> pruivo, sannegrinovero, dberindei +1
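galderz's last point, TRACE logging only for the unstable group, could look roughly like this. Infinispan's tests use their own logging setup; this sketch swaps in `java.util.logging` (where `FINEST` is the closest level to TRACE) so it runs with the JDK alone, and the `TestLogging` name and the group check are assumptions.

```java
import java.util.Set;
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch using java.util.logging, not Infinispan's logging configuration.
final class TestLogging {

    // TRACE-like detail (FINEST) only when the unstable group is being run;
    // master and PR builds stay at INFO.
    static Level levelFor(Set<String> groupsBeingRun) {
        return groupsBeingRun.contains("unstable") ? Level.FINEST : Level.INFO;
    }

    static Logger configure(String loggerName, Set<String> groupsBeingRun) {
        Level level = levelFor(groupsBeingRun);
        Logger logger = Logger.getLogger(loggerName);
        // Drop handlers from an earlier call and stop records reaching the root handler.
        for (Handler existing : logger.getHandlers()) {
            logger.removeHandler(existing);
        }
        logger.setUseParentHandlers(false);
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(level); // ConsoleHandler defaults to INFO, so it must be lowered too
        logger.addHandler(handler);
        logger.setLevel(level);
        return logger;
    }
}
```

Calling `configure` with the groups of the current build keeps master and PR runs quiet while the daily unstable run gets full detail.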