finbarrtimbers/466.md

## 466.md

      
          
forbidden set is a huge cluster of dups ==> randomly select a subset of the forbidden ones
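A minimal sketch of the subsampling step, assuming the forbidden set can be held in memory as a plain collection of items (for example, pairs of report ids). The function name `subsample_forbidden`, the parameter `k`, and the fixed seed are illustrative assumptions, not part of the existing pipeline.

```python
import random


def subsample_forbidden(forbidden_items, k, seed=0):
    """Draw k items uniformly at random, without replacement.

    `forbidden_items` is assumed to be any iterable of hashable items.
    If k is at least the size of the set, everything is returned unchanged.
    """
    rng = random.Random(seed)  # local RNG with a fixed seed, so the subset is reproducible
    items = list(forbidden_items)
    if k >= len(items):
        return items
    return rng.sample(items, k)
```

Fixing the seed means later runs compare against the same subset, which matters when the numbers are put next to an earlier table of results.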


test it on never before seen data


test without Anahita's data


which of Anahita's data did we use? ==> did we use Anahita's word lists?


make sure that there's a clear demarcation between Anahita's


read decision tree and make sure that we're actually making legit decisions


make sure that it's not learning off identifiers
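One way to rule this out is to strip identifier fields before the examples reach the learner, then check that the score does not collapse. The sketch below assumes each example is a dict of feature name to value; the field names `bug_id` and `dup_id` are hypothetical placeholders for whatever identifiers the real data carries.

```python
def strip_identifiers(rows, id_fields=("bug_id", "dup_id")):
    """Return copies of the rows with identifier fields removed.

    `rows` is assumed to be a list of dicts; `id_fields` names the
    (hypothetical) identifier columns that must not be used as features.
    """
    blocked = set(id_fields)
    return [{k: v for k, v in row.items() if k not in blocked} for row in rows]
```

If accuracy drops sharply after this step, the tree was probably splitting on identifiers rather than on report content.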


analyse features: explain which contribute most; can we identify where it didn't work? ==> what's useful/useless?


Find the table in Anahita's thesis (false positives, true positives, etc.)
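To produce the same kind of table for our runs, the counts can be computed directly from labels and predictions. This is a sketch that assumes binary labels where a truthy value means "duplicate"; the function names are illustrative and the layout of the thesis table itself is not reproduced here.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred and truth:
            tp += 1
        elif pred and not truth:
            fp += 1
        elif not pred and truth:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def precision_recall(counts):
    """Precision and recall from the counts; 0.0 when the denominator is zero."""
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    precision = counts["tp"] / predicted_pos if predicted_pos else 0.0
    recall = counts["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall
```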


Same problems as Anahita?


Find marginal effects of stack trace evaluations, only word lists, etc.


Use a set of only stack traces to perform stack trace evaluations


Use this on eclipse?


try removing stack traces ==> easy or difficult predictors?


Look at crash repositories


"ablation studies"
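A leave-one-group-out loop is one simple way to get the marginal effect of each feature group (stack traces, word lists, and so on). The sketch below assumes a caller-supplied `evaluate` function that trains and scores a model on a given list of feature names, with higher scores being better; the group and feature names in the usage example are made up.

```python
def ablation_study(feature_groups, evaluate):
    """Measure how much the score drops when each feature group is removed.

    feature_groups: dict mapping a group name to a list of feature names.
    evaluate: callable(list of feature names) -> score (assumed: higher is better).
    Returns (baseline score with all features, dict of group name -> score drop).
    """
    all_features = [f for feats in feature_groups.values() for f in feats]
    baseline = evaluate(all_features)
    drops = {}
    for name, feats in feature_groups.items():
        removed = set(feats)
        kept = [f for f in all_features if f not in removed]
        drops[name] = baseline - evaluate(kept)
    return baseline, drops


# Usage with a toy scoring function standing in for "train and evaluate":
toy_weights = {"st_overlap": 0.3, "word_list_a": 0.1, "word_list_b": 0.05}
groups = {
    "stack_traces": ["st_overlap"],
    "word_lists": ["word_list_a", "word_list_b"],
}
baseline, drops = ablation_study(
    groups, lambda feats: 0.5 + sum(toy_weights[f] for f in feats)
)
```

A large drop for a group suggests the model leans on it; a drop near zero suggests the group is redundant given the others.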