total # positive classes <<< total # negative classes
Example: identifying fraudulent claims
There may not be many fraudulent claims, so the classifier will tend to classify fraudulent claims as genuine.
- Model 1: classified 7/10 fraudulent transactions as genuine and 10/10,000 genuine transactions as fraudulent = 17 "mistakes"
- Model 2: classified 2/10 fraudulent transactions as genuine and 100/10,000 genuine transactions as fraudulent = 102 "mistakes"
Since we want to minimize classifying fraudulent transactions as genuine, Model 2 actually performs better even though it made more "mistakes". Therefore, it is better to base performance not on the raw number of mistakes, but on the true positive (TP) rate, true negative (TN) rate, false positive (FP) rate, and false negative (FN) rate.
Formula | Performance |
---|---|
TP Rate = TP / (TP + FN) | Close to 1 = good |
TN Rate = TN / (TN + FP) | Close to 1 = good |
FP Rate = FP / (FP + TN) | Close to 0 = good |
FN Rate = FN / (FN + TP) | Close to 0 = good |
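A minimal sketch of computing these rates from raw confusion-matrix counts (the helper name `rates` is illustrative; the numbers come from the two fraud models above):

```python
def rates(tp, fn, fp, tn):
    """Compute the four rates from confusion-matrix counts."""
    return {
        "tpr": tp / (tp + fn),  # true positive rate: close to 1 = good
        "tnr": tn / (tn + fp),  # true negative rate: close to 1 = good
        "fpr": fp / (fp + tn),  # false positive rate: close to 0 = good
        "fnr": fn / (fn + tp),  # false negative rate: close to 0 = good
    }

# Model 1: missed 7/10 frauds, flagged 10/10,000 genuine transactions
m1 = rates(tp=3, fn=7, fp=10, tn=9990)
# Model 2: missed 2/10 frauds, flagged 100/10,000 genuine transactions
m2 = rates(tp=8, fn=2, fp=100, tn=9900)
```

Model 2 has the higher TP rate (0.8 vs 0.3) despite making more total "mistakes", which is exactly why mistake counts are misleading under class imbalance.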
- Cost-Function-Based Approach: treat one false negative as worse than one false positive (weight false negatives more)
  - i.e. deciding a claim was genuine when it was actually fraudulent is weighted with a larger cost, while deciding a claim was fraudulent when it was actually genuine is less bad and therefore has a lower cost
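A minimal sketch of the cost-function idea, assuming an illustrative weighting of 50 per false negative vs. 1 per false positive (these cost values are assumptions, not from any source):

```python
# Assumed costs: a missed fraud (false negative) is weighted 50x a false alarm.
COST_FN = 50.0  # fraud classified as genuine
COST_FP = 1.0   # genuine classified as fraud

def total_cost(fn, fp):
    """Score a model by total misclassification cost, not raw mistake count."""
    return fn * COST_FN + fp * COST_FP

cost_m1 = total_cost(fn=7, fp=10)   # Model 1: 7*50 + 10*1  = 360
cost_m2 = total_cost(fn=2, fp=100)  # Model 2: 2*50 + 100*1 = 200
```

Under this weighting Model 2 is cheaper despite its larger raw mistake count, matching the intuition above.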
- Sampling-Based Approach
  - Oversampling: adding more instances of the minority class - might have to deal with overfitting on the minority class
  - Undersampling: removing instances of the majority class - may risk removing representative instances of the majority class
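A minimal sketch of both sampling strategies on a toy imbalanced label set (the dataset and seed are made up for illustration):

```python
import random

random.seed(0)

# Toy imbalanced dataset: 1 = fraud (minority), 0 = genuine (majority)
data = [1] * 10 + [0] * 990
minority = [x for x in data if x == 1]
majority = [x for x in data if x == 0]

# Oversampling: draw minority samples with replacement until classes balance
oversampled = majority + random.choices(minority, k=len(majority))

# Undersampling: keep only as many majority samples as there are minority ones
undersampled = minority + random.sample(majority, k=len(minority))
```

Oversampling repeats the same few minority examples many times (hence the overfitting risk), while undersampling discards most of the majority class (hence the risk of losing representative instances).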
Downsampling
- Reduces the # of pixels in the image, i.e. shrinking the image. Then, when you want to make the image the same size as it was previously, you will need to increase the pixel size.
- Example: reducing a 512x512 image to 256x256 = factor-of-2 downsampling in the horizontal and vertical directions
Upsampling
- Increases the # of pixels in the image, i.e. enlarging the image. The added pixels are estimated from surrounding samples.
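A minimal sketch of factor-of-2 downsampling and nearest-neighbor upsampling on a toy array (nearest-neighbor is one simple way to estimate the added pixels; bilinear or bicubic interpolation are common alternatives):

```python
import numpy as np

img = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"

# Factor-of-2 downsampling: keep every other pixel in each direction
down = img[::2, ::2]  # shape (2, 2) - a shrunken image

# Factor-of-2 upsampling by nearest neighbor: each pixel becomes a 2x2 block,
# so the added pixels are copied from the surrounding sample
up = down.repeat(2, axis=0).repeat(2, axis=1)  # back to shape (4, 4)
```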
Image Pyramids
- Used for recognizing objects at vastly different scales
- Scale-invariant because the object's scale change is offset by shifting its level in the pyramid
- Feature maps close to the image layer are composed of low-level structures, which are not effective for accurate object detection
- A Feature Pyramid Network (FPN) is composed of a bottom-up and a top-down pathway
  - The bottom-up pathway is useful for feature extraction (spatial resolution decreases as you go up to the top layers of the pyramid and view a smaller version of the object, i.e. the semantic value increases)
  - FPN uses the top-down pathway to construct higher-resolution layers from a semantically rich layer
- The bottom-up pathway uses ResNet
- Because a CNN has shared weights, it is not able to estimate the absolute position in an image; anchor boxes make it possible so the CNN only needs to predict the relative transformation for each anchor box (the anchor box is the bounding box)
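A sketch of applying a predicted relative transformation to an anchor box, assuming the common R-CNN-style (tx, ty, tw, th) parameterization (the exact parameterization used is an assumption here):

```python
import math

def decode(anchor, deltas):
    """Apply a predicted relative transformation to an anchor box.

    anchor: (cx, cy, w, h) center-size box; deltas: (tx, ty, tw, th).
    """
    cx, cy, w, h = anchor
    tx, ty, tw, th = deltas
    return (cx + tx * w,       # shift the center by a fraction of anchor size
            cy + ty * h,
            w * math.exp(tw),  # scale width/height in log space
            h * math.exp(th))

# Zero deltas leave the anchor unchanged - the network only needs to learn
# small corrections relative to each anchor, not absolute positions.
box = decode((50.0, 50.0, 32.0, 32.0), (0.0, 0.0, 0.0, 0.0))
```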
- RetinaNet can match the speed of one-stage detectors and surpass the accuracy of two-stage detectors
  - One-stage detectors have typically had worse accuracy than two-stage detectors - why? -> the class imbalance problem
- RetinaNet addresses the class imbalance problem that one-stage detectors have between foreground and background of the image during training of dense detectors - how? -> by reshaping the standard cross-entropy loss, i.e. it down-weights the loss assigned to well-classified examples (we want to minimize loss, and now well-classified examples don't contribute as much to the loss)
- The loss focuses training on a sparse set of hard examples and prevents the large number of easy negatives from overwhelming the detector. This loss is called Focal Loss.
- RetinaNet uses a dense sampling of object locations in an input image, an in-network feature pyramid, and anchor boxes
- C_i is just a type of convolution; for example, conv5 = 256 3x3 filters at stride 1, pad 1
- In the top-down pathway, apply a 1x1 convolution filter to the corresponding bottom-up feature map (a lateral connection) before adding it to the upsampled coarser map
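A toy sketch of one top-down merge step - a lateral 1x1 convolution plus the upsampled coarser map - with made-up shapes and random weights (the channel counts here are illustrative, not FPN's actual ones):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """1x1 convolution: a per-pixel linear map over channels. x: (H, W, C_in)."""
    return x @ w  # w: (C_in, C_out)

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of the top-down feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Toy feature maps: a higher-resolution bottom-up map and a coarser top-down map
c4 = rng.standard_normal((8, 8, 64))   # bottom-up, more spatial detail
p5 = rng.standard_normal((4, 4, 16))   # top-down, more semantic value
w_lat = rng.standard_normal((64, 16))  # lateral 1x1 conv weights

# Merge: lateral 1x1 conv matches channels, then element-wise add
p4 = conv1x1(c4, w_lat) + upsample2x(p5)  # shape (8, 8, 16)
```

This is how the top-down pathway constructs higher-resolution layers that still carry the semantics of the coarser layer above.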
- Well-classified examples: p_t > 0.5
- Focal Loss: -(1 - p_t)^gamma * log(p_t). The scaling factor (1 - p_t)^gamma decays to 0 as confidence in the correct class increases (loss is low for well-classified examples)
  - gamma = 5, p_t = 0.1 (badly classified): -(1 - 0.1)^5 * log(0.1) = 1.36 loss
  - gamma = 5, p_t = 0.9 (well classified): -(1 - 0.9)^5 * log(0.9) = 1.05E-6 loss ~ 0 loss
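The two worked examples above can be reproduced with a direct per-example implementation of the focal loss (natural log, matching the numbers above):

```python
import math

def focal_loss(p_t, gamma):
    """Per-example focal loss: -(1 - p_t)^gamma * log(p_t).

    p_t is the model's probability for the correct class; the (1 - p_t)^gamma
    factor down-weights well-classified examples (p_t > 0.5).
    """
    return -((1 - p_t) ** gamma) * math.log(p_t)

hard = focal_loss(0.1, gamma=5)  # badly classified: nearly full loss (~1.36)
easy = focal_loss(0.9, gamma=5)  # well classified: loss ~ 0 (~1.05e-6)
```

With gamma = 0 this reduces to plain cross entropy; raising gamma is what makes the easy negatives contribute almost nothing.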
- RetinaNet outperforms Faster R-CNN, a two-stage detector
- SSD does not select bottom layers of the pyramid for object detection, since their semantic value is not high enough to justify the significant reduction in speed their use would cause (SSD uses upper layers for detection, so it performs worse on small objects)
- One-stage detectors must process a much larger set of candidate object locations regularly sampled across an image (the background part of the image still dominates even when using a sampling heuristic)
  - RetinaNet
  - YOLO
  - SSD
- Two-stage detectors
  - Stage 1: class imbalance is addressed through the proposal stage (Selective Search, Edge Boxes, DeepMask, RPN), which narrows down the # of candidate object locations and filters out most background samples
  - Stage 2: sampling heuristics such as a fixed foreground-to-background ratio are performed to maintain a balance between foreground and background
  - Faster R-CNN
  - Mask R-CNN