First Triumvirate of the Roman Republic: Pompey, Crassus, Caesar
Second Triumvirate: Mark Antony, Octavian, Lepidus
Typically, we’ll set the pd.get_dummies() argument drop_first to True to avoid the so-called “dummy variable trap,” in which the full set of dummy columns is perfectly collinear (each row’s dummies sum to 1), violating the no-multicollinearity assumption of regression.
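For example (the color column and its values below are made up for illustration):

import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# With drop_first=True the first category ("blue") gets no column;
# it is represented implicitly by an all-zero row.
print(pd.get_dummies(df["color"], drop_first=True))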
Grubbs’ outlier test
Grubbs’ test is an algorithm that finds a single outlier in a normally distributed dataset by considering the current minimum or maximum value in the series. The algorithm is applied iteratively, removing the previously detected outlier between each iteration. Although we do not go into the details here, a common way to use Grubbs’ outlier test to detect anomalies is to calculate the Grubbs’ test statistic and Grubbs’ critical value, and mark the point as an outlier if the test statistic is greater than the critical value. This approach is only suitable for normal distributions, and can be inefficient because it only detects one anomaly in each iteration.
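As a rough sketch of that procedure (the helper name grubbs_outliers and the example data are made up; the critical value uses the standard two-sided t-distribution formula):

import numpy as np
from scipy import stats

def grubbs_outliers(x, alpha=0.05):
    # Iteratively apply the two-sided Grubbs' test, removing one
    # outlier per iteration until the test statistic no longer
    # exceeds the critical value. Assumes x is roughly normal.
    x = np.asarray(x, dtype=float)
    outliers = []
    while len(x) > 2:
        n = len(x)
        idx = np.argmax(np.abs(x - x.mean()))       # current min or max
        g = abs(x[idx] - x.mean()) / x.std(ddof=1)  # Grubbs' statistic
        t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
        g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
        if g <= g_crit:
            break
        outliers.append(x[idx])
        x = np.delete(x, idx)
    return outliers

print(grubbs_outliers([8.1, 8.0, 7.9, 8.2, 8.1, 14.6]))  # -> [14.6]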
Local outlier factor
The LOF is an anomaly score that you can generate using the scikit-learn class sklearn.neighbors.LocalOutlierFactor. Similar to the aforementioned k-NN and k-means anomaly detection methods, LOF classifies anomalies using local density around a sample. The local density of a data point refers to the concentration of other points in the immediate surrounding region, where the size of this region can be defined either by a fixed distance threshold or by the closest n neighboring points. LOF measures the isolation of a single data point with respect to its closest n neighbors. Data points with a significantly lower local density than that of their closest n neighbors are considered to be anomalies.
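A minimal usage sketch (the toy data, seed, and n_neighbors=20 are illustrative choices, not prescribed values):

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X = np.vstack([X, [[6.0, 6.0]]])  # append one obvious anomaly

# n_neighbors is the "closest n neighbors" from the text above.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)             # -1 marks anomalies, 1 inliers
scores = lof.negative_outlier_factor_   # more negative = more anomalous
print(np.where(labels == -1)[0])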
Precision-Recall
Precision-Recall is a useful measure of success of prediction when the classes are very imbalanced. In information retrieval, precision is a measure of result relevancy, while recall is a measure of how many truly relevant results are returned.
The precision-recall curve shows the tradeoff between precision and recall for different thresholds. A high area under the curve represents both high recall and high precision, where high precision relates to a low false positive rate, and high recall relates to a low false negative rate. High scores for both show that the classifier is returning accurate results (high precision), as well as returning a majority of all positive results (high recall).
A system with high recall but low precision returns many results, but most of its predicted labels are incorrect when compared to the training labels. A system with high precision but low recall is just the opposite, returning very few results, but most of its predicted labels are correct when compared to the training labels.
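As a short illustration (the synthetic imbalanced dataset and logistic regression model are assumptions made for the example), scikit-learn's precision_recall_curve returns one precision/recall pair per decision threshold:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, auc
from sklearn.model_selection import train_test_split

# Imbalanced binary problem: ~90% negatives, ~10% positives.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_te, probs)
print(f"area under the PR curve: {auc(recall, precision):.3f}")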
This code sample demonstrates how to compute a Laguerre-Voronoi diagram (also known as a power diagram) in 2D.
Power diagrams have a wonderful property: they decompose the union of (overlapping) circles into clipped circles that don't overlap. The cells have straight edges, since each circle is clipped by a convex polygon.
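The sample's code is not reproduced here; as a sketch of one standard approach (not necessarily the sample's own method), the power (regular) triangulation dual to the power diagram can be computed by lifting each weighted site to 3D and taking the lower convex hull. The site coordinates and radii below are made up for illustration.

import numpy as np
from scipy.spatial import ConvexHull

def power_triangulation(points, radii):
    # Lift each 2-D site (x, y) with weight r^2 onto the paraboloid
    # z = x^2 + y^2 - r^2; the lower-hull facets of the lifted points
    # project to the regular (power) triangulation, whose dual is the
    # power diagram.
    pts = np.asarray(points, dtype=float)
    z = np.sum(pts ** 2, axis=1) - np.asarray(radii, dtype=float) ** 2
    hull = ConvexHull(np.column_stack([pts, z]))
    lower = hull.equations[:, 2] < 0  # facets whose outward normal points down
    return hull.simplices[lower]

# Four overlapping circles (made-up coordinates and radii)
print(power_triangulation([(0, 0), (2, 0), (1, 1.7), (1, 0.6)],
                          [1.0, 1.2, 0.9, 0.5]))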
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Matlab code to produce PCA animations shown here:
% http://stats.stackexchange.com/questions/2691
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Static image
clear all
rng(42)
// Minimum-cost flow with LEMON's network simplex solver (setup only)
#include <iostream>
#include <lemon/smart_graph.h>
#include <lemon/network_simplex.h>
using namespace lemon;
using namespace std;