
* https://math.stackexchange.com/questions/2349026/why-is-the-approximation-of-hessian-jtj-reasonable
![](https://cdn.cacher.io/attachments/u/3c58dri65mn3m/5jUYZjnV5X3sTkQ1LbWmwAlvneSFe5mJ/r8x6oeg8x.png)
* https://tianle.mit.edu/sites/default/files/documents/Tianle_GGN_PKU.pdf
![](https://cdn.cacher.io/attachments/u/3c58dri65mn3m/fX6Ga3THjqj0Fd3Q9Q3DxyOzLtEmwy1-/wic5da93x.png)
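The links above concern the Gauss-Newton approximation of the Hessian.

For a least-squares loss f(θ) = ½ Σ r_i(θ)², the gradient is Jᵀr and the exact Hessian is JᵀJ + Σ r_i ∇²r_i. Gauss-Newton keeps only JᵀJ. That term is always positive semidefinite, and it is close to the exact Hessian when the residuals r_i are small or the model is nearly linear.

The sketch below checks this numerically on a hypothetical one-parameter model r_i(a) = exp(a·x_i) − y_i with made-up, noise-free data. The model, data, and function names are illustrative assumptions, not taken from the links.

```python
import math

# Hypothetical 1-parameter model: r_i(a) = exp(a * x_i) - y_i
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
a_true = 0.7
ys = [math.exp(a_true * x) for x in xs]  # made-up, noise-free data


def residuals(a):
    return [math.exp(a * x) - y for x, y in zip(xs, ys)]


def loss(a):
    # f(a) = 1/2 * sum r_i^2
    return 0.5 * sum(r * r for r in residuals(a))


def hessians(a):
    """Return (exact Hessian, Gauss-Newton J^T J) of the loss at a."""
    r = residuals(a)
    J = [x * math.exp(a * x) for x in xs]        # dr_i/da
    d2r = [x * x * math.exp(a * x) for x in xs]  # d^2 r_i / da^2
    gauss_newton = sum(j * j for j in J)         # J^T J
    full = gauss_newton + sum(ri * d for ri, d in zip(r, d2r))
    return full, gauss_newton


def numeric_hessian(a, h=1e-4):
    # central finite difference of the loss, as an independent check
    return (loss(a + h) - 2 * loss(a) + loss(a - h)) / (h * h)


for a in (a_true, 0.3):
    full, gn = hessians(a)
    print(f"a={a}: full={full:.4f}  J^T J={gn:.4f}  numeric={numeric_hessian(a):.4f}")
```

At the true parameter the residuals vanish, so the two Hessians coincide exactly. At a = 0.3 the exact Hessian comes out slightly negative while JᵀJ stays positive, which shows that JᵀJ remains positive semidefinite even where the exact Hessian does not.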
```python
from pathlib import Path
import importlib, warnings
import os, sys, time, numpy as np
import torch, random, PIL, copy
from os import path as osp
from shutil import copyfile
if sys.version_info.major == 2:  # Python 2.x
    from StringIO import StringIO as BIO
else:  # Python 3.x
    from io import BytesIO as BIO
```
* https://en.wikipedia.org/wiki/Doubly_stochastic_matrix
* ![](https://cdn.cacher.io/attachments/u/3c58dri65mn3m/MuO7MOT6RSe9l70Rvlu6ydS2KS_uy6Tj/mx1cdeso2.png)
* https://www.zhihu.com/question/272091285/answer/365470821
* https://zhuanlan.zhihu.com/p/45848975
* ![](https://cdn.cacher.io/attachments/u/3c58dri65mn3m/msIh0F8yB8H48oFxV3DVaW7ZdV83jaV3/o1f3un7u2.png)
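A doubly stochastic matrix is a nonnegative square matrix in which every row and every column sums to 1. A standard way to obtain one from a strictly positive matrix is the Sinkhorn-Knopp iteration: alternately normalize the rows and the columns until both sums settle at 1.

Below is a minimal pure-Python sketch. The function name, iteration limits, and example matrix are assumptions. The linked posts may use a different construction.

```python
def sinkhorn(matrix, n_iters=1000, tol=1e-9):
    """Alternately rescale rows and columns of a strictly positive square
    matrix until every row and column sums to (approximately) 1."""
    m = [row[:] for row in matrix]  # work on a copy, leave the input intact
    n = len(m)
    for _ in range(n_iters):
        # row normalization
        for i in range(n):
            s = sum(m[i])
            m[i] = [v / s for v in m[i]]
        # column normalization
        for j in range(n):
            s = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= s
        # columns now sum to 1 (up to rounding); stop once rows do too
        if all(abs(sum(row) - 1.0) < tol for row in m):
            break
    return m


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
P = sinkhorn(A)
print([round(sum(row), 6) for row in P])                         # row sums
print([round(sum(P[i][j] for i in range(3)), 6) for j in range(3)])  # column sums
```

Strict positivity matters here: it guarantees convergence (Sinkhorn's theorem), whereas matrices with zero entries may fail to converge.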
![](https://cdn.cacher.io/attachments/u/3c58dri65mn3m/dRYU-DSBVYK_0Dua1y9QHc9c41zopUYa/099w3hfx0.png)
https://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/viewFile/8275/8787
![](https://cdn.cacher.io/attachments/u/3c58dri65mn3m/82HbKvT57FaxV9vLzkKA9hyMAV-4SlGM/0owt52mst.png)
https://www.zhihu.com/question/24623031
* 函数图象每一点上各个主曲率的大小
* Hessian矩阵的特征值就是形容其在该点附近特征向量方向的凹凸性
- 特征值越大,凸性越强
- 把函数想想成一个小山坡,陡的那面是特征值大的方向,平缓的是特征值小的方向
- 凸性和优化方法的收敛速度有关: 如果正定Hessian矩阵的特征值都差不多,那么梯度下降的收敛速度越快,反之如果其特征值相差很大,那么收敛速度越慢。
- ![](https://cdn.cacher.io/attachments/u/3c58dri65mn3m/LrbrbToDKmlq3H9y4j2zpeFGNLMjxIbg/eypirlsz3.png)
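The last point can be checked on a quadratic f(x) = ½ Σ λ_i x_i², whose Hessian is diag(λ_1, ..., λ_n). With step size 1/λ_max, each gradient step multiplies coordinate i by (1 − λ_i/λ_max). The slowest coordinate therefore shrinks only by a factor (1 − 1/κ) per step, where κ = λ_max/λ_min.

This is a sketch under those assumptions. The function name, step size, and tolerance are illustrative choices, not from the linked answer.

```python
def gd_iterations(eigenvalues, tol=1e-6, max_iters=100_000):
    """Run gradient descent on f(x) = 1/2 * sum(l_i * x_i^2), whose Hessian is
    diag(eigenvalues), starting from x = (1, ..., 1). Return the number of
    steps needed until every |x_i| < tol."""
    lr = 1.0 / max(eigenvalues)  # a safe step size for this quadratic
    x = [1.0] * len(eigenvalues)
    for k in range(max_iters):
        if all(abs(xi) < tol for xi in x):
            return k
        # gradient of f is (l_i * x_i); each coordinate shrinks by (1 - lr * l_i)
        x = [xi - lr * li * xi for xi, li in zip(x, eigenvalues)]
    return max_iters


well_conditioned = [1.0, 2.0]   # condition number 2
ill_conditioned = [1.0, 100.0]  # condition number 100
print(gd_iterations(well_conditioned), gd_iterations(ill_conditioned))
```

With this step size, the iteration count grows roughly like κ · ln(1/tol). The ill-conditioned case needs far more steps even though both problems have the same minimizer.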
---
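```python
import torch

def pairwise_distance(data1, data2, device=torch.device('cuda')):
    # transfer to device
    data1, data2 = data1.to(device), data2.to(device)
    # N*1*D
    A = data1.unsqueeze(dim=1)
    # 1*M*D
    B = data2.unsqueeze(dim=0)
    # broadcast to N*M*D, then sum over features -> N*M squared distances
    # (assumed completion: the original snippet is cut off after B)
    dis = ((A - B) ** 2).sum(dim=-1)
    return dis
```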
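The broadcast in `pairwise_distance` builds an N×M×D intermediate tensor. A common lower-memory alternative uses the identity ‖a − b‖² = ‖a‖² + ‖b‖² − 2a·b, so only N×M values are ever materialized.

The sketch below is written in NumPy for a CPU-only illustration, and the function name is hypothetical. The same algebra carries over to torch tensors.

```python
import numpy as np


def pairwise_sq_dist_expanded(a, b):
    """Squared Euclidean distances between rows of a (N x D) and b (M x D)
    via ||a||^2 + ||b||^2 - 2 a.b, avoiding the N x M x D broadcast tensor."""
    a_sq = (a * a).sum(axis=1)[:, None]  # N x 1
    b_sq = (b * b).sum(axis=1)[None, :]  # 1 x M
    d = a_sq + b_sq - 2.0 * a @ b.T      # N x M
    return np.maximum(d, 0.0)            # clamp tiny negatives caused by rounding


rng = np.random.default_rng(0)
a, b = rng.normal(size=(5, 3)), rng.normal(size=(4, 3))
broadcast = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
print(np.allclose(pairwise_sq_dist_expanded(a, b), broadcast))
```

The trade-off is precision. For nearly identical points the subtraction cancels catastrophically, which is why the result is clamped at zero.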
* https://arxiv.org/pdf/2005.00570.pdf
* ![](https://cdn.cacher.io/attachments/u/3c58dri65mn3m/oYdGqI_qEWjHo-Gxc515-3Zwzsj-T_6h/lbsbitn79.png)
* Since the softmax applies a transformation in log-space, a geometric mean respects the relationship. We notice slightly improved ensemble accuracy when compared to an arithmetic mean.
??? Doubtful, needs verification
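The "log-space" argument in the quote can be made precise. The renormalized geometric mean of several softmax outputs equals the softmax of the averaged logits. This holds because each softmax's normalizer is a per-model constant that cancels after renormalization.

The pure-Python sketch below compares the two ensembling rules on made-up logits for three hypothetical models. It checks only the identity and says nothing about which rule gives better accuracy.

```python
import math


def softmax(z):
    m = max(z)  # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def arithmetic_ensemble(prob_list):
    k = len(prob_list)
    return [sum(p[j] for p in prob_list) / k for j in range(len(prob_list[0]))]


def geometric_ensemble(prob_list):
    # normalized geometric mean: average the log-probabilities, then renormalize
    k = len(prob_list)
    mean_logs = [sum(math.log(p[j]) for p in prob_list) / k
                 for j in range(len(prob_list[0]))]
    return softmax(mean_logs)


# hypothetical logits from three ensemble members over 3 classes
logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0], [1.5, 1.4, 0.3]]
probs = [softmax(z) for z in logits]
mean_logits = [sum(z[j] for z in logits) / len(logits) for j in range(3)]

print(arithmetic_ensemble(probs))
print(geometric_ensemble(probs))
print(softmax(mean_logits))  # matches the geometric ensemble
```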
* Information Geometry of Orthogonal Initializations and Training
![](https://cdn.cacher.io/attachments/u/3c58dri65mn3m/Hq92OS5TcAAj-9qLGZVW-8srxy2qdfSH/9jqc3yb9z.png)
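The Shannon limit, or Shannon capacity, of a communication channel is the theoretical maximum information transfer rate of a channel at a given noise level. The well-known Shannon theorem gives it as C = B log2(1 + S/N), where C is the achievable link rate (channel capacity, in bit/s when B is in Hz), B is the link bandwidth, S is the average signal power, and N is the average noise power. The signal-to-noise ratio S/N is usually expressed in decibels (dB), with dB = 10 log10(S/N). The Shannon limit is this maximum value. [1]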
https://www.zhihu.com/question/277417439/answer/434431697
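The formula above needs the SNR as a linear power ratio, so a value quoted in dB must first be converted with S/N = 10^(dB/10).

This is a minimal sketch. The function names and the 3 kHz / 30 dB example values are hypothetical.

```python
import math


def db_to_linear(snr_db):
    # dB = 10 * log10(S/N)  =>  S/N = 10 ** (dB / 10)
    return 10 ** (snr_db / 10)


def shannon_capacity(bandwidth_hz, snr_db):
    """C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1 + db_to_linear(snr_db))


# hypothetical example: a 3 kHz channel at 30 dB SNR (S/N = 1000)
print(round(shannon_capacity(3000, 30)))
```

For these values the capacity is roughly 29.9 kbit/s. Doubling the bandwidth doubles C, while doubling S/N only adds about B bits per second at high SNR.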
* https://spacevim.org/documentation/#custom-configuration
* https://github.com/Jackiexiao/10-minutes-to-SpaceVim/blob/master/README.md
```bash
# allow mouse select & copy-paste
# https://github.com/SpaceVim/SpaceVim/issues/695
mkdir -p ~/.SpaceVim.d/autoload
cat <<EOF >>~/.SpaceVim.d/autoload/custom_init.vim
function! custom_init#before() abort
set mouse=r