1 Surveys
- Deep Learning in Neural Networks: An Overview
2014-04-30 paper
$\bullet \bullet$ overview
- Deep Learning
2015 Hinton PPT
$\bullet \bullet$ DL
- Review of Deep Learning
2018-04-05 paper
$\bullet \bullet$ review
- Democratisation of Usable Machine Learning in Computer Vision
2019-02-18 paper
$\bullet \bullet$ usable
- A Selective Overview of Deep Learning
2019-04-10 paper
$\bullet \bullet$ selective
Answers what is new about deep learning compared with classical methods and what its theoretical foundations are; introduces common neural network models (e.g. convolutional neural networks, recurrent neural networks, generative adversarial networks) and training techniques (e.g. stochastic gradient descent, dropout, batch normalization) from a statistical perspective, and discusses the roles of depth and over-parameterization;
- A Survey on Distributed Machine Learning
2019-12-20 paper
$\bullet \bullet$ Distributed
Distributed machine learning;
- The Deep Learning Compiler: A Comprehensive Survey
2020-02-06 paper
2 Fundamentals
2.1 Convolution
2.1.1 Standard Convolution
2.1.2 Deconvolution (Transposed Convolution)
2.1.3 Depthwise Separable Convolution
2.1.4 Dimensional Convolution
2.1.5 Deformable Convolution
- Deformable Convolutional Networks
ICCV 2017 oral 2017-03-17
2.1.6 Others
- VarGNet: Variable Group Convolutional Neural Network for Efficient Embedded Computing
2019-07-12 paper
Improves on depthwise separable convolution so that it is easier to optimize on embedded devices; see the sketch at the end of this subsection.
- Random Shifting for CNN: a Solution to Reduce Information Loss in Down-Sampling Layers
IJCAI 2017 2017 paper
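For reference, a minimal PyTorch sketch of the depthwise separable block that VarGNet builds on: a per-channel 3×3 depthwise convolution followed by a 1×1 pointwise convolution (the layer sizes here are illustrative, not VarGNet's actual configuration):

```python
import torch
import torch.nn as nn

def dw_separable(cin, cout, stride=1):
    """Depthwise separable convolution: depthwise 3x3 + pointwise 1x1."""
    return nn.Sequential(
        # depthwise: one 3x3 filter per input channel (groups=cin)
        nn.Conv2d(cin, cin, 3, stride=stride, padding=1, groups=cin, bias=False),
        nn.BatchNorm2d(cin),
        nn.ReLU(inplace=True),
        # pointwise: 1x1 conv mixes information across channels
        nn.Conv2d(cin, cout, 1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

x = torch.randn(1, 32, 56, 56)
print(dw_separable(32, 64)(x).shape)  # torch.Size([1, 64, 56, 56])
```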
2.2 Activation Functions
2.3 Neural Networks
2.3.1 Neural Networks
- Mean Field Limit of the Learning Dynamics of Multilayer Neural Networks
2019-02-07 paper
- Loss Surface Modality of Feed-Forward Neural Network Architectures
2019-05-24 paper
- What Can ResNet Learn Efficiently, Going Beyond Kernels?
2019-05-24 paper
2.3.2 Backpropagation
- Learning Internal Representations by Error Propagation
1985-09-05 Hinton paper
- Learning representations by back-propagating errors
1986 Hinton paper
- Memorized Sparse Backpropagation
2019-05-24 paper
- Fully Decoupled Neural Network Learning Using Delayed Gradients
2019-06-21 paper
Delayed gradients: breaks the serial execution of backpropagation so that different parts of the network can be updated in parallel; see the sketch below.
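A toy sketch of the delayed-gradient idea, assuming nothing about the paper's actual implementation: the weights are updated with a gradient that is one step stale, which is what makes decoupled, parallel execution possible:

```python
import numpy as np

def grad(w):
    # gradient of the toy objective f(w) = 0.5 * ||w||^2
    return w

w = np.array([1.0, -2.0])
lr, stale = 0.1, None

for step in range(100):
    g = grad(w)           # gradient at the current weights
    if stale is not None:
        w -= lr * stale   # apply the gradient from the *previous* step
    stale = g             # defer the fresh gradient to the next step

print(w)  # still converges toward the optimum despite the one-step delay
```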
3 Practice
3.1 Initialization
- How to start training: The effect of initialization and architecture
NIPS 2018 2018-03-05 paper
3.2 Training
3.2.1 Classification
- A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification
2015-10-13 paper
$\bullet \bullet \bullet \bullet \bullet$
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
2017-06-08 paper
$\bullet \bullet$ large batch
Shows that very large minibatches make convergence harder but can still reach good accuracy; plain SGD with momentum struggles at large batch sizes, motivating the paper's linear learning-rate scaling rule with gradual warmup (see the sketch after this list);
- Accelerated Training for Massive Classification via Dynamic Class Selection
2018-01-05 paper
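A minimal sketch of the large-minibatch paper's linear scaling rule with gradual warmup; the batch size, base rate, and warmup length below follow the paper's ImageNet settings, while the function itself is mine:

```python
def scaled_lr(epoch, it, iters_per_epoch,
              base_lr=0.1, batch=8192, ref_batch=256, warmup_epochs=5):
    """Linear scaling rule: lr = base_lr * batch / ref_batch,
    reached by a gradual warmup over the first few epochs."""
    target = base_lr * batch / ref_batch
    if epoch < warmup_epochs:
        done = epoch * iters_per_epoch + it
        total = warmup_epochs * iters_per_epoch
        return base_lr + (target - base_lr) * done / total
    return target
```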
3.2.2 Detection
- Bag of Freebies for Training Object Detection Neural Networks
2019-02-11 paper
$\bullet \bullet$ freebies
Detection training tricks from GluonCV;
- Training Object Detectors With Noisy Data
2019-05-17 paper
Uses a teacher network to cope with noisy training images;
3.2.3 Others
- Large Scale Distributed Deep Networks
NIPS 2012 2012 paper
Large-scale distributed training;
- Joint Training of Neural Network Ensembles
2019-02-12 paper | pytorch
Model ensembles and the problem of training their members jointly;
- Sequential training algorithm for neural networks
2019-05-17 paper
$\bullet \bullet$
Trains the network layer by layer and fuses the layers at the end; the result is worse than training the whole network jointly, but the approach helps when training large networks with limited compute; see the sketch below.
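A toy PyTorch sketch of sequential, layer-by-layer training in the spirit of the entry above; the block sizes, the throwaway linear heads, and `loader` (yielding batches of 784-dim inputs and 10-class labels) are illustrative assumptions, not the paper's setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

blocks = [nn.Sequential(nn.Linear(784, 256), nn.ReLU()),
          nn.Sequential(nn.Linear(256, 128), nn.ReLU())]

trunk = nn.Identity()
for block in blocks:
    head = nn.Linear(block[0].out_features, 10)   # temporary head
    opt = torch.optim.SGD([*block.parameters(), *head.parameters()], lr=0.01)
    for x, y in loader:                           # `loader` assumed given
        with torch.no_grad():
            z = trunk(x)                          # frozen earlier layers
        loss = F.cross_entropy(head(block(z)), y)
        opt.zero_grad(); loss.backward(); opt.step()
    trunk = nn.Sequential(trunk, block)           # extend and freeze
    for p in trunk.parameters():
        p.requires_grad_(False)
```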
3.3 Exploding Gradients
- On the difficulty of training Recurrent Neural Networks
2012-11-21 paper
Proposes clipping the gradient norm as a remedy; see the sketch below.
- Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients?
NIPS 2018 2018-01-11 paper
- Products of Many Large Random Matrices and Gradients in Deep Neural Networks
2018-12-14 paper
Measures the gradient problems of ReLU networks;
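Pascanu et al. (the first entry above) rescale the gradient whenever its norm exceeds a threshold; in PyTorch this is a one-liner, shown here inside a minimal runnable step:

```python
import torch
import torch.nn as nn

model = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 50, 8)        # a batch of length-50 sequences
out, _ = model(x)
loss = out.pow(2).mean()         # dummy loss, just for illustration
loss.backward()
# rescale all gradients so their global L2 norm is at most 1.0
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```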
3.4 Vanishing Gradients
- On the difficulty of training Recurrent Neural Networks
2012-11-21 paper
- Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients?
NIPS 2018 2018-01-11 paper
3.5 Overfitting
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting
2014 paper | [blog](https://zhuanlan.zhihu.com/p/38200980)
See the sketch after this list.
- One-Step or Two-Step Optimization and the Overfitting Phenomenon: A Case Study on Time Series Classification
2014-07-16 paper
- How much does your data exploration overfit? Controlling bias via information usage
2015-11-16 paper
- Detecting Overfitting of Deep Generative Networks via Latent Recovery
2019-01-09 paper
- Overfitting Mechanism and Avoidance in Deep Neural Networks
2019-01-19 paper
- The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent
2019-04-15 paper
- GradMask: Reduce Overfitting by Regularizing Saliency
2019-04-16 paper
- Overfitting in Synthesis: Theory and Practice
2019-05-17 paper
$\bullet \bullet$
- The advantages of multiple classes for reducing overfitting from test set reuse
2019-05-24 paper
- Exploiting the Full Capacity of Deep Neural Networks while Avoiding Overfitting by Targeted Sparsity Regularization
2020-02-21 paper
Targeted sparsity regularization to avoid overfitting; reported to work better than dropout and batch normalization;
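For reference, a minimal NumPy sketch of dropout in its commonly used "inverted" form (the original paper instead rescales weights at test time; the two are equivalent in expectation):

```python
import numpy as np

def dropout(x, p=0.5, training=True):
    """Zero each unit with probability p during training, and rescale the
    survivors by 1/(1-p) so the expected activation is unchanged; at test
    time the input passes through untouched (inverted dropout)."""
    if not training or p == 0.0:
        return x
    mask = (np.random.rand(*x.shape) >= p) / (1.0 - p)
    return x * mask

h = np.ones((2, 4))
print(dropout(h))                  # ~half the units zeroed, rest scaled by 2
print(dropout(h, training=False))  # identity at test time
```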
3.6 Underfitting
3.7 Model Validation
Validating that a model is relevant to its task, and checking for overfitting and underfitting;
- Perturbed Model Validation: A New Framework to Validate Model Relevance
2019-05-24 paper
Perturbed model validation; see the sketch below.
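A loose sketch of the perturbed-validation idea as I read it, with `make_model`, `X`, and `y` all assumed to exist: retrain on copies of the training labels at increasing noise rates and watch how the fitted accuracy decays; a model that fits heavily perturbed labels as well as clean ones is likely overfitting:

```python
import numpy as np

for p in (0.0, 0.1, 0.2, 0.4):
    y_noisy = y.copy()
    flip = np.random.rand(len(y)) < p          # flip a fraction p of labels
    y_noisy[flip] = np.random.randint(0, y.max() + 1, flip.sum())
    model = make_model().fit(X, y_noisy)       # hypothetical model factory
    print(p, model.score(X, y_noisy))          # accuracy on perturbed labels
```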
4 Advanced
4.1 Optimizers
4.1.1 Surveys
- An overview of gradient descent optimization algorithms
2016-09-15 paper | blog
The paper version of the blog post; covers batch, stochastic, and mini-batch gradient descent, and gives concise introductions to Momentum, Nesterov accelerated gradient, Adagrad, Adadelta, RMSprop, Adam, AdaMax, Nadam, and more;
- A Survey of Optimization Methods from a Machine Learning Perspective
2019-06-17 paper
4.1.2 Classics
- Two problems with backpropagation and other steepest-descent learning procedures for networks
1986 paper
Points out how inefficient the search performed by SGD is;
- Improving the convergence of back-propagation learning with second-order methods
1988 paper
- Acceleration of stochastic approximation by averaging
1992-07 paper
Averaged stochastic gradient descent;
- Analysis of Natural Gradient Descent for Multilayer Neural Networks
1999-01-21 paper
- On the momentum term in gradient descent learning algorithms
1999 paper
Applies momentum to SGD;
- Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
2011-07 paper
Adagrad: a separate learning rate for each parameter, but the effective learning rate decays quickly;
- ADADELTA: An Adaptive Learning Rate Method
2012-12-22 paper
AdaDelta;
- Adam: A Method for Stochastic Optimization
ICLR 2014 2014-12-22 paper
Adam: improves on RMSprop; gives each parameter its own adaptive learning rate; besides the exponentially weighted average of past squared gradients kept by RMSprop, Adam also maintains an exponentially weighted average of past gradients, similar to momentum; see the sketch after this list.
- Kalman-Based Stochastic Gradient Method with Stop Condition and Insensitivity to Conditioning
2015-12-03 paper
kSGD: Kalman-based stochastic gradient descent; insensitive to hyperparameters, but too expensive computationally;
- Small steps and giant leaps: Minimal Newton solvers for Deep Learning
2018-05-21 paper | openreview
A fast second-order method meant as a drop-in replacement for the solvers currently used in deep learning;
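A NumPy sketch of the Adam update summarized in the entry above: two exponentially weighted moving averages, one of gradients and one of squared gradients, with bias correction:

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for parameters theta given gradient g at step t >= 1."""
    m = b1 * m + (1 - b1) * g          # EWMA of gradients (momentum-like)
    v = b2 * v + (1 - b2) * g * g      # EWMA of squared gradients (RMSprop-like)
    m_hat = m / (1 - b1 ** t)          # bias-correct the zero initialization
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```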
4.1.3 Others
- No More Pesky Learning Rates
2012-06-06 paper
Adapts the learning rate automatically;
- YellowFin and the Art of Momentum Tuning
2017-06-12 paper
Notes that SGD converges slowly but reaches comparatively high accuracy;
- Aggregated Momentum: Stability Through Passive Damping
2018-04-01 paper
- Adaptive Gradient Methods With Dynamic Bound Of Learning Rate
ICLR 2019 2019-02-26 paper | pytorch-official | openreview
4.1.4 SGD
- Deep learning with Elastic Averaging SGD
NIPS 2015 2014-12-20 paper
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
2017-06-08
Shows that SGD becomes unstable at very large batch sizes;
- The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent
2019-04-15 paper
$\bullet \bullet$ Gradient Confusion
How SGD affects model convergence, with an exploration of network architectures that suit it;
- Time-Smoothed Gradients for Online Forecasting
ICML 2019 2019-05-21 paper
PTS-SGD: insensitive to the learning rate, and fast to compute;
- Fine-grained Optimization of Deep Neural Networks
2019-05-22 paper
FG-SGD: fine-grained optimization that ensures convergence to minima;
- Momentum-Based Variance Reduction in Non-Convex SGD
2019-05-24 paper
4.2 Normalization
- Self-Normalizing Neural Networks
2017-06-08 paper | tensorflow | zhihu | reddit
Introduces the SELU activation; see the sketch after this list.
- Fixup Initialization: Residual Learning Without Normalization
ICLR 2019 2019-01-27 paper | openreview
Discusses the theory behind normalization and why it is effective;
- ROI Regularization for Semi-supervised and Supervised Learning
2019-05-15 paper
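The Self-Normalizing Neural Networks entry is built around the SELU activation; its definition, with the fixed-point constants from the paper:

```python
import numpy as np

# fixed-point constants derived in the paper
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    """SELU(x) = lambda * x for x > 0, lambda * alpha * (exp(x) - 1) otherwise."""
    return LAMBDA * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))
```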
4.3 Model Interpretability
- Explainable Machine Learning for Scientific Insights and Discoveries
2019-05-21 paper
- On the Learning Dynamics of Two-layer Nonlinear Convolutional Neural Networks
2019-05-24 paper
4.4 Distributed
- A Quick Survey on Large Scale Distributed Deep Learning Systems
2018
A roundup of the problems in distributed deep learning; a sketch of the basic synchronous data-parallel pattern follows below.
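For orientation, a minimal sketch of the synchronous data-parallel pattern these systems build on: every worker computes gradients on its own data shard, then the gradients are averaged with an all-reduce (process-group initialization is assumed to happen elsewhere):

```python
import torch.distributed as dist

def average_gradients(model):
    """Average parameter gradients across all workers (synchronous SGD)."""
    world = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world
```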