「DL」 A Resource Compilation of Fundamental Deep Learning Skills

 

1 Surveys

  1. Deep Learning in Neural Networks: An Overview
    2014-04-30 paper
    $\bullet \bullet$ overview

  2. Deep Learning
    2015 Hinton PPT
    $\bullet \bullet$ - DL

  3. Review of Deep Learning
    2018-04-05 paper
    $\bullet \bullet$ - review

  4. Democratisation of Usable Machine Learning in Computer Vision
    2019-02-18 paper
    $\bullet \bullet$ usable

  5. A Selective Overview of Deep Learning
    2019-04-10 paper
    $\bullet \bullet$ selective
    The article addresses what new characteristics deep learning has compared with classical methods and what its theoretical foundations are;
    from a statistical viewpoint it introduces common neural network models (e.g. convolutional, recurrent, and generative adversarial networks) and training techniques (e.g. stochastic gradient descent, dropout, batch normalization), and discusses depth and parameterization;

  6. A Survey on Distributed Machine Learning
    2019-12-20 paper
    $\bullet \bullet$ Distributed
    Distributed machine learning;

  7. The Deep Learning Compiler: A Comprehensive Survey
    2020-02-06 paper

2 Fundamentals

2.1 Convolution

2.1.1 Standard convolution

  1. A guide to convolution arithmetic for deep learning
    2016-03-23 paper
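
The arithmetic covered in this guide boils down to one relation for the spatial output size, $o = \lfloor (i + 2p - d(k-1) - 1) / s \rfloor + 1$, for input size $i$, kernel size $k$, stride $s$, padding $p$, and dilation $d$. A minimal helper for sanity-checking layer shapes (the function name is illustrative):

```python
import math

def conv_output_size(i, k, s=1, p=0, d=1):
    """Spatial output size of a standard convolution.

    i: input size, k: kernel size, s: stride, p: padding, d: dilation.
    Implements o = floor((i + 2p - d*(k-1) - 1) / s) + 1.
    """
    return math.floor((i + 2 * p - d * (k - 1) - 1) / s) + 1

# Example: a 3x3 convolution with stride 2 and padding 1 on a 224x224 input
print(conv_output_size(224, k=3, s=2, p=1))  # 112
```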

2.1.2 Deconvolution (transposed convolution)

2.1.3 Depthwise separable convolution

  1. Network Decoupling: From Regular to Depthwise Separable Convolutions
    2018-08-16 paper
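
For reference alongside this subsection, a minimal depthwise separable convolution block in PyTorch: a per-channel 3x3 depthwise convolution followed by a 1x1 pointwise convolution. Class and variable names are illustrative, not taken from the paper above:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) 3x3 conv
    followed by a 1x1 pointwise conv that mixes channels."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # groups=in_ch makes each depthwise filter see exactly one input channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 32, 56, 56)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 56, 56])
```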

2.1.4 Dimension-wise convolution

  1. DiCENet: Dimension-wise Convolutions for Efficient Networks
    2019-06-08 paper | pytorch

2.1.5 Deformable convolution

  1. Deformable Convolutional Networks
    ICCV 2017 oral 2017-03-17
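
A minimal sketch of deformable convolution using torchvision's `DeformConv2d`: a plain convolution predicts per-location sampling offsets (two values per kernel element), which the deformable convolution then uses; as in the paper, the offset predictor is initialized to zero so that training starts from a regular convolution. Shapes and layer names are illustrative:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

in_ch, out_ch, k = 32, 64, 3
# Offset branch: 2 * k * k channels = (dx, dy) for each kernel element
offset_conv = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=1)
nn.init.zeros_(offset_conv.weight)   # zero-initialized offsets, as in the paper
nn.init.zeros_(offset_conv.bias)
deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=1)

x = torch.randn(1, in_ch, 28, 28)
offsets = offset_conv(x)             # shape (1, 18, 28, 28)
y = deform_conv(x, offsets)          # shape (1, 64, 28, 28)
print(y.shape)
```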

2.1.6 Others

  1. VarGNet: Variable Group Convolutional Neural Network for Efficient Embedded Computing
    2019-07-12 paper
    Improves on depthwise separable convolution to make it easier to optimize on embedded devices;

  2. Random Shifting for CNN: a Solution to Reduce Information Loss in Down-Sampling Layers
    IJCAI 2017 2017 paper

2.2 Activation functions

  1. Neurons Activation Visualization and Information Theoretic Analysis
    2019-05-14 paper

2.3 Neural networks

2.3.1 Neural networks

  1. Mean Field Limit of the Learning Dynamics of Multilayer Neural Networks
    2019-02-07 paper

  2. Loss Surface Modality of Feed-Forward Neural Network Architectures
    2019-05-24 paper

  3. What Can ResNet Learn Efficiently, Going Beyond Kernels?
    2019-05-24 paper

2.3.2 Backpropagation

  1. Learning Internal Representations by Error Propagation
    1985-09-05 Hinton paper

  2. Learning representations by back-propagating errors
    1986 Hinton paper

  3. Memorized Sparse Backpropagation
    2019-05-24 paper

  4. Fully Decoupled Neural Network Learning Using Delayed Gradients
    2019-06-21 paper
    Delayed gradient descent: addresses the inherently sequential execution of backpropagation;
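
To accompany these backpropagation references, a minimal NumPy sketch of error propagation through a two-layer sigmoid network with a squared-error loss; the toy data and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                           # toy inputs
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)   # toy targets

W1, b1 = rng.normal(scale=0.1, size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(200):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    loss = np.mean((out - y) ** 2)

    # backward pass: propagate the error signal layer by layer
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = d_out @ W2.T * h * (1 - h)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # gradient descent update
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(round(float(loss), 4))
```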

3 Practice

3.1 Initialization

  1. How to start training: The effect of initialization and architecture
    NIPS 2018 2018-03-05 paper
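
A minimal sketch of the initialization schemes most commonly discussed in this literature (He/Kaiming for ReLU layers, Xavier/Glorot for linear layers); the module structure is illustrative:

```python
import torch.nn as nn

def init_weights(m):
    if isinstance(m, nn.Conv2d):
        # He (Kaiming) initialization, suited to ReLU networks
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)
    elif isinstance(m, nn.Linear):
        # Xavier (Glorot) initialization, suited to tanh/sigmoid layers
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(16 * 32 * 32, 10))
model.apply(init_weights)  # apply the scheme to every submodule
```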

3.2 Training

3.2.1 Classification

  1. A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification
    2015-10-13 paper
    $\bullet \bullet \bullet \bullet \bullet$

  2. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
    2017-06-08 paper
    $\bullet \bullet$ - large batch
    Shows that naive training with very large batches converges poorly, and that plain SGD with momentum needs adjusting in that regime; a linear learning-rate scaling rule with gradual warmup recovers accuracy (see the sketch after this list);

  3. Accelerated Training for Massive Classification via Dynamic Class Selection
    2018-01-05 paper
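
A minimal sketch of the linear learning-rate scaling rule with gradual warmup used in item 2 above; the defaults follow that paper's ImageNet setup (base LR 0.1 per 256 images, 5 warmup epochs), everything else is illustrative:

```python
def scaled_lr_with_warmup(step, steps_per_epoch, batch_size,
                          base_lr=0.1, base_batch=256, warmup_epochs=5):
    """Linear-scaling rule with gradual warmup.

    The target LR is base_lr * batch_size / base_batch; during the first
    warmup_epochs it ramps linearly from base_lr up to the target.
    """
    target_lr = base_lr * batch_size / base_batch
    warmup_steps = warmup_epochs * steps_per_epoch
    if step < warmup_steps:
        return base_lr + (target_lr - base_lr) * step / warmup_steps
    return target_lr

# Example: with batch 8192 the target LR is 0.1 * 8192 / 256 = 3.2,
# reached after 5 warmup epochs (here 5 * 157 steps).
print(scaled_lr_with_warmup(step=785, steps_per_epoch=157, batch_size=8192))  # 3.2
```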

3.2.2 Detection

  1. Bag of Freebies for Training Object Detection Neural Networks
    2019-02-11 paper
    $\bullet \bullet$ - freebies
    Object-detection training and tuning tricks from GluonCV;

  2. Training Object Detectors With Noisy Data
    2019-05-17 paper
    Uses a teacher network to cope with noisy training images;

3.2.3 Others

  1. Large Scale Distributed Deep Networks
    NIPS 2012 2012 paper
    Large-scale distributed training;

  2. Joint Training of Neural Network Ensembles
    2019-02-12 paper | pytorch
    Model ensembles and their joint training;

  3. Sequential training algorithm for neural networks
    2019-05-17 paper
    $\bullet \bullet$
    Trains the network layer by layer and then fuses the parts together; the result is not as good as training the whole network jointly, but this helps when training large networks with limited compute (a minimal sketch follows this list);

  4. 深度学习参数怎么调优,这12个trick告诉你 (Chinese blog post: 12 tricks for tuning deep-learning hyperparameters)
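
A minimal sketch of sequential, layer-wise training in the spirit of item 3 above: each block is trained with a temporary classifier head while earlier blocks stay frozen, and the trained blocks are then fused into one network. The losses, heads, and toy data are illustrative, not the paper's exact recipe:

```python
import torch
import torch.nn as nn

blocks = [nn.Sequential(nn.Linear(32, 64), nn.ReLU()),
          nn.Sequential(nn.Linear(64, 64), nn.ReLU())]
x = torch.randn(256, 32)
y = torch.randint(0, 10, (256,))

trained = []
for block in blocks:
    head = nn.Linear(block[0].out_features, 10)   # temporary classifier head
    opt = torch.optim.SGD(list(block.parameters()) + list(head.parameters()), lr=0.1)
    for _ in range(50):
        with torch.no_grad():                      # earlier blocks stay frozen
            feat = x
            for b in trained:
                feat = b(feat)
        loss = nn.functional.cross_entropy(head(block(feat)), y)
        opt.zero_grad(); loss.backward(); opt.step()
    trained.append(block)

# Fuse the separately trained blocks into a single network
model = nn.Sequential(*trained, nn.Linear(64, 10))
```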

3.3 Exploding gradients

  1. On the difficulty of training Recurrent Neural Networks
    2012-11-21 paper

  2. Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients?
    NIPS 2018 2018-01-11 paper

  3. Products of Many Large Random Matrices and Gradients in Deep Neural Networks
    2018-12-14 paper
    Measures gradient problems in ReLU networks;
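
The standard remedy discussed in this literature (e.g. item 1 above) is gradient clipping; a minimal PyTorch sketch, with an illustrative model and clipping threshold:

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 20, 16)
out, _ = model(x)
loss = out.pow(2).mean()

opt.zero_grad()
loss.backward()
# Rescale gradients if their global norm exceeds 1.0, then step
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```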

3.4 Vanishing gradients

  1. On the difficulty of training Recurrent Neural Networks
    2012-11-21 paper

  2. Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients?
    NIPS 2018 2018-01-11 paper

3.5 Overfitting

  1. Dropout: A Simple Way to Prevent Neural Networks from Overfitting
    2014 paper | [blog](https://zhuanlan.zhihu.com/p/38200980)
    A usage sketch follows at the end of this list;

  2. One-Step or Two-Step Optimization and the Overfitting Phenomenon: A Case Study on Time Series Classification
    2014-07-16 paper

  3. How much does your data exploration overfit? Controlling bias via information usage
    2015-11-16 paper

  4. Detecting Overfitting of Deep Generative Networks via Latent Recovery
    2019-01-09 paper

  5. Overfitting Mechanism and Avoidance in Deep Neural Networks
    2019-01-19 paper

  6. The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent
    2019-04-15 paper

  7. GradMask: Reduce Overfitting by Regularizing Saliency
    2019-04-16 paper

  8. Overfitting in Synthesis: Theory and Practice
    2019-05-17 paper
    $\bullet \bullet$

  9. The advantages of multiple classes for reducing overfitting from test set reuse
    2019-05-24 paper

  10. Exploiting the Full Capacity of Deep Neural Networks while Avoiding Overfitting by Targeted Sparsity Regularization
    2020-02-21 paper
    Targeted sparsity regularization to avoid overfitting; reported to work better than dropout and batch normalization;
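
A minimal usage sketch for dropout as described in item 1 of this list: activations are randomly zeroed during training and the full network is used at evaluation time (PyTorch rescales the surviving activations during training, so no test-time rescaling is needed). Layer sizes are illustrative:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Dropout(p=0.5),            # each hidden unit is dropped with probability 0.5
    nn.Linear(512, 10),
)

x = torch.randn(4, 784)
model.train()   # dropout active, surviving activations rescaled by 1/(1-p)
y_train = model(x)
model.eval()    # dropout disabled for evaluation
y_eval = model(x)
```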

3.6 Underfitting

3.7 Model validation

Validates model relevance and checks for overfitting and underfitting;

  1. Perturbed Model Validation: A New Framework to Validate Model Relevance
    2019-05-24 paper
    Perturbed model validation;

4 Advanced

4.1 Optimizers

4.1.1 Surveys

  1. An overview of gradient descent optimization algorithms
    2016-09-15 paper | blog
    The paper version of the blog; mainly covers batch gradient descent, stochastic gradient descent, and mini-batch gradient descent, and gives a concise introduction to Momentum, Nesterov accelerated gradient, Adagrad, Adadelta, RMSprop, Adam, AdaMax, Nadam, etc.;

  2. A Survey of Optimization Methods from a Machine Learning Perspective
    2019-06-17 paper

4.1.2 Classics

  1. Two problems with backpropagation and other steepest-descent learning procedures for networks
    1986 paper
    Points out that steepest-descent procedures such as SGD can search very inefficiently;

  2. Improving the convergence of back-propagation learning with second-order methods
    1988 paper

  3. Acceleration of stochastic approximation by averaging
    1992-07 paper
    Averaged stochastic gradient descent;

  4. Analysis of Natural Gradient Descent for Multilayer Neural Networks
    1999-01-21 paper

  5. On the momentum term in gradient descent learning algorithms
    1999 paper
    Applies a momentum term to SGD;

  6. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
    2011-07 paper
    Adagrad: keeps a separate learning rate for each parameter, but the effective learning rate decays quickly;

  7. ADADELTA: An Adaptive Learning Rate Method
    2012-12-22 paper
    Adadelta;

  8. Adam: A Method for Stochastic Optimization
    ICLR 2015 2014-12-22 paper
    Adam: improves on RMSProp; gives each parameter its own adaptive learning rate, and besides the exponentially weighted average of past squared gradients kept by RMSprop, Adam also keeps an exponentially weighted average of past gradients, similar to momentum (see the update sketch after this list);

  9. Kalman-Based Stochastic Gradient Method with Stop Condition and Insensitivity to Conditioning
    2015-12-03 paper
    kSGD: Kalman-based stochastic gradient descent; insensitive to hyperparameters, but the computational cost is too high;

  10. Small steps and giant leaps: Minimal Newton solvers for Deep Learning
    2018-05-21 paper | openreview
    A fast second-order method intended as a drop-in replacement for the deep learning solvers currently in use;
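
A worked sketch of the Adam update from item 8 above, written as a single NumPy step: exponential moving averages of the gradient and of its square, bias correction, then a parameter update. The toy objective is illustrative:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba, item 8 above)."""
    m = beta1 * m + (1 - beta1) * grad          # EMA of the gradient (momentum-like)
    v = beta2 * v + (1 - beta2) * grad ** 2     # EMA of the squared gradient
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = theta^2
theta, m, v = np.array(5.0), 0.0, 0.0
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(round(float(theta), 4))  # approaches 0
```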

4.1.3 Others

  1. No More Pesky Learning Rates
    2012-06-06 paper
    Adaptively adjusts the learning rate;

  2. YellowFin and the Art of Momentum Tuning
    2017-06-12 paper
    SGD converges slowly but reaches relatively high accuracy;

  3. Aggregated Momentum: Stability Through Passive Damping
    2018-04-01 paper

  4. Adaptive Gradient Methods With Dynamic Bound Of Learning Rate
    ICLR 2019 2019-02-26 paper | pytorch-official | openreview

4.1.4 SGD

  1. Deep learning with Elastic Averaging SGD
    NIPS 2015 2014-12-20 paper

  2. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
    2017-06-08
    Shows that SGD becomes unstable with very large batches;

  3. The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent
    2019-04-15 paper
    $\bullet \bullet$ - Gradient Confusion
    Studies how SGD affects model convergence, and explores the corresponding network structures;

  4. Time-Smoothed Gradients for Online Forecasting
    ICML 2019 2019-05-21 paper
    PTS-SGD: insensitive to the learning rate and fast to compute;

  5. Fine-grained Optimization of Deep Neural Networks
    2019-05-22 paper
    FG-SGD: fine-grained optimization that ensures convergence to a minimum;

  6. Momentum-Based Variance Reduction in Non-Convex SGD
    2019-05-24 paper

4.2 Normalization

  1. Self-Normalizing Neural Networks
    2017-06-08 paper | tensorflow | zhihu | reddit
    SELU activation with self-normalizing properties; a minimal block sketch follows this list;

  2. Fixup Initialization: Residual Learning Without Normalization
    ICLR 2019 2019-01-27 paper | openreview
    Discusses the theory behind normalization and its effectiveness;

  3. ROI Regularization for Semi-supervised and Supervised Learning
    2019-05-15 paper
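
A minimal self-normalizing block in the spirit of item 1 of this subsection: SELU activation, AlphaDropout, and LeCun-normal initialization. Sizes and names are illustrative:

```python
import torch
import torch.nn as nn

def snn_layer(in_features, out_features, dropout=0.05):
    linear = nn.Linear(in_features, out_features)
    # LeCun-normal init: std = 1 / sqrt(fan_in), as recommended for SELU
    nn.init.normal_(linear.weight, std=in_features ** -0.5)
    nn.init.zeros_(linear.bias)
    # AlphaDropout preserves the self-normalizing property, unlike plain Dropout
    return nn.Sequential(linear, nn.SELU(), nn.AlphaDropout(dropout))

model = nn.Sequential(snn_layer(256, 128), snn_layer(128, 128), nn.Linear(128, 10))
x = torch.randn(32, 256)
print(model(x).shape)  # torch.Size([32, 10])
```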

4.3 Model interpretability

  1. Explainable Machine Learning for Scientific Insights and Discoveries
    2019-05-21 paper

  2. On the Learning Dynamics of Two-layer Nonlinear Convolutional Neural Networks
    2019-05-24 paper

4.4 Distributed training

  1. A Quick Survey on Large Scale Distributed Deep Learning Systems
    2018
    A summary of issues in distributed deep learning;

End

Appendix

A References

  1. Machine Learning for Beginners: An Introduction to Neural Networks
  2. A Beginner’s Guide to Neural Networks and Deep Learning