「DL」 A Resource Compilation of Fundamental Deep Learning Skills

 

1 Surveys

  1. Deep Learning in Neural Networks: An Overview
    2014-04-30 paper
    $\bullet \bullet$ overview

  2. Deep Learning
    2015 Hinton PPT
    $\bullet \bullet$ - DL

  3. Review of Deep Learning
    2018-04-05 paper
    $\bullet \bullet$ - review

  4. Democratisation of Usable Machine Learning in Computer Vision
    2019-02-18 paper
    $\bullet \bullet$ usable

  5. A Selective Overview of Deep Learning
    2019-04-10 paper
    $\bullet \bullet$ selective
    The article addresses what new characteristics deep learning has compared with classical methods and what its theoretical foundations are;
    from a statistical viewpoint it introduces common neural network models (e.g. convolutional, recurrent, and generative adversarial networks) and training techniques (e.g. stochastic gradient descent, dropout, batch normalization), and discusses depth and parameterization;

  6. A Survey on Distributed Machine Learning
    2019-12-20 paper
    $\bullet \bullet$ Distributed
    Distributed machine learning;

  7. The Deep Learning Compiler: A Comprehensive Survey
    2020-02-06 paper

2 Fundamentals

2.1 Convolution

2.1.1 Standard convolution

  1. A guide to convolution arithmetic for deep learning
    2016-03-23 paper
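
The arithmetic covered in this guide boils down to one relation for the spatial output size, $o = \lfloor (i + 2p - d(k-1) - 1) / s \rfloor + 1$, for input size $i$, kernel size $k$, stride $s$, padding $p$, and dilation $d$. A minimal helper for sanity-checking layer shapes (the function name is illustrative):

```python
import math

def conv_output_size(i, k, s=1, p=0, d=1):
    """Spatial output size of a standard convolution.

    i: input size, k: kernel size, s: stride, p: padding, d: dilation.
    Implements o = floor((i + 2p - d*(k-1) - 1) / s) + 1.
    """
    return math.floor((i + 2 * p - d * (k - 1) - 1) / s) + 1

# Example: a 3x3 convolution with stride 2 and padding 1 on a 224x224 input
print(conv_output_size(224, k=3, s=2, p=1))  # 112
```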

2.1.2 Deconvolution (transposed convolution)

2.1.3 Depthwise separable convolution

  1. Network Decoupling: From Regular to Depthwise Separable Convolutions
    2018-08-16 paper
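
For reference alongside this subsection, a minimal depthwise separable convolution block in PyTorch: a per-channel 3x3 depthwise convolution followed by a 1x1 pointwise convolution. Class and variable names are illustrative, not taken from the paper above:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) 3x3 conv
    followed by a 1x1 pointwise conv that mixes channels."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # groups=in_ch makes each depthwise filter see exactly one input channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 32, 56, 56)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 56, 56])
```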

2.1.4 Dimension-wise convolution

  1. DiCENet: Dimension-wise Convolutions for Efficient Networks
    2019-06-08 paper | pytorch

2.1.5 Deformable convolution

  1. Deformable Convolutional Networks
    ICCV 2017 oral 2017-03-17
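
A minimal sketch of deformable convolution using torchvision's `DeformConv2d`: a plain convolution predicts per-location sampling offsets (two values per kernel element), which the deformable convolution then uses; as in the paper, the offset predictor is initialized to zero so that training starts from a regular convolution. Shapes and layer names are illustrative:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

in_ch, out_ch, k = 32, 64, 3
# Offset branch: 2 * k * k channels = (dx, dy) for each kernel element
offset_conv = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=1)
nn.init.zeros_(offset_conv.weight)   # zero-initialized offsets, as in the paper
nn.init.zeros_(offset_conv.bias)
deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=1)

x = torch.randn(1, in_ch, 28, 28)
offsets = offset_conv(x)             # shape (1, 18, 28, 28)
y = deform_conv(x, offsets)          # shape (1, 64, 28, 28)
print(y.shape)
```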

2.1.6 Others

  1. VarGNet: Variable Group Convolutional Neural Network for Efficient Embedded Computing
    2019-07-12 paper
    Improves on depthwise separable convolution to make it easier to optimize on embedded devices;

  2. Random Shifting for CNN: a Solution to Reduce Information Loss in Down-Sampling Layers
    IJCAI 2017 2017 paper

2.2 Activation functions

  1. Neurons Activation Visualization and Information Theoretic Analysis
    2019-05-14 paper

2.3 Neural networks

2.3.1 Neural networks

  1. Mean Field Limit of the Learning Dynamics of Multilayer Neural Networks
    2019-02-07 paper

  2. Loss Surface Modality of Feed-Forward Neural Network Architectures
    2019-05-24 paper

  3. What Can ResNet Learn Efficiently, Going Beyond Kernels?
    2019-05-24 paper

2.3.2 Backpropagation

  1. Learning Internal Representations by Error Propagation
    1985-09-05 Hinton paper

  2. Learning representations by back-propagating errors
    1986 Hinton paper

  3. Memorized Sparse Backpropagation
    2019-05-24 paper

  4. Fully Decoupled Neural Network Learning Using Delayed Gradients
    2019-06-21 paper
    Delayed gradient descent: addresses the inherently sequential execution of backpropagation;
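
To accompany these backpropagation references, a minimal NumPy sketch of error propagation through a two-layer sigmoid network with a squared-error loss; the toy data and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                           # toy inputs
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)   # toy targets

W1, b1 = rng.normal(scale=0.1, size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(200):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    loss = np.mean((out - y) ** 2)

    # backward pass: propagate the error signal layer by layer
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = d_out @ W2.T * h * (1 - h)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # gradient descent update
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(round(float(loss), 4))
```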

3 Practice

3.1 Initialization

  1. How to start training: The effect of initialization and architecture
    NIPS 2018 2018-03-05 paper
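
A minimal sketch of the initialization schemes most commonly discussed in this literature (He/Kaiming for ReLU layers, Xavier/Glorot for linear layers); the module structure is illustrative:

```python
import torch.nn as nn

def init_weights(m):
    if isinstance(m, nn.Conv2d):
        # He (Kaiming) initialization, suited to ReLU networks
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)
    elif isinstance(m, nn.Linear):
        # Xavier (Glorot) initialization, suited to tanh/sigmoid layers
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(16 * 32 * 32, 10))
model.apply(init_weights)  # apply the scheme to every submodule
```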

3.2 Training

3.2.1 Classification

  1. A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification
    2015-10-13 paper
    $\bullet \bullet \bullet \bullet \bullet$

  2. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
    2017-06-08 paper
    $\bullet \bullet$ - large batch
    Shows that naive training with very large batches converges poorly, and that plain SGD with momentum needs adjusting in that regime; a linear learning-rate scaling rule with gradual warmup recovers accuracy (see the sketch after this list);

  3. Accelerated Training for Massive Classification via Dynamic Class Selection
    2018-01-05 paper
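
A minimal sketch of the linear learning-rate scaling rule with gradual warmup used in item 2 above; the defaults follow that paper's ImageNet setup (base LR 0.1 per 256 images, 5 warmup epochs), everything else is illustrative:

```python
def scaled_lr_with_warmup(step, steps_per_epoch, batch_size,
                          base_lr=0.1, base_batch=256, warmup_epochs=5):
    """Linear-scaling rule with gradual warmup.

    The target LR is base_lr * batch_size / base_batch; during the first
    warmup_epochs it ramps linearly from base_lr up to the target.
    """
    target_lr = base_lr * batch_size / base_batch
    warmup_steps = warmup_epochs * steps_per_epoch
    if step < warmup_steps:
        return base_lr + (target_lr - base_lr) * step / warmup_steps
    return target_lr

# Example: with batch 8192 the target LR is 0.1 * 8192 / 256 = 3.2,
# reached after 5 warmup epochs (here 5 * 157 steps).
print(scaled_lr_with_warmup(step=785, steps_per_epoch=157, batch_size=8192))  # 3.2
```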

3.2.2 Detection

  1. Bag of Freebies for Training Object Detection Neural Networks
    2019-02-11 paper
    $\bullet \bullet$ - freebies
    Object-detection training and tuning tricks from GluonCV;

  2. Training Object Detectors With Noisy Data
    2019-05-17 paper
    Uses a teacher network to cope with noisy training images;

3.2.3 Others

  1. Large Scale Distributed Deep Networks
    NIPS 2012 2012 paper
    Large-scale distributed training;

  2. Joint Training of Neural Network Ensembles
    2019-02-12 paper | pytorch
    Model ensembles and their joint training;

  3. Sequential training algorithm for neural networks
    2019-05-17 paper
    $\bullet \bullet$
    Trains the network layer by layer and then fuses the parts together; the result is not as good as training the whole network jointly, but this helps when training large networks with limited compute (a minimal sketch follows this list);

  4. 深度学习参数怎么调优,这12个trick告诉你 (Chinese blog post: 12 tricks for tuning deep-learning hyperparameters)
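
A minimal sketch of sequential, layer-wise training in the spirit of item 3 above: each block is trained with a temporary classifier head while earlier blocks stay frozen, and the trained blocks are then fused into one network. The losses, heads, and toy data are illustrative, not the paper's exact recipe:

```python
import torch
import torch.nn as nn

blocks = [nn.Sequential(nn.Linear(32, 64), nn.ReLU()),
          nn.Sequential(nn.Linear(64, 64), nn.ReLU())]
x = torch.randn(256, 32)
y = torch.randint(0, 10, (256,))

trained = []
for block in blocks:
    head = nn.Linear(block[0].out_features, 10)   # temporary classifier head
    opt = torch.optim.SGD(list(block.parameters()) + list(head.parameters()), lr=0.1)
    for _ in range(50):
        with torch.no_grad():                      # earlier blocks stay frozen
            feat = x
            for b in trained:
                feat = b(feat)
        loss = nn.functional.cross_entropy(head(block(feat)), y)
        opt.zero_grad(); loss.backward(); opt.step()
    trained.append(block)

# Fuse the separately trained blocks into a single network
model = nn.Sequential(*trained, nn.Linear(64, 10))
```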

3.3 Exploding gradients

  1. On the difficulty of training Recurrent Neural Networks
    2012-11-21 paper

  2. Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients?
    NIPS 2018 2018-01-11 paper

  3. Products of Many Large Random Matrices and Gradients in Deep Neural Networks
    2018-12-14 paper
    Measures gradient problems in ReLU networks;
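
The standard remedy discussed in this literature (e.g. item 1 above) is gradient clipping; a minimal PyTorch sketch, with an illustrative model and clipping threshold:

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 20, 16)
out, _ = model(x)
loss = out.pow(2).mean()

opt.zero_grad()
loss.backward()
# Rescale gradients if their global norm exceeds 1.0, then step
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```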

3.4 Vanishing gradients

  1. On the difficulty of training Recurrent Neural Networks
    2012-11-21 paper

  2. Which Neural Net Architectures Give Rise To Exploding and Vanishing Gradients?
    NIPS 2018 2018-01-11 paper

3.5 Overfitting

  1. Dropout: A Simple Way to Prevent Neural Networks from Overfitting
    2014 paper | [blog](https://zhuanlan.zhihu.com/p/38200980)
    A usage sketch follows at the end of this list;

  2. One-Step or Two-Step Optimization and the Overfitting Phenomenon: A Case Study on Time Series Classification
    2014-07-16 paper

  3. How much does your data exploration overfit? Controlling bias via information usage
    2015-11-16 paper

  4. Detecting Overfitting of Deep Generative Networks via Latent Recovery
    2019-01-09 paper

  5. Overfitting Mechanism and Avoidance in Deep Neural Networks
    2019-01-19 paper

  6. The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent
    2019-04-15 paper

  7. GradMask: Reduce Overfitting by Regularizing Saliency
    2019-04-16 paper

  8. Overfitting in Synthesis: Theory and Practice
    2019-05-17 paper
    $\bullet \bullet$

  9. The advantages of multiple classes for reducing overfitting from test set reuse
    2019-05-24 paper

  10. Exploiting the Full Capacity of Deep Neural Networks while Avoiding Overfitting by Targeted Sparsity Regularization
    2020-02-21 paper
    Targeted sparsity regularization to avoid overfitting; reported to work better than dropout and batch normalization;
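
A minimal usage sketch for dropout as described in item 1 of this list: activations are randomly zeroed during training and the full network is used at evaluation time (PyTorch rescales the surviving activations during training, so no test-time rescaling is needed). Layer sizes are illustrative:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Dropout(p=0.5),            # each hidden unit is dropped with probability 0.5
    nn.Linear(512, 10),
)

x = torch.randn(4, 784)
model.train()   # dropout active, surviving activations rescaled by 1/(1-p)
y_train = model(x)
model.eval()    # dropout disabled for evaluation
y_eval = model(x)
```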

3.6 Underfitting

3.7 Model validation

Validates model relevance and checks for overfitting and underfitting;

  1. Perturbed Model Validation: A New Framework to Validate Model Relevance
    2019-05-24 paper
    Perturbed model validation;

4 Advanced

4.1 Optimizers

4.1.1 Surveys

  1. An overview of gradient descent optimization algorithms
    2016-09-15 paper | blog
    The paper version of the blog; mainly covers batch gradient descent, stochastic gradient descent, and mini-batch gradient descent, and gives a concise introduction to Momentum, Nesterov accelerated gradient, Adagrad, Adadelta, RMSprop, Adam, AdaMax, Nadam, etc.;

  2. A Survey of Optimization Methods from a Machine Learning Perspective
    2019-06-17 paper

4.1.2 Classics

  1. Two problems with backpropagation and other steepest-descent learning procedures for networks
    1986 paper
    Points out that steepest-descent procedures such as SGD can search very inefficiently;

  2. Improving the convergence of back-propagation learning with second-order methods
    1988 paper

  3. Acceleration of stochastic approximation by averaging
    1992-07 paper
    Averaged stochastic gradient descent;

  4. Analysis of Natural Gradient Descent for Multilayer Neural Networks
    1999-01-21 paper

  5. On the momentum term in gradient descent learning algorithms
    1999 paper
    Applies a momentum term to SGD;

  6. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
    2011-07 paper
    Adagrad: keeps a separate learning rate for each parameter, but the effective learning rate decays quickly;

  7. ADADELTA: An Adaptive Learning Rate Method
    2012-12-22 paper
    Adadelta;

  8. Adam: A Method for Stochastic Optimization
    ICLR 2015 2014-12-22 paper
    Adam: improves on RMSProp; gives each parameter its own adaptive learning rate, and besides the exponentially weighted average of past squared gradients kept by RMSprop, Adam also keeps an exponentially weighted average of past gradients, similar to momentum (see the update sketch after this list);

  9. Kalman-Based Stochastic Gradient Method with Stop Condition and Insensitivity to Conditioning
    2015-12-03 paper
    kSGD: Kalman-based stochastic gradient descent; insensitive to hyperparameters, but the computational cost is too high;

  10. Small steps and giant leaps: Minimal Newton solvers for Deep Learning
    2018-05-21 paper | openreview
    A fast second-order method intended as a drop-in replacement for the deep learning solvers currently in use;
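
A worked sketch of the Adam update from item 8 above, written as a single NumPy step: exponential moving averages of the gradient and of its square, bias correction, then a parameter update. The toy objective is illustrative:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba, item 8 above)."""
    m = beta1 * m + (1 - beta1) * grad          # EMA of the gradient (momentum-like)
    v = beta2 * v + (1 - beta2) * grad ** 2     # EMA of the squared gradient
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = theta^2
theta, m, v = np.array(5.0), 0.0, 0.0
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(round(float(theta), 4))  # approaches 0
```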

4.1.3 Others

  1. No More Pesky Learning Rates
    2012-06-06 paper
    Adaptively adjusts the learning rate;

  2. YellowFin and the Art of Momentum Tuning
    2017-06-12 paper
    SGD converges slowly but reaches relatively high accuracy;

  3. Aggregated Momentum: Stability Through Passive Damping
    2018-04-01 paper

  4. Adaptive Gradient Methods With Dynamic Bound Of Learning Rate
    ICLR 2019 2019-02-26 paper | pytorch-official | openreview

4.1.4 SGD

  1. Deep learning with Elastic Averaging SGD
    NIPS 2015 2014-12-20 paper

  2. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
    2017-06-08
    Shows that SGD becomes unstable with very large batches;

  3. The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent
    2019-04-15 paper
    $\bullet \bullet$ - Gradient Confusion
    Studies how SGD affects model convergence, and explores the corresponding network structures;

  4. Time-Smoothed Gradients for Online Forecasting
    ICML 2019 2019-05-21 paper
    PTS-SGD: insensitive to the learning rate and fast to compute;

  5. Fine-grained Optimization of Deep Neural Networks
    2019-05-22 paper
    FG-SGD: fine-grained optimization that ensures convergence to a minimum;

  6. Momentum-Based Variance Reduction in Non-Convex SGD
    2019-05-24 paper

4.2 Normalization

  1. Self-Normalizing Neural Networks
    2017-06-08 paper | tensorflow | zhihu | reddit
    SELU activation with self-normalizing properties; a minimal block sketch follows this list;

  2. Fixup Initialization: Residual Learning Without Normalization
    ICLR 2019 2019-01-27 paper | openreview
    Discusses the theory behind normalization and its effectiveness;

  3. ROI Regularization for Semi-supervised and Supervised Learning
    2019-05-15 paper
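
A minimal self-normalizing block in the spirit of item 1 of this subsection: SELU activation, AlphaDropout, and LeCun-normal initialization. Sizes and names are illustrative:

```python
import torch
import torch.nn as nn

def snn_layer(in_features, out_features, dropout=0.05):
    linear = nn.Linear(in_features, out_features)
    # LeCun-normal init: std = 1 / sqrt(fan_in), as recommended for SELU
    nn.init.normal_(linear.weight, std=in_features ** -0.5)
    nn.init.zeros_(linear.bias)
    # AlphaDropout preserves the self-normalizing property, unlike plain Dropout
    return nn.Sequential(linear, nn.SELU(), nn.AlphaDropout(dropout))

model = nn.Sequential(snn_layer(256, 128), snn_layer(128, 128), nn.Linear(128, 10))
x = torch.randn(32, 256)
print(model(x).shape)  # torch.Size([32, 10])
```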

4.3 Model interpretability

  1. Explainable Machine Learning for Scientific Insights and Discoveries
    2019-05-21 paper

  2. On the Learning Dynamics of Two-layer Nonlinear Convolutional Neural Networks
    2019-05-24 paper

4.4 Distributed training

  1. A Quick Survey on Large Scale Distributed Deep Learning Systems
    2018
    A summary of issues in distributed deep learning;

End

Appendix

A References

  1. Machine Learning for Beginners: An Introduction to Neural Networks
  2. A Beginner’s Guide to Neural Networks and Deep Learning