Applied Deep Learning (YouTube Playlist)

Course Objectives & Prerequisites:

This is a two-semester course designed primarily for graduate students. However, undergraduate students with demonstrably strong backgrounds in probability, statistics (e.g., linear and logistic regression), numerical linear algebra, and optimization are also welcome to register. The objective is to familiarize students with state-of-the-art deep learning techniques employed in industry. Deep learning is a field that witnesses a mini-revolution every few months, so it is very important that students registering for this course are eager to learn new concepts. Much of deep learning is software engineering; consequently, students should be able to write clean code in their assignments. Python is the programming language used in this course. Familiarity with TensorFlow and PyTorch is a plus but not a requirement. However, students must be willing to put in the hard work to learn and use these two frameworks as the course progresses.

Part I Topics (Fall Semester)

Part II Topics (Spring Semester)

References

Training Deep Neural Networks

  • An overview of gradient descent optimization algorithms

Computer Vision; Image Classification; Large Networks

  • Multi-column Deep Neural Networks for Image Classification
  • ImageNet Classification with Deep Convolutional Neural Networks (code)
  • Dropout: A Simple Way to Prevent Neural Networks from Overfitting (code)
  • Network In Network
  • Very Deep Convolutional Networks for Large-Scale Image Recognition (code)
  • Going Deeper with Convolutions
  • Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
  • Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
  • Rethinking the Inception Architecture for Computer Vision
  • Training Very Deep Networks
  • Deep Residual Learning for Image Recognition (code)
  • Identity Mappings in Deep Residual Networks (code)
  • Wide Residual Networks (code)
  • Aggregated Residual Transformations for Deep Neural Networks (code)
  • Densely Connected Convolutional Networks (code)
  • Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
  • mixup: Beyond Empirical Risk Minimization (code)
  • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (code)
  • Residual Attention Network for Image Classification
  • Squeeze-and-Excitation Networks (code)
  • CBAM: Convolutional Block Attention Module (code)
  • Random Erasing Data Augmentation (code)
  • Spatial Transformer Networks
  • Dynamic Routing Between Capsules
  • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (code)
  • MLP-Mixer: An all-MLP Architecture for Vision (code)
  • High-Performance Large-Scale Image Recognition Without Normalization (code)

Computer Vision; Image Classification; Small Networks

  • Distilling the Knowledge in a Neural Network
  • Learning both Weights and Connections for Efficient Neural Networks
  • Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (code)
  • SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (code)
  • XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks (code)
  • MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (code)
  • Xception: Deep Learning with Depthwise Separable Convolutions (code)
  • MobileNetV2: Inverted Residuals and Linear Bottlenecks (code)
  • ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (code)
  • ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

Computer Vision; Image Classification; AutoML

  • Neural Architecture Search With Reinforcement Learning (code)
  • Learning Transferable Architectures for Scalable Image Recognition
  • Regularized Evolution for Image Classifier Architecture Search (code)
  • Evolving Deep Neural Networks
  • Efficient Neural Architecture Search via Parameter Sharing (code)
  • DARTS: Differentiable Architecture Search (code)
  • EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (code)
  • MnasNet: Platform-Aware Neural Architecture Search for Mobile (code)
  • Searching for MobileNetV3
  • AutoAugment: Learning Augmentation Strategies from Data
  • RandAugment: Practical Automated Data Augmentation with a Reduced Search Space

Computer Vision; Image Classification; Robustness

  • Intriguing properties of neural networks
  • Explaining and harnessing adversarial examples
  • Adversarial Examples in the Physical World
  • The Limitations of Deep Learning in Adversarial Settings
  • Practical Black-Box Attacks against Machine Learning
  • Towards Evaluating the Robustness of Neural Networks (code)
  • Towards Deep Learning Models Resistant to Adversarial Attacks (code)
  • Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples (code)
  • One Pixel Attack for Fooling Deep Neural Networks

Computer Vision; Image Classification; Visualizing & Understanding

  • Visualizing and Understanding Convolutional Networks
  • Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
  • Striving for Simplicity: The All Convolutional Net
  • “Why Should I Trust You?”: Explaining the Predictions of Any Classifier (code)
  • Learning Deep Features for Discriminative Localization (code)
  • Understanding Deep Learning Requires Rethinking Generalization
  • Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (code)
  • A Unified Approach to Interpreting Model Predictions (code)
  • On Calibration of Modern Neural Networks (code)

Computer Vision; Image Classification; Transfer Learning

  • How transferable are features in deep neural networks? (code)
  • DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition (code)
  • CNN Features off-the-shelf: an Astounding Baseline for Recognition
  • Return of the Devil in the Details: Delving Deep into Convolutional Nets (code)
  • Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks (code)

Computer Vision; Image Classification; Domain Adaptation

  • Learning Transferable Features with Deep Adaptation Networks (code)
  • Domain-Adversarial Training of Neural Networks (code)
  • Adversarial Discriminative Domain Adaptation
  • CyCADA: Cycle-Consistent Adversarial Domain Adaptation (code)

Computer Vision; Image Classification; Few-shot Learning

  • Matching Networks for One Shot Learning
  • Prototypical Networks for Few-shot Learning (code)
  • Learning to Compare: Relation Network for Few-Shot Learning

Computer Vision; Image Classification; Federated Learning

  • Communication-Efficient Learning of Deep Networks from Decentralized Data

Computer Vision; Image Classification; Data-efficient Learning

  • Self-training with Noisy Student improves ImageNet classification (code)
  • Deep Clustering for Unsupervised Learning of Visual Features (code)
  • A Simple Framework for Contrastive Learning of Visual Representations (code)
  • Momentum Contrast for Unsupervised Visual Representation Learning (code)
  • Contrastive Multiview Coding (code)
  • Data-Efficient Image Recognition with Contrastive Predictive Coding
  • Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (code)

Computer Vision; Image Transformation; Semantic Segmentation

  • Fully Convolutional Networks for Semantic Segmentation (code)
  • Learning Deconvolution Network for Semantic Segmentation (code)
  • U-Net: Convolutional Networks for Biomedical Image Segmentation (code)
  • DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs (code)
  • Conditional Random Fields as Recurrent Neural Networks (code)
  • Multi-scale Context Aggregation by Dilated Convolutions (code)
  • SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
  • Pyramid Scene Parsing Network (code)
  • Rethinking Atrous Convolution for Semantic Image Segmentation
  • What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
  • RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation (code)
  • Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (code)
  • Dual Attention Network for Scene Segmentation (code)

Computer Vision; Image Transformation; Super-Resolution, Denoising, and Colorization

  • Learning a Deep Convolutional Network for Image Super-Resolution (code)
  • Perceptual Losses for Real-Time Style Transfer and Super-Resolution
  • Image Style Transfer Using Convolutional Neural Networks (code)
  • Accurate Image Super-Resolution Using Very Deep Convolutional Networks (code)
  • Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
  • Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising (code)
  • Enhanced Deep Residual Networks for Single Image Super-Resolution (code)
  • The Unreasonable Effectiveness of Deep Features as a Perceptual Metric (code)

Computer Vision; Pose Estimation

  • Convolutional Pose Machines (code)
  • Stacked Hourglass Networks for Human Pose Estimation (code)
  • Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields (code)
  • Deep High-Resolution Representation Learning for Human Pose Estimation (code)

Computer Vision; Image Transformation; Optical Flow and Depth Estimation

  • Unsupervised Monocular Depth Estimation with Left-Right Consistency (code)
  • FlowNet: Learning Optical Flow with Convolutional Networks
  • FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks (code)

Computer Vision; Object Detection; Two Stage Detectors

  • A Survey on Performance Metrics for Object-Detection Algorithms (code)
  • Rich feature hierarchies for accurate object detection and semantic segmentation (code)
  • Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
  • Fast R-CNN (code)
  • Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (code)
  • R-FCN: Object Detection via Region-based Fully Convolutional Networks (code)
  • Feature Pyramid Networks for Object Detection
  • Deformable Convolutional Networks (code)
  • Mask R-CNN (code)
  • Cascade R-CNN: Delving into High Quality Object Detection (code)

Computer Vision; Object Detection; One Stage Detectors

  • OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks (code)
  • You Only Look Once: Unified, Real-Time Object Detection (code)
  • SSD: Single Shot MultiBox Detector (code)
  • YOLO9000: Better, Faster, Stronger (code)
  • Focal Loss for Dense Object Detection (code)
  • Speed/Accuracy Trade-Offs For Modern Convolutional Object Detectors
  • YOLOv3: An Incremental Improvement (code)
  • CornerNet: Detecting Objects as Paired Keypoints (code)
  • FCOS: Fully Convolutional One-Stage Object Detection (code)
  • Objects as Points (code)
  • EfficientDet: Scalable and Efficient Object Detection (code)
  • YOLOv4: Optimal Speed and Accuracy of Object Detection (code)
  • End-to-End Object Detection with Transformers (code)

Computer Vision; Face Recognition and Detection

  • DeepFace: Closing the Gap to Human-Level Performance in Face Verification
  • FaceNet: A Unified Embedding for Face Recognition and Clustering
  • Deep Face Recognition
  • Deep Learning Face Attributes in the Wild
  • Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks (code)
  • A Discriminative Feature Learning Approach for Deep Face Recognition
  • ArcFace: Additive Angular Margin Loss for Deep Face Recognition (code)

Computer Vision; Video

  • 3D Convolutional Neural Networks for Human Action Recognition
  • Large-scale Video Classification with Convolutional Neural Networks (code)
  • Two-Stream Convolutional Networks for Action Recognition in Videos
  • Learning Spatiotemporal Features with 3D Convolutional Networks (code)
  • Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors (code)
  • Temporal Segment Networks: Towards Good Practices for Deep Action Recognition (code)
  • Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (code)
  • Non-local Neural Networks (code)
  • Group Normalization (code)
  • Fully-Convolutional Siamese Networks for Object Tracking (code)
  • Robust Consistent Video Depth Estimation (code)

Computer Vision; 3D

  • V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation (code)
  • PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (code)
  • PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space (code)
  • Dynamic Graph CNN for Learning on Point Clouds (code)
  • VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

Natural Language Processing; Word Representations

  • Linguistic Regularities in Continuous Space Word Representations
  • Distributed Representations of Words and Phrases and their Compositionality
  • Efficient Estimation of Word Representations in Vector Space (code)
  • GloVe: Global Vectors for Word Representation (code)
  • Enriching Word Vectors with Subword Information (code)

Natural Language Processing; Text Classification

  • Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank (code)
  • Convolutional Neural Networks for Sentence Classification (code)
  • Distributed Representations of Sentences and Documents
  • Effective Use of Word Order for Text Categorization with Convolutional Neural Networks (code)
  • A Convolutional Neural Network for Modelling Sentences
  • A Sensitivity Analysis Of (And Practitioners' Guide To) Convolutional Neural Networks For Sentence Classification
  • Character-level Convolutional Networks for Text Classification (code)
  • Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks (code)
  • Bag Of Tricks For Efficient Text Classification (code)
  • Hierarchical Attention Networks for Document Classification
  • Neural Architectures For Named Entity Recognition (code) (code)
  • Universal Language Model Fine-tuning for Text Classification (code)

Natural Language Processing; Neural Machine Translation

  • Neural Machine Translation by Jointly Learning to Align and Translate
  • Sequence to Sequence Learning with Neural Networks
  • Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
  • On the Properties of Neural Machine Translation: Encoder–Decoder Approaches
  • Effective Approaches to Attention-based Neural Machine Translation (code)
  • Neural Machine Translation Of Rare Words With Subword Units (code)
  • Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
  • Convolutional Sequence to Sequence Learning (code)
  • Attention Is All You Need (code)
  • Reformer: The Efficient Transformer (code)
  • Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (code)

Natural Language Processing; Language Modeling

  • Deep contextualized word representations (code)
  • An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling (code)
  • Improving Language Understanding by Generative Pre-Training (code)
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (code)
  • Language Models are Unsupervised Multitask Learners (code)
  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (code)
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach (code)
  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (code)
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding (code)
  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (code)
  • Cross-lingual Language Model Pretraining (code)
  • SpanBERT: Improving Pre-training by Representing and Predicting Spans (code)
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
  • Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (code)
  • Language Models are Few-Shot Learners (code)
  • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (code)
  • Pay Attention to MLPs

Multimodal Learning

  • Long-term Recurrent Convolutional Networks for Visual Recognition and Description
  • Show and Tell: A Neural Image Caption Generator
  • Deep Visual-Semantic Alignments for Generating Image Descriptions
  • Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (code)
  • Layer Normalization
  • Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering (code)
  • Generative Adversarial Text to Image Synthesis (code)
  • StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks (code)
  • Zero-Shot Text-to-Image Generation (code)

Generative Networks; Variational Auto-Encoders

  • Auto-Encoding Variational Bayes
  • Stochastic Backpropagation and Approximate Inference in Deep Generative Models
  • Categorical Reparameterization with Gumbel-Softmax

Generative Networks; Unconditional GANs

  • Generative Adversarial Nets (code)
  • Unsupervised representation learning with deep convolutional generative adversarial networks (code)
  • Improved Techniques for Training GANs (code)
  • InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets (code)
  • Least Squares Generative Adversarial Networks (code)
  • Wasserstein GAN (code)
  • Improved Training of Wasserstein GANs (code)
  • Progressive growing of GANs for improved quality, stability, and variation (code)
  • GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (code)
  • Spectral Normalization for Generative Adversarial Networks (code)
  • Large Scale GAN Training for High Fidelity Natural Image Synthesis (code)
  • A Style-Based Generator Architecture for Generative Adversarial Networks (code)
  • Self-Attention Generative Adversarial Networks (code)
  • Analyzing and Improving the Image Quality of StyleGAN (code)

Generative Networks; Conditional GANs

  • Conditional Generative Adversarial Nets
  • Context Encoders: Feature Learning by Inpainting (code)
  • Conditional Image Synthesis with Auxiliary Classifier GANs
  • Image-to-Image Translation with Conditional Adversarial Networks (code)
  • Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (code)
  • Unsupervised Image-to-Image Translation Networks (code)
  • Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
  • High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs (code)
  • StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation (code)

Speech & Music; Recognition

  • Mel-Spectrogram and Mel-Frequency Cepstral Coefficients (MFCCs)
  • Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks
  • Speech Recognition with Deep Recurrent Neural Networks
  • Towards End-to-End Speech Recognition with Recurrent Neural Networks
  • Deep Speech: Scaling up end-to-end speech recognition
  • LSTM: A Search Space Odyssey
  • Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
  • X-vectors: Robust DNN Embeddings for Speaker Recognition (code)
  • SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
  • Jasper: An End-to-End Convolutional Neural Acoustic Model (code)

Speech & Music; Synthesis

  • Generating Sequences With Recurrent Neural Networks
  • Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling (code)
  • WaveNet: A Generative Model for Raw Audio

Speech & Music; Modeling

  • Representation Learning with Contrastive Predictive Coding
  • wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (code)

Reinforcement Learning; Games

  • Playing Atari with Deep Reinforcement Learning
  • Human-level Control through Deep Reinforcement Learning
  • Deep Reinforcement Learning with Double Q-Learning
  • Prioritized Experience Replay
  • Mastering the game of Go with deep neural networks and tree search
  • Mastering the game of Go without human knowledge
  • A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
  • Grandmaster level in StarCraft II using multi-agent reinforcement learning (code)

Reinforcement Learning; Simulated Environments

  • Continuous Control with Deep Reinforcement Learning
  • Trust Region Policy Optimization (code)
  • Conjugate Gradient Method
  • Asynchronous Methods for Deep Reinforcement Learning
  • Proximal Policy Optimization Algorithms
  • Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (code)

Reinforcement Learning; Real Environments

  • End to End Learning for Self-Driving Cars
  • End-To-End Training Of Deep Visuomotor Policies
  • Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection
  • Learning Dexterous In-Hand Manipulation

Reinforcement Learning; Uncertainty Quantification & Multitask Learning

  • Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (code)
  • Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (code) (code)
  • Overcoming catastrophic forgetting in neural networks

Graph Neural Networks

  • Translating Embeddings for Modeling Multi-relational Data (code)
  • DeepWalk: Online Learning of Social Representations (code)
  • LINE: Large-scale Information Network Embedding (code)
  • node2vec: Scalable Feature Learning for Networks (code)
  • Semi-Supervised Classification with Graph Convolutional Networks (code)
  • Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (code)
  • Inductive Representation Learning on Large Graphs (code)
  • Graph Attention Networks (code)
  • How Powerful Are Graph Neural Networks? (code)
  • Modeling Relational Data with Graph Convolutional Networks (code)

Recommender Systems

  • Session-based Recommendations with Recurrent Neural Networks (code)
  • AutoRec: Autoencoders Meet Collaborative Filtering
  • Wide & Deep Learning for Recommender Systems
  • Neural Collaborative Filtering (code)
  • Neural Factorization Machines for Sparse Predictive Analytics (code)
  • DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
  • Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks (code)
  • Variational Autoencoders for Collaborative Filtering (code)
  • Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding (code)
  • Deep Learning Recommendation Model for Personalization and Recommendation Systems (code)

Computational Biology

  • Improved Protein Structure Prediction using Potentials from Deep Learning (code)
  • Highly Accurate Protein Structure Prediction with AlphaFold (code)