- Beyond saliency: understanding convolutional neural networks from saliency prediction on layer-wise relevance propagation - [Arxiv] [QA]
- Objects that Sound - [Arxiv] [QA]
- Visual Explanation by Interpretation: Improving Visual Feedback Capabilities of Deep Neural Networks - [Arxiv] [QA]
- Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions - [Arxiv] [QA]
- A Probability Monad as the Colimit of Spaces of Finite Samples - [Arxiv] [QA]
- Geometry-Aware Learning of Maps for Camera Localization - [Arxiv] [QA]
- Self-supervised Learning of Motion Capture - [Arxiv] [QA]
- Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) - [Arxiv] [QA]
- Deep Image Prior - [Arxiv] [QA]
- Backprop as Functor: A compositional perspective on supervised learning - [Arxiv] [QA]
- Population Based Training of Neural Networks - [Arxiv] [QA]
- Distilling a Neural Network Into a Soft Decision Tree - [Arxiv] [QA]
- Cross-Domain Self-supervised Multi-task Feature Learning using Synthetic Imagery - [Arxiv] [QA]
- Contextual-based Image Inpainting: Infer, Match, and Translate - [Arxiv] [QA]
- Improvements to context based self-supervised learning - [Arxiv] [QA]
- Dual-Path Convolutional Image-Text Embeddings with Instance Loss - [Arxiv] [QA]
- Denotational validation of higher-order Bayesian inference - [Arxiv] [QA]
- Attentional Pooling for Action Recognition - [Arxiv] [QA]
- Hierarchical Representations for Efficient Architecture Search - [Arxiv] [QA]
- Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks - [Arxiv] [QA]
- Generalized End-to-End Loss for Speaker Verification - [Arxiv] [QA]
- Dynamic Routing Between Capsules - [Arxiv] [QA]
- Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention - [Arxiv] [QA]
- Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation - [Arxiv] [QA]
- Searching for Activation Functions - [Arxiv] [QA]
- Generalization in Deep Learning - [Arxiv] [QA]
- A systematic study of the class imbalance problem in convolutional neural networks - [Arxiv] [QA]
- Recent Advances in Zero-shot Recognition - [Arxiv] [QA]
- Deep Learning for Case-Based Reasoning through Prototypes: A Neural Network that Explains Its Predictions - [Arxiv] [QA]
- Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces - [Arxiv] [QA]
- Information structures and their cohomology - [Arxiv] [QA]
- N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning - [Arxiv] [QA]
- AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline - [Arxiv] [QA]
- ShapeCodes: Self-Supervised Feature Learning by Lifting Views to Viewgrids - [Arxiv] [QA]
- Disintegration and Bayesian Inversion via String Diagrams - [Arxiv] [QA]
- Multi-task Self-Supervised Visual Learning - [Arxiv] [QA]
- Twin Networks: Matching the Future for Sequence Generation - [Arxiv] [QA]
- Representation Learning by Learning to Count - [Arxiv] [QA]
- SMASH: One-Shot Model Architecture Search through HyperNetworks - [Arxiv] [QA]
- Transitive Invariance for Self-supervised Visual Representation Learning - [Arxiv] [QA]
- PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN - [Arxiv] [QA]
- Localizing Moments in Video with Natural Language - [Arxiv] [QA]
- CASSL: Curriculum Accelerated Self-Supervised Learning - [Arxiv] [QA]
- Unsupervised Representation Learning by Sorting Sequences - [Arxiv] [QA]
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering - [Arxiv] [QA]
- Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback - [Arxiv] [QA]
- Zero-Shot Learning -- A Comprehensive Evaluation of the Good, the Bad and the Ugly - [Arxiv] [QA]
- A Channel-Based Perspective on Conjugate Priors - [Arxiv] [QA]
- Bolt: Accelerated Data Mining with Fast Vector Compression - [Arxiv] [QA]
- Methods for Interpreting and Understanding Deep Neural Networks - [Arxiv] [QA]
- SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability - [Arxiv] [QA]
- SmoothGrad: removing noise by adding noise - [Arxiv] [QA]
- Attention Is All You Need - [Arxiv] [QA]
- Deep reinforcement learning from human preferences - [Arxiv] [QA]
- Self-supervised learning of visual features through embedding images into text topic spaces - [Arxiv] [QA]
- Look, Listen and Learn - [Arxiv] [QA]
- Learning how to explain neural networks: PatternNet and PatternAttribution - [Arxiv] [QA]
- Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems - [Arxiv] [QA]
- TALL: Temporal Activity Localization via Language Query - [Arxiv] [QA]
- Dense-Captioning Events in Videos - [Arxiv] [QA]
- DeepArchitect: Automatically Designing and Training Deep Architectures - [Arxiv] [QA]
- Unsupervised Learning of Depth and Ego-Motion from Video - [Arxiv] [QA]
- Detecting and Recognizing Human-Object Interactions - [Arxiv] [QA]
- Bandit Structured Prediction for Neural Sequence-to-Sequence Learning - [Arxiv] [QA]
- Equivalence Between Policy Gradients and Soft Q-Learning - [Arxiv] [QA]
- Learning to Fly by Crashing - [Arxiv] [QA]
- Unsupervised Learning by Predicting Noise - [Arxiv] [QA]
- TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering - [Arxiv] [QA]
- DeepPermNet: Visual Permutation Learning - [Arxiv] [QA]
- Towards Building Large Scale Multimodal Domain-Aware Conversation Systems - [Arxiv] [QA]
- Tacotron: Towards End-to-End Speech Synthesis - [Arxiv] [QA]
- Towards Automatic Learning of Procedures from Web Instructional Videos - [Arxiv] [QA]
- Where to put the Image in an Image Caption Generator - [Arxiv] [QA]
- Mask R-CNN - [Arxiv] [QA]
- Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation - [Arxiv] [QA]
- All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation - [Arxiv] [QA]
- Towards A Rigorous Science of Interpretable Machine Learning - [Arxiv] [QA]
- Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations - [Arxiv] [QA]
- Visualizing Deep Neural Network Decisions: Prediction Difference Analysis - [Arxiv] [QA]
- Face Aging With Conditional Generative Adversarial Networks - [Arxiv] [QA]