- PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection - [Arxiv] [QA]
- RC-DARTS: Resource Constrained Differentiable Architecture Search - [Arxiv] [QA]
- NAS evaluation is frustratingly hard - [Arxiv] [QA]
- Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks - [Arxiv] [QA]
- Improving Knowledge-aware Dialogue Generation via Knowledge Base Question Answering - [Arxiv] [QA]
- Image Processing Using Multi-Code GAN Prior - [Arxiv] [QA]
- ClusterFit: Improving Generalization of Visual Representations - [Arxiv] [QA]
- Self-Supervised Visual Terrain Classification from Unsupervised Acoustic Feature Learning - [Arxiv] [QA]
- Infinite products and zero-one laws in categorical probability - [Arxiv] [QA]
- Generating Videos of Zero-Shot Compositions of Actions and Objects - [Arxiv] [QA]
- 15 Keypoints Is All You Need - [Arxiv] [QA]
- 12-in-1: Multi-Task Vision and Language Representation Learning - [Arxiv] [QA]
- Prioritized Unit Propagation with Periodic Resetting is (Almost) All You Need for Random SAT Solving - [Arxiv] [QA]
- Self-Supervised Learning of Pretext-Invariant Representations - [Arxiv] [QA]
- Lost-customers approximation of semi-open queueing networks with backordering -- An application to minimise the number of robots in robotic mobile fulfilment systems - [Arxiv] [QA]
- Just Go with the Flow: Self-Supervised Scene Flow Estimation - [Arxiv] [QA]
- ASR is all you need: cross-modal distillation for lip reading - [Arxiv] [QA]
- Single Headed Attention RNN: Stop Thinking With Your Head - [Arxiv] [QA]
- Binarized Neural Architecture Search - [Arxiv] [QA]
- Breaking the cycle -- Colleagues are all you need - [Arxiv] [QA]
- Region Normalization for Image Inpainting - [Arxiv] [QA]
- All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting - [Arxiv] [QA]
- Automatic Text-based Personality Recognition on Monologues and Multiparty Dialogues Using Attentive Networks and Contextual Embeddings - [Arxiv] [QA]
- Generating Persona Consistent Dialogues by Exploiting Natural Language Inference - [Arxiv] [QA]
- Momentum Contrast for Unsupervised Visual Representation Learning - [Arxiv] [QA]
- A Pre-training Based Personalized Dialogue Generation Model with Persona-sparse Data - [Arxiv] [QA]
- Effectiveness of self-supervised pre-training for speech recognition - [Arxiv] [QA]
- Contextualized Sparse Representations for Real-Time Open-Domain Question Answering - [Arxiv] [QA]
- Fast Transformer Decoding: One Write-Head is All You Need - [Arxiv] [QA]
- Attention Is All You Need for Chinese Word Segmentation - [Arxiv] [QA]
- Multi-Stage Document Ranking with BERT - [Arxiv] [QA]
- Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning - [Arxiv] [QA]
- Stabilizing DARTS with Amended Gradient Estimation on Architectural Parameters - [Arxiv] [QA]
- Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders - [Arxiv] [QA]
- Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram - [Arxiv] [QA]
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer - [Arxiv] [QA]
- Generative Pre-Training for Speech with Autoregressive Predictive Coding - [Arxiv] [QA]
- KnowIT VQA: Answering Knowledge-Based Questions about Videos - [Arxiv] [QA]
- Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis - [Arxiv] [QA]
- Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video - [Arxiv] [QA]
- Understanding Deep Networks via Extremal Perturbations and Smooth Masks - [Arxiv] [QA]
- ALOHA: Artificial Learning of Human Attributes for Dialogue Agents - [Arxiv] [QA]
- Reverse derivative categories - [Arxiv] [QA]
- Understanding the Limitations of Variational Mutual Information Estimators - [Arxiv] [QA]
- Self-supervised Label Augmentation via Input Transformations - [Arxiv] [QA]
- vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations - [Arxiv] [QA]
- A cost-effective method for improving and re-purposing large, pre-trained GANs by fine-tuning their class-embeddings - [Arxiv] [QA]
- Explaining image classifiers by removing input features using generative models - [Arxiv] [QA]
- Probability, valuations, hyperspace: Three monads on Top and the support as a morphism - [Arxiv] [QA]
- Bayesian open games - [Arxiv] [QA]
- MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis - [Arxiv] [QA]
- Continual Learning in Neural Networks - [Arxiv] [QA]
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models - [Arxiv] [QA]
- Is Fast Adaptation All You Need? - [Arxiv] [QA]
- Interpretations are useful: penalizing explanations to align neural networks with prior knowledge - [Arxiv] [QA]
- Visual Explanation for Deep Metric Learning - [Arxiv] [QA]
- Joint-task Self-supervised Learning for Temporal Correspondence - [Arxiv] [QA]
- UNITER: UNiversal Image-TExt Representation Learning - [Arxiv] [QA]
- High Fidelity Speech Synthesis with Adversarial Networks - [Arxiv] [QA]
- Improving Generative Visual Dialog by Answering Diverse Questions - [Arxiv] [QA]
- On Model Stability as a Function of Random Seed - [Arxiv] [QA]
- Understanding and Robustifying Differentiable Architecture Search - [Arxiv] [QA]
- Self-Training for End-to-End Speech Recognition - [Arxiv] [QA]
- Pose-aware Multi-level Feature Network for Human Object Interaction Detection - [Arxiv] [QA]
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism - [Arxiv] [QA]
- An Internal Learning Approach to Video Inpainting - [Arxiv] [QA]
- Learning to Deceive with Attention-Based Explanations - [Arxiv] [QA]
- Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset - [Arxiv] [QA]
- Specifying Object Attributes and Relations in Interactive Scene Generation - [Arxiv] [QA]
- CTRL: A Conditional Transformer Language Model for Controllable Generation - [Arxiv] [QA]
- ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons - [Arxiv] [QA]
- Image Inpainting with Learnable Bidirectional Attention Maps - [Arxiv] [QA]
- Identifying Personality Traits Using Overlap Dynamics in Multiparty Dialogue - [Arxiv] [QA]
- All You Need is Ratings: A Clustering Approach to Synthetic Rating Datasets Generation - [Arxiv] [QA]
- Copy-and-Paste Networks for Deep Video Inpainting - [Arxiv] [QA]
- Accelerating Large-Scale Inference with Anisotropic Vector Quantization - [Arxiv] [QA]
- Onion-Peel Networks for Deep Video Completion - [Arxiv] [QA]
- VL-BERT: Pre-training of Generic Visual-Linguistic Representations - [Arxiv] [QA]
- Efficient Deep Neural Networks - [Arxiv] [QA]
- A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics - [Arxiv] [QA]
- Unsupervised Learning of Landmarks by Descriptor Vector Exchange - [Arxiv] [QA]
- Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training - [Arxiv] [QA]
- StructureFlow: Image Inpainting via Structure-aware Appearance Flow - [Arxiv] [QA]
- Approximating the Convex Hull via Metric Space Magnitude - [Arxiv] [QA]
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks - [Arxiv] [QA]
- On the Existence of Simpler Machine Learning Models - [Arxiv] [QA]
- Smooth Grad-CAM++: An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models - [Arxiv] [QA]
- Generative Image Inpainting with Submanifold Alignment - [Arxiv] [QA]
- On Mutual Information Maximization for Representation Learning - [Arxiv] [QA]
- Benchmarking Attribution Methods with Relative Feature Importance - [Arxiv] [QA]
- Forward-Backward Decoding for Regularizing End-to-End TTS - [Arxiv] [QA]
- Compositional Deep Learning - [Arxiv] [QA]
- PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search - [Arxiv] [QA]
- Dual Adversarial Semantics-Consistent Network for Generalized Zero-Shot Learning - [Arxiv] [QA]
- Generative Counterfactual Introspection for Explainable Deep Learning - [Arxiv] [QA]
- Large Scale Adversarial Representation Learning - [Arxiv] [QA]
- Generalizing from a few environments in safety-critical reinforcement learning - [Arxiv] [QA]
- Learnable Gated Temporal Shift Module for Deep Video Inpainting - [Arxiv] [QA]
- Self-Supervised Dialogue Learning - [Arxiv] [QA]
- Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty - [Arxiv] [QA]
- Improving performance of deep learning models with axiomatic attribution priors and expected gradients - [Arxiv] [QA]
- Unsupervised State Representation Learning in Atari - [Arxiv] [QA]
- Sample-Efficient Neural Architecture Search by Learning Action Space - [Arxiv] [QA]
- One Epoch Is All You Need - [Arxiv] [QA]
- Stand-Alone Self-Attention in Vision Models - [Arxiv] [QA]
- Contrastive Multiview Coding - [Arxiv] [QA]
- Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index - [Arxiv] [QA]
- Factorized Mutual Information Maximization - [Arxiv] [QA]
- Topology-Preserving Deep Image Segmentation - [Arxiv] [QA]
- Self-Supervised Learning for Contextualized Extractive Summarization - [Arxiv] [QA]
- Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis - [Arxiv] [QA]
- HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips - [Arxiv] [QA]
- Selfie: Self-supervised Pretraining for Image Embedding - [Arxiv] [QA]
- XRAI: Better Attributions Through Regions - [Arxiv] [QA]
- Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers - [Arxiv] [QA]
- Image Synthesis with a Single (Robust) Classifier - [Arxiv] [QA]
- Automated Machine Learning: State-of-The-Art and Open Challenges - [Arxiv] [QA]
- Learning Representations by Maximizing Mutual Information Across Views - [Arxiv] [QA]
- Zero-Shot Semantic Segmentation - [Arxiv] [QA]
- Rethinking Loss Design for Large-scale 3D Shape Retrieval - [Arxiv] [QA]
- Latent Retrieval for Weakly Supervised Open Domain Question Answering - [Arxiv] [QA]
- Learning to Generate Grounded Visual Captions without Localization Supervision - [Arxiv] [QA]
- Attention Is (not) All You Need for Commonsense Reasoning - [Arxiv] [QA]
- MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms - [Arxiv] [QA]
- Align-and-Attend Network for Globally and Locally Coherent Video Inpainting - [Arxiv] [QA]
- Let's Agree to Agree: Neural Networks Share Classification Order on Real Datasets - [Arxiv] [QA]
- Why do These Match? Explaining the Behavior of Image Similarity Models - [Arxiv] [QA]
- Countering Noisy Labels By Learning From Auxiliary Clean Labels - [Arxiv] [QA]
- Data-Efficient Image Recognition with Contrastive Predictive Coding - [Arxiv] [QA]
- FastSpeech: Fast, Robust and Controllable Text to Speech - [Arxiv] [QA]
- Deeper Text Understanding for IR with Contextual Neural Language Modeling - [Arxiv] [QA]
- PEPSI++: Fast and Lightweight Network for Image Inpainting - [Arxiv] [QA]
- Evolving Rewards to Automate Reinforcement Learning - [Arxiv] [QA]
- Tabular Benchmarks for Joint Architecture and Hyperparameter Optimization - [Arxiv] [QA]
- Deep Flow-Guided Video Inpainting - [Arxiv] [QA]
- Frame-Recurrent Video Inpainting by Robust Optical Flow Inference - [Arxiv] [QA]
- Characterizing the invariances of learning algorithms using category theory - [Arxiv] [QA]
- Deep Video Inpainting - [Arxiv] [QA]
- Unsupervised Pre-Training of Image Features on Non-Curated Data - [Arxiv] [QA]
- Scaling and Benchmarking Self-Supervised Visual Representation Learning - [Arxiv] [QA]
- Visualizing Deep Networks by Optimizing with Integrated Gradients - [Arxiv] [QA]
- Full-Gradient Representation for Neural Network Visualization - [Arxiv] [QA]
- Segmentation is All You Need - [Arxiv] [QA]
- A critical analysis of self-supervision, or what we can learn from a single image - [Arxiv] [QA]
- TVQA+: Spatio-Temporal Grounding for Video Question Answering - [Arxiv] [QA]
- DynamoNet: Dynamic Action and Motion Network - [Arxiv] [QA]
- Free-form Video Inpainting with 3D Gated Convolution and Temporal PatchGAN - [Arxiv] [QA]
- GraphNAS: Graph Neural Architecture Search with Reinforcement Learning - [Arxiv] [QA]
- Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring - [Arxiv] [QA]
- SelFlow: Self-Supervised Learning of Optical Flow - [Arxiv] [QA]
- Self-Supervised Audio-Visual Co-Segmentation - [Arxiv] [QA]
- Understanding Neural Networks via Feature Visualization: A survey - [Arxiv] [QA]
- Document Expansion by Query Prediction - [Arxiv] [QA]
- Deep Fusion Network for Image Completion - [Arxiv] [QA]
- Semantically Aligned Bias Reducing Zero Shot Learning - [Arxiv] [QA]
- HARK Side of Deep Learning -- From Grad Student Descent to Automated Machine Learning - [Arxiv] [QA]
- Understanding the Behaviors of BERT in Ranking - [Arxiv] [QA]
- Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting - [Arxiv] [QA]
- Counterfactual Visual Explanations - [Arxiv] [QA]
- The Geometry of Bayesian Programming - [Arxiv] [QA]
- Focus Is All You Need: Loss Functions For Event-based Vision - [Arxiv] [QA]
- CEDR: Contextualized Embeddings for Document Ranking - [Arxiv] [QA]
- VORNet: Spatio-temporally Consistent Video Inpainting for Object Removal - [Arxiv] [QA]
- wav2vec: Unsupervised Pre-training for Speech Recognition - [Arxiv] [QA]
- ThumbNet: One Thumbnail Image Contains All You Need for Recognition - [Arxiv] [QA]
- On zero-shot recognition of generic objects - [Arxiv] [QA]
- Leveraging the Invariant Side of Generative Zero-Shot Learning - [Arxiv] [QA]
- Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics - [Arxiv] [QA]
- Detecting Human-Object Interactions via Functional Generalization - [Arxiv] [QA]
- Data Shapley: Equitable Valuation of Data for Machine Learning - [Arxiv] [QA]
- VideoBERT: A Joint Model for Video and Language Representation Learning - [Arxiv] [QA]
- Creativity Inspired Zero-Shot Learning - [Arxiv] [QA]
- Interpreting Black Box Models via Hypothesis Testing - [Arxiv] [QA]
- Wasserstein Dependency Measure for Representation Learning - [Arxiv] [QA]
- Self-Supervised Learning via Conditional Motion Propagation - [Arxiv] [QA]
- Simple Applications of BERT for Ad Hoc Document Retrieval - [Arxiv] [QA]
- Generalized Convolution and Efficient Language Recognition - [Arxiv] [QA]
- sharpDARTS: Faster and More Accurate Differentiable Architecture Search - [Arxiv] [QA]
- Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set - [Arxiv] [QA]
- Learning Correspondence from the Cycle-Consistency of Time - [Arxiv] [QA]
- A Deep Look into Neural Ranking Models for Information Retrieval - [Arxiv] [QA]
- Turbo Learning Framework for Human-Object Interactions Recognition and Human Pose Estimation - [Arxiv] [QA]
- All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification - [Arxiv] [QA]
- Pluralistic Image Completion - [Arxiv] [QA]
- Deep Reinforcement Learning of Volume-guided Progressive View Inpainting for 3D Point Scene Completion from a Single Depth Image - [Arxiv] [QA]
- CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog - [Arxiv] [QA]
- Self-Supervised Learning of 3D Human Pose using Multi-view Geometry - [Arxiv] [QA]
- High-Fidelity Image Generation With Fewer Labels - [Arxiv] [QA]
- Learning Latent Plans from Play - [Arxiv] [QA]
- Lenses and Learners - [Arxiv] [QA]
- Change Detection with the Kernel Cumulative Sum Algorithm - [Arxiv] [QA]
- Stabilizing the Lottery Ticket Hypothesis - [Arxiv] [QA]
- Differentiable Causal Computations via Delayed Trace - [Arxiv] [QA]
- Semantic-Guided Multi-Attention Localization for Zero-Shot Learning - [Arxiv] [QA]
- Multi-Stage Self-Supervised Learning for Graph Convolutional Networks on Graphs with Few Labels - [Arxiv] [QA]
- A Theoretical Analysis of Contrastive Unsupervised Representation Learning - [Arxiv] [QA]
- From open learners to open games - [Arxiv] [QA]
- Evaluating the Search Phase of Neural Architecture Search - [Arxiv] [QA]
- Predicting city safety perception based on visual image content - [Arxiv] [QA]
- SC-FEGAN: Face Editing Generative Adversarial Network with User's Sketch and Color - [Arxiv] [QA]
- CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model - [Arxiv] [QA]
- Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey - [Arxiv] [QA]
- LS-Tree: Model Interpretation When the Data Are Linguistic - [Arxiv] [QA]
- Towards Automatic Concept-based Explanations - [Arxiv] [QA]
- Depthwise Convolution is All You Need for Learning Multiple Visual Domains - [Arxiv] [QA]
- Collaborative Sampling in Generative Adversarial Networks - [Arxiv] [QA]
- Parameter-Efficient Transfer Learning for NLP - [Arxiv] [QA]
- Compositionality for Recursive Neural Networks - [Arxiv] [QA]
- Personalized Dialogue Generation with Diversified Traits - [Arxiv] [QA]
- On the (In)fidelity and Sensitivity for Explanations - [Arxiv] [QA]
- Revisiting Self-Supervised Visual Representation Learning - [Arxiv] [QA]
- Diffusion Variational Autoencoders - [Arxiv] [QA]
- Self-Supervised Generalisation with Meta Auxiliary Learning - [Arxiv] [QA]
- Improving Sequence-to-Sequence Learning via Optimal Transport - [Arxiv] [QA]
- Foreground-aware Image Inpainting - [Arxiv] [QA]
- Passage Re-ranking with BERT - [Arxiv] [QA]
- Automated Rationale Generation: A Technique for Explainable AI and its Effects on Human Perceptions - [Arxiv] [QA]
- Detecting Overfitting of Deep Generative Networks via Latent Recovery - [Arxiv] [QA]
- A Comprehensive Survey on Graph Neural Networks - [Arxiv] [QA]
- Visualizing Deep Similarity Networks - [Arxiv] [QA]
- EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning - [Arxiv] [QA]
- A Theoretical Analysis of Deep Q-Learning - [Arxiv] [QA]