Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Updated Feb 3, 2023 - Jupyter Notebook
An easy implementation of Faster R-CNN (https://arxiv.org/pdf/1506.01497.pdf) in PyTorch.
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019)
An easy implementation of FPN (https://arxiv.org/pdf/1612.03144.pdf) in PyTorch.
Real-time semantic image segmentation on mobile devices
Image captioning with an LSTM or Transformer in PyTorch
A clone of the original SegCaps source code, with enhancements for the MS COCO dataset.
PyTorch implementation of image captioning using a transformer-based model.
Adds the SPICE metric to the coco-caption evaluation server code
Convert segmentation binary mask images to COCO JSON format.
PyTorch implementation of the paper "Self-critical Sequence Training for Image Captioning"
We aim to generate realistic images from text descriptions using a GAN architecture. The network we designed generates images for two datasets: MSCOCO and CUBS.
PyTorch implementation of “Fine-Grained Image Captioning with Global-Local Discriminative Objective”
Clone of the COCO API (dataset at http://cocodataset.org/) with changes to support Windows builds and Python 3
A demo for mapping class labels from ImageNet to COCO.
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]
MS COCO captions in Arabic
Karpathy split JSON files for image captioning
Image caption generation using GRU-based attention mechanism
An end-to-end vision and language model incorporating explicit knowledge graphs and OOD-detection.