Image-Captioning-with-ViT-GPT2

Summary

Built an image captioning system that pairs a Vision Transformer (ViT) image encoder with GPT-2 as the text generator to produce descriptive, contextually relevant captions for images. The project shows how a vision model and a language model can be combined to map visual input to natural language, so that image descriptions are generated automatically.

GitHub Repository: Image Captioning with ViT-GPT2
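
The ViT-to-GPT-2 pipeline described in the summary can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes the Hugging Face `transformers` library, PyTorch, and Pillow are installed, and it uses the public `nlpconnect/vit-gpt2-image-captioning` checkpoint as a stand-in, since the summary does not say which weights the project uses. The helper names `tidy_caption` and `caption_image` and the default generation settings are hypothetical.

```python
# Assumption: a publicly available ViT + GPT-2 checkpoint on the Hugging Face Hub;
# the project summary does not name the weights it actually uses.
CHECKPOINT = "nlpconnect/vit-gpt2-image-captioning"


def tidy_caption(text: str) -> str:
    """Collapse runs of whitespace and capitalize the first letter of a caption."""
    cleaned = " ".join(text.split())
    return cleaned[:1].upper() + cleaned[1:]


def caption_image(image_path: str, max_length: int = 16, num_beams: int = 4) -> str:
    """Generate one caption for the image at image_path (hypothetical helper)."""
    # Heavy dependencies are imported lazily so tidy_caption works without them.
    from PIL import Image
    from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

    # The encoder-decoder wrapper holds the ViT encoder and the GPT-2 decoder.
    model = VisionEncoderDecoderModel.from_pretrained(CHECKPOINT)
    processor = ViTImageProcessor.from_pretrained(CHECKPOINT)
    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)

    # ViT consumes normalized RGB pixel tensors.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # GPT-2 decodes caption tokens with beam search, attending to the ViT features.
    output_ids = model.generate(pixel_values, max_length=max_length, num_beams=num_beams)
    return tidy_caption(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Calling `caption_image("photo.jpg")` with a local image path (the file name here is only an example) would download the checkpoint on first use and return a single caption string. Beam search is used in this sketch because it tends to give more fluent captions than greedy decoding, at some extra compute cost.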

Pranathi Goli
