🏠 Home

This repository provides resources for multimodal learning in visual retrieval-augmented generation (RAG) and for collaborative filtering with side information.

RAG-VisualRec

⚙️ Contributions

Our work offers:

  • Formal Framework for Multimodal RAG: a method for integrating textual and visual embeddings into RAG-based pipelines for movie recommenders.
  • LLM-Based Data Augmentation: a pipeline that generates rich textual descriptions from minimal metadata.
  • Multi-Modal Item Embeddings and Fusion: precomputed vector representations from OpenAI, SentenceTransformers, and Llama 3.0 for multimodal retrieval.
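
To make the idea of fusing textual and visual embeddings for retrieval more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the repository's implementation: the fusion strategy (normalize each modality, then concatenate), the function names `l2_normalize`, `fuse`, and `retrieve`, and the toy vectors and item IDs are all hypothetical. Real embeddings would come from the precomputed representations listed above.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; return a copy unchanged if it is all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(text_emb, visual_emb):
    """Assumed fusion: normalize each modality, concatenate, renormalize.

    Normalizing first keeps one modality from dominating just because
    its raw embedding has a larger magnitude.
    """
    return l2_normalize(l2_normalize(text_emb) + l2_normalize(visual_emb))

def cosine(a, b):
    """Cosine similarity for unit-length vectors reduces to a dot product."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, items, k=2):
    """Return the IDs of the k items most similar to the query embedding."""
    scored = [(cosine(query, emb), item_id) for item_id, emb in items.items()]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]

# Toy catalog: 2-d "text" and 3-d "visual" embeddings (made-up values).
catalog = {
    "movie_a": fuse([1.0, 0.0], [1.0, 0.0, 0.0]),
    "movie_b": fuse([0.0, 1.0], [0.0, 1.0, 0.0]),
    "movie_c": fuse([0.9, 0.1], [0.8, 0.2, 0.0]),
}

# A query (for example, a user profile) embedded in the same fused space.
query = fuse([1.0, 0.0], [1.0, 0.0, 0.0])
top = retrieve(query, catalog, k=2)
print(top)
```

In a RAG-based recommender, the retrieved candidates would then be passed to a language model for re-ranking or explanation. Concatenation is only one possible fusion choice; weighted averaging or a learned projection would fit the same `fuse` interface.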
