🏠 Home
This repository contains resources for multimodal learning in visual retrieval-augmented generation (RAG) and for collaborative filtering with side information.
⚙️ Contributions
Our work offers:
- Formal Framework for Multimodal RAG: a method to integrate textual and visual embeddings into RAG-based pipelines for movie recommenders,
- LLM-Based Data Augmentation: a pipeline to generate rich textual descriptions from minimal metadata,
- Multi-Modal Item Embeddings and Fusion: precomputed vector representations from OpenAI, SentenceTransformers, and Llama 3.0 for multimodal retrieval.
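This README does not show how the textual and visual embeddings are combined. The following is a minimal Python sketch that assumes one common approach, not necessarily the one used here: late fusion by weighted concatenation of unit-normalized text and image vectors, cosine-similarity retrieval over the fused item vectors, and a plain prompt template that passes the retrieved item descriptions to an LLM. The function names (`fuse`, `retrieve`, `build_prompt`), the `alpha` weight, the item IDs, and the toy 2-dimensional vectors are all hypothetical stand-ins for the precomputed OpenAI, SentenceTransformers, or Llama 3.0 embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_emb: Sequence[float], image_emb: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Hypothetical late fusion: concatenate sqrt(alpha)*text and sqrt(1-alpha)*image.

    Both modality vectors are unit-normalized first, so the dot product of two
    fused vectors equals alpha * text_cosine + (1 - alpha) * image_cosine.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    t, v = l2_normalize(text_emb), l2_normalize(image_emb)
    wt, wv = math.sqrt(alpha), math.sqrt(1.0 - alpha)
    return [wt * x for x in t] + [wv * x for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def retrieve(query: Sequence[float], items: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k item IDs most similar to the query, highest score first."""
    scored = [(item_id, cosine(query, emb)) for item_id, emb in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, retrieved: List[Tuple[str, float]], descriptions: Dict[str, str]) -> str:
    """Assemble a simple RAG prompt from the retrieved items' text descriptions."""
    context = "\n".join(f"- {descriptions[item_id]}" for item_id, _ in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Toy (text, image) embeddings for three hypothetical movies.
raw = {
    "movie_a": ([1.0, 0.0], [1.0, 0.0]),
    "movie_b": ([0.0, 1.0], [1.0, 0.0]),
    "movie_c": ([0.0, 1.0], [0.0, 1.0]),
}
descriptions = {
    "movie_a": "A space opera with a lone pilot.",
    "movie_b": "A courtroom drama shot in neon tones.",
    "movie_c": "A quiet family comedy.",
}
items = {item_id: fuse(t, v, alpha=0.5) for item_id, (t, v) in raw.items()}
query = fuse([1.0, 0.0], [1.0, 0.0], alpha=0.5)

top = retrieve(query, items, k=2)
prompt = build_prompt("Recommend a movie like my last one.", top, descriptions)
print([(item_id, round(score, 3)) for item_id, score in top])  # [('movie_a', 1.0), ('movie_b', 0.5)]
print(prompt)
```

The square-root weights keep each fused vector at unit length, so `alpha` directly sets how much the text similarity versus the image similarity contributes to the retrieval score; `movie_b` scores 0.5 here because it matches the query only on the image side.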