Movies Visual Features Extracted (MoViFex) Dataset

A comprehensive dataset of visual features extracted from movies in various formats, including full movies, movie shots, and trailers.


About the Dataset

The MoViFex dataset is a comprehensive collection of visual embeddings extracted from movies at several levels (full movies, movie shots, and trailers). It is designed to facilitate research on movie recommendation and related areas.


Hierarchical Visual Embeddings

Explore visual features extracted at three distinct levels: full-length movies, individual movie shots, and trailers. This multi-level design supports both fine-grained and holistic analysis across diverse applications.

Various Feature Extractors

The features are extracted with state-of-the-art Convolutional Neural Networks (CNNs), namely Inception-v3 and VGG-19, providing rich and varied visual representations.

Dissimilar Spatial Dimensions

Choose between atomic (frame- and shot-level) features for detailed study or aggregated (per-movie) representations for higher-level insights.

Versatile Applications

Designed for versatility, the dataset supports tasks such as movie recommendation, scene analysis, video classification, and visual storytelling.

Stats

Quick Facts

Total number of movies: 274
Average frames extracted per movie: 7,732
Total number of frames: 2,118,647
Average movie rating: 3.88/5.0
Total number of users: 158,146
Total number of interactions: 2,869,024

Details

Data Structure Description

Aspect	Value
Total number of movies	274
Average frames extracted per movie	7,732
Total number of frames (or feature vectors)	2,118,647
Cumulative number of genre labels (summed over movies)	723
Average movie rating	3.88/5.0
Total number of users	158,146
Cumulative number of interactions	2,869,024

Level I. Primary Categories

The dataset is organized into six folders and a stats.json file containing the meta-data of the sources. The features were extracted from 274 movie videos, whose identifiers and meta-data are taken from the MovieLens 25M dataset. Frames (and therefore features) were extracted at a rate of 1 FPS, so the average of 7,732 frames per movie corresponds to roughly 2 hours and 9 minutes of footage.

full_movies: visual features extracted from full-length movies (frame-level)
movie_shots: visual features extracted from the shots of full-length movies (shot-level)
movie_trailers: visual features extracted from movie trailers (frame-level)
full_movies_agg: aggregated embeddings of full-length movies
movie_shots_agg: aggregated embeddings of full-length movies' shots
movie_trailers_agg: aggregated embeddings of movie trailers

Level II. Visual Feature Extractors

Inside each folder, there are two subfolders, incp3 and vgg19, named after the feature extractor used to generate the visual features: Inception-v3 (a successor of GoogLeNet) and VGG-19, respectively.

[Sample figure: Inception-v3 and VGG-19 features extracted from full-length movies]
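Because every category folder contains the same two extractor subfolders, the 12 expected category/extractor directories can be enumerated programmatically. The following is a minimal sketch, assuming the dataset has been unpacked under a local root directory; the root path and the helper name `missing_dirs` are hypothetical. It reports which folders are absent, for example after a partial download.

```python
from pathlib import Path

# The six Level I categories and the two Level II extractor folders described above.
CATEGORIES = [
    "full_movies", "movie_shots", "movie_trailers",
    "full_movies_agg", "movie_shots_agg", "movie_trailers_agg",
]
EXTRACTORS = ["incp3", "vgg19"]


def missing_dirs(root):
    """Return the category/extractor folders that do not exist under `root`.

    `root` is a hypothetical local path where the dataset was unpacked.
    """
    root = Path(root)
    return [
        f"{category}/{extractor}"
        for category in CATEGORIES
        for extractor in EXTRACTORS
        if not (root / category / extractor).is_dir()
    ]


if __name__ == "__main__":
    # Hypothetical download location; adjust to your own setup.
    for rel in missing_dirs("MoViFex"):
        print("missing:", rel)
```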

Level III. Contents (Movies & Trailers)

A: Atomic Features (e.g., full_movies)

For the atomic visual features (frame- and shot-level, i.e., full_movies, movie_shots, and movie_trailers), each extractor folder (/incp3 or /vgg19) contains one folder per movie. Each folder is named with a unique zero-padded identifier (e.g., 0000008985) that corresponds to the movie's ID in the MovieLens 25M dataset, and it holds a set of packet files, described in Level IV.
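Mapping between MovieLens movie IDs and folder names is a common first step when joining these features with MovieLens ratings. The sketch below assumes, based only on the examples above (e.g., 0000008985), that folder names are MovieLens IDs zero-padded to 10 digits; all helper names and the root path are hypothetical.

```python
from pathlib import Path

ATOMIC_CATEGORIES = {"full_movies", "movie_shots", "movie_trailers"}


def movie_folder_name(movie_id: int) -> str:
    """Zero-pad a MovieLens movieId to a 10-digit folder name (assumed width)."""
    return f"{movie_id:010d}"


def movie_id_from_folder(name: str) -> int:
    """Recover the MovieLens movieId from a folder name, e.g. '0000008985' -> 8985."""
    return int(name)


def atomic_movie_dir(root, category: str, extractor: str, movie_id: int) -> Path:
    """Path of one movie's packet folder, e.g. <root>/full_movies/incp3/0000008985."""
    if category not in ATOMIC_CATEGORIES:
        raise ValueError(f"{category!r} is not an atomic feature category")
    return Path(root) / category / extractor / movie_folder_name(movie_id)


def list_movie_ids(root, category: str, extractor: str) -> list:
    """Sorted MovieLens IDs of all movie folders present for one category/extractor pair."""
    base = Path(root) / category / extractor
    return sorted(movie_id_from_folder(p.name) for p in base.iterdir() if p.is_dir())
```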

B: Aggregated Features (e.g., full_movies_agg)

For the aggregated visual features (i.e., full_movies_agg, movie_shots_agg, and movie_trailers_agg), each extractor folder (/incp3 or /vgg19) contains a set of JSON files, one per movie, each named with a unique identifier (e.g., 0000002023) that corresponds to the movie's ID in the MovieLens 25M dataset. Each JSON file contains the movie's embeddings aggregated using the maximum and mean methods.
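To illustrate what maximum and mean aggregation typically means, the sketch below pools a list of atomic vectors element-wise. This is an assumption about the procedure, not a confirmed reproduction of how the aggregated files were built (for example, any normalization step is not specified here), and the function name `aggregate` and its output keys are hypothetical.

```python
def aggregate(embeddings):
    """Element-wise mean and max pooling over a list of equal-length vectors.

    Returns a dict with the hypothetical keys "mean" and "max"; the key names
    in the actual aggregated JSON files may differ.
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(vec) != dim for vec in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    columns = list(zip(*embeddings))  # one tuple of values per feature dimension
    return {
        "mean": [sum(col) / len(col) for col in columns],
        "max": [max(col) for col in columns],
    }


# Example: three 2-dimensional frame vectors.
print(aggregate([[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]))
# {'mean': [2.0, 2.0], 'max': [3.0, 4.0]}
```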

Level IV. Packets (Atomic Feature Folders)

To keep the visual features manageable, each movie folder (e.g., 0000001997) contains a sequence of JSON packets named packet0001.json through packet000N.json. Each packet contains a set of objects, each pairing a frameId with its embeddings. In general, every 25 objects (frameId-embedding pairs) form one packet; only the last packet may contain fewer.
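The packet layout can be read back into one ordered list of vectors per movie. This is a minimal sketch, assuming each packet file is a top-level JSON list of objects with the keys "frameId" and "embeddings" (only the key names come from the description above; the wrapping is an assumption). The helper names are hypothetical. With 25 objects per packet, a movie with the average of 7,732 frames needs 310 packets.

```python
import json
import math
from pathlib import Path

PACKET_SIZE = 25  # objects (frameId-embedding pairs) per packet, per the description above


def expected_packet_count(n_vectors: int) -> int:
    """Number of packets needed for n_vectors objects; the last packet may be partial."""
    return math.ceil(n_vectors / PACKET_SIZE)


def load_movie_features(movie_dir):
    """Read every packet of one movie folder, in packet order.

    Assumes each packetNNNN.json holds a top-level JSON list of objects with
    the keys "frameId" and "embeddings"; adjust if the files are wrapped differently.
    """
    frame_ids, embeddings = [], []
    # Zero-padded names (packet0001.json, packet0002.json, ...) sort correctly as strings.
    for packet_path in sorted(Path(movie_dir).glob("packet*.json")):
        with packet_path.open(encoding="utf-8") as f:
            for obj in json.load(f):
                frame_ids.append(obj["frameId"])
                embeddings.append(obj["embeddings"])
    return frame_ids, embeddings


# Example (hypothetical local path):
# ids, vectors = load_movie_features("MoViFex/full_movies/incp3/0000001997")
```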

Dataset File Structure

The dataset is organized hierarchically: the top-level folders represent the primary categories of the data, and each contains the visual features extracted by the two CNN models, i.e., Inception-v3 and VGG-19. Atomic features are stored in packets, each containing a set of frame- or shot-level embeddings, while aggregated features are stored as one JSON file per movie. You can see the data structure in the image on the left.

Download

Download the Dataset

Note #1

The dataset is large (about 103.9 GB in total, as detailed under Required Space) because it provides millions of visual feature vectors. You can download either the whole dataset or only the portion you need.

Note #2

The dataset can be downloaded directly or accessed through HuggingFace for easier integration.

🤗 Download from HuggingFace
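Since you may only need part of the dataset, a subset can be selected with glob patterns. The sketch below assumes the HuggingFace repository mirrors the folder layout described above; the repository ID is a placeholder you must replace, and the helper names are hypothetical. `snapshot_download` with `repo_type="dataset"` and `allow_patterns` is part of the `huggingface_hub` library; `is_selected` previews shell-style matching with Python's `fnmatch`, which is close to how those patterns are applied.

```python
from fnmatch import fnmatch

# Placeholder: replace with the dataset's actual HuggingFace repository ID.
REPO_ID = "<owner>/<dataset-name>"


def subset_patterns(categories, extractors):
    """Glob patterns selecting whole category/extractor folders, e.g. 'full_movies_agg/incp3/*'."""
    return [f"{c}/{e}/*" for c in categories for e in extractors]


def is_selected(path, patterns):
    """Preview whether a repository file path matches any of the patterns."""
    return any(fnmatch(path, p) for p in patterns)


def download_subset(local_dir, categories, extractors, repo_id=REPO_ID):
    """Download only the selected folders (requires: pip install huggingface_hub)."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        allow_patterns=subset_patterns(categories, extractors),
        local_dir=local_dir,
    )
```

For example, `download_subset("MoViFex", ["full_movies_agg"], ["incp3"])` would fetch only the Inception-v3 aggregated features of full-length movies, assuming the repository layout matches.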

Required Space

The detailed size and specifications of the dataset are outlined in the table below.

Feature Type	File Count (Inception-v3)	File Count (VGG-19)	Size on Disk (Inception-v3)	Size on Disk (VGG-19)
Full Movies (Atomic)	84,872	84,872	35.8 GB	46.1 GB
Movie Shots (Atomic)	16,713	24,598	7.01 GB	13.3 GB
Trailers (Atomic)	1,725	1,725	681 MB	885 MB
Full Movies (Aggregated)	84,872	84,872	10 MB	19 MB
Movie Shots (Aggregated)	16,713	24,598	10 MB	19 MB
Trailers (Aggregated)	1,725	1,725	10 MB	19 MB
Total (both extractors)	214,505		~103.9 GB	
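To plan a partial download, the sizes in the table above can be summed for the folders you actually need. The sketch below simply encodes the table's size column (treating 1 GB as 1,000 MB); the dictionary and function names are hypothetical.

```python
# Sizes on disk from the table above, in GB (MB values divided by 1,000).
SIZES_GB = {
    ("full_movies", "incp3"): 35.8,
    ("full_movies", "vgg19"): 46.1,
    ("movie_shots", "incp3"): 7.01,
    ("movie_shots", "vgg19"): 13.3,
    ("movie_trailers", "incp3"): 0.681,
    ("movie_trailers", "vgg19"): 0.885,
    ("full_movies_agg", "incp3"): 0.010,
    ("full_movies_agg", "vgg19"): 0.019,
    ("movie_shots_agg", "incp3"): 0.010,
    ("movie_shots_agg", "vgg19"): 0.019,
    ("movie_trailers_agg", "incp3"): 0.010,
    ("movie_trailers_agg", "vgg19"): 0.019,
}


def required_space_gb(selection):
    """Sum the table sizes for an iterable of (category, extractor) keys."""
    return sum(SIZES_GB[key] for key in selection)


# Example: all aggregated features from both extractors.
agg = [key for key in SIZES_GB if key[0].endswith("_agg")]
print(f"{required_space_gb(agg):.3f} GB")       # 0.087 GB
print(f"{required_space_gb(SIZES_GB):.1f} GB")  # 103.9 GB, matching the table's total
```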

Benchmark

Benchmarking the Dataset

Metadata Analysis

Working with the meta-data JSON file (stats.json) for a general analysis.

Check the Code in Colab

Dataset Instances (Atomic Features)

Working with the dataset instances (frame-level extracted visual features) for various tasks.

Check the Code in Colab

Dataset Instances (Aggregated Features)

Working with the dataset instances (aggregated visual features) for various tasks.

Check the Code in Colab
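As one example of the movie recommendation tasks the dataset targets, aggregated embeddings can drive a simple content-based "similar movies" lookup. The sketch below is a hypothetical illustration, not the code in the Colab notebook: it ranks movies by cosine similarity of one aggregated vector per movie (for instance the mean-pooled one). The function names and toy vectors are made up for demonstration, and loading real embeddings is left open.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target_id, movie_vectors, k=5):
    """Rank the other movies by cosine similarity to `target_id`.

    `movie_vectors` maps a MovieLens movieId to one aggregated embedding.
    """
    target = movie_vectors[target_id]
    scores = [
        (movie_id, cosine_similarity(target, vec))
        for movie_id, vec in movie_vectors.items()
        if movie_id != target_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy 3-dimensional vectors standing in for real aggregated embeddings.
toy = {1: [1.0, 0.0, 0.0], 2: [0.9, 0.1, 0.0], 3: [0.0, 1.0, 0.0]}
print(most_similar(1, toy, k=2))  # movie 2 ranks first, movie 3 scores 0.0
```

Cosine similarity ignores vector magnitude, which is a common default when comparing pooled CNN features; other distance measures may suit specific tasks better.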

Team

Collaborators

Ali Tourani

Doctoral Researcher - SnT, University of Luxembourg

Yashar Deldjoo

Tenure-Track Assistant Prof. - Polytechnic University of Bari

Athena Nazary

Scientific Researcher - Polytechnic University of Bari