Moviescope: Large-scale Analysis of
Movies using Multiple Modalities

Film media is a rich form of artistic expression.
Unlike photography and short videos, movies contain a storyline that
is deliberately complex and intricate in order to engage their audience.

In this paper we introduce Moviescope, a new large-scale dataset of
5,000 movies with corresponding video trailers, posters, plots and metadata.

We present a large-scale study comparing the effectiveness of visual, audio, text, and metadata-based features for predicting high-level information about movies, such as their genre or estimated budget. We demonstrate the usefulness of content-based methods in this domain, in contrast to human-based and metadata-based predictions, in the era of deep learning. Additionally, we provide a comprehensive study of temporal feature aggregation methods for representing video and text, and find that simple pooling operations are effective in this domain. We also show to what extent the different modalities complement each other.
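As a rough illustration of what a simple pooling operation over time looks like, the sketch below collapses a sequence of per-frame (or per-word) feature vectors into one fixed-length vector. The function name `pool_features`, the array shapes, and the choice of mean and max pooling are assumptions made for this example; they are not taken from the Moviescope code.

```python
import numpy as np


def pool_features(step_features, method="mean"):
    """Aggregate a (n_steps, feature_dim) sequence into one (feature_dim,) vector.

    Hypothetical helper: `step_features` stands for per-frame video features
    or per-word text features extracted by some upstream model.
    """
    step_features = np.asarray(step_features, dtype=float)
    if step_features.ndim != 2:
        raise ValueError("expected an array of shape (n_steps, feature_dim)")
    if method == "mean":
        # Average each feature dimension across all time steps.
        return step_features.mean(axis=0)
    if method == "max":
        # Keep the strongest activation of each feature dimension.
        return step_features.max(axis=0)
    raise ValueError(f"unknown pooling method: {method!r}")


# Toy trailer with 2 frames and 2-dimensional features (made-up numbers).
toy_frames = np.array([[1.0, 4.0], [3.0, 2.0]])
mean_vector = pool_features(toy_frames, "mean")
max_vector = pool_features(toy_frames, "max")
```

Either pooled vector has the same length regardless of how many frames the trailer contains, which is what lets a fixed-size classifier consume trailers of different durations.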

Moviescope is based on the IMDB 5000 dataset, which consists of 5,043 movie records. This dataset was released under an Open Database License as part of a Kaggle Competition. We augmented this dataset by crawling video trailers associated with each movie from YouTube and text plots from Wikipedia.

Video Features

Poster Features

Train, Validation and Test Splits with Plots

Links to Movie Trailers

Modal Representations
We used video, text, audio, posters and metadata representations for movie genre prediction and movie budget estimation.

Multimodal Fusion
To combine multiple modalities, we use the output scores from the models associated with each individual modality as inputs to a weighted regression that produces the final movie genre predictions. Intuitively, the weights in this linear combination can be interpreted as the contribution of each modality toward predicting a genre.
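The late-fusion idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it fits one ordinary least-squares regression per genre over the per-modality scores, with a bias term. The function names `fit_fusion_weights` and `fuse`, the array layout, and the use of plain least squares are all hypothetical choices made for the example.

```python
import numpy as np


def fit_fusion_weights(scores, labels):
    """Fit one linear combination of modality scores per genre.

    scores: (n_movies, n_modalities, n_genres) outputs of the unimodal models.
    labels: (n_movies, n_genres) target values (e.g. 0/1 genre indicators).
    Returns an array of shape (n_genres, n_modalities + 1); the last column
    is a bias term. Assumption: plain least squares stands in for the
    weighted regression mentioned in the text.
    """
    n_movies, n_modalities, n_genres = scores.shape
    weights = np.zeros((n_genres, n_modalities + 1))
    for g in range(n_genres):
        # Design matrix: one column per modality plus a constant column.
        design = np.hstack([scores[:, :, g], np.ones((n_movies, 1))])
        solution, *_ = np.linalg.lstsq(design, labels[:, g], rcond=None)
        weights[g] = solution
    return weights


def fuse(scores, weights):
    """Combine modality scores into final per-genre scores."""
    # For each movie n and genre g, sum weight[g, m] * score[n, m, g] over m.
    combined = np.einsum("nmg,gm->ng", scores, weights[:, :-1])
    return combined + weights[:, -1]


# Synthetic check: 2 modalities, 3 genres, targets built with known weights.
rng = np.random.default_rng(0)
toy_scores = rng.random((50, 2, 3))
toy_labels = 0.7 * toy_scores[:, 0, :] + 0.3 * toy_scores[:, 1, :]
toy_weights = fit_fusion_weights(toy_scores, toy_labels)
toy_fused = fuse(toy_scores, toy_weights)
```

In this layout, row `g` of the weight matrix can be read directly as the relative contribution of each modality to genre `g`, which mirrors the interpretation given in the text.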

Some results:

Automatic Budget Estimation & Human-based vs Content-based Predictions


title={Moviescope: Large-scale Analysis of Movies using Multiple Modalities},
author={Paola Cascante-Bonilla and Kalpathy Sitaraman and Mengjia Luo and Vicente Ordonez},
