Moviescope

Moviescope: Large-scale Analysis of Movies using Multiple Modalities

We present a large-scale study comparing the effectiveness of visual, audio, text, and metadata-based features for predicting high-level information about movies, such as their genre or estimated budget. We demonstrate the usefulness of content-based methods in this domain, in contrast to human-based and metadata-based predictions, in the era of deep learning. Additionally, we provide a comprehensive study of temporal feature aggregation methods for representing video and text, and find that simple pooling operations are effective in this domain. We also show to what extent the different modalities complement each other.

Moviescope is based on the IMDB 5000 dataset, which consists of 5,043 movie records. This dataset was released under an Open Database License as part of a Kaggle competition. We augmented it by crawling the video trailer associated with each movie from YouTube and the text plot from Wikipedia.

Video Features

Poster Features

Train, Validation and Test Splits with Plots

Links to Movie Trailers

We used video, text, audio, posters and metadata representations for movie genre prediction and movie budget estimation.
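As a rough illustration of the simple temporal pooling mentioned in the abstract, the sketch below collapses a sequence of per-frame (or per-word) feature vectors into one fixed-length vector by mean or max pooling. The function names and the toy feature values are assumptions made for this example, not the feature extractors or code used in the paper.

```python
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average each feature dimension over time (one value per dimension)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def max_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Keep the largest activation of each feature dimension over time."""
    return [max(column) for column in zip(*frames)]


# Hypothetical features: 3 time steps (e.g. trailer frames), 3 dimensions each.
frame_features = [
    [0.0, 2.0, 1.0],
    [1.0, 0.0, 1.0],
    [2.0, 4.0, 1.0],
]

clip_mean = mean_pool(frame_features)  # [1.0, 2.0, 1.0]
clip_max = max_pool(frame_features)    # [2.0, 4.0, 1.0]
print(clip_mean, clip_max)
```

Either pooled vector has the same length regardless of how many frames or words the input contains, which is what lets trailers and plots of different lengths feed the same classifier.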

To combine multiple modalities, we use the output scores from the models associated with each individual modality as inputs to a weighted regression that produces the final movie genre predictions. Intuitively, the weights in this linear combination can be interpreted as the contribution of each modality toward predicting a genre.
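The late-fusion idea can be sketched as follows for a single genre: each movie is described by one score per modality, and a linear model over those scores is fitted to the ground-truth label. This is a minimal sketch under stated assumptions: the scores and labels are made-up toy values, the modality order (video, audio, text) is arbitrary, and ordinary least squares fitted by gradient descent stands in for whatever regression variant the paper actually uses.

```python
from typing import List, Sequence, Tuple


def fit_fusion_weights(
    scores: Sequence[Sequence[float]],
    labels: Sequence[float],
    lr: float = 0.2,
    epochs: int = 5000,
) -> Tuple[List[float], float]:
    """Fit one weight per modality plus a bias by minimizing mean squared error."""
    n, n_mod = len(scores), len(scores[0])
    w, b = [0.0] * n_mod, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_mod, 0.0
        for x, y in zip(scores, labels):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for j in range(n_mod):
                grad_w[j] += err * x[j]
            grad_b += err
        # Batch gradient step on the averaged gradient.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fuse(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Weighted linear combination of the per-modality scores."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical per-movie scores for one genre: [video, audio, text].
scores = [
    [0.9, 0.4, 0.6],
    [0.8, 0.6, 0.7],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.4],
    [0.7, 0.5, 0.2],
    [0.3, 0.6, 0.8],
]
labels = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]  # 1.0 = movie belongs to the genre

w, b = fit_fusion_weights(scores, labels)
print("modality weights:", w, "bias:", b)
print("fused score, first movie:", fuse(scores[0], w, b))
```

In a multi-label setting, one such set of weights would be fitted per genre, so the learned weights can be compared across modalities genre by genre.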

Paper

@article{2019Moviescope,
title={Moviescope: Large-scale Analysis of Movies using Multiple Modalities},
author={Paola Cascante-Bonilla and Kalpathy Sitaraman and Mengjia Luo and Vicente Ordonez},
journal={ArXiv},
year={2019},
volume={abs/1908.03180}
}

