Video Scene Location Recognition Using AI: Overview and Summary



Authors:

(1) Lukas Korel, Faculty of Information Technology, Czech Technical University, Prague, Czech Republic

(2) Petr Pulc, Faculty of Information Technology, Czech Technical University, Prague, Czech Republic

(3) Jiri Tumpach, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic

(4) Martin Holena, Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague, Czech Republic

Overview and Introduction

ANN-Based Scene Classification

Methodology

Experiments

Conclusions and Future Work, Acknowledgements, and References

Abstract

This paper considers the possibility of using artificial neural networks to recognize scenes from video sequences that contain a small number of repeatedly used filming locations (e.g., TV series). The basic idea of the presented approach is to select a set of frames from each scene, transform them with a pre-trained single-image convolutional network used as a preprocessing step, and classify the scene location in the subsequent layers of the neural network. The considered networks are tested and compared on a dataset taken from the TV series The Big Bang Theory. We investigated different neural network layers for combining the individual frames, in particular AveragePooling, MaxPooling, Product, Flatten, LSTM, and Bidirectional LSTM layers. Only a few of these approaches proved suitable for the task at hand.
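The frame-combination step described in the abstract can be sketched in a few lines of plain Python. This is a minimal, framework-free illustration, not the authors' implementation. It assumes that each selected frame has already been mapped to a feature vector by the pre-trained CNN (hand-written numbers stand in for those features here). It also assumes that frames are sampled evenly across the scene, since the paper's actual selection rule is not given in this section, and that the classifier is a single dense layer followed by softmax. The helper names `select_frames` and `classify_scene`, as well as all weights, are hypothetical. The LSTM and Bidirectional LSTM variants are omitted because they need a deep-learning framework.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def select_frames(num_frames: int, k: int) -> List[int]:
    """Return up to k evenly spaced frame indices (an ASSUMED sampling rule)."""
    if num_frames <= 0 or k <= 0:
        raise ValueError("num_frames and k must be positive")
    k = min(k, num_frames)
    step = num_frames / k
    # Take the centre frame of each of k equal-length segments of the scene.
    return [int(i * step + step / 2) for i in range(k)]


# Frame combiners: each maps a list of per-frame feature vectors to one vector.
def average_pool(frames: List[Vector]) -> Vector:
    # Element-wise mean over frames; ignores frame order.
    return [sum(col) / len(frames) for col in zip(*frames)]


def max_pool(frames: List[Vector]) -> Vector:
    # Element-wise maximum over frames; ignores frame order.
    return [max(col) for col in zip(*frames)]


def product(frames: List[Vector]) -> Vector:
    # Element-wise product over frames; ignores frame order.
    return [math.prod(col) for col in zip(*frames)]


def flatten(frames: List[Vector]) -> Vector:
    # Concatenation in temporal order; output length grows with the frame count.
    return [x for frame in frames for x in frame]


COMBINERS: Dict[str, Callable[[List[Vector]], Vector]] = {
    "AveragePooling": average_pool,
    "MaxPooling": max_pool,
    "Product": product,
    "Flatten": flatten,
}


def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    # Fully connected layer: one output (logit) per location class.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Vector) -> Vector:
    # Subtract the maximum logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_scene(frames: List[Vector], combine: Callable[[List[Vector]], Vector],
                   weights: List[Vector], bias: Vector) -> int:
    # Combine frames, apply the dense layer and softmax, return the most likely class index.
    probs = softmax(dense(combine(frames), weights, bias))
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    print(select_frames(100, 4))  # [12, 37, 62, 87]
    frames = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]  # stand-ins for CNN features
    for name, combine in COMBINERS.items():
        print(name, combine(frames))
    weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # 3 hypothetical locations
    bias = [0.0, 0.5, 0.0]
    print(classify_scene(frames, average_pool, weights, bias))  # 1
```

AveragePooling, MaxPooling, and Product are invariant to frame order and always yield a vector of the per-frame feature size. Flatten preserves frame order, but its output length depends on how many frames were selected, so it requires a fixed number of frames per scene.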

1. Introduction

When watching a video, people can recognize where the current scene takes place. When watching a movie or TV series, they can also recognize that a new scene is set in the same place as one they have already seen. Finally, they can understand the hierarchy of scenes. All of this underpins human comprehension of video.

The role of location identification in human scene perception has inspired research into classifying scene locations with artificial neural networks (ANNs). A more ambitious goal is to memorize previously unseen locations in a video and use this information to recognize later scenes set in the same location, assigning them the same label. In this paper, we report on ongoing research in that direction. We describe the adopted methodology and present first experimental results obtained with six different neural networks.

The remainder of the paper is structured as follows. In the next section, we discuss existing approaches to this problem. Section 3 is divided into two parts: the first describes how the data are prepared before being fed to an ANN, and the second describes the design of the ANNs used in the experiments. Finally, in Section 4, the last section before the conclusion, we present the results of the experiments with these ANNs.


