Advancements in Video Forgery Detection: Novel Methods for Object and Facial Forgery Detection Using Temporal and Spatial Analysis

30 May

Authors- Kasarla. Rajiv Reddy, Kaipa Roopesh Kumar Reddy, Sathineni. Sai Pranav Reddy

Abstract- With the advent of sophisticated yet easy-to-use video editing and forgery tools, detecting malicious editing and forgery in digital videos has become increasingly challenging, as the development of forensic tools for authenticating the integrity of digital videos has lagged behind. The work reported in this thesis explores four novel methods aimed at detecting object and facial forgeries in video: temporal-RNN, spatial-RNN, fRNN, and a lightweight 3DRNN. The temporal-RNN and spatial-RNN methods are designed for comprehensive detection of object-based forgeries: they analyse temporal and spatial features in order to detect forged frames within a video and to mark forged regions within those frames. The benefits of the proposed detectors were exhaustively verified on recent object-based forged-video datasets under various testing scenarios, showing significant improvements in detection performance and forged-region localization over existing detection methods. A frequency-based RNN (fRNN) is developed to identify facially manipulated videos (such as DeepFakes, FaceSwap and Face2Face). The shallow fRNN architecture was verified for binary and multi-class forgery detection on recent datasets; it was also benchmarked on the FaceForensics++ platform, where its binary detection performance was found to be better than that of existing machine-learning and deep-learning-based detectors. Finally, a lightweight 3DRNN is designed to detect facially manipulated videos; it combines spatial and temporal features in a unique manner to label a given video as forged. The 3DRNN architecture ensures low computational complexity in terms of the number of trainable parameters, making it a good choice for deployment on memory- and resource-constrained devices such as smartphones. The method also performed well on the low-quality, highly compressed videos commonly found on social media.
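The abstract does not detail the fRNN's frequency-domain preprocessing, but the general idea behind a frequency-based detector can be illustrated with a short sketch. The function below (its name, the 16-band radial-averaging scheme, and the grayscale-frame input are all assumptions for illustration, not the authors' implementation) computes a compact per-frame descriptor from the log-magnitude spectrum of each frame; facial manipulation methods often leave characteristic artefacts in such spectra.

```python
import numpy as np

def frequency_features(frames, n_bins=16):
    """Compact per-frame frequency descriptors (illustrative sketch only).

    frames: array of shape (T, H, W) holding T grayscale video frames.
    Returns an array of shape (T, n_bins): the radially averaged
    log-magnitude spectrum of each frame, one descriptor per frame.
    """
    T, H, W = frames.shape
    # Radial distance of every frequency bin from the spectrum centre.
    yy, xx = np.mgrid[0:H, 0:W]
    r = np.hypot(yy - H / 2, xx - W / 2)
    # Assign each spectral position to one of n_bins radial bands.
    bin_idx = np.minimum((r / r.max() * n_bins).astype(int), n_bins - 1)

    feats = np.empty((T, n_bins))
    for t in range(T):
        # Centred 2D FFT log-magnitude spectrum of the frame.
        spec = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(frames[t]))))
        # Average the spectrum within each radial band.
        for b in range(n_bins):
            feats[t, b] = spec[bin_idx == b].mean()
    return feats

# Example: 8 random 64x64 "frames" -> one 16-dimensional descriptor each.
rng = np.random.default_rng(0)
feats = frequency_features(rng.random((8, 64, 64)))
print(feats.shape)  # (8, 16)
```

In a full detector along the lines the abstract describes, a sequence of such per-frame descriptors would then be fed to a recurrent network so that both the frequency content and its evolution over time contribute to the forged/authentic decision.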

DOI: 10.61463/ijset.vol.12.issue3.179