WHY THIS MATTERS IN BRIEF

Deepfakes and fake content pose a serious threat to social stability and democracy, and they’re getting better and more weaponised.

The RSA Conference is known as the “Oscars of Cybersecurity,” and the RSAC Innovation Sandbox has become a benchmark for innovation in the cybersecurity industry. Reality Defender, established in 2021, is a startup specializing in detecting deepfakes and synthetic media. The company offers deepfake detection services across various modalities, with tools designed to identify artificially synthesized and forged text, images, videos, and audio. These solutions cater to government agencies, financial enterprises, media, and other large organizations. According to Reality Defender’s website, the company has assisted public broadcasting companies in Asian countries and multinational banks in dealing with the spread of false information and identity fraud caused by deepfakes.

Reality Defender has raised $15 million in Series A funding led by the venture capital firm DCVC, with participation from Comcast, Ex/ante, Parameter Ventures, and Nat Friedman’s AI Grant, among others.

Reality Defender offers a range of deepfake detection tools that can identify common deepfake threats such as face swaps, cloned voices, and fraudulent text. Delivered as APIs or web applications, the tools analyze submitted samples for signs of deepfake manipulation to determine whether the uploaded content is synthetic or impersonates someone.

To build a more robust, higher-precision deepfake detection system, Reality Defender combines a set of artificial intelligence detection models rather than relying on a single model. After examining each uploaded file from multiple perspectives, the system outputs a prediction probability along with visualizations of the results.
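
Reality Defender has not published how it combines its models, so what follows is only a minimal sketch of one common ensembling technique: a weighted average of each model's fake probability, taken in log-odds space. The detector names, scores, and weights below are all hypothetical.

```python
import math
from typing import Dict

def logit(p: float, eps: float = 1e-6) -> float:
    """Probability -> log-odds, clamped so 0 and 1 don't produce infinities."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def fuse_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of per-model fake probabilities, computed in log-odds space."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights for the scored models must sum to a positive value")
    z = sum(weights[name] * logit(p) for name, p in scores.items()) / total
    return 1.0 / (1.0 + math.exp(-z))   # map back to a probability

# Hypothetical outputs of three detectors for one uploaded video
scores = {"face_artifacts": 0.92, "temporal_consistency": 0.70, "gan_fingerprint": 0.40}
# Hypothetical trust weights, e.g. derived from each model's validation accuracy
weights = {"face_artifacts": 2.0, "temporal_consistency": 1.0, "gan_fingerprint": 1.0}

fused = fuse_scores(scores, weights)
print(f"P(fake) = {fused:.3f}")
```

Averaging in log-odds space lets a confident detector pull the result further than plain probability averaging would, while the weights encode how much each detector is trusted.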

Reality Defender also provides real-time deepfake screening tools that can flag potentially synthetic content as it appears. This is crucial for preventing financial fraud and information leaks in time. By embedding Reality Defender’s detection tools in their own toolkits, users can continuously check the identity of the person they are communicating with and avoid falling into identity theft traps.
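
Real-time screening usually works on per-frame (or per-chunk) scores that arrive as a stream. The sketch below assumes a hypothetical upstream detector that emits one fake probability per video frame; it smooths those scores over a short rolling window and raises an alert only when the window average crosses a threshold. It is not Reality Defender's implementation, and the window size and threshold are illustrative.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

def screen_stream(frame_scores: Iterable[float], window: int = 5,
                  threshold: float = 0.8) -> Iterator[Tuple[int, float, bool]]:
    """Yield (frame_index, rolling_mean, alert) for a live stream of per-frame fake probabilities."""
    recent = deque(maxlen=window)          # keeps only the last `window` scores
    for i, score in enumerate(frame_scores):
        recent.append(score)
        mean = sum(recent) / len(recent)
        # Require a full window so one early spike cannot raise an alert on its own
        alert = len(recent) == window and mean >= threshold
        yield i, mean, alert

# Hypothetical per-frame scores from a detector watching a video call
live = [0.1, 0.2, 0.95, 0.1, 0.2, 0.85, 0.9, 0.92, 0.88, 0.9]
for i, mean, alert in screen_stream(live, window=4, threshold=0.8):
    if alert:
        print(f"frame {i}: rolling P(fake) = {mean:.2f}, ask the caller to re-verify")
```

Here the isolated spike at frame 2 is ignored, while the sustained run of high scores from frame 5 onward triggers alerts at frames 8 and 9. A larger window trades reaction time for fewer false alarms.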

In summary, the deepfake detection tools provided by Reality Defender can identify false media content and harmful information, helping teams to guard against and respond to the threats posed by artificial intelligence-generated deepfakes.

In 2017, a Reddit user named “deepfakes” released the first deepfake algorithm, which could project the faces of well-known actors onto pornographic videos. In 2018, BuzzFeed used the FakeApp software to create a deepfake video of a speech by former President Obama. In 2019, an attacker successfully impersonated the voice of a German company’s CEO using artificial intelligence software, tricking the CEO of a British energy company into transferring over $243,000 to the attacker’s account.

Since the emergence of deepfakes, applications for creating them, such as FakeApp and FaceSwap, have proliferated on the Internet, allowing non-technical individuals to easily and inexpensively produce various types of forged videos. While deepfakes have had a positive impact in the film industry, they have also raised public concerns about the spread of fake news on social media, identity impersonation, and telecom fraud, severely undermining social trust and information order.

Deepfakes are a type of attack that uses deep learning algorithms to capture a person’s facial expressions, movements, and voice characteristics, and learn how to replace faces in images or videos and synthesize deceptively realistic voices. Deepfake content is often indistinguishable to the naked eye.

For visual forgery tasks involving images and videos, deepfake technology requires the alteration of faces, which can be done in two main ways: one is face swapping, where faces are replaced or new faces are synthesized; the other is facial modification, which involves modifying certain attributes of the original face without changing its identifying features, such as faking expressions or specific actions.

Visual deepfake technology typically uses Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs) as the foundational architectures for style transfer and face stitching, and incorporates other deep learning techniques to make the generated content more realistic and stable. Faceswap-GAN, for example, introduces a Kalman filter to eliminate the jitter caused by face swapping and a high reconstruction loss to improve the quality of the generated eye region.
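
The Kalman-filter idea can be illustrated with a generic scalar filter applied independently to each coordinate of a face bounding box. This is a minimal sketch assuming a constant-position motion model and hand-picked noise variances (`q`, `r`); it is not Faceswap-GAN's actual code or settings.

```python
from typing import List, Optional, Sequence, Tuple

class ScalarKalman:
    """1-D Kalman filter with a constant-position (random-walk) motion model.

    q: process-noise variance, how far the true value may drift per frame
    r: measurement-noise variance, how jittery the raw detections are
    """
    def __init__(self, q: float = 0.01, r: float = 4.0):
        self.q, self.r = q, r
        self.x: Optional[float] = None   # current estimate
        self.p = 1.0                     # variance of the estimate

    def update(self, z: float) -> float:
        if self.x is None:               # start from the first measurement
            self.x = z
            return self.x
        self.p += self.q                      # predict: uncertainty grows between frames
        k = self.p / (self.p + self.r)        # Kalman gain: how much to trust the new reading
        self.x += k * (z - self.x)            # correct the estimate towards the measurement
        self.p *= 1.0 - k                     # the corrected estimate is more certain
        return self.x

def smooth_boxes(boxes: Sequence[Tuple[float, float, float, float]],
                 q: float = 0.01, r: float = 4.0) -> List[Tuple[float, ...]]:
    """Smooth (x, y, w, h) face boxes with one independent filter per coordinate."""
    filters = [ScalarKalman(q, r) for _ in range(4)]
    return [tuple(f.update(v) for f, v in zip(filters, box)) for box in boxes]

# Hypothetical detector output for a face that is not moving: boxes jitter by a few pixels
raw = [(100, 50, 80, 80), (103, 48, 82, 79), (98, 51, 79, 81), (102, 49, 81, 80), (99, 50, 80, 80)]
for before, after in zip(raw, smooth_boxes(raw)):
    print(before, "->", tuple(round(v, 1) for v in after))
```

A small `q` relative to `r` tells the filter that the face moves slowly and that most frame-to-frame change is detector noise, so the smoothed boxes move far less than the raw ones.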

On the audio side, deepfake technology can perform text-to-speech (TTS) synthesis and voice conversion. TTS generates speech for a given text, while voice conversion transforms the original voice into a target voice without changing what is said. Audio deepfake technology often relies on models such as hidden Markov models, Gaussian mixture models, autoencoders, autoregressive models, and generative adversarial networks, along with techniques like spectral conversion and dilated convolution, to keep the generated audio high-fidelity.
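
Dilated convolution, used for example in WaveNet-style audio generators, spaces out a kernel's taps so that a stack of layers covers a long stretch of audio cheaply. The pure-Python sketch below is a toy illustration, not code from any particular deepfake tool: it implements a causal dilated 1-D convolution and the receptive-field formula for a stack of such layers, 1 + sum over layers of (k - 1) * d, for kernel size k and dilation d.

```python
from typing import List, Sequence

def dilated_causal_conv(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """1-D causal convolution: out[t] = sum_i kernel[i] * x[t - i*dilation].

    Samples before the start of the signal are treated as zeros, so out[t]
    never depends on future inputs."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input samples (current and past) that can influence one output of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Doubling the dilation per layer (1, 2, 4, ..., 512) grows context exponentially with depth
print(receptive_field(2, [2 ** i for i in range(10)]))   # 1024

# Push an impulse through three layers (dilations 1, 2, 4): it spreads over exactly 8 samples
h = [1.0] + [0.0] * 15
for d in [1, 2, 4]:
    h = dilated_causal_conv(h, kernel=[0.5, 0.5], dilation=d)
print(h)
```

With kernel size 2 and dilations doubling from 1 to 512, ten layers see 1,024 samples, whereas ten undilated layers would see only 11.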

In recent years, breakthroughs in large generative models have also become a catalyst for deepfake technology. They enable more accurate simulation of human facial expressions, body movements, and voice tones, making deepfake content more realistic and harder to distinguish, and allowing it to spread more widely and less controllably.

The most common techniques for detecting deepfakes in images and videos are based on the detection of feature differences before and after forgery or the detection of features specific to GAN-generated images.

The first approach typically generalizes better. It uses deep learning models to extract the features that change when content is forged, for example by detecting artifacts in images or single video frames, or by examining temporal features across video frames.
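
Production detectors learn temporal features between frames with deep models, but the underlying idea can be shown with a simple hand-crafted stand-in: measure how much per-frame features change between consecutive frames, and flag frames where that change is a statistical outlier. In this sketch the per-frame feature vectors are hypothetical (in practice they might come from a face-embedding network), and the modified z-score threshold of 3.5 is a conventional choice, not a tuned value.

```python
import math
import statistics
from typing import List, Sequence

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def flag_temporal_jumps(frames: List[Sequence[float]], z_thresh: float = 3.5) -> List[int]:
    """Return indices of frames whose change from the previous frame is an outlier.

    Uses the modified z-score 0.6745 * (d - median) / MAD over consecutive-frame
    cosine distances, so the glitches themselves barely shift the baseline."""
    diffs = [cosine_distance(frames[i - 1], frames[i]) for i in range(1, len(frames))]
    med = statistics.median(diffs)
    mad = statistics.median([abs(d - med) for d in diffs]) or 1e-12   # avoid dividing by zero
    return [i for i, d in enumerate(diffs, start=1) if 0.6745 * (d - med) / mad > z_thresh]

# Hypothetical per-frame feature vectors: slow, smooth drift with one inconsistent frame
frames = [[1.0, 0.01 * i, 0.0] for i in range(10)]
frames[5] = [0.2, 1.0, 0.3]
print(flag_temporal_jumps(frames))   # [5, 6]
```

Both frame 5 and frame 6 are flagged because the inconsistent frame creates a jump into it and another back out of it. Using the median and MAD rather than the mean and standard deviation keeps the glitches from inflating the baseline they are measured against.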

The second approach is more targeted. GAN models used in deepfakes process color differently from cameras and produce different color-space statistics, so they often leave stable model fingerprints in the images they generate. Detecting features specific to GAN-generated images has therefore become a separate line of research.
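
The text above focuses on color-statistics fingerprints. A closely related and widely used fingerprint lives in the frequency domain, where the up-sampling layers of many GANs leave periodic, high-frequency artifacts; the NumPy sketch below swaps in that frequency-domain cue because it is easy to show compactly. It computes an azimuthally averaged log power spectrum, a profile often fed to a fingerprint classifier, and compares a smooth stand-in for a camera image with the same image plus a faint checkerboard standing in for up-sampling artifacts. Both images are synthetic and purely illustrative.

```python
import numpy as np

def radial_power_spectrum(gray: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Azimuthally averaged log power spectrum of a 2-D grayscale image.

    Bin 0 holds the lowest spatial frequencies, the last bin the highest."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray - gray.mean()))
    log_power = np.log1p(np.abs(spectrum) ** 2)
    h, w = gray.shape
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h / 2, xx - w / 2)        # distance from the zero-frequency centre
    bins = np.minimum((radius / radius.max() * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bins.ravel(), weights=log_power.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return sums / np.maximum(counts, 1)

rng = np.random.default_rng(0)
noise = rng.normal(size=(64, 64))
# Smooth stand-in for a camera image: a 2x2 circular box blur suppresses the highest frequencies
real = (noise + np.roll(noise, 1, axis=0) + np.roll(noise, 1, axis=1)
        + np.roll(noise, (1, 1), axis=(0, 1))) / 4
# Stand-in for a GAN up-sampling artifact: a faint pixel-level checkerboard
checker = 0.2 * (-1.0) ** np.indices((64, 64)).sum(axis=0)
fake = real + checker

profile_real = radial_power_spectrum(real)
profile_fake = radial_power_spectrum(fake)
print("highest-frequency bin:", profile_real[-1].round(2), "vs", profile_fake[-1].round(2))
```

The two profiles match everywhere except the highest-frequency bin, which is exactly where the checkerboard's energy sits.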

For audio, which is a one-dimensional signal, existing detection schemes mainly look for the distinctive noise and forgery traces left in fake speech by analyzing biometric characteristics such as speech rate, voiceprint, and spectral distribution. Because a single feature rarely captures every clue of forgery, some state-of-the-art (SOTA) detection schemes combine multiple complementary features to make the detection algorithm more robust.
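
As a rough illustration of feature fusion, the NumPy sketch below computes four simple, complementary cues per audio frame (log energy, zero-crossing rate, spectral centroid, and spectral flatness), summarises each over time, and concatenates them into one vector for a downstream classifier. These generic features are stand-ins chosen for brevity; real SOTA systems typically use richer representations such as MFCCs or learned embeddings, and the frame and hop sizes here are assumptions.

```python
import numpy as np

def fused_features(x: np.ndarray, sr: int, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Return an 8-D vector: mean and std over time of four per-frame cues."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    eps = 1e-10

    log_energy = np.log(np.sum(frames ** 2, axis=1) + eps)                 # loudness
    zcr = np.mean(np.diff(np.signbit(frames), axis=1), axis=1)             # fraction of sign changes
    centroid = (spec @ freqs) / (spec.sum(axis=1) + eps)                   # spectral "brightness" in Hz
    flatness = np.exp(np.mean(np.log(spec + eps), axis=1)) / (spec.mean(axis=1) + eps)  # tonal vs noise-like

    per_frame = np.stack([log_energy, zcr, centroid, flatness], axis=1)   # shape (n_frames, 4)
    return np.concatenate([per_frame.mean(axis=0), per_frame.std(axis=0)])

sr = 16_000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 220 * t)                  # clean, tonal signal
noise = np.random.default_rng(0).normal(0, 0.3, sr)       # broadband noise
print(fused_features(tone, sr).round(3))
print(fused_features(noise, sr).round(3))
```

No single cue is decisive, but together they describe a signal from several angles, which is the point of fusing complementary features before classification.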

Although research in deepfake detection has made progress, several urgent issues remain as deepfake technology rapidly advances. One is improving the generalization and robustness of detection algorithms. Current detection methods fail to learn the features that different types of forgery have in common and depend too heavily on the distribution and characteristics of their training data, so their detection performance drops significantly when they face forgery types outside the training set or unknown forged data. It is therefore necessary to diversify the training data while strengthening the algorithms' attention to, and extraction of, a wider range of features, such as changes in lighting, facial texture, changes in facial contours, the position of facial features, voiceprint, phonemes, and speech rate.

Another is advancing research on multimodal detection algorithms. Most existing detection solutions focus on only one type of forged sample (image or voice), an approach known as monomodal forgery detection. In real life, however, deepfakes may forge video and audio at the same time to look more convincing, which makes research into detecting multimodal forged data urgent. Accuracy against complex real-world deepfakes can also be improved by integrating several detection algorithms, similar to what Reality Defender does.
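
One simple multimodal cue is audio-visual synchrony: in genuine footage, how far the mouth opens tends to track how loud the speech is, while a forged video or a swapped audio track can break that link. The NumPy sketch below is a hand-crafted illustration rather than any vendor's method. It searches a few frame offsets for the best correlation between a per-frame mouth-opening signal and the audio loudness envelope. Both inputs are assumed to be precomputed at the video frame rate (for example from a face-landmark detector and a short-time energy measure), and the toy data is synthetic.

```python
import numpy as np

def av_sync_score(mouth_open: np.ndarray, audio_env: np.ndarray, max_lag: int = 5):
    """Return (best Pearson correlation, lag in frames) between the two signals.

    A positive lag means the mouth signal trails the audio by `lag` frames.
    Both arrays must have the same length and be sampled at the video frame rate."""
    best_r, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            m, a = mouth_open[lag:], audio_env[: len(audio_env) - lag]
        else:
            m, a = mouth_open[:lag], audio_env[-lag:]
        r = float(np.corrcoef(m, a)[0, 1])
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag

rng = np.random.default_rng(1)
smooth = np.ones(5) / 5
audio = np.convolve(rng.random(200), smooth, mode="same")           # toy loudness envelope
genuine_mouth = np.roll(audio, 2) + rng.normal(0, 0.01, 200)        # follows the audio, 2 frames late
forged_mouth = np.convolve(rng.random(200), smooth, mode="same")    # unrelated motion

print(av_sync_score(genuine_mouth, audio))
print(av_sync_score(forged_mouth, audio))
```

A detector would treat a low best-case correlation, or an implausibly large lag, as one piece of evidence to combine with others rather than as proof of forgery.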

The explosive growth of generative models such as ChatGPT and Sora has given deepfake technology new weapons and made detection harder. Besides Reality Defender, many tech giants and startups are investing in research and tool development for AI deepfake detection, including Intel with FakeCatcher, the deepfake detection startup Optic, and Blackbird.ai, which was established in 2014 and is dedicated to identifying artificially altered data.

Deepfake detection products can be expected to focus more on real-time and multimodal detection, improving detection reliability and accuracy.

Regulations and industry standards will also be introduced to address increasingly mature and complex deepfake technology, creating a favorable market environment for deepfake detection products.

The post Deepfake detection startups clean up at RSAC 2024 awards ceremony appeared first on Matthew Griffin | Keynote Speaker & Master Futurist.