On-device real-time few-shot face stylization – Google Research Blog

Posted by Haolin Jia, Software Engineer, and Qifei Wang, Senior Software Engineer, Core ML

In recent years, we have witnessed rising interest among consumers and researchers in integrated augmented reality (AR) experiences that use real-time face feature generation and editing functions in mobile applications, including short videos, virtual reality, and gaming. As a result, there is a growing demand for lightweight yet high-quality face generation and editing models, which are often based on generative adversarial network (GAN) techniques. However, the majority of GAN models suffer from high computational complexity and the need for a large training dataset. In addition, it is important to employ GAN models responsibly.

In this post, we introduce MediaPipe FaceStylizer, an efficient design for few-shot face stylization that addresses the aforementioned model complexity and data efficiency challenges while being guided by Google’s responsible AI Principles. The model consists of a face generator and a face encoder used as GAN inversion to map the image into latent code for the generator. We introduce a mobile-friendly synthesis network for the face generator with an auxiliary head that converts features to RGB at each level of the generator to generate high-quality images from coarse to fine granularities. We also carefully designed the loss functions for the aforementioned auxiliary heads and combined them with the common GAN loss functions to distill the student generator from the teacher StyleGAN model, resulting in a lightweight model that maintains high generation quality. The proposed solution is available in open source through MediaPipe. Users can fine-tune the generator to learn a style from one or a few images using MediaPipe Model Maker, and deploy to on-device face stylization applications with the customized model using MediaPipe FaceStylizer.

Few-shot on-device face stylization

An end-to-end pipeline

Our goal is to build a pipeline that lets users adapt the MediaPipe FaceStylizer to different styles by fine-tuning the model with a few examples. To enable such a face stylization pipeline, we built the pipeline with a GAN inversion encoder and an efficient face generator model (see below). The encoder and generator pipeline can then be adapted to different styles via a few-shot learning process. The user first sends a single or a few similar samples of the style images to MediaPipe Model Maker to fine-tune the model. The fine-tuning process freezes the encoder module and only fine-tunes the generator. The training process samples multiple latent codes close to the encoding output of the input style images as the input to the generator. The generator is then trained to reconstruct an image of a person’s face in the style of the input style image by optimizing a joint adversarial loss function that also accounts for style and content. With such a fine-tuning process, the MediaPipe FaceStylizer can adapt to the customized style, which approximates the user’s input. It can then be applied to stylize test images of real human faces.
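The latent sampling step can be pictured with a minimal sketch. It assumes the neighbourhood is an isotropic Gaussian around the encoder output; the post only says the codes are "close to" the encoding, so the noise model, the `sigma` value, and the 512-dimensional latent size are all illustrative assumptions, and `sample_latents_near` is a hypothetical helper name.

```python
import numpy as np

def sample_latents_near(style_code, num_samples=8, sigma=0.1, seed=0):
    """Draw latent codes in a Gaussian neighbourhood of an encoded style image.

    style_code: 1-D array standing in for the frozen encoder's output.
    sigma: assumed neighbourhood radius (not a value from the post).
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(num_samples, style_code.shape[0]))
    # Broadcasting adds a different noise vector to each copy of the code.
    return style_code[None, :] + noise

# Placeholder for encoder(style_image); the size 512 is an assumption.
style_code = np.zeros(512)
latents = sample_latents_near(style_code, num_samples=4)
print(latents.shape)
```

Each row of `latents` would then be fed to the generator during fine-tuning, while the encoder weights stay fixed.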

Generator: BlazeStyleGAN

The StyleGAN model family has been widely adopted for face generation and various face editing tasks. To support efficient on-device face generation, we based the design of our generator on StyleGAN. This generator, which we call BlazeStyleGAN, is similar to StyleGAN in that it also contains a mapping network and a synthesis network. However, since the synthesis network of StyleGAN is the major contributor to the model’s high computation complexity, we designed and employed a more efficient synthesis network. The improved efficiency and generation quality are achieved by:

Reducing the latent feature dimension in the synthesis network to a quarter of the resolution of the counterpart layers in the teacher StyleGAN,

Designing multiple auxiliary heads to transform the downscaled feature to the image domain to form a coarse-to-fine image pyramid to evaluate the perceptual quality of the reconstruction, and

Skipping all but the final auxiliary head at inference time.

With the newly designed architecture, we train the BlazeStyleGAN model by distilling it from a teacher StyleGAN model. We use a multi-scale perceptual loss and an adversarial loss in the distillation to transfer the high-fidelity generation capability from the teacher model to the student BlazeStyleGAN model and also to mitigate the artifacts from the teacher model.
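To make the multi-scale comparison concrete, here is a rough sketch of scoring a coarse-to-fine image pyramid against a teacher image. It is not the loss used for BlazeStyleGAN: plain pixel-space L1 stands in for a learned perceptual metric, the adversarial term is omitted, and all function names are hypothetical.

```python
import numpy as np

def downsample2x(img):
    """Halve height and width by 2x2 average pooling (H and W must be even)."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def image_pyramid(img, levels):
    """Return [full-res, 1/2, 1/4, ...] with `levels` entries."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid

def multiscale_distance(student_outputs, teacher_img):
    """Sum of per-level mean absolute differences.

    student_outputs: images from the auxiliary heads, finest first.
    Pixel-space L1 is a stand-in for a perceptual distance.
    """
    teacher_pyr = image_pyramid(teacher_img, len(student_outputs))
    return sum(float(np.abs(s - t).mean())
               for s, t in zip(student_outputs, teacher_pyr))

rng = np.random.default_rng(0)
teacher = rng.random((64, 64, 3))
# A student that matches the teacher at every scale scores zero.
perfect_student = image_pyramid(teacher, 3)
print(multiscale_distance(perfect_student, teacher))
```

Because every pyramid level contributes a term, errors in coarse structure and in fine detail are both penalized, which is the motivation for attaching a head at each resolution during training.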

More details of the model architecture and training scheme can be found in our paper.

Visual comparison between face samples generated by StyleGAN and BlazeStyleGAN. The images in the first row are generated by the teacher StyleGAN. The images in the second row are generated by the student BlazeStyleGAN. The faces generated by BlazeStyleGAN have similar visual quality to the images generated by the teacher model. Some results demonstrate that the student BlazeStyleGAN suppresses the artifacts from the teacher model in the distillation.

In the above figure, we demonstrate some sample results of our BlazeStyleGAN. Compared with the face images generated by the teacher StyleGAN model (top row), the images generated by the student BlazeStyleGAN (bottom row) maintain high visual quality and further reduce artifacts produced by the teacher, thanks to the loss function design in our distillation.

An encoder for efficient GAN inversion

To support image-to-image stylization, we also introduced an efficient GAN inversion as the encoder to map input images to the latent space of the generator. The encoder is defined by a MobileNet V2 backbone and trained with natural face images. The loss is defined as a combination of image perceptual quality loss, which measures the content difference, style similarity and embedding distance, as well as the L1 loss between the input images and reconstructed images.
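One way to read that combination is as a weighted sum. The form below is only an illustrative assumption: the post does not give the weights or the exact definition of each term, so the symbols $\lambda_c, \lambda_s, \lambda_e, \lambda_1$, the encoder $E$, and the generator $G$ are notation introduced here.

```latex
\mathcal{L}_{\text{enc}}(x) =
    \lambda_c \, \mathcal{L}_{\text{content}}\bigl(x, G(E(x))\bigr)
  + \lambda_s \, \mathcal{L}_{\text{style}}\bigl(x, G(E(x))\bigr)
  + \lambda_e \, \mathcal{L}_{\text{embed}}\bigl(x, G(E(x))\bigr)
  + \lambda_1 \, \bigl\lVert x - G(E(x)) \bigr\rVert_1
```

Here $x$ is an input face image and $G(E(x))$ is its reconstruction; the first three terms together make up the perceptual quality loss described above.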

On-device performance

We documented model complexities in terms of parameter numbers and computing FLOPs in the following table. Compared to the teacher StyleGAN (33.2M parameters), BlazeStyleGAN (generator) significantly reduces the model complexity, with only 2.01M parameters and 1.28G FLOPs for output resolution 256×256. Compared to StyleGAN-1024 (generating image size of 1024×1024), BlazeStyleGAN-1024 can reduce both model size and computation complexity by 95% with no notable quality difference and can even suppress the artifacts from the teacher StyleGAN model.

Model            Image Size   #Params (M)   FLOPs (G)
StyleGAN         1024         33.17         74.3
BlazeStyleGAN    1024         2.07          4.70
BlazeStyleGAN    512          2.05          1.57
BlazeStyleGAN    256          2.01          1.28
Encoder          256          1.44          0.60

Model complexity measured by parameter numbers and FLOPs.
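The size of the reduction at 1024×1024 can be checked directly from the table. The short calculation below uses only the StyleGAN and BlazeStyleGAN-1024 rows; `reduction_percent` is a helper name introduced for this sketch.

```python
def reduction_percent(teacher, student):
    """Percentage by which `student` is smaller than `teacher`."""
    return 100.0 * (1.0 - student / teacher)

# Values from the 1024x1024 rows of the table.
params = reduction_percent(33.17, 2.07)   # parameters, in millions
flops = reduction_percent(74.3, 4.70)     # FLOPs, in billions

print(f"{params:.1f}% fewer parameters, {flops:.1f}% fewer FLOPs")
```

With the table values as listed, both reductions come out a little under 94%, close to the 95% figure quoted in the text.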

We benchmarked the inference time of the MediaPipe FaceStylizer on various high-end mobile devices and show the results in the table below. From the results, both BlazeStyleGAN-256 and BlazeStyleGAN-512 achieved real-time performance on all GPU devices. The model can run in less than 10 ms on a high-end phone’s GPU. BlazeStyleGAN-256 can also achieve real-time performance on the iOS devices’ CPU.

Device               BlazeStyleGAN-256 (ms)   Encoder-256 (ms)
iPhone 11            12.14                    11.48
iPhone 12            11.99                    12.25
iPhone 13 Pro        7.22                     5.41
Pixel 6              12.24                    11.23
Samsung Galaxy S10   17.01                    12.70
Samsung Galaxy S20   8.95                     8.20

Latency benchmark of the BlazeStyleGAN, face encoder, and the end-to-end pipeline on various mobile devices.
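The per-model latencies can be turned into a rough frame-rate estimate. The sketch below assumes the encoder and generator run back to back with no other overhead (pre-processing, post-processing, and data transfer are ignored), which is a simplification and not something the benchmark states.

```python
# (generator ms, encoder ms) per device, copied from the table.
latency_ms = {
    "iPhone 11": (12.14, 11.48),
    "iPhone 12": (11.99, 12.25),
    "iPhone 13 Pro": (7.22, 5.41),
    "Pixel 6": (12.24, 11.23),
    "Samsung Galaxy S10": (17.01, 12.70),
    "Samsung Galaxy S20": (8.95, 8.20),
}

def combined_fps(generator_ms, encoder_ms):
    """Frames per second if the two models run sequentially, nothing else."""
    return 1000.0 / (generator_ms + encoder_ms)

for device, (gen, enc) in latency_ms.items():
    print(f"{device}: {gen + enc:.2f} ms per frame, "
          f"{combined_fps(gen, enc):.1f} fps")

slowest_fps = min(combined_fps(g, e) for g, e in latency_ms.values())
```

Under that assumption, even the slowest listed device (Samsung Galaxy S10, 29.71 ms combined) stays above 30 frames per second.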

Fairness evaluation

The model has been trained with a highly diverse dataset of human faces and is expected to be fair to different human faces. The fairness evaluation demonstrates that the model performs well and is balanced across gender, skin tone, and age.

Face stylization visualization

Some face stylization results are demonstrated in the following figure. The images in the top row (in orange boxes) represent the style images used to fine-tune the model. The images in the left column (in the green boxes) are the natural face images used for testing. The 2×4 matrix of images represents the output of the MediaPipe FaceStylizer, which blends the natural faces in the left-most column with the corresponding face styles in the top row. The results demonstrate that our solution can achieve high-quality face stylization for several popular styles.

Sample results of our MediaPipe FaceStylizer.

MediaPipe Solutions

The MediaPipe FaceStylizer is going to be released to public users in MediaPipe Solutions. Users can leverage MediaPipe Model Maker to train a customized face stylization model using their own style images. After training, the exported bundle of TFLite model files can be deployed to applications across platforms (Android, iOS, Web, Python, etc.) using the MediaPipe Tasks FaceStylizer API in just a few lines of code.
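As an illustration of the Python deployment path, the sketch below wraps the stylizer in a small helper. It assumes the class and method names of the published `mediapipe` Python package (`vision.FaceStylizer`, `FaceStylizerOptions`, `stylize`); `stylize_file` and both file paths are hypothetical, and the exact API should be checked against the installed MediaPipe version.

```python
def stylize_file(model_path, image_path):
    """Stylize one image file.

    Returns an RGB numpy array, or None if no face is detected.
    model_path: path to an exported face stylizer model bundle (hypothetical).
    """
    # Imported lazily so defining this helper does not require mediapipe.
    import mediapipe as mp
    from mediapipe.tasks import python
    from mediapipe.tasks.python import vision

    options = vision.FaceStylizerOptions(
        base_options=python.BaseOptions(model_asset_path=model_path)
    )
    with vision.FaceStylizer.create_from_options(options) as stylizer:
        image = mp.Image.create_from_file(image_path)
        result = stylizer.stylize(image)
        # Copy the pixels out before the stylizer is closed.
        return None if result is None else result.numpy_view().copy()

# Example call (paths are placeholders):
# stylized = stylize_file("face_stylizer.task", "portrait.jpg")
```

The same model bundle is intended to be reusable from the Android, iOS, and Web task APIs; only the Python variant is sketched here.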

Acknowledgements

This work was made possible through a collaboration spanning several teams across Google. We’d like to acknowledge contributions from Omer Tov, Yang Zhao, Andrey Vakunov, Fei Deng, Ariel Ephrat, Inbar Mosseri, Lu Wang, Chuo-Ling Chang, Tingbo Hou, and Matthias Grundmann.
