bnew

PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°​

Sizhe An, Hongyi Xu, Yichun Shi, Guoxian Song, Umit Y. Ogras, Linjie Luo
University of Wisconsin-Madison, ByteDance Inc.
Paper · arXiv · Video · Code


[GIF: PanoHead overview]


[GIF: PanoHead single-image inversion]

Abstract​

Synthesis and reconstruction of 3D human heads have gained increasing interest in computer vision and computer graphics recently. Existing state-of-the-art 3D generative adversarial networks (GANs) for 3D human head synthesis are either limited to near-frontal views or struggle to preserve 3D consistency at large view angles. We propose PanoHead, the first 3D-aware generative model that enables high-quality, view-consistent image synthesis of full heads in 360° with diverse appearance and detailed geometry, using only in-the-wild unstructured images for training. At its core, we lift the representational power of recent 3D GANs and bridge the data-alignment gap when training from in-the-wild images with widely distributed views. Specifically, we propose a novel two-stage self-adaptive image alignment for robust 3D GAN training. We further introduce a tri-grid neural volume representation that effectively addresses the front-face and back-head feature entanglement rooted in the widely adopted tri-plane formulation. Our method instills prior knowledge of 2D image segmentation into adversarial learning of 3D neural scene structures, enabling composable head synthesis against diverse backgrounds. Benefiting from these designs, our method significantly outperforms previous 3D GANs, generating high-quality 3D heads with accurate geometry and diverse appearances, even with long wavy and afro hairstyles, renderable from arbitrary poses. Furthermore, we show that our system can reconstruct full 3D heads from single input images for personalized, realistic 3D avatars.
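The key architectural change is easy to picture in code. A tri-plane stores three 2D feature planes, so a point at the back of the head and a point on the face can project to the same plane location; a tri-grid gives each plane a small stack of depth layers to separate them. Below is a minimal sketch of tri-grid feature sampling, assuming a PyTorch `(3, C, D, H, W)` layout, bilinear sampling, and feature concatenation; these are illustrative choices, not the paper's exact code (a tri-plane is the special case `D == 1`).

```python
import torch
import torch.nn.functional as F

def sample_trigrid(trigrid, xyz):
    """Sample features for 3D points from a tri-grid volume.

    trigrid: (3, C, D, H, W) -- three axis-aligned feature grids; a
             tri-plane is the special case D == 1.
    xyz:     (N, 3) coordinates in [-1, 1].

    Returns (N, 3*C) features, one sample per grid, concatenated.
    Axis conventions and the aggregation (concat vs. sum) are assumptions.
    """
    feats = []
    # For each grid, two coordinates index the plane (W, H) and the
    # remaining coordinate selects among the D depth layers.
    axis_perm = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]  # (x, y, z) per grid
    for g, (u, v, w) in enumerate(axis_perm):
        coords = torch.stack([xyz[:, u], xyz[:, v], xyz[:, w]], dim=-1)
        # grid_sample over a 5D volume: input (1, C, D, H, W),
        # grid (1, 1, 1, N, 3) with coords ordered (x=W, y=H, z=D).
        grid = coords.view(1, 1, 1, -1, 3)
        sampled = F.grid_sample(
            trigrid[g : g + 1], grid,
            mode="bilinear", align_corners=False,
        )  # -> (1, C, 1, 1, N)
        feats.append(sampled.view(trigrid.shape[1], -1).t())  # (N, C)
    return torch.cat(feats, dim=-1)
```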
 

bnew

CloneCleaner​

An extension for Automatic1111 that works around Stable Diffusion's "clone problem": it automatically modifies your prompts with random names, nationalities, hair styles, and hair colors to create more variation in the people it generates.
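The core idea is just seeded random substitution into the prompt before generation. Here is a minimal sketch of that mechanism; the word pools and the `declone_prompt` helper are hypothetical stand-ins, and the actual extension ships much larger curated lists and hooks into A1111's prompt pipeline.

```python
import random

# Hypothetical word pools for illustration only.
NAMES = ["Aiko", "Mateo", "Priya", "Lena", "Kwame"]
NATIONALITIES = ["Japanese", "Mexican", "Indian", "German", "Ghanaian"]
HAIR_STYLES = ["short curly", "long straight", "braided", "buzzcut"]
HAIR_COLORS = ["black", "brown", "auburn", "blonde"]

def declone_prompt(prompt, seed=None):
    """Prepend randomized identity tokens to fight same-face outputs."""
    rng = random.Random(seed)  # seedable, so a batch stays reproducible
    identity = (
        f"{rng.choice(NAMES)}, a {rng.choice(NATIONALITIES)} person "
        f"with {rng.choice(HAIR_STYLES)} {rng.choice(HAIR_COLORS)} hair"
    )
    return f"{identity}, {prompt}"

print(declone_prompt("portrait photo, 85mm, studio lighting", seed=42))
```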
 

bnew

[Image: MosaicML and Databricks team up]

by Naveen Rao and Hanlin Tang on June 26, 2023

MosaicML Agrees to Join Databricks to Power Generative AI for All​

Together with Databricks, we can bring our customers and community to the forefront of AI faster than ever before.
We are excited to announce that MosaicML has agreed to join Databricks to further our vision of making custom AI model development available to any organization.

We started MosaicML to solve the hard engineering and research problems necessary to make large scale neural network training and inference more accessible to everyone. With the recent generative AI wave, this mission has taken center stage. We fundamentally believe in a better world where everyone is empowered to train their own models, imbued with their own data, wisdom, and creativity, rather than have this capability centralized in a few generic models.

When Ali, Patrick, and the other Databricks co-founders reached out about a partnership, we immediately recognized them as kindred spirits: researchers-turned-entrepreneurs sharing a similar mission. Their strong company culture and focus on engineering mirrored what we thought a grown-up MosaicML would be.

Today, we’re excited to announce that MosaicML has signed an agreement to join Databricks to create a leading generative AI platform. The transaction is subject to certain customary closing conditions and regulatory clearances, and the companies will remain independent until those reviews are complete, but we are excited about what we can do together with Databricks when the transaction closes.

Our flagship products will continue to grow. To our current customers and those on our long waitlist: this partnership will only help us serve you faster! MosaicML training, inference, and our MPT family of foundation models are already powering generative AI for enterprises and developers around the world, and together with Databricks, we look forward to going bigger with all of you.

Generative AI is at an inflection point. Will the future rely mostly on large generic models owned by a few? Or will we witness a true Cambrian explosion of custom AI models built by many developers and companies from every corner of the world? MosaicML's expertise in generative AI software infrastructure, model training, and model deployment, combined with Databricks' customer reach and engineering capacity, will allow us to tip the scales in favor of the many. We look forward to continuing this journey together with the AI community. As always, please join us in conversation on Twitter, LinkedIn, or in our community forum.

We’d like to thank our board members Matt Ocko at DCVC, Shahin Farshchi at Lux Capital, and Peter Barrett at Playground Global, as well as all our investors who have supported us through our journey.

Let’s keep going!
 

bnew

NUWA-XL​


NUWA-XL is a cutting-edge multimodal generative model with the remarkable ability to produce extremely long videos from provided scripts in a “coarse-to-fine” process.

Long Video​


Given a script as the prompt, NUWA-XL can generate an extremely long video that conforms to it in a “coarse-to-fine” process.


NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation​

Shengming Yin, Chenfei Wu, Huan Yang, Jianfeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan
In this paper, we propose NUWA-XL, a novel Diffusion over Diffusion architecture for eXtremely Long video generation. Most current work generates long videos segment by segment sequentially, which typically leads to a gap between training on short videos and inferring long videos, and the sequential generation is inefficient. Instead, our approach adopts a "coarse-to-fine" process, in which the video can be generated in parallel at the same granularity. A global diffusion model is applied to generate the keyframes across the entire time range, and then local diffusion models recursively fill in the content between nearby frames. This simple yet effective strategy allows us to train directly on long videos (3376 frames) to reduce the training-inference gap, and makes it possible to generate all segments in parallel. To evaluate our model, we build the FlintstonesHD dataset, a new benchmark for long video generation. Experiments show that our model not only generates high-quality long videos with both global and local coherence, but also decreases the average inference time from 7.55 min to 26 s (by 94.26%) on the same hardware when generating 1024 frames. The homepage link is this https URL
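The recursion the abstract describes is simple to sketch: a global model drafts sparse keyframes over the whole timeline, then a local model infills every gap, and each infill level densifies the video further. The code below is a minimal, runnable sketch of that scheduling logic only; `global_keyframes` and `local_infill` are hypothetical stubs standing in for the actual diffusion models.

```python
from typing import List

Frame = str  # stand-in for an image tensor

def global_keyframes(script: str, n: int) -> List[Frame]:
    # Hypothetical global diffusion: n keyframes spanning the whole video.
    return [f"key({script})[{i}]" for i in range(n)]

def local_infill(a: Frame, b: Frame, n: int) -> List[Frame]:
    # Hypothetical local diffusion: n new frames between keyframes a and b.
    return [f"mid({a}->{b})[{i}]" for i in range(n)]

def generate(script: str, keyframes: int, per_gap: int, depth: int) -> List[Frame]:
    frames = global_keyframes(script, keyframes)
    for _ in range(depth):  # each level densifies the timeline
        dense: List[Frame] = []
        # Every gap is independent of the others, which is why a real
        # system can run all local_infill calls of a level in parallel.
        for a, b in zip(frames, frames[1:]):
            dense += [a] + local_infill(a, b, per_gap)
        dense.append(frames[-1])
        frames = dense
    return frames

# 5 keyframes, 3 infills per gap, 3 levels -> 257 frames.
print(len(generate("Fred walks to work", keyframes=5, per_gap=3, depth=3)))
```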

 

bnew

my goodness! it opened its mouth. :mindblown:


edit:

whoa


[GIF: DragGAN demo]


project page:




Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold​

About​

Official Code for DragGAN (SIGGRAPH 2023)
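The title describes the technique: handle points on the image are dragged toward target points by optimizing the generator's latent code so the image content follows. Below is a heavily simplified sketch of that motion-supervision loop, assuming a toy stand-in backbone `G` and pixel-space points; the real method optimizes StyleGAN2 latents and re-locates handles by nearest-neighbor feature search, which is only stubbed here.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for a GAN feature backbone: latent -> (1, 32, 16, 16) features.
G = torch.nn.Sequential(
    torch.nn.Linear(64, 32 * 16 * 16),
    torch.nn.Unflatten(1, (32, 16, 16)),
)

def feat_at(feats, p):
    """Bilinear feature lookup at a (row, col) point given in pixels."""
    h, w = feats.shape[-2:]
    x = 2 * p[1] / (w - 1) - 1  # grid_sample expects x, y in [-1, 1]
    y = 2 * p[0] / (h - 1) - 1
    grid = torch.stack([x, y]).view(1, 1, 1, 2)
    return F.grid_sample(feats, grid, align_corners=True).view(-1)

w_lat = torch.randn(1, 64, requires_grad=True)
opt = torch.optim.Adam([w_lat], lr=2e-3)
handle = torch.tensor([4.0, 4.0])    # point to drag
target = torch.tensor([10.0, 12.0])  # where it should end up

for step in range(100):
    feats = G(w_lat)
    d = target - handle
    d = d / (d.norm() + 1e-8)  # unit step toward the target
    # Motion supervision: the feature one step ahead of the handle should
    # match the (detached) feature at the handle, dragging content along d.
    loss = F.l1_loss(feat_at(feats, handle + d), feat_at(feats, handle).detach())
    opt.zero_grad(); loss.backward(); opt.step()
    # Real DragGAN re-locates the handle by nearest-neighbor feature search;
    # here we simply advance it one unit step.
    handle = handle + d
    if (target - handle).norm() < 1.0:
        break
```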

DEMO:​


 

GooPunch

@bnew What do you think of George Hotz's claim that GPT-4 is a mixture of experts with 8x 220B models? That would explain all the recent Microsoft research papers on optimizing for smaller models.
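For anyone unfamiliar with the term: a mixture-of-experts (MoE) model keeps several expert feed-forward networks and routes each token to only a few of them, so total parameters can be several times a dense model's while per-token compute stays low. The sketch below is a generic top-k MoE layer for illustration only; it says nothing about GPT-4's actual (unconfirmed) architecture, and all names and sizes are made up.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, dim=512, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):  # x: (tokens, dim)
        # Top-k routing: each token uses only k of n_experts networks
        # (weights are not renormalized here; fine for a sketch).
        weights, idx = self.router(x).softmax(-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(4, 512)
print(MoELayer()(tokens).shape)  # torch.Size([4, 512])
```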
 