
1/1
🚨 Paper Alert 🚨

➡️Paper Title: EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning

🌟Few pointers from the paper

🎯Building effective imitation learning methods that enable robots to learn from limited data and still generalize across diverse real-world environments is a long-standing problem in robot learning.

🎯In this paper, the authors have proposed “EquiBot”, a robust, data-efficient, and generalizable approach for robot manipulation task learning. Their approach combines SIM(3)-equivariant neural network architectures with diffusion models.

🎯This ensures that their learned policies are invariant to changes in scale, rotation, and translation, enhancing their applicability to unseen environments while retaining the benefits of diffusion-based policy learning, such as multi-modality and robustness.

🎯They showed on a suite of 6 simulation tasks that their proposed method reduces the data requirements and improves generalization to novel scenarios.

🎯In the real world, with 10 variations of 6 mobile manipulation tasks, they showed that their method can easily generalize to novel objects and scenes after learning from just 5 minutes of human demonstrations in each task.
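
For intuition on where the SIM(3) invariance comes from, here is a minimal NumPy sketch (my own illustration, not the authors' code): a point-cloud observation is canonicalized by removing its translation, scale, and rotation, so anything predicted from the canonical input behaves identically under any similarity transform of the scene. The function names and the PCA-based choice of canonical frame are assumptions.

```python
# Minimal sketch (not the authors' code): canonicalizing a point-cloud
# observation so that downstream predictions are unaffected by the
# scale, rotation, and translation of the scene (the SIM(3) group).
import numpy as np

def canonicalize_sim3(points: np.ndarray):
    """points: (N, 3). Returns canonical points plus the
    (scale, rotation, translation) needed to map predictions back."""
    centroid = points.mean(axis=0)                     # remove translation
    centered = points - centroid
    scale = np.linalg.norm(centered, axis=1).mean()    # remove scale
    scaled = centered / (scale + 1e-8)
    # Remove rotation with PCA: align principal axes to the world axes.
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    canonical = scaled @ vt.T
    return canonical, (scale, vt, centroid)

def uncanonicalize(pred_points: np.ndarray, frame):
    """Map a prediction made in the canonical frame back to the world."""
    scale, vt, centroid = frame
    return (pred_points @ vt) * scale + centroid

if __name__ == "__main__":
    cloud = np.random.rand(512, 3)
    canon, frame = canonicalize_sim3(cloud)
    # A policy operating on `canon` sees the same input no matter how the
    # original scene was scaled, rotated, or translated.
    restored = uncanonicalize(canon, frame)
    print(np.allclose(restored, cloud, atol=1e-6))
```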

🏒Organization: @Stanford

🧙Paper Authors: @yjy0625, Zi-ang Cao, @CongyueD, @contactrika, @SongShuran, @leto__jean

1️⃣Read the Full Paper here: [2407.01479] EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning

2️⃣Project Page: EquiBot

3️⃣Code: GitHub - yjy0625/equibot: Official implementation for paper "EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning".

🎥 Be sure to watch the attached Video - Sound on 🔊🔊

🎵 Music by Zakhar Valaha from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.



1/1
🚨Paper Alert 🚨

➡️Paper Title: Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

🌟Few pointers from the paper

🎯In this paper, the authors have presented “Diffusion Forcing”, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels.

🎯They applied Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens without fully diffusing past ones.

🎯Their approach is shown to combine the strengths of next-token prediction models, such as variable-length generation, with the strengths of full-sequence diffusion models, such as the ability to guide sampling to desirable trajectories.

🎯Their method offers a range of additional capabilities, such as
⚓ rolling out sequences of continuous tokens, such as video, with lengths past the training horizon, where baselines diverge, and
⚓ new sampling and guiding schemes that uniquely profit from Diffusion Forcing's variable-horizon and causal architecture, and which lead to marked performance gains in decision-making and planning tasks.

🎯In addition to its empirical success, their method is proven to optimize a variational lower bound on the likelihoods of all subsequences of tokens drawn from the true joint distribution.
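
To make the core idea concrete, here is a minimal PyTorch-style sketch of one training step (an illustration under my own assumptions, not the released code): each token in the sequence receives its own independently sampled noise level, and a causal denoiser is trained to predict the noise for all tokens at once. The `model` interface and the linear noise schedule are placeholders.

```python
# Minimal sketch (stand-in model, toy linear schedule) of the Diffusion
# Forcing idea: every token in the sequence gets its own independently
# sampled noise level, and the model denoises all of them at once.
import torch

def diffusion_forcing_step(model, x, num_levels=1000):
    """x: (batch, seq_len, dim) clean token sequence."""
    b, t, d = x.shape
    # Independent noise level per token (the key difference from
    # full-sequence diffusion, which uses one level for the whole sequence).
    k = torch.randint(0, num_levels, (b, t), device=x.device)
    alpha_bar = 1.0 - k.float() / num_levels           # toy linear schedule
    noise = torch.randn_like(x)
    x_noisy = (alpha_bar.sqrt().unsqueeze(-1) * x
               + (1 - alpha_bar).sqrt().unsqueeze(-1) * noise)
    # The (causal) model predicts the per-token noise, conditioned on the
    # per-token noise levels.
    pred = model(x_noisy, k)
    return torch.nn.functional.mse_loss(pred, noise)

if __name__ == "__main__":
    model = lambda x, k: torch.zeros_like(x)            # stand-in denoiser
    x = torch.randn(4, 16, 32)
    print(diffusion_forcing_step(model, x).item())
```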

🏒Organization: @MIT_CSAIL

🧙Paper Authors: @BoyuanChen0, Diego Marti Monso, @du_yilun, @max_simchowitz, @RussTedrake, @vincesitzmann

1️⃣Read the Full Paper here: [2407.01392] Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

2️⃣Project Page: Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

3️⃣Code: GitHub - buoyancy99/diffusion-forcing: code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

🎵 Music by Nick Valerson from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.



1/1
🚨Paper Alert 🚨

➡️Paper Title: MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

🌟Few pointers from the paper

🎯In recent years, generative artificial intelligence has achieved significant advancements in the field of image generation, spawning a variety of applications.

🎯However, video generation still faces considerable challenges in various aspects, such as controllability, video length, and richness of details, which hinder the application and popularization of this technology.

🎯In this work, the authors have proposed a controllable video generation framework, dubbed “MimicMotion”, which can generate high-quality videos of arbitrary length with any motion guidance.

🎯Compared with previous methods, their approach has several highlights.
⚓ Firstly, with confidence-aware pose guidance, temporal smoothness can be achieved and model robustness can be enhanced with large-scale training data.
⚓ Secondly, regional loss amplification based on pose confidence significantly eases image distortion.

🎯Lastly, for generating long smooth videos, a progressive latent fusion strategy is proposed. By this means, videos of arbitrary length can be generated with acceptable resource consumption. With extensive experiments and user studies, MimicMotion demonstrates significant improvements over previous approaches in multiple aspects.
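
As a rough illustration of confidence-aware regional loss amplification (a sketch under assumed shapes and weights, not the authors' implementation): per-keypoint pose confidences are turned into a per-pixel weight map, which then scales a reconstruction loss so confidently detected regions contribute more.

```python
# Minimal sketch (not the paper's code): building a per-pixel weight map from
# keypoint confidences and using it to amplify a reconstruction loss in the
# corresponding image regions.
import torch

def confidence_weight_map(keypoints, confidences, h, w, radius=8, amplify=3.0):
    """keypoints: (K, 2) pixel coords, confidences: (K,). Returns (h, w) weights."""
    ys = torch.arange(h).view(h, 1).float()
    xs = torch.arange(w).view(1, w).float()
    weights = torch.ones(h, w)
    for (x, y), c in zip(keypoints, confidences):
        dist2 = (ys - y) ** 2 + (xs - x) ** 2
        region = (dist2 < radius ** 2).float()
        # Amplify the loss around confidently detected keypoints only.
        weights = weights + amplify * c * region
    return weights

def regional_recon_loss(pred, target, weights):
    """pred, target: (C, H, W); weights: (H, W)."""
    return ((pred - target) ** 2 * weights).mean()

if __name__ == "__main__":
    kps = torch.tensor([[32.0, 40.0], [100.0, 64.0]])
    conf = torch.tensor([0.9, 0.2])
    w_map = confidence_weight_map(kps, conf, h=128, w=128)
    loss = regional_recon_loss(torch.rand(3, 128, 128), torch.rand(3, 128, 128), w_map)
    print(loss.item())
```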

🏒Organization: @TencentGlobal , @sjtu1896

🧙Paper Authors: Yuang Zhang, Jiaxi Gu, Li-Wen Wang, Han Wang, Junqi Cheng, Yuefeng Zhu, Fangyuan Zou

1️⃣Read the Full Paper here: [2406.19680] MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

2️⃣Project Page: MimicMotion

3️⃣Code: GitHub - Tencent/MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

🎵 Music by Alexander Lisenkov from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.



1/1
🚨IROS 2024 Paper Alert 🚨

➡️Paper Title: Learning Variable Compliance Control From a Few Demonstrations for Bimanual Robot with Haptic Feedback Teleoperation System

🌟Few pointers from the paper

🎯Automating dexterous, contact-rich manipulation tasks using rigid robots is a significant challenge in robotics. Rigid robots, defined by their actuation through position commands, face issues of excessive contact forces due to their inability to adapt to contact with the environment, potentially causing damage.

🎯While compliance control schemes have been introduced to mitigate these issues by controlling forces via external sensors, they are hampered by the need for fine-tuning task-specific controller parameters. Learning from Demonstrations (LfD) offers an intuitive alternative, allowing robots to learn manipulations through observed actions.

🎯In this work, the authors have introduced a novel system to enhance the teaching of dexterous, contact-rich manipulations to rigid robots. Their system is twofold: firstly, it incorporates a teleoperation interface utilizing Virtual Reality (VR) controllers, designed to provide an intuitive and cost-effective method for task demonstration with haptic feedback.

🎯Secondly, they presented Comp-ACT (Compliance Control via Action Chunking with Transformers), a method that leverages these demonstrations to learn variable compliance control from only a few examples.

🎯Their methods have been validated across various complex contact-rich manipulation tasks using single-arm and bimanual robot setups in simulated and real-world environments, demonstrating the effectiveness of their system in teaching robots dexterous manipulations with enhanced adaptability and safety.
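
A minimal sketch of what "variable compliance" can mean in practice (hypothetical interfaces and numbers, not the authors' Comp-ACT code): the policy emits chunks of target poses together with per-step stiffness values, and a simple Cartesian impedance law converts them into commands, so the robot can be stiff in free space and soft near contact.

```python
# Minimal sketch (hypothetical interfaces, not the authors' code) of variable
# compliance: action chunks carry target poses *and* stiffness, and an
# impedance law turns them into wrench commands at runtime.
import numpy as np
from dataclasses import dataclass

@dataclass
class CompliantActionChunk:
    poses: np.ndarray        # (T, 3) target end-effector positions
    stiffness: np.ndarray    # (T,) per-step stiffness predicted by the policy

def impedance_command(target, current, velocity, kp, damping_ratio=1.0):
    """Simple Cartesian impedance law: F = Kp (x* - x) - Kd * v."""
    kd = 2.0 * damping_ratio * np.sqrt(kp)   # critically damped by default
    return kp * (target - current) - kd * velocity

if __name__ == "__main__":
    chunk = CompliantActionChunk(
        poses=np.linspace([0.3, 0.0, 0.2], [0.3, 0.0, 0.05], num=10),
        stiffness=np.linspace(800.0, 150.0, num=10),  # soften near contact
    )
    x, v = np.array([0.3, 0.0, 0.25]), np.zeros(3)
    for target, kp in zip(chunk.poses, chunk.stiffness):
        force = impedance_command(target, x, v, kp)
        # ...a real system would send `force` to the robot's impedance controller...
    print(force)
```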

🏒Organization: University of Tokyo (@UTokyo_News_en ), OMRON SINIC X Corporation (@sinicx_jp )

🧙Paper Authors: @tatsukamijo, @cambel07, @mh69543540

1️⃣Read the Full Paper here: [2406.14990] Learning Variable Compliance Control From a Few Demonstrations for Bimanual Robot with Haptic Feedback Teleoperation System

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.



1/1
🚨ECCV 2024 Paper Alert 🚨

➡️Paper Title: Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos

🌟Few pointers from the paper

🎯Monocular depth estimation in endoscopy videos can enable assistive and robotic surgery to obtain better coverage of the organ and detection of various health issues.

🎯Despite promising progress on mainstream natural-image depth estimation, these techniques perform poorly on endoscopy images due to a lack of strong geometric features and challenging illumination effects.

🎯In this paper, the authors utilized photometric cues, i.e., the light emitted from an endoscope and reflected by the surface, to improve monocular depth estimation.

🎯They first created two novel loss functions with supervised and self-supervised variants that utilize a per-pixel shading representation. They then proposed a novel depth refinement network (PPSNet) that leverages the same per-pixel shading representation.

🎯Finally, the authors also introduced teacher-student transfer learning to produce better depth maps from both synthetic data with supervision and clinical data with self-supervision. They achieved state-of-the-art results on the C3VD dataset while estimating high-quality depth maps from clinical data.
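
For intuition on the photometric cue, here is a simplified sketch (assumed pinhole intrinsics and a Lambertian surface, not the paper's exact per-pixel shading formulation): with the light co-located with the camera, rendered shading falls off with the inverse square of distance and with the cosine between the surface normal and the light direction, which ties image brightness to depth.

```python
# Simplified sketch (not the paper's formulation): per-pixel shading for a
# point light co-located with the camera, combining inverse-square falloff
# with a Lambertian cosine term. Comparing this against the observed image
# gives a photometric cue for depth.
import torch

def per_pixel_shading(depth, normals, fx, fy, cx, cy):
    """depth: (H, W), normals: (3, H, W) unit normals. Returns (H, W) shading."""
    h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h).float(), torch.arange(w).float(), indexing="ij")
    # Back-project pixels to 3D points (camera frame = light frame).
    x3 = (xs - cx) / fx * depth
    y3 = (ys - cy) / fy * depth
    points = torch.stack([x3, y3, depth], dim=0)              # (3, H, W)
    dist2 = (points ** 2).sum(dim=0)                          # squared distance to the light
    light_dir = -points / (dist2.sqrt().unsqueeze(0) + 1e-8)  # surface -> light direction
    cos_term = (normals * light_dir).sum(dim=0).clamp(min=0)
    return cos_term / (dist2 + 1e-8)                          # near-field falloff

if __name__ == "__main__":
    d = torch.ones(64, 64) * 0.05
    n = torch.zeros(3, 64, 64); n[2] = -1.0                   # normals facing the camera
    shading = per_pixel_shading(d, n, fx=80.0, fy=80.0, cx=32.0, cy=32.0)
    print(shading.mean().item())
```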

🏒Organization: Department of Computer Science, University of North Carolina at Chapel Hill (@unccs )

🧙Paper Authors: @Yahskapar, Samuel Ehrenstein, Shuxian Wang, Inbar Fried, Stephen M. Pizer, Marc Niethammer, @SenguptRoni

1️⃣Read the Full Paper here: [2403.17915] Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos

2️⃣Project Page: Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos

3️⃣Code: GitHub - Roni-Lab/PPSNet: PPSNet: Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos (ECCV, 2024)

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

🎵 Music by Denys Kyshchuk from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#ECCV2024



1/1
🚨ECCV 2024 Paper Alert 🚨

➡️Paper Title: DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

🌟Few pointers from the paper

🎯Large, pretrained latent diffusion models (LDMs) have demonstrated an extraordinary ability to generate creative content, specialize to user data through few-shot fine-tuning, and condition their output on other modalities, such as semantic maps.

🎯However, are they usable as large-scale data generators, e.g., to improve tasks in the perception stack, like semantic segmentation? The authors of this paper investigated this question in the context of autonomous driving and answered it with a resounding "yes".

🎯The authors proposed an efficient data generation pipeline termed “DGInStyle”.

🧊First, they examined the problem of specializing a pretrained LDM to semantically-controlled generation within a narrow domain.
🧊Second, they designed a Multi-resolution Latent Fusion technique to overcome the bias of LDMs towards dominant objects.
🧊Third, they proposed a Style Swap technique to endow the rich generative prior with the learned semantic control.

🎯Using DGInStyle, they generated a diverse dataset of street scenes, trained a domain-agnostic semantic segmentation model on it, and evaluated the model on multiple popular autonomous driving datasets.

🎯Their approach consistently increases the performance of several domain generalization methods, in some cases by +2.5 mIoU compared to the previous state-of-the-art method without their generative augmentation scheme.
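
A toy sketch of the multi-resolution latent fusion idea (stand-in tensors, not the authors' pipeline): a latent for the full scene is blended with crop latents generated at a higher effective resolution, so small or rare objects are not washed out by the global generation.

```python
# Toy sketch (not the authors' code): blend crop latents, generated at a
# higher effective resolution, back into the global scene latent.
import torch
import torch.nn.functional as F

def fuse_multires_latents(global_latent, crop_latents, crop_boxes, alpha=0.5):
    """global_latent: (C, H, W); crop_latents: list of (C, h, w);
    crop_boxes: list of (top, left, h, w) in global latent coordinates."""
    fused = global_latent.clone()
    for latent, (top, left, h, w) in zip(crop_latents, crop_boxes):
        resized = F.interpolate(latent.unsqueeze(0), size=(h, w),
                                mode="bilinear", align_corners=False)[0]
        region = fused[:, top:top + h, left:left + w]
        fused[:, top:top + h, left:left + w] = alpha * region + (1 - alpha) * resized
    return fused

if __name__ == "__main__":
    global_latent = torch.randn(4, 64, 64)      # e.g. a latent-diffusion latent
    crops = [torch.randn(4, 64, 64)]            # a crop generated at full latent size
    boxes = [(16, 16, 32, 32)]                  # where that crop lives in the global latent
    fused = fuse_multires_latents(global_latent, crops, boxes)
    print(fused.shape)
```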

🏒Organization: @ETH_en , @KU_Leuven , @INSAITinstitute Sofia

🧙Paper Authors: Yuru Jia, @lukashoyer3, @ShengyHuang, @TianfuWang2, Luc Van Gool, Konrad Schindler, @AntonObukhov1

1️⃣Read the Full Paper here: [2312.03048] DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

2️⃣Project Page: DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

3️⃣Generation Code: GitHub - yurujaja/DGInStyle: DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

4️⃣Segmentation Code: GitHub - yurujaja/DGInStyle-SegModel: Downstream semantic segmentation evaluation of DGInStyle.

🎥 Be sure to watch the attached Video

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.



1/1
🚨CVPR 2024 Paper Alert 🚨

➡️Paper Title: ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring

🌟Few pointers from the paper

🎯Monocular 3D human mesh estimation is an ill-posed problem characterized by inherent ambiguity and occlusion. While recent probabilistic methods propose generating multiple solutions, little attention is paid to obtaining high-quality estimates from them.

🎯To address this limitation, the authors of this paper have introduced “ScoreHypo”, a versatile framework that first leverages their novel “HypoNet” to generate multiple hypotheses, and then employs a meticulously designed scorer, “ScoreNet”, to evaluate and select high-quality estimates.

🎯ScoreHypo formulates the estimation process as a reverse denoising process, where HypoNet produces a diverse set of plausible estimates that effectively align with the image cues.

🎯Subsequently, ScoreNet is employed to rigorously evaluate and rank these estimates based on their quality and finally identify superior ones.

🎯Experimental results demonstrate that HypoNet outperforms existing state-of-the-art probabilistic methods as a multi-hypothesis mesh estimator. Moreover, the estimates selected by ScoreNet significantly outperform random generation or simple averaging.

🎯Notably, the trained ScoreNet exhibits generalizability, as it can effectively score existing methods and significantly reduce their errors by more than 15%.
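
The generate-then-score recipe can be summarized in a few lines (stand-in networks and assumed shapes, not the authors' models): sample several hypotheses from the probabilistic estimator, score each one conditioned on the image evidence, and keep the best-ranked estimate.

```python
# Minimal sketch (stand-in networks): draw several mesh hypotheses, score
# each one with a learned scorer, and keep the best-ranked estimate.
import torch

def estimate_with_scoring(hyponet, scorenet, image_feat, num_hypotheses=10):
    # 1. Sample a diverse set of plausible estimates (multi-hypothesis).
    hypotheses = torch.stack([hyponet(image_feat) for _ in range(num_hypotheses)])
    # 2. Score every hypothesis conditioned on the image evidence.
    scores = torch.stack([scorenet(h, image_feat) for h in hypotheses])
    # 3. Select the top-ranked one (could also average the top-k).
    best = scores.argmax()
    return hypotheses[best], scores

if __name__ == "__main__":
    hyponet = lambda feat: torch.randn(24, 3) + feat.mean()       # fake mesh parameters
    scorenet = lambda hypo, feat: (hypo.mean() - feat.mean()).abs().neg()
    feat = torch.randn(128)
    mesh, scores = estimate_with_scoring(hyponet, scorenet, feat)
    print(mesh.shape, scores.argmax().item())
```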

🏒Organization: @PKU1898 , International Digital Economy Academy (IDEA), @sjtu1896

🧙Paper Authors: Yuan Xu, @XiaoxuanMa_, Jiajun Su, @walterzhu8, Yu Qiao, Yizhou Wang

1️⃣Read the Full Paper here: https://shorturl.at/pyIuc

2️⃣Project Page: ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring

3️⃣Code: Coming 🔜

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

🎵 Music by Pavel Bekirov from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#CVPR2024



1/1
🚨 Paper Alert 🚨

➡️Paper Title: DoubleTake: Geometry Guided Depth Estimation

🌟Few pointers from the paper

🎯Estimating depth from a sequence of posed RGB images is a fundamental computer vision task, with applications in augmented reality, path planning, etc. Prior work typically makes use of previous frames in a multi-view stereo framework, relying on matching textures in a local neighborhood.

🎯In contrast, the authors' model leverages historical predictions by giving the latest 3D geometry data as an extra input to their network. This self-generated geometric hint can encode information from areas of the scene not covered by the keyframes, and it is more regularized than individual predicted depth maps for previous frames.

🎯The authors have introduced a Hint MLP, which combines cost volume features with a hint of the prior geometry, rendered as a depth map from the current camera location, together with a measure of the confidence in the prior geometry.

🎯They demonstrated that their method, which can run at interactive speeds, achieves state-of-the-art estimates of depth and 3D scene reconstruction in both offline and incremental evaluation scenarios.
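
A minimal sketch of what a "Hint MLP" can look like (layer sizes and output semantics are assumptions, not the paper's architecture): per pixel, cost-volume features are concatenated with the rendered depth hint and a confidence in that hint, and a small MLP maps them to a prediction.

```python
# Minimal sketch (assumed dimensions): a per-pixel MLP combining cost-volume
# features with a rendered depth hint and a confidence in that hint.
import torch
import torch.nn as nn

class HintMLP(nn.Module):
    def __init__(self, cost_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(cost_dim + 2, hidden), nn.ReLU(),   # +2 = hint depth, hint confidence
            nn.Linear(hidden, 1),
        )

    def forward(self, cost_feat, hint_depth, hint_conf):
        """cost_feat: (B, H, W, cost_dim); hint_depth, hint_conf: (B, H, W)."""
        x = torch.cat([cost_feat, hint_depth.unsqueeze(-1), hint_conf.unsqueeze(-1)], dim=-1)
        return self.net(x).squeeze(-1)                    # per-pixel prediction

if __name__ == "__main__":
    mlp = HintMLP()
    out = mlp(torch.randn(1, 48, 64, 64), torch.rand(1, 48, 64), torch.rand(1, 48, 64))
    print(out.shape)    # torch.Size([1, 48, 64])
```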

🏒Organization: @NianticLabs , @ucl

🧙Paper Authors: @MohammedAmr1, @AleottiFilippo, Jamie Watson, Zawar Qureshi, @gui_ggh, Gabriel Brostow, Sara Vicente, @_mdfirman

1️⃣Read the Full Paper here: https://nianticlabs.github.io/doubletake/resources/DoubleTake.pdf

2️⃣Project Page: DoubleTake: Geometry Guided Depth Estimation

3️⃣Code: Coming 🔜

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.



1/2
🚨Paper Alert 🚨

➡️Paper Title: L4GM: Large 4D Gaussian Reconstruction Model

🌟Few pointers from the paper

🎯In this paper, the authors have presented L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input, in a single feed-forward pass that takes only a second.

🎯Key to their success is a novel dataset of multiview videos containing curated, rendered animated objects from Objaverse. This dataset depicts 44K diverse objects with 110K animations rendered in 48 viewpoints, resulting in 12M videos with a total of 300M frames.

🎯The authors kept L4GM simple for scalability and built directly on top of LGM, a pretrained 3D Large Reconstruction Model that outputs 3D Gaussian ellipsoids from multiview image input.

🎯L4GM outputs a per-frame 3D Gaussian Splatting representation from video frames sampled at a low fps and then upsamples the representation to a higher fps to achieve temporal smoothness.

🎯They added temporal self-attention layers to the base LGM to help it learn consistency across time, and utilized a per-timestep multiview rendering loss to train the model.

🎯The representation is upsampled to a higher framerate by training an interpolation model that produces intermediate 3D Gaussian representations. They showcased that L4GM, though trained only on synthetic data, generalizes extremely well to in-the-wild videos, producing high-quality animated 3D assets.
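
As a simplified stand-in for the learned interpolation model (plain linear interpolation, my assumption, not the paper's network): per-frame Gaussian parameters predicted at a low fps are upsampled to a higher framerate by blending neighboring frames.

```python
# Simplified sketch (linear interpolation stands in for the learned model):
# upsample a low-fps sequence of per-frame Gaussian parameters to a higher
# framerate for temporal smoothness.
import torch

def upsample_gaussians(params, factor=4):
    """params: (T, N, D) per-frame Gaussian parameters (e.g. centers, scales).
    Returns ((T - 1) * factor + 1, N, D) interpolated sequence."""
    frames = [params[0]]
    for t in range(params.shape[0] - 1):
        for s in range(1, factor + 1):
            w = s / factor
            frames.append((1 - w) * params[t] + w * params[t + 1])
    return torch.stack(frames)

if __name__ == "__main__":
    low_fps = torch.randn(8, 1024, 14)       # 8 frames, 1024 Gaussians, 14 params each
    high_fps = upsample_gaussians(low_fps, factor=4)
    print(high_fps.shape)                    # torch.Size([29, 1024, 14])
```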

🏒Organization: @nvidia , @UofT , @Cambridge_Uni , @MIT , S-Lab, @NTUsg

🧙Paper Authors: Jiawei Ren, Kevin Xie, @ashmrz10, Hanxue Liang, Xiaohui Zeng, @karsten_kreis, @liuziwei7, Antonio Torralba, @FidlerSanja, Seung Wook Kim, @HuanLing6

1️⃣Read the Full Paper here: [2406.10324] L4GM: Large 4D Gaussian Reconstruction Model

2️⃣Project Page: L4GM: Large 4D Gaussian Reconstruction Model

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

🎵 Music by Praz Khanal from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

2/2
Impressive



1/1
🚨Paper Alert 🚨

➡️Paper Title: Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation

🌟Few pointers from the paper

🎯In this paper, the authors have presented “Follow-Your-Emoji”, a diffusion-based framework for portrait animation, which animates a reference portrait with target landmark sequences.

🎯The main challenge of portrait animation is to preserve the identity of the reference portrait and transfer the target expression to this portrait while maintaining temporal consistency and fidelity.

🎯To address these challenges, Follow-Your-Emoji equips the powerful Stable Diffusion model with two well-designed technologies. Specifically, the authors first adopt a new explicit motion signal, namely expression-aware landmarks, to guide the animation process.

🎯The authors discovered that these landmarks not only ensure accurate motion alignment between the reference portrait and the target motion during inference, but also increase the ability to portray exaggerated expressions (i.e., large pupil movements) and avoid identity leakage.

🎯Then, the authors also proposed a facial fine-grained loss that uses both expression and facial masks to improve the model's ability to perceive subtle expressions and to reconstruct the appearance of the reference portrait.

🎯Accordingly, their method demonstrates strong performance in controlling the expression of freestyle portraits, including real humans, cartoons, sculptures, and even animals.

🎯By leveraging a simple and effective progressive generation strategy, they extended their model to stable long-term animation, thus increasing its potential application value.

🎯To address the lack of a benchmark for this field, they have introduced EmojiBench, a comprehensive benchmark comprising diverse portrait images, driving videos, and landmarks. They show extensive evaluations on EmojiBench to verify the superiority of Follow-Your-Emoji.
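
A rough sketch of a facial fine-grained loss (mask construction and weights are assumptions, not the paper's exact loss): the reconstruction error is re-weighted with a face mask and an expression mask so subtle expression regions and identity-carrying regions count more than the background.

```python
# Rough sketch (weights and masks are assumptions): re-weight the
# reconstruction error with a face mask and an expression mask.
import torch

def facial_fine_grained_loss(pred, target, face_mask, expr_mask,
                             w_face=2.0, w_expr=4.0):
    """pred, target: (B, C, H, W); masks: (B, 1, H, W) in [0, 1]."""
    err = (pred - target) ** 2
    weights = 1.0 + w_face * face_mask + w_expr * expr_mask
    return (err * weights).sum() / weights.expand_as(err).sum()

if __name__ == "__main__":
    pred, target = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
    face = torch.zeros(2, 1, 64, 64); face[:, :, 16:48, 16:48] = 1.0
    expr = torch.zeros(2, 1, 64, 64); expr[:, :, 36:44, 24:40] = 1.0   # e.g. mouth region
    print(facial_fine_grained_loss(pred, target, face, expr).item())
```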

🏒Organization: @hkust , @TencentGlobal , @Tsinghua_Uni

🧙Paper Authors: Yue Ma, Hongyu Liu, Hongfa Wang, Heng Pan, Yingqing He, Junkun Yuan, Ailing Zeng, Chengfei Cai, Heung-Yeung Shum, Wei Liu, Qifeng Chen

1️⃣Read the Full Paper here: [2406.01900] Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation

2️⃣Project Page: Follow-Your-Emoji: Freestyle Portrait Animation

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

🎵 Music by Maksym Dudchyk from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.



1/1
🚨Paper Alert 🚨

➡️Paper Title: Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

🌟Few pointers from the paper

🎯Large-scale endeavors like RT-1 and widespread community efforts such as Open-X-Embodiment have contributed to growing the scale of robot demonstration data. However, there is still an opportunity to improve the quality, quantity, and diversity of robot demonstration data.

🎯Although vision-language models have been shown to automatically generate demonstration data, their utility has been limited to environments with privileged state information; they require hand-designed skills and are limited to interactions with a few object instances.

🎯The authors of this paper propose “MANIPULATE-ANYTHING”, a scalable automated generation method for real-world robotic manipulation. Unlike prior work, their method can operate in real-world environments without any privileged state information or hand-designed skills, and it can manipulate any static object.

🎯They evaluate their method using two setups:
⚓ First, MANIPULATE-ANYTHING successfully generates trajectories for all 5 real-world and 12 simulation tasks, significantly outperforming existing methods like VoxPoser.
⚓ Second, MANIPULATE-ANYTHING’s demonstrations can train more robust behavior cloning policies than training with human demonstrations or with data generated by VoxPoser and Code-As-Policies.

🎯The authors believe that MANIPULATE-ANYTHING can be a scalable method for both generating data for robotics and solving novel tasks in a zero-shot setting.
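
At a high level, VLM-driven demonstration generation can be sketched as the loop below (every component is a stand-in; the real system's prompts, skills, and verification are more involved): a vision-language model proposes the next sub-goal from raw observations, a planner executes it, and a verifier decides whether the rollout is kept as training data.

```python
# High-level sketch (all components are stand-ins, not the authors' system):
# propose a sub-goal, execute it, verify success from observations, and keep
# only successful rollouts as demonstration data.
class DummyEnv:
    def reset(self):
        return {"image": None, "step": 0}

def generate_demonstration(task, env, vlm_propose, execute, vlm_verify, max_steps=10):
    trajectory = []
    obs = env.reset()
    for _ in range(max_steps):
        subgoal = vlm_propose(task, obs)            # e.g. "grasp the mug handle"
        obs, actions = execute(env, obs, subgoal)   # motion planning / primitive skills
        trajectory.extend(actions)
        if vlm_verify(task, obs):                   # success check from images
            return trajectory                       # keep as a demonstration
    return None                                     # discard failed rollouts

if __name__ == "__main__":
    demo = generate_demonstration(
        task="put the block in the bowl",
        env=DummyEnv(),
        vlm_propose=lambda task, obs: "move above the block",
        execute=lambda env, obs, goal: ({"image": None, "step": obs["step"] + 1}, [goal]),
        vlm_verify=lambda task, obs: obs["step"] >= 3,
    )
    print(demo)
```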

🏒Organization: @uwcse , @nvidia , @allen_ai , Universidad Católica San Pablo

🧙Paper Authors: @DJiafei, @TonyWentaoYuan, @wpumacay7567, @YiruHelenWang, @ehsanik, Dieter Fox, @RanjayKrishna

1️⃣Read the Full Paper here: https://arxiv.org/pdf/2406.18915

2️⃣Project Page: Manipulate Anything

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

🎵 Music by StudioKolomna from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.



1/1
🚨ICML 2024 Paper Alert 🚨

➡️Paper Title: Efficient World Models with Context-Aware Tokenization

🌟Few pointers from the paper

🎯Scaling up deep Reinforcement Learning (RL) methods presents a significant challenge. Following developments in generative modeling, model-based RL positions itself as a strong contender.

🎯Recent advances in sequence modeling have led to effective transformer-based world models, albeit at the price of heavy computations due to the long sequences of tokens required to accurately simulate environments.

🎯In this work, the authors have proposed ∆-IRIS, a new agent with a world model architecture composed of a discrete autoencoder that encodes stochastic deltas between time steps and an autoregressive transformer that predicts future deltas by summarizing the current state of the world with continuous tokens.

🎯On the Crafter benchmark, ∆-IRIS sets a new state of the art at multiple frame budgets, while being an order of magnitude faster to train than previous attention-based approaches.
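
A toy sketch of the delta-tokenization idea (a nearest-neighbor codebook stands in for the learned discrete autoencoder): instead of tokenizing full frames, only the change between consecutive time steps is quantized, which keeps token sequences short.

```python
# Toy sketch (not the paper's autoencoder): quantize the *delta* between
# consecutive time steps with a discrete codebook, then roll deltas back up
# to approximately reconstruct the frames.
import torch

def quantize_deltas(frames, codebook):
    """frames: (T, D) flattened observations; codebook: (K, D).
    Returns (T - 1,) code indices for the per-step deltas."""
    deltas = frames[1:] - frames[:-1]                       # change per step
    dists = torch.cdist(deltas, codebook)                   # nearest-code lookup
    return dists.argmin(dim=1)

def reconstruct(first_frame, codes, codebook):
    """Accumulate the decoded deltas to recover an approximation of the frames."""
    deltas = codebook[codes]
    return torch.cat([first_frame.unsqueeze(0),
                      first_frame + torch.cumsum(deltas, dim=0)])

if __name__ == "__main__":
    frames = torch.randn(10, 16)
    codebook = torch.randn(256, 16)
    codes = quantize_deltas(frames, codebook)               # what a transformer would predict
    print(codes.shape, reconstruct(frames[0], codes, codebook).shape)
```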

🏒Organization: @unige_en

🧙Paper Authors: @micheli_vincent, @EloiAlonso1, @francoisfleuret

1️⃣Read the Full Paper here: [2406.19320] Efficient World Models with Context-Aware Tokenization

2️⃣Code: GitHub - vmicheli/delta-iris: Efficient World Models with Context-Aware Tokenization. ICML 2024

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

🎵 Music by StudioKolomna from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#icml2024



1/1
🚨LLM Alert 🚨

💎 @GoogleDeepMind's Gemma Team has officially announced the release of Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters.

💎The 9 billion and 27 billion parameter models are available today, with a 2 billion parameter model to be released shortly.

🌟Few pointers from the Announcement

🎯 In this new version, they have made several technical modifications to the architecture, such as interleaving local-global attentions and group-query attention.

🎯They also trained the 2B and 9B models with knowledge distillation instead of next-token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3× bigger.

🎯They trained Gemma 2 27B on 13 trillion tokens of primarily-English data, the 9B model on 8 trillion tokens, and the 2.6B on 2 trillion tokens. These tokens come from a variety of data sources, including web documents, code, and science articles.

🎯Their models are not multimodal and are not trained specifically for state-of-the-art multilingual capabilities. The final data mixture was determined through ablations similar to the approach in Gemini 1.0.

🎯Just like the original Gemma models, Gemma 2 is available under the commercially-friendly Gemma license, giving developers and researchers the ability to share and commercialize their innovations.
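
For readers unfamiliar with the distillation-based training mentioned above, here is a generic sketch (a standard recipe, not Google's training code): the student is trained to match the teacher's full next-token distribution with a KL objective, rather than only the observed next token.

```python
# Generic sketch (standard recipe, not Google's code): train a small student
# to match the teacher's next-token distribution instead of only the observed
# token, via a temperature-scaled KL divergence.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """logits: (batch, seq_len, vocab). KL(teacher || student) per token."""
    t_logprobs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(s_logprobs, t_logprobs, log_target=True, reduction="batchmean")
    return kl * temperature ** 2

if __name__ == "__main__":
    student = torch.randn(2, 8, 32000, requires_grad=True)   # toy vocab of 32k
    teacher = torch.randn(2, 8, 32000)
    loss = distillation_loss(student, teacher)
    loss.backward()
    print(loss.item())
```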

1️⃣Blog: Gemma 2 is now available to researchers and developers

2️⃣Technical Report: https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.



1/7
Viggle's new feature "Move" is now live! Visit VIGGLE to animate your pic right away.

Compared to our previous feature, which offers green screen and white backgrounds, Move allows you to keep the original background of the image without further editing.

Get it moving!

2/7
That’s so great!!! Can we see GenAiMovies soon?

3/7
Wow, fantastic. I personally find green screen very useful but I'm pleased with the upgrades. Nice work.

4/7
🥳🤩

5/7
Looks good, will try this out later. Make sure you keep the green screen too, as that's fine 👍🏻👍🏻

6/7
Great. For videos with multiple characters, how do you choose which one to replace?

7/7
Wow!



1/1
🚨 Paper Alert 🚨

➡️Paper Title: Real-Time Video Generation with Pyramid Attention Broadcast

🌟Few pointers from the paper

🎯Recently, Sora and other DiT-based video generation models have attracted significant attention. However, in contrast to image generation, there are few studies focused on accelerating the inference of DiT-based video generation models.

🎯Additionally, the inference cost for generating a single video can be substantial, often requiring tens of GPU minutes or even hours. Therefore, accelerating the inference of video generation models has become urgent for broader GenAI applications.

🎯In this work, the authors have introduced “Pyramid Attention Broadcast (PAB)”, the first approach that achieves real-time DiT-based video generation.

🎯By mitigating redundant attention computation, PAB achieves up to 21.6 FPS with 10.6x acceleration, without sacrificing quality across popular DiT-based video generation models including Open-Sora, Open-Sora-Plan, and Latte.

🎯Notably, as a training-free approach, PAB can empower any future DiT-based video generation models with real-time capabilities.
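
A minimal sketch of the broadcasting idea (a stand-in wrapper, not the released PAB code): because attention outputs change little between nearby diffusion steps, an attention block's output can be computed once and reused for the next few steps instead of being recomputed at every step.

```python
# Minimal sketch (stand-in wrapper): compute attention once and reuse
# ("broadcast") the cached output for the next few diffusion steps.
import torch

class BroadcastAttention:
    def __init__(self, attn_fn, broadcast_range=4):
        self.attn_fn = attn_fn
        self.broadcast_range = broadcast_range
        self.cache = None
        self.steps_since_compute = 0

    def __call__(self, hidden):
        if self.cache is None or self.steps_since_compute >= self.broadcast_range:
            self.cache = self.attn_fn(hidden)        # full attention, done rarely
            self.steps_since_compute = 0
        else:
            self.steps_since_compute += 1            # reuse the cached output
        return self.cache

if __name__ == "__main__":
    attn = torch.nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
    wrapped = BroadcastAttention(lambda h: attn(h, h, h)[0], broadcast_range=4)
    h = torch.randn(1, 16, 64)
    for _ in range(50):                              # mock denoising loop
        out = wrapped(h)
    print(out.shape)
```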

🏒Organization: @NUSingapore , @LifeAtPurdue

🧙Paper Authors: @oahzxl, @JxlDragon, @VictorKaiWang1, @YangYou1991

1️⃣Read the Full Paper here: Coming 🔜

2️⃣Blog: Real-Time Video Generation with Pyramid Attention Broadcast

3️⃣Code: GitHub - NUS-HPC-AI-Lab/OpenDiT: OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference

4️⃣Doc: OpenDiT/docs/pab.md at master · NUS-HPC-AI-Lab/OpenDiT

🎥 Be sure to watch the attached Demo Video - Sound on 🔊🔊

🎵 Music by Hot_Dope from @pixabay

Find this Valuable 💎?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

