Joined
Mar 25, 2013
Messages
890
Reputation
160
Daps
2,031
Reppin
NULL

1/1
Daily Training of Robots Driven by RL
Segments of daily training for robots driven by reinforcement learning.
Multiple tests are done in advance so the robots remain safe and friendly around humans.
The training includes some extreme tests; please do not imitate them.
#AI #Unitree #AGI #EmbodiedIntelligence #RobotDog #QuadrupedRobot #IndustrialRobot


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

The day we decide to get rid of AI & robots, pure force won't do it
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,004
Reputation
7,865
Daps
147,240

1/1
🚨 Paper Alert 🚨

➡️Paper Title: EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning

🌟Few pointers from the paper

🎯Building effective imitation learning methods that enable robots to learn from limited data and still generalize across diverse real-world environments is a long-standing problem in robot learning.

♠️ In this paper the authors propose “EquiBot”, a robust, data-efficient, and generalizable approach to robot manipulation learning. Their approach combines SIM(3)-equivariant neural network architectures with diffusion models.

🎯This ensures that their learned policies are invariant to changes in scale, rotation, and translation, enhancing their applicability to unseen environments while retaining the benefits of diffusion-based policy learning, such as multi-modality and robustness (a toy illustration of this invariance idea appears after these pointers).

🎯They showed on a suite of 6 simulation tasks that their proposed method reduces the data requirements and improves generalization to novel scenarios.

🎯In the real world, with 10 variations of 6 mobile manipulation tasks, they showed that their method can easily generalize to novel objects and scenes after learning from just 5 minutes of human demonstrations in each task.
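The snippet below is a toy, heavily simplified illustration of why SIM(3) invariance helps, not the EquiBot architecture: it canonicalizes a point cloud by centroid and scale before a placeholder policy acts, so the predicted goal transforms consistently when the scene is rescaled or shifted. Rotation handling is omitted here; EquiBot builds full SIM(3) equivariance into the network layers themselves.

```python
# Toy sketch only -- NOT the EquiBot model. It shows how canonicalizing the
# observation makes a policy's output consistent under scaling/translation.
import numpy as np

def canonicalize(points):
    """Map a point cloud into a translation/scale-normalized frame."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    scale = np.linalg.norm(centered, axis=1).mean() + 1e-8
    return centered / scale, centroid, scale

def policy_in_canonical_frame(points_canon):
    # Placeholder for a learned (e.g. diffusion) policy: just point the
    # gripper at the point nearest the canonical origin.
    idx = np.argmin(np.linalg.norm(points_canon, axis=1))
    return points_canon[idx]                      # a 3-D "goal" action

def act(points):
    canon, centroid, scale = canonicalize(points)
    return policy_in_canonical_frame(canon) * scale + centroid  # back to world frame

rng = np.random.default_rng(0)
cloud = rng.normal(size=(128, 3))
shift = np.array([1.0, 0.0, 0.0])
a1 = act(cloud)
a2 = act(cloud * 2.0 + shift)                 # same scene, rescaled and shifted
print(np.allclose(a1, (a2 - shift) / 2.0))    # True: the action followed the transform
```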

🏢Organization: @Stanford

🧙Paper Authors: @yjy0625 , Zi-ang Cao , @CongyueD , @contactrika , @SongShuran ,@leto__jean

1️⃣Read the Full Paper here: [2407.01479] EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning

2️⃣Project Page: EquiBot

3️⃣Code: GitHub - yjy0625/equibot: Official implementation for paper "EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning".

🎥 Be sure to watch the attached Video -Sound on 🔊🔊

🎵 Music by Zakhar Valaha from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,004
Reputation
7,865
Daps
147,240

1/1
🚨IROS 2024 Paper Alert 🚨

➡️Paper Title: Learning Variable Compliance Control From a Few Demonstrations for Bimanual Robot with Haptic Feedback Teleoperation System

🌟Few pointers from the paper

🎯Automating dexterous, contact-rich manipulation tasks using rigid robots is a significant challenge in robotics. Rigid robots, defined by their actuation through position commands, face issues of excessive contact forces due to their inability to adapt to contact with the environment, potentially causing damage.

🎯While compliance control schemes have been introduced to mitigate these issues by controlling forces via external sensors, they are hampered by the need for fine-tuning task-specific controller parameters. Learning from Demonstrations (LfD) offers an intuitive alternative, allowing robots to learn manipulations through observed actions.

🎯In this work, the authors introduce a novel system to enhance the teaching of dexterous, contact-rich manipulations to rigid robots. Their system is twofold: firstly, it incorporates a teleoperation interface utilizing Virtual Reality (VR) controllers, designed to provide an intuitive and cost-effective method for task demonstration with haptic feedback.

🎯Secondly, they presented Comp-ACT (Compliance Control via Action Chunking with Transformers), a method that leverages these demonstrations to learn variable compliance control from only a few examples (a rough sketch appears after these pointers).

🎯Their method has been validated across various complex contact-rich manipulation tasks using single-arm and bimanual robot setups in simulated and real-world environments, demonstrating the effectiveness of their system in teaching robots dexterous manipulations with enhanced adaptability and safety.
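As a rough, hedged sketch of the idea (stand-in functions, not the authors' code): the learned transformer policy predicts a chunk of future end-effector targets together with per-axis stiffness gains, and a standard Cartesian impedance law turns each target into a commanded wrench.

```python
# Hedged sketch of variable-compliance control via action chunking.
# `fake_policy` is a stand-in for the learned Comp-ACT transformer.
import numpy as np

def impedance_wrench(x, xdot, x_des, stiffness, damping_ratio=1.0):
    """Cartesian impedance law F = K (x_des - x) - D xdot (unit-mass damping)."""
    K = np.diag(stiffness)
    D = np.diag(2.0 * damping_ratio * np.sqrt(stiffness))
    return K @ (x_des - x) - D @ xdot

def fake_policy(obs, chunk_len=8):
    """Predict a chunk of (target position, per-axis stiffness) pairs.
    The real method learns this from a few haptic-teleop demonstrations."""
    targets = np.tile(obs["ee_pos"], (chunk_len, 1)) + 0.01     # creep forward
    stiffness = np.full((chunk_len, 3), 300.0)                  # stay soft in contact
    return targets, stiffness

obs = {"ee_pos": np.zeros(3), "ee_vel": np.zeros(3)}
targets, gains = fake_policy(obs)
for x_des, k in zip(targets, gains):
    wrench = impedance_wrench(obs["ee_pos"], obs["ee_vel"], x_des, k)
    # here `wrench` would be sent to the robot's torque/force interface
```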

🏢Organization: University of Tokyo (@UTokyo_News_en ), OMRON SINIC X Corporation (@sinicx_jp )

🧙Paper Authors: @tatsukamijo , @cambel07 , @mh69543540

1️⃣Read the Full Paper here: [2406.14990] Learning Variable Compliance Control From a Few Demonstrations for Bimanual Robot with Haptic Feedback Teleoperation System

🎥 Be sure to watch the attached Demo Video -Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,004
Reputation
7,865
Daps
147,240

1/1
🚨Paper Alert 🚨

➡️Paper Title: Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

🌟Few pointers from the paper

🎯Large-scale endeavors like RT-1 and widespread community efforts such as Open-X-Embodiment have contributed to growing the scale of robot demonstration data. However, there is still an opportunity to improve the quality, quantity, and diversity of robot demonstration data.

🎯Although vision-language models have been shown to automatically generate demonstration data, their utility has been limited to environments with privileged state information; they require hand-designed skills and are limited to interactions with only a few object instances.

🎯The authors of this paper propose “MANIPULATE-ANYTHING”, a scalable automated generation method for real-world robotic manipulation. Unlike prior work, their method can operate in real-world environments without any privileged state information or hand-designed skills, and it can manipulate any static object (the generate-and-verify loop is sketched after these pointers).

🎯They evaluate their method using two setups:
⚓First, MANIPULATE-ANYTHING successfully generates trajectories for all 5 real-world and 12 simulation tasks, significantly outperforming existing methods like VoxPoser.
⚓Second, MANIPULATE-ANYTHING’s demonstrations can train more robust behavior cloning policies than training with human demonstrations, or from data generated by VoxPoser and Code-As-Policies.

🎯The authors believe that MANIPULATE-ANYTHING can be a scalable method both for generating robotics data and for solving novel tasks in a zero-shot setting.
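A hedged, stub-level sketch of the generate-and-verify loop described above (all function names are placeholders, not the authors' API): a VLM decomposes the task into sub-goals, proposes an action per sub-goal, and a verification step discards failed rollouts so only successful trajectories become training demonstrations.

```python
# Hypothetical sketch of a VLM-driven demonstration-generation loop.
# Every function here is an illustrative stub, not the paper's interface.

def vlm_propose_subgoals(task, image):
    return ["grasp object", "place object at target"]     # stub

def vlm_propose_action(subgoal, image):
    return {"type": "move_and_grasp", "target": subgoal}   # stub

def vlm_verify(subgoal, image):
    return True    # stub: in practice the VLM judges the new observation

def execute(action):
    return "new_image"   # stub: steps the real robot or simulator

def generate_demo(task, image):
    trajectory = []
    for subgoal in vlm_propose_subgoals(task, image):
        action = vlm_propose_action(subgoal, image)
        image = execute(action)
        if not vlm_verify(subgoal, image):
            return None                   # discard failed rollouts
        trajectory.append((subgoal, action))
    return trajectory                     # usable for behavior cloning

demo = generate_demo("put the block in the bowl", "initial_image")
```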

🏢Organization: @uwcse , @nvidia , @allen_ai , Universidad Católica San Pablo

🧙Paper Authors: @DJiafei , @TonyWentaoYuan , @wpumacay7567 ,@YiruHelenWang ,@ehsanik , Dieter Fox, @RanjayKrishna

1️⃣Read the Full Paper here: https://arxiv.org/pdf/2406.18915

2️⃣Project Page: Manipulate Anything

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by StudioKolomna from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,004
Reputation
7,865
Daps
147,240

1/4
I am mind blown by this new technology!
AI is now embodied.
And we are open-sourcing it all.

Listen to @HaixuanT casually chatting with a cute robot at the @linuxfoundation:

🙂 What's your name?
> I am Reachy, a robot from @pollenrobotics, I have two arms.

😀 What do you see?
> A large venue with many people sitting at tables.

😃 Can you give me a high five?
> Yes of course!

🤯🤯🤯🤯

2/4
Bro, that’s barely Turing test level. We’ve only begun.

3/4
where can i get the video?

4/4
Awesome, congratulations to your amazing team!
And thank you for choosing Reachy for this project :smile:


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,004
Reputation
7,865
Daps
147,240

1/1
🚨Paper Alert 🚨

➡️Paper Title: HumanPlus : Humanoid Shadowing and Imitation from Humans

🌟Few pointers from the paper

🎯 In this paper, authors have introduced a full-stack system for humanoids to learn motion and autonomous skills from human data.

🎯They first trained a low-level policy in simulation via reinforcement learning using existing human motion datasets totaling 40 hours.

🎯This policy transfers to the real world and allows humanoid robots to follow human body and hand motion in real time using only an RGB camera, i.e., shadowing (a simplified shadowing loop is sketched after these pointers).

🎯Through shadowing, human operators can teleoperate humanoids to collect whole-body data for learning different tasks in the real world.

🎯Using the data collected, authors then performed supervised behavior cloning to train skill policies using egocentric vision, allowing humanoids to complete different tasks autonomously by imitating human skills.

🎯They demonstrated the system on their customized 33-DoF 180cm humanoid, autonomously completing tasks such as wearing a shoe to stand up and walk, unloading objects from warehouse racks, folding a sweatshirt, rearranging objects, typing, and greeting another robot with 60-100% success rates using up to 40 demonstrations.
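A hedged sketch of the shadowing loop described above, with illustrative stand-in functions rather than the HumanPlus code: a whole-body pose estimate from a single RGB frame is retargeted onto the humanoid's 33 joints, and the RL-trained low-level policy tracks those targets.

```python
# Hedged sketch of RGB-based "shadowing" (illustrative stubs only).
import numpy as np

def estimate_human_pose(rgb_frame):
    """Stand-in for an off-the-shelf body+hand pose estimator."""
    return np.zeros(33)            # e.g. 33 target joint angles

def retarget(human_pose, joint_limits):
    """Clip/scale the human pose onto the robot's 33-DoF joint space."""
    lo, hi = joint_limits
    return np.clip(human_pose, lo, hi)

def low_level_policy(proprio, target_joints):
    """Stand-in for the RL policy trained in simulation: here a simple
    proportional step toward the retargeted joint targets."""
    return proprio + 0.1 * (target_joints - proprio)

proprio = np.zeros(33)
limits = (-np.pi * np.ones(33), np.pi * np.ones(33))
for rgb_frame in range(100):                        # pretend camera stream
    target = retarget(estimate_human_pose(rgb_frame), limits)
    proprio = low_level_policy(proprio, target)     # would be sent to the motors
```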

🏢Organization: @Stanford

🧙Paper Authors: @zipengfu , @qingqing_zhao_ , @Qi_Wu577 , @GordonWetzstein , @chelseabfinn

1️⃣Read the Full Paper here: https://humanoid-ai.github.io/HumanPlus.pdf

2️⃣Project Page: HumanPlus: Humanoid Shadowing and Imitation from Humans

3️⃣Code: GitHub - MarkFzp/humanplus: HumanPlus: Humanoid Shadowing and Imitation from Humans

4️⃣Hardware:HumanPlus 🤖️ Hardware Tutorial
🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by John Rush from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#humanoid #teleoperation


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

A.I Generated explanation:

**Title:** HumanPlus: Humanoid Shadowing and Imitation from Humans

**Summary:** Researchers at Stanford have created a system that allows humanoid robots to learn from humans and perform tasks on their own. They did this by:

**Step 1:** Training a robot in a simulation using data from 40 hours of human movement.

**Step 2:** Using a camera, the robot can copy human movements in real-time, like a shadow.

**Step 3:** A human operator can control the robot to collect data on how to perform tasks, like picking up objects.

**Step 4:** The robot uses this data to learn how to perform tasks on its own, like folding a shirt or typing.

**Results:** The robot was able to perform tasks with a success rate of 60-100% using up to 40 demonstrations.

**What it means:** This system allows robots to learn from humans and perform tasks autonomously, which could be useful in many areas, such as warehouses, hospitals, or homes.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,004
Reputation
7,865
Daps
147,240

1/1
🚨CHI 2024 Paper Alert 🚨

➡️Paper Title: InflatableBots: Inflatable Shape-Changing Mobile Robots for Large-Scale Encountered-Type Haptics in VR

🌟Few pointers from the paper

🎯In this paper the authors introduce “InflatableBots”, shape-changing inflatable robots for large-scale encountered-type haptics in VR.

🎯Unlike traditional inflatable shape displays, which are immobile and limited in interaction areas, their approach combines mobile robots with fan-based inflatable structures.

🎯This enables safe, scalable, and deployable haptic interactions on a large scale. They developed three coordinated inflatable mobile robots, each of which consists of an omni-directional mobile base and a reel-based inflatable structure.

🎯The robot can simultaneously change its height and position rapidly (horizontal: 58.5 cm/sec; vertical: 10.4 cm/sec, over a height range of 40 cm to 200 cm), which allows quick, dynamic haptic rendering of multiple touch points to simulate various body-scale objects and surfaces in real time across large spaces (3.5 m x 2.5 m); a simple rate-limited controller along these lines is sketched after this list.

🎯They evaluated their system with a user study (N = 12), which confirms the unique advantages in safety, deployability, and large-scale interactability to significantly improve realism in VR experiences.
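A simple illustrative controller along those lines (not the authors' implementation): each robot's base and inflatable column are driven toward the position and height of the virtual surface the user is approaching, rate-limited by the speeds reported in the paper.

```python
# Illustrative sketch only: rate-limited base/column commands for one robot.
import numpy as np

H_SPEED = 0.585            # m/s horizontal base speed (from the paper)
V_SPEED = 0.104            # m/s vertical column speed (from the paper)
H_MIN, H_MAX = 0.40, 2.00  # m, column height range (from the paper)

def step_toward(current, target, max_step):
    delta = np.clip(target - current, -max_step, max_step)
    return current + delta

def update(robot_xy, robot_h, target_xy, target_h, dt=0.02):
    new_xy = step_toward(robot_xy, target_xy, H_SPEED * dt)
    new_h = np.clip(step_toward(robot_h, target_h, V_SPEED * dt), H_MIN, H_MAX)
    return new_xy, new_h

xy, h = np.zeros(2), 0.4
for _ in range(500):   # 10 s at 50 Hz, converging on a virtual touch point
    xy, h = update(xy, h, target_xy=np.array([1.5, 1.0]), target_h=1.2)
```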

🏢Organization: @TohokuUniPR , @UCalgary

🧙Paper Authors: Ryota Gomi, @ryosuzk, Kazuki Takashima, Kazuyuki Fujita, Yoshifumi Kitamura

1️⃣Read the Full Paper here: https://dl.acm.org/doi/pdf/10.1145/3613904.3642069

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#VR #AR #robots #CHI2024


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196


A.I Generated explanation:

CHI 2024 Paper Alert

This is an announcement about a new research paper that's been published.

Paper Title: InflatableBots: Inflatable Shape-Changing Mobile Robots for Large-Scale Encountered-Type Haptics in VR

Here are some key points from the paper:

What's it about?

The authors have created something called "InflatableBots", which are robots that can change shape and move around. They're designed to be used in Virtual Reality (VR) to create a more realistic experience.

How is it different?

Unlike other inflatable displays that can't move and are limited in what they can do, these robots can move around and change shape in real-time. This makes it possible to create a more immersive and interactive experience in VR.

How does it work?

The robots have a special base that can move in any direction, and a part that can inflate and deflate to change shape. They can move quickly and change shape rapidly, which allows them to simulate different objects and surfaces in VR.

What did they test?

The researchers tested their system with 12 people and found that it was safe, easy to set up, and allowed for a more realistic experience in VR.

Who did the research?

The research was done by a team from Tohoku University and the University of Calgary. The authors of the paper are Ryota Gomi, Ryosuke Suzuki, Kazuki Takashima, Kazuyuki Fujita, and Yoshifumi Kitamura.

Want to read more?

You can read the full paper here: https://dl.acm.org/doi/pdf/10.1145/3613904.3642069
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,004
Reputation
7,865
Daps
147,240

1/1
🚨Paper Alert 🚨

➡️Paper Title: OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

🌟Few pointers from the paper

✡️In this paper the authors present OmniH2O (Omni Human-to-Humanoid), a learning-based system for whole-body humanoid teleoperation and autonomy.

🎯Using kinematic pose as a universal control interface, OmniH2O enables various ways for a human to control a full-sized humanoid with dexterous hands, including real-time teleoperation through a VR headset, verbal instructions, and an RGB camera.

🎯OmniH2O also enables full autonomy by learning from teleoperated demonstrations or integrating with frontier models such as GPT-4o.

🎯OmniH2O demonstrates versatility and dexterity in various real-world whole-body tasks through teleoperation or autonomy, such as playing multiple sports, moving and manipulating objects, and interacting with humans.

🎯They developed an RL-based sim-to-real pipeline, which involves large-scale retargeting and augmentation of human motion datasets, learning a real-world-deployable policy with sparse sensor input by imitating a privileged teacher policy (sketched after these pointers), and reward designs to enhance robustness and stability.

🎯They have released the first humanoid whole-body control dataset, OmniH2O-6, containing six everyday tasks, and demonstrate humanoid whole-body skill learning from teleoperated datasets.
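A hedged sketch of the teacher-student distillation step (illustrative stubs, not the OmniH2O code): a privileged teacher policy sees full simulator state, while the deployable student sees only sparse kinematic targets plus proprioception and is regressed onto the teacher's actions.

```python
# Hedged teacher-student distillation sketch (stub policies, toy dimensions).
import numpy as np

def teacher_policy(full_state):
    return np.tanh(full_state[:19])           # stub: privileged teacher

def student_policy(sparse_obs, weights):
    return np.tanh(weights @ sparse_obs)      # stub: deployable student

def distill(num_steps=1000, lr=1e-2, rng=np.random.default_rng(0)):
    W = np.zeros((19, 12))
    for _ in range(num_steps):
        full_state = rng.normal(size=64)      # privileged simulator state
        sparse_obs = full_state[:12]          # head/hand targets + proprioception
        target = teacher_policy(full_state)
        pred = student_policy(sparse_obs, W)
        # gradient of 0.5*||pred - target||^2 w.r.t. W through tanh
        grad = np.outer((pred - target) * (1 - pred**2), sparse_obs)
        W -= lr * grad                        # simple regression step
    return W

W = distill()
```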

🏢Organization: @CarnegieMellon , @sjtu1896

🧙Paper Authors: @TairanHe99 , @zhengyiluo , @Xialin_He , @_wenlixiao , @ChongZitaZhang , Weinan Zhang, @kkitani , Changliu Liu, @GuanyaShi

1️⃣Read the Full Paper here: https://omni.human2humanoid.com/resources/OmniH2O_paper.pdf

2️⃣Project Page: OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#humanoid #teleoperation


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196


A.I Generated explanation:

**Title:** OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

**What's it about?**

This paper is about a system called OmniH2O that allows a human to control a humanoid robot (a robot that looks like a human) using different methods, such as virtual reality, voice commands, or even just by watching the human move. The system can also learn to do tasks on its own without human input.

**How does it work?**

The system uses a special way of controlling the robot's movements, called kinematic pose, which allows the human to control the robot in different ways. For example, the human can wear a virtual reality headset and move their body to control the robot's movements in real-time. The system can also learn from the human's movements and do tasks on its own, such as playing sports, moving objects, and interacting with people.

**What's special about it?**

The system is special because it can learn to do tasks in a real-world environment, not just in a simulation. It can also learn from a large dataset of human movements and adapt to new situations. The researchers have also released a dataset of humanoid robot movements, called OmniH2O-6, which can be used by other researchers to improve their own systems.

**Who worked on it?**

The paper was written by researchers from Carnegie Mellon University and Shanghai Jiao Tong University. The authors are listed at the bottom of the post.

**Want to learn more?**

You can read the full paper by clicking on the link provided, or visit the project page to learn more about OmniH2O.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,004
Reputation
7,865
Daps
147,240

1/1
🚨ICML Paper Alert 🚨

➡️Paper Title: Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation

🌟Few pointers from the paper

🎯Model-Free Reinforcement Learning (MFRL), leveraging the policy gradient theorem, has demonstrated considerable success in continuous control tasks. However, these approaches are plagued by high gradient variance due to zeroth-order gradient estimation, resulting in suboptimal policies.

🎯Conversely, First-Order Model-Based Reinforcement Learning (FO-MBRL) methods employing differentiable simulation provide gradients with reduced variance but are susceptible to sampling error in scenarios involving stiff dynamics, such as physical contact.

✡️ This paper investigates the source of this error and introduces “Adaptive Horizon Actor-Critic (AHAC)”, an FO-MBRL algorithm that reduces gradient error by adapting the model-based horizon to avoid stiff dynamics (a simplified sketch of the adaptive horizon appears after these pointers).

🎯Empirical findings reveal that AHAC outperforms MFRL baselines, attaining 40% more reward across a set of locomotion tasks and efficiently scaling to high-dimensional control environments with improved wall-clock time efficiency.

🎯 Lastly, this work suggests that future research should not only focus on refining algorithmic approaches for policy learning but also on enhancing simulator technologies to more effectively manage gradient error.
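A hedged sketch of the core idea (stub environment and policies, not the authors' implementation): unroll the differentiable simulator for the first-order policy objective, but cut the horizon early when the dynamics become stiff, for example at contact, and bootstrap the truncated tail with the critic.

```python
# Illustrative adaptive-horizon rollout; the stiffness test and stubs are
# simplified stand-ins for what the paper actually measures.
import numpy as np

def rollout_objective(env, policy, critic, state, max_horizon=32,
                      stiffness_threshold=50.0, gamma=0.99):
    total, discount = 0.0, 1.0
    for _ in range(max_horizon):
        action = policy(state)
        state, reward, contact_stiffness = env.step(state, action)
        total += discount * reward
        discount *= gamma
        if contact_stiffness > stiffness_threshold:
            break                                  # stop before stiff contact dynamics
    return total + discount * critic(state)        # bootstrap the truncated tail

class ToyEnv:
    def step(self, state, action):
        next_state = state + 0.1 * action
        reward = -float(np.sum(next_state**2))
        contact_stiffness = 100.0 if abs(next_state[0]) > 1.0 else 0.0
        return next_state, reward, contact_stiffness

policy = lambda s: -s                       # stub actor
critic = lambda s: -float(np.sum(s**2))     # stub value function
value = rollout_objective(ToyEnv(), policy, critic, np.ones(3))
```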

🏢Organization: @GeorgiaTech , @Stanford, @nvidia

🧙Paper Authors: @imgeorgiev , @krishpopdesu , @xujie7979 , @eric_heiden , @animesh_garg

1️⃣Read the Full Paper here: [2405.17784] Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation

2️⃣Project Page: AHAC

3️⃣Code: GitHub - imgeorgiev/DiffRL: Learning Optimal Policies Through Contact in Differentiable Simulation

4️⃣Video: Adaptive Horizon Actor Critic

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Dmitry from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196


A.I Generated explanation:

**Title:** Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation

**Summary:** This paper is about a new way to teach computers to make decisions in complex situations, like robots interacting with their environment. The goal is to make the computers learn faster and better.

**Problem:** There are two main ways to teach computers to make decisions: Model-Free Reinforcement Learning (MFRL) and First-Order Model-Based Reinforcement Learning (FO-MBRL). MFRL is good at learning from experience, but it can be slow and make mistakes. FO-MBRL is faster and more accurate, but it can get confused when things get complicated, like when a robot is interacting with its environment in a complex way.

**Solution:** The researchers came up with a new way to combine the strengths of both approaches, called Adaptive Horizon Actor-Critic (AHAC). AHAC adapts to the situation and adjusts its approach to avoid making mistakes. This makes it faster and more accurate than the other methods.

**Results:** The researchers tested AHAC on several tasks, like teaching a robot to walk or run. AHAC performed 40% better than the other methods and was able to handle complex situations more efficiently.

**Conclusion:** The researchers think that this new approach is a big step forward, but they also think that we need to improve the simulators we use to train the computers. This will help us make even better decision-making algorithms in the future.

**Resources:**

* Read the full paper here: https://arxiv.org/abs/2405.17784
* Project page: https://adaptive-horizon-actor-critic.github.io/
* Code: https://github.com/imgeorgiev/DiffRL
* Video: https://invidious.poast.org/watch?v=bAW9O3C_1ck
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,004
Reputation
7,865
Daps
147,240

Open-TeleVision: Why human intelligence could be the key to next-gen robotic automation​

James Thomason @jathomason

July 8, 2024 9:00 AM

Open-TeleVision: Remote operation of humanoid robot

Credit: MIT and UCSD Paper "Open-TeleVision: Teleoperation with Immersive Active Visual Feedback"




Last week, researchers at MIT and UCSD unveiled a new immersive remote control experience for robots. This innovative system, dubbed “Open-TeleVision,” enables operators to actively perceive the robot’s surroundings while mirroring their hand and arm movements. As the researchers describe it, the system “creates an immersive experience as if the operator’s mind is transmitted to a robot embodiment.”

In recent years, AI has dominated discussions about the future of robotics. From autonomous vehicles to warehouse robots, the promise of machines that can think and act for themselves has captured imaginations and investments. Companies like Boston Dynamics have showcased impressive AI-driven robots that can navigate complex environments and perform intricate tasks.

However, AI-powered robots still struggle with adaptability, creative problem-solving, and handling unexpected situations – areas where human intelligence excels.

The human touch​

The Open-TeleVision system takes a different approach to robotics. Instead of trying to replicate human intelligence in a machine, it creates a seamless interface between human operators and robotic bodies. The researchers explain that their system “allows operators to actively perceive the robot’s surroundings in a stereoscopic manner. Additionally, the system mirrors the operator’s arm and hand movements on the robot.”

This approach leverages the unparalleled cognitive abilities of humans while extending our physical reach through advanced robotics.

Key advantages of this human-centered approach include:

  1. Adaptability: Humans can quickly adjust to new situations and environments, a skill that AI still struggles to match.
  2. Intuition: Years of real-world experience allow humans to make split-second decisions based on subtle cues that might be difficult to program into an AI.
  3. Creative problem-solving: Humans can think outside the box and devise novel solutions to unexpected challenges.
  4. Ethical decision-making: In complex scenarios, human judgment may be preferred for making nuanced ethical choices.


Potential Applications The implications of this technology are far-reaching. Some potential applications include:

  • Disaster response: Human-controlled robots could navigate dangerous environments while keeping first responders safe.
  • Telesurgery: Surgeons could perform delicate procedures from anywhere in the world.
  • Space exploration: Astronauts on Earth could control robots on distant planets, eliminating communication delays.
  • Industrial maintenance: Experts could remotely repair complex machinery in hard-to-reach locations.



How Open-TeleVision works​

Open-TeleVision is a teleoperation system that uses a VR device to stream the hand, head, and wrist poses of the operator to a server. The server then retargets these human poses to the robot and sends joint position targets to control the robot’s movements. The system includes a single active stereo RGB camera on the robot’s head, equipped with 2 or 3 degrees of freedom actuation, which moves along with the operator’s head movements.


Image Credit: Xuxin Cheng, Jialong Li, Shiqi Yang, Ge Yang, Xiaolong Wang

Paper: “Open-TeleVision: Teleoperation with Immersive Active Visual Feedback”, MIT and UCSD


The paper states that the system streams real-time, ego-centric 3D observations to the VR device, allowing the operator to see what the robot sees. This provides a more intuitive mechanism for exploring the robot’s environment and focusing on important regions for interaction.

The system operates at 60 Hz, with the entire loop of capturing operator movements, retargeting to the robot, and streaming video back to the operator happening at this frequency.
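A hedged sketch of that 60 Hz loop with placeholder functions (not the Open-TeleVision API): read the operator's head, hand, and wrist poses from the VR device, retarget them to joint-position targets, aim the actuated neck, and stream stereo frames back to the headset once per 1/60 s period.

```python
# Hedged teleoperation-loop sketch; every function is an illustrative stub.
import time

def read_vr_poses():        return {"head": [0, 0, 0], "wrists": [[0]*6, [0]*6]}
def retarget_to_joints(p):  return [0.0] * 28      # stub IK / retargeting
def send_joint_targets(q):  pass                   # stub robot interface
def point_neck_at(head):    pass                   # stub 2-3 DoF neck control
def stream_stereo_frames(): pass                   # stub video streamer

PERIOD = 1.0 / 60.0          # the article reports a 60 Hz control loop
for _ in range(600):         # 10 seconds of teleoperation
    t0 = time.time()
    poses = read_vr_poses()
    send_joint_targets(retarget_to_joints(poses))
    point_neck_at(poses["head"])
    stream_stereo_frames()
    time.sleep(max(0.0, PERIOD - (time.time() - t0)))
```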

One of the most exciting aspects of Open-TeleVision is its potential for long-distance operation. The researchers demonstrated this capability, noting: “Our system enables remote control by an operator via the Internet. One of the authors, Ge Yang at MIT (east coast) is able to teleoperate the H1 robot at UC San Diego (west coast).”

This coast-to-coast operation showcases the system’s potential for truly global remote control of robotic systems.

New projects emerging quickly​

Open-TeleVision is just one of many new projects exploring advanced human-robot interfaces. Researchers Younghyo Park and Pulkit Agrawal at MIT also recently released an open-source project investigating the use of Apple’s Vision Pro headset for robot control. This project aims to leverage the Vision Pro’s advanced hand and eye-tracking capabilities to create intuitive control schemes for robotic systems.

The combination of these research efforts highlights the growing interest in creating more immersive and intuitive ways for humans to control robots, rather than solely focusing on autonomous AI systems.

Credit: Younghyo Park and Pulkit Agrawal, MIT, “Using Apple Vision Pro to Train and Control Robots”


Challenges and future directions​

While promising, the Open-TeleVision system still faces hurdles. Latency in long-distance communications, the need for high-bandwidth connections, and operator fatigue are all areas that require further research.

The team is also exploring ways to combine their human-control system with AI assistance. This hybrid approach could offer the best of both worlds – human decision-making augmented by AI’s rapid data processing and pattern recognition capabilities.

A new paradigm for enterprise automation

As we look to the future of robotics and automation, systems like Open-TeleVision challenge us to reconsider the role of human intelligence in technological advancement. For enterprise technology decision makers, this research presents an intriguing opportunity: the ability to push automation projects forward without waiting for AI to fully mature.

While AI will undoubtedly continue to advance, this research demonstrates that enhancing human control rather than replacing it entirely may be a powerful and more immediately achievable alternative. By leveraging existing human expertise and decision-making capabilities, companies can potentially accelerate their automation initiatives and see ROI more quickly.

Key takeaways for enterprise leaders:

  1. Immediate implementation: Human-in-the-loop systems can be deployed now, using current technology and human expertise.
  2. Flexibility: These systems can adapt to changing business needs more quickly than fully autonomous AI solutions.
  3. Reduced training time: Leveraging human operators means less time spent training AI models on complex tasks.
  4. Scalability: With remote operation capabilities, a single expert can potentially control multiple systems across different locations.
  5. Risk mitigation: Human oversight can help prevent costly errors and provide a safeguard against unexpected situations.


As the field of robotics evolves, it’s becoming clear that the most effective solutions may lie not in choosing between human and artificial intelligence, but in finding innovative ways to combine their strengths. The Open-TeleVision system, along with similar projects, represents a significant step in that direction.

For forward-thinking enterprises, this approach opens up new possibilities for human-robot collaboration that could reshape industries, streamline operations, and extend the reach of human capabilities across the globe. By embracing these technologies now, companies can position themselves at the forefront of the next wave of automation and gain a competitive edge in their respective markets.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,004
Reputation
7,865
Daps
147,240

1/1
🚨Paper Alert 🚨

➡️Paper Title: ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data

🌟Few pointers from the paper

🎯Audio signals provide rich information about robot interactions and object properties through contact. This information can surprisingly ease the learning of contact-rich robot manipulation skills, especially when the visual information alone is ambiguous or incomplete.

🎯However, the usage of audio data in robot manipulation has been constrained to teleoperated demonstrations collected by either attaching a microphone to the robot or object, which significantly limits its usage in robot learning pipelines.

🎯In this paper the authors introduce ManiWAV: an ‘ear-in-hand’ data collection device to collect in-the-wild human demonstrations with synchronous audio and visual feedback, and a corresponding policy interface to learn robot manipulation policies directly from the demonstrations (a minimal fusion sketch appears after these pointers).

🎯 They demonstrated the capabilities of their system through four contact-rich manipulation tasks that require either passively sensing the contact events and modes, or actively sensing the object surface materials and states.

🎯 In addition, they showed that their system can generalize to unseen in-the-wild environments, by learning from diverse in-the-wild human demonstrations.
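A minimal, hedged sketch of the audio-visual policy idea (stub encoders, not the ManiWAV model): encode the gripper-microphone audio and the camera image separately, concatenate the features, and regress the next end-effector action.

```python
# Hedged audio-visual fusion sketch with toy encoders.
import numpy as np

def encode_audio(waveform):
    # Stand-in for a spectrogram encoder: coarse band energies.
    chunks = np.array_split(waveform, 8)
    return np.array([float(np.mean(c**2)) for c in chunks])

def encode_image(image):
    # Stand-in for a vision encoder: per-channel means.
    return image.reshape(-1, image.shape[-1]).mean(axis=0)

def policy(waveform, image, W):
    features = np.concatenate([encode_audio(waveform), encode_image(image)])
    return W @ features              # predicted 6-DoF end-effector delta

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(6, 8 + 3))   # toy linear policy head
action = policy(rng.normal(size=16000), rng.random((96, 96, 3)), W)
```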

🏢Organization: @Stanford , @Columbia , @ToyotaResearch

🧙Paper Authors: @Liu_Zeyi_ , @chichengcc , @eacousineau , Naveen Kuppuswamy, @Ben_Burchfiel , @SongShuran

1️⃣Read the Full Paper here: [2406.19464] ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data

2️⃣Project Page: ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data

3️⃣Code: GitHub - real-stanford/maniwav: Official codebase of paper "ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data"

4️⃣Dataset: Index of /maniwav/data

🎥 Be sure to watch the attached Technical Summary Video-Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196


A.I Generated explanation:

**Title:** ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data

**Summary:** This paper is about teaching robots to perform tasks that require touching and manipulating objects, like picking up a ball or opening a door. The researchers found that using audio signals, like the sound of a ball bouncing, can help the robot learn these tasks more easily.

**The Problem:** Usually, robots learn by watching humans perform tasks, but this can be limited because the visual information might not be enough. For example, if a robot is trying to pick up a ball, it might not be able to see exactly how the ball is moving, but it can hear the sound of the ball bouncing.

**The Solution:** The researchers created a special device that can collect audio and visual data from humans performing tasks, like picking up a ball. This device is like a special glove that has a microphone and a camera. They used this device to collect data from humans performing four different tasks, like picking up a ball or opening a door.

**The Results:** The researchers found that their system can learn to perform these tasks by using the audio and visual data collected from humans. They also found that their system can work in different environments and with different objects, even if it hasn't seen them before.

**The Team:** The researchers are from Stanford University, Columbia University, and Toyota Research.

**Resources:**

* **Read the Full Paper:** [2406.19464] ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data
* **Project Page:** ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data
* **Code:** GitHub - real-stanford/maniwav: Official codebase of paper "ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data"
* **Dataset:** Index of /maniwav/data
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,004
Reputation
7,865
Daps
147,240

1/1
🎵 Product Update 🎵

Imagine stepping into an office where humans and robots collaborate seamlessly.

The hum of machinery harmonizes with the click of keyboards, creating a symphony of productivity.

Watch a team of EVEs from @1x_tech work side by side with their human counterparts, transforming cluttered spaces into pristine oases.

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,004
Reputation
7,865
Daps
147,240

1/1
🚨Paper Alert 🚨

➡️Paper Title: ORION: Vision-based Manipulation from Single Human Video with Open-World Object Graphs

🌟Few pointers from the paper

🎯In this paper the authors present an object-centric approach to empower robots to learn vision-based manipulation skills from human videos.

🎯They investigated the problem of imitating robot manipulation from a single human video in the open-world setting, where a robot must learn to manipulate novel objects from one video demonstration.

🎯The authors introduce ORION, an algorithm that tackles this problem by extracting an object-centric manipulation plan from a single RGB-D video and deriving a policy that conditions on the extracted plan (a simplified sketch appears after these pointers).

🎯Their method enables the robot to learn from videos captured by daily mobile devices such as an iPad and generalize the policies to deployment environments with varying visual backgrounds, camera angles, spatial layouts, and novel object instances.

🎯They systematically evaluated their method on both short-horizon and long-horizon tasks, demonstrating the efficacy of ORION in learning from a single human video in the open world.
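A hedged, highly simplified sketch of the object-centric idea (placeholder functions, not the ORION implementation): track the objects across the single human RGB-D video, keep their relative motions as the manipulation plan, and re-anchor that plan on the objects detected in the robot's own scene at deployment.

```python
# Hedged object-centric plan-extraction sketch; tracking is a stub.
import numpy as np

def track_objects(video_frames):
    """Stand-in for open-vocabulary detection + tracking over the video."""
    return {"mug": [np.array([0.2, 0.0, 0.1]), np.array([0.5, 0.3, 0.1])]}

def extract_plan(video_frames):
    tracks = track_objects(video_frames)
    # Keep only start/end keyframes per object: the relative motion is the plan.
    return {name: traj[-1] - traj[0] for name, traj in tracks.items()}

def instantiate_plan(plan, detected_positions):
    """Turn relative motions into target positions in the robot's scene."""
    return {name: detected_positions[name] + delta
            for name, delta in plan.items()}

plan = extract_plan(video_frames=None)
targets = instantiate_plan(plan, {"mug": np.array([0.4, -0.1, 0.1])})
```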

🏢Organization: @UTAustin , @SonyAI_global

🧙Paper Authors: @yifengzhu_ut , Arisrei Lim, @PeterStone_TX , @yukez

1️⃣Read the Full Paper here: [2405.20321] Vision-based Manipulation from Single Human Video with Open-World Object Graphs

2️⃣Project Page: ORION: Vision-based Manipulation from Single Human Video with Open-World Object Graphs

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by SPmusic from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#robotmanipulation


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196


A.I Generated explanation:

**Title:** ORION: Vision-based Manipulation from Single Human Video with Open-World Object Graphs

**Summary:**

Imagine you want a robot to learn how to do a task, like picking up a ball or moving a block, just by watching a human do it on a video. This paper presents a new way to make that happen. The authors created an algorithm called ORION that allows a robot to learn from a single video of a human doing a task, and then apply that knowledge to do the task itself, even if the environment is different.

**Key Points:**

* The authors want to enable robots to learn from humans by watching videos of them doing tasks.
* They developed an algorithm called ORION that can extract the important steps of a task from a single video and use that to teach a robot how to do it.
* ORION can work with videos taken by everyday devices like an iPad, and the robot can learn to do the task even if the environment is different from the one in the video.
* The authors tested ORION on different tasks and found that it works well, even when the task is complex and requires the robot to do multiple steps.

**Who did this research:**

* The research was done by a team from the University of Texas at Austin and Sony AI Global.
* The authors of the paper are Yifeng Zhu, Arisrei Lim, Peter Stone, and Yuke Zhu.

**Want to learn more:**

* You can read the full paper here: [2405.20321] Vision-based Manipulation from Single Human Video with Open-World Object Graphs
* You can also check out the project page here: ORION: Vision-based Manipulation from Single Human Video with Open-World Object Graphs
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,004
Reputation
7,865
Daps
147,240


Researchers are training home robots in simulations based on iPhone scans​


Brian Heater

1:00 PM PDT • July 24, 2024


Image Credits: Screenshot / YouTube

There’s a long list of reasons why you don’t see a lot of non-vacuum robots in the home. At the top of the list is the problem of unstructured and semi-structured environments. No two homes are the same, from layout to lighting to surfaces to humans and pets. Even if a robot can effectively map each home, the spaces are always in flux.

Researchers at MIT CSAIL this week are showcasing a new method for training home robots in simulation. Using an iPhone, someone can scan a part of their home, which can then be uploaded into a simulation.

Simulation has become a bedrock element of robot training in recent decades. It allows robots to try and fail at tasks thousands — or even millions — of times in the same amount of time it would take to do it once in the real world.

The consequences of failing in simulation are also significantly lower than in real life. Imagine for a moment that teaching a robot to put a mug in a dishwasher required it to break 100 real-life mugs in the process.



“Training in the virtual world in simulation is very powerful, because the robot can practice millions and millions of times,” researcher Pulkit Agrawal says in a video tied to the research. “It might have broken a thousand dishes, but it doesn’t matter, because everything was in the virtual world.”

Much like the robots themselves, however, simulation can only go so far when it comes to dynamic environments like the home. Making simulations as accessible as an iPhone scan can dramatically enhance the robot’s adaptability to different environments.

In fact, creating a robust enough database of environments such as these ultimately makes the system more adaptable when something is inevitably out of place, be it moving a piece of furniture or leaving a dish on the kitchen counter.
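A generic, hedged sketch of that scan-to-simulation workflow (every function name here is a placeholder, not MIT CSAIL's tooling): import the phone scan as a static scene, spawn task objects with randomized placements, and let the policy fail cheaply over many simulated episodes.

```python
# Hedged scan-to-sim training-loop sketch; all functions and the scan file
# name are hypothetical stubs.
import random

def load_scanned_mesh(scan_path):           return {"mesh": scan_path}    # stub
def spawn_objects(scene, n):                return list(range(n))          # stub
def randomize_poses(objects, rng):          return [rng.random() for _ in objects]
def run_episode(scene, poses, policy, rng): return rng.random() < 0.3      # stub: 30% success

rng = random.Random(0)
scene = load_scanned_mesh("kitchen_scan.usdz")   # hypothetical phone-scan file
policy = None                                     # stub for a learned policy
successes = sum(
    run_episode(scene, randomize_poses(spawn_objects(scene, 5), rng), policy, rng)
    for _ in range(10_000)                        # millions of trials in practice
)
```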
 