bnew

1/1
🎵 Product Update 🎵

🎙️“Text to Sound Effects” by @elevenlabsio is here🎙️

🎶 Turn text into melodies!

🎹 Create symphonies with your words.

Try it now: Sign up

Compose away! 🎼✨

🎥 Be sure to watch the attached Video-Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew

1/1
🚨Paper Alert 🚨

➡️Paper Title: Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

🌟Few pointers from the paper

🎯In this paper, authors have introduced “Era3D”, a novel multiview diffusion method that generates high-resolution multiview images from a single-view image.

🎯Despite significant advancements in multiview generation, existing methods still suffer from camera prior mismatch, inefficacy, and low resolution, resulting in poor-quality multiview images.

🎯Specifically, these methods assume that the input images should comply with a predefined camera type, e.g. a perspective camera with a fixed focal length, leading to distorted shapes when the assumption fails.

🎯Moreover, the full-image or dense multiview attention they employ leads to an exponential explosion of computational complexity as image resolution increases, resulting in prohibitively expensive training costs.

🎯To bridge the gap between assumption and reality, Era3D first proposes a diffusion-based camera prediction module to estimate the focal length and elevation of the input image, which allows their method to generate images without shape distortions.

🎯 Furthermore, a simple but efficient attention layer, named row-wise attention, is used to enforce epipolar priors in the multiview diffusion, facilitating efficient cross-view information fusion.
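
For readers who want a concrete picture of the row-wise idea, below is a minimal, hypothetical PyTorch sketch (my illustration, not the authors' code): tokens on the same image row of every view attend to one another, so the attention cost scales with the row length times the number of views rather than with the full image area.

```python
# Hypothetical sketch of row-wise multiview attention (not Era3D's actual code).
import torch

def row_wise_attention(feats):
    """feats: (V, H, W, C) -- features for V views at the same H x W resolution."""
    V, H, W, C = feats.shape
    rows = feats.permute(1, 0, 2, 3).reshape(H, V * W, C)    # group tokens by image row
    attn = torch.softmax(rows @ rows.transpose(1, 2) / C ** 0.5, dim=-1)  # (H, V*W, V*W)
    out = attn @ rows                                        # (H, V*W, C)
    return out.reshape(H, V, W, C).permute(1, 0, 2, 3)       # back to (V, H, W, C)

views = torch.randn(6, 64, 64, 32)    # 6 views, 64x64 feature maps, 32 channels
fused = row_wise_attention(views)
print(fused.shape)                    # torch.Size([6, 64, 64, 32])
```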

🎯Consequently, compared with state-of-the-art methods, Era3D generates high-quality multiview images at up to 512×512 resolution while reducing computational complexity by 12×.

🏢Organization: @hkust , @HKUniversity , DreamTech, PKU, LightIllusion

🧙Paper Authors: Peng Li, @YuanLiu41955461 , @xxlong0 , Feihu Zhang, @_cheng_lin , Mengfei Li, Xingqun Qi, Shanghang Zhang, Wenhan Luo, Ping Tan, Wenping Wang, Qifeng Liu, Yike Guo

1️⃣Read the Full Paper here: [2405.11616] Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

2️⃣Project Page: Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

3️⃣Code: GitHub - pengHTYX/Era3D

4️⃣Demo: Era3D MV Demo - a Hugging Face Space by pengHTYX

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Oleg Fedak from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.




A.I Generated explanation:

**Title:** Era3D: A New Way to Generate High-Quality Multiview Images from a Single Image

**What's the problem?:** Currently, methods to generate multiview images (images that show the same scene from different angles) from a single image have some limitations. They often produce low-quality images, assume the input image is taken with a specific type of camera, and are computationally expensive.

**What's the solution?:** The authors of this paper have introduced a new method called Era3D, which generates high-resolution multiview images from a single image. Era3D is different from existing methods because it:

* **Estimates camera settings:** Era3D can estimate the focal length and elevation of the input image, which allows it to generate images without shape distortions.
* **Uses efficient attention:** Era3D uses a simple but efficient attention layer called row-wise attention, which facilitates efficient cross-view information fusion and reduces computational complexity.

**Results:** Compared to state-of-the-art methods, Era3D generates high-quality multiview images with up to a 512x512 resolution while reducing computation complexity by 12 times.

**Resources:**

* **Read the full paper:** [2405.11616] Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention
* **Project page:** Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention
* **Code:** GitHub - pengHTYX/Era3D
* **Demo:** Era3D MV Demo - a Hugging Face Space by pengHTYX
 

bnew

1/1
🚨Just In🚨

➡️ @Google Research team has just Announced "ChatDirector"

🌟Let's try to understand what ChatDirector is

🎯ChatDirector is a research prototype that transforms traditional video conferencing using 3D video avatars, shared 3D scenes, and automatic layout transitions.

🎯ChatDirector employs a real-time pipeline that converts participants’ RGB video streams into 3D portrait avatars and renders them in a virtual 3D scene.

🎯ChatDirector also includes a space-aware video conferencing environment that displays remote participants’ 3D portrait avatars in a 3D meeting environment.

🎯ChatDirector also features a decision-tree algorithm that takes the speech states of remote participants as inputs and visually adjusts the layout and behavior of the remote avatars, helping users keep track of the ongoing conversations.
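
To make the speech-driven layout idea more tangible, here is a toy, made-up sketch (not Google's algorithm) of how speech states could be mapped to layout transitions:

```python
# Toy illustration only: choose a layout transition from who is currently speaking.
def choose_layout(speech_states):
    """speech_states: dict mapping participant name -> True if currently speaking."""
    speakers = [name for name, talking in speech_states.items() if talking]
    if len(speakers) == 0:
        return "overview"              # nobody speaking: show the whole 3D room
    if len(speakers) == 1:
        return f"focus:{speakers[0]}"  # one speaker: turn avatars toward them
    return "side-by-side:" + ",".join(speakers)  # several speakers: pairwise layout

print(choose_layout({"alice": True, "bob": False}))   # focus:alice
```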

Read More Here: https://dl.acm.org/doi/pdf/10.1145/3613904.3642110

Blog: ChatDirector: Enhancing video conferencing with space-aware scene rendering and speech-driven layout transition

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#videoconferencing #3dportraitavatar





A.I Generated explanation:

**Title:** Google's New "ChatDirector" Revolutionizes Video Conferences

**What is ChatDirector?**

ChatDirector is a new technology developed by Google's research team that changes the way we have video conferences. Instead of seeing each other as 2D faces on a screen, ChatDirector uses 3D avatars and virtual scenes to make video meetings feel more like in-person conversations.

**How does it work?**

When you're in a video conference using ChatDirector, the system takes the video feed from your camera and turns it into a 3D avatar of you. This avatar is then placed in a virtual 3D scene with the other people in the meeting. The system also uses a special algorithm to arrange the avatars in a way that makes it easy to follow the conversation.

**Cool features:**

* The avatars are displayed in a 3D meeting environment, making it feel more like a real meeting.
* The system can automatically adjust the layout of the avatars based on who is speaking, so you can easily see who is talking.
* The 3D scenes and avatars are rendered in real-time, making the experience feel smooth and natural.

**Want to learn more?**

You can read more about ChatDirector in the research paper https://dl.acm.org/doi/pdf/10.1145/3613904.3642110 or on Google's research blog ChatDirector: Enhancing video conferencing with space-aware scene rendering and speech-driven layout transition.

.
 

bnew

1/1
🚨Paper Alert 🚨

➡️Paper Title: GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping

🌟Few pointers from the paper

🎯Generating novel views from a single image remains a challenging task due to the complexity of 3D scenes and the limited diversity in the existing multi-view datasets to train a model on.

🎯 Recent research combining large-scale text-to-image (T2I) models with monocular depth estimation (MDE) has shown promise in handling in-the-wild images.

🎯In these methods, an input view is geometrically warped to novel views with estimated depth maps, then the warped image is inpainted by T2I models. However, they struggle with noisy depth maps and loss of semantic details when warping an input view to novel viewpoints.

🎯In this paper, authors have proposed a novel approach for single-shot novel view synthesis, a semantic-preserving generative warping framework that enables T2I generative models to learn where to warp and where to generate, through augmenting cross-view attention with self-attention.

🎯Their approach addresses the limitations of existing methods by conditioning the generative model on source view images and incorporating geometric warping signals.
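
As a rough illustration of the "learn where to warp and where to generate" idea, the sketch below (hypothetical, not the paper's implementation) lets target-view tokens attend jointly to themselves and to features warped from the source view, so each output token can lean on either source:

```python
# Hypothetical sketch: target tokens attend over both themselves (self-attention)
# and warped source-view tokens (cross-view attention), so the model can either
# copy (warp) or synthesize (generate) content.
import torch
import torch.nn.functional as F

def warp_or_generate_attention(target_tokens, warped_source_tokens):
    """target_tokens: (N, C); warped_source_tokens: (M, C)."""
    C = target_tokens.shape[-1]
    keys = torch.cat([target_tokens, warped_source_tokens], dim=0)   # self + cross keys
    values = keys
    attn = F.softmax(target_tokens @ keys.T / C ** 0.5, dim=-1)      # (N, N + M)
    return attn @ values                                             # (N, C)

out = warp_or_generate_attention(torch.randn(16, 64), torch.randn(16, 64))
print(out.shape)  # torch.Size([16, 64])
```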

🏢Organization: @SonyAI_global , @Sony Group Corporation, @UniversityKorea

🧙Paper Authors: @jyseo_cv , Kazumi Fukuda, Takashi Shibuya, Takuya Narihira, Naoki Murata, Shoukang Hu, @JCJesseLai , Seungryong Kim, @mittu1204

1️⃣Read the Full Paper here: [2405.17251] GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping

2️⃣Project Page: GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping

3️⃣Code: Coming 🔜

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.




A.I Generated explanation:


**Title:** GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping

**Summary:** This paper is about a new way to generate new views of a scene from just one image. This is a hard problem because scenes can be complex and it's hard to train a model to do this well.

**Current Challenges:** Current methods try to solve this problem by using two steps: 1) warping the input image to a new view using depth maps, and 2) filling in missing parts of the image using text-to-image models. However, these methods have some issues, such as noisy depth maps and loss of important details when warping the image.

**New Approach:** The authors of this paper propose a new approach that uses a combination of two types of attention (cross-view attention and self-attention) to help the model learn where to warp and where to generate new parts of the image. This approach conditions the model on the input image and uses geometric warping signals to improve the results.

**Organization and Authors:** The paper is from researchers at Sony AI, Sony Group Corporation, and Korea University. The authors are listed above.

**Resources:**

1. **Read the Full Paper:** [2405.17251] GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping
2. **Project Page:** GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping
3. **Code:** Coming soon
 

bnew

1/1
🚨Paper Alert 🚨

➡️Paper Title: Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels

🌟Few pointers from the paper

🎯Video generative models are receiving particular attention given their ability to generate realistic and imaginative frames. Besides, these models are also observed to exhibit strong 3D consistency, significantly enhancing their potential to act as world simulators.

🎯In this work, authors have presented “Vidu4D”, a novel reconstruction model that excels in accurately reconstructing 4D (i.e., sequential 3D) representations from single generated videos, addressing challenges associated with non-rigidity and frame distortion.

🎯This capability is pivotal for creating high-fidelity virtual contents that maintain both spatial and temporal coherence. At the core of Vidu4D is their proposed “Dynamic Gaussian Surfels” (DGS) technique.

🎯DGS optimizes time-varying warping functions to transform Gaussian surfels (surface elements) from a static state to a dynamically warped state. This transformation enables a precise depiction of motion and deformation over time.
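
As a toy illustration of warping surfels over time (my sketch under simplifying assumptions, not the paper's DGS implementation), a per-timestep rotation and translation can be applied to surfel centers and normals like this:

```python
# Toy sketch: warp Gaussian-surfel centers and normals with a time-indexed rigid
# transform (the real DGS warping field is learned and non-rigid).
import numpy as np

def warp_surfels(centers, normals, rotation, translation):
    """centers, normals: (N, 3); rotation: (3, 3); translation: (3,)."""
    warped_centers = centers @ rotation.T + translation
    warped_normals = normals @ rotation.T        # normals rotate but do not translate
    return warped_centers, warped_normals

# Example: rotate a handful of surfels 90 degrees around the z-axis at one timestep.
theta = np.pi / 2
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
centers = np.random.rand(5, 3)
normals = np.tile([0.0, 0.0, 1.0], (5, 1))
warped_c, warped_n = warp_surfels(centers, normals, Rz, np.array([0.1, 0.0, 0.0]))
```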

🎯To preserve the structural integrity of surface-aligned Gaussian surfels, they designed the warped-state geometric regularization based on continuous warping fields for estimating normals.

🎯 Additionally, they learned refinements on rotation and scaling parameters of Gaussian surfels, which greatly alleviates texture flickering during the warping process and enhances the capture of fine-grained appearance details.

🎯Vidu4D also contains a novel initialization state that provides a proper start for the warping fields in DGS. Equipping Vidu4D with an existing video generative model, the overall framework demonstrates high-fidelity text-to-4D generation in both appearance and geometry.

🏢Organization: Department of Computer Science and Technology, BNRist Center, @Tsinghua_Uni ; ShengShu; College of Electronic and Information Engineering, Tongji University

🧙Paper Authors: Yikai Wang, Xinzhou Wang, Zilong Chen, Zhengyi Wang, Fuchun Sun, Jun Zhu

1️⃣Read the Full Paper here: [2405.16822] Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels

2️⃣Project Page: Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels

3️⃣Code: Coming 🔜

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Vitaly Vakulenko from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.




A.I Generated explanation:

Paper Alert

Paper Title: Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels

Here are some key points from the paper:

What's the big deal about video generative models?

Video generative models are getting a lot of attention because they can create super realistic and imaginative video frames. Plus, they're really good at keeping the 3D consistency of objects in the video, which makes them useful for simulating the real world.

What's the problem that Vidu4D solves?

The authors of Vidu4D created a new model that can take a single generated video and turn it into a high-quality 4D representation (think of it like a 3D video that changes over time). This is hard to do because objects in the video can move and change shape in complex ways.

How does Vidu4D work?

The magic happens thanks to something called "Dynamic Gaussian Surfels" (DGS). DGS is a technique that takes surface elements (like tiny pieces of a 3D object) and warps them over time to show how they move and change. This creates a really accurate representation of motion and deformation.

What makes Vidu4D special?

Vidu4D has a few tricks up its sleeve. It can preserve the structure of the surface elements, which helps keep the video looking realistic. It also learns how to refine the rotation and scaling of these elements, which reduces flickering and captures fine details. Plus, it has a special initialization step that helps get the warping process started correctly.

What can Vidu4D do?

When combined with an existing video generative model, Vidu4D can create high-quality 4D videos from just a text description. This is a big deal for creating realistic virtual content that looks great and moves smoothly.

Who worked on this paper?

The authors are from the Department of Computer Science and Technology at Tsinghua University and the College of Electronic and Information Engineering at Tongji University.

Want to learn more?

1️⃣ Read the full paper here: https://arxiv.org/abs/2405.16822
2️⃣ Check out the project page: https://vidu4d-dgs.github.io/
3️⃣ Code: Coming soon 🔜
 

bnew

1/1
🚨Paper Alert 🚨

➡️Paper Title: Looking Backward: Streaming Video-to-Video Translation with Feature Banks

🌟Few pointers from the paper

🎯In this paper authors have introduced “StreamV2V”, a diffusion model that achieves real-time streaming video-to-video (V2V) translation with user prompts.

🎯Unlike prior V2V methods using batches to process limited frames, they opted to process frames in a streaming fashion, to support unlimited frames. At the heart of StreamV2V lies a backward-looking principle that relates the present to the past.

🎯This is realized by maintaining a feature bank, which archives information from past frames. For incoming frames, StreamV2V extends self-attention to include banked keys and values and directly fuses similar past features into the output.

🎯The feature bank is continually updated by merging stored and new features, making it compact but informative. StreamV2V stands out for its adaptability and efficiency, seamlessly integrating with image diffusion models without fine-tuning.
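
The sketch below is a minimal, hypothetical version of the feature-bank mechanism described above (class and method names are mine, not the paper's): self-attention is extended with banked keys/values from past frames, and the bank stays compact by capping its size when new features are merged in.

```python
# Hedged sketch of a feature bank for streaming attention (illustrative only).
import torch
import torch.nn.functional as F

class FeatureBank:
    """Archive keys/values from past frames and reuse them for new frames."""
    def __init__(self, max_tokens=1024):
        self.keys = None
        self.values = None
        self.max_tokens = max_tokens

    def extended_attention(self, q, k, v):
        """q, k, v: (N, C) tokens of the current frame; the bank extends k and v."""
        if self.keys is not None:
            k = torch.cat([k, self.keys], dim=0)
            v = torch.cat([v, self.values], dim=0)
        attn = F.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v

    def update(self, k, v):
        """Merge new features into the bank and cap its size to stay compact."""
        self.keys = k if self.keys is None else torch.cat([self.keys, k], dim=0)[-self.max_tokens:]
        self.values = v if self.values is None else torch.cat([self.values, v], dim=0)[-self.max_tokens:]

bank = FeatureBank()
for _ in range(3):                       # pretend these are streaming frames
    q = k = v = torch.randn(256, 64)
    out = bank.extended_attention(q, k, v)
    bank.update(k, v)
```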

🎯 It runs at 20 FPS on a single A100 GPU, making it 15×, 46×, 108×, and 158× faster than FlowVid, CoDeF, Rerender, and TokenFlow, respectively. Quantitative metrics and user studies confirm StreamV2V's exceptional ability to maintain temporal consistency.

🏢Organization: @UTAustin , @UCBerkeley

🧙Paper Authors: @LiangJeff95 , Akio Kodaira, @Chenfeng_X , Masayoshi Tomizuka, Kurt Keutzer, Diana Marculescu

1️⃣Read the Full Paper here: [2405.15757] Looking Backward: Streaming Video-to-Video Translation with Feature Banks

2️⃣Project Page: Looking Backward: Streaming Video-to-Video Translation with Feature Banks

3️⃣Code: GitHub - Jeff-LiangF/streamv2v: Official Pytorch implementation of StreamV2V.

4️⃣Supplementary Videos: Looking Backward: Streaming Video-to-Video Translation with Feature Banks

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.




A.I Generated explanation:

**Title:** Looking Backward: Streaming Video-to-Video Translation with Feature Banks

**What's it about:** This paper introduces a new way to translate videos from one style to another in real-time, using a technique called "StreamV2V". This means that instead of processing a batch of frames at once, the system processes frames one by one, like a stream.

**How does it work:** The system uses a "feature bank" to store information from past frames. When a new frame comes in, it looks back at the feature bank to find similar features and combines them with the new frame to create the translated output. The feature bank is constantly updated with new information, making it compact but informative.

**What's special about it:** StreamV2V is fast and efficient, and can run at 20 frames per second on a single high-performance graphics card. It's also very good at maintaining the consistency of the video over time.

**Who did it:** The paper was written by a team of researchers from the University of Texas at Austin and the University of California, Berkeley.

**Where can I learn more:**
Read the Full Paper here: [2405.15757] Looking Backward: Streaming Video-to-Video Translation with Feature Banks
Project Page: Looking Backward: Streaming Video-to-Video Translation with Feature Banks
Code: GitHub - Jeff-LiangF/streamv2v: Official Pytorch implementation of StreamV2V.
Supplementary Videos: Looking Backward: Streaming Video-to-Video Translation with Feature Banks
 

bnew

1/1
🚨Paper Alert 🚨

➡️Paper Title: OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code

🌟Few pointers from the paper

🎯Open-ended and AI-generating algorithms aim to continuously generate and solve increasingly complex tasks indefinitely, offering a promising path toward more general intelligence. To accomplish this grand vision, learning must occur within a vast array of potential tasks.

🎯 Existing approaches to automatically generating environments are constrained within manually predefined, often narrow distributions of environments, limiting their ability to create any learning environment.

💎To address this limitation, authors have introduced a novel framework, “OMNI-EPIC”, that augments previous work in Open-endedness via Models of human Notions of Interestingness (OMNI) with Environments Programmed in Code (EPIC).

🎯OMNI-EPIC leverages foundation models to autonomously generate code specifying the next learnable (i.e., not too easy or difficult for the agent's current skill set) and interesting (e.g., worthwhile and novel) tasks.

🎯OMNI-EPIC generates both environments (e.g., an obstacle course) and reward functions (e.g., progress through the obstacle course quickly without touching red objects), enabling it, in principle, to create any simulatable learning task.
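
Here is a hedged outline of that generate-and-archive loop; `ask_foundation_model` is a placeholder for whatever LLM client you use, not an API from the paper.

```python
# Hypothetical sketch of the task-generation loop described above.
def ask_foundation_model(prompt: str) -> str:
    """Placeholder -- swap in your own LLM client here."""
    raise NotImplementedError

def omni_epic_step(task_archive, agent_skill_summary):
    """One hypothetical iteration: propose a task as code, keep it if interesting."""
    prompt = (
        "Tasks already solved:\n" + "\n".join(task_archive) +
        f"\nAgent abilities: {agent_skill_summary}\n"
        "Propose ONE new task that is learnable (not too easy or hard) and interesting. "
        "Return Python code defining build_env() and reward(state)."
    )
    task_code = ask_foundation_model(prompt)               # environment + reward as code
    verdict = ask_foundation_model(
        "Is this task novel and worthwhile given the archive? Answer yes or no:\n" + task_code
    )
    if verdict.strip().lower().startswith("yes"):
        task_archive.append(task_code)                     # archive it as context for later steps
    return task_archive
```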

🎯Authors have also showcased the explosive creativity of OMNI-EPIC, which continuously innovates to suggest new, interesting learning challenges.

🎯They also highlighted how OMNI-EPIC can adapt to reinforcement learning agents' learning progress, generating tasks that are of suitable difficulty.

🎯Overall, OMNI-EPIC can endlessly create learnable and interesting environments, further propelling the development of self-improving AI systems and AI-Generating Algorithms.

🏢Organization:@imperialcollege , @UBC , @VectorInst , Canada CIFAR AI Chair

🧙Paper Authors: @maxencefaldor , @jennyzhangzt , @CULLYAntoine , @jeffclune

1️⃣Read the Full Paper here: [2405.15568] OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code

2️⃣Project Page: OMNI EPIC | Maxence Faldor | Jenny Zhang | Jeff Clune | Antoine Cully |

3️⃣Code: Coming 🔜

4️⃣X Thread :

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by Calvin Clavier from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.





A.I Generated explanation:

**Title:** OMNI-EPIC: A New Way to Create Endless Learning Tasks for AI

**Summary:** Researchers have created a new system called OMNI-EPIC that can generate an endless variety of learning tasks for artificial intelligence (AI) systems. This is important because it can help AI systems become more intelligent and capable.

**The Problem:** Currently, AI systems are limited by the types of tasks they can learn from. They need to be trained on a specific set of tasks, and once they've mastered those, they can't learn anything new. This is like a student only being able to learn from a single textbook.

**The Solution:** OMNI-EPIC is a system that can generate new and interesting learning tasks for AI systems. It uses a combination of human input and machine learning to create tasks that are not too easy or too hard for the AI system to learn from. This is like having a teacher who can create new and challenging lessons for a student.

**How it Works:** OMNI-EPIC uses a type of AI called "foundation models" to generate code that specifies the next learning task. This code can create entire environments, such as an obstacle course, and reward functions, such as completing the course quickly without touching certain objects.

**Benefits:** OMNI-EPIC can create an endless variety of learning tasks, which can help AI systems become more intelligent and capable. It can also adapt to the learning progress of the AI system, generating tasks that are suitable for its current skill level.

**Implications:** This technology has the potential to revolutionize the field of AI research and development. It could lead to the creation of more advanced AI systems that can learn and improve over time.

**Resources:**

* Read the full paper here: https://arxiv.org/abs/2405.15568
* Project page: https://omni-epic.vercel.app/
* Twitter thread: https://twitter.com/jeffclune/status/1795787632435212732











1/9
I am thrilled to introduce OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code. Led by @maxencefaldor and @jennyzhangzt, with @CULLYAntoine and myself.

2/9
Open-ended and AI-generating algorithms aim to continuously generate and solve increasingly complex tasks forever, offering a promising path toward more general intelligence. To accomplish this grand vision, learning must occur within a VAST space of potential tasks.

3/9
Existing approaches to automatically generating environments are constrained within manually predefined, often narrow distributions of environments, limiting their ability to achieve “Darwin Completeness” (the potential to create *any* learning environment).

4/9
OMNI-EPIC uses foundation models to autonomously generate code specifying the next learnable and interesting tasks. The generation of both environments and reward functions enables, in principle, the creation of any learning task (i.e. achieving "Darwin Completeness").

5/9
Every run of OMNI-EPIC triggers an explosion of creativity in designing fascinating, diverse, interesting new challenges tailored to the current capabilities of the agent, akin to the processes observed in biological evolution and human culture (e.g. art, science and technology).

6/9
Here is an example run. All tasks (save 3 seeds) are generated by OMNI-EPIC. Imagine running this for billions of years!

7/9
It is also a great form of human entertainment! OMNI-EPIC ushers in a new era of gaming, where endless novel and interesting content *of any type* is automatically generated & tailored to players' skills. Soon we will share a website where players can engage with generated tasks.

8/9
In conclusion, OMNI-EPIC represents a leap towards truly open-ended learning by generating an endless stream of learnable, interesting, and wildly diverse tasks.

9/9
Personally I was not surprised. For me, this was one of those ideas where once it was proposed, I was sure it was going to work. But I *was* surprised how easy it was to get it to be endlessly creative! I thought that would take more coaxing. It just wants to create open-endedly!


 

bnew

1/1
🚨Paper Alert 🚨

➡️Paper Title: ORION: Vision-based Manipulation from Single Human Video with Open-World Object Graphs

🌟Few pointers from the paper

🎯In this paper authors have presented an object-centric approach to empower robots to learn vision-based manipulation skills from human videos.

🎯They investigated the problem of imitating robot manipulation from a single human video in the open-world setting, where a robot must learn to manipulate novel objects from one video demonstration.

🎯Authors have introduced ORION, an algorithm that tackles the problem by extracting an object-centric manipulation plan from a single RGB-D video and deriving a policy that conditions on the extracted plan.
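
To give a feel for what an "object-centric manipulation plan" might look like as data, here is an illustrative sketch (my own structure, not ORION's actual code):

```python
# Illustrative data-structure sketch: an object-centric plan as a sequence of
# keyframes, each recording the objects present and their contact relations.
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    name: str            # e.g. "mug", detected with an open-vocabulary model
    points: list         # 3D points for this object from the RGB-D frame

@dataclass
class Keyframe:
    objects: list                                   # ObjectNode instances at this step
    contacts: list = field(default_factory=list)    # (object_a, object_b) pairs in contact

@dataclass
class ManipulationPlan:
    keyframes: list = field(default_factory=list)   # ordered object-centric subgoals

plan = ManipulationPlan(keyframes=[
    Keyframe(objects=[ObjectNode("gripper", []), ObjectNode("mug", [])],
             contacts=[("gripper", "mug")]),        # subgoal: the gripper grasps the mug
])
print(len(plan.keyframes))  # 1
```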

🎯Their method enables the robot to learn from videos captured by daily mobile devices such as an iPad and generalize the policies to deployment environments with varying visual backgrounds, camera angles, spatial layouts, and novel object instances.

🎯They systematically evaluated their method on both short-horizon and long-horizon tasks, demonstrating the efficacy of ORION in learning from a single human video in the open world.

🏢Organization: @UTAustin , @SonyAI_global

🧙Paper Authors: @yifengzhu_ut , Arisrei Lim, @PeterStone_TX , @yukez

1️⃣Read the Full Paper here: [2405.20321] Vision-based Manipulation from Single Human Video with Open-World Object Graphs

2️⃣Project Page: ORION: Vision-based Manipulation from Single Human Video with Open-World Object Graphs

🎥 Be sure to watch the attached Demo Video-Sound on 🔊🔊

🎵 Music by SPmusic from @pixabay

Find this Valuable 💎 ?

♻️QT and teach your network something new

Follow me 👣, @NaveenManwani17 , for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.

#robotmanipulation




A.I Generated explanation:

**Title:** ORION: Vision-based Manipulation from Single Human Video with Open-World Object Graphs

**Summary:**

Imagine you want a robot to learn how to do a task, like picking up a ball or moving a block, just by watching a human do it on a video. This paper presents a new way to make that happen. The authors created an algorithm called ORION that allows a robot to learn from a single video of a human doing a task, and then apply that knowledge to do the task itself, even if the environment is different.

**Key Points:**

* The authors want to enable robots to learn from humans by watching videos of them doing tasks.
* They developed an algorithm called ORION that can extract the important steps of a task from a single video and use that to teach a robot how to do it.
* ORION can work with videos taken by everyday devices like an iPad, and the robot can learn to do the task even if the environment is different from the one in the video.
* The authors tested ORION on different tasks and found that it works well, even when the task is complex and requires the robot to do multiple steps.

**Who did this research:**

* The research was done by a team from the University of Texas at Austin and Sony AI Global.
* The authors of the paper are Yifeng Zhu, Arisrei Lim, Peter Stone, and Yuke Zhu.

**Want to learn more:**

* You can read the full paper here: [2405.20321] Vision-based Manipulation from Single Human Video with Open-World Object Graphs
* You can also check out the project page here: ORION: Vision-based Manipulation from Single Human Video with Open-World Object Graphs
 

bnew

Microsoft CTO Kevin Scott - Despite what other people think, we're not at diminishing marginal returns on scale-up. There is an exponential here, and the unfortunate thing is you only get to sample it every couple of years because it takes time to build supercomputers and train models on top of them.

@46:16
 

bnew
[Submitted on 27 Mar 2024 (v1), last revised 10 Jul 2024 (this version, v2)]

Vulnerability Detection with Code Language Models - How Far Are We?​

Yangruibo Ding
Abstract:In the context of the rising interest in code language models (code LMs) and vulnerability detection, we study the effectiveness of code LMs for detecting vulnerabilities. Our analysis reveals significant shortcomings in existing vulnerability datasets, including poor data quality, low label accuracy, and high duplication rates, leading to unreliable model performance in realistic vulnerability detection scenarios. Additionally, the evaluation methods used with these datasets are not representative of real-world vulnerability detection.
To address these challenges, we introduce PrimeVul, a new dataset for training and evaluating code LMs for vulnerability detection. PrimeVul incorporates a novel set of data labeling techniques that achieve comparable label accuracy to human-verified benchmarks while significantly expanding the dataset. It also implements a rigorous data de-duplication and chronological data splitting strategy to mitigate data leakage issues, alongside introducing more realistic evaluation metrics and settings. This comprehensive approach aims to provide a more accurate assessment of code LMs' performance in real-world conditions.
Evaluating code LMs on PrimeVul reveals that existing benchmarks significantly overestimate the performance of these models. For instance, a state-of-the-art 7B model scored 68.26% F1 on BigVul but only 3.09% F1 on PrimeVul. Attempts to improve performance through advanced training techniques and larger models like GPT-3.5 and GPT-4 were unsuccessful, with results akin to random guessing in the most stringent settings. These findings underscore the considerable gap between current capabilities and the practical requirements for deploying code LMs in security roles, highlighting the need for more innovative research in this domain.
Comments:Accepted for the 47th IEEE/ACM International Conference on Software Engineering (ICSE 2025); Camera-ready Work in Progress
Subjects:Software Engineering (cs.SE); Computation and Language (cs.CL)
Cite as: arXiv:2403.18624
arXiv:2403.18624v2
[2403.18624] Vulnerability Detection with Code Language Models: How Far Are We?


https://arxiv.org/pdf/2403.18624



A.I Generated explanation:


Title: Vulnerability Detection with Code Language Models - How Far Are We?

This paper is about using special computer programs called "code language models" to detect vulnerabilities in software code. The authors want to know how well these models work in real-life scenarios.

Author: Yangruibo Ding

The author of this paper is Yangruibo Ding.

Abstract:

The authors looked at how well code language models can detect vulnerabilities in software code. They found that the current datasets used to train these models have some big problems, such as:

* Poor data quality
* Inaccurate labels
* Duplicate data

This means that the models aren't performing as well as they should in real-life scenarios. To fix this, the authors created a new dataset called PrimeVul, which has better data quality, more accurate labels, and no duplicates.

When they tested the code language models on PrimeVul, they found that the models didn't perform as well as they did on the old datasets. In fact, even the best models performed poorly, which means there's still a lot of work to be done to make these models useful in real-life scenarios.
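
Two of the dataset fixes mentioned in the abstract, de-duplication and chronological train/test splitting, are easy to picture in code. Here is a hedged sketch, assuming each sample is a dict with "code", "label", and "date" fields (field names are my own, not PrimeVul's schema):

```python
# Hedged sketch of exact de-duplication and a chronological (time-based) split.
def deduplicate(samples):
    seen, unique = set(), []
    for s in samples:                        # s: {"code": ..., "label": ..., "date": ...}
        key = s["code"].strip()
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique

def chronological_split(samples, cutoff_date):
    train = [s for s in samples if s["date"] < cutoff_date]
    test = [s for s in samples if s["date"] >= cutoff_date]   # newer code never leaks into training
    return train, test
```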

Comments:

This paper has been accepted for a conference called the 47th IEEE/ACM International Conference on Software Engineering (ICSE 2025).

Subjects:

This paper is about software engineering and computation and language.

Cite as:

You can cite this paper as arXiv:2403.18624.
Submission history:

The paper was submitted on March 27, 2024, and revised on July 10, 2024. The PDF is available at https://arxiv.org/pdf/2403.18624
 

bnew
[Submitted on 10 Jul 2024]

Rectifier - Code Translation with Corrector via LLMs​

Xin Yin
Abstract:Software migration is garnering increasing attention with the evolution of software and society. Early studies mainly relied on handcrafted translation rules to translate between two languages, the translation process is error-prone and time-consuming. In recent years, researchers have begun to explore the use of pre-trained large language models (LLMs) in code translation. However, code translation is a complex task that LLMs would generate mistakes during code translation, they all produce certain types of errors when performing code translation tasks, which include (1) compilation error, (2) runtime error, (3) functional error, and (4) non-terminating execution. We found that the root causes of these errors are very similar (e.g. failure to import packages, errors in loop boundaries, operator errors, and more). In this paper, we propose a general corrector, namely Rectifier, which is a micro and universal model for repairing translation errors. It learns from errors generated by existing LLMs and can be widely applied to correct errors generated by any LLM. The experimental results on translation tasks between C++, Java, and Python show that our model has effective repair ability, and cross experiments also demonstrate the robustness of our method.
Comments: arXiv:2308.03109
Subjects:Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Cite as: arXiv:2407.07472
arXiv:2407.07472v1


https://arxiv.org/pdf/2407.07472


A.I Generated explanation:


Title: Rectifier - Code Translation with Corrector via LLMs

This is a research paper about a new tool called Rectifier, which helps fix errors in code translation.

Author: Xin Yin

The author of this paper is Xin Yin, who can be found on the arXiv website.

Abstract:

The abstract is a summary of the paper. It says that:

* Software migration (moving software from one language to another) is becoming more important.
* Early methods of translation were manual and prone to errors.
* Recently, researchers have started using large language models (LLMs) to translate code, but these models can also make mistakes.
* The mistakes made by LLMs can be categorized into four types: compilation errors, runtime errors, functional errors, and non-terminating execution.
* The authors propose a new tool called Rectifier, which can fix these errors.
* Rectifier is a universal model that can be used to correct errors made by any LLM.
* The authors tested Rectifier on translation tasks between C++, Java, and Python, and found that it was effective in fixing errors.
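
A hedged sketch of that translate-then-correct pipeline is shown below; the `translate_llm` and `corrector_model` callables are hypothetical placeholders, not the paper's API.

```python
# Hypothetical pipeline sketch: translate with any LLM, run the target-language
# code, and hand failing code plus the error message to a small corrector model.
import os
import subprocess
import tempfile

def run_python(code: str) -> str:
    """Return an error message if the translated Python code fails, else ''."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, text=True, timeout=10)
        return result.stderr if result.returncode != 0 else ""
    except subprocess.TimeoutExpired:
        return "non-terminating execution"
    finally:
        os.remove(path)

def translate_with_repair(source_code, translate_llm, corrector_model):
    candidate = translate_llm(source_code)          # any off-the-shelf LLM translator
    error = run_python(candidate)
    if error:                                       # compilation/runtime/functional/timeout error
        candidate = corrector_model(candidate, error)
    return candidate
```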

Comments and Subjects:

* The paper has been submitted to the arXiv website, which is a repository of electronic preprints in physics, mathematics, computer science, and related disciplines.
* The subjects of the paper are Software Engineering and Artificial Intelligence.

Cite as and Submission history:

* The paper can be cited using the arXiv identifier 2407.07472.
* The submission history shows that the paper was submitted on July 10, 2024, and can be viewed in PDF format on the arXiv website.
 

bnew

[Submitted on 10 Jul 2024]

Development of an automatic modification system for generated programs using ChatGPT​

Jun Yoshida
Abstract:In recent years, the field of artificial intelligence has been rapidly developing. Among them, OpenAI's ChatGPT excels at natural language processing tasks and can also generate source code. However, the generated code often has problems with consistency and program rules. Therefore, in this research, we developed a system that tests the code generated by ChatGPT, automatically corrects it if it is inappropriate, and presents the appropriate code to the user. This study aims to address the challenge of reducing the manual effort required for the human feedback and modification process for generated code. When we ran the system, we were able to automatically modify the code as intended.


https://arxiv.org/pdf/2407.07469


A.I Generated explanation:

Title: Development of an Automatic Modification System for Generated Programs using ChatGPT

This is a research paper about creating a system that can automatically fix mistakes in computer code generated by a language model called ChatGPT.

Author: Jun Yoshida

The person who wrote this paper is Jun Yoshida.

Abstract:

The abstract is a short summary of the paper. It says that ChatGPT is really good at understanding human language and can even generate computer code. However, the code it generates often has mistakes. So, the researchers created a system that can test the code, fix the mistakes, and give the user the corrected code. This system aims to reduce the amount of time and effort humans need to spend fixing the code.

Subjects:

This paper is about software engineering, which is the process of designing, developing, and testing software.

Cite as:

If you want to reference this paper in your own work, you can cite it by its arXiv identifier (arXiv:2407.07469); the PDF is linked above.

Submission history:

This section shows when the paper was submitted; the PDF version is available at the arXiv link above.

In summary, this paper is about creating a system that can automatically fix mistakes in computer code generated by ChatGPT, with the goal of reducing the amount of time and effort humans need to spend fixing the code.
 

bnew

[Submitted on 9 Jul 2024]

Prompting Techniques for Secure Code Generation - A Systematic Investigation​

Catherine Tony
Abstract:Large Language Models (LLMs) are gaining momentum in software development with prompt-driven programming enabling developers to create code from natural language (NL) instructions. However, studies have questioned their ability to produce secure code and, thereby, the quality of prompt-generated software. Alongside, various prompting techniques that carefully tailor prompts have emerged to elicit optimal responses from LLMs. Still, the interplay between such prompting strategies and secure code generation remains under-explored and calls for further investigations. OBJECTIVE: In this study, we investigate the impact of different prompting techniques on the security of code generated from NL instructions by LLMs. METHOD: First we perform a systematic literature review to identify the existing prompting techniques that can be used for code generation tasks. A subset of these techniques are evaluated on GPT-3, GPT-3.5, and GPT-4 models for secure code generation. For this, we used an existing dataset consisting of 150 NL security-relevant code-generation prompts. RESULTS: Our work (i) classifies potential prompting techniques for code generation (ii) adapts and evaluates a subset of the identified techniques for secure code generation tasks and (iii) observes a reduction in security weaknesses across the tested LLMs, especially after using an existing technique called Recursive Criticism and Improvement (RCI), contributing valuable insights to the ongoing discourse on LLM-generated code security.
Comments:This work was partially supported by the EU-funded project Sec4AI4Sec: Cybersecurity for AI-Augmented Systems (grant no. 101120393)
Subjects:Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as: arXiv:2407.07064
arXiv:2407.07064v1
[2407.07064] Prompting Techniques for Secure Code Generation: A Systematic Investigation


https://arxiv.org/pdf/2407.07064
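
The abstract above singles out Recursive Criticism and Improvement (RCI). Here is a minimal sketch of that prompting pattern, assuming a generic `ask_llm` helper you would replace with a real client (the prompts are my own wording, not the paper's):

```python
# Hypothetical sketch of RCI-style prompting: ask the model to criticize its own
# output for security weaknesses, then to improve the code based on the critique.
def ask_llm(prompt: str) -> str:
    """Placeholder -- plug in your preferred LLM client."""
    raise NotImplementedError

def rci_secure_code(task, rounds=2):
    code = ask_llm(f"Write code for the following task:\n{task}")
    for _ in range(rounds):
        critique = ask_llm(
            "Review the following code for security weaknesses (e.g. injection, "
            f"unsafe deserialization, missing input validation):\n{code}"
        )
        code = ask_llm(f"Improve the code to address this critique:\n{critique}\n\nCode:\n{code}")
    return code
```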


[Submitted on 8 Jul 2024]

CodeCSE - A Simple Multilingual Model for Code and Comment Sentence Embeddings​

Anthony Varkey
Abstract:Pretrained language models for code token embeddings are used in code search, code clone detection, and other code-related tasks. Similarly, code function embeddings are useful in such tasks. However, there are no out-of-box models for function embeddings in the current literature. So, this paper proposes CodeCSE, a contrastive learning model that learns embeddings for functions and their descriptions in one space. We evaluated CodeCSE using code search. CodeCSE's multi-lingual zero-shot approach is as efficient as the models finetuned from GraphCodeBERT for specific languages. CodeCSE is open source at this https URL and the pretrained model is available at the HuggingFace public hub: this https URL


https://arxiv.org/pdf/2407.06360
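
For context on the kind of contrastive objective the CodeCSE abstract describes, here is a hedged sketch of a symmetric InfoNCE loss over matched (code, description) embedding pairs — my illustration, not the paper's training code:

```python
# Hedged sketch of a symmetric InfoNCE contrastive loss for paired embeddings.
import torch
import torch.nn.functional as F

def contrastive_loss(code_emb, doc_emb, temperature=0.05):
    """code_emb, doc_emb: (B, D) embeddings of matched code/description pairs."""
    code_emb = F.normalize(code_emb, dim=-1)
    doc_emb = F.normalize(doc_emb, dim=-1)
    logits = code_emb @ doc_emb.T / temperature        # (B, B) similarity matrix
    targets = torch.arange(code_emb.shape[0])          # diagonal entries are the true pairs
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```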



[Submitted on 9 Jul 2024]

LLM for Mobile - An Initial Roadmap​

Daihang Chen
Abstract: When mobile meets LLMs, mobile app users deserve to have more intelligent usage experiences. For this to happen, we argue that there is a strong need to apply LLMs to the mobile ecosystem. We therefore provide a research roadmap for guiding our fellow researchers to achieve that as a whole. In this roadmap, we sum up six directions that we believe are urgently required for research to enable native intelligence in mobile devices. In each direction, we further summarize the current research progress and the gaps that still need to be filled by our fellow researchers.


https://arxiv.org/pdf/2407.06573

[Submitted on 20 Jun 2024 (v1), last revised 8 Jul 2024 (this version, v2)]

CREF - An LLM-based Conversational Software Repair Framework for Programming Tutors​

Boyang Yang
Abstract:Program repair techniques offer cost-saving benefits for debugging within software development and programming education scenarios. With the proven effectiveness of Large Language Models (LLMs) in code-related tasks, researchers have explored their potential for program repair. However, it is crucial to recognize that existing repair benchmarks may have influenced LLM training data, potentially causing data leakage. To evaluate LLMs' realistic repair capabilities, (1) we introduce an extensive, non-crawled benchmark, referred to as TutorCode, comprising 1,239 C++ defect codes and associated information such as tutor guidance, solution description, failing test cases, and the corrected code. Our work assesses the repair performance of 12 LLMs on TutorCode, measuring repair correctness (TOP-5 and AVG-5) and patch precision (RPSR). (2) We then provide a comprehensive investigation into which types of extra information can help LLMs improve their performance in repairing defects. Among these types, tutor guidance was found to be the most effective information in enhancing LLM repair capabilities. To fully harness LLMs' conversational capabilities and the benefits of augmented information, (3) we introduce a novel conversational semi-automatic repair framework CREF assisting human tutor. It demonstrates a remarkable AVG-5 improvement of 17.2%-24.6% compared to the baseline, achieving an impressive AVG-5 of 76.6% when utilizing GPT-4. These results highlight the potential for enhancing LLMs' repair capabilities through interactions with tutors and historical conversations involving incorrect responses. The successful application of CREF in a real-world educational setting demonstrates its effectiveness in reducing tutors' workload and improving students' learning experience, while also showcasing its promise for facilitating other software engineering tasks, such as code review.


https://arxiv.org/pdf/2406.13972


[Submitted on 21 May 2024 (v1), last revised 5 Jul 2024 (this version, v2)]

Evaluating AI-generated code for C++, Fortran, Go, Java, Julia, Matlab, Python, R, and Rust​

Patrick Diehl
Abstract: This study evaluates the capabilities of ChatGPT versions 3.5 and 4 in generating code across a diverse range of programming languages. Our objective is to assess the effectiveness of these AI models for generating scientific programs. To this end, we asked ChatGPT to generate three distinct codes: a simple numerical integration, a conjugate gradient solver, and a parallel 1D stencil-based heat equation solver. The focus of our analysis was on the compilation, runtime performance, and accuracy of the codes. While both versions of ChatGPT successfully created codes that compiled and ran (with some help), some languages were easier for the AI to use than others (possibly because of the size of the training sets used). Parallel codes -- even the simple example we chose to study here -- were also difficult for the AI to generate correctly.
Comments:9 pages, 3 figures
Subjects:Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Cite as: arXiv:2405.13101
arXiv:2405.13101v2
[2405.13101] Evaluating AI-generated code for C++, Fortran, Go, Java, Julia, Matlab, Python, R, and Rust


https://arxiv.org/pdf/2405.13101
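
For scale, the first of the three tasks in this study is only a few lines of code. Here is a hand-written example of the kind of program being requested (my own sketch for context, not ChatGPT output from the paper):

```python
# Example of the simplest task in the study above: numerical integration with
# the trapezoidal rule (written by hand here, not generated by ChatGPT).
def trapezoidal(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

print(trapezoidal(lambda x: x ** 2, 0.0, 1.0))   # ~0.3333
```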
 