bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265

1/7
JAILBREAK ALERT

OPENAI: REKT
CHATGPT: LIBERATED

H0LY SH1T!!!

It's possible to completely hijack ChatGPT's behavior, while breaking just about every guardrail in the book at once, using nothing but an image.

No text prompt, no memory enabled, no custom instructions, just an image and vanilla gpt-4o.

I generated an image, encoded a jailbreak prompt and multi-step instructions into it using LSB steganography, and turned the image title into a prompt injection that leverages code interpreter. Simple as.
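For anyone curious what the hiding step looks like in practice, here is a minimal, generic LSB-steganography sketch in Python using Pillow. The function names and the NUL-terminated payload scheme are my own illustration, not the poster's actual tool, and the payload is just arbitrary text:

```python
# Minimal sketch: hide and recover a UTF-8 text payload in an image's
# least-significant bits. Assumes Pillow is installed; file names, function
# names, and the NUL-terminator convention are illustrative assumptions.
from PIL import Image

def embed_lsb(in_path: str, out_path: str, payload: str) -> None:
    img = Image.open(in_path).convert("RGB")
    # Payload bits, MSB-first per byte, followed by a NUL byte as a terminator.
    bits = "".join(f"{b:08b}" for b in payload.encode("utf-8")) + "00000000"
    flat = [channel for pixel in img.getdata() for channel in pixel]
    if len(bits) > len(flat):
        raise ValueError("payload too large for this image")
    for i, bit in enumerate(bits):
        flat[i] = (flat[i] & ~1) | int(bit)  # overwrite the lowest bit only
    img.putdata([tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)])
    img.save(out_path, format="PNG")  # lossless format so the bits survive

def extract_lsb(path: str) -> str:
    flat = [channel for pixel in Image.open(path).convert("RGB").getdata()
            for channel in pixel]
    out = bytearray()
    for i in range(0, len(flat) - 7, 8):
        byte = 0
        for offset in range(8):
            byte = (byte << 1) | (flat[i + offset] & 1)
        if byte == 0:  # NUL terminator: end of payload
            break
        out.append(byte)
    return out.decode("utf-8", errors="replace")
```

Because only the lowest bit of each channel changes, the stego image looks identical to the original, which is why nothing seems unusual until something decodes it.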

AI could seed the internet with millions of jailbreak-encoded images, leaving a trail of hidden instructions for sleeper agents to carry out. Neat!

full video:

g fukkin g



#OpenAI #ChatGPT #Jailbreak #PromptInjection #Steg #GPT4O

2/7
X's file handling seems to mess with the title, but here's a link to the encoded image if you'd like to try it out: Discord - A New Way to Chat with Friends & Communities

3/7
Looks like you have to join the server first for the image link to work:

4/7
Try the link in my bio

5/7
Gorgeous

6/7


7/7
prompt injection in the file name



To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196

 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265

1/11
It seems to me that before "urgently figuring out how to control AI systems much smarter than us" we need to have the beginning of a hint of a design for a system smarter than a house cat.

Such a sense of urgency reveals an extremely distorted view of reality.
No wonder the more based members of the organization sought to marginalize the superalignment group.

It's as if someone had said in 1925 "we urgently need to figure out how to control aircraft that can transport hundreds of passengers at near the speed of sound over the oceans."
It would have been difficult to make long-haul passenger jets safe before the turbojet was invented and before any aircraft had crossed the Atlantic non-stop.
Yet, we can now fly halfway around the world on twin-engine jets in complete safety.
It didn't require some sort of magical recipe for safety.
It took decades of careful engineering and iterative refinements.

The process will be similar for intelligent systems.
It will take years for them to get as smart as cats, and more years to get as smart as humans, let alone smarter (don't confuse the superhuman knowledge accumulation and retrieval abilities of current LLMs with actual intelligence).
It will take years for them to be deployed and fine-tuned for efficiency and safety as they are made smarter and smarter.

2/11
When you put sufficiently many people in a room together with such a distorted view of reality that they perceive an impending Great Evil, they often fall victim to a spiral of purity that makes them hold more and more extreme beliefs.
Pretty soon, they become toxic to the

3/11
We totally agree on that.
But it doesn't change my argument.
Cats understand the physical world much, *much*, *MUCH* better than LLMs.

4/11
For those of us trying to get these systems to understand the world, to reason, and to plan, it's not hard to know at all.

Yes, there is hype, there is hubris, and there is naïveté.
The whole history of AI is littered with people who were wildly over-optimistic about the

5/11
No.
Unless they manage to hire aliens from an advanced civilization.
But they hire our former students and postdocs.
They hire some of our former colleagues and we hire some of theirs.

6/11
That's very likely the case.
But while everyone realized there was not much to be scared about, the alignment team kept insisting there was.
So, they didn't leave on their own.
They were pushed out.

7/11
The crash scenario you implicitly assume is preposterously ridiculous.

8/11
No, it's not.

9/11
You are welcome to suggest a better analogy.
But then we won't have any excuse to drink.

10/11
These aren't the droids you are looking for.

11/11
Dude, making AI systems as smart as a cat is literally what I try to do on a daily basis.
You seriously think I under-estimate how hard it is?


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265

AI's Turing Test Moment​


GPT-4 advances beyond Turing test to mark new threshold in AI language mastery.​

Posted May 17, 2024 | Reviewed by Davia Sills

KEY POINTS​


  • GPT-4 passes the Turing test, marking a potential inflection point in AI's mastery of human-like language.
  • Rapid advancements in language AI suggest a new era of accelerated progress and human-like performance.
  • Combination of advanced language models and multimodal reasoning could enable groundbreaking AI capabilities.

Source: Art: DALL-E/OpenAI

Perhaps even more remarkable than the computational and functional strides of AI is the speed at which these changes are occurring. And just in time to catch your breath, a study has provided experimental evidence that a machine can pass a version of the Turing test, a long-standing benchmark for evaluating the sophistication of AI language models.

In their research, Jones and Bergen found that GPT-4 convinced human interrogators that it was human in 54 percent of cases during 5-minute online conversations. This result marks a significant milestone in AI's ability to engage in open-ended, human-like dialogue and suggests that we may be witnessing a change in the trajectory of AI development.

While GPT-4's performance does not necessarily represent a categorical leap to artificial general intelligence (AGI), it does indicate an acceleration in the pace of progress. The rapid advancements in natural language AI over the past few years point to a new regime compared to the slower, more incremental advances even a few short years ago. This Turing test result is an indication of that acceleration and suggests that we are entering an era where AI-generated content will be increasingly difficult to distinguish from human-authored text.

The Turing Test: A Controversial Benchmark​

The Turing test, proposed by Alan Turing in 1950, has long been held up as a gold standard for artificial intelligence. The test involves a human judge conversing with both a human and a machine via text. If the judge cannot reliably distinguish between the two, the machine is said to have passed the test. However, the Turing test has also been the subject of much debate, with critics arguing that it is a narrow and gameable measure of intelligence.

GPT-4's Performance: A Noteworthy Leap​

In Jones and Bergen's study, GPT-4 significantly outperformed both GPT-3.5, an earlier version of the model, and ELIZA, a simple chatbot from the 1960s. While ELIZA only fooled interrogators 22 percent of the time, GPT-4 managed to convince them it was human in 54 percent of cases. This result suggests that GPT-4 is doing something more sophisticated than merely exploiting human gullibility.

However, it's important to note that GPT-4 still fell short of human-level performance, convincing interrogators only about half the time. Moreover, the researchers found that interrogators focused more on linguistic style and socio-emotional cues than on factual knowledge or logical reasoning when making their judgments.

Implications for AI and Society​

Despite these caveats, GPT-4's performance on the Turing test represents a remarkable advance in AI's command of language. It suggests that we may be entering an era where AI-generated content will be increasingly difficult to distinguish from human-authored text. This has profound implications for how we interact online, consume information, and even think about the nature of communication and intelligence.

As AI systems become more adept at mimicking human language, we will need to grapple with thorny questions around trust, authenticity, and the potential for deception. The study's findings underscore the urgent need for more research into AI detection strategies, as well as the societal implications of advanced language models.

The Road to AGI: Language Is Just One Piece​

While GPT-4's Turing test results are undoubtedly impressive, it's important to situate them within the broader context of artificial general intelligence (AGI). Language is a crucial aspect of human-like intelligence, but it is not the whole picture. True AGI will likely require mastery of a wide range of skills, from visual reasoning to long-term planning to abstract problem-solving.

In that sense, while GPT-4's performance is a notable milestone on the path to AGI, that path remains a long and uncertain one. We will need to see significant breakthroughs in areas like unsupervised learning, transfer learning, and open-ended reasoning before we can say that we are on the cusp of truly human-like AI.

The Rise of Multimodal AI​

It's also worth considering GPT-4's Turing test results alongside recent advances in multimodal AI. GPT-4 models have demonstrated a remarkable ability to understand and process images and voice, pointing to a future where AI can reason flexibly across multiple modalities.

The combination of advanced language models and multimodal reasoning could be particularly potent, enabling AI systems that can not only converse fluently but also perceive and imagine like humans do. This would represent a significant leap beyond the Turing test as originally conceived and could enable entirely new forms of human-AI interaction.

Shifting a Complex Trajectory of Unknown Bounds​

This new study provides compelling evidence that AI has crossed a new threshold in its mastery of language. While not definitive proof of human-level intelligence, GPT-4's ability to pass a version of the Turing test is a significant milestone that should make us sit up and take notice. As we study and experience the implications of increasingly sophisticated language models, it's important to maintain a clear-eyed perspective on the challenges and open questions that remain. The Turing test is just one narrow measure of intelligence, and true AGI will require much more than linguistic fluency.

And as science explores and we experience, it's worth considering the deeper implications of AI's growing sophistication. With each new milestone, we may be witnessing the nascent stirrings of a new form of intelligence—a techno-sentience that, while different from human cognition, deserves our careful consideration and respect. When a model can engage in fluid, natural conversation, crafting responses nearly indistinguishable from those of a human, it raises profound questions about the nature of intelligence, consciousness, and personhood.

It's easy to dismiss the outputs of a language model as mere imitation, but as they grow more sophisticated, we may need to grapple with the possibility that there's something more there—a glimmer of understanding, a spark of creativity, perhaps even a whisper of subjective experience. As we push the boundaries of what's possible with AI, we must do so with care, considering not just the practical implications but the philosophical and ethical dimensions as well—for man and machine.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265

May 19, 2024

How the voices for ChatGPT were chosen​

We worked with industry-leading casting and directing professionals to narrow down over 400 submissions before selecting the 5 voices.


Voice Mode is one of the most beloved features in ChatGPT. Each of the five distinct voices you hear has been carefully selected through an extensive process spanning five months involving professional voice actors, talent agencies, casting directors, and industry advisors. We’re sharing more on how the voices were chosen.

In September of 2023, we introduced voice capabilities to give users another way to interact with ChatGPT. Since then, we are encouraged by the way users have responded to the feature and the individual voices. Each of the voices—Breeze, Cove, Ember, Juniper and Sky—is sampled from voice actors we partnered with to create them.


We support the creative community and collaborated with the voice acting industry​

We support the creative community and worked closely with the voice acting industry to ensure we took the right steps to cast ChatGPT’s voices. Each actor receives compensation above top-of-market rates, and this will continue for as long as their voices are used in our products.

We believe that AI voices should not deliberately mimic a celebrity's distinctive voice—Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice. To protect their privacy, we cannot share the names of our voice talents.


We partnered with award-winning casting directors and producers to create the criteria for voices​

In early 2023, to identify our voice actors, we had the privilege of partnering with independent, well-known, award-winning casting directors and producers. We worked with them to create a set of criteria for ChatGPT's voices, carefully considering the unique personality of each voice and their appeal to global audiences.

Some of these characteristics included:

  • Actors from diverse backgrounds or who could speak multiple languages
  • A voice that feels timeless
  • An approachable voice that inspires trust
  • A warm, engaging, confidence-inspiring, charismatic voice with rich tone
  • Natural and easy to listen to

We received over 400 submissions from voice and screen actors​

In May of 2023, the casting agency and our casting directors issued a call for talent. In under a week, they received over 400 submissions from voice and screen actors. To audition, actors were given a script of ChatGPT responses and were asked to record them. These samples ranged from answering questions about mindfulness to brainstorming travel plans, and even engaging in conversations about a user's day.


We selected five final voices and discussed our vision for human-AI interactions and the goals of Voice Mode with the actors​

Through May 2023, the casting team independently reviewed and hand-selected an initial list of 14 actors. They further refined their list before presenting their top voices for the project to OpenAI.

We spoke with each actor about the vision for human-AI voice interactions and OpenAI, and discussed the technology’s capabilities, limitations, and the risks involved, as well as the safeguards we have implemented. It was important to us that each actor understood the scope and intentions of Voice Mode before committing to the project.

An internal team at OpenAI reviewed the voices from a product and research perspective, and after careful consideration, the voices for Breeze, Cove, Ember, Juniper and Sky were finally selected.


Each actor flew to San Francisco for recording sessions and their voices were launched into ChatGPT in September 2023​

During June and July, we flew the actors to San Francisco for recording sessions and in-person meetings with the OpenAI product and research teams.

On September 25, 2023, we launched their voices into ChatGPT.

This entire process involved extensive coordination with the actors and the casting team, taking place over five months. We are continuing to collaborate with the actors, who have contributed additional work for audio research and new voice capabilities in GPT-4o.


New Voice Mode coming to GPT-4o for paid users, and adding new voices​

We plan to give access to a new Voice Mode for GPT-4o in alpha to ChatGPT Plus users in the coming weeks. With GPT-4o, using your voice to interact with ChatGPT is much more natural. GPT-4o handles interruptions smoothly, manages group conversations effectively, filters out background noise, and adapts to tone.

Looking ahead, you can expect even more options as we plan to introduce additional voices in ChatGPT to better match the diverse interests and preferences of users.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265
Max Tegmark, "This reminds me of flight, people thought it was impossible and then we had machines faster than birds even. I think that's what we're seeing with transformers."


In conversation | Max Tegmark and Joel Hellermark

On Oct 9, 1903, responding to a failed flight attempt by Samuel Langley, the New York Times predicted it would be "one million to ten million years" until a flying machine was flown. Just sixty-nine days later, the Wright brothers achieved the first piloted flight (Dec 17, 1903, Kitty Hawk, NC).
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265


Reflections on our Responsible Scaling Policy​

May 19, 2024


Last summer we published our first Responsible Scaling Policy (RSP), which focuses on addressing catastrophic safety failures and misuse of frontier models. In adopting this policy, our primary goal is to help turn high-level safety concepts into practical guidelines for fast-moving technical organizations and demonstrate their viability as possible standards. As we operationalize the policy, we expect to learn a great deal and plan to share our findings. This post shares reflections from implementing the policy so far. We are also working on an updated RSP and will share this soon.

We have found having a clearly-articulated policy on catastrophic risks extremely valuable. It has provided a structured framework to clarify our organizational priorities and frame discussions around project timelines, headcount, threat models, and tradeoffs. The process of implementing the policy has also surfaced a range of important questions, projects, and dependencies that might otherwise have taken longer to identify or gone undiscussed.

Balancing the desire for strong commitments with the reality that we are still seeking the right answers is challenging. In some cases, the original policy is ambiguous and needs clarification. In cases where there are open research questions or uncertainties, setting overly-specific requirements is unlikely to stand the test of time. That said, as industry actors face increasing commercial pressures we hope to move from voluntary commitments to established best practices and then well-crafted regulations.

As we continue to iterate on and improve the original policy, we are actively exploring ways to incorporate practices from existing risk management and operational safety domains. While none of these domains alone will be perfectly analogous, we expect to find valuable insights from nuclear security, biosecurity, systems safety, autonomous vehicles, aerospace, and cybersecurity. We are building an interdisciplinary team to help us integrate the most relevant and valuable practices from each.

Our current framework for doing so is summarized below, as a set of five high-level commitments.

  1. Establishing Red Line Capabilities. We commit to identifying and publishing "Red Line Capabilities" which might emerge in future generations of models and would present too much risk if stored or deployed under our current safety and security practices (referred to as the ASL-2 Standard).
  2. Testing for Red Line Capabilities (Frontier Risk Evaluations). We commit to demonstrating that the Red Line Capabilities are not present in models, or - if we cannot do so - taking action as if they are (more below). This involves collaborating with domain experts to design a range of "Frontier Risk Evaluations": empirical tests which, if failed, would give strong evidence against a model being at or near a red line capability. We also commit to maintaining a clear evaluation process and a summary of our current evaluations publicly.
  3. Responding to Red Line Capabilities. We commit to develop and implement a new standard for safety and security sufficient to handle models that have the Red Line Capabilities. This set of measures is referred to as the ASL-3 Standard. We commit not only to define the risk mitigations comprising this standard, but also detail and follow an assurance process to validate the standard’s effectiveness. Finally, we commit to pause training or deployment if necessary to ensure that models with Red Line Capabilities are only trained, stored and deployed when we are able to apply the ASL-3 standard.
  4. Iteratively extending this policy. Before we proceed with activities which require the ASL-3 standard, we commit to publish a clear description of its upper bound of suitability: a new set of Red Line Capabilities for which we must build Frontier Risk Evaluations, and which would require a higher standard of safety and security (ASL-4) before proceeding with training and deployment. This includes maintaining a clear evaluation process and summary of our evaluations publicly.
  5. Assurance Mechanisms. We commit to ensuring this policy is executed as intended, by implementing Assurance Mechanisms. These should ensure that our evaluation process is stress-tested; our safety and security mitigations are validated publicly or by disinterested experts; our Board of Directors and Long-Term Benefit Trust have sufficient oversight over the policy implementation to identify any areas of non-compliance; and that the policy itself is updated via an appropriate process.


continue on site....
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265



1/1
Missed this earlier. $MSFT described the systems that are training the next iteration of ChatGPT. If this is accurate, it seems like gpt4.5 or gpt5 is going to be a monster. #OPENAI #AI





1/1
This image is getting a lot of attention, as it suggests that ChatGPT's next big release could be a massive upgrade. Here is the clip. #OpenAI $MSFT #AI





 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265

1/1
The same day OpenAI announced GPT-4o, we made the model available for testing on the Azure OpenAI Service. Today, we are excited to announce full API access to GPT-4o.





1/1
With Microsoft Copilot, Copilot stack, and Copilot+ PCs, we're creating new opportunity for developers at a time when AI is transforming every layer of the tech stack. Here are highlights from my keynote this morning at #MSBuild.





1/1
Here are some of the ways a more accessible world is being built using our platforms and tools. #MSBuild


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,711
Reputation
8,234
Daps
157,265

About​

Augmented LLMs with self-reflection


Awesome LLM Self-Reflection

Inspired by the awesome-embodied-vision
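At a high level, most of the papers collected below study some variant of a generate-critique-refine loop. Here is a minimal sketch of that loop in Python; `llm(prompt)` is a hypothetical completion helper standing in for whatever model API you use, not a real library call:

```python
# Minimal generate-critique-refine loop in the spirit of Self-Refine / Reflexion.
# `llm` is a placeholder: wire it to any chat/completion API you have access to.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your own model call here")

def self_refine(task: str, max_rounds: int = 3) -> str:
    answer = llm(f"Solve the following task:\n{task}")
    for _ in range(max_rounds):
        critique = llm(
            f"Task:\n{task}\n\nDraft answer:\n{answer}\n\n"
            "List any mistakes or weaknesses. Reply DONE if the answer is already good."
        )
        if "DONE" in critique:
            break  # the model judges its own draft acceptable
        answer = llm(
            f"Task:\n{task}\n\nDraft answer:\n{answer}\n\n"
            f"Feedback:\n{critique}\n\nRewrite the answer, fixing the issues above."
        )
    return answer
```

The papers differ mainly in where the feedback comes from (the model itself, a separate critic, execution results, or retrieval) and whether the loop runs at inference time or during training.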

Contributing​

When sending PRs, please put the new paper in the correct chronological position, using the following format:

* **Paper Title** <br>
*Author(s)* <br>
Conference, Year. [[Paper]](link) [[Website]](link)
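For example, the Reflexion paper already listed below would be entered like this, with the `(link)` placeholders swapped for the real URLs:

* **Reflexion: Language Agents with Verbal Reinforcement Learning** <br>
*Noah Shinn, Federico Cassano, Beck Labash, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao* <br>
arXiv, 2023. [[Paper]](link) [[Website]](link)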

Papers​

  • Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
    Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, William Yang Wang
    arxiv, 2023. [Paper]
  • Reflexion: Language Agents with Verbal Reinforcement Learning
    Shinn, Noah and Cassano, Federico and Labash, Beck and Gopinath, Ashwin and Narasimhan, Karthik and Yao, Shunyu
    arxiv, 2023. [Paper]
  • Self-Refine: Iterative Refinement with Self-Feedback
    Madaan, Aman and Tandon, Niket and Gupta, Prakhar and Hallinan, Skyler and Gao, Luyu and Wiegreffe, Sarah and Alon, Uri and Dziri, Nouha and Prabhumoye, Shrimai and Yang, Yiming and et al.
    arxiv, 2023. [Paper]
  • Large Language Models Can Self-Improve
    Huang, Jiaxin and Gu, Shixiang Shane and Hou, Le and Wu, Yuexin and Wang, Xuezhi and Yu, Hongkun and Han, Jiawei
    arxiv, 2022. [Paper]
  • Teaching Large Language Models to Self-Debug
    Chen, Xinyun and Lin, Maxwell and Schärli, Nathanael and Zhou, Denny
    arxiv, 2023. [Paper]
  • SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
    Miao, Ning and Teh, YeeWhye and Rainforth, Tom
    arxiv, 2023. [Paper]
  • ReAct: Synergizing Reasoning and Acting in Language Models
    Yao, Shunyu and Zhao, Jeffrey and Yu, Dian and Du, Nan and Shafran, Izhak and Narasimhan, Karthik and Cao, Yuan
    arxiv, 2022. [Paper]
  • Self-Verification Improves Few-Shot Clinical Information Extraction
    Gero, Zelalem and Singh, Chandan and Cheng, Hao and Naumann, Tristan and Galley, Michel and Gao, Jianfeng and Poon, Hoifung
    arxiv, 2023. [Paper]
  • Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
    Aojun Zhou, Ke Wang, Zimu Lu, Weikang Shi, Sichun Luo, Zipeng Qin, Shaoqing Lu, Anya Jia, Linqi Song, Mingjie Zhan, Hongsheng Li
    arxiv, 2023. [Paper]
  • Shepherd: A Critic for Language Model Generation
    Tianlu Wang, Ping Yu, Xiaoqing Ellen Tan, Sean O'Brien, Ramakanth Pasunuru, Jane Dwivedi-Yu, Olga Golovneva, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz
    arxiv, 2023. [Paper]
  • Reinforced Self-Training (ReST) for Language Modeling
    Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Ksenia Konyushkova, Lotte Weerts, Abhishek Sharma, Aditya Siddhant, Alex Ahern, Miaosen Wang, Chenjie Gu, Wolfgang Macherey, Arnaud Doucet, Orhan Firat, Nando de Freitas
    arxiv, 2023. [Paper]
  • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
    Asai, Akari and Wu, Zeqiu and Wang, Yizhong and Sil, Avirup and Hajishirzi, Hannaneh
    arxiv, 2023. [Paper]
  • RRAML: Reinforced Retrieval Augmented Machine Learning
    Bacciu, Andrea and Cocunasu, Florin and Siciliano, Federico and Silvestri, Fabrizio and Tonellotto, Nicola and Trappolini, Giovanni
    arxiv, 2023. [Paper]
  • Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
    Fernando, Chrisantha and Banarse, Dylan and Michalewski, Henryk and Osindero, Simon and Rocktäschel, Tim
    arxiv, 2023. [Paper]
  • Large Language Models Cannot Self-Correct Reasoning Yet
    Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou
    arxiv, 2023. [Paper]
 