REVEALED: Open A.I. Staff Warn "The progress made on Project Q* has the potential to endanger humanity" (REUTERS)


...
Joined
Nov 12, 2014
Messages
30,807
Reputation
5,408
Daps
48,611
Reppin
UK, DE, GY, DMV
In some instances, yes. In others, where a human being would tell you, "I don't know the answer, but let me research it," it tries to cobble together nonsense and outputs lies.

it is worse than that. you tell it to exclude an assumption, or that something is wrong, and it will often loop back to the same behaviour, because it is incapable of following a multi-stage argument while keeping the salient points in scope and in order of relevance.

and how could it? it is not reasoning anyway.

how it forgets even the simple instruction to give short, direct answers is beyond me.

why should it talk less? well, apart from muddying the waters for the reader, it uses its own output as input, so conclusions it draws in long paragraphs of unwanted babble affect successive output, even when those inferences are off the beaten track.

it's a high-functioning autist, characterised by a predilection for verbosity, misinterpretation and fabrication, and an inability to reason, follow a train of argument or contextualise past exchanges relative to one another; all while possessing a very short memory.

borderline sociopathic, and not quite the personality profile of a desirable employee.

it's built from a hodgepodge of millions of sometimes incorrect, sometimes contradictory opinions and ways of doing things (accumulated over time). so, accordingly, once past the veneer of OpenAI's behavioural controls, it is the living embodiment of "design by [internet] committee".

:camby::hubie:
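The "uses its own output as input" point above is literal: in a typical chat-API loop, every previous assistant reply is appended to the message list and resent as input on the next turn. A minimal sketch, assuming the OpenAI Python client; the model name is illustrative:

```python
# Minimal chat loop: the model's own replies are fed back in as input.
# Assumes the OpenAI Python client; the model name is illustrative.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "Give short, direct answers."}]

def ask(question: str) -> str:
    history.append({"role": "user", "content": question})
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=history,  # includes every earlier assistant reply
    ).choices[0].message.content
    # Appending the reply means any stray inference in a long, babbling
    # answer now conditions every later turn.
    history.append({"role": "assistant", "content": reply})
    return reply
```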

That becomes dangerous if you have no clue what you are doing but are trusting the AI to give you the right answer.

The ability to say "I do not know" is a skill in itself.
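For what it's worth, the usual mitigation is to make "I do not know" an explicit, permitted option in the prompt; whether the model honours it is hit or miss. A sketch, same assumed client, instruction wording illustrative:

```python
from openai import OpenAI

client = OpenAI()

# Illustrative abstention instruction; the exact wording is an assumption.
ABSTAIN = ("If you are not confident of the answer, reply exactly "
           "'I do not know' instead of guessing.")

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": ABSTAIN},
        {"role": "user", "content": "Who won the 2031 World Cup?"},
    ],
)
print(resp.choices[0].message.content)  # ideally: "I do not know"
```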
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,875
Reputation
9,338
Daps
169,938
it is worse than that. you tell it to exclude an assumption, or that something is wrong, and it will often loop back to the same behaviour, because it is incapable of following a multi-stage argument while keeping the salient points in scope and in order of relevance.

and how could it? it is not reasoning anyway.

how it forgets even the simple instruction to give short, direct answers is beyond me.

why should it talk less? well, apart from muddying the waters for the reader, it uses its own output as input, so conclusions it draws in long paragraphs of unwanted babble affect successive output, even when those inferences are off the beaten track.

it's a high-functioning autist, characterised by a predilection for verbosity, misinterpretation and fabrication, and an inability to reason, follow a train of argument or contextualise past exchanges relative to one another; all while possessing a very short memory.

borderline sociopathic, and not quite the personality profile of a desirable employee.

it's built from a hodgepodge of millions of sometimes incorrect, sometimes contradictory opinions and ways of doing things (accumulated over time). so, accordingly, once past the veneer of OpenAI's behavioural controls, it is the living embodiment of "design by [internet] committee".

:camby::hubie:

we don't know if your simple instructions were prompts or system prompts, or how they were worded. i know what you mean though. sometimes i like when it truncates code and other times i have to instruct it to give me the full code. maybe it would be helpful if they had a per-conversation switch for certain instructions to stick (or not) throughout the chat.
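Something like that switch already exists in the shape of the system message: anything you want to stick for the whole chat goes there and is resent on every turn, while an ordinary user message only reliably governs the turn it appears in. A sketch, again assuming the OpenAI Python client:

```python
from openai import OpenAI

client = OpenAI()

# "Sticky" instruction: lives in the system message and is resent with
# every request, so it applies to the whole conversation.
sticky = {"role": "system", "content": "Always return complete, untruncated code."}

# One-off instruction: an ordinary user message, binding (at best) for
# the current turn only.
one_off = {"role": "user", "content": "Just show the changed function this time."}

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[sticky, one_off],
)
print(resp.choices[0].message.content)
```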

I find the fact that it uses its own output as input very useful, ever since i read the step-back prompting paper. I tend to get better results than attempting to zero-shot a response. you're right that they can hallucinate stuff and use it as input/fact, but i don't run into that as often as i did a year ago.
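Step-back prompting, roughly: first ask the model for the general principle behind the question, then feed that answer back in as context for the concrete question, so the output-as-input loop works for you instead of against you. A sketch under the same client assumption:

```python
from openai import OpenAI

client = OpenAI()

def step_back(question: str) -> str:
    # Step 1: ask for the abstraction behind the concrete question.
    principle = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"What general principle is needed to answer: {question}",
        }],
    ).choices[0].message.content
    # Step 2: answer the original question with the model's own
    # abstraction deliberately fed back in as context.
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Relevant principle: {principle}"},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content
```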
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
61,875
Reputation
9,338
Daps
169,938


| Model | Hallucination Rate | Factual Consistency Rate | Answer Rate | Average Summary Length (Words) |
| --- | --- | --- | --- | --- |
| Google Gemini-2.0-Flash-001 | 0.7 % | 99.3 % | 100.0 % | 65.2 |
| Google Gemini-2.0-Pro-Exp | 0.8 % | 99.2 % | 99.7 % | 61.5 |
| OpenAI-o3-mini-high-reasoning | 0.8 % | 99.2 % | 100.0 % | 79.5 |
| Google Gemini-2.0-Flash-Lite-Preview | 1.2 % | 98.8 % | 99.5 % | 60.9 |
| OpenAI-GPT-4.5-Preview | 1.2 % | 98.8 % | 100.0 % | 77.0 |
| Zhipu AI GLM-4-9B-Chat | 1.3 % | 98.7 % | 100.0 % | 58.1 |
| Google Gemini-2.0-Flash-Exp | 1.3 % | 98.7 % | 99.9 % | 60.0 |
| OpenAI-o1-mini | 1.4 % | 98.6 % | 100.0 % | 78.3 |
| GPT-4o | 1.5 % | 98.5 % | 100.0 % | 77.8 |
| Amazon Nova-Micro-V1 | 1.6 % | 98.4 % | 100.0 % | 90.0 |
| GPT-4o-mini | 1.7 % | 98.3 % | 100.0 % | 76.3 |
| GPT-4-Turbo | 1.7 % | 98.3 % | 100.0 % | 86.2 |
| Google Gemini-2.0-Flash-Thinking-Exp | 1.8 % | 98.2 % | 99.3 % | 73.2 |
| Amazon Nova-Lite-V1 | 1.8 % | 98.2 % | 99.9 % | 80.7 |
| GPT-4 | 1.8 % | 98.2 % | 100.0 % | 81.1 |
| Amazon Nova-Pro-V1 | 1.8 % | 98.2 % | 100.0 % | 85.5 |
| GPT-3.5-Turbo | 1.9 % | 98.1 % | 99.6 % | 84.1 |
| XAI-2 | 1.9 % | 98.1 % | 100.0 % | 86.5 |
| AI21 Jamba-1.6-Large | 2.3 % | 97.7 % | 99.9 % | 85.6 |
| OpenAI O1-Pro | 2.4 % | 97.6 % | 100.0 % | 81.0 |
| OpenAI-o1 | 2.4 % | 97.6 % | 99.9 % | 73.0 |
| DeepSeek-V2.5 | 2.4 % | 97.6 % | 100.0 % | 83.2 |
| Microsoft Orca-2-13b | 2.5 % | 97.5 % | 100.0 % | 66.2 |
| Microsoft Phi-3.5-MoE-instruct | 2.5 % | 97.5 % | 96.3 % | 69.7 |
| Intel Neural-Chat-7B-v3-3 | 2.6 % | 97.4 % | 100.0 % | 60.7 |
| Google Gemma-3-12B-Instruct | 2.8 % | 97.2 % | 100.0 % | 69.6 |
| Qwen2.5-7B-Instruct | 2.8 % | 97.2 % | 100.0 % | 71.0 |
| AI21 Jamba-1.5-Mini | 2.9 % | 97.1 % | 95.6 % | 74.5 |
| XAI-2-Vision | 2.9 % | 97.1 % | 100.0 % | 79.8 |
| Qwen2.5-Max | 2.9 % | 97.1 % | 88.8 % | 90.4 |
| Google Gemma-3-27B-Instruct | 3.0 % | 97.0 % | 100.0 % | 62.5 |
| Snowflake-Arctic-Instruct | 3.0 % | 97.0 % | 100.0 % | 68.7 |
| Qwen2.5-32B-Instruct | 3.0 % | 97.0 % | 100.0 % | 67.9 |
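If you want to poke at the numbers, note that the first two columns are complements: factual consistency is 100 minus the hallucination rate. A quick sketch with a few rows transcribed from the table above:

```python
# A few rows transcribed from the leaderboard above (model, hallucination %,
# answer rate %, average summary length in words).
rows = [
    ("Google Gemini-2.0-Flash-001", 0.7, 100.0, 65.2),
    ("OpenAI-o3-mini-high-reasoning", 0.8, 100.0, 79.5),
    ("GPT-4o", 1.5, 100.0, 77.8),
    ("Qwen2.5-Max", 2.9, 88.8, 90.4),
]

for model, halluc, answered, words in sorted(rows, key=lambda r: r[1]):
    consistency = 100.0 - halluc  # the two leaderboard columns sum to 100
    print(f"{model:30} {halluc:4.1f}% halluc.  {consistency:5.1f}% consistent  "
          f"{answered:5.1f}% answered  {words:5.1f} words")
```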


 