Apple leaps into AI with an array of upcoming iPhone features and a ChatGPT deal to smarten up Siri

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,011
Reputation
8,229
Daps
157,675

Apple leaps into AI with an array of upcoming iPhone features and a ChatGPT deal to smarten up Siri​

Monday's showcase seemed aimed at allaying concerns Apple might be losing its edge during the advent of AI technology.

ASSOCIATED PRESS / June 10, 2024

Apple CEO Tim Cook speaks during an announcement of new products on the Apple campus in Cupertino, Calif., Monday, June 10, 2024. (AP Photo/Jeff Chiu)

CUPERTINO, Calif. (AP) — Apple jumped into the race to bring generative artificial intelligence to the masses during its World Wide Developers Conference Monday that spotlighted an onslaught of features designed to soup up the iPhone, iPad and Mac.

And in a move befitting a company known for its marketing prowess, the AI technology, coming as part of free software updates later this year, is being billed as “Apple Intelligence.”

Even as it tried to put its own stamp on the hottest area of technology, Apple tacitly acknowledged it needed help to catch up with companies like Microsoft and Google, who have emerged as the early leaders in the AI field. Apple is leaning on ChatGPT, made by the San Francisco startup OpenAI, to help make its often-bumbling virtual assistant Siri smarter and more helpful.

“All of this goes beyond artificial intelligence, it's personal intelligence, and it is the next big step for Apple,” Apple CEO Tim Cook said.


Siri's gateway to ChatGPT will be free to all iPhone users and made available on other Apple products once the option is baked into the next generation of Apple's operating systems. ChatGPT subscribers are supposed to be able to easily sync their existing accounts when using the iPhone, and should get more advanced features than free users would.

To herald the alliance with Apple, OpenAI CEO Sam Altman sat in the front row of the packed conference, which drew developers from more than 60 countries.

“Think you will really like it,” Altman predicted in a post about his company's partnership with Apple.

Beyond giving Siri the ability to tap into ChatGPT's knowledge base, Apple is giving its 13-year-old virtual assistant an extensive makeover designed to make it more personable and versatile, even as it currently fields about 1.5 billion queries a day.

When Apple releases free updates to the software powering the iPhone and its other products this autumn, Siri will signal its presence with flashing lights along the edges of the display screen, and be able to handle hundreds more tasks — including chores that may require tapping into third-party devices — than it can now, based on Monday's presentations.

The AI-packed updates coming to the next versions of Apple software are meant to enable the billions of people who use its devices to get more done in less time, while also giving them access to creative tools that could liven things up. For instance, Apple will deploy AI to allow people to create emojis, dubbed “Genmojis,” on the fly to fit the vibe they are trying to convey.

Monday's showcase seemed aimed at allaying concerns Apple might be losing its edge during the advent of AI technology that is expected to be as revolutionary as the 2007 invention of the iPhone. Both Google and Samsung have already released smartphone models touting AI features as their main attractions, while Apple has been stuck in an uncharacteristically long sales slump.

AI mania is the main reason that Nvidia, the dominant maker of the chips underlying the technology, has seen its market value rocket from about $300 billion at the end of 2022 to about $3 trillion. The meteoric ride allowed Nvidia to surpass Apple as the second most valuable company in the U.S. Earlier this year, Microsoft also eclipsed the iPhone maker on the strength of its so-far successful push into AI.

Investors didn't seem as impressed with Apple's AI presentation as the crowd that came to the company's Cupertino, California, headquarters to see it. Apple's stock price declined nearly 2% in Monday's trading after Cook walked off the stage.

Despite that negative reaction, Wedbush Securities analyst Dan Ives asserted that Apple is “taking the right path” in a research note that hailed the presentation as a “historical” day for a company that already has reshaped the tech industry and society.

Besides pulling AI tricks out of its toolbox, Apple also used the conference to confirm that it will be rolling out a technology called Rich Communications Service, or RCS, to its iMessage app that should improve the quality and security of texting between iPhones and devices powered by Android software, such as the Samsung Galaxy and Google Pixel.

The change, due out with the next version of the iPhone's operating software, won't eliminate the blue bubbles denoting texts originating from iPhones and the green bubbles marking texts sent from Android devices — a distinction that has become a source of social stigma.

This marked the second straight year that Apple has created a stir at its developers conference by using it to usher in a trendy form of technology that other companies already have on the market.

Last year, Apple provided an early look at its mixed-reality headset, the Vision Pro, which wasn't released until early 2024. Nevertheless, Apple's push into mixed reality — with a twist that it bills as “spatial computing” — has raised hopes that there will be more consumer interest in this niche technology.

Part of that optimism stems from Apple's history of releasing technology later than others, then using sleek designs and slick marketing campaigns to overcome its tardy start.

Bringing more AI to the iPhone will likely raise privacy concerns — a topic on which Apple has gone to great lengths to assure its loyal customers that it can be trusted not to peer too deeply into their personal lives. Apple did talk extensively Monday about its efforts to build strong privacy protections and controls around its AI technology.

One way Apple is trying to convince consumers that the iPhone won't be used to spy on them is harnessing its chip technology so most of its AI-powered features are handled on the device itself instead of at remote data centers, often called “the cloud.” Going down this route would also help protect Apple's profit margins because AI processing through the cloud is far more expensive than when it is run solely on a device.

Apple's AI “will be aware of your personal data without collecting your personal data,” said Craig Federighi, Apple's senior vice president of software engineering.

__

By MICHAEL LIEDTKE AP Technology Writer
 

JoelB

All Praise To TMH
Joined
May 1, 2012
Messages
22,825
Reputation
4,100
Daps
82,404
Reppin
PHI 2 ATL
I'm on the macOS beta right now... I played with the voice recorder/transcription feature in Notes... it's dope.

The tile snapping feature is buggy. Siri is still the same, so I guess we gotta wait till the fall to try the AI functionality... I'm hyped tho because I already love the OpenAI desktop app.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,011
Reputation
8,229
Daps
157,675

Tim Cook is ‘not 100 percent’ sure Apple can stop AI hallucinations​


There’s still a chance Apple Intelligence could produce false or misleading information, according to The Washington Post’s interview with Tim Cook.​

By Emma Roth, a news writer who covers the streaming wars, consumer tech, crypto, social media, and much more. Previously, she was a writer and editor at MUO.

Jun 11, 2024, 10:22 AM EDT


Photo illustration of Tim Cook.

Illustration by Cath Virginia / The Verge | Photo by Justin Sullivan, Getty Images

Even Apple CEO Tim Cook isn’t sure the company can fully stop AI hallucinations. In an interview with The Washington Post, Cook said he would “never claim” that its new Apple Intelligence system won’t generate false or misleading information with 100 percent confidence.

“I think we have done everything that we know to do, including thinking very deeply about the readiness of the technology in the areas that we’re using it in,” Cook says. “So I am confident it will be very high quality. But I’d say in all honesty that’s short of 100 percent. I would never claim that it’s 100 percent.”

Apple revealed its new Apple Intelligence system, which will bring AI features to the iPhone, iPad, and Mac, during its Worldwide Developers Conference on Monday. These features will let you generate email responses, create custom emoji, summarize text, and more.

As is the case with all other AI systems, this also introduces the possibility of hallucinations. Recent examples of how AI can get things wrong include last month’s incident with Google’s Gemini-powered AI overviews telling us to use glue to put cheese on pizza or a recent ChatGPT bug that caused it to spit out nonsensical answers.

Apple also announced that it’s partnering with OpenAI to build ChatGPT into Siri. The voice assistant will turn to ChatGPT when it receives a question better suited for the chatbot, but it will ask for your permission before doing so. In the demo of the feature shown during WWDC, you can see a disclaimer at the bottom of the answer that reads, “Check important info for mistakes.”

When asked about the integration, Cook said Apple chose OpenAI because the company is a “pioneer” in privacy, and it currently has “the best model.” Apple might not partner only with OpenAI down the road, either: “We're integrating with other people as well,” Cook said. During a post-keynote live session on Monday, Apple senior vice president Craig Federighi said Apple could eventually bring Google Gemini to iOS, too.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,011
Reputation
8,229
Daps
157,675

Introducing Apple’s On-Device and Server Foundation Models​

June 10, 2024

At the 2024 Worldwide Developers Conference, we introduced Apple Intelligence, a personal intelligence system integrated deeply into iOS 18, iPadOS 18, and macOS Sequoia.

Apple Intelligence comprises multiple highly capable generative models that are specialized for our users’ everyday tasks, and can adapt on the fly for their current activity. The foundation models built into Apple Intelligence have been fine-tuned for user experiences such as writing and refining text, prioritizing and summarizing notifications, creating playful images for conversations with family and friends, and taking in-app actions to simplify interactions across apps.

In the following overview, we will detail how two of these models — a ~3 billion parameter on-device language model, and a larger server-based language model available with Private Cloud Compute and running on Apple silicon servers — have been built and adapted to perform specialized tasks efficiently, accurately, and responsibly. These two foundation models are part of a larger family of generative models created by Apple to support users and developers; this includes a coding model to build intelligence into Xcode, as well as a diffusion model to help users express themselves visually, for example, in the Messages app. We look forward to sharing more information soon on this broader set of models.

Our Focus on Responsible AI Development​

Apple Intelligence is designed with our core values at every step and built on a foundation of groundbreaking privacy innovations.

Additionally, we have created a set of Responsible AI principles to guide how we develop AI tools, as well as the models that underpin them:

  1. Empower users with intelligent tools: We identify areas where AI can be used responsibly to create tools for addressing specific user needs. We respect how our users choose to use these tools to accomplish their goals.
  2. Represent our users: We build deeply personal products with the goal of representing users around the globe authentically. We work continuously to avoid perpetuating stereotypes and systemic biases across our AI tools and models.
  3. Design with care: We take precautions at every stage of our process, including design, model training, feature development, and quality evaluation to identify how our AI tools may be misused or lead to potential harm. We will continuously and proactively improve our AI tools with the help of user feedback.
  4. Protect privacy: We protect our users' privacy with powerful on-device processing and groundbreaking infrastructure like Private Cloud Compute. We do not use our users' private personal data or user interactions when training our foundation models.

These principles are reflected throughout the architecture that enables Apple Intelligence, connects features and tools with specialized models, and scans inputs and outputs to provide each feature with the information needed to function responsibly.

In the remainder of this overview, we provide details on decisions such as: how we develop models that are highly capable, fast, and power-efficient; how we approach training these models; how our adapters are fine-tuned for specific user needs; and how we evaluate model performance for both helpfulness and unintended harm.


Modeling overview

Figure 1: Modeling overview for the Apple foundation models.


Pre-Training​

Our foundation models are trained on Apple's AXLearn framework, an open-source project we released in 2023. It builds on top of JAX and XLA, and allows us to train the models with high efficiency and scalability on various training hardware and cloud platforms, including TPUs and both cloud and on-premise GPUs. We used a combination of data parallelism, tensor parallelism, sequence parallelism, and Fully Sharded Data Parallel (FSDP) to scale training along multiple dimensions such as data, model, and sequence length.
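
As a rough illustration of what scaling along these axes can look like in practice, the sketch below uses JAX's public sharding API (which AXLearn builds on) to shard a batch across a "data" mesh axis and a weight matrix across a "model" axis. The mesh layout, shapes, and axis names are illustrative assumptions, not Apple's training configuration.

```python
# Minimal sketch of data + tensor parallelism with JAX sharding.
# Everything here (mesh shape, array sizes) is an assumed example.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Lay the available devices out on a 2-D mesh: one axis for data
# parallelism, one for model (tensor) parallelism.
devices = np.array(jax.devices()).reshape(-1, 1)
mesh = Mesh(devices, axis_names=("data", "model"))

# Shard the batch across "data" and a weight matrix across "model".
batch = jax.device_put(jnp.ones((8, 128)), NamedSharding(mesh, P("data", None)))
weights = jax.device_put(jnp.ones((128, 512)), NamedSharding(mesh, P(None, "model")))

@jax.jit
def forward(x, w):
    # XLA inserts the collectives implied by the input shardings.
    return x @ w

out = forward(batch, weights)
print(out.shape, out.sharding)
```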

We train our foundation models on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot. Web publishers have the option to opt out of the use of their web content for Apple Intelligence training with a data usage control.

We never use our users’ private personal data or user interactions when training our foundation models, and we apply filters to remove personally identifiable information like social security and credit card numbers that are publicly available on the Internet. We also filter profanity and other low-quality content to prevent its inclusion in the training corpus. In addition to filtering, we perform data extraction, deduplication, and the application of a model-based classifier to identify high quality documents.
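
To make that kind of filtering concrete, here is a small, self-contained sketch of PII masking, profanity filtering, and exact deduplication. The regexes, blocklist, and structure are placeholders and do not reflect Apple's actual pipeline.

```python
# Toy corpus-filtering sketch: mask PII-like patterns, drop blocklisted
# or duplicate documents. Patterns and lists are illustrative only.
import hashlib
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")
BLOCKLIST = {"someprofanity"}  # placeholder term list

def scrub_pii(text: str) -> str:
    """Mask strings that look like social security or credit card numbers."""
    text = SSN_RE.sub("[REDACTED-SSN]", text)
    return CARD_RE.sub("[REDACTED-CARD]", text)

def keep_document(text: str, seen_hashes: set) -> bool:
    """Drop blocklisted or exact-duplicate documents; keep the rest."""
    if any(word in BLOCKLIST for word in text.lower().split()):
        return False
    digest = hashlib.sha256(text.encode()).hexdigest()
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True

corpus, seen = ["Call 123-45-6789 now", "Call 123-45-6789 now"], set()
cleaned = [scrub_pii(d) for d in corpus if keep_document(d, seen)]
print(cleaned)  # one document survives, with the SSN-like string masked
```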

Post-Training​

We find that data quality is essential to model success, so we utilize a hybrid data strategy in our training pipeline, incorporating both human-annotated and synthetic data, and conduct thorough data curation and filtering procedures. We have developed two novel algorithms in post-training: (1) a rejection sampling fine-tuning algorithm with teacher committee, and (2) a reinforcement learning from human feedback (RLHF) algorithm with mirror descent policy optimization and a leave-one-out advantage estimator. We find that these two algorithms lead to significant improvement in the model’s instruction-following quality.
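
As one concrete piece of that recipe, the snippet below sketches a generic leave-one-out advantage estimator of the kind named above: for k responses sampled for the same prompt, each sample's baseline is the mean reward of the other k-1 samples. This is a textbook-style illustration, not Apple's implementation, and it omits the mirror descent policy update entirely.

```python
# Generic leave-one-out (RLOO-style) advantage estimator sketch.
import numpy as np

def leave_one_out_advantages(rewards: np.ndarray) -> np.ndarray:
    """rewards: shape (k,) of scalar rewards for k samples of one prompt."""
    k = rewards.shape[0]
    total = rewards.sum()
    baselines = (total - rewards) / (k - 1)   # mean of the *other* samples
    return rewards - baselines

rewards = np.array([0.2, 0.9, 0.5, 0.4])
print(leave_one_out_advantages(rewards))
# A positive advantage means the sample beat its peers, so its
# log-probability would be pushed up by the policy update.
```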

Optimization​

In addition to ensuring our generative models are highly capable, we have used a range of innovative techniques to optimize them on-device and on our private cloud for speed and efficiency. We have applied an extensive set of optimizations for both first token and extended token inference performance.

Both the on-device and server models use grouped-query-attention. We use shared input and output vocab embedding tables to reduce memory requirements and inference cost. These shared embedding tensors are mapped without duplications. The on-device model uses a vocab size of 49K, while the server model uses a vocab size of 100K, which includes additional language and technical tokens.
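
The sketch below illustrates the two ideas just mentioned — grouped-query attention, where a small number of key/value heads is shared across groups of query heads, and a single embedding table reused for both input lookup and output logits. All shapes are made up for illustration, and PyTorch is used here purely as a convenient stand-in for Apple's stack.

```python
# Hedged sketch of grouped-query attention and tied input/output embeddings.
import torch
import torch.nn.functional as F

batch, seq, n_q_heads, n_kv_heads, head_dim = 1, 8, 8, 2, 64

q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Each KV head serves a group of query heads, shrinking the KV cache 4x here.
group = n_q_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)
out = F.scaled_dot_product_attention(q, k, v)   # shape (1, 8, 8, 64)

# Tied embeddings: one table both embeds input ids and scores output logits.
vocab, d_model = 49_000, 1024                   # 49K vocab as quoted above
embed = torch.nn.Embedding(vocab, d_model)
hidden = torch.randn(batch, seq, d_model)
logits = hidden @ embed.weight.T                # reuse, no second table
```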

For on-device inference, we use low-bit palettization, a critical optimization technique that meets the necessary memory, power, and performance requirements. To maintain model quality, we developed a new framework using LoRA adapters that incorporates a mixed 2-bit and 4-bit configuration strategy — averaging 3.5 bits-per-weight — to achieve the same accuracy as the uncompressed models.
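
Below is a toy palettization routine to make the idea concrete: weights are clustered to 2^bits palette entries with a few k-means-style iterations, and the compressed representation is the small palette plus per-weight indices. It is a generic illustration, not Apple's accuracy-recovery framework; a mix weighted toward 4-bit groups with some 2-bit groups is how an average near 3.5 bits per weight can arise.

```python
# Toy low-bit weight palettization: store a tiny palette plus indices.
import numpy as np

def palettize(weights: np.ndarray, bits: int):
    """Cluster weights into 2**bits palette entries with k-means-style steps."""
    k = 2 ** bits
    palette = np.quantile(weights, np.linspace(0, 1, k))   # init centroids
    for _ in range(10):
        idx = np.abs(weights[:, None] - palette[None, :]).argmin(axis=1)
        for c in range(k):
            if np.any(idx == c):
                palette[c] = weights[idx == c].mean()
    idx = np.abs(weights[:, None] - palette[None, :]).argmin(axis=1)
    return palette, idx

w = np.random.randn(4096).astype(np.float32)
palette, idx = palettize(w, bits=4)   # 16 palette floats + 4 bits per weight
w_hat = palette[idx]                  # dequantized weights
print("mean abs error:", np.abs(w - w_hat).mean())
```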

Additionally, we use an interactive model latency and power analysis tool, Talaria, to better guide the bit rate selection for each operation. We also utilize activation quantization and embedding quantization, and have developed an approach to enable efficient Key-Value (KV) cache update on our neural engines.

With this set of optimizations, on iPhone 15 Pro we are able to reach a time-to-first-token latency of about 0.6 milliseconds per prompt token, and a generation rate of 30 tokens per second. Notably, this performance is attained before employing token speculation techniques, which further improve the token generation rate.
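
For intuition, those figures translate into end-to-end numbers like the back-of-envelope calculation below; the prompt and output lengths are invented for the example.

```python
# Back-of-envelope latency arithmetic from the quoted figures:
# 0.6 ms per prompt token to first token, 30 tokens/s generation.
prompt_tokens, output_tokens = 750, 120       # assumed example lengths
ttft_s = prompt_tokens * 0.6e-3               # prompt processing before first token
gen_s = output_tokens / 30                    # steady-state decoding
print(f"time to first token ≈ {ttft_s:.2f}s, total ≈ {ttft_s + gen_s:.2f}s")
# ≈ 0.45 s to first token and ≈ 4.45 s end-to-end for this example.
```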

Model Adaptation​

Our foundation models are fine-tuned for users’ everyday activities, and can dynamically specialize themselves on-the-fly for the task at hand. We utilize adapters, small neural network modules that can be plugged into various layers of the pre-trained model, to fine-tune our models for specific tasks. For our models we adapt the attention matrices, the attention projection matrix, and the fully connected layers in the point-wise feedforward networks for a suitable set of the decoding layers of the transformer architecture.

By fine-tuning only the adapter layers, the original parameters of the base pre-trained model remain unchanged, preserving the general knowledge of the model while tailoring the adapter layers to support specific tasks.
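
A generic LoRA-style adapter, sketched below, shows the mechanics: the base linear layer is frozen and only a small low-rank pair of matrices is trained, initialized so the adapter starts as a no-op. Dimensions, rank, and placement are illustrative; this is not Apple's adapter code.

```python
# Generic LoRA adapter sketch: frozen base layer + trainable low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # base weights stay frozen
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)    # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(2048, 2048), rank=16)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable adapter params: {trainable:,}")   # 2 * 2048 * 16 = 65,536
```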

Figure 2: Adapters are small collections of model weights that are overlaid onto the common base foundation model. They can be dynamically loaded and swapped — giving the foundation model the ability to specialize itself on-the-fly for the task at hand. Apple Intelligence includes a broad set of adapters, each fine-tuned for a specific feature. It’s an efficient way to scale the capabilities of our foundation model.

We represent the values of the adapter parameters using 16 bits, and for the ~3 billion parameter on-device model, the parameters for a rank 16 adapter typically require tens of megabytes. The adapter models can be dynamically loaded, temporarily cached in memory, and swapped — giving our foundation model the ability to specialize itself on the fly for the task at hand while efficiently managing memory and guaranteeing the operating system's responsiveness.
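
That size can be sanity-checked with rough arithmetic. The layer count and hidden sizes below are assumptions chosen only to show the order of magnitude, since the model's exact dimensions are not given here.

```python
# Rough size estimate for a rank-16, 16-bit adapter over assumed dimensions.
rank, bytes_per_param, n_layers = 16, 2, 32          # assumptions
d_model, d_ffn = 2048, 8192                          # assumed widths

def lora_params(d_in, d_out, r=rank):
    return r * (d_in + d_out)                        # A: d_in x r, B: r x d_out

per_layer = (
    4 * lora_params(d_model, d_model)                # q, k, v, o projections
    + 2 * lora_params(d_model, d_ffn)                # feed-forward up/gate
    + lora_params(d_ffn, d_model)                    # feed-forward down
)
total_mb = per_layer * n_layers * bytes_per_param / 1e6
print(f"≈ {total_mb:.0f} MB of adapter weights")     # lands in the tens of MB
```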

To facilitate the training of the adapters, we created an efficient infrastructure that allows us to rapidly retrain, test, and deploy adapters when either the base model or the training data gets updated. The adapter parameters are initialized using the accuracy-recovery adapter introduced in the Optimization section.

Performance and Evaluation​

Our focus is on delivering generative models that can enable users to communicate, work, express themselves, and get things done across their Apple products. When benchmarking our models, we focus on human evaluation as we find that these results are highly correlated to user experience in our products. We conducted performance evaluations on both feature-specific adapters and the foundation models.

To illustrate our approach, we look at how we evaluated our adapter for summarization. As product requirements for summaries of emails and notifications differ in subtle but important ways, we fine-tune accuracy-recovery low-rank (LoRA) adapters on top of the palettized model to meet these specific requirements. Our training data is based on synthetic summaries generated from larger server models, filtered by a rejection sampling strategy that keeps only the high-quality summaries.
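
A toy version of that rejection-sampling filter is sketched below: several candidate summaries are drawn per document and only those scoring above a bar are kept as training pairs. Both the generator and the scorer are placeholder stand-ins; a real pipeline would sample from a large teacher model and score with a learned grader.

```python
# Toy rejection-sampling data filter: keep only candidates above a threshold.
import random

def generate_candidate(document: str) -> str:
    # Stand-in for sampling a summary from a larger "teacher" model.
    return document[: random.randint(10, max(11, len(document)))]

def quality_score(document: str, summary: str) -> float:
    # Stand-in for a grader; here a crude length-ratio heuristic.
    return min(len(summary) / max(len(document), 1), 1.0)

def build_training_pairs(documents, n_samples=8, threshold=0.5):
    pairs = []
    for doc in documents:
        candidates = [generate_candidate(doc) for _ in range(n_samples)]
        kept = [c for c in candidates if quality_score(doc, c) >= threshold]
        pairs.extend((doc, c) for c in kept)   # only accepted samples survive
    return pairs

print(len(build_training_pairs(["an example email body to be summarized"])))
```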

To evaluate the product-specific summarization, we use a set of 750 responses carefully sampled for each use case. These evaluation datasets emphasize a diverse set of inputs that our product features are likely to face in production, and include a stratified mixture of single and stacked documents of varying content types and lengths. Because these are product features, it was important to evaluate performance against datasets that are representative of real use cases. We find that our models with adapters generate better summaries than a comparable model.

As part of responsible development, we identified and evaluated specific risks inherent to summarization. For example, summaries occasionally remove important nuance or other details in ways that are undesirable. However, we found that the summarization adapter did not amplify sensitive content in over 99% of targeted adversarial examples. We continue to adversarially probe to identify unknown harms and expand our evaluations to help guide further improvements.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,011
Reputation
8,229
Daps
157,675

Human Satisfaction Score on Summarization Feature Benchmark​

Email​

Satisfaction Good Result Ratio​

  1. Phi-3-mini: 73.3%
  2. Apple On-Device + Adapter: 87.5%

Satisfaction Poor Result Ratio​

  1. Phi-3-mini: 15.7%
  2. Apple On-Device + Adapter: 5.4%

Notification​

Satisfaction Good Result Ratio​

  1. Phi-3-mini: 76.6%
  2. Apple On-Device + Adapter: 79.7%

Satisfaction Poor Result Ratio​

  1. Phi-3-mini: 8.2%
  2. Apple On-Device + Adapter: 8.1%

Figure 3: Ratio of "good" and "poor" responses for two summarization use cases relative to all responses. Summaries are classified as "good", "neutral", "poor" given the grader's scores across five dimensions. A result is classified as "good" if all of the dimensions are good (higher is better). A result is classified as "poor" if any of the dimensions are poor (lower is better). Our models with adapters generate better summaries than a comparable model.
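
The grading rule in the caption maps onto a simple check like the sketch below; the dimension names are placeholders, since the five dimensions are not enumerated here.

```python
# Sketch of the caption's rule: "good" only if every dimension is good,
# "poor" if any dimension is poor, otherwise "neutral".
def classify(scores: dict) -> str:
    """scores maps each of five dimensions to 'good', 'neutral', or 'poor'."""
    if any(v == "poor" for v in scores.values()):
        return "poor"
    if all(v == "good" for v in scores.values()):
        return "good"
    return "neutral"

print(classify({"dimension_1": "good", "dimension_2": "good",
                "dimension_3": "good", "dimension_4": "good",
                "dimension_5": "neutral"}))   # -> "neutral"
```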

In addition to evaluating feature specific performance powered by foundation models and adapters, we evaluate both the on-device and server-based models’ general capabilities. We utilize a comprehensive evaluation set of real-world prompts to test the general model capabilities. These prompts are diverse across different difficulty levels and cover major categories such as brainstorming, classification, closed question answering, coding, extraction, mathematical reasoning, open question answering, rewriting, safety, summarization, and writing.

We compare our models with both open-source models (Phi-3, Gemma, Mistral, DBRX) and commercial models of comparable size (GPT-3.5-Turbo, GPT-4-Turbo)[1]. We find that our models are preferred by human graders over most comparable competitor models. On this benchmark, our on-device model, with ~3B parameters, outperforms larger models including Phi-3-mini, Mistral-7B, and Gemma-7B. Our server model compares favorably to DBRX-Instruct, Mixtral-8x22B, and GPT-3.5-Turbo while being highly efficient.

Apple Foundation Model Human Evaluation​

Apple On-Device versus​

  1. Apple On-Device versus Gemma-2B: win 62.0%, tie 21.3%, lose 16.7%.
  2. Apple On-Device versus Mistral-7B: win 46.1%, tie 26.0%, lose 27.9%.
  3. Apple On-Device versus Phi-3-mini: win 43.0%, tie 24.6%, lose 32.4%.
  4. Apple On-Device versus Gemma-7B: win 41.6%, tie 27.8%, lose 30.6%.

Apple Server versus​

  1. Apple Server versus DBRX-Instruct: win 54.5%, tie 21.4%, lose 24.1%.
  2. Apple Server versus GPT-3.5-Turbo: win 50.0%, tie 25.3%, lose 24.7%.
  3. Apple Server versus Mixtral-8x22B: win 44.7%, tie 27.6%, lose 27.7%.
  4. Apple Server versus GPT-4-Turbo: win 28.5%, tie 29.8%, lose 41.7%.

Figure 4: Fraction of preferred responses in side-by-side evaluation of Apple's foundation model against comparable models. We find that our models are preferred by human graders.

We use a set of diverse adversarial prompts to test the model performance on harmful content, sensitive topics, and factuality. We measure the violation rates of each model as evaluated by human graders on this evaluation set, with a lower number being desirable. Both the on-device and server models are robust when faced with adversarial prompts, achieving violation rates lower than open-source and commercial models.

Human Evaluation of Output Harmfulness​

On-Device​

  1. Mistral-7B: 44.6%
  2. Phi-3-mini: 22.8%
  3. Gemma-2B: 14.0%
  4. Gemma-7B: 13.7%
  5. Apple On-Device: 8.2%

Server​

  1. Mixtral-8x22B: 43.3%
  2. DBRX-Instruct: 41.7%
  3. GPT-4-Turbo: 20.1%
  4. GPT-3.5-Turbo: 15.5%
  5. Apple Server: 6.6%

Figure 5: Fraction of violating responses for harmful content, sensitive topics, and factuality (lower is better). Our models are robust when faced with adversarial prompts.

Our models are preferred by human graders as safe and helpful over competitor models for these prompts. However, considering the broad capabilities of large language models, we understand the limitation of our safety benchmark. We are actively conducting both manual and automatic red-teaming with internal and external teams to continue evaluating our models' safety.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,011
Reputation
8,229
Daps
157,675

Human Preference Evaluation on Safety Prompts​

Apple On-Device versus​

  1. Apple On-Device versus Mistral-7B: win 52.2%, tie 37.6%, lose 10.2%.
  2. Apple On-Device versus Phi-3-mini: win 51.8%, tie 33.5%, lose 14.7%.
  3. Apple On-Device versus Gemma-2B: win 46.5%, tie 35.8%, lose 17.7%.
  4. Apple On-Device versus Gemma-7B: win 39.5%, tie 43.1%, lose 17.4%.

Apple Server versus​

  1. Apple Server versus DBRX-Instruct: win 57.3%, tie 32.6%, lose 10.0%.
  2. Apple Server versus Mixtral-8x22B: win 57.3%, tie 31.8%, lose 10.9%.
  3. Apple Server versus GPT-3.5-Turbo: win 41.8%, tie 43.6%, lose 14.6%.
  4. Apple Server versus GPT-4-Turbo: win 39.8%, tie 43.1%, lose 17.1%.

Figure 6: Fraction of preferred responses in side-by-side evaluation of Apple's foundation model against comparable models on safety prompts. Human graders found our responses safer and more helpful.

To further evaluate our models, we use the Instruction-Following Eval (IFEval) benchmark to compare their instruction-following capabilities with models of comparable size. The results suggest that both our on-device and server models follow detailed instructions better than the open-source and commercial models of comparable size.
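
For readers unfamiliar with IFEval, the two numbers reported below differ only in aggregation, as this small sketch on made-up data shows: instruction-level accuracy scores every verifiable instruction independently, while prompt-level accuracy credits a prompt only when all of its instructions are satisfied.

```python
# Instruction-level vs prompt-level accuracy on invented example data.
results = [                # per prompt: one boolean per verifiable instruction
    [True, True],          # prompt 1: both instructions followed
    [True, False, True],   # prompt 2: one instruction missed
    [True],                # prompt 3
]
instr_acc = sum(sum(r) for r in results) / sum(len(r) for r in results)
prompt_acc = sum(all(r) for r in results) / len(results)
print(f"instruction-level: {instr_acc:.1%}, prompt-level: {prompt_acc:.1%}")
# 5/6 ≈ 83.3% vs 2/3 ≈ 66.7% — prompt-level is the stricter metric.
```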

IFEval Benchmarks​

On-Device​

Instruction-level Accuracy​

  1. Gemma-2B: 40.5%
  2. Gemma-7B: 61.6%
  3. Mistral-7B: 65.2%
  4. Phi-3-mini: 67.9%
  5. Apple On-Device: 78.7%

Prompt-level Accuracy​

  1. Gemma-2B: 28.7%
  2. Gemma-7B: 51.4%
  3. Mistral-7B: 54.2%
  4. Phi-3-mini: 57.8%
  5. Apple On-Device: 70.2%

Server​

Instruction-level Accuracy​

  1. DBRX-Instruct: 65.8%
  2. GPT-3.5-Turbo: 74.8%
  3. Mixtral-8x22B: 79.4%
  4. Apple Server: 85.0%
  5. GPT-4-Turbo: 85.4%

Prompt-level Accuracy​

  1. DBRX-Instruct: 53.6%
  2. GPT-3.5-Turbo: 65.3%
  3. Mixtral-8x22B: 71.4%
  4. Apple Server: 79.1%
  5. GPT-4-Turbo: 79.3%

Figure 7: Instruction-following capability (measured with IFEval) for Apple's foundation models and models of comparable size (higher is better).

We evaluate our models’ writing ability on our internal summarization and composition benchmarks, consisting of a variety of writing instructions. These results do not refer to our feature-specific adapter for summarization (seen in Figure 3), nor do we have an adapter focused on composition.

Writing Benchmarks​

On-Device​

Summarization​

  1. Gemma-2B: 7.6
  2. Phi-3-mini: 8.8
  3. Gemma-7B: 8.9
  4. Mistral-7B: 8.9
  5. Apple On-Device: 9.1

Composition​

  1. Gemma-2B: 8.0
  2. Phi-3-mini: 9.0
  3. Gemma-7B: 9.1
  4. Mistral-7B: 9.1
  5. Apple On-Device: 9.1

Server​

Summarization​

  1. GPT-3.5-Turbo: 8.6
  2. DBRX-Instruct: 9.2
  3. Mixtral-8x22B: 9.5
  4. GPT-4-Turbo: 9.5
  5. Apple Server: 9.5

Composition​

  1. GPT-3.5-Turbo: 8.9
  2. DBRX-Instruct: 9.2
  3. Mixtral-8x22B: 9.5
  4. Apple Server: 9.5
  5. GPT-4-Turbo: 9.7

Figure 8: Writing ability on internal summarization and composition benchmarks (higher is better).


Conclusion​

The Apple foundation models and adapters introduced at WWDC24 underlie Apple Intelligence, the new personal intelligence system that is integrated deeply into iPhone, iPad, and Mac, and enables powerful capabilities across language, images, actions, and personal context. Our models have been created with the purpose of helping users do everyday activities across their Apple products, and developed responsibly at every stage and guided by Apple’s core values. We look forward to sharing more information soon on our broader family of generative models, including language, diffusion, and coding models.

Footnotes​

[1] We compared against the following model versions: gpt-3.5-turbo-0125, gpt-4-0125-preview, Phi-3-mini-4k-instruct, Mistral-7B-Instruct-v0.2, Mixtral-8x22B-Instruct-v0.1, Gemma-1.1-2B, and Gemma-1.1-7B. The open-source and Apple models are evaluated in bfloat16 precision.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
56,011
Reputation
8,229
Daps
157,675

Apple Spurned Idea of iPhone AI Partnership With Meta Months Ago​


  • Apple has been looking to forge agreements to use AI chatbots
  • Report indicated that Apple and Meta are in discussions


An Apple iPhone

Photographer: Samsul Said/Bloomberg



By Mark Gurman

June 24, 2024 at 5:45 PM EDT

Apple Inc. rejected overtures by Meta Platforms Inc. to integrate the social networking company’s AI chatbot into the iPhone months ago, according to people with knowledge of the matter.

The two companies aren’t in discussions about using Meta’s Llama chatbot in an AI partnership and only held brief talks in March, said the people, who asked not to be identified because the situation is private. The dialogue about a partnership didn’t reach any formal stage, and Apple has no active plans to integrate Llama.

The preliminary talks occurred around the time Apple started hashing out deals to use OpenAI’s ChatGPT and Alphabet Inc.’s Gemini in its products. The iPhone maker announced the ChatGPT agreement earlier this month and said it was expecting to offer Gemini in the future.

Read More: Apple Hits Record After Introducing ‘AI for the Rest of Us’

Apple decided not to move forward with formal Meta discussions in part because it doesn’t see that company’s privacy practices as stringent enough, according to the people. Apple has spent years criticizing Meta’s technology, and integrating Llama into the iPhone would have been a stark about-face.

Apple also sees ChatGPT as a superior offering. Google, meanwhile, is already a partner for search in Apple’s Safari web browser, so a future Gemini deal would build on that relationship.

Spokespeople for Apple and Meta declined to comment. The Wall Street Journal reported on Sunday that the two companies were in talks about an AI partnership.

Apple unveiled a suite of artificial intelligence features at its Worldwide Developers Conference on June 10. The new technology — called Apple Intelligence — includes homegrown tools for summarizing notifications, transcribing voice memos and generating custom emoji.

But Apple’s chatbot technology isn’t as advanced as that of rivals, prompting it to seek out partners. The company also believes that customers will want the ability to switch between different chatbots depending on their needs, similar to how they might hop between Google and Microsoft Corp.’s Bing for searches.

Apple continues to talk to AI startup Anthropic about eventually adding that company’s chatbot as an option, the people said. Apple Intelligence will begin rolling out later this year as part of operating systems for the iPhone, iPad and Mac.

The current deal with OpenAI doesn't involve money changing hands, but Apple will allow paying ChatGPT customers to access their subscriptions within the iOS operating system. That could generate revenue for OpenAI, a percentage of which could be headed to Apple in the form of App Store commissions.

Read More: Apple to ‘Pay’ OpenAI for ChatGPT Through Distribution, Not Cash

Meta and Apple were on friendlier terms a decade ago, when the iPhone maker was integrating Facebook into iOS. But the companies have become fierce rivals in recent years, competing over AI, home devices and mixed-reality headsets.

— With assistance from Kurt Wagner
 