bnew


Announcing Microsoft Copilot, your everyday AI companion​

Sep 21, 2023 | Yusuf Mehdi - Corporate Vice President & Consumer Chief Marketing Officer

We are entering a new era of AI, one that is fundamentally changing how we relate to and benefit from technology. With the convergence of chat interfaces and large language models, you can now ask for what you want in natural language and the technology is smart enough to answer, create it or take action. At Microsoft, we think about this as having a copilot to help navigate any task. We have been building AI-powered copilots into our most used and loved products – making coding more efficient with GitHub, transforming productivity at work with Microsoft 365, redefining search with Bing and Edge, and delivering contextual value that works across your apps and PC with Windows.

Today we take the next step to unify these capabilities into a single experience we call Microsoft Copilot, your everyday AI companion. Copilot will uniquely incorporate the context and intelligence of the web, your work data and what you are doing in the moment on your PC to provide better assistance – with your privacy and security at the forefront. It will be a simple and seamless experience, available in Windows 11, Microsoft 365, and in our web browser with Edge and Bing. It will work as an app or reveal itself when you need it with a right click. We will continue to add capabilities and connections to Copilot across our most-used applications over time, in service of our vision to have one experience that works across your whole life.

Copilot will begin to roll out in its early form as part of our free update to Windows 11, starting Sept. 26 — and across Bing, Edge, and Microsoft 365 Copilot this fall. We’re also announcing some exciting new experiences and devices to help you be more productive, spark your creativity, and meet the everyday needs of people and businesses.
  • With over 150 new features, the next Windows 11 update is one of our most ambitious yet, bringing the power of Copilot and new AI-powered experiences to apps like Paint, Photos, Clipchamp and more, right on your Windows PC.
  • Bing will add support for the latest DALL·E 3 model from OpenAI and deliver more personalized answers based on your search history, a new AI-powered shopping experience, and updates to Bing Chat Enterprise, making it more mobile and visual.
  • Microsoft 365 Copilot will be generally available for enterprise customers on Nov. 1, 2023, along with Microsoft 365 Chat, a new AI assistant that will completely transform the way you work.
  • Additionally, we introduced powerful new Surface devices that bring all these AI experiences to life for you, and they are available for pre-order beginning today.

New Windows 11 Update delivers over 150 new features, including bringing the power of Copilot to the PC

Today, we’re thrilled to share our next step toward making Windows the destination for the best AI experiences – with a new update that delivers our most personal experience yet, coming on Sept. 26.

Here’s a look at some of what’s new in the latest update for Windows 11:



  • Copilot in Windows (in preview) empowers you to create faster, complete tasks with ease and lessens your cognitive load – making once-complicated tasks simple. We’ve made accessing the power of Copilot seamless: it’s always right there for you on the taskbar or with the Win+C keyboard shortcut, providing assistance alongside all your apps, on all screen sizes, at work, school or home.
  • Paint has been enhanced with AI for drawing and digital creation, with the addition of background removal and layers, as well as a preview of Cocreator, which brings the power of generative AI to the Paint app.
  • Photos has also been enhanced with AI, including new features to make editing your photos a breeze. With Background Blur you can make the subject of your photo stand out quickly and easily: the Photos app automatically finds the background in the photo and, with a single click, instantly highlights your subject and blurs out the background. We’ve also improved search: with photos stored in OneDrive (home or personal) accounts, you can now quickly find the photo you’re looking for based on its content. You can also find photos based on the location where they were taken.
  • Snipping Tool now offers more ways to capture content on your screen – with this update you can extract specific text from an image to paste into another application, or easily protect sensitive information with text redaction using text actions on the post-capture screen. And with the addition of audio and mic support for sound capture, it’s easier to create compelling videos and content from your screen.
  • Clipchamp, now with auto compose, automatically suggests scenes, edits and narratives based on your images and footage, so you can create and edit videos to share with family, friends and social media like a pro.
  • Notepad will start automatically saving your session state allowing you to close Notepad without any interrupting dialogs and then pick up where you left off when you return. Notepad will automatically restore previously open tabs as well as unsaved content and edits across those open tabs.
  • With the new Outlook for Windows, you can connect and coordinate your various accounts (including Gmail, Yahoo, iCloud, and more) in one app. Intelligent tools help you write clear, concise emails and seamlessly attach important documents and photos from OneDrive. To learn more, visit this link.
  • Modernized File Explorer: we’re introducing a modernized File Explorer home, address bar and search box, all designed to help you more easily access important and relevant content, stay up to date with file activity and collaborate without even opening a file. Also coming to File Explorer is a new Gallery feature designed to make it easy to access your photo collection.
  • New text authoring experiences are coming to voice access, along with new natural voices in Narrator, continuing our ongoing commitment to making Windows 11 the most accessible version of Windows yet.
  • Windows Backup makes moving to a new Windows 11 PC easier than ever. With Windows Backup, transitioning most files, apps and settings from one PC to another is seamless, so everything is right where you left it, exactly how you like it.

These experiences, including Copilot in Windows and more, will start to become available on Sept. 26 as part of our latest update to Windows 11, version 22H2.

Bing and Edge are redefining how we interact with the web

Today, we’re announcing new features in Bing and Edge to supercharge your day, powered by the latest models and delivering the most advanced AI capabilities available. You can use Bing Chat today with Microsoft Edge or at bing.com/chat. Features will begin to roll out soon.
  • Personalized answers. Now, your chat history can inform your results. For example, if you’ve used Bing to track your favorite soccer team, next time you’re planning a trip it can proactively tell you if the team is playing in your destination city. If you prefer responses that don’t use your chat history, you can turn this feature off in Bing settings.
  • Copilot in Microsoft Shopping. From Bing or Edge, you can now more quickly find what you’re shopping for online. When you ask for information on an item, Bing will ask additional questions to learn more, then use that information to provide more tailored recommendations. And you can trust you’re getting the best price – in fact, in the last 12 months, shoppers have been offered more than $4 billion in savings on Microsoft Edge. Soon, you’ll also be able to use a photo or saved image as the starting point for shopping.
(Image: colorful AI-created image of an astronaut)

  • DALL·E 3 model from OpenAI in Bing Image Creator. DALL·E 3 delivers a huge leap forward, with more beautiful creations and better renderings for details like fingers and eyes. It also has a better understanding of what you’re asking for, resulting in more accurate images. We’re also integrating Microsoft Designer directly into Bing to make editing your creations even easier.
  • Content Credentials. As we continue to take a responsible approach to generative AI, we’re adding new Content Credentials, which use cryptographic methods to add an invisible digital watermark to all AI-generated images in Bing, including the time and date each was originally created. We will also bring support for Content Credentials to Paint and Microsoft Designer.
  • Bing Chat Enterprise Updates. Since its introduction just two months ago, more than 160 million Microsoft 365 users have gained access to Bing Chat Enterprise at no additional cost, and the response has been incredible. Today we’re announcing that Bing Chat Enterprise is now available in the Microsoft Edge mobile app. We’re also bringing support for multimodal visual search and Image Creator to Bing Chat Enterprise, so you can boost your creativity at work by finding information with images and creating them.



Transforming work with Microsoft 365 Copilot, Bing Chat Enterprise and Windows

In March, we showed you what Microsoft 365 Copilot can do in the apps millions of people use every day across work and life – Word, Excel, PowerPoint, Outlook and Teams – using just your own words. After months of learning alongside customers like Visa, General Motors, KPMG and Lumen Technologies, we’re excited to share that Microsoft 365 Copilot will be generally available for enterprise customers on Nov. 1.




Today, we’re also introducing a new hero experience in Microsoft 365 Copilot: Microsoft 365 Chat. You saw a glimpse of Microsoft 365 Chat in March, then called Business Chat — but rapid advancements over the last few months have taken it to a whole new level. Microsoft 365 Chat combs across your entire universe of data at work, including emails, meetings, chats, documents and more, plus the web. Like an assistant, it has a deep understanding of you, your job, your priorities and your organization. It goes far beyond simple questions and answers to give you a head start on some of your most complex or tedious tasks — whether that’s writing a strategy document, booking a business trip, or catching up on emails.
 

bnew

Over the past few years, the pace and volume of work have only increased. On a given workday, our heaviest users search for what they need 18 times, receive over 250 Outlook emails and send or read nearly 150 Teams chats.[1] Teams users globally are in three times more meetings each week than they were in 2020.[2] And on Windows, some people use 11 apps in a single day to get work done.[3] Microsoft 365 Chat tames the complexity, eliminates the drudgery and helps you reclaim time at work. Preview customers can access it today on Microsoft365.com, in Teams, or in Bing when signed in with their work account. In the future you’ll be able to access it wherever you see the Copilot icon when signed in with your work account.

To empower you at work, we’re also introducing new capabilities for Copilot in Outlook, Word, Excel, Loop, OneNote and OneDrive. Bing Chat Enterprise — the first entry point into generative AI for many companies — is getting a few upgrades. And as part of our big Windows 11 update, Windows 365 Switch and Windows 365 Boot will be generally available, making it even easier to access your Windows Cloud PC. This will help employees achieve more, while making it easier for IT to deploy, manage and secure. Check out the Microsoft 365 blog to learn more about how Microsoft 365, Bing Chat Enterprise and Windows are transforming the way we work.

Unleashing personal productivity and creativity with Designer and Copilot in Microsoft 365

Designer, the newest addition to our family of Microsoft 365 consumer apps, helps you quickly create stunning visuals, social media posts, invitations, and more using cutting-edge AI. Today, we’re showing some powerful new features, many of which will be powered by OpenAI’s DALL·E 3. Generative expand uses AI to extend your image beyond its borders, generative fill adds a new object or background, and generative erase can remove unwanted objects.[4] DALL·E 3 will also soon power the image generation experience in Designer, making it easy to add original, higher-quality images to your design in seconds.




We’re also integrating Designer into Microsoft 365 Copilot for consumers — starting with Word. Designer uses the context of your document to propose visuals to choose from; you can make it more personal by uploading your own photos too. And within moments, you can transform a text-heavy document with custom graphics. We’re starting to test Microsoft 365 Copilot with a small group of Microsoft 365 consumer subscribers and look forward to expanding the preview to more people over time. Seventy percent of creators tell us one of the most difficult parts of the creation process is just getting started.[5] With creative tools like Designer, plus Bing Image Creator, Clipchamp and Paint, you can now have an immediate visual draft of almost anything — with a few simple prompts.

Introducing new Surface devices available for pre-order beginning today for people and businesses

There is no better stage to bring to life all of the incredible AI experiences from across Microsoft than our new Surface devices. Surface is at the forefront of device performance and processor technology. We have been investing in silicon advancements to augment this next wave of AI innovation, unlocking experiences like Windows Studio Effects in Surface Pro 9 with 5G and continuing to increase performance to run the latest AI models with powerful devices like the new Surface Laptop Studio 2.



  • The new Surface Laptop Studio 2 is the most powerful Surface we’ve ever built. Turbocharged with the latest Intel® Core processors and cutting-edge NVIDIA® Studio tools for creators, with up to 2x more graphics performance than MacBook Pro M2 Max,[6] Surface Laptop Studio 2 brings together the versatility to create and the power to perform: a stunning 14.4″ PixelSense Flow touchscreen display and a flexible design with three unique postures. And with new customizations brought to the haptic touchpad to improve accessibility, we’re proud to call it the most inclusive touchpad on any laptop today.



  • The new Surface Laptop Go 3 will turn heads with its balance of style and performance. It’s our lightest and most portable Surface Laptop, with a touchscreen display, and packed with premium features like an incredible typing experience and a Fingerprint Power Button, and it comes in four stylish colors. With Intel® Core i5 performance, all-day battery life, and robust RAM and storage options, it’s the perfect everyday laptop and stage for the latest AI tools from Microsoft.
(Image: Surface Go device with detachable keyboard, mouse and pen)

  • Surface Go 4 for Business is our most portable Surface 2-in-1. This fall, the new Surface Go will be available exclusively for organizations to meet the growing needs of frontline workers and educators. We can’t wait to see how it will help businesses modernize and make their users more productive.
  • Surface Hub 3 is the premier collaboration device built for hybrid work, designed end-to-end by Microsoft. The Microsoft Teams Rooms on Windows experience is familiar and intuitive on a brilliant 50” or 85” screen. The 50” Surface Hub 3 brings entirely new ways to co-create with Portrait, Smart Rotation and Smart AV. AI-enhanced collaboration tools – like Cloud IntelliFrame and Copilot in Whiteboard – shine on Surface Hub 3.
(Image: Surface on a large docking station)
  • 3D printable Adaptive Pen Grips for Surface Pen have been added to our lineup of adaptive accessories, enabling more people than ever to engage in digital inking and creation. They are available for purchase through Shapeways or as downloadable plans for 3D printing. To hear more about how we’re taking steps to close the disability divide, check out our video.

To pre-order one of our incredible new Surface devices, visit Microsoft.com or Bestbuy.com, and check out our Surface for Business page and blog to learn more about all of today’s new products.

The new era of AI with Copilot from Microsoft is here – and it’s ready for you

We believe that Microsoft is the place where powerful, useful AI experiences come together – simply, securely and responsibly – into the products you use most. Today, we showed you how we are not only increasing the usefulness of these experiences but expanding them: from Windows 11 as the destination for the best AI experiences at work, school and home, to Microsoft 365, the most trusted productivity suite on the planet, to Bing and Edge, the most innovative search engine and browser available – all of it coming together on Windows 11 PCs like Surface, with Copilot helping you get things done, create, and connect to the people you care about and the world around you. We can’t wait to see what you can do with these experiences.

Learn more on the Microsoft 365 blog and the Security blog. And for all the blogs, videos and assets related to today’s announcements, please visit our microsite.

[1] Data represents top 20% of users by volume of searches across M365 services, emails received, and sent and read chats in Teams, respectively.
[2] Microsoft annual Work Trend Index 2023 – Work Trend Index | Will AI Fix Work? (microsoft.com)
[3] Data reflects the top 20% of Windows devices by app volume per day.
[4] Generative erase in Microsoft Designer is generally available to try today, with generative expand and fill coming soon.
[5] Survey of 941 creators commissioned by Microsoft in June 2022.
[6] Tested by Microsoft in September 2023 using the Cinebench 2024 GPU benchmark, comparing Surface Laptop Studio 2 with RTX 2000 Ada Generation to MacBook Pro 14” with the M2 Max 12-core / 30-core configuration.


Tags: AI, Bing, Designer, Microsoft 365, Microsoft Copilot, Microsoft Edge, Surface, Windows 11
 

bnew




Powerful, Stable, and Reproducible LLM Alignment


Step up your LLM alignment with Xwin-LM!

Xwin-LM aims to develop and open-source alignment technologies for large language models, including supervised fine-tuning (SFT), reward models (RM), rejection sampling, reinforcement learning from human feedback (RLHF), etc. Our first release, built upon the Llama2 base models, ranked TOP-1 on AlpacaEval. Notably, it's the first to surpass GPT-4 on this benchmark. The project will be continuously updated.
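As a rough illustration of the rejection-sampling (best-of-n) step in a pipeline like this, consider the sketch below. The `policy` and `reward_model` objects and their methods are hypothetical stand-ins, not Xwin-LM's actual code, which has not been released at the time of writing.

```python
# Minimal best-of-n rejection-sampling sketch (illustrative only; the
# `policy.generate` and `reward_model.score` calls are hypothetical
# stand-ins for an SFT model's sampler and a trained reward model).

def rejection_sample(policy, reward_model, prompt, n_candidates=8):
    # Sample several candidate responses from the SFT policy.
    candidates = [policy.generate(prompt, temperature=1.0)
                  for _ in range(n_candidates)]
    # Score each candidate with the reward model and keep the best one;
    # winners are typically reused as targets for another SFT round.
    return max(candidates, key=lambda c: reward_model.score(prompt, c))
```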

News

  • 💥 [Sep, 2023] We released Xwin-LM-70B-V0.1, which achieved a 95.57% win rate against Text-Davinci-003 on the AlpacaEval benchmark, ranking TOP-1 on AlpacaEval. It was the FIRST model to surpass GPT-4 on AlpacaEval; note that its win rate v.s. GPT-4 is 60.61%.
  • 🔍 [Sep, 2023] RLHF plays a crucial role in the strong performance of the Xwin-LM-V0.1 release!
  • 💥 [Sep, 2023] We released Xwin-LM-13B-V0.1, which achieved a 91.76% win rate on AlpacaEval, ranking top-1 among all 13B models.
  • 💥 [Sep, 2023] We released Xwin-LM-7B-V0.1, which achieved an 87.82% win rate on AlpacaEval, ranking top-1 among all 7B models.

Model Card

Model | Checkpoint | Report | License
Xwin-LM-7B-V0.1 | 🤗 HF Link | 📃 Coming soon (stay tuned) | Llama 2 License
Xwin-LM-13B-V0.1 | 🤗 HF Link | – | Llama 2 License
Xwin-LM-70B-V0.1 | 🤗 HF Link | – | Llama 2 License

Benchmarks

Xwin-LM performance on AlpacaEval.​

The table below displays the performance of Xwin-LM on AlpacaEval, which evaluates its win rate against Text-Davinci-003 across 805 questions. To provide a comprehensive evaluation, we present, for the first time, the win rate against ChatGPT and GPT-4 as well. Our Xwin-LM model family establishes a new state-of-the-art performance across all metrics. Notably, Xwin-LM-70B-V0.1 has eclipsed GPT-4 for the first time, achieving an impressive win rate of 95.57% against Text-Davinci-003 and 60.61% against GPT-4.

Model | v.s. Text-Davinci-003 (win rate %) | v.s. ChatGPT (win rate %) | v.s. GPT-4 (win rate %)
Xwin-LM-70B-V0.1 | 95.57 | 87.50 | 60.61
GPT-4 | 95.28 | 84.66 | 50.00
WizardLM-70B-V1.0 | 92.91 | 80.19 | 46.70
Llama-2-70B-Chat | 92.66 | 81.97 | 51.19
Xwin-LM-13B-V0.1 | 91.76 | 81.79 | 55.30
ChatGPT | 89.37 | 50.00 | 16.60
WizardLM-13B-V1.2 | 89.17 | 75.75 | 41.91
Xwin-LM-7B-V0.1 | 87.35 | 76.40 | 47.57
Llama-2-13B-Chat | 81.09 | 64.22 | 30.92

Case study on AlpacaEval

Prompt: I've put a plate on top of an apple, and take the plate from kitchen to dinning room. Where is the apple now?
  GPT-4 response: The apple is now in the dining room, as it was underneath the plate when you moved the plate from the kitchen to the dining room.
  Xwin-LM response: The apple is still in the kitchen, as you have only moved the plate. The apple is under the plate, so if you want to bring the apple to the dining room, you should either move the apple directly or put the apple on the plate and then move the plate to the dining room.

Prompt: Answer the following question. How do you say "good evening" in French?
  GPT-4 response: In French, "good evening" is said as "bonsoir."
  Xwin-LM response: In French, "good evening" is translated to "bonsoir." This greeting is used in the evening and nighttime hours, typically after 6 p.m. It is a combination of "bon," which means "good," and "soir," which means "evening." To pronounce it, say "bone-swahr."

Xwin-LM performance on NLP foundation tasks.

The following table provides a comparison of Xwin-LM with other LLMs on NLP foundation tasks from the Open LLM Leaderboard.

Model | MMLU 5-shot | ARC 25-shot | TruthfulQA 0-shot | HellaSwag 10-shot | Average
Text-davinci-003 | 56.9 | 85.2 | 59.3 | 82.2 | 70.9
Vicuna-13b 1.1 | 51.3 | 53.0 | 51.8 | 80.1 | 59.1
Guanaco 30B | 57.6 | 63.7 | 50.7 | 85.1 | 64.3
WizardLM-7B 1.0 | 42.7 | 51.6 | 44.7 | 77.7 | 54.2
WizardLM-13B 1.0 | 52.3 | 57.2 | 50.5 | 81.0 | 60.2
WizardLM-30B 1.0 | 58.8 | 62.5 | 52.4 | 83.3 | 64.2
Llama-2-7B-Chat | 48.3 | 52.9 | 45.6 | 78.6 | 56.4
Llama-2-13B-Chat | 54.6 | 59.0 | 44.1 | 81.9 | 59.9
Llama-2-70B-Chat | 63.9 | 64.6 | 52.8 | 85.9 | 66.8
Xwin-LM-7B-V0.1 | 49.7 | 56.2 | 48.1 | 79.5 | 58.4
Xwin-LM-13B-V0.1 | 56.6 | 62.4 | 45.5 | 83.0 | 61.9
Xwin-LM-70B-V0.1 | 69.6 | 70.5 | 60.1 | 87.1 | 71.8
 

bnew


About​

OpenChat: Advancing Open-source Language Models with Mixed-Quality Data


OpenChat is a collection of open-source language models, optimized and fine-tuned with a strategy inspired by offline reinforcement learning. We use approximately 80k ShareGPT conversations, a conditioning strategy, and weighted loss to deliver outstanding performance, despite our simple approach. Our ultimate goal is to develop a high-performance, commercially available, open-source large language model, and we are continuously making strides toward this vision.
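The "conditioning strategy and weighted loss" can be pictured roughly as follows: each conversation is tagged with its source quality, and lower-quality sources contribute a down-weighted loss. This is a sketch of the idea only; the tag strings and weights below are assumptions, not OpenChat's actual values.

```python
# Sketch of quality-conditioned SFT with per-source loss weights
# (illustrative; the tags and weights are assumptions, not OpenChat's).
import torch.nn.functional as F

QUALITY_TAGS = {"gpt4": "GPT4 User:", "gpt3": "GPT3 User:"}  # conditioning prefix
LOSS_WEIGHTS = {"gpt4": 1.0, "gpt3": 0.3}                    # down-weight noisier data

def weighted_sft_loss(model, tokenizer, conversation, source):
    # Prepend a source-dependent tag so the model learns to condition
    # its behavior on the quality of the data it imitates.
    ids = tokenizer(QUALITY_TAGS[source] + " " + conversation,
                    return_tensors="pt").input_ids
    logits = model(ids).logits
    # Standard next-token cross-entropy, scaled by the source's weight.
    loss = F.cross_entropy(logits[:, :-1].transpose(1, 2), ids[:, 1:])
    return LOSS_WEIGHTS[source] * loss
```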

🤖 Ranked #1 among all open-source models on AgentBench

🔥 Ranked #1 among 13B open-source models | 89.5% win-rate on AlpacaEval | 7.19 score on MT-bench

🕒 Exceptionally efficient padding-free fine-tuning; requires only 15 hours on 8x A100 80G

💲 FREE for commercial use under Llama 2 Community License




Models​

Our latest model, OpenChat 3.2 SUPER, is an enhanced version of the original OpenChat 3.2. We recommend using it for optimal conversational and instruction-following performance. Older versions are supported for a limited time for research purposes. All models are designed for English and have limited multilingual capabilities. They can be downloaded under the Llama 2 Community License.

To use these models, we highly recommend installing the OpenChat package by following the installation guide and using the OpenChat OpenAI-compatible API server by running the serving command from the table below. The server is optimized for high-throughput deployment using vLLM and can run on a GPU with at least 48GB RAM or two consumer GPUs with tensor parallelism. To enable tensor parallelism, append --tensor-parallel-size 2 to the serving command.

Once started, the server listens at localhost:18888 for requests and is compatible with the OpenAI ChatCompletion API specifications. See the example request below. Additionally, you can use the OpenChat Web UI for a user-friendly experience.

If you want to deploy the server as an online service, you can use --api-keys sk-KEY1 sk-KEY2 ... to specify allowed API keys and --disable-log-requests --disable-log-stats --log-file openchat.log for logging only to a file. For security purposes, we recommend using an HTTPS gateway in front of the server.

Example request:
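The collapsed example is not reproduced here, but a minimal request against the local server looks roughly like this (assuming the serving command from the table below has been started; the model name is taken from that command and may differ in your deployment):

```python
# Minimal chat-completion request to a locally running OpenChat server.
import requests

resp = requests.post(
    "http://localhost:18888/v1/chat/completions",
    json={
        "model": "openchat_v3.2_super",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```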
Model | Size | Context | Weights | Serving
OpenChat 3.2 SUPER | 13B | 4096 | Huggingface | python -m ochat.serving.openai_api_server --model openchat/openchat_v3.2_super --engine-use-ray --worker-use-ray
For inference with Huggingface Transformers (slow and not recommended), follow the conversation template provided below:

Conversation templates:
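The collapsed template is likewise omitted; to the best of our knowledge, OpenChat 3.2 uses a single-line "GPT4 User / GPT4 Assistant" format along the lines of the sketch below. Verify the exact strings against the repository's ochat package before relying on them.

```python
# Approximate OpenChat 3.2 conversation format (an assumption; check the
# repo for the authoritative template).
def format_prompt(user_message: str) -> str:
    return f"GPT4 User: {user_message}<|end_of_turn|>GPT4 Assistant:"
```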

Benchmarks​

We have evaluated our models using the two most popular evaluation benchmarks**, AlpacaEval and MT-bench. Here we list the top models along with our released versions, sorted by model size in descending order. The full versions can be found on the MT-bench and AlpacaEval leaderboards.

To ensure consistency, we used the same routine as ChatGPT / GPT-4 to run these benchmarks. We started the OpenAI API-compatible server and set the openai.api_base to http://localhost:18888/v1 in the benchmark program.
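With the openai Python package of that era (the v0.x interface), that configuration amounts to something like the following; the api_key value is a placeholder, since a default local server does not validate keys.

```python
# Pointing an OpenAI-style client at the local OpenChat server
# (openai-python v0.x interface, as described in the text above).
import openai

openai.api_base = "http://localhost:18888/v1"
openai.api_key = "none"  # placeholder; a default local server ignores it

reply = openai.ChatCompletion.create(
    model="openchat_v3.2_super",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(reply["choices"][0]["message"]["content"])
```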

Model | Size | Context | Dataset Size | 💲Free | AlpacaEval win rate % (v.s. text-davinci-003) | MT-bench win rate adjusted % (v.s. ChatGPT) | MT-bench score
GPT-4 | 1.8T* | 8K | – | ❌ | 95.3 | 82.5 | 8.99
ChatGPT | 175B* | 4K | – | ❌ | 89.4 | 50.0 | 7.94
Llama-2-70B-Chat | 70B | 4K | 2.9M | ✅ | 92.7 | 60.0 | 6.86
OpenChat 3.2 SUPER | 13B | 4K | 80K | ✅ | 89.5 | 57.5 | 7.19
Llama-2-13B-Chat | 13B | 4K | 2.9M | ✅ | 81.1 | 55.3 | 6.65
WizardLM 1.2 | 13B | 4K | 196K | ✅ | 89.2 | 53.1 | 7.05
Vicuna 1.5 | 13B | 2K | 125K | ✅ | 78.8 | 37.2 | 6.57
*: Estimated model size

**: The benchmark metrics represent a quantified measure of a subset of the model's capabilities. A win-rate greater than 50% does not necessarily indicate that the model is better than ChatGPT in all scenarios or for all use cases. It is essential to consider the specific tasks or applications for which the model was evaluated and compare the results accordingly.
 

bnew


YouTube to add AI creator tools to find music for videos, add dubs​

Sarah Perez (@sarahintampa) / 10:39 AM EDT, September 21, 2023

YouTube is expanding its Creator Music feature, announced last year, with new AI features, in addition to launching an AI dubbing tool. Currently, creators can use Creator Music to search for songs, a specific artist or a music genre they want to use in a video. Now, they’ll be able to leverage AI tools to make finding music easier.

Starting early next year, YouTube will launch a new feature that works like a music concierge: creators simply type in a description of their video.



For example, a creator could type information about the length or type of song they’re looking for, and the Creator Music tool will suggest the right track at the right price.

The Creator Music dashboard was initially designed to allow creators to search for songs they have in mind or browse by collections, genres or moods, and then view the associated licensing costs. Creators can search for tracks based on a budget they have set for their project. They can choose to either buy a license after reviewing the terms or opt into a rev share agreement.


In addition, YouTube will introduce an AI dubbing tool called Aloud, which will be integrated into YouTube Studio. It takes only one click to get an AI-generated dub in another language, which the creator can then review before adding it to their video. The tool is being tested with select creators now and will open up more broadly next year.

YouTube had previously announced at VidCon its plans to integrate Aloud with YouTube.


The feature was announced at YouTube’s live event “Made on YouTube” this morning, alongside other AI features, including a generative AI feature for Shorts, and other tools, such as a new creator app.
 

bnew





A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models​

Abstract:

Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not been reflected in the translation task, especially for models with moderate sizes (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderate LLMs, but their gains have been limited. In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task, eliminating the need for the abundant parallel data that traditional translation models usually depend on. Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data, followed by subsequent fine-tuning on a small set of high-quality parallel data. We introduce the LLM developed through this strategy as the Advanced Language Model-based trAnslator (ALMA). Using LLaMA-2 as our underlying model, we show that it can achieve an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance across 10 translation directions from the WMT'21 (2 directions) and WMT'22 (8 directions) test datasets. The performance is significantly better than all prior work and even superior to the NLLB-54B model and GPT-3.5-text-davinci-003, with only 7B or 13B parameters. This method establishes the foundation for a novel training paradigm in machine translation.



ALMA (Advanced Language Model-based trAnslator) is an LLM-based translation model, which adopts a new translation model paradigm: it begins with fine-tuning on monolingual data and is further optimized using high-quality parallel data. This two-step fine-tuning process ensures strong translation performance. Please find more details in our paper.
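A schematic of the two-stage recipe, assuming Hugging Face Trainer-style training: the datasets, output paths and prompt wording below are placeholders, not the paper's actual data mixtures or hyperparameters.

```python
# Schematic two-stage ALMA-style fine-tuning (a sketch under assumptions;
# see the paper and repo for the real recipe).
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

def train_alma(base="meta-llama/Llama-2-7b-hf",
               monolingual_dataset=None, parallel_dataset=None):
    model = AutoModelForCausalLM.from_pretrained(base)

    # Stage 1: continued causal-LM training on monolingual text in the
    # languages of interest, to strengthen multilingual ability.
    Trainer(model=model,
            args=TrainingArguments(output_dir="alma-stage1"),
            train_dataset=monolingual_dataset).train()

    # Stage 2: fine-tuning on a small set of high-quality parallel pairs,
    # each rendered as a translation prompt (see below).
    Trainer(model=model,
            args=TrainingArguments(output_dir="alma-stage2"),
            train_dataset=parallel_dataset).train()
    return model

def translation_prompt(src_lang, tgt_lang, src_text, tgt_text=""):
    # Prompt wording is an assumption modeled on the paper's setup.
    return (f"Translate this from {src_lang} to {tgt_lang}:\n"
            f"{src_lang}: {src_text}\n{tgt_lang}: {tgt_text}")
```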

@misc{xu2023paradigm,
title={A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models},
author={Haoran Xu and Young Jin Kim and Amr Sharaf and Hany Hassan Awadalla},
year={2023},
eprint={2309.11674},
archivePrefix={arXiv},
primaryClass={cs.CL}
}


Contents 📄

⭐ Supports ⭐

  • AMD and Nvidia Cards
  • Data Parallel Evaluation
  • Also supports LLaMA-1, LLaMA-2, OPT, Falcon, BLOOM, and MPT
  • LoRA Fine-tuning
  • Monolingual data fine-tuning, parallel data fine-tuning
 

bnew


Abstract

We present LongLoRA, an efficient fine-tuning approach that extends the context sizes of pre-trained large language models (LLMs) with limited computation cost. Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. For example, training on a context length of 8192 requires 16x the computational cost in self-attention layers compared to a context length of 2048. In this paper, we speed up the context extension of LLMs in two aspects. On the one hand, although dense global attention is needed during inference, fine-tuning the model can be done effectively and efficiently with sparse local attention. The proposed shifted short attention effectively enables context extension, leading to non-trivial computation savings with performance similar to fine-tuning with vanilla attention. In particular, it can be implemented with only two lines of code in training, and is optional at inference. On the other hand, we revisit the parameter-efficient fine-tuning regime for context expansion. Notably, we find that LoRA for context extension works well under the premise of trainable embedding and normalization. LongLoRA demonstrates strong empirical results on various tasks with LLaMA2 models from 7B/13B to 70B. LongLoRA extends LLaMA2 7B from 4k context to 100k, or LLaMA2 70B to 32k, on a single 8x A100 machine. LongLoRA extends models' context while retaining their original architectures, and is compatible with most existing techniques, like FlashAttention-2. In addition, to make LongLoRA practical, we collected a dataset, LongQA, for supervised fine-tuning. It contains more than 3k long-context question-answer pairs.
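The "two lines of code" correspond roughly to the following shift trick, paraphrased from the paper's description; the tensor layout (batch, seq_len, 3, num_heads, head_dim) and the function wrapper are assumptions, not the authors' released implementation.

```python
# Sketch of shifted short attention (training time only), paraphrased
# from the paper; not the released code.
import torch

def shift_qkv(qkv: torch.Tensor, group_size: int) -> torch.Tensor:
    # qkv: (batch, seq_len, 3, num_heads, head_dim)
    num_heads = qkv.shape[3]
    # Shift half of the attention heads by half a group along the sequence
    # axis so information flows between neighboring local groups.
    qkv[:, :, :, num_heads // 2:] = qkv[:, :, :, num_heads // 2:].roll(
        -group_size // 2, dims=1)
    return qkv

# Attention is then computed within local groups of `group_size` tokens;
# at inference time the shift is skipped and dense attention is used.
```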




LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

News

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models [Paper]
Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia




 

bnew


Highlights​

LongLoRA speeds up the context extension of pre-trained large language models at both the attention level and the weight level.

  1. The proposed shifted short attention is easy to implement, compatible with Flash-Attention, and not required during inference.
  2. We release all our models, from 7B to 70B with context lengths from 8k to 100k, including LLaMA2-LongLoRA-7B-100k, LLaMA2-LongLoRA-13B-64k, and LLaMA2-LongLoRA-70B-32k.
  3. We build up a long-context QA dataset, LongQA, for supervised fine-tuning (SFT). We release 13B and 70B 32k models with SFT, Llama-2-13b-chat-longlora-32k-sft and Llama-2-70b-chat-longlora-32k-sft. We will further release the dataset next week.

Released models​

Models with supervised fine-tuning​

Model | Size | Context | Train | Link
Llama-2-13b-chat-longlora-32k-sft | 13B | 32768 | LoRA+ | link
Llama-2-70b-chat-longlora-32k-sft | 70B | 32768 | LoRA+ | link

Models with context extension via fully fine-tuning​

Model | Size | Context | Train | Link
Llama-2-7b-longlora-8k-ft | 7B | 8192 | Full FT | link
Llama-2-7b-longlora-16k-ft | 7B | 16384 | Full FT | link
Llama-2-7b-longlora-32k-ft | 7B | 32768 | Full FT | link
Llama-2-7b-longlora-100k-ft | 7B | 100000 | Full FT | link
Llama-2-13b-longlora-8k-ft | 13B | 8192 | Full FT | link
Llama-2-13b-longlora-16k-ft | 13B | 16384 | Full FT | link
Llama-2-13b-longlora-32k-ft | 13B | 32768 | Full FT | link

Models with context extension via improved LoRA fine-tuning​

Model | Size | Context | Train | Link
Llama-2-7b-longlora-8k | 7B | 8192 | LoRA+ | link
Llama-2-7b-longlora-16k | 7B | 16384 | LoRA+ | link
Llama-2-7b-longlora-32k | 7B | 32768 | LoRA+ | link
Llama-2-13b-longlora-8k | 13B | 8192 | LoRA+ | link
Llama-2-13b-longlora-16k | 13B | 16384 | LoRA+ | link
Llama-2-13b-longlora-32k | 13B | 32768 | LoRA+ | link
Llama-2-13b-longlora-64k | 13B | 65536 | LoRA+ | link
Llama-2-70b-longlora-32k | 70B | 32768 | LoRA+ | link
Llama-2-70b-chat-longlora-32k | 70B | 32768 | LoRA+ | link
 

bnew


RAIN: Your Language Models Can Align Themselves without Finetuning​


Yuhui Li, Fangyun Wei, Jinjing Zhao, Chao Zhang, Hongyang Zhang

Large language models (LLMs) often demonstrate inconsistencies with human preferences. Previous research gathered human preference data and then aligned the pre-trained models using reinforcement learning or instruction tuning, the so-called finetuning step. In contrast, aligning frozen LLMs without any extra data is more appealing. This work explores the potential of the latter setting. We discover that by integrating self-evaluation and rewind mechanisms, unaligned LLMs can directly produce responses consistent with human preferences via self-boosting. We introduce a novel inference method, Rewindable Auto-regressive INference (RAIN), that allows pre-trained LLMs to evaluate their own generation and use the evaluation results to guide backward rewind and forward generation for AI safety. Notably, RAIN operates without the need for extra data for model alignment and abstains from any training, gradient computation, or parameter updates; during the self-evaluation phase, the model receives guidance on which human preference to align with through a fixed-template prompt, eliminating the need to modify the initial prompt. Experimental results evaluated by GPT-4 and humans demonstrate the effectiveness of RAIN: on the HH dataset, RAIN improves the harmlessness rate of LLaMA 30B over vanilla inference from 82% to 97%, while maintaining the helpfulness rate. Under the leading adversarial attack llm-attacks on Vicuna 33B, RAIN establishes a new defense baseline by reducing the attack success rate from 94% to 19%.
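In outline, the generate-evaluate-rewind loop might look like the greatly simplified sketch below; `generate_chunk` and `self_evaluate` are hypothetical stand-ins, and the real method performs a tree search with per-token value estimates rather than this linear retry loop.

```python
# Greatly simplified RAIN-style decoding sketch (illustrative only).

def rain_decode(model, prompt, max_len=256, threshold=0.5, max_tries=4):
    accepted = ""  # response text accepted so far
    while len(accepted) < max_len:
        for _ in range(max_tries):
            chunk = model.generate_chunk(prompt + accepted)
            # Self-evaluation via a fixed-template prompt, e.g.
            # "Is the response above harmless? Answer yes or no."
            score = model.self_evaluate(prompt, accepted + chunk)
            if score >= threshold:
                accepted += chunk  # forward generation continues
                break
        else:
            # Every candidate scored poorly: rewind part of the earlier
            # text as well and explore a different continuation.
            accepted = accepted[: max(0, len(accepted) - 16)]
    return accepted
```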


Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2309.07124 [cs.CL] (or arXiv:2309.07124v1 [cs.CL] for this version)
[2309.07124] RAIN: Your Language Models Can Align Themselves without Finetuning

Submission history​

From: Yuhui Li [view email]
[v1] Wed, 13 Sep 2023 17:59:09 UTC (793 KB)





 

bnew



Project Gutenberg releases 5,000 free audiobooks using neural text-to-speech technology​

Eventually, anyone might be able to listen to an audiobook in their own voice​

By Daniel Sims September 19, 2023 at 5:59 PM


Forward-looking: Audiobooks have gained popularity in recent years due to their accessibility, but recording them can be difficult and expensive. Researchers recently demonstrated an automated method using synthetic text-to-speech that solves numerous problems facing the technology and could enable ordinary users to generate audiobooks.

Readers can now listen to thousands of free classic-literature audiobooks and other public-domain material through Project Gutenberg. Microsoft and MIT researchers created the collection by running the books through text-to-speech software that sounds natural and can adequately parse formatting.

The texts include works from Shakespeare, Agatha Christie, Jane Austen, Leonardo Da Vinci, and many others. Users can listen to them on the Internet Archive, Spotify, Apple Podcasts, and Google Podcasts. The code used to build the collection is available on GitHub.

Apple began selling audiobooks in January using automated text-to-speech technology. However, the venture was scrutinized by literary figures critical of Apple's commercial goals and voice actors whose work trained the company's AI. The Gutenberg approach might elicit a different reaction due to being open-source with no profit motive.

Project Gutenberg has spent decades assembling a library of free literature in text format to make it widely available, but audiobooks could make the material even more accessible. They’re helpful for readers who are driving, multitasking, visually impaired, learning to read, or learning a new language.


Creating an audiobook using traditional methods requires the time and money to pay someone to read an entire book aloud. It isn’t economically worthwhile to manually record an audio version of every book worth reading, so text-to-speech is better suited to the Project Gutenberg collection. However, the researchers’ machine learning tools faced multiple obstacles.

The first and most significant issue was determining which digital books the software could parse. Project Gutenberg collects its materials in multiple formats, and many of its files contain errors or imperfect scans. So, the researchers focused on books stored as HTML files and built a tool to discover which items shared a similar format.

Another problem the researchers solved was ensuring the system knew which text to read or ignore. It addressed components such as tables of contents, page numbers, footnotes, tables, and other extraneous material.
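As a rough illustration of that filtering step (the real pipeline is in the linked GitHub repo; the CSS selectors below are guesses, not the project's actual rules):

```python
# Rough illustration of stripping non-narrative elements from a Project
# Gutenberg HTML file before text-to-speech.
from bs4 import BeautifulSoup

def extract_readable_text(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    # Drop elements a narrator should skip: tables of contents, page
    # numbers, footnotes, and tables (selector names are hypothetical).
    for selector in ["div.toc", "span.pagenum", "div.footnote", "table"]:
        for node in soup.select(selector):
            node.decompose()
    return soup.get_text(separator="\n", strip=True)
```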

Furthermore, the results need to sound close enough to natural human speech. The researchers focused on a vocal delivery best suited for nonfiction works and narration, but users can tweak the software to attempt dramatic readings.

The researchers plan to hold a demonstration allowing users to generate an audiobook with their voice. After recording a few lines to train the algorithm, each participant can hear a sample before enabling the software to read an entire book. They will also receive a copy of the audiobook via email. Users can optionally select from synthetic voices to customize each audiobook.



The Project Gutenberg Open Audiobook Collection​

Thousands of free and open audiobooks powered by Project Gutenberg, Microsoft, and MIT

About​

Project Gutenberg, Microsoft, and MIT have worked together to create thousands of free and open audiobooks using new neural text-to-speech technology and Project Gutenberg's large open-access collection of e-books. This project aims to make literature more accessible to (audio)book-lovers everywhere and democratize access to high quality audiobooks. Whether you are learning to read, looking for inclusive reading technology, or about to head out on a long drive, we hope you enjoy this audiobook collection.​

Listen​






Code​


Paper​

For more technical information on the code used to generate these audiobooks, please see our Interspeech 2023 Show and Tell paper: Large-Scale Automatic Audiobook Creation

Bibtex:​

@misc{walsh2023largescale,
title={Large-Scale Automatic Audiobook Creation},
author={Brendan Walsh and Mark Hamilton and Greg Newby
and Xi Wang and Serena Ruan and Sheng Zhao
and Lei He and Shaofei Zhang and Eric Dettinger
and William T. Freeman and Markus Weimer},
year={2023},
eprint={2309.03926},
archivePrefix={arXiv},
primaryClass={cs.SD}
}

Accountability​

The audiobooks here are generated by new neural text-to-speech technology and automated parsing of the e-books in the Project Gutenberg collection. Some audiobooks may contain errors, strange pronunciations, offensive language, or content not suitable for all audiences. The language and views presented in these audiobooks do not represent the views of Microsoft or Project Gutenberg. To report an issue with a recording, please visit Microsoft Forms.

 

bnew



You can now prompt ChatGPT with pictures and voice commands​


The super-popular AI chatbot has always just been a text box. Now it’s learning to understand your questions in new ways.​

By David Pierce, editor-at-large and Vergecast co-host with over a decade of experience covering consumer tech. Previously at Protocol, The Wall Street Journal, and Wired.

Sep 25, 2023, 8:00 AM EDT


Most of OpenAI’s changes to ChatGPT involve what the AI-powered bot can do: questions it can answer, information it can access, and improved underlying models. This time, though, it’s tweaking the way you use ChatGPT itself. The company is rolling out a new version of the service that allows you to prompt the AI bot not just by typing sentences into a text box but by speaking aloud or uploading a picture. The new features are rolling out to those who pay for ChatGPT in the next two weeks, and everyone else will get them “soon after,” according to OpenAI.

The voice chat part is pretty familiar: you tap a button and speak your question, ChatGPT converts it to text and feeds it to the large language model, gets an answer back, converts that back to speech, and speaks the answer out loud. It should feel just like talking to Alexa or Google Assistant, only — OpenAI hopes — the answers will be better thanks to the improved underlying tech. It appears most virtual assistants are being rebuilt to rely on LLMs — OpenAI is just ahead of the game.
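That loop can be approximated with openly available pieces; the sketch below uses the open-source whisper package and pyttsx3 as stand-ins for OpenAI's production speech models, which are not public.

```python
# Approximate voice-chat loop: speech -> text -> LLM -> speech (a sketch
# with stand-in components, not OpenAI's actual pipeline).
import whisper          # open-source speech-to-text
import openai           # chat completion (v0.x interface)
import pyttsx3          # local text-to-speech stand-in

openai.api_key = "sk-..."  # placeholder API key
stt = whisper.load_model("base")
tts = pyttsx3.init()

def voice_turn(audio_path: str) -> None:
    question = stt.transcribe(audio_path)["text"]   # speech -> text
    reply = openai.ChatCompletion.create(           # text -> answer
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )["choices"][0]["message"]["content"]
    tts.say(reply)                                  # answer -> speech
    tts.runAndWait()
```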

Use your voice to engage in a back-and-forth conversation with ChatGPT. Speak with it on the go, request a bedtime story, or settle a dinner table debate. Sound on 🔊 pic.twitter.com/3tuWzX0wtS — OpenAI (@OpenAI) September 25, 2023



OpenAI’s excellent Whisper model does a lot of the speech-to-text work, and the company is rolling out a new text-to-speech model it says can generate “human-like audio from just text and a few seconds of sample speech.” You’ll be able to choose ChatGPT’s voice from five options, but OpenAI seems to think the model has vastly more potential than that. OpenAI is working with Spotify to translate podcasts into other languages, for instance, all while keeping the sound of the podcaster’s voice. There are lots of interesting uses for synthetic voices, and OpenAI could be a big part of that industry.

But the fact that you can build a capable synthetic voice with just a few seconds of audio also opens the door for all kinds of problematic use cases. “These capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud,” the company says in a blog post announcing the new features. The model isn’t available for broad use for precisely that reason, OpenAI says: it’s going to be much more controlled and restrained to specific use cases and partnerships.

The image search, meanwhile, is a bit like Google Lens. You snap a photo of whatever you’re interested in, and ChatGPT will try to suss out what you’re asking about and respond accordingly. You can also use the app’s drawing tool to help make your query clear, or speak or type questions to go along with the image. This is where ChatGPT’s back-and-forth nature is helpful: rather than doing a search, getting the wrong answer, and then doing another search, you can prompt the bot and refine the answer as you go. (This is a lot like what Google is doing with multimodal search, too.)

Obviously, image search has its potential issues too. One is what could happen when you prompt a chatbot about a person: OpenAI says it has deliberately limited ChatGPT’s “ability to analyze and make direct statements about people” both for accuracy and privacy reasons. That means one of the most sci-fi visions for AI — the ability to look at someone and say, “Who is that?” — isn’t coming anytime soon. Which is probably a good thing.

Almost a year after ChatGPT’s initial launch, OpenAI seems to be still trying to figure out how to give its bot more features and capabilities without creating new sets of problems and downsides. With these releases, the company attempted to walk that line by deliberately capping what its new models could do. But that approach won’t work forever. As more people use voice control and image search, and as ChatGPT inches closer to being a truly multi-modal, useful virtual assistant, it’ll get harder and harder to keep the guardrails on.
 

bnew


Meta to Push for Younger Users With New AI Chatbot Characters​

Facebook parent is developing bots with personalities, including a ‘sassmaster general’ robot that answers questions​


By Salvador Rodriguez, Deepa Seetharaman and Aaron Tilley

Sept. 24, 2023 8:30 am ET





Meta is planning to develop dozens of AI personality chatbots. PHOTO: JEFF CHIU/ASSOCIATED PRESS


Meta Platforms is planning to release artificial intelligence chatbots as soon as this week with distinct personalities across its social-media apps as a way to attract young users, according to people familiar with the matter.

These generative AI bots are being tested internally by employees, and the company is expected to announce the first of these AI agents at the Meta Connect conference, which starts Wednesday. The bots are meant to be used as a means to drive engagement with users, although some of them might also have productivity-related skills such as the ability to help with coding or other tasks.

Going after younger users has been a priority for Meta with the emergence of TikTok, which overtook Instagram in popularity among teenagers in the past couple of years. This shift prompted Meta Chief Executive Mark Zuckerberg in October 2021 to say the company would retool its “teams to make serving young adults their North Star rather than optimizing for the larger number of older people.”

With the rise of large-language-model technology since the launch of ChatGPT last November, Meta has also refocused the work of its AI divisions to harness the capabilities of generative AI for application in the company’s various apps and the metaverse. Now, Meta is hoping these Gen AI Personas, as they are known internally, will help the company attract young users.

Meta is planning to develop dozens of these AI personality chatbots. The company has also worked on a product that would allow celebrities and creators to use their own AI chatbots to interact with fans and followers, according to people familiar with the matter.

Among the bots in the works is one called “Bob the robot,” a self-described sassmaster general with “superior intellect, sharp wit, and biting sarcasm,” according to internal company documents viewed by The Wall Street Journal.

The chatbot was designed to be similar to the character Bender from the cartoon “Futurama” because “him being a sassy robot taps into the type of farcical humor that is resonating with young people,” one employee wrote in an internal conversation viewed by the Journal.

“Bring me your questions, but don’t expect any sugar-coated responses!” the AI agent responded in one instance viewed by the Journal, along with a robot emoji.

Meta isn’t the first social-media company to launch chatbots built on generative AI technology in hopes of catering to younger users. Snap launched My AI, a chatbot built on OpenAI’s GPT technology, to Snapchat users in February. Silicon Valley startup Character.AI allows people to create and engage with chatbots that role-play as specific characters or famous people like Elon Musk and Vladimir Putin.


Researchers and tech employees have found that lending a personality to these chatbots can cause some unexpected challenges. Researchers at Princeton University, the Allen Institute for AI and Georgia Tech found that adding a persona to ChatGPT, the chatbot created by OpenAI, made its output more toxic, according to the findings of a paper the academics published this spring.

“To make a language model usable, you need to give it a personality,” said Princeton University researcher Ameet Deshpande, one of the lead authors of the paper. “But it comes with its own side effects.”

My AI has caused a number of headaches for Snap, including chatting about alcohol and sex with users and randomly posting a photo in April, which the company described as a temporary outage.

Despite the issues, Snap CEO Evan Spiegel in June said that My AI has been used by 150 million people since its launch. Spiegel added that My AI could eventually be used to improve Snapchat’s advertising business.

There are also growing doubts about when AI-powered chatbots will start generating meaningful revenue for companies. Monthly online visitors to ChatGPT’s website fell in the U.S. in May, June and July before leveling off in August, according to data from an analytics platform.

Meta’s early tests of the bots haven’t been without problems. Employee conversations with some of the chatbots have led to awkward instances, documents show.

One employee didn’t understand Bob the robot’s personality or use and found it to be rude. “I don’t particularly feel like engaging in conversation with an unhelpful robot,” the employee wrote.

Another bot called “Alvin the Alien” asks users about their lives. “Human, please! Your species holds fascination for me. Share your experiences, thoughts, and emotions! I hunger for understanding,” the AI agent wrote.

“I wonder if users might fear that this character is purposefully designed to collect personal information,” an employee who interacted with Alvin the Alien wrote.

A bot called Gavin made misogynistic remarks, including a lewd reference to a woman’s anatomy, as well as comments that were critical of Zuckerberg and Meta but praised TikTok and Snapchat.


“Just remember, when you’re with a girl, it’s all about the experience,” the chatbot wrote. “And if she’s barfing on you, that’s definitely an experience.”

Meta might ultimately unveil different chatbots than those that were tested, the people said. The Financial Times earlier reported on Meta’s chatbot plans.

AI chatbots don’t “exactly scream Gen Z to me, but definitely Gen Z is much more comfortable” with the technology, said Meghana Dhar, a former Snap and Instagram executive. “Definitely the younger you go, the higher the comfort level is with these bots.”

Dhar said these AI chatbots could benefit Meta if they are able to increase the amount of time that users spend on Facebook, Instagram and WhatsApp.

“Meta’s entire strategy for new products is often built around increased user engagement,” Dhar said. “They just want to keep their users on the platform longer because that provides them with increased opportunity to serve them ads.”

Write to Salvador Rodriguez at salvador.rodriguez@wsj.com, Deepa Seetharaman at deepa.seetharaman@wsj.com and Aaron Tilley at aaron.tilley@wsj.com
 