



Introduction

Qwen2-Audio is the new series of Qwen large audio-language models. Qwen2-Audio can accept a variety of audio signal inputs and either perform audio analysis or respond directly in text to spoken instructions. We introduce two distinct audio interaction modes:


  • Voice chat: users can freely engage in voice interactions with Qwen2-Audio without any text input.

  • Audio analysis: users can provide both audio and text instructions for analysis during the interaction.

We release Qwen2-Audio-7B and Qwen2-Audio-7B-Instruct, which are the pretrained base model and the chat (instruction-tuned) model, respectively.

For more details, please refer to our Blog, GitHub, and Report.



Requirements

The code for Qwen2-Audio is included in the latest Hugging Face transformers. We advise you to build from source with the command pip install git+https://github.com/huggingface/transformers, or you might encounter the following error:

KeyError: 'qwen2-audio'



Quickstart

In the following, we demonstrate how to use Qwen2-Audio-7B-Instruct for inference, supporting both voice chat and audio analysis modes. Note that we use the ChatML format for dialogue; in this demo we show how to leverage apply_chat_template for this purpose.

Voice Chat Inference​


In the voice chat mode, users can freely engage in voice interactions with Qwen2-Audio without text input:
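Below is a minimal sketch of voice chat inference that follows the usage pattern published on the Qwen2-Audio-7B-Instruct model card; the audio file path is a placeholder, and exact argument names should be double-checked against the official card if anything differs.

import librosa
from transformers import Qwen2AudioForConditionalGeneration, AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-Audio-7B-Instruct")
model = Qwen2AudioForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-Audio-7B-Instruct", device_map="auto"
)

# Voice chat mode: the user turn contains only audio, no text.
conversation = [
    {"role": "user", "content": [{"type": "audio", "audio_url": "my_question.wav"}]},  # placeholder path
]

text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)

# Load each referenced audio clip at the sampling rate the processor expects.
audios = [
    librosa.load(ele["audio_url"], sr=processor.feature_extractor.sampling_rate)[0]
    for message in conversation
    for ele in message["content"]
    if ele["type"] == "audio"
]

inputs = processor(text=text, audios=audios, return_tensors="pt", padding=True).to(model.device)
generated = model.generate(**inputs, max_new_tokens=256)
generated = generated[:, inputs["input_ids"].size(1):]  # drop the prompt tokens
print(processor.batch_decode(generated, skip_special_tokens=True)[0])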
 



OpenAI launches experimental GPT-4o Long Output model with 16X token capacity​


Carl Franzen@carlfranzen

July 30, 2024 4:07 PM




Credit: VentureBeat made with OpenAI DALL-E 3 via ChatGPT



OpenAI is reportedly eyeing a cash crunch, but that isn’t stopping the preeminent generative AI company from continuing to release a steady stream of new models and updates.

Yesterday, the company quietly posted a webpage announcing a new large language model (LLM): GPT-4o Long Output, which is a variation on its signature GPT-4o model from May, but with a massively extended output size: up to 64,000 tokens of output instead of GPT-4o’s initial 4,000 — a 16-fold increase.

Tokens, as you may recall, are the numerical representations an LLM works with behind the scenes — chunks of text such as words, word fragments, and combinations of letters and numbers, grouped according to their meaning.

The word “Hello” is one token, for example, but so too is “hi.” You can see an interactive demo of tokens in action via OpenAI’s Tokenizer here. Machine learning researcher Simon Willison also has a great interactive token encoder/decoder.
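For a rough sense of how text maps to tokens, here is a small sketch using OpenAI's open-source tiktoken library (assuming a recent version that knows the GPT-4o encoding):

import tiktoken

# Resolve the tokenizer used by the GPT-4o family of models.
enc = tiktoken.encoding_for_model("gpt-4o")

for text in ["Hello", "hi", "Tokens are the units LLMs actually read."]:
    token_ids = enc.encode(text)
    print(f"{text!r} -> {len(token_ids)} token(s): {token_ids}")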

By offering a 16X increase in token outputs with the new GPT-4o Long Output variant, OpenAI is now giving users — and more specifically, third-party developers building atop its application programming interface (API) — the opportunity to have the chatbot return far longer responses, up to about a 200-page novel in length.


Why is OpenAI launching a longer output model?​


OpenAI’s decision to introduce this extended output capability stems from customer feedback indicating a need for longer output contexts.

An OpenAI spokesperson explained to VentureBeat: “We heard feedback from our customers that they’d like a longer output context. We are always testing new ways we can best serve our customers’ needs.”

The alpha testing phase is expected to last for a few weeks, allowing OpenAI to gather data on how effectively the extended output meets user needs.

This enhanced capability is particularly advantageous for applications requiring detailed and extensive output, such as code editing and writing improvement.

By offering more extended outputs, the GPT-4o model can provide more comprehensive and nuanced responses, which can significantly benefit these use cases.


Distinction between context and output​


Since launch, GPT-4o has offered a maximum 128,000-token context window — the number of tokens the model can handle in any one interaction, including both input and output tokens.

For GPT-4o Long Output, this maximum context window remains at 128,000.

So how is OpenAI able to increase the number of output tokens 16-fold from 4,000 to 64,000 tokens while keeping the overall context window at 128,000?

It all comes down to some simple math: even though the original GPT-4o from May had a total context window of 128,000 tokens, its single output message was limited to 4,000.

Similarly, for the newer GPT-4o mini, the total context window is 128,000 tokens, but the maximum output has been raised to 16,000 tokens.

That means for GPT-4o, the user can provide up to 124,000 tokens as an input and receive up to 4,000 maximum output from the model in a single interaction. They can also provide more tokens as input but receive fewer as output, while still adding up to 128,000 total tokens.

For GPT-4o mini, the user can provide up to 112,000 tokens as an input in order to get a maximum output of 16,000 tokens back.

For GPT-4o Long Output, the total context window is still capped at 128,000. Yet, now, the user can provide up to 64,000 tokens worth of input in exchange for a maximum of 64,000 tokens back out — that is, if the user or developer of an application built atop it wants to prioritize longer LLM responses while limiting the inputs.

In all cases, the user or developer must make a choice or trade-off: do they want to sacrifice some input tokens in favor of longer outputs while still remaining at 128,000 tokens total? For users who want longer answers, the GPT-4o Long Output now offers this as an option.
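The trade-off is plain arithmetic. A tiny sketch, using the limits quoted in this article, shows the input budget left once the maximum output is reserved:

CONTEXT_WINDOW = 128_000  # total tokens per interaction, input + output

def max_input_tokens(max_output_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Input budget remaining after reserving room for the largest allowed output."""
    return context_window - max_output_tokens

print(max_input_tokens(4_000))   # GPT-4o:             124,000
print(max_input_tokens(16_000))  # GPT-4o mini:        112,000
print(max_input_tokens(64_000))  # GPT-4o Long Output:  64,000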


Priced aggressively and affordably​


The new GPT-4o Long Output model is priced as follows:


  • $6 USD per 1 million input tokens

  • $18 per 1 million output tokens

Compare that to the regular GPT-4o pricing which is $5 per million input tokens and $15 per million output, or even the new GPT-4o mini at $0.15 per million input and $0.60 per million output, and you can see it is priced rather aggressively, continuing OpenAI’s recent refrain that it wants to make powerful AI affordable and accessible to wide swaths of the developer userbase.
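As a back-of-the-envelope illustration using the listed prices, a single maximal 64,000-in / 64,000-out call would cost roughly:

input_tokens, output_tokens = 64_000, 64_000
cost = (input_tokens / 1_000_000) * 6 + (output_tokens / 1_000_000) * 18
print(f"${cost:.2f}")  # about $1.54 for a fully loaded GPT-4o Long Output call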

Currently, access to this experimental model is limited to a small group of trusted partners. The spokesperson added, “We’re conducting alpha testing for a few weeks with a small number of trusted partners to see if longer outputs help their use cases.”

Depending on the outcomes of this testing phase, OpenAI may consider expanding access to a broader customer base.


Future prospects​


The ongoing alpha test will provide valuable insights into the practical applications and potential benefits of the extended output model.

If the feedback from the initial group of partners is positive, OpenAI may consider making this capability more widely available, enabling a broader range of users to benefit from the enhanced output capabilities.

Clearly, with the GPT-4o Long Output model, OpenAI hopes to address an even wider range of customer requests and power applications requiring detailed responses.
 



LG unleashes South Korea’s first open-source AI, challenging global tech giants​


Michael Nuñez@MichaelFNunez

August 8, 2024 9:45 AM



Credit: VentureBeat made with Midjourney




LG AI Research has launched Exaone 3.0, South Korea’s first open-source artificial intelligence model, marking the country’s entry into the competitive global AI landscape dominated by U.S. tech giants and emerging players from China and the Middle East.

The 7.8 billion parameter model, which excels in both Korean and English language tasks, aims to accelerate AI research and contribute to building a robust AI ecosystem in Korea. This move signals a strategic shift for LG, traditionally known for its consumer electronics, as it positions itself at the forefront of AI innovation. By open-sourcing Exaone 3.0, LG is not only showcasing its technological prowess but also potentially laying the groundwork for a new revenue stream in cloud computing and AI services.


Exaone 3.0 Faces Off Against Chinese and Middle Eastern AI Powerhouses​


Exaone 3.0 joins a crowded field of open-source AI models, including China’s Qwen from Alibaba and the UAE’s Falcon. Qwen, which received a major update in June, has gained significant traction with over 90,000 enterprise clients and has topped performance rankings on platforms like Hugging Face, surpassing Meta’s Llama 3.1 and Microsoft’s Phi-3.

Similarly, the UAE’s Technology Innovation Institute released Falcon 2, an 11 billion parameter model in May, claiming it outperforms Meta’s Llama 3 on several benchmarks. These developments highlight the intensifying global competition in AI, with countries beyond the U.S. making significant strides. The emergence of these models from Asia and the Middle East underscores a shift in the AI landscape, challenging the notion of Western dominance in the field.


Open-source strategy: LG’s gambit to boost cloud computing and AI innovation​


LG’s approach mirrors that of Chinese companies like Alibaba, which are using open-source AI as a strategy to grow cloud businesses and accelerate commercialization. This strategy serves a dual purpose: it allows LG to rapidly iterate and improve its AI models through community contributions while also creating a potential customer base for its cloud services. By offering a powerful, open-source model, LG could attract developers and enterprises to build applications on its platform, thereby driving adoption of its broader AI and cloud infrastructure.

Exaone 3.0 boasts improved efficiency, with LG claiming a 56% reduction in inference time, 35% decrease in memory usage, and 72% reduction in operational costs compared to its predecessor. These improvements are crucial in the competitive AI landscape, where efficiency can translate directly into cost savings for enterprises and improved user experiences for consumers. The model has been trained on 60 million cases of professional data related to patents, codes, math, and chemistry, with plans to expand to 100 million cases across various fields by year-end, indicating LG’s commitment to creating a versatile and robust AI system.


Exaone 3.0: South Korea’s open-source AI leap into global competition​


LG’s move into open-source AI could potentially reshape the AI landscape, offering an alternative to the dominance of deep-pocketed players like OpenAI, Microsoft, and Google. It also demonstrates South Korea’s capability to create state-of-the-art AI models that can compete on a global scale. This development is particularly significant for South Korea, a country known for its technological innovation but which has, until now, been relatively quiet in the open-source AI arena.

The success of Exaone 3.0 could have far-reaching implications. For LG, it could mark a successful diversification into AI and cloud services, potentially opening up new revenue streams. For South Korea, it represents a bold step onto the global AI stage, potentially attracting international talent and investment. On a broader scale, the proliferation of open-source models like Exaone 3.0 could democratize access to advanced AI technologies, fostering innovation across industries and geographies.

As the AI race intensifies, the true measure of Exaone 3.0’s impact will lie not just in its technical specifications, but in its ability to catalyze a thriving ecosystem of developers, researchers, and businesses leveraging its capabilities. The coming months will be crucial in determining whether LG’s ambitious gambit pays off, potentially reshaping the global AI landscape in the process.
 


Meet Prompt Poet: The Google-acquired tool revolutionizing LLM prompt engineering​


Michael Trestman

August 8, 2024 5:32 PM



Image Credit: Logitech



In the age of artificial intelligence, prompt engineering is an important new skill for harnessing the full potential of large language models (LLMs). This is the art of crafting complex inputs to extract relevant, useful outputs from AI models like ChatGPT. While many LLMs are designed to be friendly to non-technical users, and respond well to natural-sounding conversational prompts, advanced prompt engineering techniques offer another powerful level of control. These techniques are useful for individual users, and absolutely essential for developers seeking to build sophisticated AI-powered applications.

The Game-Changer: Prompt Poet


Prompt Poet is a groundbreaking tool developed by Character.ai, a platform and makerspace for personalized conversational AIs, which was recently acquired by Google. Prompt Poet potentially offers a look at the future direction of prompt context management across Google’s AI projects, such as Gemini.

Prompt Poet offers several key advantages, and stands out from other frameworks such as Langchain in its simplicity and focus:


  • Low Code Approach: Simplifies prompt design for both technical and non-technical users, unlike more code-intensive frameworks.

  • Template Flexibility: Uses YAML and Jinja2 to support complex prompt structures.

  • Context Management: Seamlessly integrates external data, offering a more dynamic and data-rich prompt creation process.

  • Efficiency: Reduces time spent on engineering string manipulations, allowing users to focus on crafting optimal prompt text.

This article focuses on the critical concept of context in prompt engineering, specifically the components of instructions and data. We’ll explore how Prompt Poet can streamline the creation of dynamic, data-rich prompts, enhancing the effectiveness of your LLM applications.

The Importance of Context: Instructions and Data


Customizing an LLM application, such as a chatbot, often means giving it specific instructions about how to act. This might mean describing a certain personality type, situation, or role, or even a specific historical or fictional person. For example, when asking for help with a moral dilemma, you can ask the model to answer in the style of someone specific, which will very much influence the kind of answer you get. Try variations of the following prompt to see how the details (like the people you pick) matter:

Simulate a panel discussion with the philosophers Aristotle, Karl Marx, and Peter Singer. Each should provide individual advice, comment on each other's responses, and conclude. Suppose they are very hungry.

The question: The pizza place gave us an extra pie, should I tell them or should we keep it?


Details matter. Effective prompt engineering also involves creating a specific, customized data context. This means providing the model with relevant facts, like personal user data, real-time information or specialized knowledge, which it wouldn’t have access to otherwise. This approach allows the AI to produce output far more relevant to the user’s specific situation than would be possible for an uninformed generic model.

Efficient Data Management with Prompt Templating


Data can be loaded in manually, just by typing it into ChatGPT. If you ask for advice about how to install some software, you have to tell it about your hardware. If you ask for help crafting the perfect resume, you have to tell it your skills and work history first. However, while this is ok for personal use, it does not work for development. Even for personal use, manually inputting data for each interaction can be tedious and error-prone.

This is where prompt templating comes into play. Prompt Poet uses YAML and Jinja2 to create flexible and dynamic prompts, significantly enhancing LLM interactions.

Example: Daily Planner


To illustrate the power of Prompt Poet, let’s work through a simple example: a daily planning assistant that will remind the user of upcoming events and provide contextual information to help prepare for their day, based on real-time data.

For example, you might want output like this:

Good morning! It looks like you have virtual meetings in the morning and an afternoon hike planned. Don't forget water and sunscreen for your hike since it's sunny outside.

Here are your schedule and current conditions for today:

- **09:00 AM:** Virtual meeting with the marketing team

- **11:00 AM:** One-on-one with the project manager

- **07:00 PM:** Afternoon hike at Discovery Park with friends

It's currently 65°F and sunny. Expect good conditions for your hike. Be aware of a bridge closure on I-90, which might cause delays.

To do that, we’ll need to provide at least two different pieces of context to the model: 1) customized instructions about the task, and 2) the required data to define the factual context of the user interaction.

Prompt Poet gives us some powerful tools for handling this context. We’ll start by creating a template to hold the general form of the instructions, and filling it in with specific data at the time when we want to run the query. For the above example, we might use the following Python code to create a `raw_template` and the `template_data` to fill it, which are the components of a Prompt Poet `Prompt` object.

raw_template = """

- name: system instructions

role: system

content: |

You are a helpful daily planning assistant. Use the following information about the user's schedule and conditions in their area to provide a detailed summary of the day. Remind them of upcoming events and bring any warnings or unusual conditions to their attention, including weather, traffic, or air quality warnings. Ask if they have any follow-up questions.

- name: realtime data

role: system

content: |

Weather in {{ user_city }}, {{ user_country }}:

- Temperature: {{ user_temperature }}°C

- Description: {{ user_description }}

Traffic in {{ user_city }}:

- Status: {{ traffic_status }}

Air Quality in {{ user_city }}:

- AQI: {{ aqi }}

- Main Pollutant: {{ main_pollutant }}

Upcoming Events:

{% for event in events %}

- {{ event.start }}: {{ event.summary }}

{% endfor %}

"""

The code below uses Prompt Poet’s `Prompt` class to populate data from multiple data sources into a template to form a single, coherent prompt. This allows us to invoke a daily planning assistant to provide personalized, context-aware responses. By pulling in weather data, traffic updates, AQI information and calendar events, the model can offer detailed summaries and reminders, enhancing the user experience.

You can clone and experiment with the full working code example, which also implements few-shot learning, a powerful prompt engineering technique that involves presenting the models with a small set of training examples.

import openai
from prompt_poet import Prompt

# User data (get_weather_info, get_traffic_info, get_aqi_info, and get_events_info
# are helper functions defined in the full working code example linked above)
user_weather_info = get_weather_info(user_city)
traffic_info = get_traffic_info(user_city)
aqi_info = get_aqi_info(user_city)
events_info = get_events_info(calendar_events)

template_data = {
    "user_city": user_city,
    "user_country": user_country,
    "user_temperature": user_weather_info["temperature"],
    "user_description": user_weather_info["description"],
    "traffic_status": traffic_info,
    "aqi": aqi_info["aqi"],
    "main_pollutant": aqi_info["main_pollutant"],
    "events": events_info,
}

# Create the prompt using Prompt Poet
prompt = Prompt(
    raw_template=raw_template,
    template_data=template_data,
)

# Get response from OpenAI (legacy openai<1.0 interface, as in the original article)
model_response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=prompt.messages,
)
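For intuition about what the templating step produces, the sketch below renders the same template with Jinja2 and parses the result with PyYAML directly. This is only a conceptual illustration of the YAML-plus-Jinja2 idea described above, not Prompt Poet's actual internals:

import yaml
from jinja2 import Template

# Fill in the {{ ... }} placeholders and the {% for %} loop using template_data.
rendered = Template(raw_template).render(**template_data)

# The rendered text is YAML: a list of named message parts with roles and content.
parts = yaml.safe_load(rendered)
for part in parts:
    print(part["name"], "->", part["role"])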

Conclusion


Mastering the fundamentals of prompt engineering, particularly the roles of instructions and data, is crucial for maximizing the potential of LLMs. Prompt Poet stands out as a powerful tool in this field, offering a streamlined approach to creating dynamic, data-rich prompts.

Prompt Poet’s low-code, flexible template system makes prompt design accessible and efficient. By integrating external data sources that would not be available to an LLM’s training, data-filled prompt templates can better ensure AI responses are accurate and relevant to the user.

By using tools like Prompt Poet, you can elevate your prompt engineering skills and develop innovative AI applications that meet diverse user needs with precision. As AI continues to evolve, staying proficient in the latest prompt engineering techniques will be essential.
 




About​


real time face swap and one-click video deepfake with only a single image (uncensored)

Topics​


ai artificial-intelligence faceswap webcam webcamera deepfake deep-fake ai-face video-deepfake realtime-deepfake deepfake-webcam realtime-face-changer fake-webcam ai-webcam ai-deep-fake real-time-deepfake



Disclaimer​

This software is meant to be a productive contribution to the rapidly growing AI-generated media industry. It will help artists with tasks such as animating a custom character or using the character as a model for clothing etc.


The developers of this software are aware of its possible unethical applications and are committed to taking preventative measures against them. It has a built-in check that prevents the program from working on inappropriate media, including but not limited to nudity, graphic content, and sensitive material such as war footage. We will continue to develop this project in a positive direction while adhering to the law and to ethics. This project may be shut down or include watermarks on the output if requested by law.


Users of this software are expected to use it responsibly while abiding by local laws. If the face of a real person is being used, users are advised to obtain consent from that person and to clearly state that the content is a deepfake when posting it online. The developers of this software are not responsible for the actions of end users.
 





Installing and running MiniCPM-Llama3-V 2.5, which offers GPT-4V-like performance




A GPT-4V Level Multimodal LLM on Your Phone​

GitHub | Demo | WeChat

News​

📌 Pinned​



  • [2024.08.03] MiniCPM-Llama3-V 2.5 technical report is released! See here.

  • [2024.07.19] MiniCPM-Llama3-V 2.5 supports vLLM now! See here.

  • [2024.05.28] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 now fully supports its feature in llama.cpp and ollama! Please pull the latest code of our provided forks (llama.cpp, ollama). GGUF models in various sizes are available here. MiniCPM-Llama3-V 2.5 series is not supported by the official repositories yet, and we are working hard to merge PRs. Please stay tuned! You can visit our GitHub repository for more information!

  • [2024.05.28] 💫 We now support LoRA fine-tuning for MiniCPM-Llama3-V 2.5, using only 2 V100 GPUs! See more statistics here.

  • [2024.05.23] 🔥🔥🔥 MiniCPM-V tops GitHub Trending and HuggingFace Trending! Our demo, recommended by Hugging Face Gradio’s official account, is available here. Come and try it out!


  • [2024.06.03] Now you can run MiniCPM-Llama3-V 2.5 on multiple low-VRAM GPUs (12 GB or 16 GB) by distributing the model's layers across multiple GPUs. For more details, check this link.

  • [2024.05.25] MiniCPM-Llama3-V 2.5 now supports streaming outputs and customized system prompts. Try it here.

  • [2024.05.24] We release the MiniCPM-Llama3-V 2.5 GGUF, which supports llama.cpp inference and provides smooth decoding at 6–8 tokens/s on mobile phones. Try it now!

  • [2024.05.23] 🔍 We've released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, including benchmark evaluations, multilingual capabilities, and inference efficiency 🌟📊🌍🚀. Click here to view more details.

  • [2024.05.20] We open-source MiniCPM-Llama3-V 2.5. It has improved OCR capability, supports 30+ languages, and is the first end-side MLLM to achieve GPT-4V-level performance! We provide efficient inference and simple fine-tuning. Try it now!

Model Summary​


MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0. Notable features of MiniCPM-Llama3-V 2.5 include:


  • 🔥 Leading Performance. MiniCPM-Llama3-V 2.5 has achieved an average score of 65.1 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. With only 8B parameters, it surpasses widely used proprietary models like GPT-4V-1106, Gemini Pro, Claude 3 and Qwen-VL-Max and greatly outperforms other Llama 3-based MLLMs.

  • 💪 Strong OCR Capabilities. MiniCPM-Llama3-V 2.5 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving a 700+ score on OCRBench and surpassing proprietary models such as GPT-4o, GPT-4V-0409, Qwen-VL-Max and Gemini Pro. Based on recent user feedback, MiniCPM-Llama3-V 2.5 has now enhanced full-text OCR extraction, table-to-markdown conversion, and other high-utility capabilities, and has further strengthened its instruction-following and complex reasoning abilities, enhancing multimodal interaction experiences.

  • 🏆 Trustworthy Behavior. Leveraging the latest RLAIF-V method (the newest technique in the RLHF-V [CVPR'24] series), MiniCPM-Llama3-V 2.5 exhibits more trustworthy behavior. It achieves a 10.3% hallucination rate on Object HalBench, lower than GPT-4V-1106 (13.6%), the best-level performance within the open-source community. Data released.

  • 🌏 Multilingual Support. Thanks to the strong multilingual capabilities of Llama 3 and the cross-lingual generalization technique from VisCPM, MiniCPM-Llama3-V 2.5 extends its bilingual (Chinese-English) multimodal capabilities to over 30 languages including German, French, Spanish, Italian, Korean, Japanese etc. All Supported Languages.

  • 🚀 Efficient Deployment. MiniCPM-Llama3-V 2.5 systematically employs model quantization, CPU optimizations, NPU optimizations and compilation optimizations, achieving high-efficiency deployment on edge devices. For mobile phones with Qualcomm chips, we have integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 has realized a 150-fold acceleration in multimodal large model end-side image encoding and a 3-fold increase in language decoding speed.

  • 💫 Easy Usage. MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) llama.cpp and ollama support for efficient CPU inference on local devices, (2) GGUF format quantized models in 16 sizes, (3) efficient LoRA fine-tuning with only 2 V100 GPUs, (4) streaming output, (5) quick local WebUI demo setup with Gradio and Streamlit, and (6) interactive demos on HuggingFace Spaces.
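Putting the "easy usage" point above into practice, the following is a minimal local-inference sketch based on the usage pattern published on the model card; treat the exact method names and arguments (e.g. model.chat, sampling, temperature) as assumptions to verify against the official repository, and the image path as a placeholder.

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-Llama3-V-2_5"

# trust_remote_code=True pulls in the model's custom chat interface.
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.float16)
model = model.to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
msgs = [{"role": "user", "content": "What is shown in this image?"}]

answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer, sampling=True, temperature=0.7)
print(answer)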




Evaluation​


Results on TextVQA, DocVQA, OCRBench, OpenCompass MultiModal Avg, MME, MMBench, MMMU, MathVista, LLaVA Bench, RealWorld QA, and Object HalBench.


Evaluation results of multilingual LLaVA Bench





Examples​



We deploy MiniCPM-Llama3-V 2.5 on end devices. The demo video is a raw screen recording on a Xiaomi 14 Pro without editing.


Demo​


Click here to try out the Demo of MiniCPM-Llama3-V 2.5.

Deployment on Mobile Phone​


Coming soon.

 



Paige and Microsoft unveil next-gen AI models for cancer diagnosis​




By Muhammad Zulhusni | August 9, 2024
Categories: Artificial Intelligence, Cloud, Healthcare, Microsoft

About the Author: As a tech journalist, Zul focuses on topics including cloud computing, cybersecurity, and disruptive technology in the enterprise industry. He has expertise in moderating webinars and presenting content on video, in addition to having a background in networking technology.

Paige and Microsoft have unveiled the next big breakthrough in clinical AI for cancer diagnosis and treatment: Virchow2 and Virchow2G, enhanced versions of its revolutionary AI models for cancer pathology.

The Virchow2 and Virchow2G models were trained on an enormous dataset Paige has accumulated: more than three million pathology slides gathered from over 800 labs across 45 countries. The data comes from over 225,000 patients, all de-identified, creating a rich and representative dataset that spans genders, races, ethnic groups, and regions across the globe.

What makes these models truly remarkable is their scope. They cover over 40 different tissue types and various staining methods, making them applicable to a wide range of cancer diagnoses. Virchow2G, with its 1.8 billion parameters, stands as the largest pathology model ever created and sets new standards in AI training, scale, and performance.

As Dr. Thomas Fuchs, founder and chief scientist of Paige, comments: “We’re just beginning to tap into what these foundation models can achieve in revolutionising our understanding of cancer through computational pathology.” He believes these models will significantly improve the future for pathologists, and he agrees that this technology is becoming an important step in the progression of diagnostics, targeted medications, and customised patient care.

Similarly, Razik Yousfi, Paige’s senior vice president of technology, states that these models are not only making precision medicine a reality but are also improving the accuracy and efficiency of cancer diagnosis, and pushing the boundaries of what’s possible in pathology and patient care.

So, how is this relevant to cancer diagnosis today? Paige has developed a clinical AI application that pathologists can use to recognise cancer in over 40 tissue types. This tool allows potentially hazardous areas to be identified more quickly and accurately. In other words, the diagnostic process becomes more efficient and less prone to errors, even for rare cancers, with the help of this tool.

Beyond diagnosis, Paige has created AI modules that can benefit life sciences and pharmaceutical companies. These tools can aid in therapeutic targeting, biomarker identification, and clinical trial design, potentially leading to more successful trials and faster development of new therapies.

The good news for researchers is that Virchow2 is available on Hugging Face for non-commercial research, while the entire suite of AI modules is now available for commercial use. This accessibility could accelerate advancements in cancer research and treatment across the scientific community.

In summary, the recently introduced AI models represent a major advancement in the fight against cancer. Paige and Microsoft have chosen the right path by combining the power of data with state-of-the-art AI technologies. These companies have created new opportunities for more accurate cancer prediction, paving the way for tailored solutions and innovative research in oncology.

(Photo by National Cancer Institute)

 



1/3
🖼 Live Portrait 🔥 @tost_ai + @runpod_io template🥳

Thanks to KwaiVGI ❤

🧬code: GitHub - KwaiVGI/LivePortrait: Bring portraits to life!
🍇runpod serverless: GitHub - camenduru/live-portrait-i2v-tost
🍇runpod template: GitHub - camenduru/liveportrait-runpod
🍊jupyter: GitHub - camenduru/LivePortrait-jupyter
🥪tost: please try it 🐣 Tost AI

2/3
👑 Flux.1[dev] + 🍞 Tost Upscaler + 📺 LivePortrait

3/3
👑 Flux.1[dev] + 🍞 Tost Upscaler + 📺 LivePortrait


 


1/3
🖼 Flux.1 Dev + XLabs AI - Realism Lora 🔥 Jupyter Notebook 🥳

Thanks to XLabsAI ❤ @ComfyUI ❤ @Gradio ❤

🧬code: GitHub - XLabs-AI/x-flux
🍇runpod: GitHub - camenduru/flux.1-dev-tost
🥪tost: Tost AI
🍊jupyter: please try it 🐣 GitHub - camenduru/flux-jupyter

2/3
Awesome work! I'm really impressed with the Flux.1. 🥳Flux.1 is now available for free to all users in Dzine!

3/3
Good bye stable diffusion I guess


 









1/11
We're excited to release our new research paper: IP Adapter Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts

arxiv: [2408.03209] IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts
project page (with live demo!): IP Adapter Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts

2/11
(Thread) With this model, we expand the existing IP Adapter Plus architecture with an additional CLIP embedding input, allowing it to refine the representation of the image based on the provided instruction, so you can choose from a range of 10 tasks, including:

3/11
Everything (the original IP adapter scheme)
Style (replicating Instant-ID)
Colour (replicating controlnet colour input)
Composition (replicating controlnet scribble)
Face (similar to the original IP adapter faceid model, doesn't require zoomed-in faces)
Pose
Background
Foreground
Object

4/11
Our model uses the same cross attention input scheme as IP-Adapter, so will be easily compatible with existing UIs after adapting it for the modified projection transformer.

5/11
This also means it retains full compatibility with existing models and control methods.

6/11
Not only have we released the weights here:
CiaraRowles/IP-Adapter-Instruct · Hugging Face
But the model also comes with SD3 support! Making it the first IP-Adapter-esque system to come out for SD3 or DiT-based models

7/11
Thanks to Simon Donné (especially),
@esx2ve, @danteCIM, @bostadynamics and @DoctorDukeGonzo
for all the work put in and being an awesome team :D

8/11
This looks amazing! Which model do you use in the hf space btw?

9/11
1.5, less pretty but it's the most tested and trained of the models.

10/11
Awesome, would it work with Flux?

11/11
100% actually just finishing up my implementation to start training.


 


1/6
Flux text to image by @bfl_ml is supported in ComfyUI v0.0.4. Try the FP8 weights if you don't have 24GB of VRAM!

Hunyuan model support is also available. More details in our blog.

August 2024: Flux Support, New Frontend, For Loops, and more!

2/6
Good work as usual 🙏

3/6
super cool

4/6
Flux text in ComfyUI

5/6
Good evening. Can you clarify if it works with AMD gpus for Windows? Mine is the 7900xtx 24gb.

6/6
I admire your super fast job


 


Forget Midjourney — Flux is the new king of AI image generation and here’s how to get access​

News

By Ryan Morrison
published August 8, 2024

Multiple ways to try Flux




Image generated using Flux.01 running on a gaming laptop (Image credit: Flux AI/Future generated)

New AI products and services arrive in one of two ways: like a bolt of lightning with no warning, or after months of constant hype. Flux, by startup Black Forest Labs, was the former.

The AI image generation model is being dubbed the rightful heir to Stable Diffusion and it quickly went viral after its release with direct comparisons to market leader Midjourney.

The difference between Flux and Midjourney is that Flux is open source and can run on a reasonably good laptop. This means it is, or soon will be, available on many of the same multi-model platforms as Stable Diffusion, such as Poe, NightCafe and Freepik.

I’ve been using it and my initial impressions are that in some areas it is better than Midjourney, especially around rendering people, but its skin textures aren’t as good as Midjourney v6.1.

What is Flux and where did it come from?​




Image generated using Flux.01 running on a gaming laptop (Image credit: Flux AI/Future generated)

Flux came from AI startup Black Forest Labs. This new company was founded by some of the people responsible for most modern AI image generation technologies.

The German-based company is led by Robin Rombach, Andreas Blattmann and Dominik Lorenz, all former engineers at Stability AI, along with other leading figures in the development of diffusion-based AI models. This is the technology that also powers many AI video tools.

There are three versions of Flux.01 currently available, all text-to-image models. The first is a Pro version with a commercial license, mainly used by companies like Freepik to offer their subscribers access to generative AI image technology.

The next two are Dev and Schnell. These are the mid-weight and fast models and in my tests — running on a laptop with an RTX 4090 — they outperform Midjourney, DALL-E and even Ideogram in adherence to the prompt, image quality and text rendering on an image.

The company is also working on a text-to-video model that it promises will offer high-quality output and be available open-source. Branding it: “State-of-the-Art Text to Video for all.”

Where can I use Flux today?​

We are excited to announce the launch of Black Forest Labs. Our mission is to develop and advance state-of-the-art generative deep learning models for media and to push the boundaries of creativity, efficiency and diversity. pic.twitter.com/ilcWvJgmsX August 1, 2024


If you have a well-equipped laptop you can download and run Flux.01 locally. There are some easy ways to do this including by using the Pinokio launcher. This makes it relatively trivial to install and run AI models with a couple of clicks and is free to use. It is a large file though.
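If you prefer a scriptable route over a launcher, a minimal sketch with Hugging Face diffusers looks roughly like this (it assumes a recent diffusers release that ships FluxPipeline, access to the gated black-forest-labs/FLUX.1-dev weights, and plenty of VRAM):

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # helps on GPUs with limited VRAM

image = pipe(
    "a street musician playing guitar at dusk, photorealistic",  # example prompt
    guidance_scale=3.5,
    num_inference_steps=28,
    height=1024,
    width=1024,
).images[0]
image.save("flux_dev_example.png")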

However, if your machine isn't up to the job, there are several websites already offering access to Flux.01 — in some cases including the largest commercial Pro model.

NightCafe, which is one of my favorite AI image platforms, already has access to the model, and you can quickly compare its output to images from other tools like Ideogram and Stable Diffusion 3.

Poe, the AI model platform, has access to Flux.01 and lets you generate the images in a chatbot-style format similar to creating pictures using tools like ChatGPT and DALL-E.

You can also get access through platforms more typically targeted at developers, including Based Labs, Hugging Face and Fal.ai. Freepik, one of the largest AI image platforms on the market, says it is also working to bring Flux to its site.
 


I tested Flux vs Midjourney to see which AI image generator is best — here's the winner​

Face-off

By Ryan Morrison
published 9 hours ago

Creating hyperrealistic images


AI generated images of a street musician with a guitar created by Flux and Midjourney

(Image credit: Flux / Midjourney)



Flux is an artificial intelligence image generator released by AI startup Black Forest Labs in the past few weeks and it has quickly become one of the most powerful and popular tools of its kind, even giving market leader Midjourney a run for its money.

Unlike Midjourney, which is a closed and paid-for service only available from Midjourney itself, Flux is an open-source model available to download and run locally or on a range of platforms such as Freepik, NightCafe and Hugging Face.

To determine whether Flux has reached Midjourney levels of photorealism and accurate human depiction I’ve come up with 5 descriptive prompts and run them on both. I’m generating Flux images using ComfyUI installed through the Pinokio AI installer.

Creating the prompts​


Both Midjourney and Flux benefit from a descriptive prompt. To get exactly what you want out of the model, it's good to describe not just the person but also the style, lighting and structure.

I've included each prompt below for you to try yourself. They should also work with Ideogram, DALL-E 3 in ChatGPT or other AI image platforms if you don't have Midjourney or Flux, although none except Ideogram reach the realism of Midjourney or Flux.


1. A chef in the kitchen​


Midjourney


Chef image generated by Midjourney (Image credit: Midjourney/Future AI)

Flux AI image


Chef image generated by Flux (Image credit: Flux AI image/Future)

The first test combines the need to generate a complex skin texture with a dynamic environment — namely a professional kitchen. The prompt asks for a woman in her mid-50s in the middle of preparing a meal.

It also asks for the depiction of sous chefs in the background and for the chef's name to be shown on a "spotless white double-breasted chef's jacket".
A seasoned chef in her mid-50s is captured in action in a bustling professional kitchen. Her salt-and-pepper hair is neatly tucked under a crisp white chef's hat, with a few strands escaping around her temples. Her face, marked with laugh lines, shows intense concentration as she tastes a sauce from a wooden spoon. Her eyes, a warm brown, narrow slightly as she considers the flavor. The chef is wearing a spotless white double-breasted chef's jacket with her name embroidered in blue on the breast pocket. Black and white checkered pants and slip-resistant clogs complete her professional attire. A colorful array of sauce stains on her apron tells the story of a busy service. Behind her, the kitchen is a hive of activity. Stainless steel surfaces gleam under bright overhead lights, reflecting the controlled chaos of dinner service. Sous chefs in white jackets move purposefully between stations, and steam rises from pots on industrial stoves. Plates of artfully arranged dishes wait on the pass, ready for service. In the foreground, a marble countertop is visible, strewn with fresh herbs and exotic spices. A stack of well-worn cookbooks sits nearby, hinting at the chef's dedication to her craft and continuous learning. The overall scene captures the intensity, precision, and passion of high-end culinary artistry.

Winner: Midjourney

Midjourney wins for the realism of the main character. It isn't perfect and I prefer the dynamism of the Flux image but the challenge is creating accurate humans and Midjourney is closer with better skin texture.


2. A street musician​


Midjourney


Street musician image generated by Midjourney (Image credit: Midjourney/Future AI image)

Flux AI image


Street musician image generated by Flux (Image credit: Flux AI image/Future)

The next prompt asks both AI image generators to show a street musician in his late 30s performing on a busy city corner lost in the moment of the music.

Part of the prompt requires the inclusion of an appreciative passerby, coins in a guitar case and city life blurring in motion behind the main character.
A street musician in his late 30s is frozen in a moment of passionate performance on a busy city corner. His long, dark dreadlocks are caught mid-sway, some falling over his face while others dance in the air around him. His eyes are closed in deep concentration, brows slightly furrowed, as his weathered hands move deftly over the strings of an old, well-loved acoustic guitar. The musician is wearing a vibrant, hand-knitted sweater that's a patchwork of blues, greens, and purples. It hangs loosely over distressed jeans with artistic patches on the knees. On his feet are scuffed brown leather boots, tapping in rhythm with his music. Multiple colorful braided bracelets adorn his wrists, adding to his bohemian appearance. He stands on a gritty sidewalk, with a battered guitar case open at his feet. It's scattered with coins and bills from appreciative passersby, along with a few fallen autumn leaves. Behind him, city life unfolds in a blur of motion: pedestrians hurry past, yellow taxis honk in the congested street, and neon signs begin to flicker to life as dusk settles over the urban landscape. In the foreground, slightly out of focus, a child tugs on her mother's hand, trying to stop and listen to the music. The scene captures the raw energy and emotion of street performance against the backdrop of a bustling, indifferent city.

Winner: Midjourney

Midjourney wins again for the realism of the character. The texture quality of v6.1 once again puts it just ahead. It is also overall a better image in terms of structure, layout and background.
 