bnew



Belgium's imec reports breakthroughs with new ASML chip printing machine​


By Toby Sterling

August 7, 2024, 6:26 AM EDT. Updated 3 days ago.

A smartphone with a displayed ASML logo is placed on a computer motherboard in this illustration taken March 6, 2023. REUTERS/Dado Ruvic/Illustration/File Photo

AMSTERDAM, Aug 7 (Reuters) - Belgium's imec, one of the world's top semiconductor R&D firms, on Wednesday reported several computer chip-making breakthroughs at a joint laboratory it operates with ASML (ASML.AS), using the Dutch company's newest 350 million euro ($382 million) chip printing machine.
imec said it had successfully printed circuitry as small or smaller than the best currently in commercial production, for both logic and memory chips, in a single pass under ASML's new "High NA" tool.

The development suggests leading chipmakers will be able to use the tool as planned in the coming years to make generations of smaller, faster chips.

High NA will be "highly instrumental to continue the dimensional scaling of logic and memory technologies," imec CEO Luc Van den Hove said in a statement.
imec noted that many other chemicals and tools needed for the rest of the chipmaking process had been used for the tests and appear to be falling into place for commercial manufacturing.

ASML is the biggest supplier of equipment to computer chip makers, thanks to its dominance in lithography systems - huge machines that use beams of light to help create circuitry.

The High NA tool's ability to print smaller features in fewer steps should save chipmakers money and help justify the tool's lofty price tag.

Reuters reported on Monday that Intel is purchasing the first two High NA tools, with a third expected to go later this year to TSMC (2330.TW), which makes chips for Nvidia (NVDA.O) and Apple (AAPL.O).

"A second tool is required for the volume of wafers and experiments needed to support a development line,” Intel director of lithography Mark Philips told Reuters in an email.

Other chipmakers that have ordered a High NA tool include Samsung Electronics (005930.KS) and memory specialists SK Hynix (000660.KS) and Micron (MU.O).
($1 = 0.9163 euros)



 

bnew





Installing and running MiniCPM-Llama3-V 2.5, which has GPT-4V-like performance​




A GPT-4V Level Multimodal LLM on Your Phone​

GitHub | Demo | WeChat

News​

📌 Pinned​



  • [2024.08.03] MiniCPM-Llama3-V 2.5 technical report is released! See here.

  • [2024.07.19] MiniCPM-Llama3-V 2.5 supports vLLM now! See here.

  • [2024.05.28] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 is now fully supported in llama.cpp and ollama! Please pull the latest code of our provided forks (llama.cpp, ollama). GGUF models in various sizes are available here. The MiniCPM-Llama3-V 2.5 series is not supported by the official repositories yet, and we are working hard to merge PRs. Please stay tuned! You can visit our GitHub repository for more information!

  • [2024.05.28] 💫 We now support LoRA fine-tuning for MiniCPM-Llama3-V 2.5, using only 2 V100 GPUs! See more statistics here.

  • [2024.05.23] 🔥🔥🔥 MiniCPM-V tops GitHub Trending and HuggingFace Trending! Our demo, recommended by Hugging Face Gradio’s official account, is available here. Come and try it out!


  • [2024.06.03] Now you can run MiniCPM-Llama3-V 2.5 on multiple low-VRAM GPUs (12 GB or 16 GB) by distributing the model's layers across them. For more details, check this link.

  • [2024.05.25] MiniCPM-Llama3-V 2.5 now supports streaming outputs and customized system prompts. Try it here.

  • [2024.05.24] We release the MiniCPM-Llama3-V 2.5 GGUF, which supports llama.cpp inference and provides smooth 6-8 token/s decoding on mobile phones. Try it now!

  • [2024.05.23] 🔍 We've released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, including benchmark evaluations, multilingual capabilities, and inference efficiency 🌟📊🌍🚀. Click here to view more details.

  • [2024.05.20] We open-source MiniCPM-Llama3-V 2.5! It has improved OCR capability and supports 30+ languages, making it the first end-side MLLM to achieve GPT-4V-level performance. We provide efficient inference and simple fine-tuning. Try it now!

Model Summary​


MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0. Notable features of MiniCPM-Llama3-V 2.5 include:


  • 🔥 Leading Performance. MiniCPM-Llama3-V 2.5 has achieved an average score of 65.1 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. With only 8B parameters, it surpasses widely used proprietary models like GPT-4V-1106, Gemini Pro, Claude 3 and Qwen-VL-Max and greatly outperforms other Llama 3-based MLLMs.

  • 💪 Strong OCR Capabilities. MiniCPM-Llama3-V 2.5 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving a 700+ score on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V-0409, Qwen-VL-Max and Gemini Pro. Based on recent user feedback, MiniCPM-Llama3-V 2.5 has now enhanced full-text OCR extraction, table-to-markdown conversion, and other high-utility capabilities, and has further strengthened its instruction-following and complex reasoning abilities, enhancing multimodal interaction experiences.

  • 🏆 Trustworthy Behavior. Leveraging the latest RLAIF-V method (the newest technology in the RLHF-V [CVPR'24] series), MiniCPM-Llama3-V 2.5 exhibits more trustworthy behavior. It achieves a 10.3% hallucination rate on Object HalBench, lower than GPT-4V-1106 (13.6%), the best-level performance within the open-source community. Data released.

  • 🌏 Multilingual Support. Thanks to the strong multilingual capabilities of Llama 3 and the cross-lingual generalization technique from VisCPM, MiniCPM-Llama3-V 2.5 extends its bilingual (Chinese-English) multimodal capabilities to over 30 languages including German, French, Spanish, Italian, Korean, Japanese etc. All Supported Languages.

  • 🚀 Efficient Deployment. MiniCPM-Llama3-V 2.5 systematically employs model quantization, CPU optimizations, NPU optimizations and compilation optimizations, achieving high-efficiency deployment on edge devices. For mobile phones with Qualcomm chips, we have integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 has realized a 150-fold acceleration in multimodal large model end-side image encoding and a 3-fold increase in language decoding speed.

  • 💫 Easy Usage. MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) llama.cpp and ollama support for efficient CPU inference on local devices, (2) GGUF format quantized models in 16 sizes, (3) efficient LoRA fine-tuning with only 2 V100 GPUs, (4) streaming output, (5) quick local WebUI demo setup with Gradio and Streamlit, and (6) interactive demos on HuggingFace Spaces.
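
To make the "easy usage" concrete, here is a minimal inference sketch following the pattern on the Hugging Face model card. The `chat` helper comes from the model's own remote code (`trust_remote_code=True`), so treat the exact keyword arguments as an assumption that may vary between revisions:

```python
# Minimal sketch of chat-style inference, based on the model card's pattern.
# Assumes a CUDA GPU; the `chat` method is supplied by the model's remote code.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-Llama3-V-2_5"
model = AutoModel.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.float16
).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")  # any local test image
msgs = [{"role": "user", "content": "What is in this image?"}]

# Streaming outputs and custom system prompts are also supported per the README.
answer = model.chat(
    image=image,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7,
)
print(answer)
```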




Evaluation​


Results on TextVQA, DocVQA, OCRBench, OpenCompass MultiModal Avg, MME, MMBench, MMMU, MathVista, LLaVA Bench, RealWorld QA, Object HalBench.


Evaluation results of multilingual LLaVA Bench





Examples​



We deploy MiniCPM-Llama3-V 2.5 on end devices. The demo video is a raw screen recording on a Xiaomi 14 Pro, without editing.


Demo​


Click here to try out the Demo of MiniCPM-Llama3-V 2.5.

Deployment on Mobile Phone​


Coming soon.

 

bnew



Paige and Microsoft unveil next-gen AI models for cancer diagnosis​


By Muhammad Zulhusni | August 9, 2024

Paige and Microsoft have unveiled their next big breakthrough in clinical AI for cancer diagnosis and treatment: Virchow2 and Virchow2G, enhanced versions of Paige's AI models for cancer pathology.

The Virchow2 and Virchow2G models were trained on an enormous dataset Paige has accumulated: more than three million pathology slides from over 800 labs across 45 countries. The slides come from over 225,000 patients, all de-identified, creating a rich and representative dataset encompassing all genders, races, ethnic groups, and regions across the globe.

What makes these models truly remarkable is their scope. They cover over 40 different tissue types and various staining methods, making them applicable to a wide range of cancer diagnoses. Virchow2G, with its 1.8 billion parameters, stands as the largest pathology model ever created and sets new standards in AI training, scale, and performance.

As Dr. Thomas Fuchs, founder and chief scientist of Paige, comments: “We’re just beginning to tap into what these foundation models can achieve in revolutionising our understanding of cancer through computational pathology.” He believes these models will significantly improve the future for pathologists and sees this technology as an important step in the progression of diagnostics, targeted medications, and customised patient care.

Similarly, Razik Yousfi, Paige’s senior vice president of technology, states that these models are not only making precision medicine a reality but are also improving the accuracy and efficiency of cancer diagnosis, and pushing the boundaries of what’s possible in pathology and patient care.

So, how is this relevant to cancer diagnosis today? Paige has developed a clinical AI application that pathologists can use to recognise cancer in over 40 tissue types. The tool allows suspicious areas to be identified more quickly and accurately, making the diagnostic process more efficient and less prone to errors, even for rare cancers.

Beyond diagnosis, Paige has created AI modules that can benefit life sciences and pharmaceutical companies. These tools can aid in therapeutic targeting, biomarker identification, and clinical trial design, potentially leading to more successful trials and faster development of new therapies.

The good news for researchers is that Virchow2 is available on Hugging Face for non-commercial research, while the entire suite of AI modules is now available for commercial use. This accessibility could accelerate advancements in cancer research and treatment across the scientific community.

In summary, the recently introduced AI models represent a major advancement in the fight against cancer. Paige and Microsoft have chosen the right path by combining the power of data with state-of-the-art AI technologies. These companies have created new opportunities for more accurate cancer prediction, paving the way for tailored solutions and innovative research in oncology.

 

bnew


1/4
Google DeepMind has developed an AI-powered robot that's ready to play table tennis. 🤖🏓

This is the first agent to achieve amateur human-level performance in the sport.

Curious about how it works? Let's dive in. 🧵
#AI #TableTennis #Robotics #Innovation #OpenAI #GPT-4o

2/4
Mind boggling stuff. AI evolving rapidly. Fascinating yet unnerving.

3/4
AI Agents will change everything! Devin AI is at the forefront here.

Guess what: it created and marketed the first product in the world without human intervention. Wanna have exposure to this?

#devin from @1stSolanaAICoin

Mcap today: ~USD 2m 🤩

4/4
I'd have one just to play ping-pong :smile:


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew



1/2
🖼 Live Portrait 🔥 @tost_ai + @runpod_io template🥳

Thanks to KwaiVGI ❤

🧬code: GitHub - KwaiVGI/LivePortrait: Bring portraits to life!
🍇runpod serverless: GitHub - camenduru/live-portrait-i2v-tost
🍇runpod template: GitHub - camenduru/liveportrait-runpod
🍊jupyter: GitHub - camenduru/LivePortrait-jupyter
🥪tost: please try it 🐣 Tost AI

2/2
👑 Flux.1[dev] + 🍞 Tost Upscaler + 📺 LivePortrait



To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew


1/3
🖼 Flux.1 Dev + XLabs AI - Realism Lora 🔥 Jupyter Notebook 🥳

Thanks to XLabsAI ❤ @ComfyUI ❤ @Gradio ❤

🧬code: GitHub - XLabs-AI/x-flux
🍇runpod: GitHub - camenduru/flux.1-dev-tost
🥪tost: Tost AI
🍊jupyter: please try it 🐣 GitHub - camenduru/flux-jupyter

2/3
Awesome work! I'm really impressed with Flux.1. 🥳 Flux.1 is now available for free to all users in Dzine!

3/3
Good bye stable diffusion I guess


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew











1/11
For too long, users have lived under the software lottery tyranny of fused attention implementations.

No longer.

Introducing FlexAttention, a new PyTorch API allowing for many attention variants to enjoy fused kernels in a few lines of PyTorch.
FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention
1/10

2/11
Although the original causal attention is still popular, many variants have shown up since. Alibi, SWA, Softcapping, PrefixLM, document masking, etc. Even worse, folks often want combinations of these!

Today, each one requires a special kernel: https://nitter.poast.org/andersonbcdefg/status/1800907703688339569

2/10

3/11
FlexAttention solves this by accepting a user-defined function, score_mod. score_mod is given the "attention score" of two tokens as well as the "position" of this score.

We then use torch.compile to generate a fused attention kernel (forwards and backwards!)

3/10

4/11
This API is surprisingly expressive! In particular, it also allows loading from *captured* tensors (i.e. tensors that aren't directly passed in).

For example, this is how you implement Alibi or tanh soft-capping.

You can also mask out values by returning -inf.

4/10
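
For reference, a minimal score_mod sketch (assuming PyTorch 2.5+, where the API lives under torch.nn.attention.flex_attention; not the thread's exact code), showing both ALiBi with a captured slope tensor and tanh soft-capping:

```python
# Sketch: score_mod variants for FlexAttention (PyTorch >= 2.5 assumed).
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 2, 8, 1024, 64  # batch, heads, sequence length, head dim
q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))

# ALiBi: the score_mod closes over a *captured* per-head slope tensor.
alibi_slopes = torch.exp2(-torch.arange(1, H + 1, device="cuda").float())

def alibi(score, b, h, q_idx, kv_idx):
    # Penalize attention by query/key distance, scaled per head.
    return score + alibi_slopes[h] * (kv_idx - q_idx)

# tanh soft-capping (the cap value of 30 is an illustrative choice).
def softcap(score, b, h, q_idx, kv_idx):
    return 30.0 * torch.tanh(score / 30.0)

# torch.compile generates the fused forward and backward kernels.
flex_attention = torch.compile(flex_attention)
out = flex_attention(q, k, v, score_mod=alibi)  # or score_mod=softcap
```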

5/11
Although masking can be implemented with score_mod, masking also has an additional property - we can completely skip computing masked values!

Thus, our kernel also supports block sparsity based on a mask_mod. For example, for causal masking this results in a ~2x speedup.

5/10
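
A sketch of what that looks like with the same assumed API: a mask_mod returns True where attention is allowed, and create_block_mask precomputes which blocks can be skipped entirely:

```python
# Sketch: causal masking via mask_mod + BlockMask (PyTorch >= 2.5 assumed).
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

B, H, S, D = 2, 8, 1024, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))

def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx  # True means "keep this score"

# B=None, H=None: the same mask is shared across batch and heads.
block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S)

# Fully masked blocks are never computed, which is where the ~2x comes from.
out = flex_attention(q, k, v, block_mask=block_mask)
```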

6/11
But of course, you're not just limited to causal masking. Sliding Window Attention can also be trivially implemented with this API.

As FlexAttention can leverage this further sparsity, this is ~20x faster than F.sdpa and 3x faster than FA2 with a causal mask.

6/10
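
Sliding window attention, sketched under the same assumptions (the window size is illustrative):

```python
# Sketch: sliding-window + causal mask_mod.
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

B, H, S, D = 2, 8, 1024, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))
WINDOW = 256  # illustrative window size

def sliding_window_causal(b, h, q_idx, kv_idx):
    causal = q_idx >= kv_idx
    in_window = (q_idx - kv_idx) <= WINDOW
    return causal & in_window

# Blocks entirely outside the window are skipped, not just zeroed out.
block_mask = create_block_mask(
    sliding_window_causal, B=None, H=None, Q_LEN=S, KV_LEN=S
)
out = flex_attention(q, k, v, block_mask=block_mask)
```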

7/11
Just like with `score_mod`, `mask_mod` also supports loading from closed-over inputs. For example, PrefixLM can be implemented like so.

Notably, we do not need to recompile when prefix_length changes (although we do need to recompute BlockMask).

7/10
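
The PrefixLM case might look like the following sketch (same API assumptions): the mask_mod closes over a captured tensor, so changing the prefix length only requires rebuilding the BlockMask, not recompiling the kernel:

```python
# Sketch: PrefixLM mask_mod closing over a captured prefix length.
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

B, H, S, D = 2, 8, 1024, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))
prefix_length = torch.tensor(128, device="cuda")  # captured, not an argument

def prefix_lm(b, h, q_idx, kv_idx):
    # Bidirectional attention inside the prefix, causal everywhere else.
    return (kv_idx < prefix_length) | (q_idx >= kv_idx)

block_mask = create_block_mask(prefix_lm, B=None, H=None, Q_LEN=S, KV_LEN=S)
out = flex_attention(q, k, v, block_mask=block_mask)

# When the prefix changes, update the tensor and rebuild only the mask.
prefix_length.fill_(256)
block_mask = create_block_mask(prefix_lm, B=None, H=None, Q_LEN=S, KV_LEN=S)
```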

8/11
Since we're leveraging torch.compile's robust graph-capture mechanism, we can also handle higher order transforms of mask_mods. For example, let's say you had a prefixlm mask_mod - you can automatically convert that one into one that works with jagged sequences!

8/10

9/11
And, of course, FlexAttention is performant as well. In our benchmarks, it's within 10% of FA2's performance on Ampere, and within about 25% of FA3's performance on Hopper. We expect these to improve over time :smile:

Basically, it's 90%+ of the performance for 1% of the work.

9/10

10/11
Lastly, one of the things we found really fun about FlexAttention is that we were able to leverage a lot of existing PyTorch infra in fun ways. For example, the flexibility of torch.compile to handle "captured" tensors gives the API far more flexibility.

I also personally think FlexAttention is a good example of how ML compilers can be useful *even* in a world where Attention is All You Need. I think there's a lot of potential in work along this direction :smile:

Check out the blog post for more details! FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention

And check out the Attention Gym for more examples on how to use FlexAttention: GitHub - pytorch-labs/attention-gym: Helpful tools and examples for working with flex-attention

Finally, if you find this kind of work interesting (building foundational infra for deep learning), the PyTorch team might be a good fit. DM me if you're interested!

10/10

11/11
Also, go follow @drisspg and @yanboliang, who did most of the work on this project!


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew


1/6
Flux text-to-image by @bfl_ml is supported in ComfyUI v0.0.4. Try the FP8 weights if you don't have 24GB of VRAM!

Hunyuan model support is also available. More details in our blog.

August 2024: Flux Support, New Frontend, For Loops, and more!

2/6
Good work as usual 🙏

3/6
super cool

4/6
Flux text in ComfyUI

5/6
Good evening. Can you clarify if it works with AMD gpus for Windows? Mine is the 7900xtx 24gb.

6/6
I admire your super fast job


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 

bnew



▷ Online Chatbots​




▷ Roleplaying Chatbots​


 

bnew


Forget Midjourney — Flux is the new king of AI image generation and here’s how to get access​

News

By Ryan Morrison
published August 8, 2024

Multiple ways to try Flux




Image generated using Flux.01 running on a gaming laptop(Image credit: Flux AI/Future generated)

New AI products and services arrive in one of two ways: like a bolt of lightning with no warning, or after months of constant hype. Flux, by startup Black Forest Labs, was the former.

The AI image generation model is being dubbed the rightful heir to Stable Diffusion, and it quickly went viral after its release, drawing direct comparisons to market leader Midjourney.

The difference between Flux and Midjourney is that Flux is open-source and can run on a reasonably good laptop. This means it is, or soon will be, available on many of the same multi-model platforms as Stable Diffusion, such as Poe, NightCafe and Freepik.

I’ve been using it and my initial impressions are that in some areas it is better than Midjourney, especially around rendering people, but its skin textures aren’t as good as Midjourney v6.1.

What is Flux and where did it come from?​




Image generated using Flux.01 running on a gaming laptop (Image credit: Flux AI/Future generated)

Flux came from AI startup Black Forest Labs. This new company was founded by some of the people responsible for most modern AI image generation technologies.

The Germany-based company is led by Robin Rombach, Andreas Blattmann and Dominik Lorenz, all former engineers at Stability AI, along with other leading figures in the development of diffusion-based AI models. This is the technology that also powers many AI video tools.

There are three versions of Flux.01 currently available, all text-to-image models. The first is a Pro version with a commercial license, mainly used by companies like Freepik to offer their subscribers access to generative AI image technology.

The next two are Dev and Schnell. These are the mid-weight and fast models and in my tests — running on a laptop with an RTX 4090 — they outperform Midjourney, DALL-E and even Ideogram in adherence to the prompt, image quality and text rendering on an image.

The company is also working on a text-to-video model that it promises will offer high-quality output and be available open-source, branding it "State-of-the-Art Text to Video for all."

Where can I use Flux today?​

"We are excited to announce the launch of Black Forest Labs. Our mission is to develop and advance state-of-the-art generative deep learning models for media and to push the boundaries of creativity, efficiency and diversity." pic.twitter.com/ilcWvJgmsX (August 1, 2024)

If you have a well-equipped laptop you can download and run Flux.01 locally. There are some easy ways to do this including by using the Pinokio launcher. This makes it relatively trivial to install and run AI models with a couple of clicks and is free to use. It is a large file though.
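
If you'd rather script it than use a launcher, the Hugging Face diffusers library also ships a Flux pipeline (assuming diffusers 0.30+; the sketch below follows the library's published pattern for the Schnell weights, and CPU offload helps if you have under 24GB of VRAM):

```python
# Sketch: running FLUX.1 [schnell] locally via diffusers (0.30+ assumed).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use

image = pipe(
    "a street musician playing guitar at dusk, photorealistic",
    num_inference_steps=4,   # Schnell is distilled for very few steps
    guidance_scale=0.0,      # Schnell is trained without CFG
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux_schnell.png")
```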

However, if your machine isn't up to the job, there are several websites already offering access to Flux.01, and in some cases this includes the largest commercial Pro model.

NightCafe, which is one of my favorite AI image platforms, already has access to the model, and you can quickly compare its output to images from other tools like Ideogram and Stable Diffusion 3.

Poe, the AI model platform, has access to Flux.01 and lets you generate the images in a chatbot-style format similar to creating pictures using tools like ChatGPT and DALL-E.

You can also get access through platforms more typically targeted at developers, including Based Labs, Hugging Face and Fal.ai. Freepik, one of the largest AI image platforms on the market, says it is also working to bring Flux to its site.
 

bnew


I tested Flux vs Midjourney to see which AI image generator is best — here's the winner​

Face-off

By Ryan Morrison
published 9 hours ago

Creating hyperrealistic images


AI generated images of a street musician with a guitar created by Flux and Midjourney

(Image credit: Flux / Midjourney)



Flux is an artificial intelligence image generator released by AI startup Black Forest Labs in the past few weeks, and it has quickly become one of the most powerful and popular tools of its kind, even giving market leader Midjourney a run for its money.

Unlike Midjourney, which is a closed and paid-for service only available from Midjourney itself, Flux is an open-source model available to download and run locally or on a range of platforms such as Freepik, NightCafe and Hugging Face.

To determine whether Flux has reached Midjourney levels of photorealism and accurate human depiction, I've come up with five descriptive prompts and run them on both. I'm generating Flux images using ComfyUI, installed through the Pinokio AI installer.

Creating the prompts​


Both Midjourney and Flux benefit from a descriptive prompt. To get exactly what you want out of the model, it's good to describe not just the person but also the style, lighting and structure.

I've included each prompt below for you to try yourself. These should also work with Ideogram, DALL-E 3 in ChatGPT or other AI image platforms if you don't have Midjourney or Flux, though none except Ideogram reach the realism of Midjourney or Flux.


1. A chef in the kitchen​


Midjourney


Chef image generated by Midjourney (Image credit: Midjourney/Future AI)

Flux AI image


Chef image generated by Flux (Image credit: Flux AI image/Future)

The first test combines the need to generate a complex skin texture with a dynamic environment — namely a professional kitchen. The prompt asks for a woman in her mid-50s in the middle of preparing a meal.

It also asks for the depiction of sous chefs in the background and for the chef's name to be shown on a "spotless white double-breasted chef's jacket".
The prompt: A seasoned chef in her mid-50s is captured in action in a bustling professional kitchen. Her salt-and-pepper hair is neatly tucked under a crisp white chef's hat, with a few strands escaping around her temples. Her face, marked with laugh lines, shows intense concentration as she tastes a sauce from a wooden spoon. Her eyes, a warm brown, narrow slightly as she considers the flavor. The chef is wearing a spotless white double-breasted chef's jacket with her name embroidered in blue on the breast pocket. Black and white checkered pants and slip-resistant clogs complete her professional attire. A colorful array of sauce stains on her apron tells the story of a busy service. Behind her, the kitchen is a hive of activity. Stainless steel surfaces gleam under bright overhead lights, reflecting the controlled chaos of dinner service. Sous chefs in white jackets move purposefully between stations, and steam rises from pots on industrial stoves. Plates of artfully arranged dishes wait on the pass, ready for service. In the foreground, a marble countertop is visible, strewn with fresh herbs and exotic spices. A stack of well-worn cookbooks sits nearby, hinting at the chef's dedication to her craft and continuous learning. The overall scene captures the intensity, precision, and passion of high-end culinary artistry.

Winner: Midjourney

Midjourney wins for the realism of the main character. It isn't perfect, and I prefer the dynamism of the Flux image, but the challenge is creating accurate humans, and Midjourney is closer, with better skin texture.


2. A street musician​


Midjourney


Street musician image generated by Midjourney (Image credit: Midjourney/Future AI image)

Flux AI image


Street musician image generated by Flux (Image credit: Flux AI image/Future)

The next prompt asks both AI image generators to show a street musician in his late 30s performing on a busy city corner, lost in the moment of the music.

Part of the prompt requires the inclusion of an appreciative passerby, coins in a guitar case and city life blurring in motion behind the main character.
The prompt: A street musician in his late 30s is frozen in a moment of passionate performance on a busy city corner. His long, dark dreadlocks are caught mid-sway, some falling over his face while others dance in the air around him. His eyes are closed in deep concentration, brows slightly furrowed, as his weathered hands move deftly over the strings of an old, well-loved acoustic guitar. The musician is wearing a vibrant, hand-knitted sweater that's a patchwork of blues, greens, and purples. It hangs loosely over distressed jeans with artistic patches on the knees. On his feet are scuffed brown leather boots, tapping in rhythm with his music. Multiple colorful braided bracelets adorn his wrists, adding to his bohemian appearance. He stands on a gritty sidewalk, with a battered guitar case open at his feet. It's scattered with coins and bills from appreciative passersby, along with a few fallen autumn leaves. Behind him, city life unfolds in a blur of motion: pedestrians hurry past, yellow taxis honk in the congested street, and neon signs begin to flicker to life as dusk settles over the urban landscape. In the foreground, slightly out of focus, a child tugs on her mother's hand, trying to stop and listen to the music. The scene captures the raw energy and emotion of street performance against the backdrop of a bustling, indifferent city.

Winner: Midjourney

Midjourney wins again for the realism of the character. The texture quality of v6.1 once again puts it just ahead. It is also overall a better image in terms of structure, layout and background.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,409
Reputation
8,215
Daps
156,573


3. The gardener​


Midjourney


Gardening image created with Midjourney (Image credit: Midjourney/Future AI image)

Flux AI image


Gardening image created by Flux (Image credit: Flux AI image/Future)

Generating images of older people can be a struggle for AI image generators because of the more complex skin texture. Here we want a woman in her 80s caring for plants in a rooftop garden.

The prompt describes elements of the scene, including climbing vines and golden evening light, with the city skyline looming large behind our gardener.
The prompt: An elderly woman in her early 80s is tenderly caring for plants in her rooftop garden, set against a backdrop of a crowded city. Her silver hair is tied back in a loose bun, with wispy strands escaping to frame her kind, deeply wrinkled face. Her blue eyes twinkle with contentment as she smiles at a ripe tomato cradled gently in her soil-stained gardening gloves. She's wearing a floral print dress in soft pastels, protected by a well-worn, earth-toned apron. Comfortable slip-on shoes and a wide-brimmed straw hat complete her gardening outfit. A pair of reading glasses hangs from a beaded chain around her neck, ready for when she needs to consult her gardening journal. The rooftop around her is transformed into a green oasis. Raised beds burst with a variety of vegetables and flowers, creating a colorful patchwork. Trellises covered in climbing vines stand tall, and terracotta pots filled with herbs line the edges. A small greenhouse is visible in one corner, its glass panels reflecting the golden evening light. In the background, the city skyline looms large - a forest of concrete and glass that stands in stark contrast to this vibrant garden. The setting sun casts a warm glow over the scene, highlighting the lush plants and the serenity on the woman's face as she finds peace in her urban Eden.

Winner: Midjourney

Once again Midjourney wins because of the texture quality. It struggled a little with the gloved fingers, but it was better than Flux. That doesn't mean the Flux image isn't good; it just isn't as good as Midjourney's.


4. Paramedic in an emergency​


Midjourney


Paramedic image generated by Midjourney (Image credit: Midjourney/Future AI image)

Flux AI image


Paramedic image generated by Flux (Image credit: Flux AI image/Future)

For this prompt I went with something more action-heavy, focusing on a paramedic in the moment of rushing out of the ambulance on a rainy night. This included a description of water droplets clinging to eyelashes and reflective strips.

This was a more challenging prompt for AI image generators, as they have to capture the darker environment. 'Golden hour' light is easier for AI than night and twilight.
The prompt: A young paramedic in her mid-20s is captured in a moment of urgent action as she rushes out of an ambulance on a rainy night. Her short blonde hair is plastered to her forehead by the rain, and droplets cling to her eyelashes. Her blue eyes are sharp and focused, reflecting the flashing lights of the emergency vehicles. Her expression is one of determination and controlled urgency. She's wearing a dark blue uniform with reflective strips that catch the light, the jacket partially unzipped to reveal a light blue shirt underneath. A stethoscope hangs around her neck, bouncing slightly as she moves. Heavy-duty black boots splash through puddles, and a waterproof watch is visible on her wrist, its face illuminated for easy reading in the darkness. In her arms, she carries a large red medical bag, gripping it tightly as she navigates the wet pavement. Behind her, the ambulance looms large, its red and blue lights casting an eerie glow over the rain-slicked street. Her partner can be seen in the background, wheeling out a gurney from the back of the vehicle. In the foreground, blurred by the rain and motion, concerned onlookers gather under umbrellas near what appears to be a car accident scene just out of frame. The wet street reflects the emergency lights, creating a dramatic kaleidoscope of color against the dark night. The entire scene pulses with tension and the critical nature of the unfolding emergency.

Winner: Draw

I don't think either AI image generator won this round. Both have washed-out, overly 'plastic' face textures, likely caused by the lighting issues. Midjourney does a slightly better job matching the description of the scene.


5. The retired astronaut​


Midjourney


Retired astronaut image by Midjourney (Image credit: Midjourney/Future AI image)

Flux AI image


Retired astronaut image by Flux (Image credit: Flux AI image/Future)

Finally, we have a scene in a science museum. Here I've asked the AI models to generate a retired astronaut in his late 60s giving a presentation about space.

He is well presented and in good health, wearing a NASA pin. The background is described in detail, with posters, quotes and an audience watching as he speaks.
The prompt: A retired astronaut in his late 60s is giving an animated presentation at a science museum. His silver hair is neatly trimmed, and despite his age, he stands tall and straight, a testament to years of rigorous physical training. His blue eyes sparkle with enthusiasm as he gestures towards a large scale model of the solar system suspended from the ceiling. He's dressed in a navy blue blazer with a small, subtle NASA pin on the lapel. Underneath, he wears a light blue button-up shirt and khaki slacks. On his left wrist is a watch that looks suspiciously like the ones worn on space missions. His hands, though showing signs of age, move with the precision and control of someone used to operating in zero gravity. Around him, a diverse group of students listen with rapt attention. Some furiously scribble notes, while others have their hands half-raised, eager to ask questions. The audience is a mix of ages and backgrounds, all united by their fascination with space exploration. The walls of the presentation space are adorned with large, high-resolution photographs of galaxies, nebulae, and planets. Inspirational quotes about exploration and discovery are interspersed between the images. In one corner, a genuine space suit stands in a glass case, adding authenticity to the presenter's words. Sunlight streams through large windows, illuminating particles of dust floating in the air, reminiscent of stars in the night sky. The entire scene is bathed in a sense of wonder and possibility, as the retired astronaut bridges the gap between Earth and the cosmos for his eager audience.

Winner: Flux

I am giving this one to Flux. It won because it had skin texture and human realism on par with or slightly better than Midjourney, with a much better overall image structure, including more realistic background people.

Flux vs Midjourney: Which model wins​

Test                         Midjourney   Flux
A chef in the kitchen        🌅
A street musician            🌅
The gardener                 🌅
Paramedic in an emergency    🌅           🌅
The retired astronaut                     🌅

This was almost a clean sweep for Midjourney, driven mainly by the improvements Midjourney has made in skin texture rendering with v6.1.

I don't think it was as clear-cut as it looks on paper, though: in many images Flux had a better overall image structure and was better at backgrounds. I've also found Flux is more consistent with text rendering than Midjourney — but this test was about people and creating realistic digital humans.

What it does show is that even at the bleeding edge of AI image generation there are still tells in every image that mark it as AI-generated.
 

bnew



TikTok could get an AI-powered video generator after ByteDance drops new AI model — here's what we know​

News

By Ryan Morrison
published August 8, 2024

Only in China for now



(Image credit: Jimeng AI/ByteDance)

ByteDance, the Chinese company behind TikTok and viral video editor CapCut, has released its first AI text-to-video model designed to compete with the yet-to-be-released Sora from OpenAI — but for now, it's only available in China.

Jimeng AI was built by Faceu Technology, a company owned by ByteDance that produces the CapCut video editing app. Jimeng is available for iPhone and Android as well as online.

To get access you have to log in with a Douyin account (Douyin is the Chinese version of TikTok), which suggests that if Jimeng does come to other regions it will be linked to TikTok or CapCut. It is possible, but purely speculative, that a version of Jimeng will be built into CapCut in the future.

ByteDance isn’t the only Chinese company building out AI video models. Kuaishou is one of China's largest video apps and last month it made Kling AI video available outside of China for the first time. It is one of my favorite AI tools with impressive motion quality and video realism.


What is Jimeng AI?​



(Image credit: Jimeng AI/ByteDance)

Jimeng AI is a text-to-video model trained and operated by Faceu Technology, the Chinese company behind the CapCut video editor. Like Kling, Sora, Runway and Luma Labs Dream Machine, it takes a text input and generates a few seconds of realistic video content.

Branding itself the "one-stop AI creation platform", Jimeng lets you generate video from text or images and gives you control over camera movement and first- and last-frame input. The latter is something most modern AI video generators offer: you give the model two images and it fills in the moments between them.

The focus for Faceu has been on ensuring its model can understand and accurately follow Chinese text prompts and convert abstract ideas into visual works.

How does Jimeng AI compare?​

From the video clips I've seen on social media and the Jimeng website, it appears to be closer to Runway Gen-2 or Pika Labs than to Sora, Gen-3 or even Kling. Video motion appears slightly blurred or shaky, and the output is more comic than realistic.

What I haven't been able to confirm, as it isn't available outside of China, is how long each video clip is at initial generation or whether you can extend a clip.

Most tools, including Kling, start at 5 seconds, whereas Runway is 10 seconds and Sora is reportedly 15 seconds. Many of them also allow for multiple extensions to that initial clip.

I think Jimeng being mobile-first and tied to apps like Douyin and CapCut puts it in a different category from the likes of Kling and Dream Machine. It is better compared to the likes of the Captions app or Diffuse, in that its content is aimed primarily at social video rather than production.
 