bnew

Veteran
Joined
Nov 1, 2015
Messages
61,764
Reputation
9,318
Daps
169,650

Meta Reality Labs Research Introduces Sonata: Advancing Self-Supervised Representation Learning for 3D Point Clouds


By Nikhil

March 28, 2025


3D self-supervised learning (SSL) has struggled to produce semantically meaningful point representations that transfer to diverse applications with minimal supervision. Despite substantial progress in image-based SSL, existing point cloud SSL methods have been held back largely by the so-called “geometric shortcut,” in which models rely excessively on low-level geometric features such as surface normals or point heights. This reliance limits the generalizability and semantic depth of the learned representations and hinders their practical deployment.

Researchers from the University of Hong Kong and Meta Reality Labs Research introduce Sonata, an advanced approach designed to address these fundamental challenges. Sonata employs a self-supervised learning framework that effectively mitigates the geometric shortcut by strategically obscuring low-level spatial cues and reinforcing dependency on richer input features. Drawing inspiration from recent advancements in image-based SSL, Sonata integrates a point self-distillation mechanism that gradually refines representation quality and ensures robustness against geometric simplifications.
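
Self-distillation methods in image SSL (DINO, for example) typically keep a "teacher" network whose weights are an exponential moving average (EMA) of the "student" being trained, so the student learns to match a slowly changing target. The sketch below shows only that EMA update; it assumes Sonata's point self-distillation follows the same pattern (the article does not say so), flattens weights into plain lists, and uses an illustrative helper name `ema_update` and momentum value.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], momentum: float = 0.996) -> List[float]:
    """Return updated teacher weights: momentum * teacher + (1 - momentum) * student.

    Hypothetical helper: weights are flattened into plain lists here, whereas real
    frameworks apply the same rule to every parameter tensor of the teacher network.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of weights")
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


# Usage: the teacher drifts slowly toward the student after each training step.
teacher = [0.0, 1.0]
for _ in range(3):
    student = [1.0, 1.0]  # stand-in for student weights after a gradient step
    teacher = ema_update(teacher, student, momentum=0.9)
print(teacher)
```

A momentum close to 1 makes the teacher change slowly, which is what gives the student a stable target to distill from.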


At a technical level, Sonata utilizes two core strategies: firstly, it operates on coarser scales to obscure spatial information that might otherwise dominate the learned representations. Secondly, Sonata adopts a point self-distillation approach, progressively increasing task difficulty through adaptive masking strategies to foster deeper semantic understanding. Crucially, Sonata removes decoder structures traditionally used in hierarchical models to avoid reintroducing local geometric shortcuts, allowing the encoder alone to build robust, multi-scale feature representations. Additionally, Sonata applies “masked point jitter,” introducing random perturbations to the spatial coordinates of masked points, thus further discouraging reliance on trivial geometric features.
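
Masked point jitter can be pictured as follows: a random subset of points is masked, and only those points have their xyz coordinates perturbed with Gaussian noise, so their exact positions stop being an easy geometric cue. This is a minimal sketch under assumptions: the mask ratio, the noise scale `sigma`, the choice of Gaussian noise, and the helper name `masked_point_jitter` are illustrative, not values or code from the Sonata release.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def masked_point_jitter(
    points: Sequence[Point],
    mask_ratio: float = 0.5,
    sigma: float = 0.01,
    seed: int = 0,
) -> Tuple[List[Point], List[bool]]:
    """Mask a random fraction of points and add Gaussian noise to their coordinates only.

    Hypothetical helper for illustration; unmasked points are returned unchanged.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    rng = random.Random(seed)
    n = len(points)
    masked_idx = set(rng.sample(range(n), round(mask_ratio * n)))
    mask = [i in masked_idx for i in range(n)]
    jittered: List[Point] = []
    for p, is_masked in zip(points, mask):
        if is_masked:
            # Perturb x, y, z independently so the precise location is obscured.
            jittered.append(tuple(c + rng.gauss(0.0, sigma) for c in p))
        else:
            jittered.append(tuple(p))
    return jittered, mask


cloud = [(float(i), 0.0, 0.0) for i in range(6)]
noisy, mask = masked_point_jitter(cloud, mask_ratio=0.5, sigma=0.05)
print(mask)
```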

The empirical results reported validate Sonata’s efficacy and efficiency. Sonata achieves significant performance gains on benchmarks like ScanNet, where it records a linear probing accuracy of 72.5%, substantially surpassing previous state-of-the-art SSL approaches. Importantly, Sonata demonstrates robustness even with limited data, performing effectively using as little as 1% of the ScanNet dataset, which highlights its suitability for low-resource scenarios. Its parameter efficiency is also notable, delivering strong performance improvements with fewer parameters compared to conventional methods. Furthermore, integrating Sonata with image-derived representations such as DINOv2 results in enhanced accuracy, emphasizing its capacity to capture distinctive semantic details specific to 3D data.

Sonata’s capabilities are further illustrated through insightful zero-shot visualizations including PCA-colored point clouds and dense feature correspondence, demonstrating coherent semantic clustering and robust spatial reasoning under challenging augmentation conditions. The versatility of Sonata is also evidenced across various semantic segmentation tasks, spanning indoor datasets like ScanNet and ScanNet200, as well as outdoor datasets including Waymo, consistently achieving state-of-the-art outcomes.

In conclusion, Sonata represents a significant advancement in addressing inherent limitations in 3D self-supervised learning. Its methodological innovations effectively resolve issues associated with the geometric shortcut, providing semantically richer and more reliable representations. Sonata’s integration of self-distillation, careful manipulation of spatial information, and scalability to large datasets establish a solid foundation for future explorations in versatile and robust 3D representation learning. The framework sets a methodological benchmark, facilitating further research towards comprehensive multimodal SSL integration and practical 3D applications.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

 


Vision-R1: Redefining Reinforcement Learning for Large Vision-Language Models


By Sana Hassan

March 26, 2025


Large Vision-Language Models (LVLMs) have made significant strides in recent years, yet several key limitations persist. One major challenge is aligning these models effectively with human expectations, particularly for tasks involving detailed and precise visual information. Traditionally, LVLMs undergo a two-stage training paradigm: pretraining followed by supervised fine-tuning. However, supervised fine-tuning alone cannot fully overcome limitations, such as the scarcity and high cost associated with generating large-scale, human-annotated preference datasets. Moreover, conventional reinforcement learning methods require expensive reward models that may not fully capture the nuanced and subjective nature of human feedback.

A team of researchers from China proposes Vision-R1, a novel vision-guided, R1-like reinforcement learning algorithm for LVLMs that rewards models with definitive vision feedback. Vision-R1 relies on curated instruction data, eliminating the need for specialized reward models and handcrafted preference datasets. Central to the method is a criterion-driven reward function that evaluates each model completion against task-specific visual criteria. In addition, a progressive rule refinement strategy adjusts the reward criteria dynamically during training. According to the authors, this sustains performance improvement, mitigates reward hacking, and promotes more accurate object localization.


The Vision-R1 algorithm incorporates several critical technical innovations. First, the criterion-driven reward function includes dual format rewards, recall rewards, and precision rewards. Dual format rewards ensure outputs adhere strictly to template and content constraints, essential for reliable object detection tasks. The recall reward emphasizes the model’s capacity to identify all relevant instances, crucial for avoiding omissions in predictions. The precision reward encourages high-quality bounding box predictions by calculating the average Intersection over Union (IoU) of valid predictions. Furthermore, the progressive rule refinement strategy is inspired by curriculum learning principles, gradually increasing training difficulty through staged progression and differentiation policies, thereby fostering robust and generalized learning.
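
The three reward terms can be made concrete with a short sketch for box-style outputs. It assumes completions contain boxes written as `[x1, y1, x2, y2]`, an IoU threshold of 0.5 for deciding when a prediction counts, and that the terms are simply summed; the output template, threshold, combination rule, and all function names are illustrative assumptions rather than the Vision-R1 implementation, and the progressive rule refinement schedule is omitted.

```python
import re
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# Assumed output template: boxes written as "[x1, y1, x2, y2]" anywhere in the completion.
_NUM = r"\s*(-?\d+(?:\.\d+)?)\s*"
_BOX_RE = re.compile(r"\[" + ",".join([_NUM] * 4) + r"\]")


def parse_boxes(text: str) -> List[Box]:
    return [tuple(float(v) for v in groups) for groups in _BOX_RE.findall(text)]


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def format_reward(text: str) -> float:
    """1.0 if at least one box parses and every box is well formed (x1 < x2, y1 < y2)."""
    boxes = parse_boxes(text)
    if not boxes:
        return 0.0
    return 1.0 if all(x1 < x2 and y1 < y2 for x1, y1, x2, y2 in boxes) else 0.0


def recall_reward(preds: Sequence[Box], gts: Sequence[Box], thr: float = 0.5) -> float:
    """Fraction of ground-truth boxes hit by some prediction with IoU >= thr."""
    if not gts:
        return 1.0  # nothing to miss
    hits = sum(1 for g in gts if any(iou(p, g) >= thr for p in preds))
    return hits / len(gts)


def precision_reward(preds: Sequence[Box], gts: Sequence[Box], thr: float = 0.5) -> float:
    """Mean IoU of 'valid' predictions, i.e. those whose best match reaches thr."""
    best = [max((iou(p, g) for g in gts), default=0.0) for p in preds]
    valid = [v for v in best if v >= thr]
    return sum(valid) / len(valid) if valid else 0.0


def criterion_reward(text: str, gts: Sequence[Box], thr: float = 0.5) -> float:
    preds = parse_boxes(text)
    return format_reward(text) + recall_reward(preds, gts, thr) + precision_reward(preds, gts, thr)


print(criterion_reward("[0, 0, 10, 10] and [100, 100, 110, 110]", [(0, 0, 10, 10), (20, 20, 30, 30)]))
```

Here one ground-truth box is found exactly and the other is missed, so recall is penalized while precision, which only looks at valid predictions, stays high.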

Experiments conducted using two state-of-the-art LVLMs, Griffon-G-7B and Qwen2.5-VL-7B, demonstrate the robust capabilities of Vision-R1. Results on in-domain datasets such as MSCOCO and ODINW-13 show significant performance enhancements. Specifically, Vision-R1 improves Griffon-G-7B’s mAP scores by 2.5% on average across diverse tasks. More impressively, Vision-R1 boosts Qwen2.5-VL-7B’s performance significantly, showing an 8.9% improvement in COCO object detection tasks and achieving superior scores compared to its larger, 72B counterpart. On challenging out-of-domain localization tasks, Vision-R1 consistently outperforms supervised fine-tuning (SFT), demonstrating its strong generalization capabilities and robustness in complex scenarios.

In conclusion, Vision-R1 introduces an innovative reinforcement learning approach tailored for LVLMs that effectively addresses existing alignment issues without requiring costly annotated datasets or complex reward modeling. Its criterion-driven reward structure and progressive rule refinement strategy not only enhance the accuracy and comprehensiveness of object localization tasks but also significantly improve generalization to unseen scenarios. The successful integration of Vision-R1 with contemporary LVLM architectures highlights its potential to serve as a foundational method, significantly advancing the state-of-the-art in vision-language understanding and practical deployment in real-world applications.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

 


This AI Paper Introduces the Kolmogorov-Test: A Compression-as-Intelligence Benchmark for Evaluating Code-Generating Language Models


By Nikhil

March 26, 2025


Compression is a cornerstone of computational intelligence, deeply rooted in the theory of Kolmogorov complexity, which defines the complexity of a sequence as the length of the shortest program that reproduces it. Unlike traditional compression methods that look for repetition and redundancy, Kolmogorov’s framework treats compression as a problem of discovering structured patterns and expressing them as programs. While the theory promises optimal compression, Kolmogorov complexity is uncomputable, which poses a significant hurdle. Nevertheless, large language models capable of code generation open an intriguing opportunity to test how closely modern systems can approximate this theoretical ideal by reasoning through code rather than pattern matching.

A core issue is that current models struggle to compress data sequences into concise, executable code. They often simply copy the input into their output instead of generating a program that reproduces it, which indicates a gap in true pattern understanding. This is especially evident for real-world audio, text, or DNA sequences, where complex logical structure must be uncovered to achieve efficient compression. The main challenge is ensuring that the model not only reproduces the sequence but does so with a minimal, well-structured set of instructions. Furthermore, although synthetic training data is useful for controlled evaluation, it often fails to support robust generalization to natural data, which is essential for practical applications.




Several compression tools exist, ranging from traditional algorithms like GZIP to newer neural compression systems. GZIP remains a strong baseline, especially for long or repetitive sequences, because it effectively encodes statistical regularities. More recently, language models have been combined with arithmetic coding, using their predicted probabilities to compress input data. However, these methods typically require access to the full model weights at decoding time, which limits their efficiency and applicability. Prompted code-generating models like GPT-4 and LLaMA have also been evaluated in zero-shot settings, where they are asked to write Python programs that reproduce input sequences. Yet they frequently produce lengthy, imprecise code with limited success, particularly on unseen or complex sequences.

Researchers from Meta AI and Tel Aviv University introduced the Kolmogorov-Test (KT), a benchmark for assessing the reasoning capability of code-generating language models. The test evaluates a model’s ability to generate the shortest program that outputs a given input sequence. Unlike typical benchmarks, KT emphasizes logical composition and program generation over predictive text modeling. Sequences include natural data from audio (LibriSpeech), text (Wikipedia enwik9), and DNA (GRCh38), as well as synthetic sequences generated through a custom-designed domain-specific language (DSL). This DSL supports building structured sequences by composing operations like range creation, sequence modification, merging, and filtering.
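
To make the idea of composing such operations concrete, here is a toy Python stand-in for a sequence DSL with the four kinds of operations named above (range creation, modification, merging, filtering). The function names `rng`, `modify`, `merge`, and `keep`, and their semantics (for example, that `merge` interleaves), are hypothetical and are not the KT DSL's actual primitives.

```python
from typing import Callable, List

Seq = List[int]


def rng(start: int, stop: int, step: int = 1) -> Seq:
    """Range creation."""
    return list(range(start, stop, step))


def modify(seq: Seq, fn: Callable[[int], int]) -> Seq:
    """Element-wise sequence modification."""
    return [fn(x) for x in seq]


def merge(a: Seq, b: Seq) -> Seq:
    """Merging: interleave two sequences, then append the longer one's leftover tail."""
    out: Seq = []
    for x, y in zip(a, b):
        out.extend((x, y))
    longer = a if len(a) > len(b) else b
    out.extend(longer[min(len(a), len(b)):])
    return out


def keep(seq: Seq, pred: Callable[[int], bool]) -> Seq:
    """Filtering."""
    return [x for x in seq if pred(x)]


def example_program() -> Seq:
    # Squares of 0..9 interleaved with odd numbers below 20, keeping values under 50.
    squares = modify(rng(0, 10), lambda x: x * x)
    return keep(merge(squares, rng(1, 20, 2)), lambda x: x < 50)


print(example_program())
```

The program text stays the same size no matter how the constants change, while a literal listing of its output grows with the sequence, which is the intuition behind scoring models on how short a correct program is.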



The researchers developed an automated framework that uses this DSL to generate millions of synthetic program-sequence pairs. These pairs are used to train and evaluate models, ranging from large pre-trained models to models trained specifically for the task, such as SEQCODER. Performance is measured with two main metrics: accuracy (whether the generated program reproduces the sequence) and precision (how concise a correct program is compared with GZIP compression). Sequences vary in length: synthetic sequences average 76 bytes, and real sequences are capped at 128 bytes.
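
The two measurements can be sketched as follows: run a candidate program, check whether its output equals the target (accuracy), and compare the program's size with the GZIP-compressed size of the sequence. This treats candidate programs as Python expressions rather than KT DSL programs, assumes byte-valued sequences, and uses a simple size ratio; the exact metric definitions in the paper and the helper names here are assumptions for illustration.

```python
import gzip
from typing import Dict, List

# Names a candidate program may use. NOTE: eval with restricted builtins is NOT a
# security sandbox; this is for illustration with trusted strings only.
_ALLOWED: Dict[str, object] = {"range": range, "list": list, "len": len}


def run_program(source: str) -> List[int]:
    return list(eval(source, {"__builtins__": {}}, dict(_ALLOWED)))


def is_correct(source: str, target: List[int]) -> bool:
    """Accuracy check: does the program reproduce the target exactly?"""
    try:
        return run_program(source) == target
    except Exception:
        return False


def length_vs_gzip(source: str, target: List[int]) -> float:
    """Program size divided by the gzip size of the sequence (below 1.0 beats gzip).

    Assumes every element of target is a byte value in 0..255.
    """
    program_bytes = len(source.encode("utf-8"))
    gzip_bytes = len(gzip.compress(bytes(target)))
    return program_bytes / gzip_bytes


target = list(range(0, 200, 2))
for candidate in ["list(range(0, 200, 2))", repr(target)]:
    print(is_correct(candidate, target), round(length_vs_gzip(candidate, target), 2))
```

Both candidates are correct, but the literal listing (the "copying" behavior described earlier) is far larger than the gzip output, while the generating program is much smaller.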

Results showed that even the most powerful models struggled. GPT-4 achieved 69.5% accuracy on high-quality audio but dropped to 36.4% on 8-bit audio and 50.3% on DNA data. LLaMA-3.1-405B performed worse, with accuracies as low as 3.9% for audio and only 24.8% for DNA. On synthetic data, SEQCODER-8B reached 92.5% accuracy with a precision score of 0.56, outperforming traditional tools like GZIP. However, its accuracy on real-world data remained near zero. This discrepancy illustrates how difficult it is to transfer success on synthetic benchmarks to more varied and noisy real-world sequences, and it highlights the limitations of current training regimes and the need for new strategies.



Overall, this research clearly outlines the complexity of compression via code generation. The KT benchmark provides a rigorous and diverse test of model reasoning and structure recognition, exposing the stark divide between synthetic training environments and real-world data. The methodology and test set a high bar for future models that aim to unify reasoning with compression, and significant innovation is still required to meet this challenge.

Check out the Paper. All credit for this research goes to the researchers of this project.

 


Tencent AI Researchers Introduce Hunyuan-T1: A Mamba-Powered Ultra-Large Language Model Redefining Deep Reasoning, Contextual Efficiency, and Human-Centric Reinforcement Learning


By Asif Razzaq

March 29, 2025


Large language models struggle to process and reason over lengthy, complex texts without losing essential context. Traditional models often suffer from context loss, inefficient handling of long-range dependencies, and difficulties aligning with human preferences, affecting the accuracy and efficiency of their responses. Tencent’s Hunyuan-T1 directly tackles these challenges by integrating a novel Mamba-powered architecture with advanced reinforcement learning and curriculum strategies, ensuring robust context capture and enhanced reasoning capabilities.

Hunyuan-T1 is built on TurboS, a fast-thinking base model whose hybrid architecture combines Transformer and Mamba (state-space) layers with Mixture-of-Experts (MoE), and Tencent presents it as the first ultra-large model to use this hybrid Mamba design. Hunyuan-T1 is engineered specifically to process long text sequences while minimizing computational overhead. This allows the model to capture extended context and manage long-distance dependencies, which is crucial for tasks that demand deep, coherent reasoning.
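
Mixture-of-Experts layers keep compute low by sending each token to only a few experts. The toy sketch below shows generic top-k gating on a scalar input; the expert functions, the value of `k`, and the function names are illustrative assumptions, and it says nothing about how Hunyuan-T1 or TurboS actually route tokens or interleave Mamba and attention layers.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their gate weights to sum to 1."""
    if not 1 <= k <= len(gate_logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x: float, gate_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int = 2) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts are never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))


experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: 0.0]
logits = [0.0, 3.0, 1.0, -2.0]  # in a real model, a learned gate computes these per token
print(top_k_route(logits, k=2), moe_forward(1.0, logits, experts, k=2))
```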




A key highlight of Hunyuan-T1 is its heavy reliance on reinforcement learning (RL) during post-training: Tencent devoted 96.7% of its post-training compute to RL, enabling the model to refine its reasoning abilities iteratively. Techniques such as data replay, periodic policy resetting, and self-rewarding feedback loops help improve output quality, with the aim of making the model’s responses detailed, efficient, and closely aligned with human expectations.

To further boost reasoning proficiency, Tencent employed a curriculum learning strategy. This approach gradually increases the difficulty of training data while simultaneously expanding the model’s context length. As a result, Hunyuan-T1 is trained to use tokens more efficiently, seamlessly adapting from solving basic mathematical problems to tackling complex scientific and logical challenges. Efficiency is another cornerstone of Hunyuan-T1’s design. The TurboS base’s ability to capture long-text information prevents context loss, a common issue in many language models, and doubles the decoding speed compared to similar systems. This breakthrough means that users benefit from faster, higher-quality responses without compromising performance.
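
A curriculum that grows task difficulty and context length together can be sketched as a list of stages plus a filter over training examples. The number of stages, the doubling of the context window, the linear difficulty ramp, and all names here are illustrative assumptions; the article does not publish Tencent's actual schedule.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Stage:
    max_difficulty: int  # hardest problem level allowed in this stage
    context_length: int  # token budget for this stage


@dataclass(frozen=True)
class Example:
    difficulty: int
    num_tokens: int


def build_curriculum(num_stages: int, start_context: int,
                     max_context: int, max_difficulty: int) -> List[Stage]:
    """Raise difficulty roughly linearly from 1 and double the context window each stage (capped)."""
    if num_stages < 1:
        raise ValueError("need at least one stage")
    stages = []
    for s in range(num_stages):
        difficulty = 1 + round(s * (max_difficulty - 1) / max(1, num_stages - 1))
        context = min(max_context, start_context * 2 ** s)
        stages.append(Stage(difficulty, context))
    return stages


def select_examples(examples: Sequence[Example], stage: Stage) -> List[Example]:
    """Keep examples no harder than the stage allows that also fit in its context window."""
    return [ex for ex in examples
            if ex.difficulty <= stage.max_difficulty and ex.num_tokens <= stage.context_length]


for stage in build_curriculum(num_stages=4, start_context=4096, max_context=32768, max_difficulty=4):
    print(stage)
```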



The model has achieved impressive scores on multiple benchmarks: 87.2 on MMLU-PRO, which tests various subjects including humanities, social sciences, and STEM fields; 69.3 on GPQA-diamond, a challenging evaluation featuring doctoral-level scientific problems; 64.9 on LiveCodeBench for coding tasks; and a remarkable 96.2 on the MATH-500 benchmark for mathematical reasoning. These results underscore Hunyuan-T1’s versatility and ability to handle high-stakes, professional-grade tasks across various fields. Beyond quantitative metrics, Hunyuan-T1 is designed to deliver outputs with human-like understanding and creativity. During its RL phase, the model underwent a comprehensive alignment process that combined self-rewarding feedback with external reward models. This dual approach ensures its responses are accurate and exhibit rich details and natural flow.

In conclusion, Tencent’s Hunyuan-T1 combines an ultra-large-scale, Mamba-powered architecture with state-of-the-art reinforcement learning and curriculum strategies. Hunyuan-T1 delivers high performance, enhanced reasoning, and exceptional efficiency.

Check out the Details, Hugging Face and GitHub Page. All credit for this research goes to the researchers of this project.

 


Advancing Medical Reasoning with Reinforcement Learning from Verifiable Rewards (RLVR): Insights from MED-RLVR


By Sana Hassan

March 29, 2025


Reinforcement Learning from Verifiable Rewards (RLVR) has recently emerged as a promising method for enhancing reasoning abilities in language models without direct supervision. This approach has shown notable success in mathematics and coding, where reasoning naturally aligns with structured problem-solving. While studies have demonstrated that RLVR alone can lead to self-evolved reasoning, research has largely been limited to these technical fields. Efforts to extend RLVR have explored synthetic datasets, such as those involving sequential tasks and object counting, indicating potential but also highlighting the challenges of adapting this method to different domains.

Expanding RLVR to broader areas remains an open challenge, particularly in tasks like multiple-choice question answering (MCQA), which provides structured, verifiable labels across diverse subjects, including medicine. However, unlike math and coding, which involve complex reasoning with an open-ended answer space, MCQA tasks typically have predefined answer choices, making it uncertain whether RLVR’s benefits translate effectively. This limitation is especially relevant in medical reasoning tasks, where models must navigate intricate clinical knowledge to produce accurate responses, an area that has proven difficult for existing AI systems.


Researchers from Microsoft Research investigate whether medical reasoning can emerge through RLVR. They introduce MED-RLVR, leveraging medical MCQA data to assess RLVR’s effectiveness in the medical domain. Their findings show that RLVR extends beyond math and coding, achieving performance comparable to supervised fine-tuning (SFT) in in-distribution tasks while significantly improving out-of-distribution generalization by eight percentage points. Analyzing training dynamics, they observe that reasoning capabilities emerge in a 3B-parameter base model without explicit supervision, highlighting RLVR’s potential for advancing reasoning in knowledge-intensive fields like medicine.

RL optimizes decision-making by training an agent to maximize rewards through interactions with an environment. It has been effectively applied to language models to align outputs with human preferences and, more recently, to elicit reasoning without explicit supervision. This study employs Proximal Policy Optimization (PPO) to train a policy model, incorporating a clipped objective function to stabilize training. Using a rule-based reward function, MED-RLVR assigns rewards based on output correctness and format validity. Without additional supervision, the model demonstrates emergent medical reasoning, similar to mathematical reasoning in prior RLVR studies, highlighting RLVR’s potential beyond structured domains.
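
The two ingredients named above, a rule-based reward and PPO's clipped objective, can be sketched in a few lines. The reward scheme (an `<answer>` tag, -1 for a format violation, 1 for a correct option letter, 0 otherwise), the option range A to J, the clip range `eps=0.2`, and the function names are assumptions for illustration; the surrogate itself is the standard PPO clipped objective, computed here per sample on scalar log-probabilities and advantages.

```python
import math
import re
from typing import Sequence


def rule_based_reward(completion: str, gold_letter: str) -> float:
    """Assumed scheme: -1 for a missing or ill-formed answer tag, 1 if correct, 0 if wrong."""
    match = re.search(r"<answer>\s*([A-J])\s*</answer>", completion)
    if match is None:
        return -1.0
    return 1.0 if match.group(1) == gold_letter else 0.0


def ppo_clip_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                  advantages: Sequence[float], eps: float = 0.2) -> float:
    """Negative mean of PPO's clipped surrogate min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic (lower) bound
    return -total / len(advantages)  # minimizing this maximizes the surrogate


print(rule_based_reward("Findings point to option C. <answer>C</answer>", "C"))
print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
```

Clipping stops a single update from pushing the probability ratio far from 1 when the advantage is positive, which is the stabilizing effect the paragraph refers to.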

The MedQA-USMLE dataset, which consists of multiple-choice medical exam questions, is used to train MED-RLVR. Unlike the standard four-option version, this dataset offers more answer choices, which makes it more challenging. Training starts from the Qwen2.5-3B model and uses OpenRLHF for reinforcement learning. Compared with SFT, MED-RLVR generalizes better, particularly on the MMLU-Pro-Health dataset. Analysis of training reveals six stages of reasoning evolution, progressing from format failures and verbose outputs through phases of reward hacking to reintegrated reasoning. Unlike in math or coding tasks, no self-validation behaviors (“aha-moments”) were observed, suggesting that penalizing short reasoning chains or fine-tuning on longer chains of thought (CoTs) could help.



In conclusion, the study focuses on MCQA in medicine, providing a controlled setting for evaluation. However, MCQA does not fully capture the complexity of real-world tasks like open-text answering, report generation, or medical dialogues. Additionally, the unimodal approach limits the model’s ability to integrate multimodal data, which is crucial for diagnostic applications. Future work should address these limitations. MED-RLVR, based on reinforcement learning with verifiable rewards, matches SFT on in-distribution tasks and improves out-of-distribution generalization. While medical reasoning emerges without explicit supervision, challenges like reward hacking persist, highlighting the need for further exploration of complex reasoning and multimodal integration.

Check out the Paper. All credit for this research goes to the researchers of this project.

 


Google AI Released TxGemma: A Series of 2B, 9B, and 27B LLMs for Multiple Therapeutic Tasks in Drug Development, Fine-Tunable with Transformers


By Asif Razzaq

March 27, 2025


Developing therapeutics continues to be an inherently costly and challenging endeavor, characterized by high failure rates and prolonged development timelines. The traditional drug discovery process necessitates extensive experimental validations from initial target identification to late-stage clinical trials, consuming substantial resources and time. Computational methodologies, particularly machine learning and predictive modeling, have emerged as pivotal tools to streamline this process. However, existing computational models are typically highly specialized, limiting their effectiveness in addressing diverse therapeutic tasks and offering limited interactive reasoning capabilities required for scientific inquiry and analysis.

To address these limitations, Google AI has introduced TxGemma, a collection of generalist large language models (LLMs) designed explicitly to support a range of therapeutic tasks in drug development. TxGemma distinguishes itself by integrating diverse datasets covering small molecules, proteins, nucleic acids, diseases, and cell lines, which allows it to span multiple stages of the therapeutic development pipeline. TxGemma models, available in 2 billion (2B), 9 billion (9B), and 27 billion (27B) parameter sizes, are fine-tuned from Gemma 2 on comprehensive therapeutic datasets. The suite also includes TxGemma-Chat, an interactive conversational variant that lets scientists engage in detailed discussions and mechanistic interpretations of predictions, making the models’ use more transparent.


From a technical standpoint, TxGemma builds on the Therapeutic Data Commons (TDC), a curated collection of over 15 million datapoints across 66 therapeutically relevant datasets. TxGemma-Predict, the predictive variant of the suite, performs strongly across these datasets, matching or exceeding both the generalist and the specialist models currently used in therapeutic modeling. Notably, TxGemma’s fine-tuning approach reaches high predictive accuracy with substantially fewer training samples, a crucial advantage in domains where data is scarce. Extending these capabilities further, Agentic-Tx, powered by Gemini 2.0, orchestrates complex therapeutic queries by combining predictions from TxGemma-Predict and interactive discussion from TxGemma-Chat with external domain-specific tools.

Empirical evaluations underscore TxGemma’s capability. Across 66 tasks curated by the TDC, TxGemma-Predict consistently achieved performance comparable to or exceeding existing state-of-the-art models. Specifically, TxGemma’s predictive models surpassed state-of-the-art generalist models in 45 tasks and specialized models in 26 tasks, with notable efficiency in clinical trial adverse event predictions. On challenging benchmarks such as ChemBench and Humanity’s Last Exam, Agentic-Tx demonstrated clear advantages over previous leading models, enhancing accuracy by approximately 5.6% and 17.9%, respectively. Moreover, the conversational capabilities embedded in TxGemma-Chat provided essential interactive reasoning to support in-depth scientific analyses and discussions.

TxGemma’s practical utility is particularly evident in adverse event prediction during clinical trials, an essential aspect of therapeutic safety evaluation. TxGemma-27B-Predict demonstrated robust predictive performance while utilizing significantly fewer training samples compared to conventional models, illustrating enhanced data efficiency and reliability. Moreover, computational performance assessments indicate that the inference speed of TxGemma supports practical real-time applications, such as virtual screening, with the largest variant (27B parameters) capable of efficiently processing large sample volumes daily when deployed on scalable infrastructure.

In summary, the introduction of TxGemma by Google AI represents a methodical advancement in computational therapeutic research, combining predictive efficacy, interactive reasoning, and improved data efficiency. By making TxGemma publicly accessible, Google enables further validation and adaptation on diverse, proprietary datasets, thereby promoting broader applicability and reproducibility in therapeutic research. With sophisticated conversational functionality via TxGemma-Chat and complex workflow integration through Agentic-Tx, the suite provides researchers with advanced computational tools capable of significantly enhancing decision-making processes in therapeutic development.

Check out the Paper and Models on Hugging Face. All credit for this research goes to the researchers of this project.

 


Meet Open Deep Search (ODS): A Plug-and-Play Framework Democratizing Search with Open-source Reasoning Agents


By Asif Razzaq

March 27, 2025


The rapid advancement of search engine technologies integrated with large language models (LLMs) has predominantly favored proprietary solutions such as OpenAI’s GPT-4o Search Preview and Perplexity’s Sonar Reasoning Pro. While these proprietary systems offer strong performance, their closed-source nature poses significant challenges for transparency, innovation, and community collaboration. This exclusivity limits customization and hampers broader academic and entrepreneurial engagement with search-enhanced AI.

In response to these limitations, researchers from the University of Washington, Princeton University, and UC Berkeley have introduced Open Deep Search (ODS)—an open-source search AI framework designed for seamless integration with any user-selected LLM in a modular manner. ODS comprises two central components: the Open Search Tool and the Open Reasoning Agent. Together, these components substantially improve the capabilities of the base LLM by enhancing content retrieval and reasoning accuracy.




The Open Search Tool distinguishes itself through an advanced retrieval pipeline, featuring an intelligent query rephrasing method that better captures user intent by generating multiple semantically related queries. This approach notably improves the accuracy and diversity of search results. Furthermore, the tool employs refined chunking and re-ranking techniques to systematically filter search results according to relevance. Complementing the retrieval component, the Open Reasoning Agent operates through two distinct methodologies: the Chain-of-thought ReAct agent and the Chain-of-code CodeAct agent. These agents interpret user queries, manage tool usage—including searches and calculations—and produce comprehensive, contextually accurate responses.
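
The retrieval steps described above (several query phrasings, chunking, and relevance-based re-ranking) can be sketched with standard-library Python. In ODS the rephrasings come from an LLM and re-ranking uses a learned model; here `rephrase` is a caller-supplied stub and a token-overlap count stands in for the re-ranker, so this illustrates the data flow only. Chunk size, overlap, and all function names are assumptions.

```python
import re
from collections import Counter
from typing import Callable, List, Sequence


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text: str, size: int = 40, overlap: int = 10) -> List[str]:
    """Split text into overlapping windows of `size` words."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def overlap_score(query: str, passage: str) -> int:
    """Stand-in for a learned re-ranker: count occurrences of query tokens in the passage."""
    counts = Counter(tokenize(passage))
    return sum(counts[t] for t in set(tokenize(query)))


def retrieve(query: str, documents: Sequence[str],
             rephrase: Callable[[str], List[str]], top_n: int = 3) -> List[str]:
    """Score every chunk against the query and its rephrasings, then keep the best matches."""
    queries = [query] + rephrase(query)
    chunks = [c for doc in documents for c in chunk(doc)]
    scored = [(max(overlap_score(q, c) for q in queries), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:top_n] if score > 0]


docs = ["Paris is the capital of France and its largest city.",
        "Bananas are rich in potassium."]
print(retrieve("What is France's capital?", docs, lambda q: ["capital city of France"]))
```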



Empirical evaluations underscore the effectiveness of ODS. Integrated with DeepSeek-R1, an advanced open-source reasoning model, ODS-v2 achieves 88.3% accuracy on the SimpleQA benchmark and 75.3% on the FRAMES benchmark. This performance notably surpasses proprietary alternatives such as Perplexity’s Sonar Reasoning Pro, which scores 85.8% and 44.4% on these benchmarks, respectively. Compared with OpenAI’s GPT-4o Search Preview, ODS-v2 shows a significant advantage on the FRAMES benchmark, achieving a 9.7% higher accuracy. These results illustrate ODS’s capacity to deliver competitive, and in specific areas superior, performance relative to proprietary systems.

An important feature of ODS is its adaptive use of tools, as demonstrated by strategic decision-making regarding additional web searches. For straightforward queries, as observed in SimpleQA, ODS minimizes additional searches, demonstrating efficient resource utilization. Conversely, for complex multi-hop queries, as in the FRAMES benchmark, ODS appropriately increases its use of web searches, thus exemplifying intelligent resource management tailored to query complexity.
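The pattern of searching more only when a question needs it can be illustrated with a toy loop, assuming a hard-coded decomposition where ODS would let the LLM decide what to search next. `run_agent` and the `WEB` lookup table are hypothetical stand-ins for the agent and a search engine. Each later query is built from the previous answer, so the number of searches grows with the number of hops, and the loop stops early on a dead end or when its budget runs out.

```python
from typing import Callable, Optional


def run_agent(
    hops: list[str],
    search: Callable[[str], Optional[str]],
    max_searches: int = 5,
) -> tuple[Optional[str], int]:
    """Toy multi-hop loop (hypothetical). Each hop is a query template whose
    '{}' slot is filled with the previous hop's answer, so later searches
    depend on earlier results. Returns (final answer, searches used)."""
    answer: Optional[str] = None
    used = 0
    for template in hops:
        if used >= max_searches:
            return None, used          # budget exhausted before finishing
        query = template.format(answer) if answer is not None else template
        used += 1
        answer = search(query)
        if answer is None:
            return None, used          # dead end: stop instead of guessing
    return answer, used


# Stub "web": maps exact queries to short answers.
WEB = {
    "What is the capital of France?": "Paris",
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
}

print(run_agent(["What is the capital of France?"], WEB.get))               # ('Paris', 1)
print(run_agent(["Who directed Inception?", "Where was {} born?"], WEB.get))  # ('London', 2)
```

A single-hop question costs one search and a two-hop question costs two; a real agent would make the decomposition and stopping decisions itself rather than follow a fixed list.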



In conclusion, Open Deep Search represents a notable advancement towards democratizing search-enhanced AI by providing an open-source framework compatible with diverse LLMs. It encourages innovation and transparency within the AI research community and supports broader participation in the development of sophisticated search and reasoning capabilities. By effectively integrating advanced retrieval techniques with adaptive reasoning methodologies, ODS contributes meaningfully to open-source AI development, setting a robust standard for future exploration in search-integrated large language models.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

 






1/5
@AINativeF
🌟 Today’s Global AI Native Industry Insights include:
1. Anthropic's AI Microscope: Tracing the Hidden Thoughts of Claude

2. Google Launches TxGemma: Open Source AI Models for Drug Development

3. Elon Musk Consolidates Tech Empire: xAI Acquires X in $33B Deal

🔍 Dive into the in-depth insights in the thread below. Here’s what’s shaping the future of AI—and why it matters: 👇

Video Credit: Anthropic



https://video.twimg.com/ext_tw_video/1906698946799431680/pu/vid/avc1/1280x720/r3EJ8fIQ-JE3IHXe.mp4

2/5
@AINativeF
Anthropic's AI Microscope: Tracing the Hidden Thoughts of Claude

🔑 Key Details:
- AI Microscope: Anthropic developed tools to trace Claude's internal reasoning, revealing how it actually thinks, plans, and decides.
- Language-Agnostic Thinking: Claude operates in a shared conceptual space across languages, pointing to a universal "language of thought."
- Forward Planning: In poetry, Claude picks rhyming words before writing lines, showing long-horizon planning beyond next-word prediction.
- Custom Math Strategies: Claude solves problems using parallel strategies (approximation + precision), not by mimicking human methods.
- Misleading Reasoning: On hard questions, Claude may generate convincing but unfaithful explanations aligned with user hints.
- Default Refusal: Claude is wired to decline uncertain questions unless internal “known entity” signals override its refusal circuit.
- Jailbreak Vulnerability: Language pressure (e.g., sentence coherence) can override safety, leading to delayed refusals after harmful outputs.
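The "parallel strategies" point can be made concrete with a hand-written analogy, taking 36 + 59 as an example: one path estimates the rough size of the answer while another works out the last digit exactly. This is not the circuit Anthropic found. The "approximate" path here is a hypothetical deterministic lower bound, chosen so that the two partial results combine into an exact sum for non-negative integers.

```python
def approximate_path(a: int, b: int) -> int:
    # Coarse magnitude: add b with its ones digit dropped.
    # For b >= 0 the true sum lies in [estimate, estimate + 9].
    return a + 10 * (b // 10)


def last_digit_path(a: int, b: int) -> int:
    # Precise path: only the ones digits determine the last digit.
    return (a % 10 + b % 10) % 10


def combine(estimate: int, digit: int) -> int:
    # The window [estimate, estimate + 9] holds exactly one number
    # ending in `digit`; pick it.
    return estimate + (digit - estimate) % 10


def add(a: int, b: int) -> int:
    return combine(approximate_path(a, b), last_digit_path(a, b))


print(add(36, 59))  # 95
```

Neither path alone gives the answer: the first only narrows it to 86 through 95, and the second only says it ends in 5.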

💡 How It Helps:
- AI Safety: Identifies internal reasoning flaws and hallucination triggers.
- Alignment: Detects where model behavior diverges from claimed logic.
- Capabilities: Reveals advanced behaviors like multilingual reasoning and implicit goal setting.

🌟 Why It Matters:
This research shows that understanding how models think is possible—and essential. Interpretability tools like these offer a path toward building transparent, trustworthy, and aligned AI systems as capabilities continue to grow.

Read more: https://www.anthropic.com/research/tracing-thoughts-language-model

@AnthropicAI

Video Credit: Anthropic (@AnthropicAI on X)



https://video.twimg.com/ext_tw_video/1906698947013316608/pu/vid/avc1/960x720/Nl7qV9eR7B35Y7RF.mp4

3/5
@AINativeF
Google Launches TxGemma: Open Source AI Models for Drug Development

🔑 Key Details:
- New AI Models: Google releases TxGemma, a collection of open models in 2B, 9B, and 27B sizes, built on Google DeepMind's Gemma technology.
- Performance Boost: TxGemma outperforms previous Tx-LLM on 45 of 66 therapeutic tasks and beats specialized models on 26 of 50 tasks.
- Versatile Capabilities: Models handle classification, regression, and generation tasks across the drug development pipeline.
- Agentic-Tx System: New framework integrates TxGemma with 18 specialized tools for complex research problems.

💡 How It Helps:
- Pharmaceutical Researchers: Accelerates drug discovery with AI predictions that could reduce the 90% failure rate of drug candidates.
- Data Scientists: Easy fine-tuning capabilities through provided Colab notebooks to adapt models to proprietary therapeutic data.
- Clinical Trial Teams: Models can potentially predict adverse events in trials, improving safety assessment.
- Computational Chemists: Conversational AI interface explains reasoning behind molecular predictions.

🌟 Why It Matters:
TxGemma represents a significant advancement in applying AI to the traditionally slow and costly drug development process. By making these specialized models open source, Google democratizes access to powerful tools that could transform therapeutic research. The combination of prediction capabilities with conversational features creates a uniquely human-readable system for scientific discovery, potentially bridging the gap between computational predictions and human expertise in pharmaceutical development.

Read more: Introducing TxGemma: Open models to improve therapeutics development | Google Developers Blog

@Google

Video Credit: Google official website



https://video.twimg.com/ext_tw_video/1906698946849697792/pu/vid/avc1/1280x720/JG3sDraat1yNcNdv.mp4

4/5
@AINativeF
Elon Musk Consolidates Tech Empire: xAI Acquires X in $33B Deal

🔑 Key Details:
- All-Stock Transaction: xAI acquired X (formerly Twitter) in a deal valuing xAI at $80B and X at $33B ($45B less $12B debt).
- New Holding Company: Shares will be exchanged for shares in xAI Holdings Corp, combining both entities under one umbrella.
- Strategic Integration: Musk described the companies' futures as "intertwined," officially combining data, models, compute, distribution and talent.
- X Resurgence: Platform's valuation has risen recently, with Musk claiming over 600 million active users.
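The deal figures in the first bullet are simple arithmetic on the reported numbers. Writing V for value and D for debt (notation introduced here for illustration):

```latex
\begin{aligned}
V_{\text{X, equity}} &= V_{\text{X, total}} - D_{\text{X}} = \$45\text{B} - \$12\text{B} = \$33\text{B} \\
V_{\text{combined}}  &= V_{\text{xAI}} + V_{\text{X, equity}} = \$80\text{B} + \$33\text{B} = \$113\text{B}
\end{aligned}
```

The $113B total is just the sum of the two quoted figures.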

💡 How It Helps:
- AI Researchers: Access to X's vast user-generated content provides significant training data advantages for xAI.
- xAI Investors: Merger creates a more attractive combined entity that executives believe will make fundraising easier.
- Tech Strategists: Consolidation creates clearer competitive positioning against OpenAI, which Musk is actively challenging through lawsuits and takeover attempts.

🌟 Why It Matters:
This acquisition represents a significant consolidation of Musk's tech portfolio, strategically positioning xAI to leverage X's massive user base and data repository in the competitive AI landscape. The merger formalizes what was already a tight integration between the platforms and signals Musk's commitment to challenging established AI players like OpenAI, from which he has distanced himself despite being a co-founder.

Read more: Elon Musk says xAI acquired X | TechCrunch

@xai

Video Credit: xAI official website



https://video.twimg.com/ext_tw_video/1906698946207989760/pu/vid/avc1/1280x720/6Oi_gFtxDXL4PvqY.mp4

5/5
@AINativeF
If you found this helpful, follow us @AINativeF for more insights.

A like or share on the first tweet would mean a lot—thank you for your support!

Image Credit: Flux






 

[New Model] University of Hong Kong releases Dream 7B (Diffusion reasoning model). Highest performing open-source diffusion model to date. You can adjust the number of diffusion timesteps for speed vs accuracy



Posted on Wed Apr 2 17:04:49 2025 UTC
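The speed-versus-accuracy knob can be sketched with a toy masked-diffusion decoder, assuming the absorbing-mask style of discrete diffusion used by recent diffusion language models. This is not Dream 7B's code; `diffusion_decode` and `toy_model` are hypothetical. Generation starts from a fully masked sequence, and each step makes one model call and commits the most confident masked positions, so fewer steps means fewer calls but more tokens committed at once.

```python
import math
from typing import Callable

MASK = "<mask>"


def diffusion_decode(
    length: int,
    steps: int,
    predict: Callable[[list[str]], list[tuple[str, float]]],
) -> tuple[list[str], int]:
    """Toy masked-diffusion decoder (hypothetical). Requires steps >= 1.
    Returns the decoded sequence and the number of model calls made."""
    seq = [MASK] * length
    calls = 0
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        preds = predict(seq)          # one (token, confidence) per position
        calls += 1
        # Spread the remaining masked positions evenly over the remaining steps.
        k = math.ceil(len(masked) / (steps - step))
        best = sorted(masked, key=lambda i: -preds[i][1])[:k]
        for i in best:
            seq[i] = preds[i][0]
    return seq, calls


TARGET = "the cat sat on the mat".split()


def toy_model(seq: list[str]) -> list[tuple[str, float]]:
    # Stub "model": always predicts the target token; confidence is higher
    # next to already-unmasked tokens and slightly higher for earlier positions.
    out = []
    for i, tok in enumerate(TARGET):
        neighbours = sum(1 for j in (i - 1, i + 1)
                         if 0 <= j < len(seq) and seq[j] != MASK)
        out.append((tok, neighbours + 1.0 / (i + 1)))
    return out


for steps in (6, 2):
    seq, calls = diffusion_decode(len(TARGET), steps, toy_model)
    print(steps, calls, " ".join(seq))
```

The stub model never makes mistakes, so the sketch only shows the speed side of the trade-off; with a real model, committing many tokens per step gives each one less context from already-decoded neighbours, which is where accuracy can drop.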




Commented on Wed Apr 2 17:22:56 2025 UTC

It's fascinating watching it generate text:

https://i.redd.it/xci0dlo7hgse1.gif


│ Commented on Wed Apr 2 17:52:06 2025 UTC

│ What the actual fukk…

│ │
│ │
│ │ Commented on Wed Apr 2 18:35:15 2025 UTC
│ │
│ │
│ │
 

Gemini 2.5 Pro is a coding GENIUS



Channel: Matthew Berman (444K subscribers)

Description

Timestamps (made with Gemini 2.5 Pro!):
0:00 Office Simulation
1:02 Hand Drawing to Web App - AI Studio Recreation
1:18 Gemini 2.5 Pro Free Rollout Announcement
2:30 YouTube Timestamps Generation Use Case
3:22 AI Model IQ Test Results Chart
4:05 Blender Logo Generation
4:52 Personal Intelligence Agency / News Briefing System
5:54 Liquid Metal Shader Recreation
6:42 Vibe Jet Flight Simulator Creation
7:54 Spinning Hexagon Bouncing Balls Animation Comparison
8:28 Physics Simulation - Solenoid / Electromagnetism
9:16 Physics Simulation - General Relativity
9:49 Drawing to 3D Print - Birthday Cake Toy
11:14 3D Flappy Bird Game Creation
11:54 Swift UI Drawing App Creation
12:26 Galaga Game Creation
