bnew

LLMpedia
A collection of research papers on Language Models curated by the GPT maestro itself.
Every week dozens of papers are published on Language Models. It is impossible to keep up with the latest research. That's why we created LLMpedia, a collection of papers on Language Models curated by the GPT maestro itself.

Each week, GPT will sweep through the latest LLM-related papers and select the most interesting ones. The maestro will then summarize the papers and provide its own analysis, including novelty, technical depth, and readability scores. We hope you enjoy this collection and find it useful.

If you have any questions, head to the Chat section and consult the GPT maestro.

 

bnew

Computer Science > Computation and Language

[Submitted on 17 Oct 2023]

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi
Despite their remarkable capabilities, large language models (LLMs) often produce responses containing factual inaccuracies due to their sole reliance on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach that augments LMs with retrieval of relevant knowledge, decreases such issues. However, indiscriminately retrieving and incorporating a fixed number of retrieved passages, regardless of whether retrieval is necessary, or passages are relevant, diminishes LM versatility or can lead to unhelpful response generation. We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG) that enhances an LM's quality and factuality through retrieval and self-reflection. Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its own generations using special tokens, called reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. Experiments show that Self-RAG (7B and 13B parameters) significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks. Specifically, Self-RAG outperforms ChatGPT and retrieval-augmented Llama2-chat on Open-domain QA, reasoning and fact verification tasks, and it shows significant gains in improving factuality and citation accuracy for long-form generations relative to these models.
Comments: 30 pages, 2 figures, 12 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2310.11511 [cs.CL]
(or arXiv:2310.11511v1 [cs.CL] for this version)

https://arxiv.org/pdf/2310.11511.pdf
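To make the adaptive retrieve/generate/critique loop from the abstract concrete, here is a minimal Python sketch. It is not the authors' code: `lm.generate`, `lm.score_token`, and `retriever.search` are hypothetical interfaces, and the bracketed reflection tokens are simplified stand-ins for the paper's special tokens.

```python
def self_rag_answer(lm, retriever, prompt: str, k: int = 5) -> str:
    """Illustrative Self-RAG-style inference loop (hypothetical lm/retriever interfaces)."""
    # 1. The LM first emits a reflection token deciding whether retrieval is needed at all.
    decision = lm.generate(prompt, allowed_tokens=["[Retrieve]", "[No Retrieval]"])
    if decision == "[No Retrieval]":
        return lm.generate(prompt)

    # 2. Retrieve passages on demand and draft one candidate continuation per passage.
    candidates = []
    for passage in retriever.search(prompt, top_k=k):
        answer = lm.generate(f"{prompt}\n[Passage] {passage}")

        # 3. The LM critiques its own output with further reflection tokens:
        #    passage relevance, whether the answer is supported, and overall utility.
        relevance = lm.score_token(f"{prompt}\n[Passage] {passage}", "[Relevant]")
        support = lm.score_token(f"{prompt}\n[Passage] {passage}\n{answer}", "[Fully Supported]")
        utility = lm.score_token(f"{prompt}\n{answer}", "[Utility:5]")
        candidates.append((relevance + support + utility, answer))

    # 4. Keep the candidate whose reflection-token scores are highest.
    return max(candidates, key=lambda c: c[0])[1]
```

Because the critique happens through ordinary token probabilities, the same trained model can be steered at inference time, for example by weighting the support score more heavily when factuality matters most.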
 

bnew


Computer Science > Computer Vision and Pattern Recognition

[Submitted on 6 Nov 2023]

CogVLM: Visual Expert for Pretrained Language Models

Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wang, Junhui Ji, Zhuoyi Yang, Lei Zhao, Xixuan Song, Jiazheng Xu, Bin Xu, Juanzi Li, Yuxiao Dong, Ming Ding, Jie Tang
We introduce CogVLM, a powerful open-source visual language foundation model. Different from the popular shallow alignment method which maps image features into the input space of language model, CogVLM bridges the gap between the frozen pretrained language model and image encoder by a trainable visual expert module in the attention and FFN layers. As a result, CogVLM enables deep fusion of vision language features without sacrificing any performance on NLP tasks. CogVLM-17B achieves state-of-the-art performance on 10 classic cross-modal benchmarks, including NoCaps, Flicker30k captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA and TDIUC, and ranks the 2nd on VQAv2, OKVQA, TextVQA, COCO captioning, etc., surpassing or matching PaLI-X 55B. Codes and checkpoints are available at this https URL.
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2311.03079 [cs.CV]
(or arXiv:2311.03079v1 [cs.CV] for this version)

https://arxiv.org/pdf/2311.03079.pdf




[Attached images: R85JUT7.jpeg, pear_grounding.png, compare-min.png]
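For intuition about what a "trainable visual expert module" in the attention layers can look like, here is a rough PyTorch sketch. It is my own simplification, not the released CogVLM code: image tokens are routed through their own trainable projections while text tokens keep the frozen pretrained ones, and attention then runs over the mixed sequence. The real model does this in every layer, adds an FFN expert as well, and keeps the causal mask; all of that is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualExpertAttention(nn.Module):
    """One attention block with a trainable 'visual expert' beside frozen LM weights."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        # Frozen language-model projections (pretrained weights in the real model).
        self.qkv_text = nn.Linear(d_model, 3 * d_model)
        self.out_text = nn.Linear(d_model, d_model)
        for p in list(self.qkv_text.parameters()) + list(self.out_text.parameters()):
            p.requires_grad = False
        # Trainable visual-expert projections with the same shapes.
        self.qkv_img = nn.Linear(d_model, 3 * d_model)
        self.out_img = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, image_mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); image_mask: (batch, seq), True for image tokens.
        B, T, D = x.shape
        route = image_mask.unsqueeze(-1)  # (B, T, 1), broadcasts over features
        # Each token uses the QKV projection of its own modality.
        qkv = torch.where(route, self.qkv_img(x), self.qkv_text(x))
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
                   for t in qkv.chunk(3, dim=-1))
        # Standard attention over the full mixed image+text sequence (no causal mask here).
        attn = F.scaled_dot_product_attention(q, k, v)
        attn = attn.transpose(1, 2).reshape(B, T, D)
        # Route the output projection the same way.
        return torch.where(route, self.out_img(attn), self.out_text(attn))

# Tiny smoke test: the first 4 positions play the role of image-feature tokens.
layer = VisualExpertAttention(d_model=64, n_heads=4)
x = torch.randn(2, 10, 64)
image_mask = torch.zeros(2, 10, dtype=torch.bool)
image_mask[:, :4] = True
print(layer(x, image_mask).shape)  # torch.Size([2, 10, 64])
```

The point of the routing is that the language weights never change, so NLP performance is preserved, while the new image-side parameters learn a deep fusion inside every layer instead of a single shallow projection at the input.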
 

bnew
Could you imagine what tech like this is gonna do for blind people "watching" movies or moving through the world?

The tech exists now; all we need is better mobile processing power or more efficient software to run it on mobile devices.
 

bnew

About

llama.cpp with the BakLLaVA model describes what it sees

🍰 Bakllava Llama C++ Tutorial 🦙

Welcome to the delicious world of Bakllava Llama with C++! Follow these steps to get your code running and indulge in AI sweetness! 😋

🚨 Properly tested only on Apple silicon chips

YouTube installation guide

Similar relevant project: the "Be My Eyes" web app
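If you would rather drive this from Python than from the repo's own scripts, the llama-cpp-python bindings expose the same llama.cpp + LLaVA-style pipeline. The sketch below is an illustrative example rather than part of this repo: the GGUF model file, the mmproj projector file, and the image path are placeholders you have to download or supply yourself.

```python
import base64
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

def image_to_data_uri(path: str) -> str:
    # Local images are passed to the chat handler as base64 data URIs.
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

# The mmproj file is the CLIP projector that turns the image into tokens for the LM.
chat_handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")  # placeholder path
llm = Llama(
    model_path="bakllava-1.Q4_K_M.gguf",  # placeholder quantized BakLLaVA model
    chat_handler=chat_handler,
    n_ctx=2048,  # leave room for the image embedding plus the reply
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_to_data_uri("photo.jpg")}},
                {"type": "text", "text": "Describe what you see in this image."},
            ],
        }
    ]
)
print(response["choices"][0]["message"]["content"])
```

Running this fully offline on a laptop is exactly the kind of setup that could eventually power the mobile accessibility use case mentioned above, once the models get small and fast enough.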



 