bnew


RealFill​

Reference-Driven Generation for Authentic Image Completion​

Luming Tang (1,2), Nataniel Ruiz (1), Qinghao Chu (1), Yuanzhen Li (1), Aleksander Holynski (1), David E. Jacobs (1),
Bharath Hariharan (2), Yael Pritch (1), Neal Wadhwa (1), Kfir Aberman (1), Michael Rubinstein (1)

(1) Google Research, (2) Cornell University
arXiv


RealFill is able to complete the image with what should have been there.​

Abstract​

Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions, but the content these models hallucinate is necessarily inauthentic, since the models lack sufficient context about the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of an image with the content that should have been there. RealFill is a generative inpainting model that is personalized using only a few reference images of a scene. These reference images do not have to be aligned with the target image, and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles. Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene. We evaluate RealFill on a new image completion benchmark that covers a set of diverse and challenging scenarios, and find that it outperforms existing approaches by a large margin.

Method​

Authentic Image Completion: Given a few reference images (up to five) and one target image that captures roughly the same scene (but in a different arrangement or appearance), we aim to fill missing regions of the target image with high-quality image content that is faithful to the originally captured scene. Note that for the sake of practical benefit, we focus particularly on the more challenging, unconstrained setting in which the target and reference images may have very different viewpoints, environmental conditions, camera apertures, image styles, or even moving objects.

RealFill: For a given scene, we first create a personalized generative model by fine-tuning a pre-trained inpainting diffusion model on the reference and target images. This fine-tuning process is designed such that the adapted model not only maintains a good image prior, but also learns the contents, lighting, and style of the scene in the input images. We then use this fine-tuned model to fill the missing regions in the target image through a standard diffusion sampling process.
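To make the sampling stage concrete, here is a minimal sketch of the second step using the off-the-shelf Stable Diffusion inpainting pipeline from diffusers. The checkpoint name, file paths, and prompt are placeholders rather than the paper's actual setup, and the personalization (fine-tuning on the reference and target images) is assumed to have already produced the adapted weights.

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Placeholder checkpoint: in RealFill this would be the inpainting model *after*
# fine-tuning on the scene's reference and target images.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

target = Image.open("target.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("L").resize((512, 512))  # white = missing region to fill

# Standard diffusion sampling completes the masked region conditioned on the known pixels.
result = pipe(prompt="a photo of the scene", image=target, mask_image=mask).images[0]
result.save("completed.png")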

Results​

Given the reference images on the left, RealFill is able to either uncrop or inpaint the target image on the right, resulting in high-quality images that are both visually compelling and also faithful to the references, even when there are large differences between references and targets including viewpoint, aperture, lighting, image style, and object motion.

Comparison with Baselines​

A comparison of RealFill and baseline methods. Transparent white masks are overlaid on the unaltered known regions of the target images.
  • Paint-by-Example does not achieve high scene fidelity because it relies on CLIP embeddings, which only capture high-level semantic information.
  • Stable Diffusion Inpainting produces plausible results, but they are inconsistent with the reference images because prompts have limited expressiveness.

In contrast, RealFill generates high-quality results that have high fidelity with respect to the reference images.



Limitations​

  • RealFill requires a gradient-based fine-tuning process on the input images, making it relatively slow.
  • When the viewpoint change between reference and target images is very large, RealFill tends to fail at recovering the 3D scene, especially when there is only a single reference image.
  • Because RealFill mainly relies on the image prior inherited from the base pre-trained model, it also fails in cases that are challenging for the base model, such as rendering text with Stable Diffusion.


Acknowledgements​

We would like to thank Rundi Wu, Qianqian Wang, Viraj Shah, Ethan Weber, Zhengqi Li, Kyle Genova, Boyang Deng, Maya Goldenberg, Noah Snavely, Ben Poole, Ben Mildenhall, Alex Rav-Acha, Pratul Srinivasan, Dor Verbin and Jon Barron for their valuable discussions and feedback, and Zeya Peng, Rundi Wu, and Shan Nan for their contributions to the evaluation dataset. A special thanks to Jason Baldridge, Kihyuk Sohn, Kathy Meier-Hellstern, and Nicole Brichtova for their feedback and support for the project.



 

bnew


Navigating the Jagged Technological Frontier​


Written by

D^3 Faculty


In collaboration with Boston Consulting Group (BCG), new research from Karim Lakhani, chair and co-founder of the Digital Data Design Institute at Harvard, and others explores field experimental evidence of the effects of AI on knowledge worker productivity and quality. It involved evaluating the performance of 758 consultants, who make up about 7% of the company's individual contributor workforce. The tasks spanned a consultant's daily work, including creativity, analytical thinking, writing proficiency, and persuasiveness.

Key Findings​

  • For tasks within the AI frontier, ChatGPT-4 significantly increased performance, boosting speed by over 25%, human-rated performance by over 40%, and task completion by over 12%.
  • The study introduces the concept of a “jagged technological frontier,” where AI excels in some tasks but falls short in others.
  • Two distinct patterns of AI use emerged: “Centaurs,” who divided and delegated tasks between themselves and the AI, and “Cyborgs,” who integrated their workflow with the AI.

Shifting the Debate​

The paper argues that the focus should move beyond the binary decision of adopting or not adopting AI. Instead, we should evaluate the value of different configurations and combinations of humans and AI for various tasks within the knowledge workflow.

 

bnew


Fake News Detectors are Biased against Texts Generated by Large Language Models​

Jinyan Su, Terry Yue Zhuo, Jonibek Mansurov, Di Wang, Preslav Nakov
The spread of fake news has emerged as a critical challenge, undermining trust and posing threats to society. In the era of Large Language Models (LLMs), the capability to generate believable fake content has intensified these concerns. In this study, we present a novel paradigm to evaluate fake news detectors in scenarios involving both human-written and LLM-generated misinformation. Intriguingly, our findings reveal a significant bias in many existing detectors: they are more prone to flagging LLM-generated content as fake news while often misclassifying human-written fake news as genuine. This unexpected bias appears to arise from distinct linguistic patterns inherent to LLM outputs. To address this, we introduce a mitigation strategy that leverages adversarial training with LLM-paraphrased genuine news. The resulting model yielded marked improvements in detection accuracy for both human and LLM-generated news. To further catalyze research in this domain, we release two comprehensive datasets, GossipCop++ and PolitiFact++, thus amalgamating human-validated articles with LLM-generated fake and real news.
Comments: The first two authors contributed equally
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2309.08674 [cs.CL] (or arXiv:2309.08674v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2309.08674

Submission history​

From: Terry Yue Zhuo [view email]
[v1] Fri, 15 Sep 2023 18:04:40 UTC (9,845 KB)

 

bnew



StableLM-3B-4E1T​

Technical report for StableLM-3B-4E1T
Jonathan Tow, Marco Bellagente, Dakota Mahan, Carlos Riquelme Ruiz



StableLM-3B-4E1T is a 3 billion (3B) parameter language model pre-trained under the multi-epoch regime to study the impact of repeated tokens on downstream performance. Given prior success in this area (Taylor et al., 2022 and Tay et al., 2023; https://arxiv.org/pdf/2205.05131.pdf), we train on 1 trillion (1T) tokens for 4 epochs, following the observations of Muennighoff et al. (2023) in "Scaling Data-Constrained Language Models," in which they find that "training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data." Further inspiration for the token count is taken from "Go smol or go home" (De Vries, 2023), which suggests that a 2.96B model trained for 2.85 trillion tokens achieves a similar loss to a Chinchilla compute-optimal 9.87B language model (kn = 0.3). (GitHub project board: https://github.com/orgs/Stability-AI/projects/8?pane=issue&itemId=36926940)

Model Architecture​


The model is a decoder-only transformer similar to the LLaMA architecture (Touvron et al., 2023; https://arxiv.org/abs/2307.09288), with the configuration summarized below:

Parameters: 2,795,443,200 | Hidden Size: 2560 | Layers: 32 | Heads: 32 | Sequence Length: 4096


Training Data​

The dataset comprises a filtered mixture of open-source large-scale datasets available on the HuggingFace Hub: Falcon RefinedWeb extract (Penedo et al., 2023), RedPajama-Data (Together Computer, 2023) and The Pile (Gao et al., 2020), both without the Books3 subset, and StarCoder (Li et al., 2023). The complete list is provided in Table 1.


Table 1: Open-source datasets used for multi-epoch training. Note that the total token count does not account for the reduced size after downsampling C4, Common Crawl (2023), and GitHub to obtain 1T tokens.
Given the large amount of web data, we recommend fine-tuning the base StableLM-3B-4E1T for your downstream tasks.

Training Procedure​

The model is trained for 972k steps in bfloat16 precision with a global context length of 4096, instead of the multi-stage ramp-up from 2048 to 4096 used for StableLM-Alpha v2. The batch size is set to 1024 (4,194,304 tokens). We optimize with AdamW (Loshchilov and Hutter, 2017) and use linear warmup for the first 4.8k steps, followed by a cosine decay schedule to 4% of the peak learning rate. Early instabilities are attributed to extended periods in high learning rate regions. We do not incorporate dropout (Srivastava et al., 2014) due to the model's relatively small size. Detailed hyperparameters are provided in the model config.
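A minimal sketch of the schedule described above (linear warmup for 4.8k steps, then cosine decay to 4% of peak over the 972k-step run); the peak learning rate is a placeholder since the exact value lives in the model config, and the linear cool-down tweak mentioned below is not included.

import math

def stablelm_lr(step, peak_lr=3.2e-4, warmup_steps=4_800, total_steps=972_000, floor=0.04):
    # Linear warmup, then cosine decay to floor x peak. peak_lr is a placeholder value.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * (floor + (1 - floor) * 0.5 * (1 + math.cos(math.pi * progress)))

print(stablelm_lr(0), stablelm_lr(4_800), stablelm_lr(972_000))  # 0, peak, ~4% of peak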

During training, we evaluate natural language benchmarks and observe steady improvements over the course of training until the tail end of the learning rate decay schedule. For this reason, we decided to linearly cool down the learning rate towards 0, similar to Zhai et al. (2021), in hopes of squeezing out performance. We plan to explore alternative schedules in future work.

Furthermore, our initial stage of pre-training relies on the flash-attention API (Tri Dao, 2023) with its out-of-the-box triangular causal masking support. This forces the model to attend similarly to different documents in a packed sequence. In the cool-down stage, we instead reset position IDs and attention masks at EOD tokens for all packed sequences after empirically observing improved sample quality (read: less repetition) in a concurrent experiment. We hypothesize that this late adjustment leads to the notable degradation in byte-length normalized accuracies of Arc Easy (Clark et al., 2018) and SciQ (Welbl et al., 2017).


Figure 1: Toy demonstration of attention mask resetting.
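As a rough illustration of that cool-down-stage change, the sketch below recomputes position IDs so each packed document restarts at 0 after an EOD token; it is a simplified stand-in (the actual training code also rebuilds the attention mask at EOD, and the EOD id here is illustrative).

import torch

def reset_positions_at_eod(input_ids: torch.Tensor, eod_id: int) -> torch.Tensor:
    # Position IDs that restart at 0 after every EOD token in a packed sequence.
    positions = torch.zeros_like(input_ids)
    for b in range(input_ids.size(0)):
        pos = 0
        for t in range(input_ids.size(1)):
            positions[b, t] = pos
            pos = 0 if input_ids[b, t].item() == eod_id else pos + 1
    return positions

# Two documents packed into one row, with the EOD id assumed to be 0 for illustration.
ids = torch.tensor([[5, 6, 0, 7, 8, 9]])
print(reset_positions_at_eod(ids, eod_id=0))  # tensor([[0, 1, 2, 0, 1, 2]])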

Data composition was modified during the cool-down. Specifically, we remove Ubuntu IRC, OpenWebText, HackerNews, and FreeLaw for quality control and further NSFW filtering while upsampling C4. The distribution shift is likely responsible for the increased loss (+0.02 nats) from the initial stage.


See the plots below for validation dynamics across our hold-out set and common NLP benchmarks.

Note: The released checkpoint is taken from step 970k according to validation loss and average downstream performance.

Downstream Results​

The following zero-shot evaluations are performed with EleutherAI's lm-evaluation-harness using the lm-bench branch of Stability AI's fork.



Table 2: Zero-shot performance across popular language modeling and common sense reasoning benchmarks. lm-eval results JSONs can be found in the evals directory of the StableLM repo.

StableLM-3B-4E1T achieves state-of-the-art performance (September 2023) at the 3B parameter scale for open-source models and is competitive with many of the popular contemporary 7B models, even outperforming our most recent 7B StableLM-Base-Alpha-v2.

System Details​

    • Hardware: StableLM-3B-4E1T was trained on the Stability AI cluster across 256 NVIDIA A100 40GB GPUs (AWS P4d instances). Training began on August 23, 2023, and took approximately 30 days to complete.
Note: TFLOPs are estimated using GPT-NeoX's get_flops function.



Conclusion​

StableLM-3B-4E1T provides further evidence for the claims in Muennighoff et al. (2023) at the trillion token scale, suggesting multi-epoch training as a valid approach to improving downstream performance when working under data constraints.

Acknowledgments​

We thank our MLOps team members, Richard Vencu and Sami Kama, for 30 days of uninterrupted pre-training, and Reshinth Adithyan, James Baicoianu, Nathan Cooper, Christian Laforte, Nikhil Pinnaparaju, and Enrico Shippole for fruitful discussions and guidance.
 

bnew


AI language models can exceed PNG and FLAC in lossless compression, says study​

Is compression equivalent to general intelligence? DeepMind digs up more potential clues.​

BENJ EDWARDS - 9/28/2023, 11:43 AM


Effective compression is about finding patterns to make data smaller without losing information. When an algorithm or model can accurately guess the next piece of data in a sequence, it shows it's good at spotting these patterns. This links the idea of making good guesses—which is what large language models like GPT-4 do very well—to achieving good compression.
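As a toy illustration of that prediction-to-compression link (numbers invented here, not taken from the paper): an arithmetic coder driven by a predictive model needs about -log2(p) bits for a symbol the model assigned probability p, so better guesses directly translate into smaller files.

import math

def ideal_compressed_bits(probs):
    # Shannon bound exploited by arithmetic coding: -log2(p) bits per observed symbol.
    return sum(-math.log2(p) for p in probs)

# A model that gives each of 8 observed bytes probability 0.9, vs. a uniform 1/256 guess.
confident = ideal_compressed_bits([0.9] * 8)    # ~1.2 bits total
uniform = ideal_compressed_bits([1 / 256] * 8)  # 64 bits, i.e. no compression at all
print(f"{confident:.1f} bits vs {uniform:.1f} bits")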

In an arXiv research paper titled "Language Modeling Is Compression," researchers detail their discovery that the DeepMind large language model (LLM) called Chinchilla 70B can perform lossless compression on image patches from the ImageNet image database to 43.4 percent of their original size, beating the PNG algorithm, which compressed the same data to 58.5 percent. For audio, Chinchilla compressed samples from the LibriSpeech audio data set to just 16.4 percent of their raw size, outdoing FLAC compression at 30.3 percent.

In this case, lower numbers in the results mean more compression is taking place. And lossless compression means that no data is lost during the compression process. It stands in contrast to a lossy compression technique like JPEG, which sheds some data and reconstructs some of the data with approximations during the decoding process to significantly reduce file sizes.

The study's results suggest that even though Chinchilla 70B was mainly trained to deal with text, it's surprisingly effective at compressing other types of data as well, often better than algorithms specifically designed for those tasks. This opens the door for thinking about machine learning models as not just tools for text prediction and writing but also as effective ways to shrink the size of various types of data.
A chart of compression test results provided by DeepMind researchers in their paper. The chart illustrates the efficiency of various data compression techniques on different data sets, all initially 1GB in size. It employs a lower-is-better ratio, comparing the compressed size to the original size. (Credit: DeepMind)

Over the past two decades, some computer scientists have proposed that the ability to compress data effectively is akin to a form of general intelligence. The idea is rooted in the notion that understanding the world often involves identifying patterns and making sense of complexity, which, as mentioned above, is similar to what good data compression does. By reducing a large set of data into a smaller, more manageable form while retaining its essential features, a compression algorithm demonstrates a form of understanding or representation of that data, proponents argue.

The Hutter Prize is an example that brings this idea of compression as a form of intelligence into focus. Named after Marcus Hutter, a researcher in the field of AI and one of the named authors of the DeepMind paper, the prize is awarded to anyone who can most effectively compress a fixed set of English text. The underlying premise is that a highly efficient compression of text would require understanding the semantic and syntactic patterns in language, similar to how a human understands it.

So theoretically, if a machine can compress this data extremely well, it might indicate a form of general intelligence—or at least a step in that direction. While not everyone in the field agrees that winning the Hutter Prize would indicate general intelligence, the competition highlights the overlap between the challenges of data compression and the goals of creating more intelligent systems.


Along these lines, the DeepMind researchers claim that the relationship between prediction and compression isn't a one-way street. They posit that if you have a good compression algorithm like gzip, you can flip it around and use it to generate new, original data based on what it has learned during the compression process.

In one section of the paper (Section 3.4), the researchers carried out an experiment to generate new data across different formats—text, image, and audio—by getting gzip and Chinchilla to predict what comes next in a sequence of data after conditioning on a sample. Understandably, gzip didn't do very well, producing completely nonsensical output—to a human mind, at least. It demonstrates that while gzip can be compelled to generate data, that data might not be very useful other than as an experimental curiosity. On the other hand, Chinchilla, which is designed with language processing in mind, predictably performed far better in the generative task.
An example from the DeepMind paper comparing the generative properties of gzip and Chinchilla on a sample text. gzip's output is unreadable. (Credit: DeepMind)

While the DeepMind paper on AI language model compression has not been peer-reviewed, it provides an intriguing window into potential new applications for large language models. The relationship between compression and intelligence is a matter of ongoing debate and research, so we'll likely see more papers on the topic emerge soon.

Benj Edwards is an AI and Machine Learning Reporter for Ars Technica. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.



Promoted Comments​


redleader
But what about decompression rate? FLAC has always been noteworthy for being an asymmetrical codec which takes more computational power to compress than to decompress (potentially a lot more, depending on the settings used). If this new AI codec requires a lot of number crunching to decode, it may not be such a big win in all situations.

In terms of a practical format, FLAC/PNG are designed to be incredibly fast and lightweight because they have to be integrated into mobile devices, web browsers, etc without consuming huge amounts of memory and power. For example, FLAC is designed to be able to decode CD audio losslessly in realtime on DSP cores with single-digit MHz and tens of kilobytes of RAM while using the absolute lowest amount of battery. I'm not sure how much memory Chinchilla 70B requires, but seeing as the model has 70 billion parameters, I suspect it will not fit into 64 KB of memory on a low power embedded audio device.
September 28, 2023 at 4:25 pm
 

bnew



Can my GPU run this LLM?


Calculate how much GPU memory you need, with a breakdown of where it goes, for training or inference of any LLM with quantization (GGML/bitsandbytes), inference frameworks (vLLM/llama.cpp/HF) & QLoRA.

Link: LLM memory check


Purpose

I made this to check whether you can run a particular LLM on your GPU. It is useful for figuring out the following:

  1. What quantization should I use to fit a given model on my GPU?
  2. What is the max context length my GPU can handle?
  3. What kind of finetuning can I do? Full? LoRA? QLoRA?
  4. What is the max batch size I can use during finetuning?
  5. What is consuming my GPU memory? What should I change to fit the LLM on my GPU?

The output is the total vRAM and a breakdown of where the vRAM goes (in MB). It looks like the example below:

{
  "Total": 4000,
  "KV Cache": 1000,
  "Model Size": 2000,
  "Activation Memory": 500,
  "Grad & Optimizer memory": 0,
  "cuda + other overhead": 500
}


Can't we just look at the model size & figure this out?

Finding which LLMs your GPU can handle isn't as easy as looking at the model size, because during inference the KV cache takes a substantial amount of memory. For example, with sequence length 1000 on llama-2-7b it takes 1GB of extra memory (using huggingface LlamaForCausalLM; with exLlama & vLLM this is 500MB). And during training, the KV cache, activations & quantization overhead all take a lot of memory. For example, llama-7b with bnb int8 quant is ~7.5GB in size, but it isn't possible to finetune it using LoRA on data with 1000 context length even on an RTX 4090 with 24 GB, which means an additional 16GB of memory goes into quant overheads, activations & grad memory.

How to use

Model Name/ID/Size

  1. You can either enter the model id of a huggingface model (e.g. meta-llama/Llama-2-7b). Currently I have hardcoded & saved model configs of the top 3k most downloaded LLMs on huggingface.
  2. If you have a custom model or your huggingface id isn't available, then you can either upload a json config (example) or just enter your model size (e.g. 7 billion for llama-2-7b)

Options

  1. Inference: Find vRAM for inference using either the HuggingFace implementation, vLLM, or GGML
  2. Training: Find vRAM for either full-model finetuning or finetuning using LoRA (currently r=8 is hardcoded for the LoRA config) or QLoRA.

Quantization

  1. Currently it supports: bitsandbytes (bnb) int8/int4 & GGML (QK_8, QK_6, QK_5, QK_4, QK_2). The latter are only for inference while bnb int8/int4 can be used for both training & inference

Context Len/Sequence Length

  1. This is the length of your prompt plus the maximum number of new tokens generated. For training, it is the sequence length of your training data. Batch size is 1 for inference & can be specified for training. The option to specify batch sizes for inference still needs to be added.

How reliable are the numbers?

The results can vary depending on your model, input data, cuda version & what quant you are using, and it is impossible to predict exact values. I have tried to take these into account & make sure the results are within 500MB. In the table below I cross-check the 3b, 7b & 13b model memories given by the website vs. what I get on my RTX 4090 & 2060 GPUs. All values are within 500MB.


How are the values calculated?

Total memory = model size + kv-cache + activation memory + optimizer/grad memory + cuda etc. overhead

  1. Model size = this is your .bin file size (divide it by 2 if Q8 quant & by 4 if Q4 quant).
  2. KV-Cache = Memory taken by KV (key-value) vectors. Size = (2 x sequence length x hidden size) per layer. For huggingface this is (2 x 2 x sequence length x hidden size) per layer (see the sketch after this list).
  3. Activation Memory = When you use LoRA, even though your model params don't have grad, their results still need to be stored to do backward through them (these take the most memory). There is no simple formula here; it depends on the implementation.
  4. Optimizer/Grad memory = Memory taken by .grad tensors & tensors associated with the optimizer (running avg etc.)
  5. Cuda etc. overhead = Around 500MB-1GB of memory is taken by CUDA whenever cuda is loaded; this varies. There are also additional overheads when you use any quantization (like bitsandbytes). Again, no straightforward formula.
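A rough sketch of the first two terms under simple assumptions (fp16 weights; fp16 or fp32 KV cache); the llama-2-7b-like numbers are illustrative and roughly reproduce the ~500MB/~1GB figures quoted above, not the tool's exact output.

def model_size_gb(n_params, bits_per_param=16):
    # Approximate checkpoint size: parameters x bits / 8 (ignores quantization metadata).
    return n_params * bits_per_param / 8 / 1e9

def kv_cache_gb(seq_len, hidden_size, n_layers, bytes_per_value=2):
    # Keys + values for every layer: 2 x seq_len x hidden_size values per layer.
    return 2 * seq_len * hidden_size * n_layers * bytes_per_value / 1e9

# llama-2-7b-like config: ~7e9 params, hidden_size=4096, 32 layers, sequence length 1000.
print(model_size_gb(7e9))                              # ~14 GB in fp16 (halve for Q8, quarter for Q4)
print(kv_cache_gb(1000, 4096, 32, bytes_per_value=2))  # ~0.5 GB (fp16 cache, exLlama/vLLM-like)
print(kv_cache_gb(1000, 4096, 32, bytes_per_value=4))  # ~1.0 GB (fp32 cache, matching the HF figure)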

Why are the results wrong?

Sometimes the answers might be very wrong in which case please open an issue here & I will try to fix it.

TODO

  1. Add support for exLlama
  2. Add QLora ✅
  3. Add way to measure approximate tokens/s you can get for a particular GPU
  4. Improve logic to get hyper-params from size (since hidden layer/intermediate size/number of layers can vary for a particular size) ✅
  5. Add AWQ

 

bnew



Bye Bye Llama-2, Mistral 7B is Taking Over: Get Started With Mistral 7B Instruct Model​

Qendel AI

“No LLM has been most popular > 2 months”, says Dr. M Waleed Kadous, a chief scientist at Anyscale, in his recent AI conference presentation.

Llama 2 has already been taking the Open Source LLM space by storm, but not anymore. Mistral AI, a small creative team, now open-sources a new model that beats Llama models.

As presented by the team, the Mistral 7B model

  • Outperforms Llama 2 13B on all benchmarks
  • Outperforms Llama 1 34B on many benchmarks
  • Approaches CodeLlama 7B performance on code, while remaining good at English tasks
  • Uses Grouped-query attention (GQA) for faster inference
  • Uses Sliding Window Attention (SWA) to handle longer sequences at a smaller cost
Here’s Mistral 7B model performance comparison with the Llama-2 family:


Mistral 7B model performance comparison with the Llama-2 family. Source.
Mistral 7B also has an instruct fine-tuned (chat) version, and it outperforms all 7B models on MT-Bench — as shown below:


Mistral 7B Instruct Model comparison on MT Bench. Source.
Now, let’s dive into the steps to get started with the Mistral 7B Instruct Model on Google Colab.

Getting Started With Mistral 7B Instruct Model​

Step 1:

Install essential libraries: transformers, torch, accelerate, bitsandbytes, and langchain.

!pip install git+https://github.com/huggingface/transformers torch accelerate bitsandbytes langchain
Step 2:

Import libraries and set variables

Code:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
Step 3:

Download the Mistral 7B Instruct model and its tokenizer


In my case, I am downloading a 4-bit version of the model to fit on my limited Colab GPU, but you can download an 8-bit version or the full model as long as your machine can handle it.
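Code (a minimal sketch of this step, assuming the mistralai/Mistral-7B-Instruct-v0.1 checkpoint on the Hugging Face Hub and 4-bit loading via bitsandbytes; the article's exact arguments may differ):

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "mistralai/Mistral-7B-Instruct-v0.1"

# 4-bit quantization so the 7B model fits on a small Colab GPU; drop this config
# (or use load_in_8bit) if your hardware can handle the larger versions.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)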
 

GrudgeBooty

nah, I just find it interesting like the dialup days of the internet. exciting times ahead :banderas:
Same! I run a few chatbots locally on my PC, so I'm interested what could happen with Chatbots and Augmented and Virtual Reality. Reminds me of Star Trek's Holodeck
 

bnew


Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback​

Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale
Large language models (LLMs) are used to generate content for a wide range of tasks, and are set to reach a growing audience in coming years due to integration in product interfaces like ChatGPT or search engines like Bing. This intensifies the need to ensure that models are aligned with human preferences and do not produce unsafe, inaccurate or toxic outputs. While alignment techniques like reinforcement learning with human feedback (RLHF) and red-teaming can mitigate some safety concerns and improve model capabilities, it is unlikely that an aggregate fine-tuning process can adequately represent the full range of users' preferences and values. Different people may legitimately disagree on their preferences for language and conversational norms, as well as on values or ideologies which guide their communication. Personalising LLMs through micro-level preference learning processes may result in models that are better aligned with each user. However, there are several normative challenges in defining the bounds of a societally-acceptable and safe degree of personalisation. In this paper, we ask how, and in what ways, LLMs should be personalised. First, we review literature on current paradigms for aligning LLMs with human feedback, and identify issues including (i) a lack of clarity regarding what alignment means; (ii) a tendency of technology providers to prescribe definitions of inherently subjective preferences and values; and (iii) a 'tyranny of the crowdworker', exacerbated by a lack of documentation in who we are really aligning to. Second, we present a taxonomy of benefits and risks associated with personalised LLMs, for individuals and society at large. Finally, we propose a three-tiered policy framework that allows users to experience the benefits of personalised alignment, while restraining unsafe and undesirable LLM-behaviours within (supra-)national and organisational bounds.
Comments: 19 pages, 1 table
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
Cite as: arXiv:2303.05453 [cs.CL] (or arXiv:2303.05453v1 [cs.CL] for this version)
[2303.05453] Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

Submission history​

From: Hannah Rose Kirk Miss [view email]
[v1] Thu, 9 Mar 2023 17:52:07 UTC (414 KB)

 