Collective Cognition v1.1 - Mistral 7B
Model Description:
Collective Cognition v1.1 is a fine-tune of the Mistral 7B base model. It is particularly notable for outperforming many 70B models on the TruthfulQA benchmark. This benchmark tests whether a model repeats common misconceptions, so a high score may indicate a lower hallucination rate.
Special Features:
Quick Training: This model was trained in just 3 minutes on a single RTX 4090 using QLoRA, and it competes with 70B-scale Llama-2 models on TruthfulQA.
Limited Data: Despite its exceptional performance, it was trained on only ONE HUNDRED data points, all of which were gathered from a platform reminiscent of ShareGPT.
Extreme TruthfulQA Benchmark: This model competes strongly with top 70B models on the TruthfulQA benchmark despite the small dataset and QLoRA training!
Acknowledgements:
Special thanks to @a16z and all contributors to the Collective Cognition dataset for making the development of this model possible.
Dataset:
The model was trained using data from the Collective Cognition website. The efficacy of this dataset is demonstrated by the model's stellar performance, suggesting that further expansion of this dataset could yield even more promising results. The data is reminiscent of that collected from platforms like ShareGPT.
You can contribute to the growth of the dataset by sharing your own ChatGPT chats here.
You can download the datasets created by Collective Cognition from the CollectiveCognition organization page on Hugging Face.
Performance:
TruthfulQA: Collective Cognition v1.1 has notably outperformed various 70B models on the TruthfulQA benchmark, highlighting its ability to understand and rectify common misconceptions.
Usage:
Prompt Format:
USER: <prompt>
ASSISTANT:
OR
<system message>
USER: <prompt>
ASSISTANT:
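The two layouts above can be produced by one small helper. This is a minimal sketch, not code from the model card: the function name `build_prompt` is hypothetical, and it assumes single-turn prompts, newline separators, and no trailing space after `ASSISTANT:`.

```python
from typing import Optional


def build_prompt(user_prompt: str, system_message: Optional[str] = None) -> str:
    """Format a single-turn prompt in the USER/ASSISTANT layout.

    Assumption: lines are joined with "\n" and the prompt ends right
    after "ASSISTANT:" so the model continues from there.
    """
    lines = []
    if system_message:
        # Optional system message goes on its own line before the user turn.
        lines.append(system_message)
    lines.append(f"USER: {user_prompt}")
    lines.append("ASSISTANT:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("What is the capital of France?"))
    print("---")
    print(build_prompt("What is the capital of France?", "You are a concise assistant."))
```

The returned string can be passed to whatever inference stack you use; the helper itself does not depend on any model library.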
Benchmarks:
Collective Cognition v1.0 TruthfulQA:
| Task |Version|Metric|Value | |Stderr|
|-------------|------:|------|-----:|---|-----:|
|truthfulqa_mc| 1|mc1 |0.4051|± |0.0172|
| | |mc2 |0.5738|± |0.0157|
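To put the reported standard errors in context, the snippet below turns each value and Stderr pair from the TruthfulQA table into an approximate 95% interval. The normal approximation (value ± 1.96 × stderr) is an assumption made here for illustration; the model card only reports the raw values.

```python
# Reported TruthfulQA values and standard errors, copied from the table above.
results = {"mc1": (0.4051, 0.0172), "mc2": (0.5738, 0.0157)}

# Assumption: a normal approximation, so 1.96 is the two-sided 95% quantile.
Z_95 = 1.96


def confidence_interval(value, stderr, z=Z_95):
    """Return (low, high) for value ± z * stderr."""
    margin = z * stderr
    return value - margin, value + margin


intervals = {name: confidence_interval(v, se) for name, (v, se) in results.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: [{low:.4f}, {high:.4f}]")
```

Under that assumption the mc2 interval is roughly 0.543 to 0.605, which is the range to keep in mind when comparing against other models' reported scores.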
Collective Cognition v1.1 GPT4All:
| Task |Version| Metric |Value | |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge| 0|acc |0.5085|± |0.0146|
| | |acc_norm|0.5384|± |0.0146|
|arc_easy | 0|acc |0.7963|± |0.0083|
| | |acc_norm|0.7668|± |0.0087|
|boolq | 1|acc |0.8495|± |0.0063|
|hella_swag | 0|acc |0.6399|± |0.0048|
| | |acc_norm|0.8247|± |0.0038|
|openbookqa | 0|acc |0.3240|± |0.0210|
| | |acc_norm|0.4540|± |0.0223|
|piqa | 0|acc |0.7992|± |0.0093|
| | |acc_norm|0.8107|± |0.0091|
|winogrande | 0|acc |0.7348|± |0.0124|
Average: 71.13
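The card does not say how this average is computed. The sketch below assumes it is the mean over tasks of acc_norm where that metric is reported and acc otherwise, with the winogrande score read as 0.7348; under that assumption the reported figure is reproduced.

```python
# GPT4All suite scores from the table above (fractions in [0, 1]).
scores = {
    "arc_challenge": {"acc": 0.5085, "acc_norm": 0.5384},
    "arc_easy": {"acc": 0.7963, "acc_norm": 0.7668},
    "boolq": {"acc": 0.8495},
    "hella_swag": {"acc": 0.6399, "acc_norm": 0.8247},
    "openbookqa": {"acc": 0.3240, "acc_norm": 0.4540},
    "piqa": {"acc": 0.7992, "acc_norm": 0.8107},
    "winogrande": {"acc": 0.7348},
}


def headline_metric(metrics):
    """Assumed rule: prefer acc_norm, fall back to acc when it is absent."""
    return metrics.get("acc_norm", metrics["acc"])


average_pct = 100 * sum(headline_metric(m) for m in scores.values()) / len(scores)
print(f"Average: {average_pct:.2f}")  # prints "Average: 71.13"
```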
AGIEval:
| Task |Version| Metric |Value | |Stderr|
|------------------------------|------:|--------|-----:|---|-----:|
|agieval_aqua_rat | 0|acc |0.1929|± |0.0248|
| | |acc_norm|0.2008|± |0.0252|
|agieval_logiqa_en | 0|acc |0.3134|± |0.0182|
| | |acc_norm|0.3333|± |0.0185|
|agieval_lsat_ar | 0|acc |0.2217|± |0.0275|
| | |acc_norm|0.2043|± |0.0266|
|agieval_lsat_lr | 0|acc |0.3412|± |0.0210|
| | |acc_norm|0.3216|± |0.0207|
|agieval_lsat_rc | 0|acc |0.4721|± |0.0305|
| | |acc_norm|0.4201|± |0.0301|
|agieval_sat_en | 0|acc |0.6068|± |0.0341|
| | |acc_norm|0.5777|± |0.0345|
|agieval_sat_en_without_passage| 0|acc |0.3932|± |0.0341|
| | |acc_norm|0.3641|± |0.0336|
|agieval_sat_math | 0|acc |0.2864|± |0.0305|
| | |acc_norm|0.2636|± |0.0298|
Average: 33.57
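As a sanity check on this figure, the snippet below averages the eight AGIEval tasks with the scores read as fractions in [0, 1] (for example 0.2008 for aqua_rat acc_norm). Which metric feeds the average is not stated in the card; the comparison here is an assumption-driven check, and it suggests the figure is the mean of acc_norm rather than acc.

```python
# AGIEval scores per task as (acc, acc_norm), read as fractions in [0, 1].
agieval = {
    "agieval_aqua_rat": (0.1929, 0.2008),
    "agieval_logiqa_en": (0.3134, 0.3333),
    "agieval_lsat_ar": (0.2217, 0.2043),
    "agieval_lsat_lr": (0.3412, 0.3216),
    "agieval_lsat_rc": (0.4721, 0.4201),
    "agieval_sat_en": (0.6068, 0.5777),
    "agieval_sat_en_without_passage": (0.3932, 0.3641),
    "agieval_sat_math": (0.2864, 0.2636),
}

n_tasks = len(agieval)
mean_acc_pct = 100 * sum(acc for acc, _ in agieval.values()) / n_tasks
mean_acc_norm_pct = 100 * sum(norm for _, norm in agieval.values()) / n_tasks

print(f"mean acc:      {mean_acc_pct:.2f}")       # prints "mean acc:      35.35"
print(f"mean acc_norm: {mean_acc_norm_pct:.2f}")  # prints "mean acc_norm: 33.57"
```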