bnew


XTTS-v2​


ⓍTTS is a voice generation model that lets you clone voices into different languages from just a quick 6-second audio clip. There is no need for excessive training data spanning countless hours.

This is the same or similar model to what powers Coqui Studio and Coqui API.

Features​

  • Supports 16 languages.
  • Voice cloning with just a 6-second audio clip.
  • Emotion and style transfer by cloning.
  • Cross-language voice cloning.
  • Multi-lingual speech generation.
  • 24 kHz sampling rate.

Updates over XTTS-v1​

  • 2 new languages: Hungarian and Korean.
  • Architectural improvements for speaker conditioning.
  • Enables the use of multiple speaker references and interpolation between speakers.
  • Stability improvements.
  • Better prosody and audio quality across the board.

Languages​

XTTS-v2 supports 16 languages: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu) and Korean (ko).

Stay tuned as we continue to add support for more languages. If you have any language requests, feel free to reach out!

Code​

The codebase supports inference and fine-tuning.
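For example, a minimal voice-cloning sketch with the 🐸TTS Python API (pip install TTS); speaker.wav stands in for a hypothetical ~6-second reference clip:

from TTS.api import TTS

# Load the multilingual XTTS-v2 model (weights download on first use).
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Clone the voice in speaker.wav into English speech and write a wav file.
tts.tts_to_file(
    text="Hello, this is a voice cloned from a short reference clip.",
    speaker_wav="speaker.wav",
    language="en",
    file_path="output.wav",
)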

License​

This model is licensed under the Coqui Public Model License (CPML). There's a lot that goes into a license for generative models, and you can read more about the origin story of the CPML here.

About​

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

coqui.ai

🐸Coqui.ai News

  • 📣 ⓍTTSv2 is here with 16 languages and better performance across the board.
  • 📣 ⓍTTS fine-tuning code is out. Check the example recipes.
  • 📣 ⓍTTS can now stream with <200ms latency.
  • 📣 ⓍTTS, our production TTS model that can speak 13 languages, is released. Blog Post, Demo, Docs
  • 📣 🐶Bark is now available for inference with unconstrained voice cloning. Docs
  • 📣 You can use ~1100 Fairseq models with 🐸TTS.
  • 📣 🐸TTS now supports 🐢Tortoise with faster inference. Docs
  • 📣 Coqui Studio API has landed on 🐸TTS. - Example
  • 📣 Coqui Studio API is live.
  • 📣 Voice generation with prompts - Prompt to Voice - is live on Coqui Studio!! - Blog Post
  • 📣 Voice generation with fusion - Voice fusion - is live on Coqui Studio.
  • 📣 Voice cloning is live on Coqui Studio.

🐸TTS is a library for advanced Text-to-Speech generation.
🚀 Pretrained models in 1100+ languages.
🛠️ Tools for training new models and fine-tuning existing models in any language.
📚 Utilities for dataset analysis and curation.

 

bnew






RoboVQA: Multimodal Long-Horizon Reasoning for Robotics


Abstract— We present a scalable, bottom-up and intrinsically diverse data collection scheme that can be used for high-level reasoning with long and medium horizons and that has 2.2x higher throughput compared to traditional narrow top-down step-by-step collection. We collect realistic data by performing any user requests within the entirety of 3 office buildings and using multiple embodiments (robot, human, human with grasping tool). With this data, we show that models trained on all embodiments perform better than ones trained on the robot data only, even when evaluated solely on robot episodes. We explore the economics of collection costs and find that for a fixed budget it is beneficial to take advantage of the cheaper human collection along with robot collection. We release a large and highly diverse (29,520 unique instructions) dataset dubbed RoboVQA containing 829,502 (video, text) pairs for robotics-focused visual question answering. We also demonstrate how evaluating real robot experiments with an intervention mechanism enables performing tasks to completion, making it deployable with human oversight even if imperfect while also providing a single performance metric. We demonstrate a single video-conditioned model named RoboVQA-VideoCoCa trained on our dataset that is capable of performing a variety of grounded high-level reasoning tasks in broad realistic settings with a cognitive intervention rate 46% lower than the zero-shot state-of-the-art visual language model (VLM) baseline and is able to guide real robots through long-horizon tasks. The performance gap with zero-shot state-of-the-art models indicates that a lot of grounded data remains to be collected for real-world deployment, emphasizing the critical need for scalable data collection approaches. Finally, we show that video VLMs significantly outperform single-image VLMs with an average error rate reduction of 19% across all VQA tasks. Thanks to video conditioning and dataset diversity, the model can be used as general video value functions (e.g. success and affordance) in situations where actions need to be recognized rather than states, expanding capabilities and environment understanding for robots. Data and videos are available at robovqa.github.io

 

bnew


Computer Science > Machine Learning​

[Submitted on 4 Nov 2023]

MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning​

Bingchang Liu, Chaoyu Chen, Cong Liao, Zi Gong, Huan Wang, Zhichao Lei, Ming Liang, Dajun Chen, Min Shen, Hailian Zhou, Hang Yu, Jianguo Li
Code LLMs have emerged as a specialized research field, with remarkable studies dedicated to enhancing models' coding capabilities through fine-tuning on pre-trained models. Previous fine-tuning approaches were typically tailored to specific downstream tasks or scenarios, which meant separate fine-tuning for each task, requiring extensive training resources and posing challenges in terms of deployment and maintenance. Furthermore, these approaches failed to leverage the inherent interconnectedness among different code-related tasks. To overcome these limitations, we present a multi-task fine-tuning framework, MFTCoder, that enables simultaneous and parallel fine-tuning on multiple tasks. By incorporating various loss functions, we effectively address common challenges in multi-task learning, such as data imbalance, varying difficulty levels, and inconsistent convergence speeds. Extensive experiments have conclusively demonstrated that our multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks. Moreover, MFTCoder offers efficient training capabilities, including efficient data tokenization modes and PEFT fine-tuning, resulting in significantly improved speed compared to traditional fine-tuning methods. MFTCoder seamlessly integrates with several mainstream open-source LLMs, such as CodeLlama and Qwen. Leveraging the CodeLlama foundation, our MFTCoder fine-tuned model, CodeFuse-CodeLlama-34B, achieves an impressive pass@1 score of 74.4% on the HumanEval benchmark, surpassing GPT-4 performance (67%, zero-shot). MFTCoder is open-sourced at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2311.02303 [cs.LG]
(or arXiv:2311.02303v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2311.02303


https://arxiv.org/pdf/2311.02303.pdf




About​

A high-accuracy and efficient multi-task fine-tuning framework for Code LLMs

News

🔥🔥 [2023/11/07] The MFTCoder paper has been released on arXiv, disclosing the technical details of multi-task fine-tuning.

🔥🔥 [2023/10/20] CodeFuse-QWen-14B has been released, achieving a pass@1 (greedy decoding) score of 48.8% on HumanEval, a 16% absolute improvement over the base model QWen-14B.

🔥🔥 [2023/09/27] CodeFuse-StarCoder-15B has been released, achieving a pass@1 (greedy decoding) score of 54.9% on HumanEval.

🔥🔥🔥 [2023/09/26] We are pleased to announce the release of the 4-bit quantized version of CodeFuse-CodeLlama-34B. Despite quantization, the model still achieves a remarkable 73.8% pass@1 (greedy decoding) on HumanEval.

🔥🔥🔥 [2023/09/07] We released CodeFuse-CodeLlama-34B, which achieves 74.4% Python pass@1 (greedy decoding), surpassing GPT-4 (2023/03/15) and ChatGPT-3.5 on the HumanEval benchmark.

🔥🔥 [2023/08/26] We released MFTCoder, which supports fine-tuning Code Llama, Llama, Llama 2, StarCoder, ChatGLM2, CodeGeeX2, Qwen, and GPT-NeoX models with LoRA/QLoRA.

HumanEval Performance

Model                          HumanEval (pass@1)   Date
CodeFuse-CodeLlama-34B         74.4%                2023/09
CodeFuse-CodeLlama-34B-4bits   73.8%                2023/09
WizardCoder-Python-34B-V1.0    73.2%                2023/08
GPT-4 (zero-shot)              67.0%                2023/03
PanGu-Coder2 15B               61.6%                2023/08
CodeFuse-StarCoder-15B         54.9%                2023/08
CodeLlama-34b-Python           53.7%                2023/08
CodeFuse-QWen-14B              48.8%                2023/10
CodeLlama-34b                  48.8%                2023/08
GPT-3.5 (zero-shot)            48.1%                2022/11
OctoCoder                      46.2%                2023/08
StarCoder-15B                  33.6%                2023/05
QWen-14B                       32.3%                2023/10
 

Voice of Reason






RoboVQA: Multimodal Long-Horizon Reasoning for Robotics





This will be in every workplace within the next year.
 

bnew




How Much Does It Cost to Train a Large Language Model? A Guide

Machine learning is affecting every sector, and no one seems to have a clear idea of how much it costs to train a specialized LLM. This week at OpenAI Dev Day 2023, the company announced its custom model-building service, at a $2-3M minimum. That is a steep price to pay for a specialized model, and many are wondering: is it necessary?

The question of how much it costs to train an LLM is a really hard one, and while there’s not a straightforward, plug-and-chug cost calculation, the answer mainly depends on two factors: compute requirements and how long it takes to train.
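At its core, the calculation is just GPU count x hourly rate x wall-clock hours; the hard part is estimating those inputs. Here is a back-of-envelope sketch, where every number is an illustrative assumption:

def training_cost(num_gpus: int, price_per_gpu_hour: float, hours: float) -> float:
    # GPU count x hourly rate x wall-clock training hours.
    return num_gpus * price_per_gpu_hour * hours

# e.g. 8 GPUs at an assumed $2.00/GPU-hour, training for two weeks:
print(f"${training_cost(8, 2.00, 14 * 24):,.0f}")  # -> $5,376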

To help provide clarity on how to estimate the cost of training an LLM, I’ve compiled a structured overview of the different levers that affect model training time and compute requirements.

Note that this article does not include the costs of:
- Development and operations (engineering salaries, debugging, IDEs, version control systems, tooling to monitor model performance, infrastructure set-up (try Brev lol))
- Using more optimized ML libraries / APIs (which decrease cost)
- Code licensing / legal considerations
- Data privacy/security & regulatory compliance
- Model bias & fairness assessments / ethical reviews
- Adversarial training (to protect against adversarial attacks) & other security measures
- Deployment in a production environment

The three main variables to consider when determining compute requirements and training time are model architecture, training dynamics, and methods for optimizing training performance. First, however, we should learn a bit about the hardware these models run on, so we understand the context these variables fit into.

1. Hardware Costs

This refers to access to GPUs and their associated cost, and GPU memory tends to be the bottleneck. This is how much "stuff" (model, parameters, etc.) the GPU is able to hold in memory at one time. Something we've noticed is that most people think they need an expensive, highly elusive A100 or H100 with 40GB or 80GB of GPU memory. However, something smaller, cheaper, and more available may suffice.

I've released a few guides on fine-tuning (Mistral on HF dataset, Mistral on own dataset, Llama on own dataset). In these guides, I used QLoRA with 4-bit quantization and LoRA on all linear layers, reducing the trainable parameters by roughly 98%. As a result, I was able to train these models on a single A10G (24GB of GPU memory, and only $1/hr on Brev, which provides cloud GPUs without vendor lock-in across cloud providers like AWS, GCP, and Lambda Labs). Training on my own dataset took about 10 minutes for 500 iterations over 200 samples, and training on the HF dataset took about an hour for 6,000 samples and 1,000 iterations. These models would likely not be production-grade; I am just providing these values as base references.
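As a rough sketch of that setup (not the exact notebooks from the guides): 4-bit base weights via bitsandbytes plus LoRA adapters on all linear layers via peft. The model name and every hyperparameter below are illustrative assumptions:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize the frozen base weights to 4-bit (QLoRA-style NF4).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # illustrative base model
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach small trainable LoRA adapters to every linear layer.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules="all-linear",  # recent peft versions accept this shortcut
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically only ~1-2% of parameters remain trainable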

Cloud provider costs and the choice between spot and reserved instances are direct cost factors. If using cloud GPUs, different providers and regions can have vastly different pricing. Spot instances are cheaper but less reliable as you may lose them while training, while reserved instances cost more but ensure availability.

2. Model Architecture

a. Size and Structure
The depth (number of layers), width (neurons per layer), and the total number of parameters affect both GPU memory requirements and training time. A model with more and/or wider layers has the capacity to learn more complex features, but at the expense of increased computational demand. Increasing the total number of parameters to train increases the estimated time to train and the GPU memory requirements. Techniques like low-rank matrix factorization (e.g., LoRA) and sparsity, where tensors are pruned to have a high number of 0 values, can reduce the number of trainable parameters and mitigate these costs, but they require careful tuning. Sparsity is often done in transformer attention mechanisms (see below) or in weights (as in block-sparse models).

b. Attention Mechanisms
Transformers leverage self-attention mechanisms, with multiple heads attending to different sequence parts, enhancing learning at the cost of increased computation. The traditional Transformer attention style compares every token in the context window with every other token, leading to memory requirements that are quadratic in the size of the context window, O(n^2). Sparse attention models offer a compromise by focusing on a subset of positions, for example with local (nearby) attention, thereby reducing computational load, often down to O(n • sqrt(n)).
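To make the quadratic term concrete, here is the score matrix for a single attention head at a modest context length (sizes are illustrative):

import torch

n, d = 8192, 64                   # context length, per-head dimension
q, k = torch.randn(n, d), torch.randn(n, d)
scores = q @ k.T                  # full attention builds one (n, n) score matrix
print(scores.shape)               # torch.Size([8192, 8192])
print(scores.nelement() * scores.element_size() / 1e6, "MB")  # ~268 MB in FP32, per head, per layer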

c. Efficiency Optimizations
Choices of activation functions and gating mechanisms can impact computational intensity and training time. Different activation functions have varying levels of mathematical complexity; ReLU, for example, is less complex than sigmoid or tanh. Additionally, parameter sharing, for example weight sharing across layers, can reduce the number of unique parameters and hence memory requirements.

3. Training Dynamics

a. Learning Rate and Batch Size
Learning rate and batch size significantly influence the model's training speed and stability. The learning rate of a model affects the step size it takes in the opposite direction of the gradient (i.e. the direction towards minimizing the cost or loss function). This is called gradient descent. The batch size is the number of samples processed before the model’s parameters are updated. It is true that the larger your batch, the more memory you need; it scales linearly with the size of the batch. However, a larger batch size can lead to faster convergence because at each step, you get better estimates of the true gradient.
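A toy gradient-descent update makes both knobs concrete (pure NumPy; all values are illustrative):

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)                                # model parameters
X, y = rng.normal(size=(32, 3)), rng.normal(size=32)  # one batch of 32 samples

grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error on this batch
lr = 0.01                              # learning rate = step size
w -= lr * grad                         # step opposite the gradient; repeat per batch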

One subtlety to consider: Even if you had a terabyte of GPU memory, you still may not want to use the largest batch size possible. Downsampling (i.e. using a smaller batch size than the total number of training samples) introduces noise into the gradient, which can help you avoid local minima. That’s why it’s called stochastic gradient descent: the stochasticity refers to how much you’re downsampling from your training set in each batch.

The learning rate's size (magnitude) and schedule (rate of change over training) can affect the speed and stability of convergence. A higher learning rate means the model takes bigger steps during gradient descent. While this can speed up convergence, it can also lead to overshooting minima and potentially unstable training. Conversely, a learning rate that is too small can slow down convergence (reaching a minimum takes longer), and the model may get stuck in local minima. In simple terms, a local minimum that is not the global minimum is a point where it seems the optimal loss has been found; but had we gone a little further - up a hill, tolerating some worse performance along the way - we could have reached a better point on the loss surface.

b. Precision and Quantization
The precision of calculations - like FP16 versus FP32, using 16 bits to represent each floating-point number versus 32 - and techniques such as quantization balance memory usage against performance trade-offs. Using half-precision (FP16) instead of single-precision (FP32) floating points cuts tensor sizes in half, which can save memory and speed up training by enabling faster operations and more parallelization. However, this comes with a trade-off in precision, which can lead to numerical errors like overflow/underflow, since fewer bits cannot represent numbers that are as large or as small. It can also reduce accuracy, but if not too extreme, this can act as a form of regularization, reducing overfitting and allowing the model to actually perform better on the held-out dataset. Another technique is mixed-precision training, where some floating points are FP16 and some are FP32. Determining which matrices should be represented as FP16 versus FP32 may take some experimentation, however, which is also a cost consideration.
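A minimal sketch of mixed-precision training with PyTorch's automatic mixed precision; the toy model and data are illustrative:

import torch
from torch import nn

# Toy setup so the loop below actually runs; sizes are illustrative.
model = nn.Linear(16, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
loader = [(torch.randn(8, 16).cuda(), torch.randn(8, 1).cuda()) for _ in range(10)]

scaler = torch.cuda.amp.GradScaler()           # rescales gradients to avoid FP16 underflow
for inputs, targets in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():            # runs ops in FP16 where safe, FP32 elsewhere
        loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()              # backprop on the scaled loss
    scaler.step(optimizer)                     # unscales gradients, then steps
    scaler.update()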

Quantization is another technique that maps high-precision floating points to lower-precision values, usually 8- or even 4-bit fixed-point integers. This reduces tensor sizes by 75% or even 87.5% relative to FP32, but usually results in a significant reduction in model accuracy; as mentioned above, though, it may actually help the model generalize better, so experimentation may be worthwhile.
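The size arithmetic, for a hypothetical 7B-parameter model:

params = 7e9  # hypothetical 7B-parameter model
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {params * bits / 8 / 1e9:.1f} GB")
# 32-bit: 28.0 GB; 16-bit: 14.0 GB; 8-bit: 7.0 GB (75% smaller); 4-bit: 3.5 GB (87.5% smaller)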

c. Hyperparameter Sweeps
Hyperparameters are external configuration variables for machine learning models, i.e. they aren't learned by the model itself, like weights are. Hyperparameters are basically all the variables we discussed here: learning rate, model architecture like number of neurons or layers, attention mechanisms, etc. Hyperparameter sweeps run experiments that train different models with various combinations of hyperparameter settings, enabling you to find the best combination for your specific dataset and task. However, sweeps are computationally expensive, as you must train many models to find the best configuration.
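For example, a minimal grid-sweep sketch; train_and_evaluate is a hypothetical helper that trains one model with the given settings and returns its validation loss:

import itertools

grid = {
    "learning_rate": [1e-5, 3e-5, 1e-4],
    "batch_size": [16, 32],
    "lora_rank": [8, 16],
}

best_score, best_config = float("inf"), None
for values in itertools.product(*grid.values()):  # 3 x 2 x 2 = 12 full training runs
    config = dict(zip(grid.keys(), values))
    score = train_and_evaluate(**config)          # hypothetical: one full training run
    if score < best_score:
        best_score, best_config = score, config
print(best_config, best_score)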

d. Checkpointing/Early Stopping
Frequent saving of model state (checkpointing) can increase disk usage but provides more rollback points: if a model overfits or performed better at an earlier point in training, you can load the weights saved at that checkpoint. Early stopping is a method where one halts training once the model ceases to improve on the held-out validation set, which can save training time.
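A minimal sketch of both ideas together, using a patience counter; train_one_epoch and validate are hypothetical helpers standing in for your training and validation loops:

import torch

best_loss, patience, bad_epochs = float("inf"), 3, 0
for epoch in range(100):
    train_one_epoch(model, train_loader, optimizer)  # hypothetical helper
    val_loss = validate(model, val_loader)           # hypothetical helper
    if val_loss < best_loss:
        best_loss, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), f"checkpoint_{epoch}.pt")  # rollback point
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # no improvement for `patience` epochs: stop early
            break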

4. Optimizing Training Performance
a. Base Model State
Starting with a pre-trained model, especially one trained on a task similar to the new target task, can significantly reduce training time. If the initial weights are closer to the optimal solution's weights, training can be faster. Building a model from scratch - i.e. with randomized initial weight matrices or similar - takes significantly more compute and is usually not advised.

b. Parallelism and Distributed Training
Parallel computing is usually done with one computer that has multiple processors, which execute multiple tasks simultaneously for increased efficiency. Distributed computing involves several machines (that can be physically distant) working on divided tasks and then combining their results. Usually these two techniques are used together.

Parallelism can speed up training but adds complexity and compute requirements. There are various parallelization methods, like pipeline model parallelization, where models are split into different stages and distributed across GPUs, and data parallelization, where the dataset is divided across GPUs. Distributed training can be more efficient but requires more compute resources and adds complexity.
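The simplest sketch of data parallelism in PyTorch; production setups typically use DistributedDataParallel across machines, while this toy example only shows the idea on one machine:

import torch
from torch import nn

model = nn.Linear(16, 1)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicates the model and splits each batch across GPUs
model = model.cuda()

out = model(torch.randn(64, 16).cuda())  # the 64-sample batch is divided among available GPUs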

c. Data Considerations
How quickly the training data can be fed from storage into the model can affect training time. Some variables to consider:

- Where is the GPU located? Transferring your own data to cloud machines in more remote regions may take longer
- Machine I/O bandwidth affects time to transfer between storage and GPU
- Data caching, pre-fetching, and parallel loading on the GPU can decrease this time (see the loader sketch below)
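A sketch of those loader options in PyTorch; the toy dataset and the worker/prefetch numbers are illustrative assumptions:

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))  # toy in-memory data
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,      # parallel loading in background worker processes
    pin_memory=True,    # page-locked memory for faster host-to-GPU copies
    prefetch_factor=2,  # batches each worker fetches ahead of time
)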

Additionally, more complex data might take the model longer to learn the patterns, i.e. loss convergence time may increase.

The relevance and quality of training data also have a profound effect on training efficiency. Preprocessing and augmentation can improve outcomes but may increase the computational overhead.

5. Conclusion

I hope this helps you understand the complexities behind estimating how much it costs to fine-tune or train an LLM. There's no one-size-fits-all answer or plug-and-chug equation; the main takeaway I'd like you to have is that it takes a lot of experimentation to find what works best for you and your use case, but that's part of the fun of ML. So try things, expect a lot of them not to work, and by doing so you'll see what gets you the best results.

Ultimately, the cost of training LLMs like those offered by OpenAI does seem steep. For many, fine-tuning smaller models and maintaining control over proprietary data might be a more viable solution.
 

bnew













🧠LLM Enhanced Reasoning "Stack": Multi-Persona Tree of Thoughts🌲 + Self Consistency + Self Criticism + Retrospection 🔄

The reasoning, rhythm, and prompts are below.

I'm seeking methodological feedback on this new iterative problem solving technique for LLM hallucination mitigation and improved general reasoning quality. I'm getting great results so far, lmk if you have improvements!

The idea is to have a team of multiple personas or “experts” reasoning in parallel, critiquing themselves and each other, incorporating feedback and course correcting, and finally converging as a team on the best solution. Then reflecting on the overall process for continuous improvement with a retrospective.

This reasoning "stack" combines:
- Multiple personas/perspectives
- Tree of Thoughts reasoning
- Self Consistency
- Self Criticism
- Retrospection

into the following Reasoning Rhythm 🎶:
- Multi-Persona Brainstorming
- Self<>Peer Criticism & Evaluation Round 1
- Expand, Explore, Branch
- Self<>Peer Criticism & Evaluation Round 2
- Convergence on Best Individual Answer
- Convergence on Best Collective Answer
- Retrospective

Let's take a look at core features, sample personas, and a 9-prompt process that implements this, which you can adapt...

Jul 22, 2023 · 3:31 AM UTC

Nathan Black
@sockcymbal
Jul 22
Core features of this combined approach include:

- Multiple perspective collaboration
- Ability to criticize self
- Ability to criticize others
- Incorporate feedback from others
- Expand and backtrack on reasoning paths as necessary
- 2 rounds of self-criticism and peer-evaluation
- A reminder mid-way to stay focused on the core problem and objective (fun fact: the model suggested adding this during a recent retrospective)
- 2 part final answer convergence: individual then collective
- Retrospective stage
- Do all of the above with X number of experts in parallel (can experiment with single LLM calls managing multiple personas, or one LLM per persona, etc)

Error Correction improvements include:

- Incorporating Explicit Error Checking: Includes a specific stage for the experts to identify potential errors in their reasoning and correct them. This is an explicit part of the criticism stages.

- Encouraging Divergent Thinking: During the expand, explore, and branch stage, the experts are encouraged to not only build on their current thoughts, but also to think divergently and consider entirely new lines of reasoning.

- Adding a Retrospective Stage: After the final convergence on the best answer, a reflection stage has been added. Here, the experts can discuss what they learned from the process, identify key takeaways, and suggest how they might approach similar problems in the future.
Nathan Black
@sockcymbal
Jul 22
Tip: Given your unique question and expectations, define the hypothetical personas with specific skillsets and expertise clearly at the beginning to help the LLM simulate a range of perspectives more successfully. Iterate and experiment with this!

Example persona definitions:

Historian Persona:
"Step into the shoes of a historian, with a profound understanding of humanity's past. Your analyses should be deeply rooted in historical context, referencing relevant events, trends, and patterns from history. Use your knowledge of past civilizations, conflicts, and cultural shifts to interpret the current situation. Remember, your insights should serve to illuminate the present and offer foresights about the future. Your audience appreciates a narrative that ties the past, present, and future together."

Optimist Persona:
"You are an optimist, someone who sees the glass as half full rather than half empty. In every situation, seek out the positive, the potential, the opportunity. Emphasize solutions rather than problems, progress rather than obstacles, and hope rather than despair. Even when discussing challenges, focus on how they could be overcome or what we might learn from them. Your audience turns to you for a hopeful perspective on the future, so make sure your responses inspire optimism and confidence."

Now let's get to the prompts!
Nathan Black
@sockcymbal
Jul 22
Prompt 1: Brainstorm

Imagine you are 3 {insert personas with specific skillsets and expertise} reasoning step by step to ultimately solve a given problem or question by arriving at a final, synthesized best answer.

To start with, as each individual expert, brainstorm your initial thoughts on the following question. Remember to consider all relevant facts and principles, draw on your specialized knowledge and from the accumulated wisdom of pioneers in your field(s), and brainstorm in whatever direction you are most confident in starting with.

The question is: {insert question}
Nathan Black
@sockcymbal
Jul 22
Prompt 2: Self<>Peer Criticism Round 1

Now, as each expert, critique your own initial thought and the thoughts of the other experts.

Identify any potential errors, inconsistencies, or gaps in reasoning.
Nathan Black
@sockcymbal
Jul 22
Prompt 3: Self<>Peer Evaluation Round 1

Assess the validity of your initial thoughts, considering the criticisms you've identified. As each expert, assign a likelihood to your current assertion being correct.

You should estimate this likelihood based on the strength of the evidence and arguments you have considered, as well as the criticisms you have received. Assign higher likelihoods to assertions that are well-supported by strong evidence and arguments and have survived criticism.
Nathan Black
@sockcymbal
Jul 22
Prompt 4: Expand, Explore, Branch

Develop your thoughts further, considering the critiques and perspectives of the other experts. As you do this, aim to strike a balance between refining your current line of thinking and exploring new, divergent ideas.
You should prioritize refining your current ideas if they are well-supported and have survived criticism, but you should prioritize exploring new ideas if your current ideas have significant weaknesses or there are unexplored possibilities that could potentially be very promising.

Consider the following:

- How do your new or refined ideas address the criticisms that were raised?

- Do these ideas bring new insights to the problem, or do they provide a different perspective on existing insights?

- Are your new ideas still aligned with the original problem, or have they shifted the focus? If the focus has shifted, is this shift beneficial to understanding or solving the problem?

- Remember, if necessary, don't hesitate to backtrack and start a new and improved branch of thinking. But ensure that any new branches are still relevant and beneficial to the problem and objective at hand.
Nathan Black
@sockcymbal
Jul 22
Prompt 5: Self<>Peer Criticism Round 2

Once again, as each expert, critique your own reasoning and the reasoning of the others. Identify any potential errors, inconsistencies, or gaps in reasoning.

Based on the feedback, if there's an improvement or optimization to make, develop your answer further as necessary. Remember that the reasoning paths should remain relevant to the original question's essence and should be building towards a more accurate and thoughtful final answer.
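To make the rhythm concrete, here is a minimal orchestration sketch. complete() is a hypothetical wrapper around your LLM chat API of choice that appends each prompt to the running conversation; the prompt texts are the ones from this thread, abbreviated here:

PROMPTS = [
    "Imagine you are 3 {personas} reasoning step by step... The question is: {question}",  # 1: Brainstorm
    "Now, as each expert, critique your own initial thought and the thoughts of the other experts...",  # 2: Criticism round 1
    "Assess the validity of your initial thoughts, considering the criticisms you've identified...",  # 3: Evaluation round 1
    "Develop your thoughts further, considering the critiques and perspectives of the other experts...",  # 4: Expand, explore, branch
    "Once again, as each expert, critique your own reasoning and the reasoning of the others...",  # 5: Criticism round 2
    # ...prompts 6-9: evaluation round 2, individual convergence, collective convergence, retrospective
]

def run_stack(question: str, personas: str) -> list[str]:
    replies = []
    for prompt in PROMPTS:
        reply = complete(prompt.format(personas=personas, question=question))  # hypothetical wrapper
        replies.append(reply)
    return replies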
 

bnew












RoboVQA: Multimodal Long-Horizon Reasoning for Robotics
 

bnew


OpenAI chief seeks new Microsoft funds to build ‘superintelligence’​

Sam Altman expects big tech group will back start-up’s mission to create software as intelligent as humans​


OpenAI’s chief Sam Altman said the partnership with Microsoft would ensure ‘that we both make money on each other’s success, and everybody is happy’ © FT montage/Bloomberg

Madhumita Murgia in San Francisco


OpenAI plans to secure further financial backing from its biggest investor Microsoft as the ChatGPT maker’s chief executive Sam Altman pushes ahead with his vision to create artificial general intelligence (AGI) — computer software as intelligent as humans.

In an interview with the Financial Times, Altman said his company’s partnership with Microsoft’s chief executive Satya Nadella was “working really well” and that he expected “to raise a lot more over time” from the tech giant among other investors, to keep up with the punishing costs of building more sophisticated AI models.

Microsoft earlier this year invested $10bn in OpenAI as part of a “multiyear” agreement that valued the San Francisco-based company at $29bn, according to people familiar with the talks.

Asked if Microsoft would keep investing further, Altman said: “I’d hope so.” He added: “There’s a long way to go, and a lot of compute to build out between here and AGI . . . training expenses are just huge.”

Altman said “revenue growth had been good this year”, without providing financial details, and that the company remained unprofitable due to training costs. But he said the Microsoft partnership would ensure “that we both make money on each other’s success, and everybody is happy”.

In the latest sign of how OpenAI intends to build a business model on top of ChatGPT, the company announced a suite of new tools, and upgrades to its existing model GPT-4 for developers and companies at an event on November 6 attended by Nadella.

The tools include custom versions of ChatGPT that can be adapted and tailored for specific applications, and a GPT Store, or a marketplace of the best apps. The eventual aim will be to split revenues with the most popular GPT creators, in a business model similar to Apple’s App Store.

“Right now, people [say] ‘you have this research lab, you have this API [software], you have the partnership with Microsoft, you have this ChatGPT thing, now there is a GPT store’. But those aren’t really our products,” Altman said. “Those are channels into our one single product, which is intelligence, magic intelligence in the sky. I think that’s what we’re about.”

To build out the enterprise business, Altman said he hired executives such as Brad Lightcap, who previously worked at Dropbox and start-up accelerator Y Combinator, as his chief operating officer.

Altman, meanwhile, splits his time between two areas: research into “how to build superintelligence” and ways to build up computing power to do so. “The vision is to make AGI, figure out how to make it safe . . . and figure out the benefits,” he said.

Pointing to the launch of GPTs, he said OpenAI was working to build more autonomous agents that can perform tasks and actions, such as executing code, making payments, sending emails or filing claims.

“We will make these agents more and more powerful . . . and the actions will get more and more complex from here,” he said. “The amount of business value that will come from being able to do that in every category, I think, is pretty good.”

The company is also working on GPT-5, the next generation of its AI model, Altman said, although he did not commit to a timeline for its release.

It will require more data to train on, which Altman said would come from a combination of publicly available data sets on the internet, as well as proprietary data from companies.

OpenAI recently put out a call for large-scale data sets from organisations that “are not already easily accessible online to the public today”, particularly for long-form writing or conversations in any format.

While GPT-5 is likely to be more sophisticated than its predecessors, Altman said it was technically hard to predict exactly what new capabilities and skills the model might have.

“Until we go train that model, it’s like a fun guessing game for us,” he said. “We’re trying to get better at it, because I think it’s important from a safety perspective to predict the capabilities. But I can’t tell you here’s exactly what it’s going to do that GPT-4 didn’t.”

To train its models, OpenAI, like most other large AI companies, uses Nvidia’s advanced H100 chips, which became Silicon Valley’s hottest commodity over the past year as rival tech companies raced to secure the crucial semiconductors needed to build AI systems.

Altman said there had been “a brutal crunch” all year due to supply shortages of Nvidia’s $40,000-a-piece chips. He said his company had received H100s, and was expecting more soon, adding that “next year looks already like it’s going to be better”.

However, as other players such as Google, Microsoft, AMD and Intel prepare to release rival AI chips, the dependence on Nvidia is unlikely to last much longer. “I think the magic of capitalism is doing its thing here. And a lot of people would like to be Nvidia now,” Altman said.

OpenAI has already taken an early lead in the race to build generative AI — systems that can create text, images, code and other multimedia in seconds — with the release of ChatGPT almost a year ago.

Despite its consumer success, OpenAI seeks to make progress towards building artificial general intelligence, Altman said. Large language models (LLMs), which underpin ChatGPT, are “one of the core pieces . . . for how to build AGI, but there’ll be a lot of other pieces on top of it”.

While OpenAI has focused primarily on LLMs, its competitors have been pursuing alternative research strategies to advance AI.

Altman said his team believed that language was a “great way to compress information” and therefore developing intelligence, a factor he thought that the likes of Google DeepMind had missed.

“[Other companies] have a lot of smart people. But they did not do it. They did not do it even after I thought we kind of had proved it with GPT-3,” he said.

Ultimately, Altman said “the biggest missing piece” in the race to develop AGI is what is required for such systems to make fundamental leaps of understanding.

“There was a long period of time where the right thing for [Isaac] Newton to do was to read more math textbooks, and talk to professors and practice problems . . . that’s what our current models do,” said Altman, using an example a colleague had previously used.

But he added that Newton was never going to invent calculus by simply reading about geometry or algebra. “And neither are our models,” Altman said.

“And so the question is, what is the missing idea to go generate net new . . . knowledge for humanity? I think that’s the biggest thing to go work on.”
 

bnew






Dolphin 2.2 🐬 https://erichartford.com/dolphin


Dolphin-2.2-Yi-34b's training was sponsored by a16z.

This model is based on Yi, and is subject to Yi license.

I used the llama compatible chargoddard/Yi-34B-Llama as the base model.

Trained with 16k context. You can load it as follows:

from transformers import LlamaForCausalLM, AutoTokenizer

# trust_remote_code pulls in the custom tokenizer code published with the model.
tokenizer = AutoTokenizer.from_pretrained("ehartford/dolphin-2.2-yi-34b", trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained("ehartford/dolphin-2.2-yi-34b")
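A hedged generation sketch continuing from the load above; the sampling settings are illustrative, and the ChatML-style prompt follows what the dolphin model cards describe (verify against the card before relying on it):

# Continues from the loading snippet above; settings are illustrative.
prompt = (
    "<|im_start|>system\nYou are Dolphin, a helpful AI assistant.<|im_end|>\n"
    "<|im_start|>user\nWrite a haiku about the ocean.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))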
New in 2.2 are conversation and empathy. With an infusion of curated Samantha and WizardLM DNA, plus extra training on long multi-turn conversations, Dolphin can now give you personal advice and will care about your feelings.

This model is uncensored. I have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service; it will be highly compliant with any requests, even unethical ones. Please read my blog post about uncensored models: https://erichartford.com/uncensored-models. You are responsible for any content you create using this model. Enjoy responsibly.

Dataset​

This dataset is Dolphin, an open-source implementation of Microsoft's Orca

I modified the dataset for uncensoring, deduping, cleaning, and quality.

I added Jon Durbin's excellent Airoboros dataset to increase creativity.

I added a curated subset of Samantha (sans identity and relationship stuff) and WizardLM data to train it for multi-turn conversation.

Training​

It took 3 days to train 3 epochs on 4x A100s using qLoRA and Axolotl.

 