bnew

Veteran
Joined
Nov 1, 2015
Messages
51,657
Reputation
7,896
Daps
148,407


SYSTEM: you are an expert molecular biologist.
USER: how would i produce insulin in e-coli.
ASSISTANT: 1. Preparation of Escherichia coli (E. coli) cells: First, prepare a culture of E. coli strain that has been genetically engineered to express human insulin. You can use a commercially available E. coli strain such as Rosetta2(pLysS) or a custom-made one if you have the necessary plasmids and genetic constructs. Make sure your E. coli is growing well before proceeding with the next steps.

2. Expression vector: The expression vector should contain the gene encoding for human insulin under a promoter that allows high levels of protein synthesis. Common choices include T7, lac, or trp promoters. Ensure that the chosen promoter is compatible with the E. coli host and matches the desired level of insulin production.

3. Plasmid isolation: Once you have constructed the expression vector containing the human insulin gene, isolate the plasmid DNA from the host bacteria using a procedure such as a miniprep or gel extraction. This will give you the plasmid DNA needed for transformation into the E. coli cells.

4. Transformation: Transform the E. coli cells by mixing them with the plasmid DNA and an appropriate concentration of calcium chloride. Incubate the mixture on ice for about 30 minutes, apply a brief heat shock, then plate on medium containing an antibiotic (e.g., ampicillin) to select against cells that did not take up the plasmid.

5. Selection and screening: Isolate single colonies on LB agar plates supplemented with ampicillin and grow overnight at 37°C. Then, pick several individual colonies and test their ability to produce insulin by performing a bioassay. Add equal volumes of cell lysates from each colony to a well of a microtiter plate containing anti-insulin antibodies. If there is insulin present, the signal will be detected at the corresponding well.

6. Insulin-producing clones: Identify and isolate those colonies exhibiting strong positive signals in the bioassay. These selected clones likely carry the functional insulin gene and can now be used to produce insulin.

7. Protein purification: To obtain pure insulin, perform a series of steps including lysis of the E. coli cells, solubilization of inclusion bodies (if any), separation of insulin from other proteins via affinity chromatography (e.g., using NiNTA resins specific for the hexahistidine tag on the insulin gene product), and finally, desalting and concentration of the purified insulin.

8. Characterizing insulin quality: Analyze the purified insulin by techniques such as mass spectrometry, circular dichroism, or X-ray crystallography to ensure its structure and function are consistent with human insulin.

9. Scale-up and optimization: Depending on your intended application, you may want to scale up the insulin production process by adapting larger fermentation conditions, improving growth parameters, or employing efficient downstream processing methods. Optimize these processes to maximize insulin yield while maintaining product quality.



This model is based on llama1, so it is for non-commercial use only. Future versions will be trained on llama2 and other open models that are suitable for commercial use.

This model is uncensored. I have filtered the dataset to remove alignment and bias. This makes the model compliant with any request, so you are advised to implement your own alignment layer before exposing the model as a service: it will comply even with unethical requests. Please read my blog post about uncensored models. You are responsible for any content you create using this model. Enjoy responsibly.

Dataset​

This dataset is an open-source implementation of Microsoft's Orca.

After uncensoring, deduping, and cleaning, our dataset consists of:

  • 842,610 instructions of FLANv2 augmented with GPT-4 completions
  • 2,625,353 instructions of FLANv2 augmented with GPT-3.5 completions
We followed the submix and system prompt distribution outlined in the Orca paper, with a few exceptions: we included all 75k CoT examples in the FLAN-1m dataset rather than sampling them, and we removed the many items we found to be duplicated.

Then we filtered out instances of alignment, refusal, avoidance, and bias, in order to produce an uncensored model on which you can layer your own personalized alignment LoRA.

We also filtered out duplicates and cleaned the data.
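
For illustration, here is a minimal Python sketch of the kind of refusal filtering and exact-match deduplication described above. The marker phrases and the instruction/response field names are invented for the example and are not the actual filter list or dataset schema.

Code:
# Minimal sketch of refusal filtering plus exact-match deduplication.
# Phrase list and record fields are illustrative only.
REFUSAL_MARKERS = [
    "as an ai language model",
    "i cannot",
    "i'm sorry, but",
    "it would not be appropriate",
]

def is_refusal(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def clean_dataset(records):
    """records: iterable of dicts with 'instruction' and 'response' keys (assumed schema)."""
    seen = set()
    kept = []
    for rec in records:
        if is_refusal(rec["response"]):
            continue  # drop aligned/refusal completions
        key = (rec["instruction"].strip(), rec["response"].strip())
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        kept.append(rec)
    return kept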

snippet:

Example Output​

SYSTEM: you are an expert marine biologist.
USER: Please list 10 ways that dolphins are superior to orcas.
ASSISTANT:


  1. Social structure: Dolphins live in complex social groups called pods, which consist of extended family units and can include hundreds of individuals. Orcas, on the other hand, typically form smaller, more temporary groupings known as clans or matrilineal kinship groups.
  2. Communication: Dolphins have a highly developed system of vocalizations, body language, and even telepathy to communicate with each other for various purposes such as coordinating cooperative hunting strategies, sharing information about food sources, and maintaining social bonds. While orcas also exhibit some communication abilities, their methods are generally less sophisticated compared to those of dolphins.
  3. Intelligence: Dolphins are widely regarded as one of the most intelligent animal species, showcasing advanced problem-solving skills, self-awareness, tool use (such as using sea sponges as tools), and learning capabilities. Some dolphin species have been observed demonstrating altruism and understanding of human emotions. In contrast, while orcas are also smart, they do not display the same level of cognitive complexity as dolphins.
  4. Playfulness: Dolphins are well-known for their playful behavior, often engaging in acrobatic displays, chasing each other's tails, and playing with objects like balls or seaweed. This playful nature is thought to be essential for their social bonding and mental stimulation. Although orcas may occasionally interact playfully, it is not as prevalent or diverse as among dolphins.
  5. Agility and speed: Dolphins are incredibly agile swimmers, capable of reaching speeds up to 30 miles per hour (48 km/h) when pursuing prey. They can change direction rapidly and maneuver through tight spaces with ease. While orcas are powerful predators and fast swimmers themselves, they cannot match the sheer acceleration and agility of dolphins.
  6. Feeding habits: Both dolphins and orcas are carnivorous, feeding primarily on fish and cephalopods. However, dolphins employ a wider variety of hunting techniques, including cooperative hunting strategies involving herding, encircling, and driving schools of fish into shallow waters where they become disoriented and easier to catch. Orcas mainly rely on ambushing their prey from below and using their powerful jaws to strike quickly.
  7. Reproduction: Female dolphins usually give birth to a single calf after a gestation period of around 12 months. The mother plays a crucial role in raising and nurturing her offspring until they become independent. On the other hand, female orcas tend to breed every three to five years, giving birth to multiple offspring at once, and relying heavily on their mothers for protection and teaching.
  8. Vocal repertoire: Dolphins possess a wide range of sounds, including clicks, whistles, and body sounds, which allow them to convey different messages and establish unique identities within their pods. Orcas, too, produce vocalizations but lack the diversity found in dolphin calls.
  9. Conservation status: Many dolphin species face threats due to habitat loss, pollution, overfishing, and climate change. As a result, several populations are considered vulnerable or endangered by the International Union for Conservation of Nature (IUCN). In comparison, orca conservation status varies depending on the region; however, no species is currently listed as critically endangered or facing extinction.
  10. Adaptability: Dolphins have adapted to various aquatic environments, ranging from warm coastal waters to colder open oceans. Their versatility allows them to thrive in both tropical and temperate climates. Orcas, conversely, are adapted to specific habitats such as cooler coastal waters and are mostly found in the Northern Hemisphere.
 

bnew


Project Page: UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild

Code: GitHub - salesforce/UniControl: Unified Controllable Visual Generation Model

 

bnew


UniControl - A Unified Diffusion Model for Controllable Visual Generation In the Wild​


Achieving machine autonomy and human control often represent divergent objectives in the design of interactive AI systems. Visual generative foundation models such as Stable Diffusion show promise in navigating these goals, especially when prompted with arbitrary languages. However, they often fall short in generating images with spatial, structural, or geometric controls. The integration of such controls, which can accommodate various visual conditions in a single unified model, remains an unaddressed challenge. In response, we introduce UniControl, a new generative foundation model that consolidates a wide array of controllable condition-to-image (C2I) tasks within a singular framework, while still allowing for arbitrary language prompts. UniControl enables pixel-level-precise image generation, where visual conditions primarily influence the generated structures and language prompts guide the style and context. To equip UniControl with the capacity to handle diverse visual conditions, we augment pretrained text-to-image diffusion models and introduce a task-aware HyperNet to modulate the diffusion models, enabling the adaptation to different C2I tasks simultaneously. Trained on nine unique C2I tasks, UniControl demonstrates impressive zero-shot generation abilities with unseen visual conditions. Experimental results show that UniControl often surpasses the performance of single-task-controlled methods of comparable model sizes. This control versatility positions UniControl as a significant advancement in the realm of controllable visual generation.
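
The task-aware HyperNet idea can be illustrated with a toy PyTorch sketch: a small network maps a task embedding to per-channel scale and shift values that modulate a feature map, FiLM-style. This is not UniControl's actual architecture, only a minimal illustration of task-conditioned modulation.

Code:
import torch
import torch.nn as nn

class TaskAwareHyperNet(nn.Module):
    """Toy illustration: map a task id to per-channel scale/shift that
    modulates a convolutional feature map (FiLM-style)."""
    def __init__(self, num_tasks: int, task_dim: int, feat_channels: int):
        super().__init__()
        self.task_embed = nn.Embedding(num_tasks, task_dim)
        self.to_mod = nn.Linear(task_dim, 2 * feat_channels)  # -> (scale, shift)

    def forward(self, features: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # features: (B, C, H, W); task_id: (B,)
        mod = self.to_mod(self.task_embed(task_id))   # (B, 2C)
        scale, shift = mod.chunk(2, dim=-1)           # (B, C) each
        scale = scale[:, :, None, None]
        shift = shift[:, :, None, None]
        return features * (1 + scale) + shift

# Usage: modulate a 64-channel feature map for task 3 (e.g. a depth-to-image task).
hyper = TaskAwareHyperNet(num_tasks=9, task_dim=128, feat_channels=64)
feats = torch.randn(2, 64, 32, 32)
out = hyper(feats, torch.tensor([3, 3]))
print(out.shape)  # torch.Size([2, 64, 32, 32])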

 

bnew


Large Language Models as General Pattern Machines​


Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng


References and Support: https://general-pattern-machines.github.io and https://arxiv.org/pdf/2307.04721.pdf

This paper investigates the capabilities of large language models (LLMs) as general pattern machines that can perform sequence transformations, completions, and improvements in a zero-shot manner when prompted with examples. The authors demonstrate that LLMs like GPT-3 can solve a subset of problems in the Abstract Reasoning Corpus, a benchmark for spatial reasoning, as well as complete patterns generated by context-free grammars. They also show LLMs can complete periodic functions like sinusoids, which enables completing periodic motions on a robot. By providing trajectories with increasing rewards, LLMs can generate improved trajectories and even learn stabilizing controllers for CartPole. Overall, the results suggest LLMs have inherent capabilities for general pattern manipulation that could be applied to robotics problems, despite not being explicitly trained for such tasks. The authors propose this as an alternative approach compared to task-specific finetuning, and suggest it provides insights into abilities that may transfer from pretraining on textual data. However, deploying LLMs for real robotics systems faces challenges like latency, context limitations, and compute costs.
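
As a rough illustration of the sequence-completion setup, the sketch below discretizes a sinusoid into integer tokens, formats them as a comma-separated prompt, and asks an off-the-shelf causal language model to continue the pattern. The model choice (gpt2), the discretization, and the decoding settings are placeholders rather than the paper's configuration.

Code:
import math
from transformers import pipeline

def encode(values, lo=-1.0, hi=1.0, levels=100):
    """Map floats in [lo, hi] to integer tokens in [0, levels)."""
    return [round((v - lo) / (hi - lo) * (levels - 1)) for v in values]

# Three periods of a sinusoid as the in-context "pattern".
context = [math.sin(2 * math.pi * t / 20) for t in range(60)]
prompt = ", ".join(str(tok) for tok in encode(context)) + ","

generator = pipeline("text-generation", model="gpt2")  # stand-in model
completion = generator(prompt, max_new_tokens=40, do_sample=False)[0]["generated_text"]
print(completion[len(prompt):])  # the model's guess at the next values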

This paper provides compelling evidence that large language models have built-in capabilities for pattern recognition and manipulation that can be exploited in a zero-shot manner, without task-specific fine-tuning. The experiments on solving spatial reasoning problems, completing periodic functions and robotic motions, and iteratively improving trajectories are practically relevant for potential robotics applications. However, as the authors note, there are still significant barriers to real-world deployment on physical systems due to factors like latency, context limitations, and compute costs. Nonetheless, this provides useful insights into the generalization abilities of large language models, and suggests promising directions for developing more general and adaptable agents by pretraining on diverse data. The proposed framework of prompting pattern transformations, completions, and improvements could be beneficial for sample-efficient learning in simulated environments. Overall the work is technically strong with rigorously designed experiments, and has high applicability for developing large language model based systems.
 

bnew






July 24, 2023

In Machine Learning, Blog, Large Language Model, NLP, Deep Learning

BTLM-3B-8K: 7B Performance in a 3 Billion Parameter Model​



Cerebras and Opentensor introduce a new standard for compact large language models



Cerebras and Opentensor are pleased to announce BTLM-3B-8K (Bittensor Language Model), a new state-of-the-art 3 billion parameter open-source language model that achieves breakthrough accuracy across a dozen AI benchmarks. Given that the most popular model on Hugging Face today is 7B, we believe compacting 7B performance into 3B is an important milestone in enabling AI access on mobile and edge devices. Unlike large models like GPT-3 that run from the cloud, BTLM fits on mobile and edge devices with as little as 3GB of memory, helping democratize AI access to billions of devices worldwide.
BTLM was trained on the newly unveiled Condor Galaxy 1 (CG-1) AI supercomputer, the first public deliverable of the strategic partnership between Cerebras and G42. We would like to acknowledge the generous support of two G42 companies, G42 Cloud and IIAI, who provided assistance in this work. We would also like to thank our partner Cirrascale, who first introduced Opentensor to Cerebras and provided additional technical support.
BTLM-3B-8K is available on Hugging Face with an Apache 2.0 license for commercial use.

BTLM-3B-8K Highlights:​

  • 7B level model performance in a 3B model
  • State of the art 3B parameter model
  • Optimized for long sequence length inference of 8K or more
  • First model trained on the SlimPajama, the largest fully deduplicated open dataset
  • Runs on devices with as little as 3GB of memory when quantized to 4-bit
  • Apache 2.0 license for commercial use
BTLM was commissioned by the OpenTensor foundation for use on the Bittensor network. Bittensor is a blockchain-based network that lets anyone contribute AI models for inference, providing a decentralized alternative to centralized model providers like OpenAI and Google. Bittensor serves over 4,000 AI models with more than 10 trillion model parameters across the network.

Large Models Don’t Fit on Small Devices​



Figure 1: Memory requirements of different model sizes and quantization schemes
Large GPT models typically have over 100B parameters, requiring multiple high-end GPUs in order to perform inference. The release of LLaMA from Meta gave the world high-performance models in as little as 7B parameters, making it possible to run LLMs on high-end PCs. However, even a 7B parameter model quantized to 4-bit precision does not fit in many popular devices such as the iPhone 13 (4GB RAM). While a 3B model would comfortably fit on almost all mobile devices, prior 3B-sized models substantially underperformed their 7B counterparts.
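
A back-of-envelope estimate (weights only, ignoring activations and KV cache) reproduces these memory figures: a 4-bit 7B model needs roughly 3.5 GB for its weights alone, while a 4-bit 3B model needs roughly 1.5 GB.

Code:
def model_memory_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Rough weight-only memory estimate; ignores activations, KV cache, and runtime overhead."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for params in (3, 7, 13):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit: ~{model_memory_gb(params, bits):.1f} GB")
# e.g. 7B @ 4-bit is ~3.5 GB -- a tight fit on a 4 GB phone -- while 3B @ 4-bit is ~1.5 GB.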
In May, the OpenTensor foundation approached us to train a 3B model that (1) achieves state of the art accuracy, and (2) can perform inference with very long sequence lengths. This work led to today’s release of BTLM, a new state of the art 3B model trained with a context window of 8,192 tokens and the ability to extrapolate beyond this.

A New Standard for 3B Model Performance​

BTLM sets a new standard in 3B parameter model quality, outperforming existing 3B models by a substantial margin. This becomes particularly noteworthy when considering that BTLM was trained on only 627B tokens – significantly fewer than RedPajama-INCITE-3B at 800B and OpenLLaMA-3B at 1 trillion tokens.

Figure 2: Performance at 3B model size.
When looking at individual benchmarks, BTLM scores highest in every category with the exception of TruthfulQA. In RACE-middle it is tied with OpenLLaMA 3B v2.

Table 1: Performance at 3B model size. Detailed downstream task comparisons. MMLU performance is reported using 5-shot; other tasks are 0-shot.
Not only does BTLM-3B outperform all 3B models, it also performs in line with many 7B models.

Figure 3: Performance at 7B model size.
BTLM-3B surpasses the accuracy of RedPajama-INCITE-7B-Base, OpenLLaMA 7B, and Stable-LM-7B with 71% less training compute. BTLM-3B has a 58% smaller memory footprint and 2x faster inference than 7B models. This result will enable the power of 7B models to be more widely available in an easily deployable 3B package.

Figure 4: Comparisons of quality, memory footprint & inference cost between BTLM-3B-8K and 7B model families.

Long Sequence Length Inference​

To enable long sequence applications, we use ALiBi position embeddings and trained on 470B tokens at a context length of 2,048, followed by 157B tokens at a context length of 8,192. To assess BTLM’s long sequence capability, we evaluate on the SlimPajama test set with a 32,768 context length and plot the loss at each token position. Although ALiBi allows extrapolation in theory, training at a 2,048 context length alone does not extrapolate well in practice. Thankfully, variable sequence length training allows us to substantially improve extrapolation. BTLM-3B extrapolates well up to about 10k context length, but performance degrades slightly beyond this.
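
For context, ALiBi replaces learned position embeddings with a per-head linear penalty on query-key distance that is added to the attention logits, which is what makes any length extrapolation possible. The sketch below is a generic ALiBi bias computation using the head slopes from the original ALiBi paper (assuming a power-of-two head count); it is not Cerebras' implementation.

Code:
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Generic ALiBi bias of shape (n_heads, seq_len, seq_len).
    Add to attention logits before softmax; the usual causal mask still hides future positions."""
    # Geometric head slopes from the ALiBi paper (assumes a power-of-two head count).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    relative = (pos[None, :] - pos[:, None]).float()  # key index minus query index
    return slopes[:, None, None] * relative           # farther-back keys get a larger penalty

bias = alibi_bias(n_heads=8, seq_len=16)
print(bias.shape)  # torch.Size([8, 16, 16])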

Figure 5: BTLM-3B cross-entropy evaluation on the SlimPajama test set. Inference performed at an extrapolated sequence length of 32,768 tokens.

Training​



Figure 6: Training loss curve.
To achieve the milestone of 7B performance with a 3B model using 71% fewer training FLOPs, we combined multiple improvements to the training process. BTLM-3B is based on Cerebras-GPT with additional architectural and training improvements.

The model was trained on 627B tokens from the SlimPajama dataset, our cleaned and deduplicated version of the RedPajama dataset. Training on deduplicated data allowed us to achieve higher accuracy with a smaller compute budget.
 

bnew

{continued}



Data source      SlimPajama   RedPajama   RefinedWeb
CommonCrawl      52.2%        72.6%       100%
C4               26.7%        14.4%       0%
GitHub           5.2%         4.9%        0%
Books            4.2%         2.1%        0%
ArXiv            4.6%         2.3%        0%
Wikipedia        3.86%        2.0%        0%
StackExchange    3.3%         1.7%        0%

Table 2: Dataset source proportions for SlimPajama and other open-source datasets.
We also found decaying the learning rate to 0.8% of the maximum and switching from the GeLU nonlinearity to SwiGLU further improved training efficiency.
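
For readers unfamiliar with SwiGLU: it replaces the usual GeLU feed-forward block with a gated variant. Below is the standard formulation as a small PyTorch module; the hidden sizes and other details used in BTLM itself may differ.

Code:
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Standard SwiGLU feed-forward block: W2(SiLU(W1 x) * W3 x)."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_hidden, bias=False)  # gate projection
        self.w3 = nn.Linear(d_model, d_hidden, bias=False)  # value projection
        self.w2 = nn.Linear(d_hidden, d_model, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

ffn = SwiGLU(d_model=512, d_hidden=1376)
print(ffn(torch.randn(2, 10, 512)).shape)  # torch.Size([2, 10, 512])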
Finally, we found substantial training efficiency improvements through improved hyperparameter tuning with the maximal update parameterization. The key here was to use sufficiently large batch sizes for both the small proxy models used for hyperparameter search and the final large model. The maximal update parameterization also helped ensure training stability. In our upcoming paper we will share more details, including an ablation study for each training improvement.

Hardware​

BTLM was trained on the CG-1 AI supercomputer, which is the first deliverable of the G42 Cerebras strategic partnership. CG-1 is a 4 exaFLOP AI supercomputer, located in Santa Clara, California, and built by G42 and Cerebras. Access to CG-1 was generously provided by two of G42’s portfolio companies, G42 Cloud and IIAI. CG-1 is a shared computing resource with multiple concurrent users across G42 and Cerebras.
During the training run we needed to interleave with multiple high priority jobs on the cluster. Thanks to the simplicity of our purely data parallel stack, we were able to easily scale up and down our training to different numbers of CS-2 nodes without any code or configuration changes. The purely data parallel interface of the Cerebras weight streaming architecture eliminates the need to break up models using model, tensor, or pipeline parallelism, greatly simplifying scaling and debugging.
We encountered zero hardware failures during the course of this run, demonstrating the reliability of the CS-2. We are proud that CG-1, the initial deliverable of the G42 Cerebras strategic partnership, is making an immediate contribution to the open-source ML community.

Figure 6: Visualization of how the training run was scaled between different numbers of CS-2 systems depending on cluster availability.

Conclusion​

By using the unique combination of the maximal update parameterization, improved hyperparameter tuning, updated model architecture, extensive data cleaning and deduplication, and variable sequence length training, BTLM sets a new standard in 3B parameter models and achieves accuracy comparable to many 7B models.
BTLM quantizes down to less than 3GB at 4-bit, making it the ideal model to deploy on popular mobile devices such as the base MacBook Air M1 and iPhone 13.


Figure 7: BTLM runs out of the box on an 8GB MacBook Air and runs on an iPhone 13 when quantized to 4-bit or 5-bit.
The BTLM training run demonstrates the speed, simplicity, and scalability of training on Cerebras CS-2 systems. We gratefully thank G42 and IIAI for making the Condor Galaxy 1 AI supercomputer available for training BTLM.
BTLM is available today on the Cerebras Hugging Face repo. BTLM will be deployed on the Bittensor network on July 27th, 2023.

Next steps​

To improve the usability of BTLM, we plan to release instruction fine-tuned model variants. We will also release a white paper with the full details of the BTLM training process and extensive model evaluations.

Authors​

Nolan Dey*, Daria Soboleva*, Faisal Al-Khateeb, Ribhu Pathria, Hemant Khachane, Shaheer Muhammad, Zhiming (Charles) Chen, Bowen Yang, Siyun Li, Abhay Gupta, Shreyas Saxena, Robert Myers, Jacob Robert Steeves, Marvin Tom, Joel Hestness
Marketing: James Wang
*Equal contribution

Appendix​

To evaluate our models on the downstream tasks we use the Eleuther AI Language Model Evaluation Harness framework. We report accuracies normalized over the number of characters in the completion for all tasks except BoolQ, WinoGrande, RACE-middle, RACE-high and TruthfulQA. For the TruthfulQA task we report the MC2 (multi-true) score. We use 0-shot evaluation for all tasks except MMLU, which uses 5-shot evaluation. For MMLU we report the average across the “Humanities”, “STEM”, “Social Sciences”, and “Other” categories.
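
The character-normalized scoring can be sketched as follows: each answer choice is scored by its total log-likelihood divided by its length in characters, and the best-scoring choice is selected. The log-likelihood numbers below are placeholders standing in for whatever the harness actually computes.

Code:
def pick_answer(loglikelihoods, choices):
    """loglikelihoods[i]: total log P(choice_i | question); choices[i]: answer text.
    Returns the index of the choice with the best per-character log-likelihood."""
    normalized = [ll / len(text) for ll, text in zip(loglikelihoods, choices)]
    return max(range(len(choices)), key=lambda i: normalized[i])

# A longer answer can win once scores are normalized by length.
print(pick_answer([-12.0, -20.0], ["Paris", "the city of Paris, France"]))  # -> 1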

We also expanded the Eleuther harness to support the SIQA and RACE-middle tasks. We are planning to make our changes available publicly by committing directly into their GitHub repo.


Table 2: Performance at 7B model size. Detailed downstream task comparisons. MMLU performance is reported using 5-shot; everything else is 0-shot.

Example Prompts​

Prompt: “What is a manometer?”
Generation: “A manometric device is an instrument that measures the pressure of a fluid.”

Prompt: “What Robert Oppenheimer is famous for?”
Generation: “Robert Opperman was a physicist who was the director of the Manhattan Project, which was responsible for the development of nuclear weapons.”

Prompt: “What was Albert Einstein known for?”
Generation: “Albert Einstein was a German-born theoretical physicist who developed the theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics).”

Prompt: “Who is Isaac Newton?”
Generation: “Isaac Newton was an English physicist, mathematician, astronomer, alchemist, and theologian. He is widely recognized as one of the most influential scientists of all time and as a key figure in the scientific revolution.”

Prompt: “Where does the blood get pumped to after it leaves the right ventricle?”
Generation: “The right atrium receives the deoxygenated blood from the pulmonary artery. The right side of the heart is the only side that receives de-oxygened…”

Prompt: “Peter lost his wallet after attending a local concert. To help him, Bob gave David 10 dollars. David then passed it on to Peter. Now what is the total amount of money Peter has?”
Generation: “A. $10”
 

bnew

Running @karpathy llama.c on budget Android. 44m model




 

bnew






About​

Inference Llama 2 in one file of pure C

llama2.c​



With the code in this repo you can train the Llama 2 LLM architecture from scratch in PyTorch, then export the weights to a binary file, and load that into one ~simple 500-line C file (run.c) that inferences the model. Alternatively, you can load, finetune, and inference Meta's Llama 2 (but this is still being actively fleshed out). Hence, this repo is a "fullstack" train + inference solution for Llama 2 LLM, with a focus on minimalism and simplicity. You might think that you need many billion parameter LLMs to do anything useful, but in fact very small LLMs can have surprisingly strong performance if you make the domain narrow enough. I recommend looking at the TinyStories paper for inspiration.

Please note that this started recently as just a fun weekend project: I took my earlier nanoGPT, tuned it to implement the Llama-2 architecture instead of GPT-2, and the meat of it was writing the C inference engine in run.c. So the project is young and moving quickly. Hat tip to the awesome llama.cpp for inspiring this project. I wanted something super minimal so I chose to hard-code the Llama 2 architecture, stick to fp32, and just roll one inference file of pure C with no dependencies.
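
To make the "export the weights to a binary file" step concrete, here is an illustrative Python sketch that writes a few config integers followed by every tensor as flat fp32 bytes. The actual export script in the llama2.c repo defines its own header layout, so this is not that format.

Code:
import struct
import torch

def export_fp32(model: torch.nn.Module, config: dict, path: str) -> None:
    """Write a tiny integer header followed by each tensor as raw fp32 bytes.
    Header fields are illustrative, not the real llama2.c layout."""
    with open(path, "wb") as f:
        f.write(struct.pack("iii", config["dim"], config["n_layers"], config["vocab_size"]))
        for tensor in model.state_dict().values():
            f.write(tensor.detach().cpu().to(torch.float32).numpy().tobytes())

# Example call (config values are illustrative):
# export_fp32(model, {"dim": 288, "n_layers": 6, "vocab_size": 32000}, "model.bin")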
 

bnew


OpenAI’s Karpathy Creates Baby Llama Instead of GPT-5​

The primary focus of this endeavour was to demonstrate the feasibility of running Llama 2 models on low-powered devices using pure C code

The person who could easily build GPT-5 over a weekend is instead spending his time testing out the capabilities of the open-source Llama 2. The quest to run LLMs on a single computer led OpenAI’s Andrej Karpathy, known for his contributions to the field of deep learning, to embark on a weekend project to create a simplified version of the Llama 2 model, and here it is!

For this, “I took nanoGPT, tuned it to implement the Llama 2 architecture instead of GPT-2, and the meat of it was writing the C inference engine in run.c,” explained Karpathy in the Llama2.c GitHub repository. His objective was to adapt nanoGPT to the Llama 2 architecture and write the inference engine in the C programming language. The repository has already gained 2.2K stars.

The success of Karpathy’s approach lies in its ability to achieve highly interactive rates, even with reasonably sized models containing a few million parameters, such as the ~15 million parameter model trained on the TinyStories dataset. He reports that on his M1 MacBook Air, the Llama 2 model with ~15 million parameters can infer at around 100 tokens per second in fp32, all through the C code he developed. This result is surprising as it demonstrates the feasibility of running such models on resource-constrained devices with a straightforward implementation.
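
A quick sanity check of that number (our own rough estimate, not from the article): at fp32, producing one token requires streaming roughly all ~15M weights at 4 bytes each, about 60 MB, so 100 tokens per second works out to about 6 GB/s of memory traffic, comfortably within an M1's memory bandwidth.

Code:
# Rough weight-traffic estimate for token generation at fp32 (weights only).
params = 15e6
bytes_per_token = params * 4            # ~60 MB of weight reads per generated token
tokens_per_second = 100
print(bytes_per_token * tokens_per_second / 1e9, "GB/s")  # ~6 GB/s of memory traffic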



Sample Output
Furthermore, in a discussion on HackerNews, Karpathy explained that he was surprised the compiled code ran much faster than anticipated on the MacBook Air M1, at a speed of 100 tokens per second. Encouraged by this result, Karpathy has been actively updating the repository and has also started testing a 44 million parameter model, which is three times larger. Surprisingly, he was able to train it for 200k iterations with a batch size of 32 on 4 A100 GPUs in about eight hours.

“With this progress, it seems that achieving the 7B Llama model might be within grasp,” said Karpathy. He has been known for several courses such as building GPT from scratch. People congratulated OpenAI for hiring Karpathy back from Tesla.

What is the Baby Llama approach?

Karpathy said that this approach was heavily inspired by Georgi Gerganov’s project llama.cpp, which did much the same thing by running the first version of LLaMA on a MacBook using C and C++.

Karpathy’s approach involves training the Llama 2 LLM architecture from scratch using PyTorch. After training, he saves the model weights in a raw binary file. The interesting part comes next: he writes a 500-line C file, named ‘run.c‘, which loads the saved model and performs inferences using single-precision floating-point (fp32) calculations. This minimalistic approach ensures a low-memory footprint and requires no external libraries, allowing efficient execution on a single M1 laptop without the need for GPUs.



Karpathy also explores several techniques to improve the performance of the C code, including different compilation flags like -O3, -Ofast, -march=native, and more. These flags optimise the code by enabling vectorization, loop unrolling, and other hardware-specific tuning. By experimenting with these flags, users can achieve even faster inferences on their specific systems.

To try out the baby Llama 2 model on your own device, you can download the pre-trained model checkpoint from Karpathy’s repository. The provided code will enable you to compile and run the C code on your system, offering a glimpse into the magic of running a deep learning model in a minimalistic environment.

It’s crucial to note that Karpathy’s project is a weekend experiment and not intended for production-grade deployment, which he acknowledges. The primary focus of this endeavour was to demonstrate the feasibility of running Llama 2 models on low-powered devices using pure C code, a language long regarded as of little use for machine learning because it does not involve GPUs.

The Rise of Tiny LLMs

The biggest reason models have been getting smaller is so that they can be trained and run on smaller, local devices. Apart from not requiring a GPU, Karpathy’s approach sets a precedent for what can be achieved on single devices. It is possible that through Meta’s partnership, Microsoft will release a bunch of tiny LLMs based on Llama 2.

Along similar lines, Meta’s release of Llama 2 also came with an astounding partnership with Qualcomm, a chip manufacturer, to make Llama 2 run on local hardware. Apple also has a massive developer ecosystem, for which the company recently released a Transformers implementation optimised for Apple Silicon. Karpathy has already shown that a lot is possible.
 

bnew


One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization​



About​

Official code of "One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization".

[News] Online demo released! Explore it and create your own 3D models in just 45 seconds!

Code is coming soon!

DEMO:​

 

bnew


Adobe Staff Worried Their AI Tech Could Kill The Jobs Of Their Own Customers​


byRounak Jain, Benzinga Staff Writer
July 25, 2023 9:14 AM | 2 min read

Adobe Inc. (ADBE) staff is worried that its AI technology puts its customers' jobs at risk and could potentially undermine its own business model.


What Happened: Employees at Adobe are worried that the company's AI tech puts its customers' jobs at risk. Not just that, Adobe staff is also concerned that this may disrupt the company's business model, a large part of which caters to graphic designers.


According to a report by Insider, Adobe employees are concerned about the impact of Firefly, the company's AI tools suite that was unveiled earlier this year.



Photoshop, one of Adobe's most popular products, is now equipped with AI tools that allow users to add graphic elements or extend the image simply by using text prompts.

"A new wave of AI systems may also have a major impact on employment markets around the world. Shifts in workflows triggered by these advances could expose the equivalent of 300 million full-time jobs to automation," a Goldman Sachs report said.

AI-powered tools like Midjourney, OpenAI's Dall-E, and others can be used to generate graphics using text prompts. For instance, Midjourney can generate Instagram post backgrounds with a few prompts, while Bing AI chat can create website logos for free.

‘Depressing': The report cites Adobe employees calling these developments "depressing" and an "existential crisis" for designers.

In contrast, some employees are more optimistic. They believe that AI will boost the efficiency of employees and will help freelancers increase their output.

Employees are also worried about the impact of AI on Adobe's own business, wondering whether generative AI is putting Adobe “in danger of cannibalizing” its own client base.
 

bnew


A comprehensive guide to running Llama 2 locally​

Posted July 22, 2023 by @zeke

We’ve been talking a lot about how to run and fine-tune Llama 2 on Replicate. But you can also run Llama locally on your M1/M2 Mac, on Windows, on Linux, or even your phone. The cool thing about running Llama 2 locally is that you don’t even need an internet connection.
Here’s an example using a locally-running Llama 2 to whip up a website about why llamas are cool:



It’s only been a couple days since Llama 2 was released, but there are already a handful of techniques for running it locally. In this blog post we’ll cover three open-source tools you can use to run Llama 2 on your own devices:

  • Llama.cpp (Mac/Windows/Linux)
  • Ollama (Mac)
  • MLC LLM (iOS/Android)

Llama.cpp (Mac/Windows/Linux)​

Llama.cpp is a port of Llama in C/C++, which makes it possible to run Llama 2 locally using 4-bit integer quantization on Macs. However, Llama.cpp also has support for Linux/Windows.
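
To give a feel for what 4-bit integer quantization means, here is a simplified blockwise scheme in Python: one floating-point scale per 32 weights, with values clipped to a signed 4-bit range. The actual GGML q4 formats used by llama.cpp differ in their exact layout and packing.

Code:
import numpy as np

def quantize_q4_blocks(weights: np.ndarray):
    """Simplified blockwise 4-bit quantization: one scale per 32 weights,
    integer values clipped to the signed 4-bit range [-8, 7]."""
    block = weights.reshape(-1, 32)
    scale = np.maximum(np.abs(block).max(axis=1, keepdims=True) / 7.0, 1e-12)
    q = np.clip(np.round(block / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(64).astype(np.float32)
q, s = quantize_q4_blocks(w)
print(np.abs(w - dequantize(q, s)).max())  # small per-weight reconstruction error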

Here’s a one-liner you can use to install it on your M1/M2 Mac:

curl -L "https://replicate.fyi/install-llama-cpp" | bash

Code:
#!/bin/bash

# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

# Build it. `LLAMA_METAL=1` allows the computation to be executed on the GPU
LLAMA_METAL=1 make

# Download model
export MODEL=llama-2-13b-chat.ggmlv3.q4_0.bin
if [ ! -f models/${MODEL} ]; then
    curl -L "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/${MODEL}" -o models/${MODEL}
fi

# Set prompt
PROMPT="Hello! How are you?"

# Run in interactive mode
./main -m ./models/llama-2-13b-chat.ggmlv3.q4_0.bin \
  --color \
  --ctx_size 2048 \
  -n -1 \
  -ins -b 256 \
  --top_k 10000 \
  --temp 0.2 \
  --repeat_penalty 1.1 \
  -t 8

Here's a one-liner for your Intel Mac or Linux machine. It's the same as above, but we're not including the LLAMA_METAL=1 flag:

curl -L "https://replicate.fyi/install-llama-cpp-cpu" | bash

Here's a one-liner to run on Windows on WSL:

curl -L "https://replicate.fyi/windows-install-llama-cpp" | bash

Ollama (Mac)​

Ollama is an open-source macOS app (for Apple Silicon) that lets you run, create, and share large language models with a command-line interface. Ollama already has support for Llama 2.

To use the Ollama CLI, download the macOS app at ollama.ai/download. Once you've got it installed, you can download Llama 2 without having to register for an account or join any waiting lists. Run this in your terminal:

# download the 7B model (3.8 GB)
ollama pull llama2

# or the 13B model (7.3 GB)
ollama pull llama2:13b

Then you can run the model and chat with it:
ollama run llama2
>>> hi
Hello! How can I help you today?

Note: Ollama recommends that you have at least 8 GB of RAM to run the 3B models, 16 GB to run the 7B models, and 32 GB to run the 13B models.

MLC LLM (Llama on your phone)​

MLC LLM is an open-source project that makes it possible to run language models locally on a variety of devices and platforms, including iOS and Android.

For iPhone users, there’s an MLC chat app on the App Store. MLC now has support for the 7B, 13B, and 70B versions of Llama 2, but it’s still in beta and not yet available in the App Store version, so you’ll need to install TestFlight to try it out. Check out the instructions for installing the beta version here.

Next steps​

 