bnew

Veteran
Joined
Nov 1, 2015
Messages
68,764
Reputation
10,607
Daps
185,939

Prompts for Education: Enhancing Productivity & Learning

About this Repository

Welcome to the Prompts for Education repository! Our mission is to transform the way students, educators, and staff in K-12 and higher education institutions interact with generative AI technology like ChatGPT and Bing Chat. By using these prompts, staff can save time and work more efficiently, and students can explore new and exciting learning opportunities.


Whether you're a student, a third-grade teacher, a college professor, or a school administrator, this collection is designed with you in mind. No technical expertise required!

Responsible AI with Azure OpenAI and Bing Chat

At Microsoft, we're committed to the advancement of AI driven by principles that put people first. Generative models such as the ones available in Azure OpenAI Service and Bing Chat have significant potential benefits, but without careful design and thoughtful mitigations, such models have the potential to generate incorrect or even harmful content. Microsoft has made significant investments to help guard against abuse and unintended harm, which includes a registration process for access to the Azure OpenAI Service, incorporating Microsoft’s principles for responsible AI use, building content filters to support customers, and providing responsible AI implementation guidance to onboarded customers.


More details on the RAI guidelines for the Azure OpenAI Service can be found here.

Responsible AI Principles

  • Fairness: AI systems should treat all people fairly.
  • Reliability and Safety: AI systems should perform reliably and safely.
  • Privacy and security: AI systems should be secure and respect privacy.
  • Inclusiveness: AI systems should empower everyone and engage people.
  • Transparency: AI systems should be understandable.
  • Accountability: People should be accountable for AI systems.

More details on the Responsible AI Principles here.

Disclaimer

While the prompts in this repository are designed with care and intended for educational use, users should be aware of potential risks in their application. Large Language Models (LLMs) may interpret prompts in ways that were not originally intended, leading to unexpected or inappropriate responses. We strongly encourage users to customize the prompts to fit their unique contexts, students, and needs, and to review the responses from LLMs for suitability and accuracy. Always exercise caution and professional judgment when incorporating these prompts into your educational environment.

What's a Prompt?

Think of a prompt as a special question or statement that you can give to an artificial intelligence model like GPT. It's designed to provide you with information, insights, or even creative ideas tailored to your needs. It's like having a knowledgeable assistant at your fingertips!

Improved Productivity for Faculty & Staff

Administrators, teachers, and other staff members can utilize these prompts to:

  • Create Engaging Lessons: Quickly design interesting and interactive lessons that captivate students.
  • Answer Student Questions: Provide accurate and fast answers to common student inquiries.
  • Automate Routine Tasks: Simplify day-to-day tasks with ready-to-use prompts.

New Learning Opportunities for Students

Students can use these prompts to:

  • Explore Subjects in Depth: Dive into various subjects with expert guidance.
  • Enhance Creativity: Develop writing, artistic, and critical thinking skills.
  • Personalize Learning: Tailor their learning experiences to their individual interests and needs.

How to Use

  1. Find a Prompt: Browse through our collection (currently a work in progress).
  2. Copy & Paste: Follow the direct link to Bing Chat or highlight, copy, and paste the prompt into your GPT-powered tool.
  3. Apply the Answer: Use the response in your teaching, administrative tasks, or educational activities.

Roles

 


A Taxonomy of Prompt Modifiers for Text-To-Image Generation


This paper identifies six types of prompt modifiers used by practitioners in the online text-to-image community based on a 3-month ethnographic study. The novel taxonomy of prompt modifiers provides researchers a conceptual starting point for investigating the practice of text-to-image generation, but may also help practitioners of AI-generated art improve their images.



 


LLMBOXING.COM





You've all heard about Llama 2 by now. But we've got a new kid on the block. Mistral 7B claims to outperform "Llama 2 13B on all benchmarks," but with almost half the parameters. Mistral talks big. But can it back it up? You decide.

Each round, pick the output you think is better. Each model has 5 hitpoints.

Enough talk. Let's settle this in the ring.
 


Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

Chrisantha Fernando, Dylan Banarse, Henryk Michalewski, Simon Osindero, Tim Rocktäschel
Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt-strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts, and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way. That is, Promptbreeder is not just improving task-prompts, but it is also improving the mutation-prompts that improve these task-prompts. Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as: arXiv:2309.16797 [cs.CL]
(or arXiv:2309.16797v1 [cs.CL] for this version)

Submission history

From: Chrisantha Fernando Dr
[v1] Thu, 28 Sep 2023 19:01:07 UTC (697 KB)


Optimizing Language Model Prompts with Promptbreeder

Overview and Objective:
The objective is to automate the prompt engineering process to enhance the performance of Large Language Models (LLMs) across various domains. This will be achieved by utilizing Promptbreeder's genetic algorithm approach to evolve increasingly effective task-prompts and mutation-prompts.

Steps and Intermediate Deliverables:

1. Initial Setup
- Objective: Prepare the environment and identify the domain for which you want to optimize prompts.
- Deliverable: A list of domains and initial task-prompts.
- Example: Domains like solving math word problems or classifying hate speech.

2. Selection of Thinking Styles
- Objective: Choose the thinking styles that will guide the prompt generation process.
- Deliverable: A list of selected thinking styles.
- Example: "Let's think step-by-step," "Focus on the creative approach," "Consider the problem from multiple angles."

3. Initial Mutation Prompts
- Objective: Generate initial mutation prompts that will be used to modify task-prompts.
- Deliverable: A list of initial mutation prompts.
- Example: "Rephrase the instruction to make it more engaging," "Explain the instruction as if you're talking to a beginner," "Condense and refine the following instruction."

4. Task Prompt Structure
- Objective: Create the initial task prompts using the problem description and selected thinking styles.
- Deliverable: Initial task prompts.
- Example: Start with a problem description, choose a thinking style, apply an initial mutation prompt to generate the first task prompt, and then use a second mutation prompt to generate a second task prompt.

5. Fitness Evaluation
- Objective: Evaluate the effectiveness of the generated prompts.
- Deliverable: Performance metrics of the prompt pairs on a subset of training data.
- Example: Evaluate the generated prompt pairs on a random subset of training data to measure their effectiveness.

6. Mutation Operators
- Objective: Apply mutation operators to create variations in the prompts.
- Deliverable: A new set of mutated prompts.
- Example: Zero-order prompt generation, First-order prompt generation, Lineage-based mutation, Hypermutation of mutation prompts, Context shuffling, Lamarckian mutation.

7. Evolution Process
- Objective: Iteratively refine the prompts based on their performance metrics.
- Deliverable: An optimized set of prompts.
- Example: Iteratively evaluate the effectiveness of task and mutation prompts, select the most effective ones, and generate a new set through crossover and mutation.

8. Diversity Maintenance
- Objective: Ensure that the prompt set remains diverse to avoid local optima.
- Deliverable: A diverse set of optimized prompts.
- Example: Introduce random mutations, use fitness proportionate selection, apply population distribution-based mutation to maintain diversity.

Final Deliverable:
A set of highly optimized task-prompts and mutation-prompts for the selected domain, along with performance metrics demonstrating their effectiveness
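
The loop described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the paper's implementation: `fake_llm` is a hypothetical stub that stands in for a real model call, `toy_fitness` is a placeholder for scoring a prompt on training examples, and the names `Unit`, `mutate`, `evolve` and `HINTS` are invented here. The one structural idea kept from the outline is that each unit carries both a task-prompt and the mutation-prompt used to change it, and that the mutation-prompt can itself be mutated. Selection is a simple pairwise tournament in which the winner's mutated copy overwrites the loser.

```python
import random
from dataclasses import dataclass

# Hypothetical hints the stub "LLM" can append; not taken from the paper.
HINTS = ["Think step by step.", "Check your arithmetic.", "Restate the question first."]


@dataclass
class Unit:
    task_prompt: str
    mutation_prompt: str
    fitness: float = 0.0


def fake_llm(mutation_prompt: str, text: str, rng: random.Random) -> str:
    # Stand-in for a real model call, which would receive mutation_prompt + text.
    return f"{text} {rng.choice(HINTS)}"


def toy_fitness(task_prompt: str) -> float:
    # Placeholder score: fraction of distinct hints present in the prompt.
    # A real run would measure accuracy on a batch of training examples.
    return sum(hint in task_prompt for hint in HINTS) / len(HINTS)


def mutate(parent: Unit, rng: random.Random, hyper_rate: float = 0.2) -> Unit:
    mutation_prompt = parent.mutation_prompt
    if rng.random() < hyper_rate:
        # Hypermutation: the mutation-prompt itself gets rewritten.
        mutation_prompt = fake_llm("Improve this instruction.", mutation_prompt, rng)
    task_prompt = fake_llm(mutation_prompt, parent.task_prompt, rng)
    return Unit(task_prompt, mutation_prompt, toy_fitness(task_prompt))


def evolve(population: list[Unit], steps: int, rng: random.Random) -> Unit:
    for unit in population:
        unit.fitness = toy_fitness(unit.task_prompt)
    for _ in range(steps):
        # Pairwise tournament: compare two random units, overwrite the loser.
        a, b = rng.sample(range(len(population)), 2)
        if population[a].fitness < population[b].fitness:
            a, b = b, a
        population[b] = mutate(population[a], rng)
    return max(population, key=lambda u: u.fitness)


if __name__ == "__main__":
    rng = random.Random(0)
    pop = [Unit("Solve the math word problem.", "Rephrase the instruction.") for _ in range(4)]
    print(evolve(pop, steps=20, rng=rng).task_prompt)
```

Swapping `fake_llm` for a real model call and `toy_fitness` for accuracy on held-out examples is where all the actual cost and difficulty would sit.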


 


Nucleus.ai Founders Emerge From Stealth and Demonstrate That Big Tech Companies Aren’t the Only Ones Building Large Language Models

Four young entrepreneurs just released a 22-billion-parameter LLM

October 05, 2023 08:00 AM Eastern Daylight Time

PALO ALTO, Calif.--(BUSINESS WIRE)--The four founders at startup NucleusAI just emerged from stealth and demonstrated they are able to build large language models (LLMs) like the Big Tech companies. NucleusAI just launched a 22-billion-parameter LLM that outperforms all the models of similar size.

Typically, foundational models are built by 30 to 100 people with skill sets that address the different aspects of LLMs. The fact Nucleus has achieved the same or better with just four people and the open source community is a testament to the team’s overall expertise.

There have been only a handful of companies to have open sourced and commercially licensed an LLM: Meta, Salesforce, Mosaic and Falcon by TII (UAE government funded) among them. With this release Nucleus will join that exclusive list. Nucleus is unique among those companies as an early stage startup.

Nucleus’s LLM

Nucleus’s 22-billion-parameter LLM was pre-trained on a context length of 2,048 tokens, and trained on a trillion tokens of data, from large scale deduplicated and cleaned web data, Wikipedia, Stack Exchange, arXiv, and code. This diverse training data ensures a well-rounded knowledge base, spanning general information to academic research and coding insights.

Today’s announcement will be followed by the release of two other models (3-billion parameters and 11-billion parameters) that were pre-trained on the larger context length of 4,096 tokens. The models were also trained with slightly different dimensional choices than the 22-billion-parameter model. The 22-billion-parameter model will be released in several versions, trained on one trillion tokens, 350 billion tokens, and 700 billion tokens, respectively.

Agriculture will be the first beneficiary

Nucleus will leverage AI to make agriculture failure proof. We have now reached a time in the world where all potential risks in agriculture (including natural calamities) can be fully mitigated. The company is building an intelligent operating system (OS) for farming, optimizing supply and demand, just like Uber does.

Growing produce is one of the most important things for humanity, and it can be one of the most lucrative things to do. Agricultural challenges, such as optimizing and sharing yields, are top of mind for the founders who are familiar with and dedicated to tackling this challenge. More details will be announced in the second week of October.

About NucleusAI

NucleusAI is a team of 4 individuals, all of whom are former founders. Two were AI researchers at Amazon, one worked at Samsung Research. They have joined forces to pursue a long-held passion project of making agriculture one of the most lucrative things to do. The founders believe that their current network and skillset position them perfectly to take on this new challenge head-on.


 


Tiny Language Models Come of Age

To better understand how neural networks learn to simulate writing, researchers trained simpler versions on synthetic children’s stories.
Tiny Language Models Thrive With GPT-4 as a Teacher | Quanta Magazine



Adam Nickel for Quanta Magazine

By Ben Brubaker
Staff Writer


October 5, 2023


Introduction

Learning English is no easy task, as countless students well know. But when the student is a computer, one approach works surprisingly well: Simply feed mountains of text from the internet to a giant mathematical model called a neural network. That’s the operating principle behind generative language models like OpenAI’s ChatGPT, whose ability to converse coherently (if not always truthfully) on a wide range of topics has surprised researchers and the public over the past year.

But the approach has its drawbacks. For one thing, the “training” procedure required to transmute vast text archives into state-of-the-art language models is costly and time-intensive. For another, even the people who train large language models find it hard to understand their inner workings; that, in turn, makes it hard to predict the many ways they can fail.

Faced with these difficulties, some researchers have opted to train smaller models on smaller data sets and then study their behavior. “It’s like sequencing the Drosophila genome versus sequencing the human genome,” said Ellie Pavlick, a language model researcher at Brown University.

Now, in a paper recently posted to the scientific preprint server arxiv.org, a pair of Microsoft researchers have introduced a new method for training tiny language models: Raise them on a strict diet of children’s stories.

The two researchers showed that language models thousands of times smaller than today’s state-of-the-art systems rapidly learned to tell consistent and grammatical stories when trained in this way. Their results hint at new research directions that might be helpful for training larger models and understanding their behavior.
“I found this paper very informative,” said Chandra Bhagavatula, a language model researcher at the Allen Institute for Artificial Intelligence in Seattle. “The concept itself is super interesting.”

Once Upon a Time

The neural networks at the heart of language models are mathematical structures loosely inspired by the human brain. Each one contains many artificial neurons arranged in layers, with connections between neurons in adjacent layers. The neural network’s behavior is governed by the strength of these connections, called parameters. In a language model, the parameters control which words the model might spit out next, given an initial prompt and the words it has generated already.

A model only truly comes to life during training, when it repeatedly compares its own output to the text in its training data set and adjusts its parameters to increase the resemblance. An untrained network with random parameters is trivially easy to assemble from a few lines of code, but it will just produce gibberish. After training, it can often plausibly continue unfamiliar text. Larger models often undergo further fine-tuning that teaches them to answer questions and follow instructions, but the bulk of the training is mastering word prediction.

Success at word prediction requires a language model to master many different skills. For example, the rules of English grammar suggest that the next word after the word “going” is likely to be “to,” regardless of the subject of the text. In addition, a system needs factual knowledge to complete “the capital of France is,” and completing a passage containing the word “not” requires a rudimentary grasp of logic.
“Raw language is very complicated,” said Timothy Nguyen, a machine learning researcher at DeepMind. “In order for interesting linguistic capabilities to arise, people have resorted to ‘more data is better.’”

Machine learning researchers have embraced this lesson. GPT-3.5, the large language model that powers the ChatGPT interface, has nearly 200 billion parameters, and it was trained on a data set comprising hundreds of billions of words. (OpenAI hasn’t released the corresponding figures for its successor, GPT-4.) Training such large models typically requires at least 1,000 specialized processors called GPUs running in parallel for weeks at a time. Only a few companies can muster the requisite resources, let alone train and compare different models.
Ronen Eldan realized he could use children’s stories generated by large language models to rapidly train smaller ones.

Weizmann Institute of Science

Ronen Eldan, a mathematician who joined Microsoft Research in 2022 to study generative language models, wanted to develop a cheaper and faster way to explore their abilities. The natural way to do that was by using a small data set, and that in turn meant he’d have to train models to specialize in a specific task, so they wouldn’t spread themselves too thin. Initially, he wanted to train models to solve a certain class of math problems, but one afternoon, after spending time with his 5-year-old daughter, he realized that children’s stories were a perfect fit.

“It literally came to me after I read her a story,” he said.

To generate coherent children’s stories, a language model would need to learn facts about the world, keep track of characters and events, and observe the rules of grammar — simpler versions of the challenges facing large models. But large models trained on massive data sets learn countless irrelevant details along with the rules that really matter. Eldan hoped the brevity and limited vocabulary of children’s stories might make learning more manageable for small models — making them both easier to train and easier to understand.

In the world of language models, though, “small” is relative: A data set a thousand times smaller than the one used to train GPT-3.5 would still need to contain millions of stories. “I don’t know how much money you want to spend, but I’m guessing you’re not going to hire professionals to write [a couple million] short stories,” Nguyen said.

It would take an extraordinarily prolific author to satisfy such voracious readers, but Eldan had a few candidates in mind. Who better to write for an audience of small language models than large ones?

Toy Stories

Eldan immediately set out to create a library of synthetic children’s stories generated by large language models. But he soon discovered that even state-of-the-art models aren’t naturally very creative. If you just tell GPT-4 to write stories appropriate for 4-year-olds, Eldan said, “about one-fifth of the stories will be about children going to the park being scared of the slides.” That’s apparently the quintessential preschool story, as far as the internet is concerned.

The solution was to add a bit of randomness into the prompt. First, Eldan used GPT-4 to generate a list of 1,500 nouns, verbs and adjectives that a 4-year-old might know — short enough that he could easily check it himself. Then he wrote a simple computer program that would repeatedly prompt GPT-3.5 or GPT-4 to generate an age-appropriate story that included three random words from the list, along with an additional randomly chosen detail like a happy ending or plot twist. The resulting stories, mercifully, were less focused on scary slides.
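
A minimal sketch of that prompt-randomizing program might look like the following. The word list here is a tiny hypothetical stand-in for the 1,500-word vocabulary, the prompt wording is invented, and `build_story_prompt` is an illustrative name; the article does not show Eldan's actual code. Only the two "details" mentioned above (a happy ending, a plot twist) are included.

```python
import random

# Hypothetical miniature vocabulary; the real list had about 1,500 words.
WORDS = ["dog", "ball", "tree", "jump", "find", "share", "happy", "tiny", "brave"]

# The two extra details named in the article.
FEATURES = ["a happy ending", "a plot twist"]


def build_story_prompt(rng: random.Random) -> str:
    """Build one randomized story request to send to a large model."""
    words = rng.sample(WORDS, 3)   # three distinct random words
    feature = rng.choice(FEATURES)  # one randomly chosen detail
    return (
        "Write a short story that a 4-year-old could understand. "
        f"Use the words {words[0]!r}, {words[1]!r} and {words[2]!r}. "
        f"The story should have {feature}."
    )


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(build_story_prompt(rng))
```

Each call yields a different combination, which is what pushes the generated stories away from a single default plot.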

Eldan now had a procedure for churning out training data on demand, but he had no idea how many stories he’d need to train a functional model, or how big that model would need to be. That’s when he teamed up with Yuanzhi Li, a machine learning researcher at Microsoft and Carnegie Mellon University, to try different possibilities, taking advantage of the fact that small models could be trained very quickly. Step 1 was deciding how to evaluate their models.

Yuanzhi Li teamed up with Eldan to compare different models trained on synthetic children’s stories. They found that surprisingly small models could learn to tell coherent stories.
 


Courtesy of Yuanzhi Li

In language model research — as in every classroom — grading is a fraught topic. There’s no perfect rubric that encapsulates everything researchers want to know, and models that excel at some tasks often fail spectacularly at others. Over time, researchers have developed various standard benchmarks based on questions with unambiguous answers, which is a good approach if you’re trying to evaluate specific skills. But Eldan and Li were interested in something more nebulous: How big do language models really need to be if you simplify language as much as possible?
“In order to directly test if the model speaks English, I think the only thing you can do is let the model generate English in an open-ended way,” Eldan said.

There are only two ways to measure a model’s performance on such qualitative questions: Rely on human graders, or turn once again to GPT-4. The two researchers chose the latter route, effectively letting the big models both write the textbooks and grade the essays.

Bhagavatula said he would have liked to see how GPT-4’s evaluations compared to those of human reviewers — GPT-4 may be biased toward models that it helped train, and the opaqueness of language models makes it hard to quantify such biases. But he doesn’t think such subtleties would affect comparisons between different models trained on similar sets of synthetic stories — the main focus of Eldan and Li’s work.

Eldan and Li used a two-step procedure for evaluating each of their small models after training. First, they prompted the small model with the first half of a story distinct from those in the training data set so that it generated a new ending, repeating this process with 50 different test stories. Second, they instructed GPT-4 to grade each of the small model’s endings based on three categories — creativity, grammar and consistency with the beginning of the story. They then averaged the scores in each category, ending up with three final grades per model.
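
The bookkeeping in this two-step procedure is simple enough to sketch. In the snippet below, the grades are assumed to arrive as one dictionary per test story with a numeric score for each of the three categories; the scale, the dictionary layout, and the function names `first_half` and `average_grades` are assumptions made for illustration, and the calls to the small model and to GPT-4 are left out entirely.

```python
from statistics import mean

CATEGORIES = ("creativity", "grammar", "consistency")


def first_half(story: str) -> str:
    """Return the first half of a story (by word count) to use as the prompt."""
    words = story.split()
    return " ".join(words[: len(words) // 2])


def average_grades(grades: list[dict]) -> dict:
    """Average per-story grades into one final grade per category."""
    if not grades:
        raise ValueError("need at least one graded story")
    return {c: mean(g[c] for g in grades) for c in CATEGORIES}


if __name__ == "__main__":
    # Hypothetical grades for two completions; a real run would use 50.
    graded = [
        {"creativity": 6, "grammar": 8, "consistency": 7},
        {"creativity": 8, "grammar": 10, "consistency": 7},
    ]
    print(first_half("Once upon a time there was a cat"))
    print(average_grades(graded))
```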

With this procedure in hand, Eldan and Li were finally ready to compare different models and find out which were the star students.

Test Results

After some preliminary exploration, the two researchers settled on a training data set containing roughly 2 million stories. They then used this data set, dubbed TinyStories, to train models ranging in size from 1 million to 30 million parameters, with varying numbers of layers. It was quick work: Using only four GPUs, the largest of these models took no more than a day to train.

The smallest models struggled. For example, one test story begins with a mean-looking man telling a girl he will take her cat. A million-parameter model got stuck in a loop with the girl repeatedly telling the man she wanted to be friends. But the larger ones — still thousands of times smaller than GPT-3.5 — performed surprisingly well. The 28-million-parameter version told a coherent story, though the ending was grim: “Katie started to cry, but the man didn’t care. He took the cat away and Katie never saw her cat again. The end.”

In addition to testing their own models, Eldan and Li presented the same challenge to OpenAI’s GPT-2, a 1.5-billion-parameter model released in 2019. It fared far worse — before the story’s abrupt ending, the man threatens to take the girl to court, jail, the hospital, the morgue and finally the crematorium.

Merrill Sherman/Quanta Magazine

Nguyen said it’s exciting that such tiny models were so fluent, but perhaps not surprising that GPT-2 struggled with the task: It’s a larger model but far from the state of the art, and it was trained on a very different data set. “A toddler training only on toddler tasks, like playing with some toys, might do better than you or I,” he noted. “We didn’t specialize in this simple thing.”

Comparisons between different TinyStories models don’t suffer from the same confounding factors. Eldan and Li observed hints that networks with fewer layers but more neurons per layer were better at answering questions that required factual knowledge; conversely, networks with more layers and fewer neurons per layer were better at keeping track of characters and plot points from earlier in the story. Bhagavatula found this result especially intriguing. If it can be replicated in larger models, he said, “that would be a really cool result that could stem out of this work.”

Eldan and Li also studied how their small models’ abilities depended on the duration of the training period. In every case, models mastered grammar first and consistency later. To Eldan, this pattern illustrates how differences in reward structures lead to differences in language acquisition patterns between neural networks and children. For language models, which learn by predicting words, “the incentive on the words ‘I want to have’ is as big as it is on the words ‘ice cream,’” he said. Children, on the other hand, “don’t care about whether they say ‘I would like to have some ice cream’ or just ‘ice cream, ice cream, ice cream.’”

Quality Versus Quantity

Eldan and Li hope that the research will motivate other researchers to train different models on the TinyStories data set and compare their capabilities. But it’s often hard to predict which characteristics of small models will also appear in larger ones.
“Maybe mouse models of vision are really good proxies of human vision, but are mouse models of depression good models of human depression?” Pavlick said. “For every case it’s a little bit different.”

The success of the TinyStories models also suggests a broader lesson. The standard approach to compiling training data sets involves vacuuming up text from across the internet and then filtering out the garbage. Synthetic text generated by large models could offer an alternative way to assemble high-quality data sets that wouldn’t have to be so large.

“We have more and more evidence that this is very effective, not only in TinyStories-sized models but also in larger models,” Eldan said. That evidence comes from a pair of follow-up papers about billion-parameter models by Eldan, Li and other Microsoft researchers. In the first paper, they trained a model to learn the programming language Python using snippets of code generated by GPT-3.5 along with carefully curated code from the internet. In the second, they augmented the training data set with synthetic “textbooks,” covering a wide range of topics, to train a general-purpose language model. In their tests, both models compared favorably to larger models trained on larger data sets. But evaluating language models is always tricky, and the synthetic training data approach is still in its infancy — more independent tests are necessary.

As state-of-the-art language models grow ever larger, surprising findings from their tiny cousins are reminders that there’s still much we don’t understand about even the simplest models. Nguyen expects to see many more papers exploring the approach pioneered by TinyStories.
“The question is: Where and why does size matter?” he said. “There should be a science of that, and this paper is hopefully the beginning of a rich story.”
 




Decoding speech from non-invasive recordings of brain activity

paper page: https://huggingface.co/papers/2208.12266

Decoding language from brain activity is a long-awaited goal in both healthcare and neuroscience. Major milestones have recently been reached thanks to intracranial devices: subject-specific pipelines trained on invasive brain responses to basic language tasks now start to efficiently decode interpretable features (e.g. letters, words, spectrograms). However, scaling this approach to natural speech and non-invasive brain recordings remains a major challenge. Here, we propose a single end-to-end architecture trained with contrastive learning across a large cohort of individuals to predict self-supervised representations of natural speech. We evaluate our model on four public datasets, encompassing 169 volunteers recorded with magneto- or electro-encephalography (M/EEG), while they listened to natural speech. The results show that our model can identify, from 3s of MEG signals, the corresponding speech segment with up to 72.5% top-10 accuracy out of 1,594 distinct segments (and 44% top-1 accuracy), and up to 19.1% out of 2,604 segments for EEG recordings -- hence allowing the decoding of phrases absent from the training set. Model comparison and ablation analyses show that these performances directly benefit from our original design choices, namely the use of (i) a contrastive objective, (ii) pretrained representations of speech and (iii) a common convolutional architecture simultaneously trained across several participants. Together, these results delineate a promising path to decode natural language processing in real time from non-invasive recordings of brain activity
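
For readers unfamiliar with the "top-10 accuracy" figure quoted in the abstract, here is a small sketch of how such a retrieval metric is typically computed. It assumes a square similarity matrix in which `similarity[i][j]` scores how well brain recording `i` matches candidate speech segment `j`, with the true match for row `i` sitting at column `i`. That layout and the function name `top_k_accuracy` are assumptions for illustration, not the paper's code.

```python
def top_k_accuracy(similarity: list[list[float]], k: int) -> float:
    """Fraction of rows whose true match (column i for row i) ranks in the top k."""
    hits = 0
    for i, row in enumerate(similarity):
        # Candidate indices sorted from most to least similar.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


if __name__ == "__main__":
    # Toy scores for 3 recordings against 3 candidate segments.
    sim = [
        [0.9, 0.1, 0.0],
        [0.8, 0.7, 0.1],
        [0.2, 0.9, 0.3],
    ]
    print(top_k_accuracy(sim, k=1), top_k_accuracy(sim, k=2))
```

With over a thousand candidate segments, as in the abstract, chance-level top-10 accuracy is below 1%, which is the baseline the reported numbers should be read against.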
 


AI Can Write Near Human-Level College Textbooks

In our pursuit to replicate recent research findings, we discovered the potential to create AI-authored textbooks that rival human-authored ones in quality.


OWEN COLEGROVE
OCT 4, 2023

Recent advancements in synthetic data research have garnered significant attention, with breakthroughs like Orca, Distilling Step-by-Step, and Textbooks are All You Need II, which have demonstrated that synthetic data can significantly enhance the efficiency of training and fine-tuning Large Language Models (LLMs).
Yet, a crucial, underexplored question emerges: "What role can AI play in human education?"

It's imperative to consider this, as AI's impact is likely to extend far beyond merely serving as conversational agents. The emergence of synthetic textbooks is supporting evidence to this point, something which we will cover in detail below. We came upon this understanding while attempting to replicate some recent research results.

Attempting to replicate Textbooks Are All You Need II:

This work presented a 1.3 billion parameter model, phi-1.5, that appears to match the common sense reasoning benchmark results of models nearly ten times its size, which were trained on datasets more than ten times larger.

The main focus of this work was the training data, which was made up mostly of synthetic "textbook-like" data, amounting to roughly 21 billion synthetic tokens. The data was carefully constructed to impart common sense reasoning and a broad spectrum of world knowledge. The research underscored that such synthetic data shows great potential for improving the efficiency of training LLMs. We can see this work as a serious attempt to use AI to self-educate more efficiently. It is interesting because it reaches further down the learning stack than similar previous efforts, which focused on fine-tuning.

While both phi-1 and phi-1.5 were generously open sourced, the synthetic data and the code to generate it were not. Given that training methods for smaller models like phi are well known, I found the data to be the most interesting part of this research, and it is still being kept secret. Therefore, I set out to replicate Microsoft's work in a fully open source setting, where I have found others to work with. We are now attempting to further our understanding and benefit the open source AI research community.

Some Key Observations From Early Replication Attempts

  1. Challenges in Synthetic Token Generation: Generating a staggering 21 billion synthetic tokens is no walk in the park. For perspective, a single A100 can churn out only 50-100M tokens per day. Beyond sheer volume, ensuring diversity in seed generation is a complex undertaking.
  2. Structuring Output Data: There are a lot of directions we can go here. Should the output data adopt a more instructional mold, elucidating step-by-step reasoning? Or should it mirror traditional textbooks, filled with facts and somewhat deeper prose than what we find on Wikipedia?
  3. Knowledge Domain Distribution: There's ambiguity surrounding the optimal distribution of knowledge domains. How should we apportion the dataset across primary and secondary education (grades 1-12) versus collegiate level knowledge?
  4. Topic-wise Distribution: Another facet to consider is the distribution of the dataset across various topics. What proportion should be allotted to the humanities, math and science, etc.?
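
To put the first observation in numbers, here is a rough back-of-the-envelope sketch. It uses only the figures quoted above (21 billion tokens, 50-100M tokens per A100 per day). The 64-GPU cluster size and the assumption of perfect linear scaling are illustrative choices of mine, not figures from the replication effort.

```python
# Back-of-the-envelope estimate using the throughput figures quoted in the list above.
TARGET_TOKENS = 21_000_000_000                   # ~21B synthetic tokens
TOKENS_PER_GPU_DAY = (50_000_000, 100_000_000)   # low and high estimate for one A100


def gpu_days(target_tokens: int, tokens_per_day: int) -> float:
    """Days a single GPU needs to generate target_tokens at a given daily throughput."""
    return target_tokens / tokens_per_day


def wall_clock_days(target_tokens: int, tokens_per_day: int, num_gpus: int) -> float:
    """Wall-clock days, assuming perfect linear scaling across num_gpus (an idealization)."""
    return gpu_days(target_tokens, tokens_per_day) / num_gpus


slow, fast = TOKENS_PER_GPU_DAY
print(f"Single A100: {gpu_days(TARGET_TOKENS, fast):.0f} to {gpu_days(TARGET_TOKENS, slow):.0f} days")
# 64 GPUs is a hypothetical cluster size chosen only for illustration.
print(f"64 A100s: {wall_clock_days(TARGET_TOKENS, fast, 64):.1f} to "
      f"{wall_clock_days(TARGET_TOKENS, slow, 64):.1f} days")
```

At the quoted throughput, the full dataset works out to roughly 210 to 420 A100-days, which is why volume alone is a real obstacle for a small open source team.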

If you would like to explore some of the early experiments, they are on HuggingFace, here, here, and here. To read some of the most exemplary textbooks, go to GitHub here.

Onto High Quality Synthetic Textbooks

My initial goal was to quickly produce fully synthetic textbook-like data snippets. These samples were meant to cover a smattering of topics and to include significant step-by-step reasoning in many of the examples. I made the dataset in this manner because it was my best interpretation of the work performed by Microsoft. However, one could take a more literal approach to the synthetic textbook proposal, which is something investigated by Vikas (Vik) Paruchuri here. I was impressed by this work and felt there was really something to the idea of building textbooks with AI. We have since teamed up to combine our efforts in replicating this research.

Regardless of whether my initial approach or Vik's is better for training LLMs, there was something here worth extending further to see if we could produce something useful for humans.

The approach I took to investigate this was to build a generation pipeline with four steps plus an optional fifth, which does the following:
  1. Scrapes all course pages from MIT OpenCourseWare.
  2. Uses AI to convert the scraped pages into properly structured syllabi.
  3. Uses AI to transform each syllabus into a 20-chapter textbook table of contents, complete with sections and subsections. You can see one such example here.
  4. Uses AI to generate a textbook from the table of contents with the script here.
  5. Optional - As part of (4), augments generation by retrieving the most relevant content from Wikipedia.
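
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual script: every name in it (`build_textbook`, `Textbook`, `stub_llm`, the prompt wording) is hypothetical, the `llm` argument stands in for whatever text-generation call is used, and the scraping step is left out, so the function takes an already-scraped page as input.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical type aliases: any function mapping a prompt (or query) to text.
LLM = Callable[[str], str]
Retriever = Callable[[str], str]


@dataclass
class Textbook:
    syllabus: str
    table_of_contents: List[str]
    sections: List[str] = field(default_factory=list)


def build_textbook(page_text: str, llm: LLM, retriever: Optional[Retriever] = None) -> Textbook:
    """Run steps 2-5 for one already-scraped course page (step 1 is omitted)."""
    # Step 2: course page -> structured syllabus.
    syllabus = llm("SYLLABUS: Convert this course page into a structured syllabus.\n" + page_text)
    # Step 3: syllabus -> table of contents, one entry per line.
    toc_text = llm("TOC: Write a 20-chapter table of contents, one entry per line.\n" + syllabus)
    toc = [line.strip() for line in toc_text.splitlines() if line.strip()]

    book = Textbook(syllabus=syllabus, table_of_contents=toc)
    previous = ""
    for entry in toc:
        # Step 5 (optional): fetch related context for this entry.
        context = retriever(entry) if retriever is not None else ""
        # Step 4: generate the section, conditioning on the previous one.
        prompt = (
            f"SECTION: Write the textbook section '{entry}'.\n"
            f"Related context:\n{context}\n"
            f"Last section:\n{previous}"
        )
        previous = llm(prompt)
        book.sections.append(previous)
    return book


def stub_llm(prompt: str) -> str:
    """Stand-in model so the sketch runs offline; replace with a real LLM call."""
    if prompt.startswith("TOC:"):
        return "1.1 Introduction\n1.2 Background\n"
    return f"<{prompt.split(':', 1)[0].lower()} text>"


book = build_textbook("Course page text", stub_llm, retriever=lambda q: f"notes on {q}")
print(book.table_of_contents)
print(len(book.sections))
```

Passing the model and the retriever in as plain callables keeps the sketch independent of any particular provider and makes the optional retrieval step easy to switch off.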

To make the pipeline more concrete, I will share one random example of the final step, along with the resulting output, for authoring a chapter subsection.

Here is an example prompt for book generation:

### Instructions:

You are writing a book titled "Organizing for Innovative Product Development: A Comprehensive Guide". You are currently writing the chapter and section shown below:
(NOTE FOR READERS - EXTRACTED FROM THE TABLE OF CONTENTS)
# Title: Organizing for Innovative Product Development: A Comprehensive Guide
## Chapter: - Chapter 20: Innovation and Legal Considerations:
### Section: - Section: 20.4 Legal Compliance in Innovation:
### Subsection (optional): 20.4c Implementing Legal Compliance in Innovation

To assist you in writing the chapter, you have been provided with some related context and recent chapter contents below:
### Related Context
``` (NOTE FOR READERS - THIS IS FROM WIKIPEDIA)

## Legal compliance

Legal compliance is the process or procedure to ensure that an organization follows relevant laws, regulations and business rules. The definition of legal compliance, especially in the context of corporate legal departments, has recently been expanded to include understanding and adhering to ethical codes within entire professions, as well. There are two requirements for an enterprise to be compliant with the law, first its policies need to be consistent with the law. Second, its policies need to be complete with respect to the law.

The role of legal compliance has also been expanded to include self-monitoring the non-governed behavior with industries and corporations that could lead to workplace indiscretions.

Within the LGRC realm, it is important to keep in mind that if a strong legal governance component is in place, risk can be accurately assessed and the monitoring of legal compliance be carried out efficiently. It is also important to realize that within the LGRC framework, legal teams work closely with executive teams and other business departments to align their goals and ensure proper communication.
### Legal consistency

Legal consistency is a property that declares enterprise policies to be free of contradictions with the law. Legal consistency has been defined as not having multiple verdicts for the same case.

The antonym Legal inconsistency is defined as having two rule that contradict each other. Other common definitions of consistency refer to “treating similar cases alike”. In the enterprise context, legal consistency refers to “obedience to the law”. In the context of legal requirements validation, legal consistency is defined
```
### Last textbook section content:
``` (NOTE FOR READERS - THIS IS FROM THE PREVIOUS AI RESPONSE)
## Chapter: Innovation and Legal Considerations
### Introduction

Innovation is the lifeblood of any organization that aims to stay competitive in today's fast-paced market. However, as we venture into the realm of new product development, it is crucial to understand that innovation does not exist in a vacuum. It is intertwined with a myriad of legal considerations that can significantly impact the trajectory of product development. This chapter, "Innovation and Legal Considerations," aims to shed light on this complex interplay.

In the process of creating groundbreaking products, organizations often find themselves navigating the intricate labyrinth of intellectual property rights, patents, copyrights, and trademarks. These legal aspects serve as both a shield and a sword, protecting one's own innovations while also ensuring that the organization does not infringe upon the rights of others.

Moreover, the legal landscape is not static. It evolves with the changing technological and societal norms, making it even more critical for organizations to stay abreast of the latest legal developments. This chapter will provide a comprehensive overview of these aspects, equipping readers with the knowledge to make informed decisions during the innovation process.

In addition, we will delve into the legal implications of collaborative innovation, a common practice in today's interconnected world. We will explore how to navigate partnerships, joint ventures, and open innovation initiatives while safeguarding the organization's legal interests.

In essence, this chapter will serve as a guide to understanding the legal considerations that underpin the innovation process. It will provide a roadmap for organizations to navigate the legal landscape, ensuring that their innovative endeavors are not only groundbreaking but also legally sound.
```

Notes:
- The book is being written in the popular Markdown format.
- The context may be truncated and is meant only to provide a starting point. Feel free to expand on it or take the response in any direction that fits the prompt, but keep it in a voice that is appropriate for an advanced undergraduate course at MIT.
- Avoid making any factual claims or opinions without proper citations or context to support them, stick to the proposed context.
- Format ALL math equations with the $ and $$ delimiters to insert math expressions in TeX and LaTeX style syntax. This content is then rendered using the highly popular MathJax library. E.g. write inline math like `$y_j(n)$` and equations like `$$\Delta w = ..$$`
- If starting a new section, include `### [Section Title]`
- If starting a new subsection, include `#### [Subsection Title]`
### Response:
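
(NOTE FOR READERS - END OF EXAMPLE PROMPT)

A prompt with this layout can be assembled from a table-of-contents entry, the retrieved context, and the previous response. The sketch below is only an illustration of that idea, not the script used in the pipeline: the function names, the `max_context_chars` limit, and the shortened notes list are assumptions, and the fence marker is built programmatically so the example nests cleanly inside Markdown.

```python
FENCE = "`" * 3  # triple backtick, built this way to avoid nesting fences in this example

# A shortened, illustrative subset of the notes shown in the prompt above.
NOTES = [
    "The book is being written in the popular Markdown format.",
    "If starting a new section, include `### [Section Title]`",
    "If starting a new subsection, include `#### [Subsection Title]`",
]


def truncate(text: str, max_chars: int) -> str:
    """Cut text to at most max_chars characters (the prompt warns context may be truncated)."""
    return text if len(text) <= max_chars else text[:max_chars]


def build_section_prompt(title: str, chapter: str, section: str, subsection: str,
                         related_context: str, last_section: str,
                         max_context_chars: int = 2000) -> str:
    """Assemble a generation prompt in the same layout as the example prompt."""
    lines = [
        "### Instructions:",
        "",
        f'You are writing a book titled "{title}". '
        "You are currently writing the chapter and section shown below:",
        f"# Title: {title}",
        f"## Chapter: {chapter}",
        f"### Section: {section}",
        f"### Subsection (optional): {subsection}",
        "",
        "To assist you in writing the chapter, you have been provided with some "
        "related context and recent chapter contents below:",
        "### Related Context",
        FENCE,
        truncate(related_context, max_context_chars),
        FENCE,
        "### Last textbook section content:",
        FENCE,
        truncate(last_section, max_context_chars),
        FENCE,
        "",
        "Notes:",
        *[f"- {note}" for note in NOTES],
        "### Response:",
    ]
    return "\n".join(lines)


prompt = build_section_prompt(
    title="Organizing for Innovative Product Development: A Comprehensive Guide",
    chapter="Chapter 20: Innovation and Legal Considerations",
    section="20.4 Legal Compliance in Innovation",
    subsection="20.4c Implementing Legal Compliance in Innovation",
    related_context="Legal compliance is the process or procedure to ensure that ...",
    last_section="## Chapter: Innovation and Legal Considerations",
)
print(prompt)
```

Ending the prompt with the response header leaves the model to continue directly with the section text.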
 