1/7
I made a Colab that turns a Github repo into 1 long but well-formatted prompt! Super useful for Long Context like #Claude Opus.
+ a map to show your LLM how everything is organized!
I used it to turn @nomic_ai's entire Github into one prompt, to ask questions.
Enjoy! Link
2/7
Github link (please star!):
Turn a Github Repo's contents into a big prompt for long-context models like Claude 3 Opus. - andrewgcodes/repo2prompt
github.com
It's a Colab so it's easy to use.
The instructions are there. You need a Github repo URL and an access token!
The example turns the Nomic Atlas repo into a long prompt:
GitHub - nomic-ai/nomic: Interact, analyze and structure massive text, image, embedding, audio and video datasets
2/n (one more scroll)
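The Colab takes a repo URL and a token as inputs. As a rough sketch of the first step, assuming a plain `https://github.com/<owner>/<repo>` URL, the owner and repo name can be pulled out like this (`parse_repo_url` is a hypothetical helper name, not necessarily what the Colab uses):

```python
from urllib.parse import urlparse


def parse_repo_url(url):
    """Split a GitHub repo URL into (owner, repo). Hypothetical helper for illustration."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"Not a github.com URL: {url}")
    # Drop empty segments caused by leading or trailing slashes.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"Expected /<owner>/<repo> in URL: {url}")
    owner, repo = parts[0], parts[1]
    # Clone-style URLs often end in ".git"; strip it.
    if repo.endswith(".git"):
        repo = repo[:-4]
    return owner, repo


owner, repo = parse_repo_url("https://github.com/nomic-ai/nomic")
```

Keep the access token out of the notebook itself if you share it, e.g. read it from an environment variable or a prompt.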
3/7
The structure of the prompt is:
readme.md contents
Directory tree
filename1
'''
Code contents
'''
filename2
'''
Code contents
'''
….
The directory tree is recursively generated and is important so that your LLM understands the file structure of your codebase.
The readme provides valuable initial context.
All code is encapsulated in triple quotes to help organize for the LLM!
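A minimal sketch of how that layout could be assembled, assuming you already have the readme text, the tree string, and a list of `(path, contents)` pairs (`build_prompt` and its parameters are illustrative names, not taken from the Colab):

```python
def build_prompt(readme_text, tree_str, files):
    """Join readme, directory tree, and files into one prompt string.

    files is a list of (path, contents) tuples. Illustrative sketch only.
    """
    sections = [readme_text.strip(), tree_str.rstrip()]
    for path, code in files:
        # Each file: its name, then the contents wrapped in triple quotes.
        sections.append(f"{path}\n'''\n{code.rstrip()}\n'''")
    # Blank lines between sections keep the boundaries easy to spot.
    return "\n\n".join(sections) + "\n"


prompt = build_prompt(
    "# Demo",
    "[src/]\n main.py\n",
    [("src/main.py", "print('hi')\n")],
)
```

One caveat: a file that itself contains `'''` (Python docstrings, for example) makes the boundaries ambiguous, so a less common delimiter may be safer for Python-heavy repos.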
4/7
Might be useful to @GregKamradt @Francis_YAO_ @DrJimFan @erhartford
5/7
This is a neat little function, it's how I generate the repo filesystem tree
it's recursive!
sure my cs106b teacher would be thrilled
def build_directory_tree(owner, repo, path='', token=None, indent=0, file_paths=None):
    # Start a fresh list per top-level call; a mutable default ([]) would be shared across calls.
    if file_paths is None:
        file_paths = []
    items = fetch_repo_content(owner, repo, path, token)
    tree_str = ""
    for item in items:
        # Skip anything inside a .github directory.
        if '.github' in item['path'].split('/'):
            continue
        if item['type'] == 'dir':
            tree_str += ' ' * indent + f"[{item['name']}/]\n"
            # Recurse into the subdirectory; file_paths is shared, so only the tree string is needed.
            tree_str += build_directory_tree(owner, repo, item['path'], token, indent + 1, file_paths)[0]
        else:
            tree_str += ' ' * indent + f"{item['name']}\n"
            # Indicate which file extensions should be included in the prompt!
            if item['name'].endswith(('.py', '.ipynb', '.html', '.css', '.js', '.jsx', '.rst', '.md')):
                file_paths.append((indent, item['path']))
    return tree_str, file_paths
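The function above relies on `fetch_repo_content`, which isn't shown in the thread. Here is one way it might look, assuming it wraps the GitHub REST "repository contents" endpoint (`contents_url` and `auth_headers` are helper names made up for this sketch, and the Colab's real version may differ):

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def contents_url(owner, repo, path=""):
    """Build the contents endpoint URL for a path inside a repo."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/contents"
    path = path.strip("/")
    if path:
        # quote() keeps "/" by default, so nested paths stay intact.
        url += "/" + urllib.parse.quote(path)
    return url


def auth_headers(token=None):
    """Request headers; the token is optional for public repos."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def fetch_repo_content(owner, repo, path="", token=None):
    """Return the parsed JSON listing for a directory (a list of dicts
    with keys such as 'name', 'path', and 'type')."""
    request = urllib.request.Request(
        contents_url(owner, repo, path), headers=auth_headers(token)
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Each directory costs one API call, so the recursion makes as many requests as there are folders. Passing a token matters mostly because unauthenticated requests hit GitHub's rate limit much sooner.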
6/7
let me know how it works for you!
personally I'm realizing that most large codebases don't fit in the Poe message length limit, even for the 200k-context model
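A quick way to sanity-check size before pasting, sketched under a loud assumption: roughly 4 characters per token, which is only a rule of thumb and not the model's real tokenizer. The function names and the default limit are illustrative:

```python
CHARS_PER_TOKEN = 4  # assumption: crude average, real tokenizers vary a lot on code


def estimate_tokens(text):
    """Rough token estimate from character count (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text, limit_tokens=200_000):
    """True if the estimate is within the given token budget."""
    return estimate_tokens(text) <= limit_tokens
```

A chat UI's message length limit can be stricter than the model's context window, so treat `limit_tokens` as something to tune for wherever you paste the prompt.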
7/7
of course! thanks