Thank you for tuning in to this audio-only podcast presentation. This is week 106 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Code generating systems.”
You were probably wondering how long into the new year it would be before a lot of focus and attention landed on what Hugging Face has been working toward building. You won’t have to wait any longer: this missive digs into BigCode and some other efforts to use AI to build code generating systems [1]. I was reading a TechCrunch article by Kyle Wiggers and wondering how many different systems currently exist to accomplish this feat [2]. That article references 5 code generating systems that you could, in practice, elect to evaluate. For completeness, I’m listing Codex and Copilot separately in this list given that their interfaces are fundamentally different.
- BigCode – Hugging Face & ServiceNow’s R&D division [3]
- AlphaCode – DeepMind [4]
- CodeWhisperer – Amazon [5]
- Codex – OpenAI [6]
- Copilot – GitHub (Codex based) [7]
One thing you might be interested in learning about at this point is a dataset called “The Stack,” a collection of 6 terabytes of permissively licensed source code covering 300 programming languages [8]. The permissive part of the dataset is interesting. The team started from a GitHub archive of roughly 69 terabytes, filtered it down to the licenses they considered permissive, and ended up with that 6 terabyte collection. Understanding how the dataset that feeds a code generating system was built is very important. All my contributions on GitHub are intended to be under the MIT license, which I think should count as permissive [9]. You have to consider carefully that many proprietary code writers, and the corporations employing them, would not have given permission to use their code in a code generation system.
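To make the license filtering step a little more concrete, here is a minimal sketch in Python. It assumes each repository in an archive has already been tagged with a detected SPDX license identifier; the `Repo` record, the `PERMISSIVE_LICENSES` allowlist, and the helper functions are all hypothetical and are not the actual pipeline or license list used to build The Stack. One design choice worth noting is that a repository with no detected license is dropped, since the absence of a license is not the same thing as permission.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical allowlist of permissive SPDX identifiers. This is an
# illustrative assumption, not the list The Stack actually used.
PERMISSIVE_LICENSES = {
    "MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC", "Unlicense",
}


@dataclass
class Repo:
    """Hypothetical record for one repository in an archive."""
    name: str
    license: Optional[str]  # detected SPDX identifier, or None if none found
    size_bytes: int


def filter_permissive(repos: List[Repo]) -> List[Repo]:
    """Keep only repositories whose detected license is on the allowlist.

    Repositories with no detected license (None) are excluded, because
    no license means no explicit grant of permission.
    """
    return [r for r in repos if r.license in PERMISSIVE_LICENSES]


def total_size(repos: List[Repo]) -> int:
    """Sum the sizes of a list of repositories in bytes."""
    return sum(r.size_bytes for r in repos)


if __name__ == "__main__":
    archive = [
        Repo("example/mit-project", "MIT", 1_000),
        Repo("example/gpl-project", "GPL-3.0", 5_000),
        Repo("example/unlicensed", None, 2_000),
        Repo("example/apache-project", "Apache-2.0", 3_000),
    ]
    kept = filter_permissive(archive)
    print([r.name for r in kept])
    print(f"kept {total_size(kept)} of {total_size(archive)} bytes")
```

The same shape, applied at the scale of a full GitHub archive, is how a 69 terabyte starting point can shrink to a much smaller permissive subset.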
Generative coding systems will soon abound, but they are still in an early and developing state. We are getting to the point where you can instruct Codex to build something, such as an application, and you might get a great result. It’s not a universal code generation engine at this point. However, we are getting closer and closer to conversational code generation, or some flavor of that outcome, which I would classify as a generative coding system. It will be a seismic shift in code generation, driven by democratizing the creation of applications.
What would ChatGPT create?
If you were wondering what ChatGPT from OpenAI would have generated with the same prompt, then you are in luck. I generated that output over at https://chat.openai.com/chat by issuing a prompt.
Code Generating Systems: How They Can Help You
Code generating systems are tools that automatically generate code based on input from the user. They can save developers a lot of time and effort, and can also help ensure consistency and quality in the codebase. Here are a few ways that code generating systems can help you:
- Speed up development: Code generators can significantly speed up the development process by generating repetitive or complex code automatically. This can free up developers to focus on more challenging or important tasks.
- Improve consistency: Code generators can help ensure that code is consistent across different parts of the codebase. This can make the codebase easier to understand, maintain, and scale.
- Reduce errors: Code generators can help reduce the number of errors in the codebase by automatically generating code that follows best practices and conventions.
- Simplify maintenance: Code generators can also simplify maintenance by making it easy to update large sections of code at once.
- Better documentation: Code generators usually generate comments and documentation in the code, which can greatly improve the codebase’s readability and maintainability.
- Increase productivity: With code generators, developers can focus on solving problems rather than spending time writing repetitive code, which can increase overall productivity.
There are many different types of code generators available, each with its own set of features and capabilities. Some popular code generators include scaffolding tools, code templates, and model-driven development tools.
Before you start using a code generator, be sure to evaluate your options and choose one that meets your specific needs. Also, keep in mind that code generators should be used as a tool rather than a replacement for a developer’s expertise.
In conclusion, code generating systems can be a valuable tool for developers, providing a way to speed up development, improve consistency, reduce errors, and simplify maintenance. They can also help to increase overall productivity and efficiency.
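The “code templates” mentioned in that output are the older, deterministic cousin of the AI systems covered above. As a minimal sketch, assuming nothing beyond the Python standard library, the snippet below uses `string.Template` to render the source for a simple class from a list of field names. The `CLASS_TEMPLATE` and `generate_class` names are hypothetical and this is not how BigCode, Codex, or any of the other listed systems work; it only shows what generating repetitive code from a template looks like.

```python
from string import Template
from typing import List

# Hypothetical template for a plain Python class with an __init__ that
# stores each field on the instance.
CLASS_TEMPLATE = Template(
    "class $name:\n"
    "    def __init__(self, $params):\n"
    "$assignments\n"
)


def generate_class(name: str, fields: List[str]) -> str:
    """Render class source code from a class name and field names."""
    if not fields:
        # An empty __init__ body would not be valid Python.
        raise ValueError("at least one field is required")
    params = ", ".join(fields)
    assignments = "\n".join(f"        self.{f} = {f}" for f in fields)
    return CLASS_TEMPLATE.substitute(
        name=name, params=params, assignments=assignments
    )


if __name__ == "__main__":
    source = generate_class("Episode", ["week", "title"])
    print(source)

    # Execute the generated source to confirm it defines a working class.
    namespace = {}
    exec(source, namespace)
    episode = namespace["Episode"](106, "Code generating systems")
    print(episode.week, episode.title)
```

A template like this always produces the same output for the same input, which is what gives it the consistency described above; the generative systems discussed earlier trade that predictability for the ability to respond to open-ended prompts.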
Footnotes:
[1] https://huggingface.co/bigcode
[3] https://www.bigcode-project.org/
[4] https://alphacode.deepmind.com/
[5] https://aws.amazon.com/codewhisperer/
[6] https://openai.com/blog/openai-codex/
[7] https://github.com/features/copilot
[8] https://huggingface.co/datasets/bigcode/the-stack
[9] https://github.com/nelslindahlx
What’s next for The Lindahl Letter?
- Week 107: Highly cited AI papers
- Week 108: Twitter as a company probably would not happen today
- Week 109: Robots in the house
- Week 110: Understanding knowledge graphs
- Week 111: Natural language processing
If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.
Lindahl, N. (2023). The Lindahl letter: 104 Machine Learning Posts. Lulu Press, Inc. https://www.lulu.com/shop/nels-lindahl/the-lindahl-letter-104-machine-learning-posts/ebook/product-y244ep.html