Nels Lindahl — Functional Journal

A weblog created by Dr. Nels Lindahl featuring writings and thoughts…

Month: February 2025

  • 20250205

    I started to wonder when the Microsoft coding teams will start having one of their code improvement agents run over open source software on GitHub and suggest improvements via pull requests. Imagine if we had that type of learning agent improving and refining open source projects all the time, making things better and initiating pull requests at some threshold of contribution level. This could also be used to help find and fix known security vulnerabilities in repos posted to GitHub. We could also have a refinement layer for builds and branches that, as code is produced, does some additional deep research on the code and suggests improvements. A whole world of possible GitHub and code repository improvements is going to arrive soon based on what is now possible. It’s a huge door opening to really improve overall code quality and to make really big contributions across the open source projects that underpin a lot of foundational technological layers.
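
    To make that idea a little more concrete, here is a minimal sketch of what such an agent loop might look like, assuming PyGithub for the GitHub API; suggest_patch, CONTRIBUTION_THRESHOLD, the token, and the repository name are all hypothetical placeholders rather than any real Microsoft tooling.

        # Hedged sketch: an agent that proposes improvements via pull requests once
        # its confidence clears a contribution threshold. suggest_patch() stands in
        # for whatever code-improvement model would actually generate the change.
        from github import Github

        CONTRIBUTION_THRESHOLD = 0.8  # only open a PR when the agent is confident enough

        def suggest_patch(file_text: str) -> tuple[str, float]:
            """Hypothetical model call: returns (improved_text, confidence score)."""
            raise NotImplementedError("stand-in for a code-improvement model")

        gh = Github("YOUR_TOKEN")  # placeholder personal access token
        repo = gh.get_repo("some-org/some-open-source-project")  # illustrative repo

        for item in repo.get_contents(""):  # walk the top-level files of the repo
            if not item.path.endswith(".py"):
                continue
            improved, score = suggest_patch(item.decoded_content.decode())
            if score < CONTRIBUTION_THRESHOLD:
                continue
            branch = f"agent-improvement-{item.sha[:7]}"
            base = repo.get_branch(repo.default_branch)
            repo.create_git_ref(ref=f"refs/heads/{branch}", sha=base.commit.sha)
            repo.update_file(item.path, "Agent-suggested improvement",
                             improved, item.sha, branch=branch)
            repo.create_pull(title=f"Suggested improvement to {item.path}",
                             body="Automated suggestion above the contribution threshold.",
                             head=branch, base=repo.default_branch)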

    I checked out this code earlier today: https://github.com/Deep-Agent/R1-V

  • 20250204

    I’m building up my collection of 4K Blu-ray science fiction movies. Fragmentation of content has meant that streaming is just not what it was at one point. I’d rather have a collection of movies and own them to watch at my leisure. That is not a common or typically shared opinion about modern entertainment content. Things ended up getting off to a slow start today. I did not even really start writing until just before the very late Colorado Avalanche game tonight. These late start games are hard given that I’m going to fall asleep before the end of the game. It’s one of those wake up the next day and find out the final score sort of situations.

  • 20250203

    In typical ChatGPT fashion, I did not manage to get a single working .ipynb file out of it today. I hit my analysis limit, and I’m going to have to wait until tomorrow to give it another old-fashioned college try. Earlier today I watched an OpenAI YouTube video blog, or relaxed video press release, about how they are building and delivering a deep research agent. I generally hope this thing does a lot better job at producing deeper answers. My curiosity is whether it will take 15 minutes or more to produce some type of .ipynb file, only to take a long time producing a broken file.

    The other thing I ended up reading today was this code project about the ah-ha moment from the DeepSeek research efforts, or, simply put, a “Clean, minimal, accessible reproduction of DeepSeek R1-Zero.” You can find the code here: https://github.com/Jiayi-Pan/TinyZero

  • 20250202

    Storing the web to interact with our language model driven future is probably something that should be considered. Search engines use sitemaps and crunch that data down after processing and collection. We could preprocess the content ourselves and provide files to be picked up alongside the content instead of trusting that processing and collection. I’m not sure we will end up with people packaging content for later distribution. That is, in some ways, a change from delivering hypertext to the online world to an entirely different method of sharing. We could just pre-build our own knowledge graph node and be ready to share it with the world as the internet, as it was constructed, functionally vanishes. Agentic interactions are on the rise, and the number of people visiting and reading online content will diminish. Our method of interface will be with or through an agent, and it will be a totally different online experience.
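
    As a thought experiment, here is a minimal sketch of what pre-building that kind of knowledge graph node could look like: packaging a site’s pages into one JSON file an agent could fetch directly. The page list, field names, and output filename are all illustrative assumptions, not any existing standard.

        # Hedged sketch: package site content into a single knowledge graph node
        # file that agents could pick up instead of crawling the pages themselves.
        import json
        from datetime import date

        pages = [  # in practice this would be generated from the sitemap or post archive
            {"url": "https://example.com/20250202",
             "title": "Storing the web for agents",
             "summary": "Pre-packaging site content for language-model-driven readers."},
        ]

        graph = {
            "site": "https://example.com",
            "generated": date.today().isoformat(),
            "nodes": [{"id": p["url"], "title": p["title"], "summary": p["summary"]}
                      for p in pages],
            "edges": [],  # links between pages could be added here later
        }

        with open("knowledge-graph-node.json", "w", encoding="utf-8") as f:
            json.dump(graph, f, indent=2)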

    I actually spent a bunch of time yesterday working on distilling language models, starting with how to work with GPT-2. A few of those notes are shared out on GitHub. They are fully functional, and you can run them on Google Colab or anywhere you can work with a Python based notebook. I’m really interested in model distillation right now. A lot of libraries and frameworks for GPT distillation already exist and have for some time. You could grab Hugging Face’s transformers (DistilBERT, DistilGPT2), torch.distill, Google’s T5 distillation techniques, and DeepSpeed & FasterTransformer (for efficient inference). You could do some testing and see what results and benefits GPT-2 distillation provides. A smaller model provides reduced parameters for better efficiency. Faster inference could mean you could run on lower-end GPUs or even a CPU if you absolutely had to go that route. Distillation does help preserve performance, retaining most of the teacher model’s capabilities.

    Breakdown of the potential code steps (a minimal sketch in code follows the list below):

    • Loads GPT-2 as a Teacher Model: The full-size GPT-2 model is used for generating soft targets.
    • Defines a Smaller GPT-2 as Student Model: Reduced layers and attention heads for efficiency.
    • Applies Knowledge Distillation Loss: Uses KL Divergence between student and teacher logits. Adds cross-entropy loss to ensure the model learns ground truth.
    • Trains the Student Model: Uses AdamW optimizer and trains for a few epochs.
    • Saves the Distilled Model: The final distilled model is saved for future use.
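
    Here is a minimal sketch of those steps, assuming the Hugging Face transformers library and PyTorch; the student configuration, hyperparameters, and the tiny placeholder corpus are illustrative assumptions rather than the exact notebook shared on GitHub.

        # Hedged sketch: distill full-size GPT-2 (teacher) into a smaller GPT-2 (student).
        import torch
        import torch.nn.functional as F
        from transformers import GPT2Config, GPT2LMHeadModel, GPT2TokenizerFast

        device = "cuda" if torch.cuda.is_available() else "cpu"

        # Load GPT-2 as the teacher model; it is frozen and only produces soft targets.
        tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
        teacher = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()

        # Define a smaller GPT-2 as the student model: fewer layers and attention heads.
        student_config = GPT2Config(n_layer=4, n_head=4, n_embd=256,
                                    vocab_size=tokenizer.vocab_size)
        student = GPT2LMHeadModel(student_config).to(device)

        optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)
        temperature, alpha = 2.0, 0.5  # softening and loss-mixing hyperparameters

        texts = ["An example sentence for distillation.",
                 "Another short line of training text."]  # placeholder corpus

        for epoch in range(3):  # train the student for a few epochs
            for text in texts:
                batch = tokenizer(text, return_tensors="pt").to(device)
                labels = batch["input_ids"]

                with torch.no_grad():
                    teacher_logits = teacher(**batch).logits

                student_out = student(**batch, labels=labels)

                # Knowledge distillation loss: KL divergence between softened
                # student and teacher logits.
                kd_loss = F.kl_div(
                    F.log_softmax(student_out.logits / temperature, dim=-1),
                    F.softmax(teacher_logits / temperature, dim=-1),
                    reduction="batchmean",
                ) * (temperature ** 2)

                # Cross-entropy against the ground-truth tokens comes from the
                # built-in language modeling loss on the student.
                loss = alpha * kd_loss + (1 - alpha) * student_out.loss

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

        # Save the distilled model for future use.
        student.save_pretrained("distilled-gpt2-student")
        tokenizer.save_pretrained("distilled-gpt2-student")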

    I’m absolutely getting the most out of the free ChatGPT interface these days. I keep hitting the limit and having to wait until the next day to get more analysis time. That is probably a good enough use case to justify paying for a subscription, but I’m not going to do that. It makes it a little bit more fun to just try to get as much out of the free tier as humanly possible.

  • 20250201

    Today was one of those days where we moved 600 cases of cookies. That was a rather intense set of adventures that took up a solid chunk of the day. Tonight I’m watching the Paramount+ movie Section 31. You can apparently buy it on Blu-ray. Somebody probably should have found a way to open it without so much exposition pulling characters together; it felt like somebody asked ChatGPT to merge a trimmed-down Ocean’s-style plot with some edgy Star Trek themes. I’m not entirely sure that some or all of this Section 31 movie wasn’t written by, or in partnership with, some sort of LLM.