A good bit of working things forward on a Saturday

Throughout the summer so far my backlog of posts over on Substack has been working its way down each week. Right now only the post for July 22 remains scheduled and ready to go for this coming Friday. Due to some scheduling concerns I worked ahead to make sure a backlog existed. That backlog would help me avoid missing a publication date. I have a Google Doc with a future Substack publication list that goes out all the way beyond the 2 year mark. Assuming that I follow the process and work on writing a Substack post every Saturday morning and working it to a recordable state on Sunday no publication dates should ever get missed. So far I’m on a 77 week streak of hitting my publication window target. I had wanted to get closer to real time on the creation schedule. It’s entirely possible at some point that I will rebuild a backlog and work a few weeks ahead. 

I just recorded “Substack Week 79: Why is diffusion so popular?” using the Audacity software and my Blue Microphones Yeti X. The first part of the post-recording process is underway right now, and it starts with noise reduction. I used the tail end of the recording to get a noise reduction profile. That profile was then applied to the whole recording to reduce the ambient background noise in the room where the podcast audio was recorded. The next effect I ran was loudness normalization to even out the perceived loudness across the episode. This is important for overall continuity and feel for somebody listening to the audio. You do not want anything to jump out of the normal audio window and be loud for no reason. The last effect I run is a noise gate to clean up anything that might have remained after that first noise reduction filter. I like to make sure my audio is clean before publishing it out to Substack.
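For anyone curious what that chain of effects is doing, here is a rough Python sketch of the ideas. It is not how Audacity implements them: real noise reduction works on the frequency spectrum and real loudness normalization uses a perceptual loudness measure. The function names, the RMS level used as a stand-in for loudness, and the per-sample gate are all simplifying assumptions for illustration.

```python
import math


def rms(samples):
    """Root mean square level of a list of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(samples, target_rms=0.1):
    """Scale the whole signal so its RMS level matches target_rms.

    RMS is only a crude stand-in for perceived loudness (an assumption).
    """
    level = rms(samples)
    if level == 0.0:
        return list(samples)
    gain = target_rms / level
    # Clamp so scaled samples stay inside the valid [-1.0, 1.0] range.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def noise_gate(samples, threshold):
    """Silence any sample whose absolute level falls below threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


# Hypothetical recording: three samples of speech, then a quiet room tail.
recording = [0.5, -0.4, 0.3, 0.01, -0.01, 0.01]

# Use the tail of the recording as the noise profile, as described above.
noise_floor = rms(recording[-3:])

# Gate at twice the measured noise floor (the factor of 2 is an assumption).
gated = noise_gate(recording, threshold=2 * noise_floor)
print(gated)
```

The point of measuring the tail first is that the gate threshold comes from the room itself instead of a guessed number.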

Not only did I record that Substack post for Week 79, but I also took the time to record “Substack Week 80: Bayesian optimization (ML syllabus edition 1/8).” The second recording was twice as long as the first, but I used the Marantz Professional sound shield on that one. Thanks to those two recording efforts my backlog is now good for the next 20 days in terms of Substack posts.

A ton of digital housekeeping on a Saturday

To prepare for the annual publication of The Lindahl Letter book I went ahead and loaded up the posts from week 50 to 77. That involved doing the typesetting for roughly 25,000 words that spanned about 100 pages. I’m considering either bundling two years of posts together or just publishing one year at a time. 

I wrote the social media copy for sharing to Twitter and LinkedIn this week:

You can check out week 74 of my weekly technology newsletter The Lindahl Letter titled “ML content automation; Am I the prompt?” over on #Substack via the link below

—> Tags: #MachineLearning #ML #ArtificialIntelligence #AI #MLOps #AIOps

https://nelslindahl.substack.com/p/ml-content-automation#details

It got shared over to Twitter here:

The LinkedIn post is here:

https://www.linkedin.com/feed/update/urn:li:activity:6946447208812613632/

I had posted weekly into Facebook as well, but that yielded very little traffic back to the actual Substack site. After a few weeks of that effort I gave up and just stopped that part of my weekly social media sharing routine. I’m not entirely sure that sharing on Twitter and LinkedIn results in any subscribers, but it does create a small amount of traffic each week. 

At some point today, I’m going to work on writing the “Substack Week 78: Why is diffusion so popular?” post to create a draft. That is all queued up as a task for the day.

Working that backlog management process

On Friday my Substack backlog was 5 posts ahead. At that level of backlog a personal best was achieved in terms of working ahead. However, that is probably too far ahead. Part of getting to that point was making sure I was ready for a summer break where some vacationing was going to happen. That backlog will allow for continuity of publishing and for a bit of vacation. Right now the backlog is at 4 Substack posts and the last one will publish on July 8, 2022. Generally, I think the backlog should be 2-3 weeks, but it is possible that at some point it will move back to a more real time writing to publication cadence. Part of my writing focus is to build quality content that is not time specific or breaking news powered coverage. I want to examine topics within the artificial intelligence and machine learning space based on the merits of the inquiry and not based on a news reporting type of urgency. I try to begin with the final product in mind. The final product of The Lindahl Letter is that it moves from a weekly Friday Substack email based delivery to a yearly volume of published prose. That focus on building something that can be packaged up for perpetuity remains an important consideration.

Those topics deserve to be deeply considered

Recording the audio for week 64 of “The Lindahl Letter” included a false start this week. I got about 3 minutes into the recording process and had to hit delete. My recording voice was not consistent. I realized that beyond drinking two shots of espresso some water needed to be consumed. After taking a moment and drinking some water I was able to complete the recording in two parts. Sometimes I have had to record each paragraph separately with a small pause in between sections. Ideally I would like to complete the recording in one good take, but that is not always realistic. Drafts of posts for week 65 and 66 are in progress at this point. It is entirely possible that tomorrow morning I’ll be able to record one or both of those posts. At this very moment, I’m 2 weeks ahead of the publication point. My spring break vacation occurred and my plan to build out a few weeks of buffer worked well. 

During the trip I did not bring any recording equipment with me to keep up with the podcast part of the equation. That is how the posts for week 65 and 66 are in progress. During the course of my vacation I had some time in the morning to sit down and write, but no ability to record. Both the digital divide and ethics in machine learning topics deserve my deepest focus and top quality writing efforts. Working a little bit ahead on those posts should help increase the quality of them and both of those topics deserve to be deeply considered.

Talking to other Substack writers

With about 55 or 56 other writers I attended a Substack event called, “category tour kickoff.” It seemed like a good idea to attend a few of these Substack events to get a feel for what other writers are doing these days on the Substack platform. It looks like they have events occurring for the rest of the month. That should be interesting. I’m planning on attending technology related events from the Substack calendar. 

Substack Tour Dates

That first Substack Go event

My initial reaction to the first Substack Go event today was to appreciate that they are trying to bring people together to form communities of interest. I’m curious when they will go old school over at Substack and start keeping a list of authors by topics like a catalog. Maybe we will see the return of the blogroll. Things opened up with a Substack community organizer talking about the programming and giving a rousing welcome to everybody across the globe that was in attendance.

81 other people from all over the globe were at the kick off of the Substack Go event series with Katie from the community team. I was a part of the very first Tolstoy hour event. Apparently, we are being connected to a squad of other writers to work together and collaborate. I’m pretending this was some type of Hogwarts sorting hat magic. I ended up being sorted into a small group of about 5 other writers. We got dropped into a Zoom breakout room to talk outside of the large group.

We opened by sharing our names and what we write about. Obviously, I shared that The Lindahl Letter just hit one year of weekly publishing. You can find that collection of thoughts here: https://nelslindahl.substack.com/ 

Bringing a collection of writers together will yield some fairly predictable results. We talked about the following topics:

  • Writing routines
  • Editing 
  • Deciding what to write
  • Growth hacking? Building your subscriber base
  • How to keep in touch

During the event I subscribed to several different Substack newsletters. I learned about a Discord group called, “Substack Writers Unite.” Summing up this very first Substack Go event would be as easy as tagging it as an introductory meet and greet. Nothing was recorded and the conversations were decent. I’m looking forward to the next event on Friday morning. 

Trying to refocus

Overall my plan to put the smartphone down and not keep it with me all the time is working out pretty well. That vtech “Connect to Cell” system works well enough as an extended headset to my Google Pixel 5 smartphone that it is almost like a house phone. It is sort of weird to hear phone calls ringing throughout the entire house again like the days of yore when land lines were a common household feature. 

The vast majority of the application alerts and notices that I spent all day clearing out are not necessary. A lot of unnecessary attention was going to that smartphone each day and that was easy enough to stop. For the next couple of days I have planned time off that could be spent writing and working on a few things. 

Today I picked back up and worked a little bit on my week 30 Substack post. It needed a little bit of refinement and rework to be ready for Friday. Intellectually I know that I should spend a few minutes on the next few posts and get them into suitably completed drafts. Initially I was able to work ahead a little bit more than what is happening now, but for some reason that process broke down and I am just working on one week at a time. If the content being produced was real time, then that would make sense as an approach. The content is however planned out weeks in advance making it much easier to produce drafts in a queue instead of working in real time to be timely based on the news of the day. Maybe that is the key to unlocking a different type of content at some point in the future. I have considered turning the weekly Substack post into both a YouTube video and a weekly podcast. I’m actually curious what has stopped me from turning the first 30 weeks of content into multimedia formats. It is probably some type of weird nostalgia for the written newsletters of the past.

Getting really focused and locking in to write for a prolonged period of time seems to be elusive. I’m able to focus on topics and complete work, but I’m struggling with really spending hours working on the same thing. That is something that is going to need to be remedied before longer form prose and projects are going to get done. Part of that is just being able to sit and type for a sustained burst of 30 minutes without shifting around and working on different things. Even right now Rocky the dog is trying to distract me with growls at a reflection in the glass of the door. It is way before sunrise right now and nobody is stirring in the house. Right now is the time for me to write and for Rocky the dog to hang out in my office.

Actualizing my stop doing list

Earlier this week I set up a vtech “Connect to Cell” system at the house. It basically connects to my smartphone via Bluetooth pretending to be a headset that is always connected. With that simple connection it is ready to answer calls and it rings at three different base stations throughout the house when I’m home with my smartphone. Setting up this system was pretty simple. It allows me to treat my cell phone like a home phone and leave it on the charging stand in my office. Part of this endeavor is to try to avoid touching my smartphone for a longer period of time during the day. Checking the alerts and notifications on my phone dozens of times a day is not really a productive thing to do with my time and energy. It is something that I’m trying very hard to put on my stop doing list. The power of the stop doing list is in how it frees up your energy and effort to work on the to do part of the list. 

Every morning I wake up before the sunrise and try to focus all of my attention and efforts at the start of the day on the act of writing. Being a writer demands some type of routine that actively directs your energy towards the production of prose. That is how my daily routine works. I focus all of my energy without any distractions on the act of writing. Sometimes a little bit of research or other activities creep into the mix, but for the most part the simple act of dancing on the keyboard is what happens and it really is the essence of what should happen. Thoughts are converted from that present point of view into keystrokes. Ideally that would happen for several pages of prose creation at a time, but for some reason it seems to end up being something that happens in about a single page serving at a time. During the course of writing and focusing on the idea at hand something will inevitably pull me out of the typing and creation process and that shift will cause a breakdown in further prose creation. It’s amazing how powerful shifting your focus can be at any given time.

Generally in the background either YouTube or Pandora is playing something that occupies a little bit of my attention. Just enough of my attention to help keep me in the pocket of writing, but not enough to totally grab my attention away from the task at hand. Last week I pulled apart the Google Doc that houses all my Substack posts from “The Lindahl Letter” and started to convert it into a Microsoft Word document capable of being published as a book. This time around for that effort I landed on using a paper size of A5. That seemed like a good size to format the content into for this journey. Today I just finished work on the content that will go out on Friday titled, “Substack Week 30: Integrations and your ML layer.” I’m going to have to remove the links and Tweets sections of each post to make the content more in line with a traditional paper bound publication. Part of the joy of a newsletter format is that the content can include live links as the delivery mechanism goes to phones and computers where people can interact with the content and open links. A more traditional manuscript is not geared toward that level of interaction. It is something that will generally be read from start to finish without a bunch of outbound links to videos or other content.

Oh yeah — I need to circle back to the process of writing weekly missives in “The Lindahl Letter” newsletter and how that will end up being a book. I have 52 topics selected and queued up as part of the writing process. That means based on the previous shared information the project currently stands at 30 of 52 chapters being completed. Working with that content to edit, refine, and rework it to be a great start to finish read is going to require moving the content from the Google Doc each week into a more manuscript friendly format. During the course of that process I’m also going to need to really focus on reworking and expanding some of the content to be more academic instead of the purely conversational tone of the weekly newsletter. I’m not going to remove all the personal touches and invocations of personality as that would make the final product less appealing to a reader, but some rework is going to be necessary to make the final product more polished. 

I’m not entirely sure at the moment where that manuscript is going to end up getting published. It is pretty easy to publish an eBook out in the market. I know how to do that without any assistance from a publisher or a literary agent. 

My attention shifted to working on a novel called “Else” that was started back in 2018. Right now the novel is really the length of a short story and it stopped after a couple thousand words. It’s weird to read something that was written years ago and pick up the writing style and tone. You have to be in the right mood to make something like that work out. It will probably be easy enough for somebody to figure out exactly what chapter the previous effort ended on and what chapter I picked up writing today. This is one of those stories that is going to be written from start to finish and then edited.

Substack Week 4: Have an ML strategy… revisited

The post for week 4 is now up and live.

Welcome to the 4th post in this ongoing Substack series. This is the post where I’m going to go back and revisit two very important machine learning questions. First, I’ll take a look back at my answers to the question, “What exactly is an ML strategy?” Second, that will set the foundation to really dig in and answer a question about, “Do you even need an ML strategy?” Obviously, the answer to the question is a hard yes and you know that without question or hesitation. 

1. What exactly is an ML strategy?

As you start to sit down and begin the adventure that is linking budget line items to your machine learning strategy it will become very clear that some decisions have to be made.[1] That is where you will find that your machine learning strategy has to be clearly defined and based on use cases with solid return on investment. Otherwise your key performance indicators that are directly tied back to those budget line items are going to show performance problems. Being planful helps make sure things work out. 

Over the last couple of weeks this Substack series “The Lindahl Letter” has dug into various topics including machine learning talent, machine learning pipelines, machine learning frameworks, and of course return on investment modeling. Now (like right now) it is time to dig into your ML strategy. Stop reading about it and just start figuring out how to do it. Honestly, I held off on this post until we had some foundational groundwork set up to walk around the idea conceptually and kick the tires on what your strategy might actually look like. No matter where you are in an organization from the bottom to the top you can begin to ideate and visualize what could be possible from a machine learning strategy. Maybe start with something simple like a strategy statement written in a bubble located in the middle of a piece of paper and work outward with your strategy. That can help you focus in on the path to a data driven machine learning strategy based on a planful decision-making process.[2]

Part of your machine learning strategy must be about purpose, replication, and reuse. That is going to be at the heart of getting value back for the organization. Definable and repeatable results are the groundwork to predictable machine learning engagements. Machine learning is typically applied in production systems as part of a definable and repeatable process. That is how you get quality and speed. You have to have guardrails in place that keep things within the confines of what is possible for that model. Outside of that you must be clear on the purpose of using machine learning to do something for your organization. That strategy statement could be as simple as locate 5 use cases where at scale machine learning techniques could be applied in a definable and repeatable way.

Maybe your strategy starts out with a budget line item investing in the development of machine learning capabilities. Investment in training happens every year and is a pretty straightforward thing to do. Now you have part of it tagged to machine learning. From that perspective you could be walking down a path where you are doing it purely for employee engagement, because the team just really wants to do something cool and wants to leverage new technology. You may find yourself in a situation where the team really wants to do it and you can make that happen. Sure, they might figure out a novel way to use that energy and engagement to produce something that aligns to the general guiding purpose of the organization. Some of that is where innovation might drive future strategy, but it is better to have your strategy drive the foundations of how innovation is occurring in the organization. A myriad of resources about strategy exist and some of them are highly targeted in the form of online courses.[3]

From a budget line item to actually being operationalized you have to apply your machine learning strategy in a uniform way based on potential return on investment. After you do that you will know you are selecting the right path for the right reasons. Then you can begin to think about replication of both the results and process across as many applications as possible. Transfer learning both in terms of models and deployments really plays into this. You will learn quickly that once you have figured out how to do it with quality and speed, applying that to a suite of things can happen much quicker. That is the power of your team coming together and being able to deliver results. That is why going after 

2. Do you even need an ML strategy?

Seeing the strategy beyond trees in the random forest takes a bit of perspective. Sometimes it is easier to lock in and focus on a specific project and forget about how that project fits into a broader strategy. Having a targeted focused ML strategy that is applied from the top down can help ensure the right executive sponsorship and resources are focused on getting results. Instead of running a bunch of separate efforts that are self-incubating it might be better to have a definable and repeatable process to roll out and help ensure the same approach can be replicated in cost effective ways for the organization. That being said… of course you need an ML strategy. 

Maybe an example of a solid ML strategy might be related to a cost containment or cost saving program to help introduce assistive ML products to allow a workforce to do things quicker with fewer errors. Executing that strategy would require operationalizing it and collecting data on the processes in action to track, measure and ensure positive outcomes.

Footnotes:

[1] Check out this article from February 2020 about KPIs and budgets https://hbr.org/2020/02/create-kpis-that-reflect-your-strategic-priorities 

[2] Interesting blog post from AWS https://aws.amazon.com/blogs/machine-learning/developing-a-business-strategy-by-combining-machine-learning-with-sensitivity-analysis/ 

[3] Here is an example of a course lecture you can freely watch right now https://www.coursera.org/lecture/deep-learning-business/2-2-business-strategy-with-machine-learning-deep-learning-0Jop8 

What’s next for The Lindahl Letter?

  • Week 5: Let your ROI drive a fact-based decision-making process
  • Week 6: Understand the ongoing cost and success criteria as part of your ML strategy
  • Week 7: Plan to grow based on successful ROI
  • Week 8: Is the ML we need everywhere now? 
  • Week 9: What is ML scale? The where and the when of ML usage
  • Week 10: Valuing ML use cases based on scale
  • Week 11: Model extensibility for few shot GPT-2
  • Week 12: Confounding within multiple ML model deployments
  • Week 13: Building out your ML Ops 
  • Week 14: My Ai4 Healthcare NYC 2019 talk revisited
  • Week 15: What are people really doing with machine learning?

I’ll try to keep the what’s next list forward looking with at least five weeks of posts in planning or review. If you enjoyed reading this content, then please take a moment and share it with a friend. 

My second Substack post went live

Well over at https://nelslindahl.substack.com/ my next post just went live today. 

Substack Week 2: Machine Learning Frameworks & Pipelines
This is the nuts and bolts of the how in the machine learning equation

Ecosystems are beginning to develop related to machine learning pipelines. Different platforms are building out different methods to manage the machine learning frameworks and pipelines they support. Now is the time to get that effort going. You can go build out an easy to manage end to end method for feeding model updates to production. If you stopped reading for a moment and actually went and started doing research or spinning things up, then you probably ended up using a TensorFlow Serving instance you installed, Amazon SageMaker pipeline, or an Azure machine learning pipeline.[1] Any of those methods will get you up and running. They have communities of practice that can provide support.[2] That is to say the road you are traveling has been used before and used at scale. The path toward using machine learning frameworks and pipelines is pretty clearly established. People are doing that right now. They are building things for fun. They have things in production. At the same time all that is occurring in the wild, a ton of orchestration and pipeline management companies are jumping out into the forefront of things right now in the business world.[3]  

Get going. One way to get going very quickly and start to really think about how to make this happen is to go and download TensorFlow Extended (TFX) from GitHub as your pipeline platform on your own hardware or some type of cloud instance.[4] You can just as easily go cloud native and build out your technology without boxes in your datacenter or at your desk. You could spin up on GCP, Azure, or AWS without any real friction against realizing your dream. Some of your folks might just set up local versions of these things to mess around and do some development along the way.

Build models. You could of course buy a model.[5] Steps exist to help you build a model. All of the machine learning pipeline setup steps are rather academic without models that utilize the entire apparatus. One way to introduce machine learning to the relevant workflow based on your use case is to just integrate with an API to make things happen without having to set up frameworks and pipelines. That is one way to go about it and for some things it makes a lot of sense. For other machine learning efforts complexity will preclude using an out of the box solution that has a callable API. You would be surprised at how many complex APIs are being offered these days, but they do not provide comprehensive coverage for all use cases.[6] 

What are you going to do with all those models? You are going to need to save them for serving. Getting setup with a solid framework and machine learning pipeline is all about serving up those models within workflows that fulfill use cases with defined and predictable return on investment models. 

From the point you implement, it is going to be a race against time to figure out when those models from the marketplace suffer an efficiency drop and some type of adjustment is required. You have to understand the potential model degradation and calculate at what point you have to shut down the effort due to return on investment conditions being violated.[7] That might sound a little bit hard, but if your model efficiency degrades to the point that financial outcomes are being negatively impacted you will want to know how to flip the off switch and you might be wondering why that switch was not automated.
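One way to picture an automated off switch is a small check that runs every reporting period. This is just a sketch under some loud assumptions: the `PeriodResult` fields, the idea of measuring value and cost per period in dollars, and the default of three losing periods in a row are all hypothetical choices for illustration, not a recommendation for any particular business.

```python
from dataclasses import dataclass


@dataclass
class PeriodResult:
    value_generated: float  # hypothetical: dollars attributed to the model this period
    operating_cost: float   # hypothetical: dollars spent running the model this period


def should_shut_off(history, max_losing_periods=3):
    """Return True once the most recent `max_losing_periods` periods all lost money."""
    if len(history) < max_losing_periods:
        return False  # not enough data yet to justify flipping the switch
    recent = history[-max_losing_periods:]
    return all(p.value_generated < p.operating_cost for p in recent)


# Hypothetical history: one good period followed by three losing ones.
history = [
    PeriodResult(value_generated=1200.0, operating_cost=1000.0),
    PeriodResult(value_generated=950.0, operating_cost=1000.0),
    PeriodResult(value_generated=900.0, operating_cost=1000.0),
    PeriodResult(value_generated=700.0, operating_cost=1000.0),
]
print(should_shut_off(history))  # True
```

Requiring several losing periods in a row is a design choice that keeps one noisy week from shutting down a model that is still paying for itself.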

Along the way some type of adjustment to a model or parameters is going to be required. I have talked about this before at length, but just to recap here the way I look at return on investment is pretty straightforward: take the final value of the ML model minus the initial value of the model, divide that by the cost of investment, and multiply by 100%. Yeah that was a lot to read, but it’s just going to give you a positive or negative look at whether that return on investment is going to be there for you. At that point you are just following your strategy and thinking about the return on investment model.
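That calculation is easier to see written out. The sketch below assumes the standard reading of the formula, which is final value minus initial value, divided by the cost of investment, times 100. The function name and the dollar figures are made up for illustration.

```python
def roi_percent(initial_value, final_value, cost_of_investment):
    """Return ROI as a percentage: (final - initial) / cost * 100."""
    if cost_of_investment <= 0:
        raise ValueError("cost_of_investment must be positive")
    return (final_value - initial_value) / cost_of_investment * 100.0


# Hypothetical figures, purely for illustration.
gain = roi_percent(initial_value=100_000, final_value=130_000, cost_of_investment=20_000)
loss = roi_percent(initial_value=100_000, final_value=95_000, cost_of_investment=20_000)
print(f"{gain:.1f}%")  # 150.0%
print(f"{loss:.1f}%")  # -25.0%
```

The sign is the part that matters for the decision. A positive number says the investment paid back more than it cost, and a negative number is the signal to start asking hard questions.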

So again strict return on investment modeling may not be the method that you want to use. I would caution against working for long periods without understanding the financial consequences. At scale, you can very quickly create breakdowns and other problems within a machine learning use case. It could even go so far that you may not find it worthwhile for your business case. Inserting machine learning into a workflow might not be the right thing to do and that is why calculating results and making fact based decisions is so important. 

Really any way you do it in a planful way that’s definable and repeatable is gonna work out great. That is fairly easy to say given that inserting fact based decision making and being willing to hit the off switch if necessary helps prevent runaway problems from becoming existential threats to the business. So having a machine learning strategy, doing things in a definable and repeatable way, and being ruthlessly fact based is kind of where I’m suggesting you go.

Obviously, you have to take everything that I say with a grain of salt. You should know upfront that I’m a big TensorFlow enthusiast. That’s one of the reasons why I use it as my primary example, but it doesn’t mean that that’s the absolute right answer for you. It’s just the answer that I look at most frequently and always look to first before branching out to other solutions. That is always based on the use case and I avoid letting technology search for problems at all costs. You need to let the use case and the problem at hand fit the solution instead of applying solutions until it works or you give up.

At this point in the story, you are thinking about or beginning to build this out and you’re starting to get ramped up. The excitement is probably building to a crescendo of some sort. Now you need somewhere to manage your models. You may need to imagine for a moment that you do have models. Maybe you bought them from a marketplace and you skipped training altogether. It’s an exciting time and you are ready to get going. So in this example, you’re going from just building (or having recently acquired) a machine learning model to doing something. At that moment, you are probably realizing that you need to serve that model out over and over again to create an actual machine learning driven workload. Not only does that probably mean that you’re getting to manage those models, but also you are going to need to serve out different models over time.
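To make the idea of managing and serving different models over time a bit more concrete, here is a toy in-memory registry. It is only a sketch and it is not the TensorFlow Serving API or any other real product. The `ModelRegistry` class, its method names, and the stand-in "models" (plain Python functions) are all hypothetical.

```python
class ModelRegistry:
    """Toy in-memory registry: keeps every version and tracks which one is live."""

    def __init__(self):
        self._versions = {}  # version number -> callable model
        self._live = None    # version number currently being served

    def register(self, model):
        """Store a new model and return the version number assigned to it."""
        version = len(self._versions) + 1
        self._versions[version] = model
        return version

    def promote(self, version):
        """Make a stored version the one that serves predictions."""
        if version not in self._versions:
            raise KeyError(f"unknown model version: {version}")
        self._live = version

    def predict(self, x):
        """Serve a prediction from whichever version is currently live."""
        if self._live is None:
            raise RuntimeError("no model has been promoted yet")
        return self._versions[self._live](x)


registry = ModelRegistry()
v1 = registry.register(lambda x: x * 2)  # stand-in for the launch model
v2 = registry.register(lambda x: x * 3)  # stand-in for a retrained model

registry.promote(v1)
print(registry.predict(10))  # 20

registry.promote(v2)
print(registry.predict(10))  # 30

registry.promote(v1)         # roll back if the new model misbehaves
print(registry.predict(10))  # 20
```

Keeping the old versions around instead of overwriting them is what makes the rollback at the end possible.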

As you make adjustments and corrections that introduce different modeling techniques you get more advanced with what you are trying to implement. One of the things you’ll find is that even the perfect model that you had and was right where you wanted it to be when you launched is slowly waiting to betray you and your confidence in it by degrading. You have to be ready to model and evaluate performance based on your use case. That is what lets you make quality decisions about model quality and how outcomes are being impacted. 

I have a few takeaways to conclude this installment of The Lindahl Letter. You have to remember that at this point machine learning models and pipelines are pretty much democratized. You can get them. They are out in the wild. People are using them in all kinds of different ways. You can just go ahead and introduce this technology to your organization with relatively little friction.

  • I’m still amazed that this technology is freely available.
  • Frameworks are well developed and have been pressure tested at scale.
  • Yeah, people have proven it works.
  • The process has been well documented and the path is clear.
  • Pipelines and automation save time. Fewer ML team members are needed to deliver this way.
  • A lot of the first time doing this gotchas are managed away in this model based on leveraging community knowledge and practice.
  • Serving multiple models and model management is hard.
  • None of this replaces the deep work required to wrangle the data.

Footnotes:

[1] Links to the referenced ML pipelines: https://www.tensorflow.org/tfx, https://aws.amazon.com/sagemaker/pipelines/, or https://docs.microsoft.com/en-us/azure/machine-learning/concept-ml-pipelines
[2] One of the best places to start to learn about machine learning communities would be https://www.kaggle.com/
[3] Read this if you have a few minutes… it is worth the read https://hbr.org/2020/09/how-to-win-with-machine-learning
[4] https://github.com/tensorflow/tfx
[5] This is one of the bigger ones https://aws.amazon.com/marketplace/solutions/machine-learning
[6] This is one example of services that are open for business right now https://cloud.google.com/products/ai 
[7] This is a wonderful site and this article is spot on https://towardsdatascience.com/model-drift-in-machine-learning-models-8f7e7413b563   

What’s next for The Lindahl Letter?

  • Week 3: Machine learning Teams
  • Week 4: Have an ML strategy… revisited
  • Week 5: Let your ROI drive a fact-based decision-making process
  • Week 6: Understand the ongoing cost and success criteria as part of your ML strategy
  • Week 7: Plan to grow based on successful ROI
  • Week 8: Is the ML we need everywhere now? 
  • Week 9: What is ML scale? The where and the when of ML usage
  • Week 10: Valuing ML use cases based on scale
  • Week 11: Model extensibility for few shot GPT-2
  • Week 12: Confounding within multiple ML model deployments

I’ll try to keep the what’s next list forward looking with at least five weeks of posts in planning or review. If you enjoyed reading this content, then please take a moment and share it with a friend.