Nels Lindahl — Functional Journal

A weblog created by Dr. Nels Lindahl featuring writings and thoughts…

Month: February 2021

  • Substack Week 4: Have an ML strategy… revisited

    The post for week 4 is now up and live.

    Welcome to the 4th post in this ongoing Substack series. This is the post where I’m going to go back and revisit two very important machine learning questions. First, I’ll take a look back at my answers to the question, “What exactly is an ML strategy?” Second, that will set the foundation to really dig in and answer the follow-up question, “Do you even need an ML strategy?” Obviously, the answer is a hard yes, and you know that without question or hesitation.

    1. What exactly is an ML strategy?

    As you start to sit down and begin the adventure that is linking budget line items to your machine learning strategy, it will become very clear that some decisions have to be made.[1] That is where you will find that your machine learning strategy has to be clearly defined and based on use cases with a solid return on investment. Otherwise, the key performance indicators that are directly tied back to those budget line items are going to show performance problems. Being planful helps make sure things work out.

    Over the last couple of weeks this Substack series, “The Lindahl Letter,” has dug into various topics including machine learning talent, machine learning pipelines, machine learning frameworks, and of course return on investment modeling. Now (like right now) it is time to dig into your ML strategy. Stop reading about it and just start figuring out how to do it. Honestly, I held off on this post until we had some foundational groundwork set up to walk around the idea conceptually and kick the tires on what your strategy might actually look like. No matter where you are in an organization, from the bottom to the top, you can begin to ideate and visualize what could be possible from a machine learning strategy. Maybe start with something simple like a strategy statement written in a bubble located in the middle of a piece of paper and work outward with your strategy. That can help you focus in on the path to a data-driven machine learning strategy based on a planful decision-making process.[2]

    Part of your machine learning strategy must be about purpose, replication, and reuse. That is going to be at the heart of getting value back for the organization. Definable and repeatable results are the groundwork for predictable machine learning engagements. Machine learning is typically applied in production systems as part of a definable and repeatable process. That is how you get quality and speed. You have to have guardrails in place that keep things within the confines of what is possible for that model. Outside of that, you must be clear on the purpose of using machine learning to do something for your organization. That strategy statement could be as simple as: locate five use cases where machine learning techniques could be applied at scale in a definable and repeatable way.

    Maybe your strategy starts out with a budget line item investing in the development of machine learning capabilities. Investment in training happens every year and is a pretty straightforward thing to do. Now you have part of it tagged to machine learning. From that perspective you could be walking down a path where you are doing it purely for employee engagement, because the team just really wants to do something cool and wants to leverage new technology. You may find yourself in a situation where the team really wants to do it and you can make that happen. Sure, they might figure out a novel way to use that energy and engagement to produce something that aligns to the general guiding purpose of the organization. Some of that is where innovation might drive future strategy, but it is better to have your strategy drive the foundations of how innovation is occurring in the organization. A myriad of resources about strategy exist and some of them are highly targeted in the form of online courses.[3]

    From a budget line item to actually being operationalized, you have to apply your machine learning strategy in a uniform way based on potential return on investment. After you do that, you will know you are selecting the right path for the right reasons. Then you can begin to think about replication of both the results and the process across as many applications as possible. Transfer learning, both in terms of models and deployments, really plays into this, and you will learn quickly that after you have figured out how to do it with quality and speed, applying that to a suite of things can happen much more quickly. That is the power of your team coming together and being able to deliver results. That is why going after replication is worth the effort.

    2. Do you even need an ML strategy?

    Seeing the strategy beyond the trees in the random forest takes a bit of perspective. Sometimes it is easier to lock in and focus on a specific project and forget about how that project fits into a broader strategy. Having a targeted, focused ML strategy that is applied from the top down can help ensure the right executive sponsorship and resources are focused on getting results. Instead of running a bunch of separate efforts that are self-incubating, it might be better to have a definable and repeatable process to roll out and help ensure the same approach can be replicated in cost-effective ways for the organization. That being said… of course you need an ML strategy.

    An example of a solid ML strategy might be a cost containment or cost saving program that introduces assistive ML products to allow a workforce to do things more quickly with fewer errors. Executing that strategy would require operationalizing it and collecting data on the processes in action to track, measure, and ensure positive outcomes.

    Footnotes:

    [1] Check out this article from February 2020 about KPIs and budgets https://hbr.org/2020/02/create-kpis-that-reflect-your-strategic-priorities 

    [2] Interesting blog post from AWS https://aws.amazon.com/blogs/machine-learning/developing-a-business-strategy-by-combining-machine-learning-with-sensitivity-analysis/ 

    [3] Here is an example of a course lecture you can freely watch right now https://www.coursera.org/lecture/deep-learning-business/2-2-business-strategy-with-machine-learning-deep-learning-0Jop8 

    What’s next for The Lindahl Letter?

    • Week 5: Let your ROI drive a fact-based decision-making process
    • Week 6: Understand the ongoing cost and success criteria as part of your ML strategy
    • Week 7: Plan to grow based on successful ROI
    • Week 8: Is the ML we need everywhere now? 
    • Week 9: What is ML scale? The where and the when of ML usage
    • Week 10: Valuing ML use cases based on scale
    • Week 11: Model extensibility for few shot GPT-2
    • Week 12: Confounding within multiple ML model deployments
    • Week 13: Building out your ML Ops 
    • Week 14: My Ai4 Healthcare NYC 2019 talk revisited
    • Week 15: What are people really doing with machine learning?

    I’ll try to keep the what’s next list forward looking with at least five weeks of posts in planning or review. If you enjoyed reading this content, then please take a moment and share it with a friend. 

  • Pushing along forward

    My level of writing productivity is still falling below the previous high-water mark for the year. That is disappointing, but hopefully things will turn around today. Right now I’m sitting at 6 of 15 Substack posts locked in and 9 of 15 in partial progress. Most of that writing output occurred during two really productive writing sessions. Now that I look back on those glorious writing sessions, I should have ridden them to more success. When you have that writing flow going, you have to use it to the fullest advantage possible. Working in small sections in dribs and drabs is one way to go about it, but it is far better to ride the wave of productivity.

  • Ugh… productivity crash

    Being nervous about the big game this weekend really impacted my writing productivity. Even my mood right now is a little morose and an entire night has passed. Pretty much the whole weekend was a wash when it comes to producing prose and writing. I’m working on post 6 of 15 planned Substack installments. Each one has been carefully planned since the start of the project and my goal is to keep pretty far ahead of the publication date to allow for better overall quality.

  • My second Substack post went live

    Well, over at https://nelslindahl.substack.com/ my next post just went live today.

    Substack Week 2: Machine Learning Frameworks & Pipelines
    Enter title… Machine Learning Frameworks & Pipelines
    Enter subtitle… This is the nuts and bolts of the how in the machine learning equation

    Ecosystems are beginning to develop related to machine learning pipelines. Different platforms are building out different methods to manage the machine learning frameworks and pipelines they support. Now is the time to get that effort going. You can go build out an easy-to-manage end-to-end method for feeding model updates to production. If you stopped reading for a moment and actually went and started doing research or spinning things up, then you probably ended up using a TensorFlow Serving instance you installed, an Amazon SageMaker pipeline, or an Azure machine learning pipeline.[1] Any of those methods will get you up and running. They have communities of practice that can provide support.[2] That is to say, the road you are traveling has been used before and used at scale. The path toward using machine learning frameworks and pipelines is pretty clearly established. People are doing that right now. They are building things for fun. They have things in production. At the same time all that is occurring in the wild, a ton of orchestration and pipeline management companies are jumping to the forefront of things right now in the business world.[3]
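
    To make “serving” a little more concrete, here is a minimal sketch of the REST call shape a TensorFlow Serving instance exposes on its default REST port (8501). The host and model name below are placeholder assumptions for illustration, not anything tied to this post.

```python
import json

def predict_request(host, model_name, instances):
    """Build the URL and JSON body for a TensorFlow Serving :predict call.

    TF Serving's REST API expects POST requests to
    /v1/models/<model_name>:predict with a JSON body of {"instances": [...]}.
    """
    url = "http://{}:8501/v1/models/{}:predict".format(host, model_name)
    body = json.dumps({"instances": instances})
    return url, body

# Hypothetical example: a model named "my_model" served locally.
url, body = predict_request("localhost", "my_model", [[1.0, 2.0, 3.0]])
# POST `body` to `url` with any HTTP client to get predictions back.
```

    The point is that the serving interface is small and well documented; most of the real work sits in the pipeline that keeps fresh models behind that endpoint.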

    Get going. One way to get going very quickly and start to really think about how to make this happen is to go and download TensorFlow Extended (TFX) from GitHub as your pipeline platform, on your own hardware or some type of cloud instance.[4] You can just as easily go cloud native and build out your technology without boxes in your datacenter or at your desk. You could spin up on GCP, Azure, or AWS without any real friction against realizing your dream. Some of your folks might just set up local versions of these things to mess around and do some development along the way.

    Build models. You could of course buy a model.[5] Steps exist to help you build a model. All of the machine learning pipeline setup steps are rather academic without models that utilize the entire apparatus. One way to introduce machine learning to the relevant workflow based on your use case is to just integrate with an API to make things happen without having to set up frameworks and pipelines. That is one way to go about it, and for some things it makes a lot of sense. For other machine learning efforts, complexity will preclude using an out-of-the-box solution that has a callable API. You would be surprised at how many complex APIs are being offered these days, but they do not provide comprehensive coverage for all use cases.[6]

    What are you going to do with all those models? You are going to need to save them for serving. Getting set up with a solid framework and machine learning pipeline is all about serving up those models within workflows that fulfill use cases with defined and predictable return on investment models.

    From the point you implement, it is going to be a race against time to figure out when those models from the marketplace suffer an efficiency drop and some type of adjustment is required. You have to understand the potential model degradation and calculate at what point you have to shut down the effort due to return on investment conditions being violated.[7] That might sound a little bit hard, but if your model efficiency degrades to the point that financial outcomes are being negatively impacted, you will want to know how to flip the off switch, and you might be wondering why that switch was not automated.
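
    One hedged way to picture that automated off switch, assuming you can observe some efficiency metric for the model and know the break-even level from your ROI model (both numbers below are hypothetical), is a rolling check like this:

```python
from collections import deque

class ModelKillSwitch:
    """Hypothetical sketch: flag a model for shutdown when its rolling
    average efficiency falls below the ROI break-even threshold."""

    def __init__(self, breakeven, window=30):
        self.breakeven = breakeven          # efficiency where ROI turns negative
        self.recent = deque(maxlen=window)  # rolling window of observations

    def record(self, efficiency):
        """Record one observation; return True if it is time to shut off."""
        self.recent.append(efficiency)
        average = sum(self.recent) / len(self.recent)
        return average < self.breakeven
```

    In a real deployment the observations would come from your monitoring pipeline, and “shutting off” would mean routing traffic back to the pre-ML workflow rather than simply stopping.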

    Along the way some type of adjustment to a model or parameters is going to be required. I have talked about this before at length, but just to recap here, the way I look at return on investment is pretty straightforward: take the final value of the ML model minus its initial value, divide by the cost of investment, and multiply by 100%. Yeah, that was a lot to read, but it’s just going to give you a positive or negative look at whether that return on investment is going to be there for you. At that point you are just following your strategy and thinking about the return on investment model.
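
    Written out as code, that back-of-the-envelope calculation is small; the dollar figures below are made up purely for illustration.

```python
def roi_percent(final_value, initial_value, cost_of_investment):
    """Return on investment: (final value - initial value) / cost * 100%."""
    return (final_value - initial_value) / cost_of_investment * 100.0

# Hypothetical numbers: delivered value grew from 100k to 130k against a
# 50k investment, so the return is positive; a drop to 90k would be negative.
example_roi = roi_percent(130_000, 100_000, 50_000)
```

    Positive means the investment case still holds; negative means it is time to revisit the strategy or hit the off switch.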

    So again, strict return on investment modeling may not be the method that you want to use. I would caution against working for long periods without understanding the financial consequences. At scale, you can very quickly create breakdowns and other problems within a machine learning use case. It could even go so far that you may not find it worthwhile for your business case. Inserting machine learning into a workflow might not be the right thing to do, and that is why calculating results and making fact-based decisions is so important.

    Really, any way you do it in a planful way that’s definable and repeatable is going to work out great. That is fairly easy to say given that inserting fact-based decision making and being willing to hit the off switch if necessary help prevent runaway problems from becoming existential threats to the business. So having a machine learning strategy, doing things in a definable and repeatable way, and being ruthlessly fact based is kind of where I’m suggesting you go.

    Obviously, you have to take everything that I say with a grain of salt; you should know upfront that I’m a big TensorFlow enthusiast. That’s one of the reasons why I use it as my primary example, but it doesn’t mean that it’s the absolute right answer for you. It’s just the answer that I look at most frequently and always look to first before branching out to other solutions. That is always based on the use case, and I avoid letting technology search for problems at all costs. You need to let the use case and the problem at hand fit the solution instead of applying solutions until it works or you give up.

    At this point in the story, you are thinking about or beginning to build this out and you’re starting to get ramped up. The excitement is probably building to a crescendo of some sort. Now you need somewhere to manage your models. You may need to imagine for a moment that you do have models. Maybe you bought them from a marketplace and you skipped training altogether. It’s an exciting time and you are ready to get going. So in this example, you’re going from just building (or having recently acquired) a machine learning model to doing something. At that moment, you are probably realizing that you need to serve that model out over and over again to create an actual machine learning driven workload. Not only does that probably mean that you’re going to need to manage those models, but also you are going to need to serve out different models over time.
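
    Managing models over time does not have to start exotic; conceptually it is just versioned artifacts tracked per model name. Here is a minimal, entirely hypothetical sketch of that idea (the model name and paths are invented for illustration):

```python
class ModelRegistry:
    """Hypothetical minimal registry: versioned artifact paths per model,
    so the serving layer can roll forward to new versions (or back)."""

    def __init__(self):
        self._versions = {}  # model name -> ordered list of artifact paths

    def register(self, name, artifact_path):
        """Add a new version of a model; return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(artifact_path)
        return len(versions)

    def latest(self, name):
        """Path of the most recently registered version of a model."""
        return self._versions[name][-1]
```

    Real platforms layer metadata, approvals, and rollback on top of this, but the core idea of versioned, named artifacts is the same.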

    As you make adjustments and corrections that introduce different modeling techniques, you get more advanced with what you are trying to implement. One of the things you’ll find is that even the perfect model that was right where you wanted it to be when you launched is slowly waiting to betray you and your confidence in it by degrading. You have to be ready to monitor and evaluate performance based on your use case. That is what lets you make quality decisions about model quality and how outcomes are being impacted.

    I have a few takeaways to conclude this installment of The Lindahl Letter. You have to remember that at this point machine learning models and pipelines are pretty much democratized. You can get them. They are out in the wild. People are using them in all kinds of different ways. You can just go ahead and introduce this technology to your organization with relatively little friction.

    • I’m still amazed that this technology is freely available.
    • Frameworks are well developed and have been pressure tested at scale.
    • Yeah, people have proven it works.
    • The process has been well documented and the path is clear.
    • Pipelines and automation save time. Fewer ML team members are needed to deliver this way.
    • A lot of the first-time gotchas are managed away in this model by leveraging community knowledge and practice.
    • Serving multiple models and model management is hard.
    • None of this replaces the deep work required to wrangle the data.

    Footnotes:

    [1] Links to the referenced ML pipelines: https://www.tensorflow.org/tfx, https://aws.amazon.com/sagemaker/pipelines/, or https://docs.microsoft.com/en-us/azure/machine-learning/concept-ml-pipelines
    [2] One of the best places to start to learn about machine learning communities would be https://www.kaggle.com/
    [3] Read this if you have a few minutes… it is worth the read https://hbr.org/2020/09/how-to-win-with-machine-learning
    [4] https://github.com/tensorflow/tfx
    [5] This is one of the bigger ones https://aws.amazon.com/marketplace/solutions/machine-learning
    [6] This is one example of services that are open for business right now https://cloud.google.com/products/ai 
    [7] This is a wonderful site and this article is spot on https://towardsdatascience.com/model-drift-in-machine-learning-models-8f7e7413b563   

    What’s next for The Lindahl Letter?

    • Week 3: Machine learning Teams
    • Week 4: Have an ML strategy… revisited
    • Week 5: Let your ROI drive a fact-based decision-making process
    • Week 6: Understand the ongoing cost and success criteria as part of your ML strategy
    • Week 7: Plan to grow based on successful ROI
    • Week 8: Is the ML we need everywhere now? 
    • Week 9: What is ML scale? The where and the when of ML usage
    • Week 10: Valuing ML use cases based on scale
    • Week 11: Model extensibility for few shot GPT-2
    • Week 12: Confounding within multiple ML model deployments

    I’ll try to keep the what’s next list forward looking with at least five weeks of posts in planning or review. If you enjoyed reading this content, then please take a moment and share it with a friend.

  • Chromebook scratches

    My Pixelbook Go has been wrapped in a dbrand black carbon top skin. That seemed like the right thing to do to prevent scratches on the top of this device. The bottom of the device seems fine so far, but the top was picking up scratches. Now it is all wrapped up in black carbon skin from dbrand. The installation process was pretty easy and took maybe five minutes to complete. Most of that time was related to using the hairdryer on the edges and corners.

    Apparently, I’m on a 10 day weblog publishing streak. That seems sort of exciting. I’m not exactly sure why, but it does seem a little exciting. Beyond the general blog writing that has been happening each day I’m tackling my new Substack posts with near reckless abandon. I’m working on them in the morning and the evening every day. That is a recipe for getting things done and it is working out so far. My goal is to get pretty far ahead in Substack posts so that a lot of refinement and tinkering can happen along the way.

  • A night of slow writing

    Ugh… I set up a path for writing on about 15 different topics. So far that path has been working well enough. I meant to write a few sentences this morning for the weblog, but for some reason that just did not happen.

  • On playing vinyl records

    That title is enough to trigger some folks. Please understand that is not my intention. My vinyl record setup is pretty straightforward. I run two powered speakers from Audioengine and they are directly connected to the record player. Audio is delivered in a stereo format from the two Audioengine A5+ speakers and the sound occurs in the room exactly how the audio engineer mastered the record to play. Yeah my epic post on playing vinyl records is really all about just letting the sound the artist was trying to achieve occur. I don’t do anything to change the sound or mix of the record. Honestly, I have over 50 records and things have worked out well enough. At one point, I did consider buying the subwoofer to go with my speaker set, but that purchase has not happened to date. 

  • Moving along Monday

    Overall my focus and productivity remain at a high level. I’m working through the 12 weeks of machine learning Substack content and really digging into that effort. It will be well beyond the write-a-post-15-minutes-before-publishing type of effort. Sometimes that can be fun and entertaining, but to really dig in and work at a deeper level, that is not the way to go about it. At this point, I might even end up working out of one Google Doc for the entire Substack process. That is a departure from my habit of putting everything written in a day inside a single word processing document labeled by the day it was written. You can imagine that creates problems when working on academic papers and other efforts that require sustained work to achieve a higher quality end product.

    All right, that felt really good. What, you might ask? The second Substack post has been set to publish on Friday, February 5, 2021 at 5:00 PM (Denver). The very first post hit 23 total views. I’m hopeful that the second post will garner a little more viewership. One of the things that I have noticed is that Substack requires you to hustle and tell people about your work. The audience won’t just show up to read the words.