Open source machine learning security plus the machine learning and surveillance bonus issue

Thank you for tuning in to this audio-only podcast presentation. This is week 72 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is “Open source machine learning security plus the machine learning and surveillance bonus issue.”

Security is the element of open source software that always has to be considered. Depending on the size of the participating developer community and the rate of development, the number of security vulnerabilities is going to rise and fall. It will be a constant battle between the people trying to take advantage of vulnerabilities and the people who fight the good fight of software security. I have taken a serious look at this topic before. The risks associated with software security are real for both open source and proprietary software. Some have argued that bringing more and more people together around a piece of software through the open source model reduces risk through transparency and contributions from a multitude of sources.

Back in January, Kent Walker, President of Global Affairs at Google, shared a blog post titled “Making Open Source software safer and more secure” [1]. That post discussed Log4j, the widely used open source logging library whose recent vulnerability affected a variety of industries. It also included a reminder of Google’s $100 million commitment to support third-party foundations such as the Open Source Security Foundation [2]. Kent distilled the problem into three parts: identifying the critical projects instead of trying to boil the ocean, being clear about baselines for security testing, and finding ways to increase support from both public and private sources.

Let’s zoom out for a second and look at public policy and regulation related to open source software security. On May 12, 2021, Executive Order 14028, “Improving the Nation’s Cybersecurity,” was issued [3]. The whole order is 15 pages long and may take you about 20 minutes to read. From there, you can move on to the May 11, 2022 update to the National Institute of Standards and Technology (NIST) guidance on “Software Security in Supply Chains” [4]. That collection of web pages will take you a lot longer to read because the content is dense. Many supply chains now rely on machine learning and a mix of open source software components along the path from production to delivery. As you can imagine, many policymakers are legitimately concerned about risks to supply chains.

Now that we have considered the concerns of the developers, companies, and governments looking at open source software security, you can see the scope of the risk involved. I’m not sure I see any easy solutions on the horizon for this one. The risk will have to be mitigated in real time, and a lot of people are going to have to work together on an ongoing basis to make that happen.

Does this week include some bonus edition content? Yes, it does. We are about to cover a bonus topic: “Machine learning and surveillance.”

Welcome to the bonus topic this week. My backlog of topics has grown a bit out of control. For example, this is week 72 and the backlog holds 120 topics. Moving forward, I’m going to grab a few of those topics and turn them into double issues of The Lindahl Letter.

Machine learning, through anomaly detection and computer vision, can help you make sense of and work with mind-boggling amounts of data. You can quickly work through hours of security footage from a building’s cameras and keep only the segments where motion or some other type of change occurs. For overnight security and monitoring, this means a large portion of the footage can be cleared away almost immediately, with no review required. From there, you can move from anomaly detection to the more complicated parts of computer vision: tagging objects in the video and flagging events for manual review or for intervention through an alarm or notification. I ran a quick Google Scholar search for academic papers related to “computer vision machine learning surveillance” [5]. This is an area where you can find some really solid and well-understood use cases.
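To make the “only keep the footage where something changes” step concrete, here is a minimal sketch. It assumes frames have already been decoded into small grayscale arrays, and it uses simple frame differencing rather than a learned anomaly detector. The function names and the threshold values are hypothetical and would need tuning against real camera footage.

```python
from typing import List, Sequence

# A frame is a grayscale image stored as rows of pixel intensities (0-255).
Frame = Sequence[Sequence[int]]


def changed_fraction(prev: Frame, curr: Frame, pixel_threshold: int) -> float:
    """Return the fraction of pixels whose intensity moved by more than pixel_threshold."""
    total = 0
    changed = 0
    for prev_row, curr_row in zip(prev, curr):
        for p, c in zip(prev_row, curr_row):
            total += 1
            if abs(c - p) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def frames_with_motion(
    frames: List[Frame],
    pixel_threshold: int = 30,          # hypothetical: ignore small sensor noise
    min_changed_fraction: float = 0.01,  # hypothetical: at least 1% of pixels must change
) -> List[int]:
    """Return indices of frames that differ noticeably from the frame before them.

    Every frame not returned is treated as "nothing happened" and can be skipped in review.
    """
    flagged = []
    for i in range(1, len(frames)):
        if changed_fraction(frames[i - 1], frames[i], pixel_threshold) >= min_changed_fraction:
            flagged.append(i)
    return flagged


# Tiny 2x2 example: something bright appears in frame 2 and then stays put.
frames = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],
    [[255, 0], [0, 0]],
    [[255, 0], [0, 0]],
]
print(frames_with_motion(frames))  # → [2]
```

Only the frame where the change first appears gets flagged, so a reviewer would look at one frame instead of four. A production system would typically add smarter background modeling and then hand flagged frames to a computer vision model for tagging.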

Back in week 37, one of the links pointed to OpenAI’s CLIP (Contrastive Language-Image Pre-training) technology [6]. Johan Modin has an implementation on GitHub that will help you run contrastive language-to-image searches [7]. When people in movies search hours and hours of video for the needle in the haystack and quickly come back with every example of “The Man with One Red Shoe,” a technology like this is what would make that magic happen. If you have not seen the 1985 Tom Hanks comedy thriller of the same name, then you might be missing out on the rich comedic depth of that reference. With the right amount of investment and computing power, you can do amazing things in the surveillance space with machine learning. Some of them are shockingly advanced compared to where we were before.
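Once a model like CLIP has mapped both a text query and each video frame into the same embedding space, the search step itself is simple: rank frames by cosine similarity to the query. The sketch below assumes those embeddings were already computed elsewhere. The vectors here are tiny hand-written stand-ins, not real CLIP output, and the function names are hypothetical. This is not the API of Johan Modin’s repository or of OpenAI’s code.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_frames(
    text_embedding: Sequence[float],
    frame_embeddings: Dict[int, Sequence[float]],
    top_k: int = 5,
) -> List[int]:
    """Return the ids of the top_k frames whose embeddings best match the text query."""
    scored = [
        (cosine_similarity(text_embedding, emb), frame_id)
        for frame_id, emb in frame_embeddings.items()
    ]
    # Highest similarity first.
    scored.sort(reverse=True)
    return [frame_id for _, frame_id in scored[:top_k]]


# Hypothetical 2-dimensional embeddings; real CLIP embeddings have hundreds of dimensions.
query = [1.0, 0.0]  # stand-in for an embedded query such as "a man with one red shoe"
frames = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.7, 0.7]}
print(search_frames(query, frames, top_k=2))  # → [0, 2]
```

The expensive part in practice is embedding every frame up front; after that, each new text query only needs one embedding plus a similarity ranking, which is why this style of search can feel almost instant.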

The part of this topic that I really want to cover, which again deserves a deeper conversation, involves the various methods people use to stitch data together for internet tracking. Some of these tracking methods make the surveillance methods mentioned above seem primitive. I’ll try to figure out a solid way to explain how machine learning is being used within internet tracking frameworks and work that content into a weekly post in the not-so-distant future.

Links and thoughts:

“[ML News] DeepMind’s Flamingo Image-Text model | Locked-Image Tuning | Jurassic X & MRKL”

“UiPath CEO Daniel Dines thinks automation can fight the great resignation”

“Vergecast: Google CEO Sundar Pichai on Google I/O 2022”

“The Download: Markdoc, VS Code Updates, Optimus Prime LEGO and More!”

What’s next for The Lindahl Letter?

  • Week 73: Symbolic machine learning
  • Week 74: ML content automation
  • Week 75: Is ML destroying engineering colleges?
  • Week 76: What is post theory science?
  • Week 77: What is GPT-NeoX-20B?

I’ll try to keep the what’s next list for The Lindahl Letter forward-looking, with at least five weeks of posts in planning or review. If you enjoyed this content, then please take a moment and share it with a friend. Thank you and enjoy the week ahead.
