Nels Lindahl — Functional Journal

A weblog created by Dr. Nels Lindahl featuring writings and thoughts…

Month: September 2023

  • Proxy models for elections

    Thank you for tuning in to this audio only podcast presentation. This is week 140 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Proxy models for elections.”

    Sometimes a simplified model of something is easier to work with. We dug into econometric models recently during week 136 and they can introduce a high degree of complexity. Even within the world of econometrics you can find information about proxy models. In this case today we are digging into proxy models for elections. My search was rather direct. I was looking for a list of proxy models being used for elections [1]. I was trying to dig into election forecasting proxy models or maybe even some basic two step models. I even zoomed in a bit to see if I could get targeted on machine learning election proxy models [2].

    After a little bit of searching around it seemed like a good idea to maybe consider what it takes to generate a proxy model equation to represent something. Earlier I had considered what the chalk model of election prediction would look like using a simplified proxy of voter registration as an analog for voting behavior. I had really thought that would end up being a highly workable proxy, but it was not entirely accurate.

    Here are 3 papers I looked at this week:

    Hare, C., & Kutsuris, M. (2022). Measuring swing voters with a supervised machine learning ensemble. Political Analysis, 1-17. https://www.cambridge.org/core/services/aop-cambridge-core/content/view/145B1D6B0B2877FC454FBF446F9F1032/S1047198722000249a.pdf/measuring_swing_voters_with_a_supervised_machine_learning_ensemble.pdf 

    Zhou, Z., Serafino, M., Cohan, L., Caldarelli, G., & Makse, H. A. (2021). Why polls fail to predict elections. Journal of Big Data, 8(1), 1-28. https://link.springer.com/article/10.1186/s40537-021-00525-8 

    Jaidka, K., Ahmed, S., Skoric, M., & Hilbert, M. (2019). Predicting elections from social media: a three-country, three-method comparative study. Asian Journal of Communication, 29(3), 252-273. http://www.cse.griet.ac.in/pdfs/journals20-21/SC17.pdf 

    I spent some time messing around with OpenAI’s GPT-4 on this topic. That effort narrowed things down to a few proxy models that are typically used. The top 10 seemed to be the following: social media analysis, Google Trends, economic indicators, fundraising data, endorsement counts, voter registration data, early voting data, historical voting patterns, event-driven models, and environmental factors. Combining all 10 proxy models into a single equation would result in a complex, multivariable model. Here’s a simplified representation of such a model:

    E = α1(S) + α2(G) + α3(Ec) + α4(F) + α5(En) + α6(VR) + α7(EV) + α8(H) + α9(Ed) + α10(Ef) + β

    Where:

    • E is the predicted election outcome.
    • α1, α2, … α10 are coefficients that determine the weight or importance of each proxy model. These coefficients would be determined through regression analysis or other statistical methods based on historical data.
    • S represents social media analysis.
    • G represents Google Trends data.
    • Ec represents economic indicators.
    • F represents fundraising data.
    • En represents endorsement count.
    • VR represents voter registration data.
    • EV represents early voting data.
    • H represents historical voting patterns.
    • Ed represents event-driven models.
    • Ef represents environmental factors.
    • β is a constant term.

    This equation is a linear combination of the proxy models, but in reality, the relationship might be non-linear, interactive, or hierarchical. The coefficients would need to be determined empirically, and the model would need to be validated with out-of-sample data to ensure its predictive accuracy. Additionally, the model might need to be adjusted for specific elections, regions, or time periods. It would be interesting to try to pull together the data to test that type of complex multivariable model. Maybe later on we can create a model with some agency designed to complete that task. 
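
    As a concrete sketch, the coefficients in a linear proxy model like the one above could be estimated with ordinary least squares. Everything below is synthetic and purely illustrative; real inputs would be historical measurements of each of the ten proxies (S, G, Ec, F, En, VR, EV, H, Ed, Ef) across past elections.

```python
import numpy as np

# Hypothetical sketch: estimate the ten proxy-model weights (alpha_1..alpha_10)
# and the constant beta via ordinary least squares on synthetic past elections.
rng = np.random.default_rng(42)
n_elections, n_proxies = 50, 10

X = rng.normal(size=(n_elections, n_proxies))  # proxy measurements per election
true_alpha = rng.normal(size=n_proxies)        # unknown "true" weights
true_beta = 0.5
E = X @ true_alpha + true_beta + rng.normal(scale=0.1, size=n_elections)

# Append a column of ones so beta is estimated alongside the alphas.
X1 = np.column_stack([X, np.ones(n_elections)])
coef, *_ = np.linalg.lstsq(X1, E, rcond=None)
alpha_hat, beta_hat = coef[:-1], coef[-1]

# Out-of-sample predictions on fresh synthetic elections.
X_new = rng.normal(size=(10, n_proxies))
E_pred = X_new @ alpha_hat + beta_hat
print(alpha_hat.round(2), round(float(beta_hat), 2))
```

    With only ten linear terms this is the simplest possible version; interactions or non-linear terms would require a richer model and the out-of-sample check mentioned above becomes even more important.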

    Footnotes:

    [1] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C6&q=election+proxy+models&btnG=

    [2] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C6&q=election+proxy+models+machine+learning&btnG=

    What’s next for The Lindahl Letter? 

    • Week 141: Building generative AI chatbots
    • Week 142: Learning LangChain
    • Week 143: Social media analysis
    • Week 144: Knowledge graphs vs. vector databases
    • Week 145: Delphi method

    If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.

  • A lot of editing effort is pending

    Reworked the ebook manuscripts for Forbidden Stones and Jupiter Darkly into the right format. They still need to be edited from start to finish. Both are novella length works at the moment.

    I downloaded and installed the Artifact application after listening to Casey Newton talk about the application during Code 2023. 

    Posted a quick update on LinkedIn, “Machine learning election models: We are quickly approaching 3 years of my weekly machine learning and other artificial intelligence posts on Substack every Friday. Things are changing so rapidly in the AI/ML space that writing research notes each week has been a very rewarding journey. This last week I spent some time taking a look at machine learning election modeling.”

    Locked in 54 pages of a new book on feedback. It needs to be edited line by line before release. 

    At the moment, 3 manuscripts need to be edited line by line before release as ebooks.

  • Machine learning election models

    Thank you for tuning in to this audio only podcast presentation. This is week 139 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Machine learning election models.”

    This might be the year that I finally finish that book about the intersection of technology and modernity. During the course of this post we will look at the intersection of machine learning and election models. That could very well be a thin slice of the intersection of technology and modernity at large, but that is the set of questions that brought us here today. It’s one of the things we have been chasing along this journey. Oh yes, a bunch of papers exist related to this week’s topic of machine learning and election models [1]. None of them are highly cited. A few of them are in the 20s in terms of citation count, which means the academic community surrounding this topic is rather limited. Maybe the papers are written, but have just not arrived yet out in the world of publication. Given that machine learning has an active preprint landscape, that is unlikely.

    That dearth of literature is not going to stop me from looking at them and sharing a few that stood out during the search. None of these papers approaches the subject from the generative AI side of things; they are using machine learning without any degree of agency. Obviously, I was engaging in this literature review to see if I could find examples of the deployment of models with some type of agency doing analysis within this space of election prediction models. My searching over the last few weeks has not yielded anything super interesting. I was looking for somebody in the academic space doing some type of work within generative AI constitutions and election models or maybe even some work in the space of rolling sentiment analysis for targeted campaign understanding. That is probably an open area for research that will be filled at some point.
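
    The rolling sentiment analysis idea mentioned above can be sketched quickly. The daily scores below are invented; in practice they would come out of a sentiment model run over campaign-related posts, and the trailing window keeps short spikes from swamping the underlying trend.

```python
import numpy as np

def rolling_sentiment(daily_scores, window=7):
    """Trailing mean over the last `window` days (shorter at the start)."""
    scores = np.asarray(daily_scores, dtype=float)
    out = np.empty_like(scores)
    for i in range(len(scores)):
        start = max(0, i - window + 1)
        out[i] = scores[start:i + 1].mean()
    return out

# Hypothetical daily sentiment scores in [-1, 1] for a campaign.
daily = [0.1, 0.3, -0.2, 0.4, 0.5, 0.0, 0.2, 0.6, 0.4, 0.3]
smoothed = rolling_sentiment(daily, window=3)
print(smoothed.round(2))
```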

    Here are 4 articles:

    Grimmer, J., Roberts, M. E., & Stewart, B. M. (2021). Machine learning for social science: An agnostic approach. Annual Review of Political Science, 24, 395-419. https://www.annualreviews.org/doi/pdf/10.1146/annurev-polisci-053119-015921 

    Sucharitha, Y., Vijayalata, Y., & Prasad, V. K. (2021). Predicting election results from twitter using machine learning algorithms. Recent Advances in Computer Science and Communications (Formerly: Recent Patents on Computer Science), 14(1), 246-256. www.cse.griet.ac.in/pdfs/journals20-21/SC17.pdf  

    Miranda, E., Aryuni, M., Hariyanto, R., & Surya, E. S. (2019, August). Sentiment Analysis using Sentiwordnet and Machine Learning Approach (Indonesia general election opinion from the twitter content). In 2019 International conference on information management and technology (ICIMTech) (Vol. 1, pp. 62-67). IEEE. https://www.researchgate.net/publication/335945861_Sentiment_Analysis_using_Sentiwordnet_and_Machine_Learning_Approach_Indonesia_general_election_opinion_from_the_twitter_content 

    Zhang, M., Alvarez, R. M., & Levin, I. (2019). Election forensics: Using machine learning and synthetic data for possible election anomaly detection. PloS one, 14(10), e0223950. https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0223950&type=printable 

    My guess is that we are going to see a wave of ChatGPT related articles about elections post the 2024 presidential cycle. It will probably be one of those waves of articles without any of them really standing out or making any serious contribution to the academy. 

    The door is opening to a new world of election prediction and understanding efforts thanks to the recent changes in both model agency and generative AI models that help evaluate and summarize very complex things. It’s really about how they are applied to something going forward that will make the biggest difference in how the use cases play out. These use cases by the way are going to become very visible as the 2024 election comes into focus. The interesting part of the whole equation will be when people are bringing custom knowledge bases to the process to help fuel interactions with machine learning algorithms and generative AI. 

    It’s amazing to think how rapidly things can be built. The older models of software engineering are now more of a history lesson than a primer on building things with prompt-based AI. Andrew Ng illustrated in a recent lecture the rapidly changing build times. You have to really decide what you want to build and deploy and make it happen. Ferris Bueller once said, “Life moves pretty fast.” Now code generation is starting to move even faster! You need to stop and look around at what is possible, or you just might miss out on the generative AI revolution.

    You can see Andrew’s full video here: https://www.youtube.com/watch?v=5p248yoa3oE 

    Footnotes:

    [1] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C6&q=Machine+learning+election+models&btnG= 

    What’s next for The Lindahl Letter? 

    • Week 140: Proxy models for elections
    • Week 141: Building generative AI chatbots
    • Week 142: Learning LangChain
    • Week 143: Social media analysis
    • Week 144: Knowledge graphs vs. vector databases

    If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.

  • What happens at the end of the blog

    Earlier this week I was thinking about what exactly happens at the end of the blog. Most of the time in the lifecycle of a weblog or blog the end happens from abandonment. Probably the vast majority of blog-type writing projects have been just abandoned. At some point, the writer just stops producing that type of prose and moves along to something new. A few of them were powered by writers that sustained them for years or perhaps decades. Those platforms of prose generation stood the test of online time. Generally, at the point of abandonment most of the self-hosted blog experiments eventually vanish, expire, or are terminated. Sometimes they were built on a platform that just sustains and lingers. Those free platforms sometimes can last a very long time in the online world.

    In my case, from this point I know that the servers are paid out 5 years from now and assuming the platform properly updates itself the blog could survive during that time frame. Certainly the prose won’t really improve during that time. It will just survive online. My plans at the moment are to keep adding to the content. I write for the blog without consideration for an audience. The content is created really for my own purposes of writing. Throughout the last 20 years the blog content just mostly sits, lingers, and remains unmoving and uncompelling. It’s writing without a discrete future purpose. The prose was formed within the process of writing. 

    Considering some writing schedule updates:

    • Saturday – daily blogging, early morning hours spent on The Lindahl Letter development
    • Sunday – daily blogging, early morning hours spent on The Lindahl Letter podcast recording
    • Monday – daily blogging, nels.ai development
    • Tuesday – daily blogging, nels.ai recording 
    • Wednesday – daily blogging, nels.ai publishes at 5 pm
    • Thursday – daily blogging, big coding adventures
    • Friday – daily blogging, The Lindahl Letter goes out at 5 pm

    I have the outline of a book that probably needs to be written sometime soon. I could devote my Saturday and Sunday early morning time to working on the chapters of that book as blocks of content creation. All of that content is listed in the backlog and will eventually get built, but maybe the time to produce a certain section of that backlog is now instead of later. It’s always the reframe of action that the time is now. Finding and sustaining the now is probably the harder part of that equation.

  • Attending an event and some focus time

    Committed the first real post for nels.ai and have that system up and running for publishing on Wednesdays going forward.

    I’m attending the CIO Future of Work Summit this morning. The first event was titled, “The Radical Next: Designing the Organization of the Future.” 

    Throughout the day I stayed in the sessions. A few of them were pretty interesting:

    • “Is There an Easier Way to Manage and Remediate Risk?”
    • “Game Changer: Deploying Gen AI to Maximize Customer Experience”
    • “From Shadow IT to Shadow AI: How to Gain Visibility and Rein in Unauthorized Usage”
    • “Deliver a Cost-Effective, Sustainable & High-Quality Digital Workplace”
    • “Friend or Foe? What’s the Role of Automation and GenAI for the Future of Work?”

    Edited and scheduled vlog day 33.

  • A lot of housekeeping tasks

    Setup for the new nels.ai domain is now complete. The process ended up needing a bit of time (over a day) for the DNS to sort out. I’m going to spend some time focusing on building content for the new domain today. 

    Polling for week 38 was completed and loaded. I need to do some benchmarking this week. 

    I need to release a version of my base election prediction proxy model. The chalk model was not very predictive on a single factor. It’s time for a new and better model. 

    I set up https://linktr.ee/nelslindahl today. This is one of those platforms that I’m not entirely sold on, but I thought it was time to build out as a profile. 

    I signed up for https://openai.com/form/red-teaming-network

  • Working on learning

    I set my alarm an hour early to join a class called “Application Development with Cloud Run” which was a part of Google’s Innovators Plus: Live Learning Events. I plan on attending a few of these events, but for some reason this series of events runs very early in the morning. 

    You can see my Google Developer profile here: https://g.dev/nelslindahl 

    Apparently I have a Google Cloud profile as well here: https://googlecloud.qwiklabs.com/public_profiles/ff63ebc9-8d5b-49a8-9c70-25c0c292ab73 

    “LocalGPT Updates – Tips & Tricks”

  • Aligning the backlog for 2023

    Today was the first day in the vlog series that I forgot to upload the video the day it was created and had to quickly finish the upload after my alarm went off. That meant instead of writing this delightful post I had to work on the upload flow. Don’t worry, the video was posted an hour later in the day than the normal series, but it went live and all is well. I’m using a basic video editing workflow where all the content is created on my Pixel 7 Pro, content is edited in PowerDirector on the phone, and then it just gets uploaded to YouTube. I’m actually getting fairly proficient at using PowerDirector after 29 days of consecutive vlog updates. The YouTube Shorts format works well enough, and making content that is always less than 60 seconds is a different sort of challenge. At this point, I think that is a better format than turning out sub-10-minute videos.

    Last night I got into a fight with my writing backlog. I have been looking at the remaining blocks of content for the year and trying to consider what could best occupy that space. I’m very clear on what will be created in 2024. I have a book in mind that needs to be written and I’ll go about creating it one block of content at a time. Probably from start to finish in my normal writing style. That effort will yield a roughly 20 chapter book. Aligning the backlog for 2023 has taken a bit of time and I’m really considering either combining my code development planning and writing planning or splitting them up. One of those two things needs to happen. Generally, I keep my writing backlog separate. That has worked out well enough. I’m starting to approach a window where I’m probably going to spend more time coding than writing words. I’m sure that is a good thing and naturally a part of the cycle of creation.

  • Being a reflective builder

    Today started off in a rather normal sort of way. Two shots of espresso were made and were delightful. Sunrise happened outside the view of my window. My Saturday morning routine of watching a bit of the WAN show happened without interruption. I took a few moments to review my top 5 things from yesterday and it is somewhat satisfying to review and consider the flow of things from day to day. Being a reflective builder is an important part of the process. My argument represented as a hypothesis would be that on any given day we can accomplish 5 blocks of time building good things. To me that is a reasonable way to look at building and creating. Some people for sure are able to work in a different way, creating more or fewer blocks of production. Generally I’m looking at reasonably hard things that are broken into achievable blocks of things that can be done. I cannot code a whole application in a single block of time. That task could be broken into a reasonable set of blocks and I could certainly work on completing that effort.

    Right now I’m working to finish up block 142 of the Lindahl Letter Substack publication. I’m seriously considering closing the newsletter at 150 weeks of writing effort. I might let it go till 156 weeks which would be a complete 3 years of content generation. I had considered switching to a pay model and delivering more in depth independent research each week. Each week right now I provide a brief research note on the topic I’m interested in researching. It’s really a sharing of what I’m interested in and that is the sole and direct focus of the writing enterprise on that one. I have already moved to sharing the same content on my weblog each week at the same time. That got me thinking about where people consume content these days.

    Within academic spaces content has always been harder to access than it should have been with paywalls, high prices, and subscriptions. Journals are great for keeping and storing ideas shared between academics who subscribe and read the journal. It’s a community of interest and it works generally for that academic community. People outside that circle wanting access might need to go to a library or decide if they want to pay for the journal. It’s a limiting circle of content management. Publishing a series of research notes is probably essentially ephemeral in nature. While in the abstract the internet never forgets, we have reached the point where it’s really large and probably not backed up. That ephemeral nature will mean that the weekly posts will probably at some point vanish. I had considered that reality from the start of the endeavor, and at the end of each year I pooled that year’s Substack content into a book. Right now two of those ponderous tomes of thought sit next to me on the shelf.

    Those efforts will probably stay in publication longer than anything stored on the internet at large. I keep my web hosting paid for 5 years out so in theory that is the longest horizon of serving up that content on the open internet. I’m digging into some deeper topics today and that is interesting for a Saturday morning.

  • Election prediction markets & Time-series analysis

    Thank you for tuning in to this audio only podcast presentation. This is week 138 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Prediction markets & Time-series analysis.”

    We have been going down the path of digging into elections for a few weeks now. You knew this topic was going to show up. People love prediction markets. They are really a pooled reflection of sentiment about the likelihood of something occurring. Right now the scuttlebutt of the internet is about LK-99, a potential, maybe debunked, maybe possible room-temperature superconductor; people are predicting whether or not it will be replicated before 2025 [1]. You can read the 22 page preprint about LK-99 on ArXiv [2]. My favorite article about why this would be a big deal if it lands was from Dylan Matthews over at Vox [3]. Being able to advance the transmission power of electrical lines alone would make this a breakthrough.

    That brief example set aside, people can really dial into the betting markets for elections, which right now are not getting nearly the same level of attention as LK-99; that is probably accurate in terms of the general scale of possible impact. You can pretty quickly get to all the posts that the team over at 538 have tagged for “betting markets” and that is an interesting thing to scroll through [4]. Beyond that look you could start to dig into an article from The New York Times talking about forecasting what will happen to prediction markets in the future [5].
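
    The pooled-sentiment reading of a prediction market can be made concrete: a contract price is roughly an implied probability. The sketch below assumes prices quoted in cents on a $1 payout (as on many markets), with the prices themselves invented; the normalization handles books where prices across outcomes sum to more than 100 (the overround).

```python
def implied_probabilities(prices_cents):
    """Normalize contract prices into probabilities that sum to 1."""
    total = sum(prices_cents)
    return [p / total for p in prices_cents]

# Hypothetical two-candidate market quoting 56 and 48 cents (4-cent overround).
probs = implied_probabilities([56, 48])
print([round(p, 3) for p in probs])  # roughly 0.538 and 0.462
```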

    You know it was only a matter of time before we moved from popular culture coverage to the depths of Google Scholar [6].

    Snowberg, E., Wolfers, J., & Zitzewitz, E. (2007). Partisan impacts on the economy: evidence from prediction markets and close elections. The Quarterly Journal of Economics, 122(2), 807-829. https://www.nber.org/system/files/working_papers/w12073/w12073.pdf

    Arrow, K. J., Forsythe, R., Gorham, M., Hahn, R., Hanson, R., Ledyard, J. O., … & Zitzewitz, E. (2008). The promise of prediction markets. Science, 320(5878), 877-878. https://users.nber.org/~jwolfers/policy/StatementonPredictionMarkets.pdf

    Berg, J. E., Nelson, F. D., & Rietz, T. A. (2008). Prediction market accuracy in the long run. International Journal of Forecasting, 24(2), 285-300. https://www.biz.uiowa.edu/faculty/trietz/papers/long%20run%20accuracy.pdf 

    Wolfers, J., & Zitzewitz, E. (2004). Prediction markets. Journal of economic perspectives, 18(2), 107-126. https://pubs.aeaweb.org/doi/pdf/10.1257/0895330041371321 

    Yeah, you could tell by the title that a little bit of content related to time-series analysis was coming your way. The papers being tracked within Google Scholar related to election time-series analysis were not highly cited and, to my extreme disappointment, are not openly shared as PDF documents [7]. For those of you who are regular readers, you know that I try really hard to only share links to open access documents and resources that anybody can consume along their lifelong learning journey. Sharing links to paywalls and articles inside a gated academic community is not really productive for general learning.
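
    On the time-series side, one of the simplest tools that shows up in polling analysis is single exponential smoothing, where each new poll nudges a running estimate by a factor alpha. The poll numbers below are invented purely for illustration.

```python
def exponential_smoothing(values, alpha=0.3):
    """Classic single exponential smoothing; returns the smoothed series."""
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical sequence of poll results (percent support) over time.
polls = [48.0, 49.5, 47.8, 50.2, 49.0]
trend = exponential_smoothing(polls, alpha=0.3)
print([round(x, 2) for x in trend])
```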

    Footnotes:

    [1] https://manifold.markets/QuantumObserver/will-the-lk99-room-temp-ambient-pre?r=RWxpZXplcll1ZGtvd3NreQ

    [2] https://arxiv.org/ftp/arxiv/papers/2307/2307.12008.pdf

    [3] https://www.vox.com/future-perfect/23816753/superconductor-room-temperature-lk99-quantum-fusion

    [4] https://fivethirtyeight.com/tag/betting-markets/ 

    [5] https://www.nytimes.com/2022/11/04/business/election-prediction-markets-midterms.html

    [6] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C6&q=election+prediction+markets&btnG= 

    [7] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C6&q=election+time+series+analysis&oq=election+time+series+an 

    What’s next for The Lindahl Letter? 

    • Week 139: Machine learning election models
    • Week 140: Proxy models for elections
    • Week 141: Election expert opinions
    • Week 142: Door-to-door canvassing

    If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.