Machine learning election models

Thank you for tuning in to this audio only podcast presentation. This is week 139 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Machine learning election models.”

This might be the year that I finally finish that book about the intersection of technology and modernity. During the course of this post we will look at the intersection of machine learning and election models. That could very well be a thin slice of the intersection of technology and modernity at large, but that is the set of questions that brought us here today. It’s one of the things we have been chasing along this journey. Oh yes, a bunch of papers exist related to this week’s topic of machine learning and election models [1]. None of them are highly cited. A few of them are in the 20s in terms of citation count, which means the academic community surrounding this topic is rather limited. Maybe the papers are written, but have just not arrived yet out in the world of publication. Given that machine learning has an active preprint landscape, that is unlikely. 

That dearth of literature is not going to stop me from looking at the results and sharing a few papers that stood out during the search. None of these papers approaches the subject from a generative AI model side of things; they are using machine learning without any degree of agency. Obviously, I was engaging in this literature review to see if I could find examples of the deployment of models with some type of agency doing analysis within this space of election prediction models. My searching over the last few weeks has not yielded anything super interesting. I was looking for somebody in the academic space doing some type of work with generative AI constitutions and election models, or maybe even some work in the space of rolling sentiment analysis for targeted campaign understanding. That is probably an open area for research that will be filled at some point.
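The rolling sentiment analysis idea mentioned above can be sketched in a few lines. The per-post sentiment scores below are invented toy data; a real pipeline would first score posts with a sentiment model or lexicon before computing the rolling view:

```python
from collections import deque

def rolling_sentiment(scores, window=3):
    """Rolling mean of per-post sentiment scores (-1..1), a crude
    proxy for shifting campaign sentiment over time."""
    out, buf = [], deque(maxlen=window)
    for s in scores:
        buf.append(s)
        out.append(sum(buf) / len(buf))
    return out

# Toy daily scores for posts mentioning a candidate, oldest first.
daily = [0.2, -0.1, 0.4, 0.5, -0.3]
print(rolling_sentiment(daily))
```

With a wider window the curve smooths out day-to-day noise at the cost of reacting more slowly to real shifts in sentiment.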

Here are 4 articles:

Grimmer, J., Roberts, M. E., & Stewart, B. M. (2021). Machine learning for social science: An agnostic approach. Annual Review of Political Science, 24, 395-419. https://www.annualreviews.org/doi/pdf/10.1146/annurev-polisci-053119-015921 

Sucharitha, Y., Vijayalata, Y., & Prasad, V. K. (2021). Predicting election results from Twitter using machine learning algorithms. Recent Advances in Computer Science and Communications (Formerly: Recent Patents on Computer Science), 14(1), 246-256. www.cse.griet.ac.in/pdfs/journals20-21/SC17.pdf  

Miranda, E., Aryuni, M., Hariyanto, R., & Surya, E. S. (2019, August). Sentiment Analysis using Sentiwordnet and Machine Learning Approach (Indonesia general election opinion from the twitter content). In 2019 International conference on information management and technology (ICIMTech) (Vol. 1, pp. 62-67). IEEE. https://www.researchgate.net/publication/335945861_Sentiment_Analysis_using_Sentiwordnet_and_Machine_Learning_Approach_Indonesia_general_election_opinion_from_the_twitter_content 

Zhang, M., Alvarez, R. M., & Levin, I. (2019). Election forensics: Using machine learning and synthetic data for possible election anomaly detection. PloS one, 14(10), e0223950. https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0223950&type=printable 

My guess is that we are going to see a wave of ChatGPT-related articles about elections after the 2024 presidential cycle. It will probably be one of those waves of articles without any of them really standing out or making any serious contribution to the academy. 

The door is opening to a new world of election prediction and understanding efforts thanks to the recent changes in both model agency and generative AI models that help evaluate and summarize very complex things. How they are applied going forward is what will make the biggest difference in how the use cases play out. These use cases, by the way, are going to become very visible as the 2024 election comes into focus. The interesting part of the whole equation will be when people bring custom knowledge bases to the process to help fuel interactions with machine learning algorithms and generative AI. 

It’s amazing to think how rapidly things can be built. The older models of software engineering are now more of a history lesson than a primer on building things with prompt-based AI. Andrew Ng illustrated in a recent lecture the rapidly changing build times. You have to really decide what you want to build and deploy and make it happen. Ferris Bueller once said, “Life moves pretty fast.” Now code generation is starting to move even faster! You need to stop and look around at what is possible, or you just might miss out on the generative AI revolution.

You can see Andrew’s full video here: https://www.youtube.com/watch?v=5p248yoa3oE 

Footnotes:

[1] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C6&q=Machine+learning+election+models&btnG= 

What’s next for The Lindahl Letter? 

  • Week 140: Proxy models for elections
  • Week 141: Building generative AI chatbots
  • Week 142: Learning LangChain
  • Week 143: Social media analysis
  • Week 144: Knowledge graphs vs. vector databases

If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.

What happens at the end of the blog

Earlier this week I was thinking about what exactly happens at the end of a blog. Most of the time in the lifecycle of a weblog, the end happens from abandonment. Probably the vast majority of blog-type writing projects have simply been abandoned. At some point, the writer just stops producing that type of prose and moves along to something new. A few of them were powered by writers who sustained them for years or perhaps decades. Those platforms of prose generation stood the test of online time. Generally, at the point of abandonment most of the self-hosted blog experiments eventually vanish, expire, or are terminated. Sometimes they were built on a platform that just sustains and lingers. Those free platforms can sometimes last a very long time in the online world. 

In my case, I know that the servers are paid for 5 years out, and assuming the platform properly updates itself the blog could survive during that time frame. Certainly the prose won’t really improve during that time. It will just survive online. My plans at the moment are to keep adding to the content. I write for the blog without consideration for an audience. The content is created really for my own purposes of writing. Throughout the last 20 years the blog content has mostly just sat, lingered, and remained unmoving and uncompelling. It’s writing without a discrete future purpose. The prose was formed within the process of writing. 

Considering some writing schedule updates:

  • Saturday – daily blogging, early morning hours spent on The Lindahl Letter development
  • Sunday – daily blogging, early morning hours spent on The Lindahl Letter podcast recording
  • Monday – daily blogging, nels.ai development
  • Tuesday – daily blogging, nels.ai recording 
  • Wednesday – daily blogging, nels.ai publishes at 5 pm
  • Thursday – daily blogging, big coding adventures
  • Friday – daily blogging, The Lindahl Letter goes out at 5 pm

I have the outline of a book that probably needs to be written sometime soon. I could devote my Saturday and Sunday early morning time to working on the chapters of that book as blocks of content creation. All of that content is listed in the backlog and will eventually get built, but maybe the time to produce a certain section of that backlog is now instead of later. It’s always the reframe of action that the time is now. Finding and sustaining the now is probably the harder part of that equation.

Election prediction markets & Time-series analysis

Thank you for tuning in to this audio only podcast presentation. This is week 138 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Prediction markets & Time-series analysis.”

We have been digging into elections for a few weeks now, so you knew this topic was going to show up. People love prediction markets. They are really a pooled reflection of sentiment about the likelihood of something occurring. Right now the scuttlebutt of the internet is about LK-99, a potential, maybe debunked, maybe possible room temperature superconductor; people are predicting whether or not it will be replicated before 2025 [1]. You can read the 22-page preprint about LK-99 on arXiv [2]. My favorite article about why this would be a big deal if it lands was from Dylan Matthews over at Vox [3]. Being able to advance the transmission power of electrical lines alone would make this a breakthrough. 

That brief example aside, people can really dial into the betting markets for elections, which right now are not getting nearly the same level of attention as LK-99; that is probably accurate in terms of the general scale of possible impact. You can pretty quickly get to all the posts that the team over at 538 has tagged for “betting markets,” and that is an interesting thing to scroll through [4]. Beyond that, you could dig into an article from The New York Times about forecasting what will happen to prediction markets in the future [5].
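To make the “pooled reflection of sentiment” idea concrete, here is a minimal sketch of how contract prices in a prediction market translate into implied probabilities. The quotes are hypothetical, and real markets quote bid/ask spreads rather than a single price:

```python
def implied_probabilities(prices):
    """Convert contract prices (dollars per $1 payout) into implied
    probabilities, normalizing away the market's overround (the
    amount by which raw prices sum past 1.0)."""
    total = sum(prices.values())
    return {name: price / total for name, price in prices.items()}

# Hypothetical last-trade prices for a two-candidate race.
market = {"Candidate A": 0.56, "Candidate B": 0.48}
probs = implied_probabilities(market)
print(probs)
```

The raw prices sum to 1.04, so the normalization step is what turns trader sentiment into a proper probability distribution.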

You know it was only a matter of time before we moved from popular culture coverage to the depths of Google Scholar [6].

Snowberg, E., Wolfers, J., & Zitzewitz, E. (2007). Partisan impacts on the economy: evidence from prediction markets and close elections. The Quarterly Journal of Economics, 122(2), 807-829. https://www.nber.org/system/files/working_papers/w12073/w12073.pdf

Arrow, K. J., Forsythe, R., Gorham, M., Hahn, R., Hanson, R., Ledyard, J. O., … & Zitzewitz, E. (2008). The promise of prediction markets. Science, 320(5878), 877-878. https://users.nber.org/~jwolfers/policy/StatementonPredictionMarkets.pdf

Berg, J. E., Nelson, F. D., & Rietz, T. A. (2008). Prediction market accuracy in the long run. International Journal of Forecasting, 24(2), 285-300. https://www.biz.uiowa.edu/faculty/trietz/papers/long%20run%20accuracy.pdf 

Wolfers, J., & Zitzewitz, E. (2004). Prediction markets. Journal of economic perspectives, 18(2), 107-126. https://pubs.aeaweb.org/doi/pdf/10.1257/0895330041371321 

Yeah, you could tell by the title that a little bit of content related to time-series analysis was coming your way. The papers being tracked within Google Scholar related to election time-series analysis were not highly cited and, to my extreme disappointment, are not openly shared as PDF documents [7]. For those of you who are regular readers, you know that I try really hard to only share links to open access documents and resources that anybody can consume along their lifelong learning journey. Sharing links to paywalls and articles inside a gated academic community is not really productive for general learning. 
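For readers wondering what a minimal election time-series treatment even looks like, single exponential smoothing applied to a noisy polling series is about the simplest baseline there is. The weekly topline numbers below are invented for illustration:

```python
def exp_smooth(series, alpha=0.3):
    """Single exponential smoothing: each point is a blend of the new
    observation and the previous smoothed value. Higher alpha reacts
    faster to new polls; lower alpha damps noise harder."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical weekly polling toplines for one candidate.
polls = [48.0, 47.2, 49.1, 48.5, 50.2]
trend = exp_smooth(polls)
print(trend)
```

Serious election forecasting layers house effects, sample-size weights, and structural priors on top of something like this, but the smoothing step is the recognizable core.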

Footnotes:

[1] https://manifold.markets/QuantumObserver/will-the-lk99-room-temp-ambient-pre?r=RWxpZXplcll1ZGtvd3NreQ

[2] https://arxiv.org/ftp/arxiv/papers/2307/2307.12008.pdf

[3] https://www.vox.com/future-perfect/23816753/superconductor-room-temperature-lk99-quantum-fusion

[4] https://fivethirtyeight.com/tag/betting-markets/ 

[5] https://www.nytimes.com/2022/11/04/business/election-prediction-markets-midterms.html

[6] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C6&q=election+prediction+markets&btnG= 

[7] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C6&q=election+time+series+analysis&oq=election+time+series+an 

What’s next for The Lindahl Letter? 

  • Week 139: Machine learning election models
  • Week 140: Proxy models for elections
  • Week 141: Election expert opinions
  • Week 142: Door-to-door canvassing

If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.

Maintaining 5 slots of building each day

Focusing inward on delivering requires a certain balance. My balance has been off recently. I got knocked off my feet and it impacted my ability to produce blocks of content for about a week. That type of thing does not normally happen to me. It was a new set of emotions and things to consider. Getting knocked down hard enough to pause for a moment and need to look around before moving again was a very new sensation. I’m not sure it was something that I was looking for or even prepared to experience. Really the only thing that put me back on the right track to success and deeper inward consideration (restored balance) was the passage of some time. It just took a little bit of time for me to internalize and move on to a new set of expectations. 

Each new day brings forward a set of time for creating blocks of content. My thoughts right now are around making and maintaining 5 slots of building each day. To that end I have been sitting down at the whiteboard and writing down 5 good things to work on each day, trying to make sure they are attainable blocks to complete. At this time, I don’t want to put multi-slot blocks or all-day blocks on the board for action and review. This is not the time for that type of stretching and personal growth by taking on highly complex activities. Right now is the time to make things clear, work on the clear things, and be stronger with that resolution every single day going forward. 

Maybe getting back to the absolute standard of sitting down at the very start of the day after drinking two shots of espresso and writing for a few minutes is the key to reframe my day. It is something that has been missing. It was missed. Perhaps it was missed more than I even realized at the time. I’ll admit to sitting down and watching about 4-5 seasons of the Showtime series Billions instead of actively writing and building. Alternatively, I could have been listing some graded sports cards on eBay and working to sell a few of them each day. Let’s zoom out for a second from those thoughts and consider what the next 30 days will uphold as a standard. 

One block of the daily 5 is going to be related to committing code on GitHub. I’m going to really focus my time and energy on making solid contributions to published code. Taking on that effort will help me be focused and committed to something that will become more and more necessary. Building code has changed a bit with the advent of LLMs, but the general thought exercise and logic remain pretty much the same. You might be able to take a wild run at something that was not attainable before and prompt your way to something magical. Generally you are going to go where logic can take you within the confines of the coding world as the framework is a lot more logical than it is purely chaotic in nature. 

5 good things for 9/15

  1. Rework block 142
  2. Commit something LangChain related in Colab
  3. Work on https://www.coursera.org/learn/intro-to-healthcare/home/week/1
  4. Review blocks 143-145
  5. Start building voter data baseline package

Outside of those efforts generally as a part of my daily routine I’m producing a daily vlog via YouTube Shorts and striving to output a daily reflection functional journal blog post. I’m going to try to take some inline functional journal notes throughout the day as well. That is going to structurally end up with a sort of blog post being written at the start of the day and then a bunch of more inline bullets being created. Posting is still going to happen at the end of the day or potentially a day delayed. 

Delivering independent research is more important now than ever. I have spent some time thinking about the models of how that research is delivered and what value it has generally. 

Block 142 is pretty much ready to go. I’ll be able to record it tomorrow morning and stay on track to have a 4 block recorded backlog of content ready to go for my Substack. 

During the course of reviewing blocks 143 to 145 I considered whether those are even the right topics to spend time working on. They are probably fine elements of things to research. It’s not about producing timely content; it is about making meaningful blocks of content that are not time sensitive. That of course is always a harder thing to accomplish while producing independent research.

Tracking political registrations

Thank you for tuning in to this audio only podcast presentation. This is week 137 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Tracking political registrations.”

Trying to figure out how many Republicans, Democrats, and independents are registered in each state is actually really hard. It’s not a trivial task, even with all our modern technology and the extreme power of the internet providing outsized connectedness between things and making content accessible to searches. Even GPT-4 from OpenAI with some decent plugins turned on will struggle to complete this task. Your best searches to get a full list by state are probably going to land you in the world of projections and surveys. One result that will show up very quickly is from the Pew Research Center, which contacted people (300 to 4,000 of them) from each state to find out more about political affiliation [1]. They evaluated responses into three buckets: no lean, lean Republican, or lean Democrat. That allowed the results to be evaluated based on sampling to get a feel for general political intentions. However, that type of intention-based evaluation does not give you a sense of the number of voters within each state. 

It opened the door to me considering whether political registration is even a good indicator of election outcomes. Sports tournaments rarely play out based on the seeding. That is the element that makes it exciting and puts the sport into the tournament. To that end, back during week 134 I shared the chalk model to help explore a hypothesis related to registration being predictive. At the moment, I’m more interested to see how proxy models for predicting sporting events are working. Getting actual data to track changes in political registrations is an interesting process. ChatGPT, Bard, and Bing Chat are capable of providing some numbers if you prompt them properly. The OpenAI model GPT-3.5 has some older data from September 2021 and will tell you registered voters by state [2]. I started with a basic prompt, “make a table of voter registration by state.” I had to add a few encouraging prompts at some points, but overall all 3 models spit out results [3]. The Bing Chat model really tried to direct you back to the United States Census Bureau website [4]. 

This is an area where setting up some type of model with a bit of agency to go out to the relevant secretary of state websites for the 30 states that provide some data might be the way to build a decent dataset. That would probably be the only way to really track the official data coming out by state and show the changes in registration over time. Charting that change data might be interesting as a directional, longitudinal view of how voters describe themselves in terms of registration. People who participate in Kaggle have run into challenges where election result prediction is actually a competition [5]. It’s interesting, and thinking about what features are most impactful during election prediction is a big part of that competition. Other teams are using linear regression and classification models to help predict election winners as well [6]. I was reading a working paper from Ebanks, Katz, and King published in May 2023 that shared an in-depth discussion about picking the right models and the problems of picking the wrong ones [7][8]. 
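Once registration snapshots were collected from those state sites, the longitudinal change tracking described above reduces to differencing consecutive snapshots. The snapshot numbers below are invented for illustration; a real dataset would carry one row per state per reporting date:

```python
def registration_deltas(snapshots):
    """Given date-ordered registration snapshots keyed by party,
    return the change between consecutive snapshots -- the
    directional, longitudinal view described above."""
    deltas = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        deltas.append({party: cur[party] - prev[party] for party in cur})
    return deltas

# Hypothetical monthly snapshots for one state (R/D/unaffiliated).
snaps = [
    {"R": 1_200_000, "D": 1_150_000, "U": 900_000},
    {"R": 1_204_500, "D": 1_148_200, "U": 912_300},
]
print(registration_deltas(snaps))
```

Charting those deltas over many months is what would surface the registration drift the post is after, rather than the raw totals themselves.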

To close things out, I did end up reading this Center for Politics article from 2018 that was interesting as a look back at where things were [9]. Circling back to the main question this week, I spent some time working within OpenAI’s ChatGPT with plugins trying to get GPT-4 to search out voter registration by state. I have been wondering whether, with a little bit of agency, one of these models could do that type of searching. Right now the models are not set up with a framework that could complete this type of tasking. 

Footnotes:

[1] https://www.pewresearch.org/religion/religious-landscape-study/compare/party-affiliation/by/state/ 

[2] https://chat.openai.com/share/8a6ea5e7-6e42-4743-bc23-9e8e7c4f79c5 

[3] https://g.co/bard/share/96b6f8d02e8e 

[4] https://www.census.gov/topics/public-sector/voting/data/tables.html 

[5] https://towardsdatascience.com/feature-engineering-for-election-result-prediction-python-943589d89414 

[6] https://medium.com/hamoye-blogs/u-s-presidential-election-prediction-using-machine-learning-88f93e7f6f2a

[7] https://news.harvard.edu/gazette/story/2023/03/researchers-come-up-with-a-better-way-to-forecast-election-results/

[8] https://gking.harvard.edu/files/gking/files/10k.pdf 

[9] https://centerforpolitics.org/crystalball/articles/registering-by-party-where-the-democrats-and-republicans-are-ahead/ 

What’s next for The Lindahl Letter? 

  • Week 138: Election prediction markets & Time-series analysis
  • Week 139: Machine learning election models
  • Week 140: Proxy models for elections
  • Week 141: Election expert opinions
  • Week 142: Door-to-door canvassing

If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.

Econometric election models

Thank you for tuning in to this audio only podcast presentation. This is week 136 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Econometric election models.”

It has been a few weeks since we started by digging into a good Google Scholar search, and you knew this topic would be just the thing to help open that door [1]. My searches for academic articles are always about finding accessible literature that sits outside paywalls and is intended to be read and shared beyond strictly academic use. Sometimes that is easier than others, when the topics lend themselves to active use cases instead of purely theoretical research. Most of the time these searches to find out what is happening at the edge of what is possible involve applied research. Yes, that type of reasoning would place me squarely in the pracademic camp of intellectual inquiry. 

That brief chautauqua aside, my curiosity here is how do we build out econometric election models or other model inputs to feed into large language model chat systems as prompt engineering for the purposes of training them to help either predict elections or interpret and execute the models. This could be a method for introducing extensibility or at least the application of targeted model effect to seed a potential future methodology within the prompt engineering space. As reasoning engines go it’s possible that an econometric frame could be an interesting proxy model within generative AI prompting. It’s a space worth understanding a little bit more for sure as we approach the 2024 presidential election cycle. 

I’m working on that type of effort here as we dig into econometric election models. My hypothesis here is that you can write out what you want to explain in a longer form as a potential input prompt to train a large language model. Maybe a more direct way of saying that is we are building a constitution for the model based on models and potentially proxy models then working toward extensibility and agency from introducing those models together. For me that is a very interesting space to begin to open up and kick the tires on in the next 6 months. 
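One way to kick the tires on that hypothesis is to compute an econometric baseline and fold it into the prompt itself. The sketch below uses a toy incumbent vote-share equation in the spirit of Fair's presidential model; the coefficients are illustrative placeholders, not Fair's published estimates, and the prompt wording is just one possible framing:

```python
def econometric_vote_share(growth, inflation, incumbent_terms):
    """Toy vote-share equation: incumbent two-party share as a
    function of GDP growth, inflation, and terms already served.
    Coefficients are invented for illustration."""
    return 47.0 + 0.7 * growth - 0.7 * inflation - 1.0 * incumbent_terms

def build_prompt(growth, inflation, terms):
    """Fold the econometric baseline into a prompt, the 'model as
    constitution' pattern discussed above."""
    share = econometric_vote_share(growth, inflation, terms)
    return (
        "You are an election analyst. An econometric baseline model "
        f"predicts an incumbent two-party vote share of {share:.1f}%. "
        "Explain what factors could move the outcome away from this "
        "baseline, and how confident the baseline should make us."
    )

print(build_prompt(growth=2.0, inflation=3.0, terms=1))
```

The interesting part is that the large language model never sees the regression itself, only its output framed as context, which is exactly the extensibility-through-prompting idea sketched in the paragraph above.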

Here are 6 papers from that Google Scholar search that I thought were interesting:

Mullainathan, S., & Spiess, J. (2017). Machine learning: an applied econometric approach. Journal of Economic Perspectives, 31(2), 87-106. https://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.31.2.87 

Fair, R. C. (1996). Econometrics and presidential elections. Journal of Economic Perspectives, 10(3), 89-102. https://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.10.3.89

Armstrong, J. S., & Graefe, A. (2011). Predicting elections from biographical information about candidates: A test of the index method. Journal of Business Research, 64(7), 699-706. https://faculty.wharton.upenn.edu/wp-content/uploads/2012/04/PollyBio58.pdf 

Graefe, A., Green, K. C., & Armstrong, J. S. (2019). Accuracy gains from conservative forecasting: Tests using variations of 19 econometric models to predict 154 elections in 10 countries. Plos one, 14(1), e0209850. https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0209850&type=printable

Leigh, A., & Wolfers, J. (2006). Competing approaches to forecasting elections: Economic models, opinion polling and prediction markets. Economic Record, 82(258), 325-340. https://www.nber.org/system/files/working_papers/w12053/w12053.pdf 

Benjamin, D. J., & Shapiro, J. M. (2009). Thin-slice forecasts of gubernatorial elections. The review of economics and statistics, 91(3), 523-536. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2860970/pdf/nihms190094.pdf 

Beyond those papers, I read some slides from Hal Varian on “Machine Learning and Econometrics” from January of 2014 [2]. The focus of the slides was applied modeling of human choices. Some time was spent trying to understand the premise that the field of machine learning could benefit from econometrics. To be fair, since that 2014 set of slides you don’t hear people in the machine learning space mention econometrics that often. Most people talk about Bayesian-related arguments. 

On a totally separate note for this week, I was really into running some of the Meta AI Llama models locally on my desktop [3]. You could go out and read about the new Code Llama, which is an interesting model trained and focused on coding [4]. A ton of researchers got together and wrote a paper about this new model called “Code Llama: Open Foundation Models for Code” [5]. That 47-page missive was shared back on August 24, 2023, and people have already started to build alternative models. It’s an interesting world in the wild wild west of generative AI these days. I really did install LM Studio on my Windows workstation and run the 7 billion parameter version of Code Llama to kick the tires [6]. It’s amazing that a model like that can run locally and that you can interact with it using your own high-end graphics card.
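For anyone curious what talking to a locally hosted model looks like in code, here is a sketch of building a request for an OpenAI-compatible local server of the kind LM Studio can expose. The endpoint, port, and model name below are assumptions about one particular local setup, not guaranteed defaults:

```python
import json

def build_chat_request(prompt, model="codellama-7b"):
    """Build an OpenAI-style chat completion payload as bytes, ready
    to POST to a local inference server. The model name here is a
    placeholder for whatever is loaded locally."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return json.dumps(payload).encode("utf-8")

# Sending it would look something like this (endpoint is an assumption):
# from urllib import request
# req = request.Request(
#     "http://localhost:1234/v1/chat/completions",
#     data=build_chat_request("Write a bubble sort in C"),
#     headers={"Content-Type": "application/json"},
# )
# print(request.urlopen(req).read().decode("utf-8"))
```

The appeal of the local setup is exactly this: the same request shape you would send to a hosted API, but everything stays on your own graphics card.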

Footnotes:

[1] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C6&q=econometric+election+prediction+models&btnG= 

[2] https://web.stanford.edu/class/ee380/Abstracts/140129-slides-Machine-Learning-and-Econometrics.pdf 

[3] https://ai.meta.com/llama/ 

[4] https://about.fb.com/news/2023/08/code-llama-ai-for-coding/ 

[5] https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/

[6] https://lmstudio.ai/

What’s next for The Lindahl Letter? 

  • Week 137: Tracking political registrations
  • Week 138: Prediction markets & Time-series analysis
  • Week 139: Machine learning election models
  • Week 140: Proxy models
  • Week 141: Expert opinions

If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.

Polling aggregation models

Thank you for tuning in to this audio only podcast presentation. This is week 135 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Polling aggregation models.”

I read and really enjoyed the book by Nate Silver from 2012 about predictions. It’s still on my bookshelf. Strangely enough the cover has faded more than any other book on the shelf. 

Silver, N. (2012). The signal and the noise: Why so many predictions fail-but some don’t. Penguin.

That book from Nate is sitting just a few books over from Armstrong’s principles of forecasting, a book that I have referenced a number of times before. It will probably be referenced more as we move ahead as well. It’s a resource that just keeps on giving. Math is funny like that. 

Armstrong, J. S. (Ed.). (2001). Principles of forecasting: a handbook for researchers and practitioners (Vol. 30). Boston, MA: Kluwer Academic.

My podcast feed for years has included the 538 podcast where I listened to Nate and Galen talk about good and bad uses of polling [1]. Sadly, it does not currently feature Nate after the recent changes over at 538. They reported on and ranked a lot of polling within the 538 ecosystem of content. Model talk and the good or bad use of polling were staples in the weekly pod journey. I really thought at some point they would take all of that knowledge about reviewing, rating, and offering critiques of polling to do some actual polling. Instead they mostly offered polling aggregation which is what we are going to talk about today. On the website they did it really well and the infographics they built are very compelling. 
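A polling aggregation model at its simplest is just a weighted average, with weights favoring larger and more recent polls. The sketch below is a drastic simplification of what 538 actually does (no house effects, no trend lines), and the polls are hypothetical:

```python
def aggregate_polls(polls):
    """Weighted polling average: weight each poll by sample size
    discounted by age in days. A toy stand-in for real aggregation
    models that also adjust for pollster quality and house effects."""
    num = den = 0.0
    for p in polls:
        weight = p["n"] / (1 + p["days_old"])
        num += weight * p["value"]
        den += weight
    return num / den

# Hypothetical polls of the same race.
polls = [
    {"value": 48.0, "n": 1000, "days_old": 1},
    {"value": 51.0, "n": 500, "days_old": 4},
]
print(aggregate_polls(polls))
```

Even this toy version shows why aggregates move slowly: a fresh large poll dominates, so a single small outlier barely nudges the average.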

Today setting up and running a polling organization is different from before. A single person could run a large amount of it thanks to the automation that now exists. An organization with funding could set up automation and run the polling using an IVR and some type of dialogue flow [2]. Seriously, you could build a bot setup that placed calls to people and completed a survey in a very conversational way. That still runs into the same problem that phone survey methods are going to face. I screen out all non-contact phone calls and I’m not the only person doing that. Cold calls are just not effective for business or polling in 2023, and the rise of phone assistants that can effectively block out noise is going to make the phone methodology even harder to effectively utilize.
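The conversational survey flow described above is, at its core, a small scripted state machine that an IVR or dialogue-flow tool would wrap with telephony and speech handling. This sketch uses invented questions and plain text answers in place of recognized speech:

```python
# Scripted survey: (step key, prompt the caller would hear).
SURVEY = [
    ("intro", "Hello! This is a two-question survey. Continue? (yes/no)"),
    ("q1", "Do you plan to vote in the next election? (yes/no)"),
    ("q2", "Which issue matters most to you?"),
]

def run_survey(answers):
    """Walk a caller through the scripted flow, bailing out if they
    decline at the intro. Returns the collected responses keyed by
    question step."""
    responses = {}
    for (key, _prompt), answer in zip(SURVEY, answers):
        if key == "intro" and answer.lower() != "yes":
            return responses  # caller declined; nothing collected
        if key != "intro":
            responses[key] = answer
    return responses
```

A real dialogue-flow deployment adds speech recognition, retries for unclear answers, and call scheduling on top, but the branching skeleton stays this simple.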

It’s hard to make a hype-based drum roll on the written page. You are going to have to imagine it for me to get ready for this next sentence. Now that you are imagining that drum roll… Get ready for a year of people talking about AI and the 2024 election. It probably won’t get crypto bad in terms of the hype train showing up to nowhere, but it will get loud. I’m going to contribute to that dialogue, but hopefully in the softest possible way. Yeah, I’m walking right into that by reflecting on the outcome of my actions while simultaneously writing about them during this missive.

You can see an article from way back in November 2020 talking about how AI does show some potential to gauge voter sentiment [3]. That was before all of the generative AI and agent hype started. Things are changing rapidly, and I’m super curious about what can actually be accomplished in that space. I’m spending time every day learning about this and working on figuring out ways to implement it before the next major presidential election in 2024. An article from The Atlantic caught my attention as it talked about how nobody responds to polls anymore and started to dig into what AI could possibly do in that space, microtargeting, and Kennedy (1960) campaign references [4]. That was an interesting read for sure, but you could veer over to VentureBeat to read about how AI fared against regular pollsters in the 2020 election [5]. That article offered a few names to watch out for and dig into a little more, including KCore Analytics, expert.ai, and Polly. 

We will see massive numbers of groups purporting to use AI in the next election cycle. Even the Brookings Institution has started to share thoughts on how AI will transform the next presidential election [6]. You could also read something from Scientific American where people predict that AI could take over elections and undermine democracy [7]. Dire predictions abound, and they will probably accelerate as the AI hype train pulls into election station during the 2024 election cycle [8][9]. Some of that new technology is even being deployed by nonprofits to help track voters at the polls [10].

Footnotes:

[1] https://projects.fivethirtyeight.com/polls/ 

[2] https://cloud.google.com/contact-center/ccai-platform/docs/Surveys 

[3] https://www.wsj.com/articles/artificial-intelligence-shows-potential-to-gauge-voter-sentiment-11604704009

[4] https://www.theatlantic.com/technology/archive/2023/04/polls-data-ai-chatbots-us-politics/673610/ 

[5] https://venturebeat.com/ai/how-ai-predictions-fared-against-pollsters-in-the-2020-u-s-election/

[6] https://www.brookings.edu/articles/how-ai-will-transform-the-2024-elections/ 

[7] https://www.scientificamerican.com/article/how-ai-could-take-over-elections-and-undermine-democracy/

[8] https://www.govtech.com/elections/ais-election-impact-could-be-huge-for-those-in-the-know

[9] https://apnews.com/article/artificial-intelligence-misinformation-deepfakes-2024-election-trump-59fb51002661ac5290089060b3ae39a0 

What’s next for The Lindahl Letter? 

  • Week 136: Econometric election models
  • Week 137: Tracking political registrations
  • Week 138: Prediction markets & Time-series analysis
  • Week 139: Machine learning election models
  • Week 140: Proxy models

If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.

A moment of promise stood out

Yesterday was sort of a lost day. I know that some days are going to be less productive than others, but yesterday was a little bit on the disappointing side of things. Part of that is on me for not starting the day with a reflection on what I accomplished the day before and a consideration of the handful of things that would be best to accomplish today. Sure, that sounds simplistic, and to some degree being planful and working to deliver a defined backlog is a direct strategy. My backlog has about 200 blocks or more of content in it that need to be worked on. My biggest push this week has been to really dive in and use LangChain both locally and within some notebooks [1]. That effort should translate into a few GitHub notebook outputs and maybe some fresh-look items being added to the backlog.

I’m going to spend some time today learning a bit more about IDEFICS as well which Hugging Face just shared with the world [2]. 

Some time was spent working on the content blocks for weeks 139 to 142. 

Footnotes:

[1] https://www.langchain.com/

[2] https://huggingface.co/blog/idefics

Some things are just routine

Some things are just routine. Uninterrupted writing during the earliest hours on Saturday and Sunday. Walking the dog in the morning. 25 evening push-ups. Drinking a glass of water with every meal. These things just become a part of a regularly scheduled plan. It’s good to have some things that are routine. I’m really trying to focus on delivering the items in my writing backlog. Earlier this week things became a lot less stable. My reaction to that instability was not to power through a bunch of backlog items. It’s good to understand that reaction. I’m really just sitting down now to write a little bit of content. Everything within the foundation of my writing ability was shaken a little bit. Distinctly shaken, not stirred, is how things felt for the most part. A whirlwind of emotion and reaction beyond anything that was going to just drive some writing. It was one of those moments where you have to take a step back and consider things within a more macro context.

I’m actually thinking about releasing some type of poll on Twitter/X each day. That activity would be done just for my own personal amusement. I had considered doing something within the spaces (live audio). That idea never went anywhere. I have posted over 400 videos on YouTube. Apparently, as of this moment I have posted exactly 444 videos. A bunch of those happened during the time when I thought it would be fun to publish a daily vlog. Maybe going back and trying to do that again would be fun. The best way to do that would be to just commit to recording short videos on my Google Pixel 7 Pro, editing them using the PowerDirector application, and publishing them from the phone. That is pretty much how the last round of vlogging happened. I might string together a few random clips and do a test run without making it publicly visible. We will see how that ends up going tomorrow.

You might have guessed that it is now the next day and I was able to run the publishing test on YouTube. Oddly enough, the system actually uploaded the video as my very first YouTube Shorts contribution. It was a pretty easy thing to accomplish and PowerDirector was just as easy to use as I remember from the last time around. Loading content to the project, editing it up a bit, and publishing took just a few minutes. It was a very low friction activity and it was sort of interesting to see it show up as a Shorts contribution. I think that happened due to the orientation of the video. All the little snippets I grabbed happened to be shot vertically. That may be the key to triggering that type of upload category. I’m not exactly sure on that one. 

I was able to get back into the routine of things for my Saturday morning early hour writing efforts. Two blocks of content were developed including weeks 139 and 141. I’ll have to pick back up with week 140 tomorrow and finish up editing the other two blocks of content. I may not be able to record any new podcast episodes this weekend. I did take the time to set up a podcast playlist on YouTube and I might go ahead and release the Friday August 25, 2023 edition in video format as a sort of test run of the process. Adding that type of video production to my workflow will change the publishing cost from the 10 minutes of audio creation to about 30 minutes per edition. If the video creation process becomes very involved, then things might jump up to 90 minutes per block of content. That could be worth it or it might be a thing to try out and fail faster on by adding it to the mythical stop doing list.

The chalk model for predicting elections

Thank you for tuning in to this audio only podcast presentation. This is week 134 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “The chalk model for predicting elections.”

Last week we started to mess around with some methods of doing sentiment analysis and setting up some frameworks to support that type of effort. This week we take a little different approach and are going to look at an election model. I’m actively working on election-focused, prompt-based training of large language models for better predictions. Right now I have access to Bard, ChatGPT, and Llama 2 to complete that training. Completing that type of training requires feeding election models, written out, to the model as a prompt for replication. I have been including the source data and the written-out logic as part of the prompt as well.
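The prompt-based approach described above can be sketched as follows: the model description, the registration data, and the decision rule are all written out and handed to the language model as one prompt. The wording below and the structure of the data dictionary are illustrative assumptions; no specific LLM API is shown.

```python
# Illustrative sketch of embedding an election model, its data, and its
# logic into a single prompt for an LLM to replicate. The prompt wording
# is an assumption, not a tested template.

registration = {"Colorado": {"D": 1127654, "R": 1025921}}

prompt = (
    "You are given a simple election model: the predicted winner is "
    "max{D, R}, where D and R are party registration counts at the "
    "time of the election.\n"
    f"Data: {registration}\n"
    "Apply the model and state the predicted winner for each state."
)
print(prompt)
```

The resulting string would then be sent to whichever model is being trained or tested, which is where the real experimentation starts.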

Party registration drives the signal. Everything else is noise. That is what I expected to see from this model. It was the headline that could have been written, but sadly cannot be. It turns out that this hypothesis could be tested. You can pretty easily view the results as a March Madness college basketball style bracket, accepting that chalk happens, or to put it more bluntly, that the higher ranked seeds normally win. Within the NCAA tournament things are more sporting and major upsets sometimes occur. Brackets are always getting busted. That is probably why it ended up branded as March Madness. Partisan politics is very different in that the chalk is a lot more consistent. Still, sentiment can change over time, and sometimes voter registration does not accurately predict the outcome.

We are going to move into the hypothesis testing part of the process. This model accepts a bimodal, two-party representation of political parties with the assumption that the other parties are generally irrelevant to predicting the outcome. The chalk model for predicting elections based on registration reads like this: the predicted winner = max{D, R}, where D = registered Democrats and R = registered Republicans at the time of the election. For example, for the State of Colorado in December of 2020 that would equate to max{1127654, 1025921}, where registered Democrats outnumber registered Republicans [1]. This equation accurately predicted the result in the State of Colorado during the 2020 presidential election. 30 states report voter statistics by party with accessible 2020 archives. Using the power of hindsight, we can test the chalk model for predicting elections against the results of the 2020 presidential election.
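The chalk model is simple enough to express in a few lines. This is a minimal sketch using the Colorado December 2020 registration figures cited in the text; the function name is my own.

```python
# A sketch of the chalk model: predicted winner = max{D, R}.

def chalk_prediction(registered_d, registered_r):
    """Return the party with the larger registration count."""
    return "D" if registered_d > registered_r else "R"

# Colorado, December 2020: max{1127654, 1025921} -> D
print(chalk_prediction(1127654, 1025921))  # D
```

Running this same comparison across every state with accessible registration archives is all the hypothesis test requires.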

Several internet searches were performed using Google with the search, “(state name) voter registration by party 2020.” Links to the referenced data are provided for replication and/or verification of the data. Be prepared to spend a little time on any verification effort, as searching out the registered voter metric for each of the states took about 3 hours of total effort. It will go much faster if you use the links rather than redoing the searches from scratch. Data from November of 2020 was selected when possible; otherwise the best available fit of the data being offered was used.

  1. Alaska max{78664,142266}, predicted R victory accurately [2]
  2. Arizona max{128453,120824}, predicted D victory accurately [3]
  3. California max{10170317,5334323}, predicted D victory accurately [5]
  4. Colorado max{1127654,1025921}, predicted D victory accurately [6]
  5. Connecticut max{850083,480033}, predicted D victory accurately [7]
  6. Delaware max{353659,206526}, predicted D victory accurately [8]
  7. Florida max{5315954,5218739}, predicted D victory in error [9] * The data here might have been lagging actual registration; by 2021 the figures were max{5080697,5123799}, which would have accurately predicted an R victory
  8. Idaho max{141842,532049}, predicted R victory accurately [10]
  9. Iowa max{699001,719591}, predicted R victory accurately [11]
  10. Kansas max{523317,883988}, predicted R victory accurately [12]
  11. Kentucky max{1670574,1578612}, predicted D victory in error [13] * The data here might have been lagging actual voter sentiment; by June 2023 the numbers had flipped to max{1529360,1593476}
  12. Louisiana max{1257863,1020085}, predicted D victory in error [14,15]
  13. Maine max{405087,321935}, predicted D victory accurately [16]
  14. Maryland max{2294757,1033832}, predicted D victory accurately [17]
  15. Massachusetts max{1534549,476480}, predicted D victory accurately [18]
  16. Nebraska max{370494,606759}, predicted R victory accurately [19]
  17. Nevada max{689025,448083}, predicted D victory accurately [20]
  18. New Hampshire max{347828,333165}, predicted D victory accurately [21]
  19. New Jersey max{2524164,1445074}, predicted D victory accurately [22]
  20. New Mexico max{611464,425616}, predicted D victory accurately [23]
  21. New York max{6811659,2965451}, predicted D victory accurately [24]
  22. North Carolina max{2627171,2237936}, predicted D victory in error [25,26]
  23. Oklahoma max{750669,1129771}, predicted R victory accurately [27]
  24. Oregon max{1043175,750718}, predicted D victory accurately [28]
  25. Pennsylvania max{4228888,3543070}, predicted D victory accurately [29]
  26. Rhode Island max{327791,105780}, predicted D victory accurately [30]
  27. South Dakota max{158829,277788}, predicted R victory accurately [31]
  28. Utah max{250757,882172}, predicted R victory accurately [32]
  29. West Virginia max{480786,415357}, predicted D victory in error [33]
  30. Wyoming max{48067,184698}, predicted R victory accurately [34]

This model, predicting a winner with max{D, R}, ended up with incorrect prediction outcomes in 5 states during the 2020 presidential election cycle: Florida, Kentucky, Louisiana, North Carolina, and West Virginia. All 5 of these states, based on voter registration data, should have yielded a D victory, but did not perform that way in practice. Some of these states have clearly shifted voter registration, and I have added notes to show those changes in Kentucky and Florida. It is possible that in both of those states voter registration was a lagging indicator compared to the sentiment of votes cast. The chalk model for predicting elections ended up being 25/30 or 83.33% accurate.

You can imagine that I was expecting a much more accurate prediction of elections out of this chalk model. Again, calling back to March Madness: think about what it means for registered voters to offer a clear path to victory, and yet for it not to work out that way. That is why we tested this hypothesis of the chalk model. You can obviously see here that it is accurate most of the time, but not all of the time. It’s something we will continue to dig into as I look at some other models and run some other tests with voter data while examining how elections intersect with AI/ML.

Footnotes:

[1] https://www.sos.state.co.us/pubs/elections/VoterRegNumbers/2020/December/VotersByPartyStatus.pdf or https://www.sos.state.co.us/pubs/elections/VoterRegNumbers/2020VoterRegNumbers.html 

[2] https://www.elections.alaska.gov/statistics/2020/SEP/VOTERS%20BY%20PARTY%20AND%20PRECINCT.htm#STATEWIDE

[3] https://azsos.gov/sites/default/files/State_Voter_Registration_2020_General.pdf 

[4] https://azsos.gov/elections/results-data/voter-registration-statistics 

[5] https://elections.cdn.sos.ca.gov/ror/15day-gen-2020/county.pdf 

[6] https://www.sos.state.co.us/pubs/elections/VoterRegNumbers/2020/December/VotersByPartyStatus.pdf 

[7] https://portal.ct.gov/-/media/SOTS/ElectionServices/Registration_and_Enrollment_Stats/2020-Voter-Registration-Statistics.pdf 

[8] https://elections.delaware.gov/reports/e70r2601pty_20201101.shtml 

[9] https://dos.myflorida.com/elections/data-statistics/voter-registration-statistics/voter-registration-reports/voter-registration-by-party-affiliation/ 

[10] https://sos.idaho.gov/elections-division/voter-registration-totals/ 

[11] https://sos.iowa.gov/elections/pdf/VRStatsArchive/2020/CoNov20.pdf 

[12] https://sos.ks.gov/elections/22elec/2022-11-01-Voter-Registration-Numbers-by-County.pdf 

[13] https://elect.ky.gov/Resources/Pages/Registration-Statistics.aspx 

[14] https://www.sos.la.gov/ElectionsAndVoting/Pages/RegistrationStatisticsStatewide.aspx 

[15] https://electionstatistics.sos.la.gov/Data/Registration_Statistics/statewide/2020_1101_sta_comb.pdf 

[16] https://www.maine.gov/sos/cec/elec/data/data-pdf/r-e-active1120.pdf 

[17] https://elections.maryland.gov/pdf/vrar/2020_11.pdf 

[18] https://www.sec.state.ma.us/divisions/elections/download/registration/enrollment_count_20201024.pdf 

[19] https://sos.nebraska.gov/sites/sos.nebraska.gov/files/doc/elections/vrstats/2020vr/Statewide-November-2020.pdf 

[20] https://www.nvsos.gov/sos/elections/voters/2020-statistics 

[21] https://www.sos.nh.gov/sites/g/files/ehbemt561/files/documents/2020%20GE%20Election%20Tallies/2020-ge-names-on-checklist.pdf 

[22] https://www.state.nj.us/state/elections/assets/pdf/svrs-reports/2020/2020-11-voter-registration-by-county.pdf 

[23] https://klvg4oyd4j.execute-api.us-west-2.amazonaws.com/prod/PublicFiles/ee3072ab0d43456cb15a51f7d82c77a2/aa948e4c-2887-4e39-96b1-f6ac4c8ff8bd/Statewide%2011-30-2020.pdf 

[24] https://www.elections.ny.gov/EnrollmentCounty.html 

[25] https://vt.ncsbe.gov/RegStat/ 

[26] https://vt.ncsbe.gov/RegStat/Results/?date=11%2F14%2F2020 

[27] https://oklahoma.gov/content/dam/ok/en/elections/voter-registration-statistics/2020-vr-statistics/vrstatsbycounty-11012020.pdf 

[28] https://sos.oregon.gov/elections/Documents/registration/2020-september.pdf 

[29] https://www.dos.pa.gov/VotingElections/OtherServicesEvents/VotingElectionStatistics/Documents/2020%20Election%20VR%20Stats%20%20FINAL%20REVIEWED.pdf 

[30] https://datahub.sos.ri.gov/RegisteredVoter.aspx 

[31] https://sdsos.gov/elections-voting/upcoming-elections/voter-registration-totals/voter-registration-comparison-table.aspx 

[32] https://vote.utah.gov/current-voter-registration-statistics/ 

[33] https://sos.wv.gov/elections/Documents/VoterRegistrationTotals/2020/Feb2020.pdf 

[34] https://sos.wyo.gov/Elections/Docs/VRStats/2020VR_stats.pdf 

What’s next for The Lindahl Letter? 

  • Week 135: Polling aggregation
  • Week 136: Econometric models
  • Week 137: Time-series analysis
  • Week 138: Prediction markets
  • Week 139: Machine learning election models

If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.