Machine learning election models

Thank you for tuning in to this audio only podcast presentation. This is week 139 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Machine learning election models.”

This might be the year that I finally finish that book about the intersection of technology and modernity. During the course of this post we will look at the intersection of machine learning and election models. That could very well be a thin slice of the intersection of technology and modernity at large, but that is the set of questions that brought us here today. It’s one of the things we have been chasing along this journey. Oh yes, a bunch of papers exist related to this week’s topic of machine learning and election models [1]. None of them are highly cited. A few of them are in the 20s in terms of citation count, which means the academic community surrounding this topic is rather limited. Maybe the papers have been written, but just have not arrived yet out in the world of publication. Given that machine learning has an active preprint landscape, that is unlikely.

That dearth of literature is not going to stop me from looking at them and sharing a few that stood out during the search. None of these papers approaches the subject from the generative AI side of things; they use machine learning without any degree of agency. Obviously, I was engaging in this literature review to see if I could find examples of the deployment of models with some type of agency doing analysis within this space of election prediction models. My searching over the last few weeks has not yielded anything super interesting. I was looking for somebody in the academic space doing some type of work with generative AI constitutions and election models, or maybe even some work in the space of rolling sentiment analysis for targeted campaign understanding. That is probably an open area for research that will be filled at some point.
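That rolling sentiment idea is easy to sketch. As a purely hypothetical illustration (the daily scores, the date range, and the three-day window below are all invented, not drawn from any paper), a rolling mean over per-day sentiment scores is the basic building block:

```python
import pandas as pd

# Invented daily sentiment scores for a campaign, scaled from -1 to 1.
# In practice these would come from a sentiment model run over posts.
scores = pd.Series(
    [0.10, -0.20, 0.30, 0.00, 0.40, 0.20, -0.10],
    index=pd.date_range("2024-01-01", periods=7, freq="D"),
)

# A three-day rolling mean smooths single-day spikes into a trend.
rolling = scores.rolling(window=3).mean()
print(rolling.round(3))
```

A real pipeline would replace the invented series with scored social media posts and probably weight by post volume, but the rolling window is the piece that turns point-in-time sentiment into a campaign trend line.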

Here are four articles that stood out:

Grimmer, J., Roberts, M. E., & Stewart, B. M. (2021). Machine learning for social science: An agnostic approach. Annual Review of Political Science, 24, 395-419. https://www.annualreviews.org/doi/pdf/10.1146/annurev-polisci-053119-015921 

Sucharitha, Y., Vijayalata, Y., & Prasad, V. K. (2021). Predicting election results from twitter using machine learning algorithms. Recent Advances in Computer Science and Communications (Formerly: Recent Patents on Computer Science), 14(1), 246-256. www.cse.griet.ac.in/pdfs/journals20-21/SC17.pdf  

Miranda, E., Aryuni, M., Hariyanto, R., & Surya, E. S. (2019, August). Sentiment Analysis using Sentiwordnet and Machine Learning Approach (Indonesia general election opinion from the twitter content). In 2019 International conference on information management and technology (ICIMTech) (Vol. 1, pp. 62-67). IEEE. https://www.researchgate.net/publication/335945861_Sentiment_Analysis_using_Sentiwordnet_and_Machine_Learning_Approach_Indonesia_general_election_opinion_from_the_twitter_content 

Zhang, M., Alvarez, R. M., & Levin, I. (2019). Election forensics: Using machine learning and synthetic data for possible election anomaly detection. PloS one, 14(10), e0223950. https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0223950&type=printable 

My guess is that we are going to see a wave of ChatGPT-related articles about elections after the 2024 presidential cycle. It will probably be one of those waves of articles without any of them really standing out or making any serious contribution to the academy.

The door is opening to a new world of election prediction and understanding efforts thanks to the recent changes in both model agency and generative AI models that help evaluate and summarize very complex things. How they are applied going forward is what will make the biggest difference in how the use cases play out. These use cases, by the way, are going to become very visible as the 2024 election comes into focus. The interesting part of the whole equation will be when people bring custom knowledge bases to the process to help fuel interactions with machine learning algorithms and generative AI.

It’s amazing to think how rapidly things can be built. The older models of software engineering are now more of a history lesson than a primer on building things with prompt-based AI. Andrew Ng illustrated in a recent lecture the rapidly changing build times. You have to really decide what you want to build and deploy and make it happen. Ferris Bueller once said, “Life moves pretty fast.” Now code generation is starting to move even faster! You need to stop and look around at what is possible, or you just might miss out on the generative AI revolution.

You can see Andrew’s full video here: https://www.youtube.com/watch?v=5p248yoa3oE 

Footnotes:

[1] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C6&q=Machine+learning+election+models&btnG= 

What’s next for The Lindahl Letter? 

  • Week 140: Proxy models for elections
  • Week 141: Building generative AI chatbots
  • Week 142: Learning LangChain
  • Week 143: Social media analysis
  • Week 144: Knowledge graphs vs. vector databases

If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.

Election prediction markets & Time-series analysis

Thank you for tuning in to this audio only podcast presentation. This is week 138 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Prediction markets & Time-series analysis.”

We have been digging into elections for a few weeks now. You knew this topic was going to show up. People love prediction markets. They are really a pooled reflection of sentiment about the likelihood of something occurring. Right now the scuttlebutt of the internet is about LK-99, a potential, maybe debunked, maybe possible room-temperature superconductor; people are betting on whether or not it will be replicated before 2025 [1]. You can read the 22-page preprint about LK-99 on arXiv [2]. My favorite article about why this would be a big deal if it lands was from Dylan Matthews over at Vox [3]. Being able to advance the transmission power of electrical lines alone would make this a breakthrough.

That brief example aside, people can now really dial into the betting markets for elections, which right now are not getting nearly the same level of attention as LK-99; that is probably accurate in terms of the general scale of possible impact. You can pretty quickly get to all the posts that the team over at 538 has tagged for “betting markets,” and that is an interesting thing to scroll through [4]. Beyond that, you could start to dig into an article from The New York Times forecasting what will happen to prediction markets in the future [5].
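As a quick aside on how these markets are usually read: a contract that pays out $1 if an event happens and trades at 62 cents implies a crowd probability of roughly 62 percent. A minimal sketch, using invented prices, that normalizes the yes and no sides so they sum to one (stripping out the overround) before reading the yes side as a probability:

```python
def implied_probability(yes_price: float, no_price: float) -> float:
    """Read a binary prediction market price as a probability.

    Prices are in dollars on a $1 payout. The yes and no prices
    usually sum to slightly more than 1 (the overround), so we
    normalize before treating the yes side as a probability.
    """
    return yes_price / (yes_price + no_price)

# Invented example prices, not real market data.
print(implied_probability(0.62, 0.42))  # roughly 0.596
```

This is only the naive reading; real election markets also carry fees and a well-documented favorite-longshot bias, which is part of what the academic literature below wrestles with.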

You know it was only a matter of time before we moved from popular culture coverage to the depths of Google Scholar [6].

Snowberg, E., Wolfers, J., & Zitzewitz, E. (2007). Partisan impacts on the economy: evidence from prediction markets and close elections. The Quarterly Journal of Economics, 122(2), 807-829. https://www.nber.org/system/files/working_papers/w12073/w12073.pdf

Arrow, K. J., Forsythe, R., Gorham, M., Hahn, R., Hanson, R., Ledyard, J. O., … & Zitzewitz, E. (2008). The promise of prediction markets. Science, 320(5878), 877-878. https://users.nber.org/~jwolfers/policy/StatementonPredictionMarkets.pdf

Berg, J. E., Nelson, F. D., & Rietz, T. A. (2008). Prediction market accuracy in the long run. International Journal of Forecasting, 24(2), 285-300. https://www.biz.uiowa.edu/faculty/trietz/papers/long%20run%20accuracy.pdf 

Wolfers, J., & Zitzewitz, E. (2004). Prediction markets. Journal of economic perspectives, 18(2), 107-126. https://pubs.aeaweb.org/doi/pdf/10.1257/0895330041371321 

Yeah, you could tell by the title that a little bit of content related to time-series analysis was coming your way. The papers tracked within Google Scholar related to election time-series analysis were not highly cited and, to my extreme disappointment, are not openly shared as PDF documents [7]. For those of you who are regular readers, you know that I try really hard to only share links to open access documents and resources that anybody can consume along their lifelong learning journey. Sharing links to paywalls and articles inside a gated academic community is not really productive for general learning.

Footnotes:

[1] https://manifold.markets/QuantumObserver/will-the-lk99-room-temp-ambient-pre?r=RWxpZXplcll1ZGtvd3NreQ

[2] https://arxiv.org/ftp/arxiv/papers/2307/2307.12008.pdf

[3] https://www.vox.com/future-perfect/23816753/superconductor-room-temperature-lk99-quantum-fusion

[4] https://fivethirtyeight.com/tag/betting-markets/ 

[5] https://www.nytimes.com/2022/11/04/business/election-prediction-markets-midterms.html

[6] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C6&q=election+prediction+markets&btnG= 

[7] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C6&q=election+time+series+analysis&oq=election+time+series+an 

What’s next for The Lindahl Letter? 

  • Week 139: Machine learning election models
  • Week 140: Proxy models for elections
  • Week 141: Election expert opinions
  • Week 142: Door-to-door canvassing

If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.

Tracking political registrations

Thank you for tuning in to this audio only podcast presentation. This is week 137 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Tracking political registrations.”

Trying to figure out how many Republicans, Democrats, and independents are registered in each state is actually really hard. It’s not a trivial task, even with all our modern technology and the extreme power of the internet providing outsized connectedness between things and making content accessible to searches. Even GPT-4 from OpenAI with some decent plugins turned on will struggle to complete this task. Your best searches to get a full list by state are probably going to land you in the world of projections and surveys. Results that will show up very quickly are from Pew Research, which contacted people (300 to 4,000 of them) from each state to find out more about political affiliation [1]. They sorted responses into three buckets: no lean, lean Republican, or lean Democrat. That allowed the results, based on sampling, to give a feel for general political intentions. However, that type of intention-based evaluation does not give you a sense of the number of registered voters within each state.

It opened the door to considering whether political registration is even a good indicator of election outcomes. Sports tournaments rarely play out based on the seeding. That is the element that makes it exciting and puts the sport into the tournament. To that end, back during week 134 I shared the chalk model to help explore a hypothesis related to registration being predictive. At the moment, I’m more interested to see how proxy models for predicting sporting events are working. Getting actual data to track changes in political registrations is an interesting process. ChatGPT, Bard, and Bing Chat are capable of providing some numbers if you prompt them properly. The OpenAI model GPT-3.5 has some older data from September 2021 and will tell you registered voters by state [2]. I started with a basic prompt, “make a table of voter registration by state.” I had to add a few encouraging prompts at some points, but overall all three models spit out results [3]. The Bing Chat model really tried to direct you back to the United States Census Bureau website [4].

This is an area where setting up some type of model with a bit of agency to go out to the relevant secretary of state websites for the 30 states that provide some data might be a way to build a decent dataset. That would probably be the only way to really track the official data coming out by state to show the changes in registration over time. Charting that change data might be interesting as a longitudinal, directional view of how voters register over time. People who participate in Kaggle have run into challenges where election result prediction is actually a competition [5]. It’s interesting, and thinking about which features are most impactful during election prediction is a big part of that competition. Other teams are using linear regression and classification models to help predict election winners as well [6]. I was reading a working paper from Ebanks, Katz, and King published in May 2023 that shared an in-depth discussion about picking the right models and the problems of picking the wrong ones [7][8].

To close things out, I did end up reading a Center for Politics article from 2018 that was interesting as a look back at where things were [9]. Circling back to the main question this week, I spent some time working within OpenAI ChatGPT with plugins trying to get GPT-4 to search out voter registration by state. I have been wondering why, with a little bit of agency, one of these models could not do that type of searching. Right now the models are not set up with a framework that could complete this type of tasking.

Footnotes:

[1] https://www.pewresearch.org/religion/religious-landscape-study/compare/party-affiliation/by/state/ 

[2] https://chat.openai.com/share/8a6ea5e7-6e42-4743-bc23-9e8e7c4f79c5 

[3] https://g.co/bard/share/96b6f8d02e8e 

[4] https://www.census.gov/topics/public-sector/voting/data/tables.html 

[5] https://towardsdatascience.com/feature-engineering-for-election-result-prediction-python-943589d89414 

[6] https://medium.com/hamoye-blogs/u-s-presidential-election-prediction-using-machine-learning-88f93e7f6f2a

[7] https://news.harvard.edu/gazette/story/2023/03/researchers-come-up-with-a-better-way-to-forecast-election-results/

[8] https://gking.harvard.edu/files/gking/files/10k.pdf 

[9] https://centerforpolitics.org/crystalball/articles/registering-by-party-where-the-democrats-and-republicans-are-ahead/ 

What’s next for The Lindahl Letter? 

  • Week 138: Election prediction markets & Time-series analysis
  • Week 139: Machine learning election models
  • Week 140: Proxy models for elections
  • Week 141: Election expert opinions
  • Week 142: Door-to-door canvassing

If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.

Econometric election models

Thank you for tuning in to this audio only podcast presentation. This is week 136 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Econometric election models.”

It has been a few weeks since we started by digging into a good Google Scholar search, and you knew this topic would be just the thing to help open that door [1]. My searches for academic articles are always about finding accessible literature that sits outside paywalls and is intended to be read and shared beyond strictly academic use. Sometimes that is easier than others, when the topics lend themselves to active use cases instead of purely theoretical research. Most of the time these searches to find out what is happening at the edge of what is possible involve applied research. Yes, that type of reasoning would place me squarely in the pracademic camp of intellectual inquiry.

That brief chautauqua aside, my curiosity here is how we build out econometric election models, or other model inputs, to feed into large language model chat systems as prompt engineering for the purpose of training them to either help predict elections or interpret and execute the models. This could be a method for introducing extensibility, or at least the application of a targeted model effect to seed a potential future methodology within the prompt engineering space. As reasoning engines go, it’s possible that an econometric frame could be an interesting proxy model within generative AI prompting. It’s a space worth understanding a little bit more for sure as we approach the 2024 presidential election cycle.

I’m working on that type of effort here as we dig into econometric election models. My hypothesis here is that you can write out what you want to explain in a longer form as a potential input prompt to train a large language model. Maybe a more direct way of saying that is we are building a constitution for the model based on models and potentially proxy models then working toward extensibility and agency from introducing those models together. For me that is a very interesting space to begin to open up and kick the tires on in the next 6 months. 

Here are 6 papers from that Google Scholar search that I thought were interesting:

Mullainathan, S., & Spiess, J. (2017). Machine learning: an applied econometric approach. Journal of Economic Perspectives, 31(2), 87-106. https://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.31.2.87 

Fair, R. C. (1996). Econometrics and presidential elections. Journal of Economic Perspectives, 10(3), 89-102. https://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.10.3.89

Armstrong, J. S., & Graefe, A. (2011). Predicting elections from biographical information about candidates: A test of the index method. Journal of Business Research, 64(7), 699-706. https://faculty.wharton.upenn.edu/wp-content/uploads/2012/04/PollyBio58.pdf 

Graefe, A., Green, K. C., & Armstrong, J. S. (2019). Accuracy gains from conservative forecasting: Tests using variations of 19 econometric models to predict 154 elections in 10 countries. Plos one, 14(1), e0209850. https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0209850&type=printable

Leigh, A., & Wolfers, J. (2006). Competing approaches to forecasting elections: Economic models, opinion polling and prediction markets. Economic Record, 82(258), 325-340. https://www.nber.org/system/files/working_papers/w12053/w12053.pdf 

Benjamin, D. J., & Shapiro, J. M. (2009). Thin-slice forecasts of gubernatorial elections. The review of economics and statistics, 91(3), 523-536. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2860970/pdf/nihms190094.pdf 

Beyond those papers, I read some slides from Hal Varian on “Machine Learning and Econometrics” from January of 2014 [2]. The focus of the slides was on modeling human choices. Some time was spent trying to understand the premise that the field of machine learning could benefit from econometrics. To be fair, since that 2014 set of slides you don’t hear people in the machine learning space mention econometrics that often. Most people make Bayesian-related arguments instead.

On a totally separate note for this week, I was really into running some of the Meta AI Llama models locally on my desktop [3]. You could go out and read about the new Code Llama, which is an interesting model trained and focused on coding [4]. A ton of researchers got together and wrote a paper about this new model called “Code Llama: Open Foundation Models for Code” [5]. That 47-page missive was shared back on August 24, 2023, and people have already started to build alternative models. It’s an interesting world in the wild wild west of generative AI these days. I really did install LM Studio on my Windows workstation and run the 7-billion-parameter version of Code Llama to kick the tires [6]. It’s amazing that a model like that can run locally and that you can interact with it using your own high-end graphics card.

Footnotes:

[1] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C6&q=econometric+election+prediction+models&btnG= 

[2] https://web.stanford.edu/class/ee380/Abstracts/140129-slides-Machine-Learning-and-Econometrics.pdf 

[3] https://ai.meta.com/llama/ 

[4] https://about.fb.com/news/2023/08/code-llama-ai-for-coding/ 

[5] https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/

[6] https://lmstudio.ai/

What’s next for The Lindahl Letter? 

  • Week 137: Tracking political registrations
  • Week 138: Prediction markets & Time-series analysis
  • Week 139: Machine learning election models
  • Week 140: Proxy models
  • Week 141: Expert opinions

If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.

Polling aggregation models

Thank you for tuning in to this audio only podcast presentation. This is week 135 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Polling aggregation models.”

I read and really enjoyed the book by Nate Silver from 2012 about predictions. It’s still on my bookshelf. Strangely enough the cover has faded more than any other book on the shelf. 

Silver, N. (2012). The signal and the noise: Why so many predictions fail-but some don’t. Penguin.

That book from Nate is sitting just a few books over from Armstrong’s Principles of Forecasting, a book that I have referenced a number of times before. It will probably be referenced more as we move ahead as well. It’s a resource that just keeps on giving. Math, it’s funny like that.

Armstrong, J. S. (Ed.). (2001). Principles of forecasting: a handbook for researchers and practitioners (Vol. 30). Boston, MA: Kluwer Academic.

My podcast feed for years has included the 538 podcast, where I listened to Nate and Galen talk about good and bad uses of polling [1]. Sadly, it does not currently feature Nate after the recent changes over at 538. They reported on and ranked a lot of polling within the 538 ecosystem of content. Model talk and the good or bad use of polling were staples in the weekly pod journey. I really thought at some point they would take all of that knowledge about reviewing, rating, and offering critiques of polling and do some actual polling. Instead they mostly offered polling aggregation, which is what we are going to talk about today. On the website they did it really well, and the infographics they built are very compelling.

Today, setting up and running a polling organization is different from before. A single person could run a large amount of it thanks to the automation that now exists. An organization with funding could set up automation and run the polling using an IVR and some type of dialogue flow [2]. Seriously, you could build a bot setup that placed calls to people and completed a survey in a very conversational way. That still runs into the same problem that phone survey methods are going to face. I screen out all non-contact phone calls, and I’m not the only person doing that. Cold calls are just not effective for business or polling in 2023, and the rise of phone assistants that can effectively block out noise is going to make the phone methodology even harder to utilize effectively.
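The skeleton of that kind of conversational survey is just an ordered script with screening logic. Everything below, the questions, the keys, and the screening rule, is invented for illustration; a production IVR would layer telephony and speech handling on top of the same structure:

```python
# Invented survey script: (key, question) pairs asked in order.
SURVEY = [
    ("likely_voter", "Do you plan to vote in the next election? (yes/no)"),
    ("party_lean", "Do you lean Democrat, Republican, or neither?"),
    ("approval", "How satisfied are you with your current governor, 1 to 5?"),
]

def run_survey(answers):
    """Walk the script, screening out respondents who will not vote.

    `answers` stands in for whatever the IVR hears back; a real system
    would speak each question aloud and capture the reply instead.
    """
    responses = {}
    for key, question in SURVEY:
        responses[key] = answers[key]
        if key == "likely_voter" and responses[key] == "no":
            break  # screen out non-voters, as live phone polls do
    return responses
```

The hard part is not this control flow; it is reaching respondents at all, which is exactly the non-contact problem described above.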

It’s hard to make a hype-based drum roll on the written page. You are going to have to imagine it for me to get ready for this next sentence. Now that you are imagining that drum roll… Get ready for a year of people talking about AI and the 2024 election. It probably won’t get crypto-bad in terms of the hype train showing up to nowhere, but it will get loud. I’m going to contribute to that dialogue, but hopefully in the softest possible way. Yeah, I’m walking right into that by reflecting on the outcome of my actions while simultaneously writing about them during this missive.

You can see an article from way back in November 2020 talking about how AI does show some potential to gauge voter sentiment [3]. That was before all of the generative AI and agent hype started. Things are changing rapidly in that space, and I’m super curious about what can actually be accomplished. I’m spending time every day learning about this and working on figuring out ways to implement it before the next major presidential election in 2024. An article from The Atlantic caught my attention as it talked about how nobody responds to polls anymore and started to dig into what AI could possibly do in that space, microtargeting, and Kennedy (1960) campaign references [4]. That was an interesting read for sure, but you could veer over to VentureBeat to read about how AI fared against regular pollsters in the 2020 election [5]. That article offered a few names to watch out for and dig into a little more, including KCore Analytics, expert.ai, and Polly.

We will see massive numbers of groups purporting to use AI in the next election cycle. Even the Brookings Institution has started to share some thoughts on how AI will transform the next presidential election [6]. Sure, you could read something from Scientific American where people are predicting that AI could take over and undermine democracy [7]. Dire predictions abound, and those will probably also accelerate as the AI hype train pulls up to election station during the 2024 election cycle [8][9]. Some of that new technology is even being deployed into nonprofits to help track voters at the polls [10].

Footnotes:

[1] https://projects.fivethirtyeight.com/polls/ 

[2] https://cloud.google.com/contact-center/ccai-platform/docs/Surveys 

[3] https://www.wsj.com/articles/artificial-intelligence-shows-potential-to-gauge-voter-sentiment-11604704009

[4] https://www.theatlantic.com/technology/archive/2023/04/polls-data-ai-chatbots-us-politics/673610/ 

[5] https://venturebeat.com/ai/how-ai-predictions-fared-against-pollsters-in-the-2020-u-s-election/

[6] https://www.brookings.edu/articles/how-ai-will-transform-the-2024-elections/ 

[7] https://www.scientificamerican.com/article/how-ai-could-take-over-elections-and-undermine-democracy/

[8] https://www.govtech.com/elections/ais-election-impact-could-be-huge-for-those-in-the-know

[9] https://apnews.com/article/artificial-intelligence-misinformation-deepfakes-2024-election-trump-59fb51002661ac5290089060b3ae39a0 

What’s next for The Lindahl Letter? 

  • Week 136: Econometric election models
  • Week 137: Tracking political registrations
  • Week 138: Prediction markets & Time-series analysis
  • Week 139: Machine learning election models
  • Week 140: Proxy models

If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.

The chalk model for predicting elections

Thank you for tuning in to this audio only podcast presentation. This is week 134 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “The chalk model for predicting elections.”

Last week we started to mess around with some methods of doing sentiment analysis and setting up some frameworks for that type of effort. This week we take a little different approach and are going to look at an election model. I’m actively working on election-focused prompt-based training for large language models for better predictions. Right now I have access to Bard, ChatGPT, and Llama 2 to complete that training. Completing that type of training requires feeding election models in written form as a prompt for replication. I have been including the source data and written-out logic as part of the prompt as well.

Party registration drives the signal. Everything else is noise. That is what I expected to see within this model. It was the headline that could have been, but sadly could not be written. It turns out that this hypothesis could be tested. You can pretty easily view the results as a March Madness college basketball style bracket, accepting that chalk happens, or to put it more bluntly, that the higher-ranked seeds normally win. Within the NCAA tournament things are more sporting, and sometimes major upsets occur. Brackets are always getting busted. That is probably why they ended up branding it as March Madness. Partisan politics are very different in that the chalk is a lot more consistent. Still, sentiment can change over time, and sometimes voter registration does not accurately predict the outcome.

We are going to move into the hypothesis testing part of the process. This model accepts a binary two-party representation of political parties with an assumption that the other parties are generally irrelevant to predicting the outcome. The chalk model for predicting elections based on registration reads like this: predicted winner = max{D,R}, where D = registered Democrats and R = registered Republicans at the time of the election. For example, for the State of Colorado in December of 2020 that would equate to max{1127654,1025921}, where registered Democrats outnumber registered Republicans [1]. This equation accurately predicted the results for the State of Colorado during the 2020 presidential election. 30 states report voter statistics by party with accessible 2020 archives. Using the power of hindsight, we can test the chalk model for predicting elections against the results of the 2020 presidential election.

Several internet searches were performed using Google with the search “(state name) voter registration by party 2020.” Links to the referenced data are provided for replication and/or verification. Be prepared to spend a little time completing a verification effort, as searching out the registered voter metric for each of the states took about 3 hours of total effort. It will go much faster if you use the links compared to redoing the search from scratch. Data from November of 2020 was selected when possible; otherwise, the best available fit of the data was used.

  1. Alaska max{78664,142266}, predicted R victory accurately [2]
  2. Arizona max{128453,120824}, predicted D victory accurately [3]
  3. California max{10170317,5334323}, predicted D victory accurately [5]
  4. Colorado max{1127654,1025921}, predicted D victory accurately [6]
  5. Connecticut max{850083,480033}, predicted D victory accurately [7]
  6. Delaware max{353659,206526}, predicted D victory accurately [8]
  7. Florida max{5315954,5218739}, predicted D victory in error [9] * The data here might have been lagging to actual by 2021 it would have been accurate at max{5080697,5123799}, predicting R victory
  8. Idaho max{141842,532049}, predicted R victory accurately [10]
  9. Iowa max{699001,719591}, predicted R victory accurately [11]
  10. Kansas max{523317,883988}, predicted R victory accurately [12]
  11. Kentucky max{1670574,1578612}, predicted D victory in error [13] * The data here might have been lagging to actual voter sentiment. The June 2023 numbers flipped max{1529360,1593476}
  12. Louisiana max{1257863,1020085}, predicted D victory in error [14,15]
  13. Maine max{405087,321935}, predicted D victory accurately [16]
  14. Maryland max{2294757,1033832}, predicted D victory accurately [17]
  15. Massachusetts max{1534549,476480}, predicted D victory accurately [18]
  16. Nebraska max{370494,606759}, predicted R victory accurately [19]
  17. Nevada max{689025,448083}, predicted D victory accurately [20]
  18. New Hampshire max{347828,333165}, predicted D victory accurately [21]
  19. New Jersey max{2524164,1445074}, predicted D victory accurately [22]
  20. New Mexico max{611464,425616}, predicted D victory accurately [23]
  21. New York max{6811659,2965451}, predicted D victory accurately [24]
  22. North Carolina max{2627171,2237936}, predicted D victory in error [25,26]
  23. Oklahoma max{750669,1129771}, predicted R victory accurately [27]
  24. Oregon max{1043175,750718}, predicted D victory accurately [28]
  25. Pennsylvania max{4228888,3543070}, predicted D victory accurately [29]
  26. Rhode Island max{327791,105780}, predicted D victory accurately [30]
  27. South Dakota max{158829,277788}, predicted R victory accurately [31]
  28. Utah max{250757,882172}, predicted R victory accurately [32]
  29. West Virginia max{480786,415357}, predicted D victory in error [33]
  30. Wyoming max{48067,184698}, predicted R victory accurately [34]

This model, predicting a winner with max(D,R), ended up with incorrect predictions in 5 states during the 2020 presidential election cycle: Florida, Kentucky, Louisiana, North Carolina, and West Virginia. All 5 of these states, based on voter registration data, should have yielded a D victory, but did not play out that way in practice. Some of these states have clearly shifted voter registration since then, and I have added notes above showing those changes in Kentucky and Florida. It is possible that in both of those states voter registration was a lagging indicator compared to the sentiment of votes cast. The chalk model for predicting elections ended up being 25/30, or 83.33%, accurate. 
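The chalk model is simple enough to sketch in a few lines of Python. The registration counts below come from the state list above, but this is an illustrative subset of four states rather than a full replication of all 30:

```python
# A minimal sketch of the "chalk" election model: predict each state's winner
# as whichever party has more registered voters, then score against the
# actual 2020 outcome. Counts are from the 2020 data cited above; the subset
# of states shown here is illustrative, not the full 30-state list.
registration = {
    # state: (D registered, R registered, actual 2020 winner)
    "Alaska": (78664, 142266, "R"),
    "Arizona": (128453, 120824, "D"),
    "Florida": (5315954, 5218739, "R"),  # registration favored D; R won
    "Wyoming": (48067, 184698, "R"),
}

correct = 0
for state, (d, r, actual) in registration.items():
    predicted = "D" if d > r else "R"  # the chalk pick: max(D, R)
    if predicted == actual:
        correct += 1

accuracy = correct / len(registration)
print(f"Chalk model accuracy: {correct}/{len(registration)} = {accuracy:.2%}")
```

Running the same loop over all 30 states in the list reproduces the 25/30 result, with Florida playing the role of the upset in this small sample.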

You can imagine that I was expecting a much more accurate prediction of elections out of this chalk model. Again, calling back to March Madness: a chalk bracket always picks the favorite, and a registration advantage should mean a clear path to victory, but it does not always work out that way. That is why we tested this hypothesis of the chalk model. You can see that it is accurate most of the time, but not all the time. It's something that we will continue to dig into as I look at some other models and run some other tests with voter data while we are looking at elections and how they intersect with AI/ML.

Footnotes:

[1] https://www.sos.state.co.us/pubs/elections/VoterRegNumbers/2020/December/VotersByPartyStatus.pdf or https://www.sos.state.co.us/pubs/elections/VoterRegNumbers/2020VoterRegNumbers.html 

[2] https://www.elections.alaska.gov/statistics/2020/SEP/VOTERS%20BY%20PARTY%20AND%20PRECINCT.htm#STATEWIDE

[3] https://azsos.gov/sites/default/files/State_Voter_Registration_2020_General.pdf 

[4] https://azsos.gov/elections/results-data/voter-registration-statistics 

[5] https://elections.cdn.sos.ca.gov/ror/15day-gen-2020/county.pdf 

[6] https://www.sos.state.co.us/pubs/elections/VoterRegNumbers/2020/December/VotersByPartyStatus.pdf 

[7] https://portal.ct.gov/-/media/SOTS/ElectionServices/Registration_and_Enrollment_Stats/2020-Voter-Registration-Statistics.pdf 

[8] https://elections.delaware.gov/reports/e70r2601pty_20201101.shtml 

[9] https://dos.myflorida.com/elections/data-statistics/voter-registration-statistics/voter-registration-reports/voter-registration-by-party-affiliation/ 

[10] https://sos.idaho.gov/elections-division/voter-registration-totals/ 

[11] https://sos.iowa.gov/elections/pdf/VRStatsArchive/2020/CoNov20.pdf 

[12] https://sos.ks.gov/elections/22elec/2022-11-01-Voter-Registration-Numbers-by-County.pdf 

[13] https://elect.ky.gov/Resources/Pages/Registration-Statistics.aspx 

[14] https://www.sos.la.gov/ElectionsAndVoting/Pages/RegistrationStatisticsStatewide.aspx 

[15] https://electionstatistics.sos.la.gov/Data/Registration_Statistics/statewide/2020_1101_sta_comb.pdf 

[16] https://www.maine.gov/sos/cec/elec/data/data-pdf/r-e-active1120.pdf 

[17] https://elections.maryland.gov/pdf/vrar/2020_11.pdf 

[18] https://www.sec.state.ma.us/divisions/elections/download/registration/enrollment_count_20201024.pdf 

[19] https://sos.nebraska.gov/sites/sos.nebraska.gov/files/doc/elections/vrstats/2020vr/Statewide-November-2020.pdf 

[20] https://www.nvsos.gov/sos/elections/voters/2020-statistics 

[21] https://www.sos.nh.gov/sites/g/files/ehbemt561/files/documents/2020%20GE%20Election%20Tallies/2020-ge-names-on-checklist.pdf 

[22] https://www.state.nj.us/state/elections/assets/pdf/svrs-reports/2020/2020-11-voter-registration-by-county.pdf 

[23] https://klvg4oyd4j.execute-api.us-west-2.amazonaws.com/prod/PublicFiles/ee3072ab0d43456cb15a51f7d82c77a2/aa948e4c-2887-4e39-96b1-f6ac4c8ff8bd/Statewide%2011-30-2020.pdf 

[24] https://www.elections.ny.gov/EnrollmentCounty.html 

[25] https://vt.ncsbe.gov/RegStat/ 

[26] https://vt.ncsbe.gov/RegStat/Results/?date=11%2F14%2F2020 

[27] https://oklahoma.gov/content/dam/ok/en/elections/voter-registration-statistics/2020-vr-statistics/vrstatsbycounty-11012020.pdf 

[28] https://sos.oregon.gov/elections/Documents/registration/2020-september.pdf 

[29] https://www.dos.pa.gov/VotingElections/OtherServicesEvents/VotingElectionStatistics/Documents/2020%20Election%20VR%20Stats%20%20FINAL%20REVIEWED.pdf 

[30] https://datahub.sos.ri.gov/RegisteredVoter.aspx 

[31] https://sdsos.gov/elections-voting/upcoming-elections/voter-registration-totals/voter-registration-comparison-table.aspx 

[32] https://vote.utah.gov/current-voter-registration-statistics/ 

[33] https://sos.wv.gov/elections/Documents/VoterRegistrationTotals/2020/Feb2020.pdf 

[34] https://sos.wyo.gov/Elections/Docs/VRStats/2020VR_stats.pdf 

What’s next for The Lindahl Letter? 

  • Week 135: Polling aggregation
  • Week 136: Econometric models
  • Week 137: Time-series analysis
  • Week 138: Prediction markets
  • Week 139: Machine learning election models

If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.

Avoiding those major blockers

Two podcast audio tracks were recorded this morning. Both the week 137 and 138 content received some revision and were recorded. That was a productive start to the day. Right now the backlog of recorded and ready to release blocks of content for The Lindahl Letter is sitting at 5 weeks. My alternate building project is progressing on a daily basis as well. When you are the only one coding and developing something, you have to be really careful about hitting roadblocks and other blockers; they can bring everything to a crashing halt on a side project. No opportunity for recovery exists: you have to either solve the problem, figure out a workaround, or elect to move on to something else. No one element of the backlog can be allowed to take up every bit of possible time. 

I was reading an article from Vox writer Peter Kafka about how the newsletter boom is over [1]. During the course of writing 138 blocks of content to share on Substack, I have wondered how that company is doing and what exactly is going on in the world of newsletter publishing. It's interesting to think of publishing a newsletter as an MVP to ship each week. Kafka shared a few links about what has been happening with Substack as well, which was helpful. Those articles did not paint the best forward-looking picture. A few weeks ago I started posting both my podcast audio and Substack content on this blog each week as well. That is more or less just an archival play at the moment. I'd have to figure out how to fold subscribers from one platform to the other, which I guess would be possible by email. I'd have to consider the right opt-in process to make that switch, but it's probably a problem for future Nels. 

Footnotes:
[1] https://www.vox.com/recode/23289433/newsletters-substack-subscriptions-bari-weiss-semafor-peter-kafka-column

Always building blocks of content

This morning I spent some time working on content blocks for weeks 137 and 138. Some general research is underway for week 139 as well. Currently, three weeks are staged up and ready to publish. Hopefully, tomorrow morning that will be extended back out to 5 weeks of staged content. My plan is to finish up and record audio for two weeks of content. I’m still actively working to record podcast audio each week. It is one of those things that I have considered dropping from my routine a few times. It has even made my “stop doing” list a couple of times, but it is not really that big of a time commitment and people do utilize that content stream each week.

It's entirely possible that I'm at the peak of my creative abilities. I can sit down and write just about any block of content. Tackling even the difficult things has become easier and easier. Conceptually, I have crossed that 10,000 hours of practice threshold a few times on the writing front, and maybe, just maybe, one of those threshold passes was successful. Knowing that might help influence which things from the backlog get tackled and in what order.

Automated survey methods

Thank you for tuning in to this audio only podcast presentation. This is week 133 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Automated survey methods.”

I'm still spending some time digging into notebooks. This time around, as you might have guessed, the topic for that inquiry is figuring out how to automate a survey, or more pointedly some sentiment analysis. People are building automated phone surveys with interactive voice response (IVR) systems. With the next wave of this technology it will be hard to tell whether the caller is a person or a bot. Seriously, those systems are going to keep getting better at a rapid pace. The new wave of generative large language models is going to make outbound call surveys better and probably more plentiful. Outbound call survey plugins will roll in for ChatGPT and Bard, and anybody willing to serve up a custom model can build one for Llama 2. At the same time, I'm entirely sure (and hopeful) that people will be using more advanced technology to block those phone calls as well.

All right, let's shift away from considering phone calls and start to dig into some of the automated sentiment analysis techniques that exist. We are starting to see frameworks where you can ask these new ChatGPT type services to act as an agent for you and complete some type of tasking. One of the interesting things to ask that type of agent to complete would be to evaluate sentiment about something. I'm sure brands would like to have some automated brand evaluation methods. This will inevitably be used for politics as well. Right now we are not at the point where everybody has plugins that allow agency for ChatGPT or other tooling, but as far as I can tell that is coming very soon. Between lower energy costs and the solid platforms being built, those changes together may enable the compute for this type of interaction to happen in very conversational ways with a computer in the next 5 years.

Right now you could start by messing around with Google Colab and the forms options they have [1]. Completing some really solid sentiment analysis may require more than just the Google Colab environment. You may want to go out to somewhere like Hugging Face to get some information on how to do this with some Python code [2]. A nice stop along this journey would be to venture out to the world of Kaggle and access one of their notebooks for sentiment analysis [3]. Another notebook that I liked was from notebook.community; it shares the natural language processing basics of sentiment analysis in really good chunks that make it easy to understand the mechanics of how things are happening [4]. At this point in the process you are probably ready to do some work on your own, and I found the right Google Colab notebook for you to start designing your own sentiment analysis tool [5].
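To make those mechanics concrete, here is a toy lexicon-based scorer along the lines of what the basics-oriented notebooks walk through. The word lists here are invented for illustration; real tools like VADER or a Hugging Face pipeline use far richer lexicons or learned models:

```python
# A toy lexicon-based sentiment scorer: count positive and negative words
# and report the sign of the difference. This illustrates the simplest
# mechanics behind sentiment analysis, not a production approach.
POSITIVE = {"good", "great", "excellent", "love", "solid", "nice"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "broken", "poor"}

def score(text: str) -> str:
    words = text.lower().split()
    total = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

print(score("This notebook is a great and solid resource"))  # positive
print(score("The first notebook had awful errors"))          # negative
```

Everything past this baseline, from handling negation to scoring whole streams of social media posts, is what the linked notebooks and model-based approaches add on top.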

I was talking to somebody recently about the future of AI. My explanation may not have been what they expected to hear. Within the next couple of years I expect to see a lot of companies spin up and a lot of different creativity happening in the space. All of that will end up settling out into a commodified, built-in series of advancements. A lot of new features for applications and tooling will spin out of the great wave of AI builds happening now, but it will end up feeling more commonplace and built into technologies that exist today. These technologies will mostly supplement or augment things as we move forward. You will have to know how to interact with and work with the generative models that exist, but they are going to be built into the platforms and systems that end up winning out within the business world in the next couple of years.

Footnotes:

[1] https://colab.research.google.com/notebooks/forms.ipynb

[2] https://huggingface.co/blog/sentiment-analysis-python 

[3] https://www.kaggle.com/code/omarhassan1406/notebook-for-sentiment-analysis

[4] https://notebook.community/n-kostadinov/sentiment-analysis/SentimentAnalysis

[5] https://colab.research.google.com/github/littlecolumns/ds4j-notebooks/blob/master/investigating-sentiment-analysis/notebooks/Designing%20your%20own%20sentiment%20analysis%20tool.ipynb

What’s next for The Lindahl Letter? 

  • Week 134: The chalk model for predicting elections
  • Week 135: Polling aggregation
  • Week 136: Econometric models
  • Week 137: Time-series analysis
  • Week 138: Prediction markets

If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.

Synthetic data notebooks

Thank you for tuning in to this audio only podcast presentation. This is week 132 of The Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for The Lindahl Letter is, “Synthetic data notebooks.”

People are actively working on this one, which I thought was pretty interesting. I have a general interest in how to create synthetic data using notebooks, as it helps provide people with lessons on how to do it from an educational perspective. A really solid automated testing process may include some of this in the development pipeline. It makes automation even more amazing as a part of the process. It looks like the folks over at Towards AI released a nice guide to synthetic data geared at beginners in March of 2023 [1]. That guide walked through some of the concepts and a few claims, like a report from a researcher at Gartner showing that half of future AI data will end up being synthetic data. Don't worry, I went out and found the sourcing for that from Gartner: Alexander Linden estimated that by 2030 synthetic data would outpace real data [2]. 

Those future considerations aside, the reason that is happening is that most people are going to be expanding their datasets with synthetic data to help them train and work with models [3]. We are pretty far into the second paragraph and you might be wanting to access a couple of Google Colab notebooks to be able to do some of this yourself. Don't worry, that is about to happen for you. The team over at Gretel AI shared a couple of notebooks that you can use for this type of effort:

https://colab.research.google.com/github/gretelai/gretel-synthetics/blob/master/examples/synthetic_records.ipynb

The first notebook had all sorts of errors and would not work. 

https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/docs/notebooks/create_synthetic_data_from_a_dataframe_or_csv.ipynb

The second one required a Gretel API key to get going, which was a lot less fun than it could have been without that part of the equation. I went out to the website at https://gretel.ai/ and they have some free elements. I got into the dashboard they have pretty easily and started to look around to see what they are offering [4]. I went out to YouTube and found a 12-minute video from one of the co-founders, Alex Watson, showing how to do this effort. It quickly shows how to get the API key for the notebook above. 

I really did follow those instructions to get that magic API key, which totally worked in the second Google Colab notebook linked above. I stepped through the entire notebook in about 15 minutes and was able to see the process of synthetic data generation from a dataframe or CSV, which was exciting to watch and learn about in a notebook. The main model training took 7 minutes, so don't expect that it will happen in just a click.
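As a rough illustration of what that kind of notebook is doing under the hood, here is a minimal column-wise synthetic data sketch using only the Python standard library. The table and column names are invented for the example, and real tools like Gretel or CTGAN model the joint structure between columns rather than treating each one independently as this toy version does:

```python
# A minimal sketch of column-wise synthetic data generation: fit a simple
# distribution to each column of a small "real" table and sample new rows.
# Numeric columns get a normal fit; categorical columns get the empirical
# distribution. Real tools model correlations between columns as well.
import random
import statistics

real_rows = [
    {"age": 34, "income": 52000, "party": "D"},
    {"age": 51, "income": 61000, "party": "R"},
    {"age": 29, "income": 43000, "party": "D"},
    {"age": 62, "income": 75000, "party": "R"},
]

def synthesize(rows, n, seed=0):
    rng = random.Random(seed)
    ages = [r["age"] for r in rows]
    incomes = [r["income"] for r in rows]
    parties = [r["party"] for r in rows]
    out = []
    for _ in range(n):
        out.append({
            # numeric columns: sample from a normal fit to the real column
            "age": round(rng.gauss(statistics.mean(ages), statistics.stdev(ages))),
            "income": round(rng.gauss(statistics.mean(incomes), statistics.stdev(incomes))),
            # categorical column: sample from the observed values
            "party": rng.choice(parties),
        })
    return out

synthetic = synthesize(real_rows, 5)
print(synthetic)
```

The gap between this sketch and the 7-minute model training in the Gretel notebook is exactly the joint structure: a real generator learns that age and income move together, while this baseline cannot.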

Maybe you wanted to see somebody else generate synthetic data in Google Colab on YouTube. You can watch YData work with their Fabric environment in a notebook. They had 44 subscribers, and the video had 28 views before I shared this link for your enjoyment.

You could also check out this other YouTube video from The Next Phase team that shows more information about “Synthetic data generation with CTGAN” in a Google Colab notebook as well [5]. 

Maybe you wanted to switch gears a bit and learn a little about how to create 8-bit audio samples [6]. I'm going to share one more article here that walks through how to generate datasets, as I thought it was actually pretty good [7]. I'm going to close this one out by zooming out to what some people think is the future of these synthetic data driven creations, which is the pollution of the open internet and eventual model collapse [8]. People have even recently gone as far as to say that copies of the internet from before all this generated content are worth more for training than the derivatives. We will see what happens soon. Oftentimes in machine learning people have used randomness, chaos, or other techniques of shifting things around to overcome blocks. I'll be curious to see if something is developed to overcome these potential model collapse problems. 

Footnotes:

[1] https://towardsai.net/p/machine-learning/a-beginners-guide-to-synthetic-data

[2] https://www.gartner.com/en/newsroom/press-releases/2022-06-22-is-synthetic-data-the-future-of-ai

[3] https://towardsdatascience.com/generating-expanding-your-datasets-with-synthetic-data-4e27716be218 

[4] https://console.gretel.ai/use_cases/cards/use-case-synthetic/projects 

[5] https://colab.research.google.com/drive/18vavq2Kt8HqhSnZvvFxJUc-70NCPeDU_?usp=sharing 

[6] https://medium.com/mlearning-ai/python-machine-learning-gans-synthetic-data-and-google-colab-5bb43491a8c7

[7] https://medium.com/nerd-for-tech/synthetically-generate-datasets-using-deep-learning-c1f6ee7a0990 

[8] https://arxiv.org/pdf/2305.17493.pdf 

What’s next for The Lindahl Letter? 

  • Week 133: Automated survey methods
  • Week 134: Make a link based news report automatically
  • Week 135: Saving some notebooks every day
  • Week 136: What if July was startup month? 31 days for 31 ideas
  • Week 137: The battle is about having the idea

If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.