Multimodal machine learning revisited

Thank you for tuning in to this audio-only podcast presentation. This is week 59 of The Lindahl Letter publication. A new edition arrives every Friday. This week the machine learning or artificial intelligence related topic under consideration is, “Multimodal machine learning revisited.”

You might well be aware that multimodal machine learning (MMML) is a slice of the machine learning universe. Even typing the title of this post was challenging. I really wanted to type multi model instead of multimodal, with an “E” before the last “L” instead of the “A” that the actual wording requires in this case. The definition of multimodal is really direct and does not include a ton of mystery. The word is used to describe something with more than one mode. You might end up quickly running down the path of multimodal deep learning to help describe it. A person is capable of taking in the world with multiple senses and converting those signal paths into one stream for analysis. In the same way, a multimodal deep learning network could be built to evaluate multiple types of inputs. That really becomes a lot more complicated than it sounds within the modeling space. Our current class of models does not demonstrate the practical skill a person would have at differentiating senses and understanding them in real time [1].
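To make the idea of combining multiple input types a bit more concrete, here is a minimal, illustrative sketch of feature-level (early) fusion in plain Python. The encoders are toy stand-ins, and all names are hypothetical, not from any particular library; real systems would use learned encoders for each modality.

```python
# Toy sketch of early (feature-level) fusion for two modalities.
# The "encoders" below are deliberately trivial placeholders.

def extract_text_features(text: str) -> list[float]:
    # Toy text encoder: character count and word count, scaled.
    words = text.split()
    return [len(text) / 100.0, len(words) / 10.0]

def extract_audio_features(samples: list[float]) -> list[float]:
    # Toy audio encoder: mean amplitude and peak amplitude.
    return [sum(samples) / len(samples), max(samples)]

def early_fusion(text: str, samples: list[float]) -> list[float]:
    # Concatenate per-modality features into one joint representation,
    # which a downstream classifier or model would then consume.
    return extract_text_features(text) + extract_audio_features(samples)

fused = early_fusion("multimodal machine learning", [0.1, 0.5, 0.3])
print(len(fused))  # 4 features total: 2 from text + 2 from audio
```

The alternative design, late fusion, would instead run a separate model per modality and merge their predictions at the end; much of the complexity in real multimodal systems comes from choosing where in the pipeline that merge happens.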

Within the machine learning space the first generation of models was really focused on achieving very specific tasks. Those models received highly defined training on very specific data problems with very curated training datasets. At some point, machine learning models or builds will need to be able to take on more than one type of task. I think that is a really fascinating part of machine learning to study. Expanding the capabilities and ultimately what is possible changes future trajectories. I ended up reading a paper titled, “Recent Advances and Trends in Multimodal Deep Learning: A Review,” from 2021 [2]. It was pretty much the exact paper I was looking to read to really dig into the topic under consideration today. That paper focused on video and language examples, which really put things in context.

One of the things that I realized during the course of my research on this topic was that a treasure trove of recorded lectures exists on YouTube. A lot of them are related to computer science, machine learning, and artificial intelligence. That is a really good thing if you were trying to put together a syllabus geared toward providing an introduction to machine learning. The video that I spent the most time watching this time around was Victoria Dean’s 2017 lecture, “MIT 6.S191 Lecture 5 Multimodal Deep Learning” [3]. Somebody who was willing to put in the work to curate the content could easily pull together all the lectures necessary from a multitude of different sources.

After reviewing the current trends in multimodal deep learning, my interest shifted to one particular topic related to automated ICD coding. A quick Google Scholar search for “automated icd coding” (in quotes) together with multimodal machine learning produced a ton of interesting results [4]. My search returned 94 results, which was pretty surprising given the targeted nature of the terms used. Some of the articles were related to feature extraction and others were really keyed in on trying to get to the point of working with an ICD code or completing the action of coding content. One of the results that dug into automated ICD coding and caught my attention was titled, “A Deep Learning Framework for Automated ICD-10 Coding” [5]. Ultimately the automation would help physicians be more productive and accurate in working toward a diagnosis. That seems like a noble effort to assist physicians and help patients.

Links and thoughts:

“MIT 6.S191 Lecture 5 Multimodal Deep Learning”

“Steam Deck: What I Didn’t Say In My Review – WAN Show February 25, 2022”

Top 5 Tweets of the week:
What’s next for The Lindahl Letter?

  • Week 60: General artificial intelligence
  • Week 61: AI network platforms
  • Week 62: Touching the singularity
  • Week 63: Sentiment and consensus analysis
  • Week 64: Language models revisited

I’ll try to keep the what’s next list for The Lindahl Letter forward-looking with at least five weeks of posts in planning or review. If you enjoyed this content, then please take a moment and share it with a friend. Thank you and enjoy the week ahead.