Starting to learn how to edit with the LaTeX typesetting system

This weekend I did a little blogging with the WordPress Android application on my Google Pixel 5 smartphone. I made two different posts to keep my writing streak alive. Both posts were just updates on my activity during the weekend, but they were enough to keep things moving along. During the lengthy car ride back to Denver from Kansas City I gave some thought to the edges of the things being explored in my writing. I’m getting to the bleeding edge of a lot of different academic work. Writing is occurring often at that edge, but I’m not taking the time to put it into an academic paper format for submission. While I don’t wholesale believe in that type of writing for every purpose, it probably is something that deserves an investment of my time and energy going forward.

I’m learning how to use the online site Overleaf as a LaTeX editor. A lot of people ask questions online about the best LaTeX editor for beginners. Over the years I have become very skilled at using Microsoft Word to produce manuscripts and it has worked just fine. Millions of people use it daily. Right now I’m writing out of a Google Docs file with a .DOCX extension. Working out of a LaTeX editor is not something that I really ever do. Either I have to learn how to write in an editor that supports that format or I have to take the time at the end of the journey to convert everything over to that format. Some people have found ways to edit LaTeX documents in Google Docs and it seems that it might be possible. Instead of messing around with that type of effort I’m going to just go all in with Overleaf and see what happens. Today will be the day that starts and I’m hopeful it will be a fun adventure. Learning how to modify and work with LaTeX formatting is not really something that I want to invest my time and energy into, but it seems like something that will end up paying off in the end. 

It should be possible to take my research note on open software MLOps repositories shared on GitHub and get everything converted over to LaTeX using Overleaf. I found an arXiv style template that will serve as a basis for the final output. It should be a fun little adventure in the fine arts of typesetting. Right at the start it is clear that having the source and the recompiled output on split sides of the screen is radically different from what I normally handle as a workflow. Right now I’m basically writing in a print preview mode that shows me pretty much what will happen live within the document and what will be sent to the printer or to a PDF for that matter. I’m not sold on the idea that you need some type of academic typesetting to gatekeep the publishing world as a technological barrier to entry at the port of academic freedom.
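
For readers who have not worked in this kind of editor, a minimal sketch of what sits on the source side of the screen might look like the following. The title, author, and section text are placeholders, and the plain article class is an assumption standing in for the arXiv style template, which would supply its own preamble.

```latex
% Minimal sketch; the plain article class is an assumption,
% not the arXiv style template mentioned above.
\documentclass{article}
\usepackage{amsmath} % common math environments

\title{Placeholder Title}
\author{Placeholder Author}
\date{}

\begin{document}
\maketitle

\begin{abstract}
Placeholder abstract text.
\end{abstract}

\section{Introduction}
Body text is written as plain source and only becomes a typeset page after a recompile.

\end{document}
```

Nothing in the source looks like the finished page, which is the main difference from writing in a word processor.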

Considering equations in ML papers

During the course of the 4th of July I got a chance to read a few PDFs of papers. Being able to write within the academic tone that papers usually have is a skill. Sometimes the papers include great scientifically based research, but are difficult to follow due to being poorly written. In the machine learning space this is one of those things that happens and could be compounded by the mathematics the author is trying to share. Within the abstract and introduction things will start out on the right academic footing, but then as the mathematics starts to get introduced things will veer off into the wild unknown. Most of the mathematics that gets shared within the machine learning space is not a provable theorem or something that you can easily break down and check. Every single time somebody starts to walk me through a series of equations in a paper I start to evaluate the work. Most of the time you cannot check the actual output given that the equations in the paper are implemented as code. Generally, that code is not shared so you cannot work your way backward from the code to the paper or the other way around to figure out what the author did for the actual delivery of the mathematics.

The other piece of the puzzle that often worries me is that the equation presented in the paper is theoretical as an implementation and the author is using an algorithm built into software that was already written. In that case the author did not implement the mathematics within the code, and the paper probably does not present a perfect match between equation and implementation. Being able to run code as a part of a software package and being able to work through the equation using LaTeX or some other package to articulate it within an editor are very different things. I would have to work it out with pen and paper and then bring it over to the paper after the fact. Generally, I would be working with a known transform or algorithm within the machine learning space. It would be unlikely that I would be advancing the equations denoting the mathematical machinations beyond the original efforts. Within the coded implementation I might do something in an applied way that could ultimately be represented as a novel piece of mathematics. However, most researchers would be better off presenting the code vs. trying to transform the code back into a series of mathematical representations within an academic paper.

It might very well be a good topic for a research paper to do an analysis of the equations presented in mathematical form in the top 100 machine learning papers by number of citations. It might be interesting to see the overlap in the equations across the papers. Alternatively, it might be very illuminating to see if no overlap exists, given that it’s possible that, outside of a shared statistical foundation, the researchers are not working from a shared mathematical base. A book on the mathematics of machine learning would be interesting to read for sure, assuming it built from the basics to the more complex corners of the academy of knowledge surrounding the topic.

Keeping writing a priority

Right now my focus is on working three different pieces of content to completion. 

  1. On Machine Learning – this book needs some work to update the footnotes after it comes back from the editor
  2. Considering product choices – this future paper needs to go from a talk-based slideshow (PowerPoint) to a paper
  3. I need to finish the slides for my “The next 5 years of ML in the healthcare space” talk

Each one of these needs a different level of care to complete the effort. One of the things that I need to focus on continuing throughout 2022 is keeping at least 3 writing projects open. This obviously does not include my daily writing efforts or the one-off projects that get sparked from the flames of imagination. This list of things that I’m working on will continue to be a living list that will undergo change. From here on out I’m just going to write about my projects. Taking that course of action will give me more content to write about and help focus my thoughts on the task at hand. Keeping a list of open writing projects helps me take my time on Saturday and Sunday morning and focus it on something more academic. One of the things that is very important to me is to start turning more of my focus to writing academic papers and manuscripts moving forward.

Papers for the sake of papers

More papers are getting published than any human could possibly read. Those publications are getting stacked up across many different fields and in the case of machine learning the sheer volume of content is staggering. You could try to only focus on a specific journal or two, but some of the most cutting edge research barely goes into the journal system anymore. A lot of it is just sort of pushed out online and those cutting edge researchers are on to the next project. It feels like a vicious cycle of papers for the sake of papers. My efforts to communicate and share my thoughts are generally focused on the medium I’m using and having some reason to share it with people. Within that framework, the necessity to communicate something is hopefully a better line in the sand for what should end up in a paper. We may hit an inflection point where only the top researchers in a field are able to pull together references and share content in a way that is widely read and dispersed. It would be a method of gatekeeping by sustained successful communication, but this could create a type of bubble around a top set of researchers and it could very well obscure the future edge of technology.

This is a topic that I’m really concerned about obviously. I have spent a good portion of my morning thinking about the future of academic research and the fragile current nature of the broader academy of academic thought. Intergenerational equity within the academy is about the effective storage and sharing of knowledge across the shoulders of giants as the intersection of technology and modernity occurs. Solutions to that quandary are probably beyond any single weblog post or thinking session. It will take a collective action within the academy to rebalance the means of communication toward something new. Somebody within a major field will need to hold some type of conference, lead a nationwide chautauqua, or create an institute to begin that process. Ultimately the system of introducing knowledge in any academic discipline involves lectures where a professor reduces a mountain of content into a presentable set of mapped coursework. That process sometimes ends up in books being published and other times a few of those textbooks become the standard across a discipline. Even the best ones either evolve over time or are replaced by the next set. That is a natural part of communicating the essence of an ever growing mountain of knowledge.

I keep thinking that maybe every discipline will end up with a sort of encyclopedia of knowledge for that core area of exploration. Just like people built out giant tomes of knowledge to share content when print was the primary medium of communication, some type of modern encyclopedia for a field could provide a foundation for beginning to understand a vast accumulation of knowledge within a field. You have to have some way of opening the door to people wanting to learn about the content, but most of them cannot start at the end of the stream of knowledge by reading the latest work by the foremost experts in the field. They need some type of foundational knowledge to be able to understand and consider that work at the bleeding edge of what is possible. In some of the sciences reading the mathematics presented on the page alone requires a certain amount of knowledge before it could be comprehended. My abilities in mathematics are decent, but occasionally when reading a machine learning paper the mathematics on a page are daunting and it takes me a bit to figure out exactly what the researcher is trying to communicate to me. Long strings of math are not annotated and commented like code to help people along the way of reading them from start to finish.
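
As a rough sketch of what annotated mathematics could look like, here is a generic gradient descent update written in LaTeX with each term labeled. The equation is a standard textbook form chosen purely for illustration and is not taken from any particular paper; the symbol names are assumptions, and the labels rely on the amsmath package for \text.

```latex
% Illustrative only: a generic gradient descent update, not from any specific paper.
% Requires \usepackage{amsmath} in the preamble for \text.
\begin{equation}
  \theta_{t+1}
  = \underbrace{\theta_t}_{\text{current parameters}}
  - \underbrace{\eta}_{\text{learning rate}}
    \,\underbrace{\nabla_{\theta} L(\theta_t)}_{\text{gradient of the loss}}
\end{equation}
```

The source comments play the role that code comments do for anyone reading the LaTeX, while the labels under each brace carry the annotation into the typeset page.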

Thinking about research schedules

Some of my thoughts have drifted toward the productivity that having due dates from classes provided me over the years. Maybe I need to start thinking about turning the cycle of writing a paper into a scheduled thing like a class. A few different examples exist related to creating timelines for writing a research paper. Waiting for the paper to organically develop is not really working at the moment. It may very well be time to start working toward a new writing plan that involves a research schedule just like starting a class and working toward turning in a research paper at the end. I’m going to start working toward an August 16, 2021 start date for a planned cycle of research. 

I sort of started my effort by going out to consider literature reviews: