Over the 4th of July I got a chance to read a few PDFs of papers. Writing in the academic tone that papers usually have is a skill. Sometimes a paper includes great, scientifically grounded research but is difficult to follow because it is poorly written. In the machine learning space this is one of those things that happens, and it can be compounded by the mathematics the author is trying to share. The abstract and introduction start out on the right academic footing, but as the mathematics gets introduced things veer off into the wild unknown. Most of the mathematics shared within the machine learning space is not a provable theorem or something you can easily break down and check. Every single time somebody starts to walk me through a series of equations in a paper I start to evaluate the work. Most of the time you cannot check the actual output, because the equations in the paper were implemented as code. Generally, that code is not shared, so you cannot work backward from the code to the paper, or forward from the paper to the code, to figure out what the author actually did to deliver the mathematics.
The other piece of the puzzle that often worries me is that the equation presented in the paper is theoretical, while the actual implementation uses an algorithm built into software that was already written. In that case the author did not implement the mathematics in the code, and the paper probably does not give a perfect reflection of equation and implementation. Being able to run code that is part of a software package and being able to work through the equation and typeset it in LaTeX or some other package are very different things. I would have to work it out with pen and paper and then bring it over to the paper after the fact. Generally, I would be working with a known transform or algorithm within the machine learning space. It would be unlikely that I would be advancing the equations beyond the original efforts. Within the coded implementation I might do something in an applied way that could ultimately be represented as a novel piece of mathematics. However, most researchers would be better off presenting the code than trying to transform the code back into a series of mathematical representations within an academic paper.
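A minimal sketch of how that gap between equation and implementation can show up. It assumes a hypothetical paper that states variance with a 1/N factor while the code calls a library routine. The function name `variance_as_written` and the data are made up for illustration; `statistics.variance` and `statistics.pvariance` come from Python's standard library.

```python
import statistics


def variance_as_written(xs):
    """Variance as a hypothetical paper might state it: (1/N) * sum((x - mean)^2)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


# Illustrative data only; not taken from any paper.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

paper_value = variance_as_written(data)
# statistics.variance is the sample variance: it divides by N - 1, not N.
library_default = statistics.variance(data)
# statistics.pvariance is the population variance: it divides by N.
library_population = statistics.pvariance(data)

print("equation as written:", paper_value)
print("statistics.variance:", library_default)
print("statistics.pvariance:", library_population)
```

Both library calls are reasonable choices, but only one of them matches the equation as written, and a reader without the code has no way to tell which one was used.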
It might very well be a good topic for a research paper to analyze the equations presented in mathematical form in the top 100 machine learning papers by number of citations. It would be interesting to see how much the equations overlap across the papers. Alternatively, it might be very illuminating if no overlap exists, since it is possible that, outside of a shared statistical foundation, the researchers are not working from a shared mathematical base. A book on the mathematics of machine learning would be interesting to read for sure, assuming it built from the basics to the more complex corners of the academy of knowledge surrounding the topic.