One hundred years ago, the best way to stay on top of science was to hang out in the Bodleian Library in Oxford. Now there's Litmaps.
Everyone could do with more science in their life. Imagine a world where doctors knew all the latest medical discoveries, entrepreneurs had complete knowledge of economics, and humans knew everything there was to know about health and productivity.
Making scientific research accessible is a big deal. Accessibility has always been a hard problem, but the existing tools and processes are falling further behind the modern context.
In 1900, the best way to stay on top of science was to hang out in the Bodleian Library in Oxford. When you wanted to understand a topic, a librarian would point you to a physical document or collection. If you wanted to follow up any references from that document, you would then return to the librarian and the circle of life would repeat.
The modern Bodleian Library is the internet, the modern document is digital, and the modern librarian is Google Scholar. But aside from these details, the process is essentially the same.
But this process is not scaling well. The volume of scientific papers now runs into the hundreds of millions, and each additional year brings a further one million papers to PubMed alone, a rate that is itself growing by 8-9% per year. Finding papers on a topic is not the problem. The problem is efficiently navigating between papers and understanding the scientific literature in aggregate.
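To get a feel for what 8-9% annual growth means, a quick back-of-the-envelope calculation shows the literature roughly doubling every eight to nine years:

```python
import math

# A quantity growing at rate r per year doubles every ln(2) / ln(1 + r) years.
for r in (0.08, 0.09):
    doubling = math.log(2) / math.log(1 + r)
    print(f"{r:.0%} growth -> doubles every {doubling:.1f} years")
# 8% growth -> doubles every 9.0 years
# 9% growth -> doubles every 8.0 years
```

Every decade or so, as much literature is published as in all of history before it.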
Suppose I want to understand this deep learning thing I keep hearing about. If I type "deep learning" into my preferred academic search engine I get something like this:
What do I learn from these results?
However, I'm missing some important context:
Another problem: all the top results are titled exactly "deep learning". How can I decide which to read first? (In fact, Google Scholar is confused here: the first entry is actually a review of the book "Deep Learning" by Goodfellow et al.)
Let's say I choose to read the third entry. While reading, I notice that reference 12 seems interesting. To learn more, I scroll down to the reference list and scan for number 12.
Perhaps I can decide whether the paper is worth reading based on the title, year, and authors. More likely, I will have to at least read the abstract. In that case I have to copy-paste the title into Google Scholar and find a link to the PDF or HTML (assuming it isn't behind a paywall).
Once I've found some papers I would like to read, I will typically either download the pdfs and organise them in a new directory, or print them out to read the physical copy. As I read through them, I will often want to take notes, either with document annotations, or in a separate file.
Often I need to turn my reading into a bibliography, either because I'm writing a literature review or because I want to add sources for some factual writing. If I'm using a reference manager, then creating a bibliography will be fairly straightforward. If not, then I'll have to go back to Google Scholar for each paper, and copy paste the citation details into a document or BibTeX file.
I'm sure you'll agree: this process could be improved.
Litmaps is motivated by the theory that the status quo approach to reading scientific literature (as described above) has two major problems.
The first problem was that steps which could be automated with software had to be done manually. These include:
The second problem was a lack of context. The user had little sense of how any specific paper fitted into the scientific literature as a whole. With visualisations, the following context could easily be communicated to the user:
We built the Litmaps app to fix these problems. Litmaps uses a database of over a billion citation connections, and a sleek modern web interface to make navigating and understanding scientific literature not just easy but actually enjoyable.
When we run our "deep learning" search on Litmaps, we get the following:
Unlike Google Scholar, we can quickly learn from looking at this page that:
We thus have the context we were missing from the Google Scholar results. From prior knowledge, I have some guesses about why the literature looks like this:
We are thus able to quickly form a mental landscape of the literature. As we learn more about specific papers, we can incorporate this knowledge into a greater schema.
By hovering over each circle in the "literature map" visualisation, we can quickly find additional details:
By hovering over a few circles in this manner, we can quickly build up an intuition for the distribution of citations. LeCun 2015 is an outlier with tens of thousands of citations. Most of the other large-ish circles are in the hundreds of citations, and most of the small circles have ten or fewer citations.
We can click on LeCun 2015 to bring up even more details:
The panel on the left displays the paper's abstract, its reference list, and a list of other papers which cite it. On the right we have a visualisation of the papers which cite this paper.
From this panel we can easily navigate to
This automates away the navigation problems we encountered when using Google Scholar.
When we find a paper which is important for our task, we can add it to our project by clicking the "add" button:
This automates the process of keeping track of important papers.
Once we're ready to do something with our project, we can export a bibliography as text or BibTeX, or export an image of the literature map as a PNG or PDF.
We're still not done making things more efficient. To recap, when you're trying to get a foothold on a new field, a common procedure is:
To save you the trouble of going through all the references yourself, we've created network analysis tools for Litmaps which do it for you. After you've added some papers to your project, you can activate the "suggestions radar". This scans through all the citations and references (collectively referred to as "citation connections") of all the papers in your project. A paper with multiple citation connections to your project is a candidate recommendation, and the more citation connections it has, the higher it is ranked. This effectively triages the papers navigable from your current set, so you can prioritise those which are "deeply" connected to your research.
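The core idea is just counting connections across the project boundary. Here is a minimal sketch over a toy citation graph (the data, the `suggest` function, and its tie-breaking are my illustration, not Litmaps' actual database or ranking algorithm):

```python
from collections import Counter

# Hypothetical citation graph: paper id -> set of papers it references.
references = {
    "A": {"X", "Y"},
    "B": {"X", "Z"},
    "C": {"Y"},
    "X": set(), "Y": set(), "Z": {"X"},
}

def suggest(project, references):
    """Rank papers outside `project` by citation connections to it.

    A connection counts whether a project paper cites the candidate
    or the candidate cites a project paper.
    """
    counts = Counter()
    for paper, refs in references.items():
        for ref in refs:
            if paper in project and ref not in project:
                counts[ref] += 1    # project paper cites candidate
            elif paper not in project and ref in project:
                counts[paper] += 1  # candidate cites project paper
    return counts.most_common()

print(suggest({"A", "B"}, references))
# [('X', 2), ('Y', 1), ('Z', 1)]
```

With "A" and "B" in the project, paper "X" is cited by both and tops the list, exactly the kind of deeply connected paper the radar surfaces first.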
Litmaps also lets you combine keyword search with network analysis. The "relevance search" tool displays search results prioritised by the number of citation connections to the current project.
This can be very useful for finding papers in the intersection of disciplines. For example, if we want to find papers on using Bayesian statistics for forecasting, we can create a project with several papers on forecasting, then do a "relevance search" for the keyword "Bayesian". This will show us papers with the keyword "Bayesian" which are connected to the forecasting papers.
Relevance search is also useful for disambiguation. A keyword may have different meanings in different disciplines. By populating a project with papers from the target discipline, running a relevance search with the target keyword will only return results which have some connection to the discipline.
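Conceptually, relevance search is a keyword filter whose results are ordered by connection count. A minimal sketch of the forecasting example, using hypothetical data and my own `relevance_search` function (not Litmaps' actual implementation):

```python
# Hypothetical papers (id -> title) and reference lists, illustrative only.
papers = {
    "P1": "Bayesian forecasting of retail demand",
    "P2": "Bayesian inference in phylogenetics",
    "F1": "Forecasting methods: a survey",
    "F2": "Probabilistic forecasting in practice",
}
references = {
    "P1": {"F1", "F2"},  # cites both project papers
    "P2": set(),         # no connection to the project
    "F1": set(),
    "F2": {"F1"},
}
project = {"F1", "F2"}   # a project seeded with forecasting papers

def relevance_search(keyword, project, papers, references):
    """Keyword matches outside the project, ranked by citation
    connections (cites, or is cited by, a project paper)."""
    def connections(p):
        cites = len(references.get(p, set()) & project)
        cited_by = sum(p in references.get(q, set()) for q in project)
        return cites + cited_by
    hits = [p for p, title in papers.items()
            if keyword.lower() in title.lower() and p not in project]
    return sorted(hits, key=connections, reverse=True)

print(relevance_search("bayesian", project, papers, references))
# ['P1', 'P2']: P1 has two connections to the project, P2 has none
```

The Bayesian phylogenetics paper still matches the keyword, but the Bayesian forecasting paper rises to the top because of its connections to the project.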
Network analysis of course does have its downsides. The biggest problem is the Matthew Effect, or the propensity for the "rich" to get "richer". In the case of scientific literature, this means that, other things being equal, the most citations will go to the papers which already have many citations.
Stigler's Law of Eponymy illustrates how pernicious the Matthew Effect can be in science. Stigler's Law states that no discovery is named after its original discoverer. Instead, discoveries tend to be named after famous scientists, either because many people first encounter the discovery only after a famous person starts talking about it, or just because the story works better when a discovery was made by a famous person. Examples of Stigler's Law include Hubble's law, the Pythagorean theorem, and Stigler's Law itself.
Clearly, assigning credit according to existing reputation rather than originality is unfair and creates perverse incentives. As we continue to build out Litmaps, we will be thinking carefully about how to combat the Matthew Effect. By making the citation network more visible to researchers, we hope that contributions can be more accurately traced to those who really deserve credit.
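The rich-get-richer dynamic is often modelled as preferential attachment: each new paper cites existing papers with probability proportional to their current citation counts. A toy simulation (my illustration, unrelated to Litmaps' code) shows how quickly early leads snowball:

```python
import random

random.seed(0)

# Preferential attachment: each new paper cites one existing paper,
# chosen with probability proportional to citations received so far
# (plus 1, so uncited papers can still be picked).
citations = [0] * 10        # start with 10 otherwise-identical papers
for _ in range(1000):       # 1000 new papers arrive, one citation each
    weights = [c + 1 for c in citations]
    chosen = random.choices(range(len(citations)), weights=weights)[0]
    citations[chosen] += 1

print(sorted(citations, reverse=True))
```

Even though the ten papers start identical, small random early advantages compound, and a handful end up with most of the citations.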
Exploring scientific literature version 1.0 was a librarian helping you find physical papers in a library and using reference lists to identify papers worth reading. Version 2.0 was using an academic search engine to find digital papers, but still using reference lists to identify papers worth reading. Version 3.0 will be fluidly moving between digital papers in an interactive citation network, with visual context cues, and algorithms guiding you to papers worth reading. At Litmaps we are building this future.
Early Access now available. Try Litmaps and Discover Science Faster.