Cognitive neuroscientist Christopher Madan says open-access data, or data that is freely shared among researchers to use in their studies, can not only save time and money but also enable scientists to "skip straight to doing analysis and then drawing conclusions from it" if the datasets they need already exist. Madan works as an assistant professor at the University of Nottingham in England, where he studies the impact of aging on the brain, focusing specifically on memory. He started using open-access data in his work about three years ago.
Given the stiff competition for funding, scientists like Madan are turning to open-access data as a way to expedite their own research as well as the work of others in the field. Madan says open-access data offers several benefits, chief among them large and diverse datasets that might otherwise be difficult to obtain independently. This pre-existing data could help researchers judge whether the results of their studies generalize to larger populations, he says. Making research data freely available, however, isn't a straightforward process. In some cases, especially when researchers use patient data in studies, they must take steps to anonymize it, he says, adding that "we also need to have balance, so we don't become too dependent on specific open datasets."
We spoke with Madan about how he uses open-access data in his research.
Ambika Kandasamy, Shareable: I attended a talk you gave at the MIT Media Lab in August about some of the benefits of sharing research material such as MRI datasets with other researchers. What compelled you to move towards this open and shared approach?
Christopher Madan: To some degree, I'm more a consumer of open data than adding to it. The main plus is that the data is already there. Instead of, I have an idea and then I have to acquire the data — both applying for grants or somehow getting the money side sorted and then having a research assistant to put in the actual time to get them — people to come in and be scanned. Scanners are kind of expensive. All of this would take, on the optimistic side, I'd say several months or more into years, if I wanted to get a sample size of like three, four hundred people.
But for the sake of just looking at age, datasets exist. It can take a few minutes to download, maybe into hours depending on which one and how much other data I have to sort through to organize it into a way that is more how I want the data organized to be analyzed. It's still in the scale of hours and maybe days versus months to years. Then the analysis on that going forward is the same at that point.
In an article in the Frontiers in Human Neuroscience journal, you wrote that "open-access data can allow for access to populations that may otherwise be unfeasible to recruit — such as middle-age adults, patients, and individuals from other geographic regions." Could you elaborate on that?
The maybe more surprising one of those is the middle-age adults. People in their 30s to 50s could generally have jobs and families and are busy, so it's harder to get them to be in research studies. If we're interested in aging, getting young adults that are effectively university student age, they're relatively easy to be recruited in university studies because they're walking down the halls of the same places that the research is done. Older adults, to some degree, can be easier to recruit. … But middle-age adults have a lot less flexibility of their time. Even if they're interested, they have a lot of other commitments that they have to balance. It's just harder to get them into research studies. Now, it's not that they're impossible to get. It's just effectively lower odds for that demographic. If people have already spent the effort of trying to get them in, then we should take advantage of that data and not just use it for one study and that's it, but answer multiple research questions and try to get more out of the same data that's already been collected.
In the article, you also mentioned that you keep a list of open-access datasets of structural MRIs on GitHub. Have other researchers contributed to this list?
Yes, they have. I initially made a list of basically just stuff that I knew. One morning, I was like, "maybe I should do this." I was keeping track of things, but every so often, new datasets get shared. How much can you keep in your head or keep the PDFs related to these in a folder? It's not that great of an organization. So I thought, maybe I'll make a list where I'll say the name — some of them have shorter abbreviations, so a spelled out version, a link to where that data actually is, a link to the paper that kind of describes it, some notes about what kind of MRIs are with it or how many individuals are included in it, the demographics — is it all young adults or old adults — that sort of information. I basically just made a list of it and put that online. Other people found it useful. Some people needed parts of that but not others, or generally didn't think about open-access data as much until that point. Here’s a list of them. You can look up what’s there and what might be useful to you and take advantage of it.
Since then, some that I basically didn't include, that I didn't know of or didn't think of or whichever, that other people are involved in, they requested to add themselves to the list, and I approved that. Other ones, people that aren't just involved in the data collection of it, but knew of that weren't in the list, contributed to it. It's grown a bit since then, particularly I'll say from other people's additions, which also shows other people are looking at it and making a note of it. At least you can have people favorite it for later. I think it's about 2,000 or so people have. I think maybe eight, nine people have actively added new things to it, so it's growing a bit. Again, it is a bit of a specialized topic and resource, but other people have found it useful, so that does kind of show that it's not just a list that I made for myself, but other people have found some benefit in this as well.
How could this kind of open-access data accelerate the process of scientific discovery?
I think the main thing is just after having some idea about what datasets exist — as soon as you have some sort of research idea and you can match it onto something of that sort — you can just download the data. In some cases you have to do an application, so maybe there's a week or something when someone needs to approve that you're using this for valid purposes, but you can skip straight to doing analysis and then drawing conclusions from it and writing up a research paper if it went somewhere, rather than having things be drawn out for probably several years.
From your own experience, have you noticed any trends over the years in data sharing among researchers?
There's definitely more open data now than there used to be. That's great, both in terms of more people using it, but also just more people sharing whatever data they've been collecting anyway. From more personal analysis, talks with researchers that have not shared data yet, but have been thinking about it for data they've already been collecting — can they share it because in terms of consent of what the initial participants gave? Would that include sharing of their data when that wasn't explicitly asked? Even if that doesn't and they're working with more medical kind of patient data, then you can still plan forward and say, "okay, what do we need to add?" A couple of extra sentences to the consent form to allow for this at this point forward even if we can't do it retrospectively. People are thinking about it even beyond just what's kind of more apparent in terms of what data is actually available today — little more behind the scenes. The field is shifting in that direction. It'll just continue along that trajectory.
This Q&A has been edited for length and clarity. Photo of Madan [top] by Dan Lurie and [left] by Yang Liu. This is part of Shareable's series on the open science movement.