In the 11th episode of the Common Stacks podcast, Heather chatted with Josh Nicholson about his product, Scite.ai, a deep learning platform that evaluates the reliability of scientific claims by citation analysis.
Scite is a Library Lever vendor partner, so if you are interested in learning more about them, or adding them to your offering, please get in touch with us, and we can have a conversation about how we can help.
Have a comment or show idea? Want to leave some kudos for a friend or colleague? Leave us a voicemail and we'll feature your shoutout in a future episode!
Rough Transcript for Episode 11: Josh Nicholson at Scite.ai
Hello, and welcome to the Common Stacks podcast. This is the show that brings together professionals from within the library world, as well as interesting experts from other professions to engage in discussions around the issues affecting libraries and looking at the ways in which libraries are dispelling the myth of, well, this is how it's always been done.
I'm your host, Heather Teysko. And this is episode 11 with Josh Nicholson. Josh is the co-founder and CEO of Scite. Scite is a deep learning platform that allows anyone to see how a scientific paper has been cited: not just how many times, but specifically also whether it has been supported or contradicted. Previously, he was the founder and CEO of the Winnower and CEO of Authorea, acquired in 2018 by Wiley, two companies aimed at improving how scientists publish and collaborate. He holds a PhD in cell biology from Virginia Tech, where his research focused on the effects of aneuploidy on chromosome segregation in cancer.
Heather:
Before we hop into the interview, though, I want to invite you to our launch party. Library Lever is getting ready to launch, and remember, we're a new kind of buying club for library procurement. We're having our launch event on June 15th at 2:00 PM. We have some great speakers, including New York Times bestselling author Jeff Goins, discussing how to manage change with creativity, and the Genius Academy Founding President Rocco Shields. Also, our own Rob Karen will be giving some brief info about Library Lever, our mission, and how we can help libraries build sustainable, equitable collections. Learn more and RSVP at LibraryLever.com/LLlaunch
Heather:
So Josh, why Scite? What problem is it solving for researchers and for libraries?
Josh:
The short answer to that is that I think Scite is helping with information overload and discovery and understanding. The longer, you know, answer is that there's massive amounts of information. It's hard to tell, you know, what is right? What is reproducible? Who is leading, who is doing reliable research? And Scite helps organize all that information.
Heather:
And so what Scite does is it lets people see what articles are being used in other places, what citations are being used. And can you explain a little bit more in, in detail?
Josh:
So citation indices have been around for a long time. And you've long been able to see what articles reference one another, but to see exactly how they reference one another and what they say about each other is very difficult, right? If you want to look at a paper that has been cited a hundred times, to see what those hundred citations say and where those citations exist in the citing papers would require you to open, you know, a hundred separate papers and to interpret it and to understand it. And so we have made that much easier by showing the citation context taken from the full text of the citing article. So you can see not just, okay, these two papers cite one another, but you can read directly how, you know, one is citing the other. What does it say? Where does it come from in that citing paper?
Heather:
Gotcha. So this is being used by students doing research. Can you give me some use cases?
Josh:
Yeah, so, you know, we always kind of thought students would use Scite, but I didn't think they would use it as early as they've been using it, down to the undergraduate level. And so that caught us by a bit of surprise. I think students primarily use Scite to help with understanding, right? So they've been tasked with writing an essay on this topic or this specific paper or this book. They can look those things up in Scite, and then they get to see what experts have said about these topics, these chapters, these articles by looking at the citation context. And so they get to understand by looking at all these different interpretations and discussions from the literature. So from experts. So not just from other student essays, which would be valuable in itself, but literally from peer-reviewed publications. And so I think it helps not around looking at reproducibility, which a student might not care so much about, but really around, you know, understanding research better, and that helping them to inform their own writing and their own research if they're doing it.
Heather:
Gotcha. It gives a lot more context to it then.
Josh:
Exactly. Yeah. I mean, I liken it to wearing glasses that give you kind of like x-ray vision for scientific articles. So you see a lot more, which is there, and maybe you could look at in more meticulous and invasive ways. But we, you know, make it easy to see it.
Heather:
Gotcha. Why did you start Scite?
Josh:
So I initially didn't want to start Scite. I wrote a paper suggesting that Thomson Reuters or Elsevier or NCBI should start the idea behind Scite. And this was in response to growing concerns around reproducibility. So around 2015, or actually maybe 2012, there were, you know, some big papers that came out of large pharma looking at not just cancer research, but a lot of different biomedical indications. And what these large pharma wanted to do over the course of, you know, many years was to reproduce the findings from the literature in house so that they could develop drugs based on it.
Heather:
And you were studying this at the time you were doing your PhD in this, right?
Josh:
I was looking at cancer research. And so I was doing a lot of studying various cancer cell lines. And what these large pharma wanted to do was to build off of the literature. Right. So not to read the studies, but to use it to develop, you know, possible therapies for different diseases and things like that. And what they reported was that, you know, when they went to reproduce, so to first validate these claims from the papers in house, so spending a lot of money and a lot of time and a lot of energy and effort, most of the time they couldn't do that. And so that was pretty surprising, because we're not talking about kind of studies that were not, you know... they were major studies, right?
Josh:
Like, so from top labs, if you will. That got my attention, got the attention of the scientific community, and even got the attention, you know, really of the world. And it was, you know, documented in the New York Times and the Washington Post and things like this, because if most findings are not strong enough to build off of, you know, we're wasting a lot of time, energy, people's lives. You know, people go into clinical trials, things like this, they take drugs, there's tons of money behind that. You know, the NIH alone disburses about 40 billion per year. And so even improving that modestly, I think, would have very large, you know, effects. And so I knew within my field certain papers that kind of fit this problem or this criteria, in that they were, you know, published in arguably one of the world's best journals, came from top universities like MIT, had been highly cited, so, you know, 300 plus citations. But I knew specific examples where papers, you know, like this had been challenged by others and shown to be, you know, not reproducible by, you know, three or four different labs.
Josh:
And so I wanted to or we wanted to surface that information from that, you know, avalanche of citations so that you could see here's this paper or the study that I'm interested in. Has it been tested and if it's been tested, has it been supported? Has it been challenged? Has it been used?
Heather:
That's really valuable, obviously. What's the feedback been from researchers and from students? How has it been received?
I can imagine some people might be upset about this. Some of those people whose articles are being challenged.
Josh:
So initially the idea was to introduce this reproducibility metric based off this, right, which we call the R factor. And we gave talks during, like, lab, you know, meetings and would go to different labs and say, hey, here's this idea. And people were resistant to it, because they said, oh, I can see myself being measured by this now for funding, for all these things. Yeah. And so, you know, I think we've evolved kind of our thinking, and it's not just that. Like, so we don't have this R factor for papers or for researchers; we dropped that. But I still think, you know, at the core of it, yes, people can see if a paper's been challenged more easily through Scite, or if it's been more supported. And so generally, and maybe this is a bit glib to say, you know, people either love us or hate us.
Josh:
And that really depends if we, like, you know, kind of feed their ego or challenge them. I think the interesting thing that has emerged from this, at least anecdotally for me, is that, you know, contrasting citations don't have to be all bad, right? If I'm a researcher and I've contrasted my previous work, I, as a reader, find that very valuable, to see that this researcher is not just putting out perfect, nice narratives. They're really, you know, documenting their process. They're changing their mind, they're changing, you know, what they're reporting based on new evidence. And so when I don't see that, or when researchers kind of come to us and say, no one's ever contrasted my work, you know, I tell 'em to think about that for a second, because that means you yourself have never, you know, built off the door. And I think, you know, that's really part of research: to have debates, to learn and to change, not to just put out trophies that stand the test of time that no one should ever, you know, challenge.
Heather:
Right? Yeah. If you're, if you're not developing your research over time, you're, you're obviously going to have things that are changing and that's what science does.
Josh:
Yeah. And I mean, so I think, you know, we reward people with highly cited papers and things like this, and, you know, even in our kind of paradigm would reward people with supported papers. I think it'd be interesting to get some of these researchers that are contrasting their previous work to tell the story behind this. Right. And so to really kind of make it more, you know, human and personable about how research is actually done. Right. We have this idea of how the research is done, and really it's, you know, quite far from the truth of, like, a hypothesis, I went through the scientific method one by one, right. It's really kind of like, I have a hunch, you know, I go through all this literature, I try to find things that support or challenge me, and then, you know, kind of bobble along one step at a time.
Heather:
Right. And then you get new information and you might have to go back and revisit things. And yeah, that would be really interesting to hear people share about the development of their work over time.
Heather:
So you said people either love you or hate you, and also more undergraduate students are using this than you had anticipated. Are you taking some of these things you're learning into account with, like, the roadmap of the product? Like, how is that changing things?
Josh:
Yeah, we really tell researchers to be brutally honest with us, because if they don't tell us what they're actually thinking, it's hard to improve upon that. And so, you know, this has changed from small things like the color that we used for a contrasting citation. It used to be red. It used to have this X icon, which was maybe too inflammatory. It's now blue and a question mark. And so that comes from, you know, user feedback, maybe angry user feedback. But then we've also, you know, kind of listened to users to say, okay, here's what they're trying to do with the tool. How can we make that easier? Right. And so really, it's just a matter of taking this vast, you know, kind of world of information and making it more understandable, more discoverable, more digestible, and going beyond just kind of search match on titles and abstracts, and then here's a list of titles that you have to go sift through, you know, still manually. So yeah, a lot of feedback, a lot of user research also, because I think, you know, while we have a lot of sophisticated machine learning and AI, getting the user interface right is arguably just as important and maybe just as hard, really, because if someone doesn't understand, you know, this classification, it doesn't matter if your classification is perfect, for example.
Heather:
Gotcha. And where all like, so I know you have like an extension, a Chrome extension - where, where all are you deployed? Where all are people using you?
Josh:
So our aim has, I would say broadened, or maybe become more ambitious than just reproducibility. And I think really we want to be the next generation of citations. And so, you know, that means not just being the standalone tool on the side, but working with publishers to display our quote unquote smart citations on the version of record.
So we're live on millions of articles at this point. So Wiley was our first one; there's hundreds of journals that display our smart citations on Wiley. There's the Royal Society, Proceedings of the National Academy of Sciences. And then there's more rolling out. And we primarily show citations by types. So does it provide supporting evidence, contrasting evidence, or does it mention it? But we've also made it possible to show citations by section. So do these citations come from the introduction section? Do they come from the discussion section? Do they come from the methods section, et cetera? Mostly because I think, you know, there's some publishers that are hesitant to show kind of contrasting citations on their article, if you will, because this is what they're trying to work with.
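Editor's aside: the two groupings Josh describes here, by citation type and by section of the citing paper, can be sketched roughly in Python. This is only an illustrative sketch under stated assumptions. The `CitationStatement` record, its field names, the `tally` helper, and the sample data are all hypothetical and are not Scite's data model or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationStatement:
    """One in-text citation of a paper (hypothetical structure, for illustration only)."""
    citing_paper: str
    citation_type: str  # "supporting", "contrasting", or "mentioning"
    section: str        # section of the citing paper, e.g. "introduction"


def tally(statements, field):
    """Count citation statements grouped by the named attribute."""
    return Counter(getattr(s, field) for s in statements)


# Made-up example records; not real data.
statements = [
    CitationStatement("paper-A", "supporting", "discussion"),
    CitationStatement("paper-B", "mentioning", "introduction"),
    CitationStatement("paper-C", "contrasting", "discussion"),
    CitationStatement("paper-D", "mentioning", "methods"),
    CitationStatement("paper-E", "mentioning", "introduction"),
]

by_type = tally(statements, "citation_type")   # view by citation type
by_section = tally(statements, "section")      # view by section of the citing paper

print(dict(by_type))
print(dict(by_section))
```

The same records feed both views, which is why a publisher that is hesitant about the by-type view could, in principle, show only the by-section one.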
Heather:
And then just personally, a personal question. This is not your first rodeo with this kind of thing. This is the second business you started, right? The third business you've been a CEO of. Yeah. How has this changed from the first one? The Winnower, right? Yeah. How are you growing as a CEO, and what are the challenges that you're finding with this that you didn't with earlier ones?
Josh:
Yeah, so, I mean, that's a good question. And so I think, you know, I've long been interested in kind of how research is done. So how is it published? How is it funded? How does peer review work? Because I think all that stuff, which is sometimes seen as separate or to the side of the actual research, is not, right? It's just as important, right? If you don't have money to do this experiment, you can't do this experiment. You never publish it. You know, maybe you never even discovered it. Right. And so all these things are critical. And I think looking at those systems of how they work can have big, you know, impacts on everyone's research, not just on my own. And so during the last year of my PhD, or always kind of on the side, I was doing these kind of back-of-the-napkin, you know, studies really, so looking at private funding and public funding.
Josh:
Are they going to the same researchers? Is it more diversified across these different pools? Is peer review, you know, limiting innovative ideas, things like this. Ultimately, you know, I was writing papers around this. I decided maybe I should try to do something beyond just kind of describe the problems and suggest solutions for other people to do. And so very naively and idealistically, I started the Winnower. And so I didn't know anything about starting a company, about raising money; really kind of fell into that. And I think that's maybe a good thing, because it's a challenging space to work in. Right. And so I was, you know, again, very idealistic and naive. I wanted to fix all of the things with publishing. So go against Elsevier and say, you're doing it wrong. Let me show you how to do it.
Josh:
And so the Winnower was really trying to fix or improve how we peer review articles. And so peer review, I would say, is quite valuable. It's not perfect. But it's also, you know, done behind closed doors. And even when it works to find kind of a major error, that article, you know, ultimately gets published somewhere else, and those peer reviews kind of are lost. And so it's also quite slow, right. You know, the average time from submission to publication is over nine months. And so, I mean, literally you can conceive a human and, you know, give birth quicker than you can, you know, a scientific idea and publish it. And so the idea was to publish things first and then winnow the good from the bad via open post-publication peer review. The challenge was, you know, people don't just come and magically leave reviews; you have to organize this.
Josh:
You have to kind of, you know, almost like really push them. And then they don't necessarily want to do it open. And so the Winnower pretty early on shifted to publishing gray literature. So things that, you know, traditional publishers were ignoring, and we gave them traditional identifiers, like a digital object identifier and permanent archival, to, say, a blog post, a student essay, a journal club, even, you know, Reddit Science AMAs; we formatted these as papers and, you know, published them. And so this worked because no one else was really doing that. And really, because it was kind of the prestigious form of that. So like, you know, we could say here, you can now get an Altmetric score and citations like this for things that are otherwise not counted, if you will. But again, you know, I was very kind of early.
Josh:
I didn't think about building this into something sustainable early enough or didn't think hard enough about it. And then it's hard to raise, you know, money and it's hard to do anything without money in this space. And so I've definitely learned kind of what not to do, you know, what to focus on. And I think that's been very beneficial for Scite and, and really, I kind of see 'em all as an extension of one another is trying to improve the tooling or the processes, you know, around research, not just, you know develop a single drug for, for kind of one thing.
Heather:
Right. And then you started Scite during the pandemic -
Josh:
So it was before it. So, yeah, I mean, it's interesting. So we wrote the idea back in 2015, kind of had worked on it on the side for years, really, like, you know, just going, here's this paper, how do we know how it's been cited? Can we tell if it's been supported or contrasted as humans? And so manually going through hundreds of papers, opening them, reading the in-text citation and saying, okay, this paper is indicating that they support it or that they refute it (we were using "refute" at that time) by reading it, right. And so we manually built up this kind of, like, data set to say, okay, we looked at a bunch of papers that we kind of know are maybe untrustworthy or have been challenged. And we think we can do this manually. It's just hugely time consuming. Right. It would take hours, if not days. And Selma's just like, I can't, I'm not going to look at a thousand papers.
Heather:
Right. You remind me, I have this image of like those in movies where somebody, when they finally solved the mystery and they go into the person's room and there's like the magazine articles with like the threads all over, like, to that, like, that's kind of what I, the, the vision I have in my head right now, as you would like these all like linking to each other,
Josh:
It's not so far off. I mean, it was like a spreadsheet. And I mean, I can vividly remember myself doing this in coffee shops, where it's like, okay, I drink a coffee and have a bagel and just open up these things, because it wasn't so hard. It was just very time consuming. Again, like, access is challenging. You'd have to get access to this article. You'd have to open it. You'd have to find it. You'd have to read it and say, okay, mentioning, do that again, you know, and that quickly adds up. And so I've been thinking about this and kind of working on Scite on the side since 2015, and then, you know, the previous two companies, the Winnower and Authorea, didn't fully work out. And, you know, as those were kind of, you know, having their conclusion, either joining, you know, a publisher or closing down, I started full time on Scite. And you know, really, I think that wasn't just the timing of, like, not working on the other ones. It was finding one of our co-founders who had enough technical know-how to kind of prototype this out. And I think as soon as I saw the prototype, it was like, okay, wow, there's something there. We just need to figure out how to do this more systematically and to grow it from a hacky prototype to something that works.
Heather:
Gotcha. Interesting. Okay. And then I'm going to ask you, we ask everybody what your library love story is, how you fell in love with libraries. It's just, like, the personal, as we're winding down with other kinds of questions.
Josh:
Yeah. I mean, I miss libraries, 'cause I haven't been in them in a while. I don't know how I fell in love with 'em. I think they've always been in, like, gorgeous buildings, and weirdly that makes a big difference. But I can remember going to, you know, the library at UC Berkeley as a kid. So I grew up in the, you know, the Berkeley Hills. I remember just kind of spending time going through the stacks, almost aimlessly, and finding things that were really cool and taking home books. You know, before that, as a child, we would grab as many children's books as they would let you check out, take 'em home and read them. And so I think, you know, I've long loved libraries. I think I've always had kind of a superficial view of what libraries do, though.
Josh:
I always thought, you love reading, you're a librarian, you can just read all day. And I think, you know, as I work with more librarians, clearly it's not just that simple, right? You're not just sitting in the library, reading everything you want all day <laugh>. But I do think, like, a lot of researchers to this day, you know, still don't utilize the library very efficiently or really understand it. Right. It's, okay, here's the library, they'll help me get access to things when I need access. And that's kind of my extent of it. And I think there's a lot more that libraries do and provide that, you know, is not fully appreciated by researchers or accessed by researchers. And I think that's a big challenge for libraries, to get over that.
Heather:
Interesting. Interesting. Okay. And then my final question, and then if you want to add anything else, please feel free. But my final question is like, how do people learn more about you?
Josh:
Yeah. So I mean, I, about me or about -
Heather:
About Scite.
Josh:
So I would say you could learn more about, you know, me by emailing me, and I'm happy to talk with people one on one. I think for Scite, you know, going to it and exploring it for yourself is probably the best way. So it's Scite.ai. And then letting us know what you would like to see, what is unclear, what doesn't work, what, you know, you think could have bad repercussions if this does become, you know, huge, all those things. And so, you know, we've had debates with bibliometricians around ranking and things like this, and we intentionally don't have ranking of universities on the system through some of that. And so we do really listen. You know, we don't have all the answers. We think this is something we want to do with the community as opposed to onto the community.
Josh:
And so I think, you know, these discussions, you know, demos, trials, feedback via email, tweets, all those things, you know, come to us as a team, and, you know, we put it out also to the community to have fruitful discussions. And so I think that's the best way to kind of just experience, you know, what I've just talked about, kind of for yourself, and then to engage in, you know, this is really a long dialogue, because to introduce a next generation of citations involves a lot of different stakeholders, right? It involves publishers adding it; it involves, you know, people acknowledging that is, you know, as these are that type of citation, et cetera. And so it's not just us that will do this. It's, you know, us as a community that I think will really kind of bring citations forward.
Heather:
Is there anything you want to add that I missed?
Josh:
No. I mean, I think what I would say is, you know, it's maybe, well, you kind of touched upon this a little bit, but I think it's underappreciated. So, you know, I think this space is challenging to build in, you know, as a founder, if you will, because, you know, say you are successful in raising a ton of money. Well, then the community says you've raised a ton of money and you're just trying to do something for money. Raising a ton of money in itself is extremely hard, though, because it's not, you know, vice versa. It's not this big community that has a ton of money. And so I think, you know, we need to find ways as vendors and librarians, I wouldn't say vendors or librarians, to kind of work together more. You know, it shouldn't just be, we're just library-led tools or we're just vendor tools.
Josh:
I think there needs to be more kind of collaboration amongst these two groups to develop things that work for everyone together. Right. And, you know, things that are sustainable, right. There's clearly an ideal world, but then there's also, you know, step back and kind of here's this ideal, and here's something that's practical from this ideal. And I think that takes cooperation with different stakeholders. And so I just kind of encourage us all to think about how we can work together, not just as, you know, buyer and seller, but really as partners, which I think is -
Heather:
Do you have like a like user groups or like librarians a way for them to give you feedback or anything like that?
Josh:
No, I think this is one thing we've long wanted to do, is kind of have some of these, you know, like a stakeholder group, right, an advisory board. It really is just, you know, we're a small team and this adds, you know, more to things. And so I think it's a matter of time before we have that. And, you know, again, maybe there's people listening to this that are interested in what we're doing and maybe want to kind of join that. And so, you know, maybe this is a good point to this, to target, because I think it would be really valuable. Yeah.
Heather:
Okay, great. Well, I'm out of questions and we've gone for 20 couple minutes here, so I appreciate your time in sharing your background and your journey here with me. It's really, really interesting, this whole, like you say, you geek out on all these citations and all that kind of stuff. Like, I don't, and it's so amazing to me when I find out things that are people's, like, whole worlds and I've never even thought about it before. Right? Yeah. It's amazing to hear your story, and thank you for sharing it. So I appreciate it.
Josh:
Yeah. Thank you so much for having me. I was, you know, writing something this morning that was looking at kind of the origin of citation indices. And again, there's amazing research, and I use Scite to find some of this research. So we're eating our own dog food, if you will. And you can find, you know, traces back to like the 12th century and, you know, these religious texts, and you see the images of this, and it's so cool. Right? And so I'm like, everyone must think this is cool. And I tweet it out, expecting it to go viral. And people are probably like, what? <laugh>.
Heather:
Oh, that's awesome. That's so awesome.
Heather:
Thanks to Josh for taking the time out of his schedule to introduce me to the extremely wonderful world of citations. I learned so much. And thanks to you for listening. Make sure you subscribe so you never miss an episode, leave a rating where you're listening to this if you like the show, and remember to come to our launch party. It's on June 15th at 2:00 PM. Learn more and RSVP at LibraryLever.com/LLlaunch