AI institute develops new, free science search engine

Backed by Microsoft co-founder Paul Allen, an artificial intelligence institute has launched a new science search engine that is innovative and, perhaps most importantly, free.

Oren Etzioni, chief executive officer of the Allen Institute for Artificial Intelligence. (CC BY-NC 2.0).

If you’re looking for published papers, there are quite a few search engines out there. Google Scholar is the best known, and services like PubMed and arXiv can be incredibly useful. Even so, most people will tell you there is still plenty of room for improvement. With that in mind, the non-profit Allen Institute for Artificial Intelligence (AI2) in Seattle, Washington, set out to develop a new approach.

Their product, Semantic Scholar, aims to do something different: to understand a paper’s content and, as its name suggests, its semantics.

“We’re trying to get deep into the papers and be fast and clean and usable,” says Oren Etzioni, chief executive officer of AI2.

To do this, the system picks out a paper’s most important phrases and keywords without needing the authors to supply them.

“It’s surprisingly difficult for a system to do this,” says Etzioni.

It also identifies which of a paper’s cited references were truly influential, rather than being included incidentally for background or as a comparison, which is again a significant change.

“That’s a really good feature,” says Jose Manuel Gomez-Perez, who works on search engines and is director of research and development in Madrid for the software company Expert System. Semantic Scholar also extracts figures from the papers to present in the search results.

Semantic Scholar is currently in beta and limited to only three million articles. That may seem like a lot (and in a way, it really is), but it is only a small fraction of all the articles across all scientific niches. Google Scholar, the largest such engine, indexes 100 million papers or so. But Google does a surprisingly bad job at understanding the content of a paper.

“Google has access to a lot of data. But there’s still a step forward that needs to be taken in understanding the content of the paper,” says Gomez-Perez.

I could really see Semantic Scholar working. From the ground up, they seem to have a much healthier approach than Google, which just does its own thing. Having toyed with it a bit, I feel that Semantic Scholar does a better job at finding free, open-access papers. But the problem is, as usual, the non-free ones. Google Scholar can “see” behind paywalls, whereas Semantic Scholar can’t.

“We don’t have a way past the paywall — it’s a limitation for us,” Etzioni says. “But we feel the tide is turning. More and more stuff is available somewhere.”

Indeed, more and more journals are going fully open-access, or at the very least providing some open-access option. In 2016, AI2’s main goal is to expand Semantic Scholar into more areas, with medicine being a top priority.

“I’ve talked to people who say that doctors are in the emergency room looking up things on Google Scholar on their phones,” he says. “They have what I’d consider a fairly blunt instrument.”

