Noam Chomsky, the philosopher, cognitive scientist, historian, social critic, and father of modern linguistics, has authored more than 1,000 articles and 130 books. The 89-year-old intellectual has also written films and appeared in many documentaries. His substantial body of work in linguistics and politics has earned him the title of "most cited living author."
Now his work is the subject of the Noam Chomsky Knowledge Graph, the first Semantic Knowledge Graph for a public figure. “Doing a Semantic Project of all he has written or said is a fabulous tribute to a man who has made a big contribution to the study of language and its meaning,” says Fred Davis, Executive Director of the Chomsky Knowledge Graph project.
Franz Inc., a leader in graph database solutions with its AllegroGraph technology, and Semantic Web Company, developer of the PoolParty Semantic Suite, are partners in the project, which will be hosted on the Internet Archive. In addition to Chomsky's published works, media interviews, and movies, the project will also include personal papers he has donated to MIT, where he has been a professor for 66 years.
The team began working on the Chomsky Knowledge Graph about a year ago, and the plan, says Davis, is to launch a beta next year and then improve it continuously. About one-third of Chomsky's books have been scanned so far, and he is still writing. So it's no surprise that the project has a way to go.
The Internet Archive, whose Open Library project is working to build an open, editable library catalog for every book ever published, was recently designated as a U.S. public library. It is helping to scan Chomsky's work. A researcher (or anyone else) looking something up in Chomsky's work can "check out" one of his books from the library, and the collection of his work available for checkout will continue to grow.
"The Internet Archive has been backing up the internet for 20 years," Davis says. The Noam Chomsky Knowledge Graph will be a pilot project for the nonprofit to assess the value of semantic analysis for the data it has compiled.
“We are hoping to create a new type of tool,” Davis says. “A Knowledge Graph is more accurate and accessible and valuable than a simple bio would be. It brings together everything a person has written and done, and the great thing about semantic technology is the idea of deep linking.”
Existing unstructured and structured data, such as an author's books or transcripts of videos and podcasts, can be linked together in a semantic layer. Additional information from semantic sources such as DBpedia, Wikidata, and GeoNames can also be imported, in RDF triple store or semantic database formats, and linked to it.
"A good thing about Semantic Technology is that it's easier to add new information than if you have a highly structured database. That's because of the way things are stored in triples, where you have a subject, predicate, and object relationship, so you can bring in new information that instantly connects to other information," says Dr. Jans Aasman, CEO of Franz.
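One common way to tie local and external data together in RDF is the `owl:sameAs` property, which states that two identifiers denote the same thing. The sketch below assumes that approach purely for illustration; the project may link its data differently. Plain Python tuples stand in for a real triple store, and every identifier other than `owl:sameAs` (`local:Chomsky`, `ext:Person42`, `ex:wrote`, and so on) is a hypothetical placeholder.

```python
Triple = tuple[str, str, str]

# Hypothetical local triples: one from structured book metadata,
# one extracted from an interview transcript.
local: set[Triple] = {
    ("local:Chomsky", "ex:wrote", "local:Book1"),
    ("local:Interview7", "ex:mentions", "local:Chomsky"),
}

# Hypothetical triples imported from an external source such as Wikidata.
external: set[Triple] = {
    ("ext:Person42", "ex:birthPlace", "ext:Philadelphia"),
}

# The link itself: both identifiers refer to the same person.
links: set[Triple] = {("local:Chomsky", "owl:sameAs", "ext:Person42")}

graph = local | external | links


def equivalents(node: str, triples: set[Triple]) -> set[str]:
    """Return every identifier reachable from `node` via owl:sameAs, in either direction."""
    seen, frontier = {node}, [node]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "owl:sameAs" and current in (s, o):
                other = o if s == current else s
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return seen


def facts_about(node: str, triples: set[Triple]) -> list[Triple]:
    """Collect the facts whose subject is `node` or any identifier linked to it."""
    ids = equivalents(node, triples)
    return sorted(t for t in triples if t[0] in ids and t[1] != "owl:sameAs")


for fact in facts_about("local:Chomsky", graph):
    print(fact)
```

Asking about the local identifier now also returns the externally sourced birthplace, because the single `owl:sameAs` triple bridges the two data sets.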
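To make the triple idea concrete, here is a minimal, hypothetical sketch in plain Python (it is not AllegroGraph's API). Facts are stored as (subject, predicate, object) tuples, so adding a fact never requires a schema change, and a fact added later connects to existing data simply by sharing a node. The `ex:` identifiers and the `where_does_author_work` helper are invented for illustration.

```python
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    """A toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        # No schema to alter: a new fact is just one more triple.
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        # None acts as a wildcard, much like a variable in a SPARQL triple pattern.
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
store.add("ex:Chomsky", "ex:wrote", "ex:SyntacticStructures")
store.add("ex:Chomsky", "ex:professorAt", "ex:MIT")

# A new fact arrives later; it needs no changes to the existing data.
store.add("ex:MIT", "ex:locatedIn", "ex:Cambridge_MA")


def where_does_author_work(store: TripleStore, author: str) -> list[tuple[str, str]]:
    """Join two triple patterns through the shared institution node."""
    return [
        (inst, place)
        for (_, _, inst) in store.match(s=author, p="ex:professorAt")
        for (_, _, place) in store.match(s=inst, p="ex:locatedIn")
    ]


print(where_does_author_work(store, "ex:Chomsky"))
# [('ex:MIT', 'ex:Cambridge_MA')]
```

The new location fact is reachable from the author immediately, because it reuses the existing `ex:MIT` node rather than a new table or column.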
An author's works, along with sources related to other works by the author or to external data, will become fully searchable by topic and concept, readable in excerpts, and easily available to journalists, scientists, technologists, students, philosophers, and historians, as well as the general public.
There would be great value in the Internet Archive applying Semantic Technology to the Open Library project and to the Wayback Machine, which archives the history of the web. Even simple linking can create a real resource and enable more sophisticated querying. With semantic linking in place, the Wayback Machine could also support fact checking.
The interconnectedness enabled by Semantic Technology, as delivered by AllegroGraph and PoolParty Semantic Suite, will make it possible to discover, for example, that a term Chomsky talks about is related to other terms, says Aasman. "It's possible to find out what are the hidden relationships in his thinking," he says. For example, if Chomsky discusses a particular person many times, there is a higher chance that he also talks about the country where that person lives.
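The article does not say how AllegroGraph or PoolParty surface these relationships. One simple approach, assumed here purely for illustration, is to count how often concepts appear together in the same passage, so that a person who frequently co-occurs with a country ranks that country as strongly related. The passages, concept labels, and helper names below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical passages, each already tagged with the concepts it mentions
# (for example by an entity-extraction step against a thesaurus).
passages = [
    {"PersonA", "CountryX", "media"},
    {"PersonA", "CountryX"},
    {"PersonA", "propaganda"},
    {"media", "propaganda"},
]


def cooccurrence(passages: list[set[str]]) -> Counter:
    """Count how often each unordered pair of concepts appears in the same passage."""
    counts: Counter = Counter()
    for concepts in passages:
        for a, b in combinations(sorted(concepts), 2):
            counts[(a, b)] += 1
    return counts


def related(concept: str, counts: Counter, top: int = 3) -> list[tuple[str, int]]:
    """Rank other concepts by how often they co-occur with `concept`."""
    scores: Counter = Counter()
    for (a, b), n in counts.items():
        if a == concept:
            scores[b] += n
        elif b == concept:
            scores[a] += n
    return scores.most_common(top)


counts = cooccurrence(passages)
print(related("PersonA", counts))  # CountryX ranks first, sharing 2 passages
```

Raw counts favor frequently mentioned concepts; a production system would likely normalize them (for instance with pointwise mutual information), which this sketch leaves out.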
Davis hopes that by the time website access to the Chomsky Knowledge Graph becomes available, the team will have added other capabilities based on semantic linking. For instance, when a user researches and pulls out a passage from one of Chomsky's books, the endnotes will be displayed right next to the quote. "Because of the semantic nature, you could get far more relevant quotes" in the first place, he says.
So far, Chomsky has been only tangentially involved in the project, Davis says, but next year, when a vast amount of his work is available in the Knowledge Graph, the team will be able to show him something very powerful for him (and others) to use.
“This has been a labor of love and passion for us,” says Davis. “We’re hopeful that this will serve as inspiration for other projects in similar areas.”