Google recently presented the so-called Knowledge Graph, which aims to identify the meaning of search queries and of the related results, rather than simply matching a string of text against the Web pages that contain the same words. Until now, Google has been essentially an empiricist machine, crafted with almost no intrinsic knowledge but endowed with an enormous capacity to learn associations between individual bits of information. Approaches like this usually rely on people's help in tagging Web content so that meaning can be inferred.
Imagine, too, computers that understand Web pages, for example through an application that lets a machine identify, whatever the language, the different items that make up a page (e.g. images, videos, text, music, or ads).
Diffbot, a startup based in Palo Alto, California, is doing just that. It offers an API that allows machines to “capture” the purpose of the various objects on a Web page. The basic idea is to take the visual learning technology used in self-driving cars and apply it to Web pages. For example, on article pages Diffbot can pick out headlines, article text, pictures, and tags; on home pages it can determine basic layout elements such as headlines, pictures, links to articles, and ads. Obviously there are many more than two types of Web pages: the company is working on about 18 main types. This could make it possible, for example, to build topic-comparison sites or to reshape content for mobile apps, among other applications.
Now let’s take another step: imagine computers not only understanding Web pages, but also autonomously contributing to the self-creation of a workplace social network. Think of an application that tracks the Web pages a person opens (with their items) and the documents created or read on that person's laptop, assigns topic words to each item by mining its content, and then computes similarities to create knowledge maps and family trees centered around people and subject areas. People would be connected automatically through this self-created workspace, based on inferred attitudes, interests, expertise, and so on.
HP’s Collective Project is doing exactly this to foster collaboration within organizations: a sort of social network built by data mining people's desktops.
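To make the idea more concrete, here is a minimal sketch of the three steps described above: assign topic words to each item, compute similarities, and connect people. It is not how HP's project works internally; the sample data, the stopword list, the function names, and the 0.3 threshold are all assumptions chosen for illustration, and plain word counts with cosine similarity stand in for whatever mining technique a real system would use.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "with", "is"}

def term_counts(text):
    """Crude topic words: count lowercase words, skipping stopwords and very short words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS and len(w) > 2)

def cosine(a, b):
    """Cosine similarity between two word-count profiles (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_profiles(activity):
    """Merge the topic words of every item a person touched into one profile per person."""
    profiles = {}
    for person, items in activity.items():
        total = Counter()
        for text in items:
            total.update(term_counts(text))
        profiles[person] = total
    return profiles

def connect(profiles, threshold=0.3):
    """Link every pair of people whose profiles are at least `threshold` similar."""
    people = sorted(profiles)
    links = []
    for i, p in enumerate(people):
        for q in people[i + 1:]:
            score = cosine(profiles[p], profiles[q])
            if score >= threshold:
                links.append((p, q, round(score, 2)))
    return links

# Hypothetical tracked activity: titles of pages and documents per person.
activity = {
    "alice": ["Semantic search and knowledge graph entities", "Graph databases for search"],
    "bob": ["Knowledge graph construction from web pages", "Entity search ranking"],
    "carol": ["Quarterly budget planning", "Travel expense report"],
}

links = connect(build_profiles(activity))
for p, q, score in links:
    print(p, "<->", q, score)
```

With this toy data, alice and bob end up linked through their shared interest in search and knowledge graphs, while carol, whose documents share no topic words with theirs, stays unconnected. A real system would need far richer signals (stemming, weighting of rare terms, privacy controls) than this sketch attempts.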
In the future, data mining machines and the Internet will allow “networks” to self-emerge and morph dynamically. Let’s be ready for this.