Posts Tagged ‘Web’

Towards Self-Emerging Knowledge Networks?

Friday, June 1st, 2012 by Antonio Manzalini

Google recently presented the so-called Knowledge Graph, which aims to identify the meaning of search queries and of the related results, rather than simply matching a text string against the Web pages that contain the same words. Google used to be essentially an empiricist machine, crafted with almost no intrinsic knowledge but endowed with an enormous capacity to learn associations between individual bits of information. Approaches like this usually rely on people’s help in tagging Web content to infer meaning.

Imagine also computers understanding web pages, for example by using an application allowing the machine (no matter the language) to identify the different items that make up Web pages (e.g. images, videos, text, music, or ads).

Diffbot, a startup based in Palo Alto, California, is doing just that. It offers an API that allows machines to “capture” the purpose of the various objects on a Web page. The basic idea is to take the visual learning technology of self-driving cars and apply it to Web pages. For example, on article pages Diffbot can pick out headlines, the text of articles, pictures, and tags; on home pages it can determine basic layout elements like headlines, pictures, links to articles, and ads. Obviously there are many more than two types of Web pages: the company is working on about 18 main types. This could enable, for example, topic-comparison sites, the reshaping of content for mobile apps, and other applications.
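To make the idea of “picking out” page items concrete, here is a minimal sketch in Python. It is not Diffbot's API or its visual learning method: it only reads HTML tag names using the standard library, and the class name `PageItemCollector`, the function `classify_items`, and the sample page are hypothetical, chosen purely for illustration.

```python
from html.parser import HTMLParser


class PageItemCollector(HTMLParser):
    """Naive stand-in: sorts page items by HTML tag name only."""

    def __init__(self):
        super().__init__()
        self.items = {"headline": [], "image": [], "link": []}
        self._in_headline = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("h1", "h2"):
            # Start a new headline; its text arrives via handle_data.
            self._in_headline = True
            self.items["headline"].append("")
        elif tag == "img" and "src" in attrs:
            self.items["image"].append(attrs["src"])
        elif tag == "a" and "href" in attrs:
            self.items["link"].append(attrs["href"])

    def handle_endtag(self, tag):
        if tag in ("h1", "h2"):
            self._in_headline = False

    def handle_data(self, data):
        if self._in_headline:
            self.items["headline"][-1] += data.strip()


def classify_items(html):
    """Return the headlines, image sources and link targets found in html."""
    collector = PageItemCollector()
    collector.feed(html)
    return collector.items


# Hypothetical sample page, for illustration only.
sample = '<h1>Big news</h1><p>Text <a href="/more">more</a></p><img src="pic.jpg">'
print(classify_items(sample))
```

A tag-based reader like this breaks as soon as a page uses unusual markup, which is presumably why a visual approach that looks at the rendered layout is attractive.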

HP’s Collective Project mines documents to visualize how one employee is connected to colleagues

Now let’s take another step: imagine computers not only understanding web pages, but also autonomously contributing to the self-creation of a workplace social network. Think about an application that tracks the web pages opened (with their items) and the documents created or read on the laptop, assigns topic words to each item by mining its content, and then computes similarities to create knowledge maps and family trees centered around people and subject areas. People will be automatically connected through this self-created workspace, based on inferred attitudes, interests, expertise, etc.
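The pipeline described above (topic words per item, then similarities, then connections between people) can be sketched in a few lines of Python. This is an illustrative sketch, not the method used by any actual product: the word-count topic extraction, the cosine similarity, the threshold of 0.3, the stopword list, and all names and sample documents are assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a real linguistic resource).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "on", "is"}


def topic_words(text, top_n=5):
    """Assign topic words to one item: its most frequent non-stopwords."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return Counter(dict(Counter(words).most_common(top_n)))


def person_profile(documents):
    """Sum the topic words of every item a person opened, read or wrote."""
    profile = Counter()
    for doc in documents:
        profile.update(topic_words(doc))
    return profile


def cosine(a, b):
    """Cosine similarity between two word-count profiles."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def connect_people(docs_by_person, threshold=0.3):
    """Link every pair of people whose profiles are similar enough."""
    profiles = {p: person_profile(d) for p, d in docs_by_person.items()}
    people = sorted(profiles)
    links = []
    for i, p in enumerate(people):
        for q in people[i + 1:]:
            score = cosine(profiles[p], profiles[q])
            if score >= threshold:
                links.append((p, q, round(score, 2)))
    return links


# Hypothetical desktop contents, for illustration only.
docs_by_person = {
    "alice": ["network routing protocols", "routing in optical network"],
    "bob": ["optical network design"],
    "carol": ["marketing budget plan"],
}
print(connect_people(docs_by_person))
```

In this toy data only alice and bob share topic words, so they are the only pair that gets connected; carol stays isolated until her documents start overlapping with someone else's.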

HP’s Collective Project is doing exactly this to foster collaboration within organizations: a sort of social network built by data mining employees’ desktops.

In the future, data mining machines and the Internet will allow “networks” to self-emerge and morph dynamically. Let’s be ready for this.

Are you getting ready for May 15th?

Sunday, April 15th, 2012 by Roberto Saracco

The Logo of ADay.Org

It started many years ago, when taking a picture was all about “film”. 100 photographers in the USA were asked to take pictures in each State during one week. That is why it was called USA 24/7: photos taken at any hour during 7 days.

Then the Internet and digital photography became mass market, and America 24/7 was opened up to the public. Anyone could take pictures during that specific week and send them to a central place where they were analyzed and a few were selected for publication in a book. In the week of May 12-18, 2003, tens of thousands of Americans took shots of what they considered highlights of daily life and generated some 2 TB of pictures: an amazing volume of storage at that time, and an amazing amount of transmission over the Internet.

Out of that a book was created and you can still get it on Amazon.

Now it is ADay in the world, a testimony of the spread of Internet and digital cameras all over the world.

Capture daily life on May 15th 2012

On this one single day we ask you to pick up your camera and help us photograph daily life. What is close to you? What matters to you? We will connect your images to images from all around the world, creating a unique online experience where photographs will be shared, compared and explored. Your view on life will be preserved to inspire generations to come.

On May 15th, as you can see in the clip I cut from the ADay website, you can take a picture with your camera and send it to the organizers. How many people will do that? A million? I think it will be more; we will see what the statistics are on May 16th.
However, even a million snapshots makes for some 5 TB, considering a mix of photos taken with reflex cameras (more likely to be used by people who aim for nice pictures) and pictures taken with cell phones. By the way, I hope the organizers will also publish statistics about the “tool” chosen for taking the pictures.
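The 5 TB figure is a back-of-the-envelope estimate, and the arithmetic behind such an estimate can be sketched as follows. The split between reflex and phone pictures and the per-photo sizes are my assumptions, picked only so that the average comes out at 5 MB per photo; the function name is hypothetical.

```python
def estimated_volume_tb(num_photos, share_reflex, reflex_mb, phone_mb):
    """Estimate total volume in TB from a mix of reflex and phone photos."""
    # Average size per photo, weighted by the assumed share of reflex shots.
    avg_mb = share_reflex * reflex_mb + (1 - share_reflex) * phone_mb
    # 1 TB = 1,000,000 MB (decimal units).
    return num_photos * avg_mb / 1_000_000


# Assumed mix: half reflex shots at 8 MB, half phone shots at 2 MB.
print(estimated_volume_tb(1_000_000, 0.5, 8.0, 2.0))
```

With a different mix the total moves quickly, which is one more reason the statistics about the “tool” would be interesting.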

What is of interest to me is the possibility to cluster and involve all people of the world in a task. Communications and the Web have changed our social relationships; as I mentioned two days ago, they have warped space and time.

Studies on collective intelligence, as well as those on autonomic systems, are pointing to a yet-to-be-explored world of knowledge and possibilities. The GAIA paradigm, so far applied to ecology, will be extended to include the GAIA of minds, and I find this particularly fascinating.

The more the Web grows, the simpler the ranking will be

Monday, March 19th, 2012 by Antonio Manzalini

In this paper, “Ranking stability and super-stable nodes in complex networks”, Albert-László Barabási and Gourab Ghoshal present some interesting results that deepen our understanding of the interplay between a network's topology and the associated dynamical processes.

The paper elaborates on the role of the underlying network structure in the effectiveness of PageRank, a network-based diffusion algorithm. The question they investigate is: could PageRank be inherently more accurate for some networks than for others?

By focusing on the stability of the ranking of top nodes under perturbations, they obtained a series of unexpected results. In summary, real networks with heavy-tailed degree distributions naturally lead to a set of super-stable nodes having such a high number of ‘recommendations’ that their ranking becomes independent of who recommends them. This is like saying that, across a large number of systems, a small number of components (nodes) are bound to have a disproportionate role in the system. The scale-free property of the web leads to the emergence of a small number of super-stable nodes, for which a simple count of the in-degree offers the correct relative ranking.
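The last point, that for heavily linked nodes a plain in-degree count gives the same relative ranking as PageRank, can be illustrated on a toy graph. The sketch below is a standard power-iteration PageRank with a damping factor of 0.85; the graph, the node names, and the helper functions are made up for illustration, and it does not reproduce the perturbation analysis of the paper.

```python
from collections import Counter


def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank; links maps each node to the nodes it points to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank


def in_degree(links):
    """Count the incoming links ('recommendations') of every node."""
    counts = Counter(t for targets in links.values() for t in targets)
    return {v: counts.get(v, 0) for v in set(links) | set(counts)}


def top_k(scores, k):
    """Names of the k best-scoring nodes, ties broken alphabetically."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


# Hypothetical toy graph: two heavily recommended hubs and five ordinary pages.
links = {
    "a": ["hub1"],
    "b": ["hub1", "hub2"],
    "c": ["hub1", "hub2"],
    "d": ["hub1", "hub2"],
    "e": ["hub1"],
    "hub1": ["a", "b", "c", "d", "e"],
    "hub2": ["a", "b", "c", "d", "e"],
}

print(top_k(pagerank(links), 2))
print(top_k(in_degree(links), 2))
```

On this graph both rankings put hub1 first and hub2 second. A seven-node example proves nothing about real networks, of course; it only shows what “in-degree offers the correct relative ranking” means in practice.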

Mathematical PageRanks for a simple network, expressed as percentages.

If so, we can argue that the early success of Google, compared with its competitors, was due not to better coverage but to its PageRank algorithm, which offered a superior user experience as a consequence of the scale-free nature of the web graph.

Then, as the Web grows, it will become easier to identify the top-ranked nodes; instead of making searches more difficult, this will allow better ranking.

Imagine the implications in those areas from science to marketing where ranking has a role.