Monday, May 12, 2008

Powerset: New Natural Language Query Processing Tool

Powerset launched today. As described by CNET, it "brings a new, rich semantic dimension via natural language query processing." For now, it applies this technology only to Wikipedia, but its broader applications--and I am thinking of biblical texts in particular--are intriguing. So what does Powerset do? Again quoting CNET, "Powerset's engine has read 2.5 million Wikipedia pages and extracted 'meaning' from the sentences, creating a navigation and semantic layer on top of the popular Web encyclopedia." To understand more clearly what this means, read the CNET article and this one at PCWorld (where it is called a "Google-killer"!), and watch the introductory video on the Powerset web page. I have also created a 2'15" video showing how it works with "Septuagint" as a search term.

What Powerset is trying to do is get beyond simply using keywords to identify what is relevant in a text. Keywords can be effective if a real, live person goes through a text and applies them. This is basically what Logos has done when it links a biblical text to "topics," and it can draw on a resource specifically oriented in this way, like the New Nave's Topical Bible. Of course, this approach is only as good as the person who applies the keywords/topics.

Is there a way to get a machine to analyze a text? The easiest approach is to check word frequency in a passage, filter out trivial words, and highlight the ones used most often. Logos is implementing this kind of approach with the "Important Words" section in its Passage Guide. Here, for example, are the important words in Mark 6:30-44. I am not clear on the algorithms that Logos is using to generate this list (e.g., why is αρτους less important than ιχθυων even though it occurs at least as often?), but it does provide a helpful start at seeing some of the main concepts in the passage. Powerset, if I understand it correctly, takes this approach a step further by not only analyzing frequency but also trying to understand the meaning of the sentences. That is, it appears to be looking for subjects, predicates, and complements in the sentences of the Wikipedia article. Note how this data is presented in its Outline, Show Factz view. Some of its results make questionable sense, but in general, this is a quick and efficient way to speed-scan a passage and get a better sense of its content.
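
To make the word-frequency approach concrete, here is a minimal Python sketch of the three steps: count the words, filter out the trivial ones, and list the most frequent. This is only an illustration of the general idea, not the algorithm Logos uses (which I do not know). The function name important_words and the short stopword list are my own assumptions.

```python
import re
from collections import Counter

# Hypothetical list of "trivial" words; a real tool would use a much
# longer list, and a separate one for each language.
STOPWORDS = {
    "the", "and", "a", "of", "to", "in", "he", "they", "them",
    "were", "was", "his", "him", "it", "that", "for", "with", "is",
}

def important_words(text, top_n=5):
    """Return the top_n most frequent non-trivial words with their counts."""
    # Lowercase the text and keep runs of letters (Unicode-aware, so
    # Greek words are matched as well as English ones).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)

sample = "The loaves and the fish: five loaves, two fish, and the loaves were broken."
print(important_words(sample, top_n=2))
```

Note that a raw count like this treats each inflected form as a separate word, so a Greek or Hebrew text would need to be lemmatized first for the counts to reflect concepts rather than spellings.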

In any case, this is an interesting new textual analysis tool, and I will be interested to see how it might be applied to biblical texts.

