subbu.org

Yahoo’s Term Extraction + Google Search API


I just finished a major enhancement to my blog. The idea is to show links
from the web related to the current entry. To see how this works, just
click on the "Similar Entries from Google" link above. This feature uses
Yahoo’s Term Extraction service to extract keywords from my Movable Type
blog entries, and feeds these keywords to Google’s Search API to fetch
related entries. The feature is still experimental, and I need to make a
few more refinements.

The first step is to extract some keywords for each Movable Type blog
entry. I wanted this step to be automated. Yahoo now offers a term
extraction API as a REST service, and Nick Gerakines just wrote an
MT-KeywordExtractor plugin, which automatically generates keywords as you
save an entry. I had to tweak this plugin for two reasons. First, the
plugin uses HTTP GET to fetch keywords, with the blog entry content itself
encoded in the request URL. This does not work for large posts due to URL
length restrictions, so I modified the plugin to submit the entry content
with HTTP POST. The second issue is the keywords the service returns. In
most cases I tested, it returned too many keywords with no apparent
ranking. Since I was going to feed these keywords into Google’s search
API, I just picked the first few keywords returned. The results are mixed:
the search results did not always make sense.
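For illustration, here is a minimal sketch in Python of the two tweaks described above: sending the entry text in a POST body instead of the URL, and keeping only the first few keywords. It is not the plugin's actual code. The endpoint URL, the `appid` and `context` parameter names, and the `<Result>` element name are assumptions about the service, and `build_extraction_request` and `first_keywords` are hypothetical helper names.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint; the post itself does not give the service URL.
TERM_EXTRACTION_URL = (
    "http://api.search.yahoo.com/ContentAnalysisService/V1/termExtraction"
)


def build_extraction_request(entry_text, app_id):
    """Build a POST request that carries the entry text in the body.

    Keeping the text out of the URL avoids URL length limits on long posts.
    The parameter names "appid" and "context" are assumptions.
    """
    body = urllib.parse.urlencode({"appid": app_id, "context": entry_text})
    return urllib.request.Request(
        TERM_EXTRACTION_URL,
        data=body.encode("utf-8"),
        method="POST",
    )


def first_keywords(response_xml, limit=3):
    """Return the first `limit` terms in the order the service listed them.

    Assumes each term arrives in a <Result> element, with or without
    an XML namespace.
    """
    root = ET.fromstring(response_xml)
    terms = [
        (el.text or "").strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "Result"
    ]
    return [t for t in terms if t][:limit]
```

Passing the request to `urllib.request.urlopen` and handing the response body to `first_keywords` would complete the round trip. Since the service returns terms with no apparent ranking, cutting the list at `limit` is an arbitrary choice, which may explain the mixed results.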

The second step is to feed these keywords to the Google search API. This
turned out to be a trivial step.
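As a rough sketch of why this step is small, the keywords only need to be joined into a single query string before being handed to whatever search client is in use. The helper name `build_search_query` is hypothetical, and quoting multi-word terms as phrases is my assumption, not something the original scripts are known to do.

```python
def build_search_query(keywords, limit=None):
    """Join extracted keywords into one search query string.

    With limit=None every keyword is used; otherwise only the first
    `limit` keywords are kept.
    """
    chosen = keywords if limit is None else keywords[:limit]
    parts = []
    for term in chosen:
        term = term.strip()
        if not term:
            continue
        # Assumption: quote multi-word terms so they are searched as phrases.
        parts.append(f'"{term}"' if " " in term else term)
    return " ".join(parts)
```

Calling it with `limit=3` corresponds to picking the first few keywords, and leaving `limit` unset corresponds to sending every keyword the extraction step returned.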

Update (04/01/2006): This feature is no longer experimental. I updated the scripts to feed as many keywords as Yahoo returned to Google.

Update (06/16/2006): After using this for almost three months, I decided to disable the MT-KeywordExtractor plugin on my blog. The terms extracted by Yahoo’s term extraction service are too vague and broad.

Written on March 26th, 2006 at 9:00 pm
