Google Video's Achilles' Heel
BURLINGAME, CALIF. - In May 2003, Google co-founder Sergey Brin co-authored a paper with other Google developers on matching relevant news articles on the Web to broadcast-television news. The authors' goal was a system in which users watching TV, perhaps on their PCs, would see links to articles or promotions relevant to the content they were watching. Television networks already do this manually when they promote other shows or Web sites on the lower portion of the screen.
Trying to search videos without access to the spoken text is like trying to search for a book that mentions Roman emperor Nero by looking only through book reviews. Yet that is how most of today's Web-search sites look for multimedia files: by scanning "metadata" associated with Web content. Metadata, which includes text that appears on a page with video files and sometimes "tags" (manually entered by users, as with Del.icio.us and Flickr, both now owned by Yahoo!), is information packaged along with the videos, be it television schedules, network descriptions or keywords that other Web sites use to link to the videos.
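The metadata approach can be sketched in a few lines. The toy index below is an illustration, not any vendor's actual code: a clip is findable only through the text and tags published alongside it, so words that are merely spoken on the audio track never enter the index.

```python
# Toy sketch of a metadata-only video index. A clip is searchable only
# through its surrounding page text; spoken words are invisible.

def build_index(videos):
    """Map each metadata word to the set of video URLs it appears with."""
    index = {}
    for url, metadata in videos.items():
        for word in metadata.lower().split():
            index.setdefault(word, set()).add(url)
    return index

videos = {
    "clip1.avi": "CNN evening news broadcast schedule",
    "clip2.avi": "documentary rome history",  # narrator says "Nero" only on the audio track
}
index = build_index(videos)

print(index.get("rome", set()))  # found: "rome" appears in the page metadata
print(index.get("nero", set()))  # empty: "Nero" is spoken, never written down
```

A full-text audio search, by contrast, would run the soundtrack through speech recognition first, so the transcript itself feeds the index.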
Nearly three years after Brin's 2003 paper, Google still relies on such metadata for its fledgling video-search service. For a company that has built its strategic lead, and lofty market cap, on best-of-breed search technology, this is a major weakness in Google's video-search offering. Indeed, at least three upstarts, TVEyes, BBN and Autonomy (via Blinkx), already offer full-text audio-search services.
These companies got their start in the business by performing speech-recognition searches for the government. The U.S. Departments of Defense and Homeland Security have a great interest in being able to search for words across various streams of audio and video data. In fact, almost all speech-recognition software derives from Defense Advanced Research Projects Agency (DARPA)-funded research from the 1970s. (DARPA also developed the network of computers that would eventually become the Internet.)
Because of the costliness and complexity of developing speech-recognition technology, bigger companies (Google, Yahoo!) will either have to license the technology or acquire companies who already have it. This isn’t the kind of project you can ask a few engineers to bang out in a week. However, the defense contractors haven’t waited around for Google and Yahoo! to come knocking. Instead, those contractors and enterprise vendors have begun to build their own front-end interfaces to open these searches to the consumer market.
TVEyes, for example, has been around for seven years and counts the Department of Defense as one of its biggest customers. Its consumer search business, tveyes.com, and podcast-search site podscope.com account for relatively little of the company's revenue, but CEO David Ives expects rapid growth in this sector. "We're trying to be an arms dealer," says Ives, who wants to license TVEyes' technology to the bigger search companies.
What you'll find on the TVEyes site is a search that pulls results from the online versions of major networks' media content. A search for "Dubai" returned video clips from Foxnews.com and CNN.com. The results are based mainly on a phonetic transcript of the video clips, and you can play the snippet that contains your search term right on the TVEyes page. For the rest of the clip, a link sends you to the content maker.
On TVEyes’ Podscope.com, you can search about 100,000 audio and video podcasts, giving audio and video bloggers (known as vloggers—check out the Forbes Best of the Web reviews of vlogging sites) a chance to submit their sites to the search index. This lets you find out what amateur content creators have to say about a topic, and it helps podcasters get exposure.
AOL announced in September 2005 that it was developing a podcast search that will be powered by Podscope. That hasn’t launched yet but likely will this spring. For video search, AOL has acquired Singingfish.com and Truveo, video-search engines that search based on metadata—like Google and Yahoo! do—instead of audio transcriptions.
The AOL/Podscope partnership will mark the first time that a major Internet company is using full-text search for audio or video files (though for the time being AOL only plans to include audio files—no video).
BBN, a Cambridge, Mass.-based government contractor, believes it can snag the next deal or two with its new search offering, Podzinger. Alex Laats, president of BBN's Delta Division and of Podzinger, said the company is poised to outgrow Podscope because BBN's text transcripts are more relevant for searchers than the audio and video clips Podscope shows. Podzinger offers short text transcripts of your search term in context, so you can see whether you want to listen to the clip. Each word in the sequence is clickable and loads the audio clip starting at the point the word is spoken.
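The clickable-word feature implies a time-aligned transcript: each recognized word is stored with the offset at which it is spoken. A minimal sketch, with an assumed data format rather than BBN's actual one:

```python
# Illustrative time-aligned transcript: each entry pairs a recognized
# word with its start time in seconds, so a click on any word in a
# result snippet can seek the audio player to that offset.

def snippet_with_offsets(transcript, term, context=2):
    """Return a list of (word, start_seconds) snippets around each hit."""
    hits = []
    for i, (word, start) in enumerate(transcript):
        if word.lower() == term.lower():
            lo = max(0, i - context)
            hits.append(transcript[lo:i + context + 1])
    return hits

transcript = [("the", 0.0), ("dubai", 0.4), ("ports", 0.9),
              ("deal", 1.3), ("collapsed", 1.6)]

for snippet in snippet_with_offsets(transcript, "Dubai"):
    print(snippet)  # every word carries the offset to start playback from
```

In a real system the offsets come out of the speech recognizer itself, which knows where in the waveform each word was decoded.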
BBN has been doing acoustic research since the late 1940s and was a major part of much of DARPA’s research into speech recognition. The company built important Internet precursors and continues to be a team leader in DARPA’s latest speech-recognition project, which aims to translate speech on the fly for soldiers in foreign countries. (BBN landed $16.4 million in funding for the project’s first year.)
BBN launched Podzinger in January 2006, much to the surprise of TVEyes, which was a BBN customer at the time. TVEyes had used BBN's speech-to-text tools as one of its sources of transcripts for tveyes.com, podscope.com and its enterprise offerings (TVEyes aggregates and compares transcripts from several providers). But, according to Ives, the sudden competition from one of its technology providers led him to sever the relationship.
Both Podscope and Podzinger let you save a search as an RSS feed, so that new clips matching your term show up in a feed reader like Bloglines or Yahoo!'s MyWeb.
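The saved-search feed can be pictured as a stored query that is re-run against newly indexed clips, with matches emitted as RSS items for a reader to poll. The schema below is a bare-bones illustration, not Podscope's or Podzinger's actual feed format:

```python
# Sketch of a saved search exposed as an RSS 2.0 feed. Clips whose
# transcripts match the stored query become feed items; a reader such
# as Bloglines polls the feed and surfaces new matches.

def search_feed(query, clips):
    items = [c for c in clips if query.lower() in c["transcript"].lower()]
    xml = ['<rss version="2.0"><channel>',
           f"<title>Search: {query}</title>"]
    for c in items:
        xml.append(f"<item><title>{c['title']}</title>"
                   f"<link>{c['url']}</link></item>")
    xml.append("</channel></rss>")
    return "\n".join(xml)

clips = [{"title": "Morning show", "url": "http://example.com/1",
          "transcript": "today we discuss podcasting trends"},
         {"title": "Tech hour", "url": "http://example.com/2",
          "transcript": "nothing relevant here"}]

print(search_feed("podcasting", clips))
```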
Blinkx gives you even more of this kind of personalization via its “personal TV channels,” which let you specify topics that you like. The site then collects videos about those topics and strings them together as personalized programming.
When users do a Blinkx search, they shouldn’t be surprised if their computer speakers suddenly start blasting, as the site automatically plays search results. The site gets its results both from media partners and from the Web at large. Users can turn various content providers, like IFilm, ABC and the BBC, on and off, depending on what they’re looking for.
Blinkx’s technology comes from U.K.-based enterprise-software vendor Autonomy (with which the company shares a San Francisco office). In fact, co-founder and CTO Suranga Chandratillake was the U.S. CTO of Autonomy prior to founding Blinkx. In addition to Blinkx, Autonomy counts the U.S. and U.K. governments as customers of its speech-recognition technology.
Because Blinkx hasn’t yet booked any revenue, the company is working on the cheap. Chandratillake says that, in order to operate efficiently, Blinkx doesn’t transcribe every video it finds on the Web. Though it transcribes all of the speech pulled out of videos from its content partners, it only runs other videos through speech-recognition software when there isn’t enough metadata with the files for the search engine to figure out what it contains.
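That cost-saving heuristic can be sketched as follows. The word-count threshold and function names are assumptions for illustration, not Blinkx's actual rules:

```python
# Sketch of "transcribe only when needed": partner videos are always
# transcribed, but for videos found on the open Web the expensive
# speech-recognition pass runs only when the metadata is too thin.

MIN_METADATA_WORDS = 5  # assumed cutoff for "enough metadata"

def index_text(clip, transcribe):
    """Return the text to index for a clip, transcribing only if needed."""
    words = clip["metadata"].split()
    if clip["from_partner"] or len(words) < MIN_METADATA_WORDS:
        return clip["metadata"] + " " + transcribe(clip["url"])
    return clip["metadata"]  # metadata alone suffices; skip the ASR cost

fake_asr = lambda url: "spoken words recovered by speech recognition"

rich = {"url": "a.avi", "from_partner": False,
        "metadata": "long detailed description of this video clip"}
thin = {"url": "b.avi", "from_partner": False, "metadata": "clip"}

print(index_text(rich, fake_asr))  # metadata only
print(index_text(thin, fake_asr))  # metadata plus transcript
```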
The company plans to make money by selling video ads that it will display along with videos from content partners. Those partners—British network ITN is the first—will provide more extensive programming to Blinkx in return for a share of the ad revenue.
While these companies are figuring out how to deliver to consumers, Google, Yahoo! and Microsoft appear to be waiting in the wings, presumably deciding whether to build or buy superior audio-video search technology. Cash-rich Microsoft may be the farthest along in this pursuit; it has been doing internal voice-transcription research for years (though, like the others', that work ultimately derives from DARPA research). A recent example of Microsoft speech-recognition software is the voice-command system that Acura is building into its new navigation-equipped cars.
In the end, it is not just the established search engines like Google and Yahoo! or the traditional programming guides like TVGuide.com that should be concerned. As Web video and audio searching becomes more accurate and relevant, it is likely to provide a big boost to Web video-on-demand usage. Ultimately, this will mean more competition and market share erosion for network and cable television.