Surfing TV on the Internet
Today, Blinkx, a San Francisco-based video-search engine, announced a new search tool that it hopes will allow people to more easily find entire episodes of television shows such as Lost, 24, and Desperate Housewives. The tool, called Blinkx Remote, offers a quick, concise way to find TV shows from content providers around the Web, along with related information, so that users do not have to wade through video-search results that include partial clips of shows, commentaries, and random collections of episodes in no particular order. Blinkx Remote appears at the top of Blinkx search results when a person searches for the title of a show, and it lets people pin down the exact season and episode that they want to find. In addition, the tool offers links to information about the shows from online sources such as Wikipedia and IMDB.com, as well as links to sites, including Amazon and iTunes, where users can purchase DVDs or high-quality downloads of the show. Essentially, Blinkx Remote is an attempt to create a one-stop shop for all online TV surfing.
Within the past six months, says Suranga Chandratillake, cofounder and CTO of Blinkx, more and more full-length television shows have arrived on the Web. And within the past three months, he says, people have started to change the way they search for online video. Instead of just looking for highlights from The Daily Show, for instance, people are looking for ways to watch the whole show. “There’s been a massive increase in TV content online,” says Chandratillake, “and users have caught up with that reality.”
Searching for a particular video online is tricky, and identifying shows by season and episode while folding in other relevant information, as Blinkx Remote does, is even more of a challenge, says Chandratillake. Typically, online video search relies on a handful of approaches, such as looking through text metadata associated with a video, which includes file names and extra bits of identifying information; looking at text labels, called tags, which users assign to clips; and looking at the text around the video on the site where it is displayed. But these approaches have drawbacks, says Chandratillake. For instance, metadata often doesn’t have enough information to identify a video, and the weakness of user tags, he says, is that anyone can label a video with “Britney Spears,” whether or not it has anything to do with the pop star. And increasingly, Chandratillake adds, sources of long-form shows such as ABC.com have very little information on the site around the video. “Many completely use Flash,” he says, “which makes for a cool-looking interface, but [it] makes it very hard for search engines.”
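To make the three conventional text signals concrete, here is a minimal Python sketch of a search that scores each video on its metadata, its user tags, and the text surrounding it on the page. It is an illustration only, not any search engine's actual method: the VideoRecord type, the field names, and the weights are all hypothetical, with tags weighted lowest because anyone can assign them.

```python
from dataclasses import dataclass, field


@dataclass
class VideoRecord:
    """Hypothetical record holding the three text signals for one video."""
    url: str
    metadata: str = ""                        # file name and other identifying text
    tags: list = field(default_factory=list)  # labels assigned by users
    page_text: str = ""                       # text around the video on its page


# Assumed weights, chosen only for illustration: user tags are trusted least.
WEIGHTS = {"metadata": 2.0, "tags": 0.5, "page_text": 1.0}


def tokens(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())


def score(video, query):
    """Weighted fraction of query words found in each text signal."""
    q = tokens(query)
    if not q:
        return 0.0
    signals = {
        "metadata": tokens(video.metadata),
        "tags": tokens(" ".join(video.tags)),
        "page_text": tokens(video.page_text),
    }
    return sum(WEIGHTS[name] * len(q & words) / len(q)
               for name, words in signals.items())


def search(videos, query):
    """Return videos with a nonzero score, best match first."""
    scored = [(score(v, query), v) for v in videos]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [v for _, v in hits]
```

A clip that carries only a misleading tag still matches, but it ranks below a clip whose metadata and page text agree with the query. A site that shows its video inside Flash with little surrounding text would contribute an empty page_text, which is the weakness Chandratillake describes.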
Chandratillake believes that his company has figured out a better way to search for videos across the Web and for TV shows in particular. The Blinkx search engine uses speech-recognition technology in addition to standard metadata and surrounding text searches. For each video that the Blinkx engine encounters, it extracts audio information—strings of phonemes—that it uses to create a searchable index of words. The recognition system assembles these phonemes into words by taking into account which words typically appear in which contexts; “sail” might appear with “boat,” for instance. Also, the system uses all other information, from metadata to surrounding text, that provides clues as to how the phonemes fit together. (See “Millions of Videos, and Now a Way to Search Inside Them.”)
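The idea of using context to turn phonemes into words can be sketched in a few lines of Python. This is a toy under stated assumptions, not Blinkx's recognizer: the pronunciation lexicon, the co-occurrence counts, and the decode function are invented for illustration, and the hints parameter stands in for clues drawn from metadata or surrounding text.

```python
# Toy pronunciation lexicon (hypothetical): phoneme string -> words that sound like it.
LEXICON = {
    "S EY L": ["sail", "sale"],
    "B OW T": ["boat"],
    "P R AY S": ["price"],
}

# Toy co-occurrence counts (hypothetical): how often two words appear together.
COOCCURRENCE = {
    ("sail", "boat"): 40,
    ("sale", "boat"): 5,
    ("sale", "price"): 50,
    ("sail", "price"): 1,
}


def cooccur(a, b):
    """Symmetric lookup of how often words a and b appear together."""
    return COOCCURRENCE.get((a, b), 0) + COOCCURRENCE.get((b, a), 0)


def decode(phoneme_groups, hints=()):
    """Pick one word per phoneme group, preferring words that fit the context.

    Context is built from the hint words plus every word that has only one
    possible reading. Groups missing from the lexicon decode to "<unk>".
    """
    candidates = [LEXICON.get(group, []) for group in phoneme_groups]
    context = set(hints) | {c[0] for c in candidates if len(c) == 1}
    words = []
    for options in candidates:
        if not options:
            words.append("<unk>")
            continue
        # Choose the candidate with the strongest ties to the context words.
        best = max(options, key=lambda w: sum(cooccur(w, c) for c in context))
        words.append(best)
    return words
```

With these toy tables, decode(["S EY L", "B OW T"]) resolves the ambiguous sound to "sail" because "boat" is in the context, while decode(["S EY L", "P R AY S"]) resolves it to "sale". A real recognizer would use a statistical language model over far larger vocabularies; the sketch only shows the disambiguation step.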
Blinkx Remote adds a few new tricks to the company’s standard speech-recognition system. Chandratillake explains that Blinkx has developed software that can automatically match a searched television show to other types of information from resources beyond the one that supplied the video. This requires being able to identify and assemble disparate pieces of information from around the Web—video, text, and links from numerous different sources, such as ABC.com and IMDB.com—to automatically create a concise result for a single show. In this way, he says, “it’s sort of like the Semantic Web approach,” in which information from a number of different sources is combined to produce a high-level concept. In the case of Blinkx Remote, that high-level concept is a rich, multimedia set of data about a given television episode.
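One way to picture this kind of assembly is as a grouping step that collects records from different sources under a shared show, season, and episode key. The Python sketch below is a simplification under assumptions, not Blinkx's software: the record fields (show, season, episode, kind, url) and the three kinds of link are hypothetical, and it assumes each record already carries season and episode numbers, which in practice is the hard part.

```python
from collections import defaultdict


def normalize(title):
    """Lowercase a show title and collapse extra whitespace."""
    return " ".join(title.lower().split())


def build_episode_cards(records):
    """Group records from many sources into one card per episode.

    Each record is a dict with hypothetical keys: "show", "season",
    "episode", "kind" (one of "video", "info", "buy"), and "url".
    """
    cards = defaultdict(lambda: {"video": [], "info": [], "buy": []})
    for record in records:
        key = (normalize(record["show"]), record["season"], record["episode"])
        cards[key][record["kind"]].append(record["url"])
    return dict(cards)
```

Given one video record and one information record for the same episode, the function returns a single card holding both links, even if the two sources spell or capitalize the show title differently.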
The new tool “should be helpful” for finding television shows, says Horacio Franco, chief scientist in the speech-technology and research laboratory at SRI International, a research company based in Menlo Park, CA. Franco is working on systems that can recognize speech in video with high accuracy by matching audio to large vocabulary databases. Recognizing speech in video is a tough problem, though, because often there is background noise, or multiple people are talking, he says. Franco suspects that eventually, the most accurate video-search engines will also include types of optical character-recognition software that can read words that appear in videos, such as names on storefronts, episode credits, and news tickers.
Right now, Chandratillake says, Blinkx has found about 300 online shows that can be accessed using Blinkx Remote. In total, Blinkx has indexed more than seven million hours of video and audio content. And while the debate over copyrighted material rages around YouTube, Blinkx avoids these issues because users don’t upload videos. Instead, Blinkx indexes videos that are hosted by other sources (including YouTube). It has partnerships with more than 100 content providers, indexing video from sources ranging from A&E Television Networks to Rollingstone.com.