There is another very interesting speech recognition team working at
Carnegie Mellon - developing open source software.
http://www.speech.cs.cmu.edu/
As the developers of Transana, we have followed these technologies very
closely. There are some interesting things being done with massively
parallel computing at the major phone companies to replace human
operators, but that is proprietary software. The Berkeley and CMU
software projects are the leading scholarly efforts in the US.
Mitre Corp has done lots of this sort of work for the US Government.
There is a presentation from 2002 that sums up the types of speech
recognition tools available.
http://www.mitre.org/work/tech_papers/tech_papers_02/hu_multimedia/hu_multimedia.pdf
One of the directions we took that was different from some other
multimedia analysis tools was to focus on scalability issues, such as
large projects with multiple analysts or locations and potentially very
large collections. We developed the infrastructure to store large
collections of video and find episodes using a relatively robust
metadata catalog. We also looked at integrating tools that allowed users
to search the audio track for words and phrases. The company FastTalk
(now Nexidia) has a very interesting search technology that does
phonemic analysis of most audio formats (including the audio track of
most video) and allows users to do natural language searches by breaking
down search terms into phonemes. I've seen this work and it is very
impressive. Other tricks that we have used (but currently don't ship
with Transana) are automatic pause detection and timing, as well as
detection of pitch shifts (of a certain, configurable magnitude). The
combination of pause and pitch shift is a pretty good indicator of
speaker or theme change.
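
To give a rough idea of what pause detection and timing can look like, here is a minimal sketch in Python. It is not the code we used and not part of Transana; it simply assumes the audio has already been decoded into a list of samples scaled to -1.0..1.0, and it flags runs of low-energy frames. The function name, the 20 ms frame size, the 0.02 energy threshold and the 0.3 second minimum pause are all illustrative assumptions that would need tuning on real recordings. Pitch-shift detection is not covered here.

```python
import math


def find_pauses(samples, sample_rate, frame_ms=20, threshold=0.02, min_pause_s=0.3):
    """Return a list of (start_seconds, duration_seconds) for each pause.

    A pause is a run of consecutive frames whose RMS energy is below
    `threshold` and whose total length is at least `min_pause_s`.
    All parameter defaults are illustrative assumptions.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    pauses = []
    run_start = None  # index of the first frame in the current quiet run

    # Iterate one step past the last frame so a trailing quiet run is closed.
    for i in range(n_frames + 1):
        if i < n_frames:
            frame = samples[i * frame_len:(i + 1) * frame_len]
            rms = math.sqrt(sum(x * x for x in frame) / len(frame))
            quiet = rms < threshold
        else:
            quiet = False

        if quiet and run_start is None:
            run_start = i
        elif not quiet and run_start is not None:
            start_s = run_start * frame_len / sample_rate
            duration_s = (i - run_start) * frame_len / sample_rate
            if duration_s >= min_pause_s:
                pauses.append((start_s, duration_s))
            run_start = None

    return pauses
```

The start times returned this way could be used as candidate break points in a transcript; combining them with a pitch estimate would be a separate step.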
Chris
Alan Stockdale wrote:
> On Mon, 18 Dec 2006 12:34:29 +0200, Pentti Luoma wrote:
>
>> When I asked Päivi Rissanen, if the speech to text -option were only "an
>> urban myth", her answer was "no".
>>
>
> There are labs working on the problem:
> e.g. http://www.icsi.berkeley.edu/Speech/mr/
> Not really practical or affordable for most of us at the moment I think.
--
Dr. Chris Thorn
Asst. Scientist, Value-Added Research Center &
Director of Technical Services
Wisconsin Center for Education Research
1025 W. Johnson St., Room 767
Madison, WI 53706
http://facstaff.wcer.wisc.edu/cathorn
Tel: 608-263-2709 Fax: 608-265-9300