Voice activated notepad for cell phones

halixand
  • Submitted by: halixand
  • Created: Dec 7, 2007, 1:36 am

The Elevator Pitch

For those who have been waiting for technology to make their lives easier, the product is a speech-to-text engine for cellular phones that recognizes names, addresses, appointments, and other important information during a phone call and saves it in the appropriate place without input from the user. Unlike traditional cell phone interfaces, our product could be the answer to the decades-old question, "How does technology make my life easier?"

The Idea

It's actually a little bit more complex than in the title, so let me give you an example. You are on the phone, maybe driving, and you are talking to a very important person who is about to give you a phone number that can change your life, but you cannot stop, pull over, or jot down that number anywhere because you don't have a pen. Has that ever happened to you? The solution to that dilemma comes in the form of software that listens to an active call on your cell phone and recognizes information that needs to be saved, such as phone numbers, addresses, emails, names, times and dates, etc., and then proceeds to create a new contact, show a map, or create an appointment or a reminder with whatever information was exchanged during the phone call.
I'm sure that there will be demand for this type of technology from cell phone makers. I'm also sure that someone has already thought of something similar; in fact, take a look at this: http://domino.watson.ibm.com/comm/research.nsf/pages/r.uit.innovation.html
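
As a rough illustration of the "recognizes information that needs to be saved" step, here is a minimal Python sketch. It assumes the call has already been turned into text by some speech-to-text engine, and it only looks for phone numbers and email addresses. The patterns and the `extract_details` name are hypothetical and far simpler than what real calls would need.

```python
import re

# Hypothetical patterns for illustration only; real phone and email formats vary widely.
PHONE_RE = re.compile(r"\b(?:\d{3}[-.\s]){1,2}\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

def extract_details(transcript: str) -> dict:
    """Pull phone numbers and email addresses out of already-transcribed call text."""
    return {
        "phones": PHONE_RE.findall(transcript),
        "emails": EMAIL_RE.findall(transcript),
    }

transcript = "Sure, call me at 555-867-5309 or write to pat@example.com tomorrow."
details = extract_details(transcript)
print(details)
```

Names, street addresses, and spoken dates are much harder to pick out with simple patterns, which is the linguistic-processing problem raised in the comments below.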

How would that technology be created? That's where I need help. I would think that the speech-to-text technologies out there are advanced enough for this, but I'm not sure. Dragon NaturallySpeaking is amazing software; try it if you haven't.

So that's why I'm here. Let me know what you think and if you could help. I highly encourage constructive criticism.

I thought of this idea when I was...

...going through everyday life.


Comments Posted

ccozad
ccozad Posted: December 7, 2007, 1:48 am

A simpler solution might be to just have a button (perhaps the one on the side, like the one used to initiate voice dials) that can start or stop a recording of the conversation. That would be a lot easier to implement.

Granted the true voice command and recognition would be cool... but I would rather see a solution sooner rather than later.

Rich2809
Rich2809 Posted: December 7, 2007, 5:00 am

It is a cool idea. I have to agree with ccozad that the simple solution is the button. Lots of phones have this already. My Blackberry can do this. It's not easy without looking at the phone, though.

micco
micco Posted: December 7, 2007, 7:59 am

Speech to text is pretty advanced, but it will have some problems with the line quality on many cell phone calls. In addition, you're adding another complexity in that you want not just speech to text but linguistic processing to identify important information. That's a hard problem, and likely to make mistakes at the wrong time. It's worth working on, but it would be difficult to get to a useful first version.

halixand
halixand Posted: December 7, 2007, 10:56 am

Well ccozad, a solution could be a simple button on the side that initiates a voice recording, BUT there's no way you can make money off of this, because it will be up to the manufacturer to implement it. You're proposing a hardware solution to the problem, which I don't think is a solution at all.
The way it could work is for the software to look for predefined keywords from you, once you've trained it to recognize your voice. You train it to recognize when you say "Let me write this down" and it automatically saves that information. I think that by using keywords and training the software like that, it doesn't need to be very complex at first. The biggest problem, like micco said, will be for the software to recognize what information is relevant; by using predefined keywords, it will only have to match them against the user's voice.

ccozad
ccozad Posted: December 7, 2007, 11:08 am

Nope, not a hardware solution. I was envisioning a software utility that you purchase or download that uses the existing button.

ccozad
ccozad Posted: December 7, 2007, 11:17 am

Basic idea is this:

- User purchases software
- That software runs in the background all the time.
- When you are in a call, you press some predefined button (didn't mean to confuse by using the term button... we are not talking about a new button here...)
- The software starts recording the conversation to an audio file in the phone's SD card (or perhaps the RAM, since SD write may be too slow for real time audio recording)
- The recording continues until the memory is full or you press the button again.
- You email or offload the audio file and listen to it later.

Point is humans are built for audio processing... I am thinking initial investment here... as in the difference between $5,000 and $500,000 for upfront development.

Perhaps I am overestimating the cost of the "correct" solution, but I am sure it won't be very cheap.
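
The button-driven flow listed above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `CallRecorder` class, its `max_bytes` budget, and the `feed` callback are all hypothetical, and no real phone audio API is involved.

```python
class CallRecorder:
    """Toggle-style recorder: one button press starts capture, the next stops it."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes      # hypothetical memory budget
        self.recording = False
        self.buffer = bytearray()

    def press_button(self) -> None:
        """Flip between recording and idle; a new recording starts empty."""
        self.recording = not self.recording
        if self.recording:
            self.buffer.clear()

    def feed(self, chunk: bytes) -> None:
        """Called with each chunk of call audio; ignored unless recording."""
        if not self.recording:
            return
        room = self.max_bytes - len(self.buffer)
        self.buffer.extend(chunk[:room])
        if len(self.buffer) >= self.max_bytes:
            self.recording = False      # memory full: stop automatically

recorder = CallRecorder(max_bytes=8)
recorder.feed(b"hello")          # ignored, not recording yet
recorder.press_button()          # start recording
recorder.feed(b"555-")
recorder.feed(b"1234")           # buffer is now full, so recording stops
recorder.feed(b"extra")          # ignored
print(bytes(recorder.buffer))
```

The point of the sketch is how little logic the button approach needs compared with constant voice processing.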

Rich2809
Rich2809 Posted: December 7, 2007, 11:28 am

Why not just record the entire call?

halixand
halixand Posted: December 7, 2007, 8:27 pm

Sure, recording the entire call seems like an easier solution, but how does that make people's lives easier? Then they have to listen to it a second time and write down that information.
ccozad, one question I have is, if the software is running in the background, and listening for keywords spoken by the user, and then writing them to a file, how much processing memory does a phone need to have to be able to do this? What I'm asking is, is this type of software possible with current smart phones?
Apple will be releasing an SDK for third-party apps sometime in February, so if anybody here is up for it we could definitely do it before that... or am I too ambitious?

Summertime
Summertime Posted: December 9, 2007, 2:18 am

You might be able to use voice commands: "record" and "stop-record".
Though, it would be better not to interrupt the conversation. With only two commands, I wonder if you could train up the linguistic processing fairly easily. ??

Mathias
Mathias Posted: December 9, 2007, 4:32 pm

You should check out GrandCentral.com - it provides a larger feature set, but one of the things that it allows you to do is "Keep notes of information like directions or instructions without needing to write; perfect if you are driving or don't have a pen." Link is here: http://grandcentral....itworks/call_record.

halixand
halixand Posted: December 9, 2007, 11:20 pm

Thanks Mathias, I wasn't aware of this, but yet again, I think most smart phones these days can record calls. Like they say on this page, recording a conversation might be against the law.

Summertime, when you 'train' the software, what will actually happen is that the software records, say, a .wav file for each keyword you want it to recognize. Then, when it's listening to a call, all it has to do is match what you're saying against the different .wav files, and that's how you can issue commands. When it hears you say "Let me write this down", that will be its cue to start the speech-to-text engine and save all the information.
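
One common way to match a spoken keyword against stored recordings is dynamic time warping (DTW), which tolerates the same word being said faster or slower. The post does not name a matching method, so DTW is an assumption here, and the short number lists below are toy stand-ins for features that would be computed from the .wav files. The function names are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

def best_keyword(utterance, templates):
    """Return the name of the stored template closest to the utterance."""
    return min(templates, key=lambda name: dtw_distance(utterance, templates[name]))

# Toy per-frame energy values standing in for features computed from .wav files.
templates = {
    "record": [1, 3, 5, 3, 1],
    "stop": [5, 5, 1, 1, 0],
}
spoken = [1, 1, 3, 5, 5, 3, 1]   # "record" said more slowly
print(best_keyword(spoken, templates))
```

A usable version would also need a distance threshold so that ordinary conversation, which matches no template well, is not mistaken for a command.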

Kevin_Cox
Kevin_Cox Posted: December 10, 2007, 3:45 am

You could simply record all calls. I think my smart phone has some interesting voice programs on it. Not sure if it can do this though.

micco
micco Posted: December 10, 2007, 8:05 am

Recording all calls would quickly fill up memory and make me do a lot more file maintenance than I want to do. Maybe it could have a save/delete option after every call to help somewhat, but if I'm on a two hour conference call and really only want to save the one phone number one of the people said, then I'd much rather have finer-grained control than all or none.
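
To put rough numbers on the memory concern, here is a small Python calculation. It assumes uncompressed telephone-quality audio (8 kHz, 16-bit, mono); that format is an assumption, and a real recorder would likely compress the audio and use less space.

```python
def recording_size_bytes(seconds, sample_rate_hz=8000, bytes_per_sample=2, channels=1):
    """Size of uncompressed PCM audio of the given duration."""
    return seconds * sample_rate_hz * bytes_per_sample * channels

two_hour_call = recording_size_bytes(2 * 60 * 60)
thirty_second_clip = recording_size_bytes(30)
print(f"2-hour call:    {two_hour_call / 1_000_000:.1f} MB")
print(f"30-second clip: {thirty_second_clip / 1_000_000:.2f} MB")
```

Under these assumptions the full two-hour call takes about 115 MB, while a 30-second clip around the phone number takes under half a megabyte.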

I already have a voice recorder on my phone and I can activate it during calls. I don't think it records both sides, only my mic input, but that would be a minor issue to overcome if you create your own. My phone allows me to set up a couple of hot-keys to launch apps, so I can trigger the voice recorder when needed.

I think the app described here would be very useful and marketable. It's an incremental improvement on apps that already exist, so it's not a "killer app", but could be profitable with the right marketing.

Summertime
Summertime Posted: December 11, 2007, 9:59 pm

Your explanation of training with wav files is what I was thinking of. A "let me write this down" cue is discreet, as is "thank you, I have it down". I am overlooking the legalities of this misdirection for the moment.
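
The start and stop cues can be modeled as a tiny state machine. This Python sketch assumes each utterance has already been converted to text, which is the hard part the thread is debating; the cue constants and the `capture_between_cues` name are hypothetical.

```python
START_CUE = "let me write this down"
STOP_CUE = "thank you, i have it down"

def capture_between_cues(utterances):
    """Collect the utterances spoken between the start cue and the stop cue."""
    capturing = False
    notes = []
    for text in utterances:
        normalized = text.strip().lower()
        if not capturing and START_CUE in normalized:
            capturing = True
        elif capturing and STOP_CUE in normalized:
            capturing = False
        elif capturing:
            notes.append(text)
    return notes

call = [
    "So how was the trip?",
    "Hang on, let me write this down.",
    "The number is 555-0142.",
    "Ask for Dana.",
    "Thank you, I have it down.",
    "Talk soon!",
]
print(capture_between_cues(call))
```

Only the two utterances between the cues are kept, so the rest of the conversation never needs to be stored.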

PhilipH
PhilipH Posted: December 13, 2007, 5:22 pm

A major difference between this and other voice-recognition applications is that here I'm not asking the software to understand MY voice but everyone else's. The former can be accomplished relatively (!) straightforwardly through training but the latter is much more difficult, as it's vital that the software can cope with any mix of accent, nationality, gender and talking speed, while recognising the information you're only going to hear once with a 100% success rate... it's a tall order!

micco
micco Posted: December 14, 2007, 8:18 am

PhilipH, on the flip side of that technical difficulty, you're trying to build a system that will recognize any voice on just a few words (a start and stop trigger). That's a whole lot easier than trying to do the same thing for all arbitrary input. Not trivial, but I think Summertime's idea of monitoring for an on and off cue is probably feasible. You could make it robust enough to recognize a wide variety of voices and still provide a training option to let the user improve its accuracy on their own voice.

ccozad
ccozad Posted: December 14, 2007, 12:13 pm

"one question I have is, if the software is running in the background, and listening for keywords spoken by user, and then writing them to a file,"

In my suggestion I was not including the audio processing part. You trigger the recording with a simple button press. Waiting for a button press is immensely easier than doing constant voice processing.

---------------------

And PhilipH brings up a wonderful point. You are asking the phone to interpret the speech of someone who has not trained with the software. This will probably be a processing nightmare... I have used Dragon NaturallySpeaking and training was a painful process... and I am a native speaker, speak pretty loudly, and am a pronunciation nut (think Ross on Friends...)... so if it had trouble with my voice, how is it going to perform with the other variables like accents, low talkers and background noise?

THOUGH... that does get me thinking. What if you train the software for voice recognition on your own phone, and when you call someone your voice fingerprint is sent with your call (i.e. instructions on how to process your voice)? Granted, it would be a lot of data to send, but it certainly would be cool.

------------------------------------

"Sure, recording the entire call seems like an easier solution, but how does that make people's lives easier? Then they have to listen to it a second time and write down that information."

Just for the record I was suggesting recording a segment of the call, just the important part you designate by pressing the record/stop button.

Yes, they will have to listen to it a second or third time. But to me, a person spending 30 extra seconds to transcribe a recording is better than thousands or hundreds of thousands of how developing and complex voice processing algorithms for a mobile footprint.

I love the Star Trek part of the dream, but to get to that dream, I believe you have to start somewhere more attainable and work your way up.

ccozad
ccozad Posted: December 14, 2007, 12:17 pm

*hours developing and validating complex voice.....

Yes Gord, I still want my edit button... accountability... shmountability... I thumb my nose at your accountability.

I make typos and want to fix them!... though I understand that there are more "important" things to work on... :-)

Summertime
Summertime Posted: December 14, 2007, 4:20 pm

I was suggesting "record" and "stop" cues trained on one voice. We don't need people at the other end instructing our phones. The physical button is fine but problematic when you are occupied and in "hands free" mode.

CharonV
CharonV Posted: December 15, 2007, 12:18 pm

Quote:
"
It's actually a little bit more complex than in the title, let me give you an example. You are on the phone, maybe driving, and you are talking to a very important person who is about to give you a phone number that can change your life, but you cannot stop, pull over, or jot down that number anywhere because you don't have a pen."

Most phones now have a caller-identifying service?

Is it really realistic to expect that someone who is about to change your life will not accept a return call from you, or will not text you "the important phone number"?

Interrupting the call by issuing instructions such as "record now" to a machine could really unbalance such a character?

However, since everything is possible, why not prepare for such a moment by:

a) Buying a digital voice-activated recording device. They work very well and last for 30 to 40 hours of recording time. They are not much bigger than a pen and can be carried in a pocket or left on the car seat.

b) Buying a phone with a built-in recorder?

I believe in the motto K.I.S.S.

good luck

Myla_HB
Myla_HB Posted: December 17, 2007, 12:19 am

"You are on the phone, maybe driving, and you are talking to a very important person who is about to give you a phone number that can change your life, but you cannot stop, pull over, or jot down that number anywhere because you don't have a pen. ...The solution to that dilemma comes in the form of software that listens to an active call on your cell phone and recognizes information that needs to be saved, such as phone numbers, addresses, emails, names, times and dates, etc., and then proceeds to create a new contact, show a map, or create an appointment or a reminder with whatever information was exchanged during the phone call."

I thought my Sony Ericsson (a 3G type) does some of these things already... while on an active call with the speaker on, you can browse your menu and do some stuff; you can even turn on the record mode. You can also retrieve the phone number that made the call.

Time to switch to new brand dude. =)

Brenden
Brenden Posted: December 18, 2007, 8:36 am

This is an interesting idea. I worry, though, about someone hacking my phone and listening in on my phone calls (marketers). That worry is way down the road, and not something to worry about yet.

If it came on my phone (from the manufacturer) I would use it. I do not think I would buy it as a standalone program.

Good luck. 4 stars.

ooper
ooper Posted: December 18, 2007, 8:31 pm

So, ever heard of reqall?

http://www.reqall.com -- yes, misspelled :)

You call a number, say a command (todo, note, etc), record the whole message, including when and how you want to receive it (email, text message, etc). Browse it later in the browser if you wish, organize it, and so forth. Special layouts for phones, including iPhone.

Software AND a human (YES, a human) in India annotates a message. Just don't tell him/her your most inner secrets ;) --though I don't think they'll care.

It's *quite* accurate, and they'll soon have other languages.

Right now it's free, but I *think* it'll be about $10.

Pretty hard to compete with that, in my opinion.

GordonMcDowell
GordonMcDowell Posted: December 19, 2007, 8:58 am

halixand, I'd like to see this kind of feature on a phone, but aside from the new possibilities Android (Google) phones might offer in the future, applications typically can't be written that execute while calls are being taken.

 
