
Intelligent search spider

  • Submitted by: siliconglen
  • Created: Jul 4, 2006, 5:47 pm
 


The Idea

I have already built a prototype for this idea.

Build an open source spider with site-based rules, submitted by users, for interpreting websites and turning them into meaningful, semantic content. Existing queries would then return much more effective results: an intelligent Google.

Google has admitted that its search still has a long way to go. Try these on Google:

  • Java jobs in London (England) on 40K or over, no duplicates.
  • Management jobs within an hour by car or public transport in central Scotland.
  • A second-hand Toyota with under 50K miles, to buy for under $10K.
  • A restaurant within 10 miles that is open at 5pm.

This software will allow search engines to make much more sense of pages. PageRank measures the relevancy of a page on the internet; this measures the relevancy of a page in terms of its semantic content. Many of the above searches currently take place on custom sites. This generic search replaces those sites and assimilates their profits.
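
The site-based rules described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the site name `jobs.example.com`, the pipe-separated page layout, the regular expression, and the `Job`, `extract_jobs` and `search` names are hypothetical and are not taken from the prototype. It only shows the general shape of the idea: a user-submitted rule turns raw page text into structured records, and a structured query (keyword, location, salary floor, no duplicates) then runs over those records.

```python
import re
from dataclasses import dataclass

# Hypothetical user-submitted rule for one site: named regex groups map
# raw page text onto structured fields. The site name and the pattern
# are illustrative assumptions, not part of the original prototype.
SITE_RULES = {
    "jobs.example.com": re.compile(
        r"(?P<title>[^|\n]+)\|\s*(?P<location>[^|\n]+)\|\s*£(?P<salary>\d+)K"
    ),
}


@dataclass(frozen=True)
class Job:
    title: str
    location: str
    salary: int  # annual salary in pounds


def extract_jobs(site: str, page_text: str) -> list[Job]:
    """Apply the rule registered for `site` to the text of one page."""
    rule = SITE_RULES[site]
    return [
        Job(
            title=m["title"].strip(),
            location=m["location"].strip(),
            salary=int(m["salary"]) * 1000,
        )
        for m in rule.finditer(page_text)
    ]


def search(jobs: list[Job], keyword: str, location: str, min_salary: int) -> list[Job]:
    """Structured query: keyword, location, salary floor, no duplicates."""
    seen = set()
    results = []
    for job in jobs:
        matches = (
            keyword.lower() in job.title.lower()
            and job.location.lower() == location.lower()
            and job.salary >= min_salary
        )
        if matches and job not in seen:
            seen.add(job)
            results.append(job)
    return results


# Made-up page content in the layout the hypothetical rule expects.
PAGE = """Java Developer | London | £45K
Java Developer | London | £45K
Junior Java Developer | London | £30K
Java Architect | Edinburgh | £60K
"""

jobs = extract_jobs("jobs.example.com", PAGE)
hits = search(jobs, "java", "London", 40_000)
```

Here `hits` holds the records that satisfy "Java jobs in London on 40K or over, no duplicates" for this one made-up page. A real rule set would need one such rule per site layout, which is where the user submissions would come in.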

I thought of this idea when I was...

Part1: http://www.silicongl...ave-on-internet.html
Part2: http://www.silicongl...r-enlightenment.html
Part3: http://www.silicongl...ies-looking-for.html

I need an organisation with the resources of Cambrian House to make this work. This needn't be a huge venture: the search engine can be decentralised and run on people's PCs for just the sites they want to look at. This is a tiny amount of work in contrast to the huge bottleneck that most search engines are set up to be.

My background:

  • Author of the UK's first guide to getting on the Internet
  • First in the UK's first degree in Object Technology (masters, with distinction)
  • Chartered IT Professional (British Computer Society)
  • Chartered Engineer (UK Engineering Council)
  • 25 years of software development and management expertise
  • Company Director
  • Partway through an MBA

More info: http://www.siliconglen.com/news/


Comments Posted

PCMAN
PCMAN Posted: July 17, 2006, 11:06 pm

Basically the next google! Would it replace or complement existing web search engines? Would it need a new catch phrase that is actually the website name as in "to google" or "to yahoo"?

On the other hand, I don't know if most people query on such a specific phrase. Most enter generic words and manually search the first few pages of the 5,904,201 hits. Generally they find what they want or reword the search based on what they found in the previous search.

There has been a good body of work done on semantic phrase parsers already. The problem is with the inconsistent way humans phrase their queries.

siliconglen
siliconglen Posted: July 18, 2006, 6:10 pm

Thanks for the praise. As for the verb, I hadn't thought of that, but I do own prosume.com, which could be interpreted as proactively consuming the web. Prosume is already a marketing verb (professional consumer). Not many .com names of two syllables or fewer left, I guess.

jahting
jahting Posted: July 23, 2006, 6:45 am

In fact it looks like a specific application of my proposal "Webpage Postprocessing/Mashup Infrastructure".

Merman
Merman Posted: August 4, 2006, 1:30 pm

I like the idea, although I don't understand the intricacies of how search engines are created.

Sounds like a "better mousetrap" than the generalized returns you get on Google.

ElJordo
ElJordo Posted: August 15, 2006, 10:53 am

I love it but I agree with PCMAN, 10 different people have 10 different ways of phrasing the same question... still if everyone gave up that easily nothing would get done. I would love to get in on this one!!

Lal
Lal Posted: August 23, 2006, 5:38 pm

There are many Web 2.0 search sites that use social networking (some with Firefox extensions) to allow "ranking" and recommendations (from sites to reviews etc.).
There are also DIY search sites such as ROLLYO (or something like that), which is an Amazon/Alexis offering to create your own search engine etc.

IF you're NOT talking about search engines as above and MORE talking about semantic meaning (and indexing etc.) from various web sites, then that sounds a better option. However, that's a harder nut to crack, as turning raw text into semantic MEANING ain't easy.

Tordek
Tordek Posted: October 4, 2006, 9:09 pm

Sites should be designed to be understandable, both to humans and to machines. That is the semantic web. It is the designer's fault if their site is so wrong it can't be understood.

bazdesign
bazdesign Posted: August 1, 2007, 6:18 pm

You would have to go through and search a lot of sites and therefore use up a lot of bandwidth. Big companies can afford the cost, but for my PC and the little straw it uses to access the internet, I'm thinking the spider won't go very far before maxing out my ISP's limits, and it would probably take forever. You mention decentralizing it onto other people's PCs, but I think it would just overwhelm their bandwidth too.

I think the search ability you mention is something we might see in the future, but I don't expect my PC to do the work. Instead, I would think it would be in the realm of the Googleplex.

superavit
superavit Posted: August 1, 2007, 8:39 pm

I believe Google has the capability for semantic interpretation of sorts. Take a look at the ads they post to the side of your Gmail account.

However, building an "open" and portable solution could be marketable, or easily packaged for a big contender to purchase.

Additionally, some flavor of the solution could be applied to the interpretation of sections of large corporate data warehouses, specifically non-standardized data fields (e.g. BLOBs).

JoeMerchant
JoeMerchant Posted: August 2, 2007, 7:08 pm

I'm assuming you have a "freshness measure" in your search... There's a lot of "Real Estate For Sale" in Florida on the Web at literally unbelievable prices, because the web pages are 5 years old.....
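
One way to picture such a freshness measure, as a minimal sketch only: weight each result by an exponential decay on the age of the page. The `freshness` and `score` functions, the 90-day half-life and the idea of multiplying relevance by freshness are all assumptions made for illustration; nothing in the original idea says how freshness would be measured.

```python
from datetime import date


def freshness(last_modified: date, today: date, half_life_days: float = 90.0) -> float:
    """Return a weight in (0, 1]: 1.0 for a page modified today,
    halving every `half_life_days` (an assumed, tunable value)."""
    age_days = max((today - last_modified).days, 0)
    return 0.5 ** (age_days / half_life_days)


def score(relevance: float, last_modified: date, today: date) -> float:
    """Combine a relevance score with freshness by simple multiplication
    (one possible choice among many)."""
    return relevance * freshness(last_modified, today)


today = date(2007, 8, 2)
fresh_listing = score(0.8, date(2007, 8, 2), today)   # modified today
stale_listing = score(0.8, date(2002, 8, 2), today)   # five years old
```

With these assumed settings a five-year-old listing keeps almost none of its relevance score, so it would sink below a current listing that matches the query equally well.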

Kevin_Cox
Kevin_Cox Posted: August 3, 2007, 2:27 am

"submitted by users" That will cause problems.

MattR
MattR Posted: August 3, 2007, 7:33 am

I like the idea. However, competition is hot: Google and Yahoo are already pouring millions of pounds into this type of technology. It would be a matter of getting there first (with far fewer resources) and some very clever marketing.

micco
micco Posted: August 3, 2007, 8:00 am

This is a great idea, definitely the next wave in internet search, and even if you can demonstrate a useful generic semantic parsing of a limited number of sites, your technology could easily get bought up by the big players.

Do you have the semantic processing in place? If not, I'd recommend you start with something small so you can prototype more easily and have a more constrained set to demo on. You could pick something like Craigslist since their posts are a reasonably consistent format but still have the breadth to give you useful content and are free-form enough to demonstrate the accuracy of your technique. If you start with a well-defined piece of content like that, you can focus development on the semantic parsing rather than the spidering.

joyce
joyce Posted: August 4, 2007, 9:00 am

wow! trying to improvise google! hmmmm......

Big_Sister_K
Big_Sister_K Posted: August 4, 2007, 1:08 pm

ostentatious!

olani_x
olani_x Posted: August 4, 2007, 1:58 pm

It's a good initiative. But don't idealize open source as the road to improving any software. Open source software is nearly perfect in small applications such as instant messengers, small "gadgets" and utilities, but it's far from efficient in big, complex applications. You can check that with open source text, spreadsheet, slideshow, video or audio editors.

I will tell you why this happens. For small applications, the time spent fixing bugs or adding new features is small enough to feed the enthusiasm, but in big applications it is a waste of time, effort and money for the coder, so he gives up.

Coming back to search engines, many startups have emerged trying to "improve" what has been done until now. I advise you to check the interesting alternative proposition of (I don't remember the name) one that created the result entries manually (using human resources) to provide high quality results.

Advice: apply your concept to a different field. Why don't you create a spider for files search on the internet (docs, pdfs, zips, mp3..) Going that way you may find the road to success.

MrY
MrY Posted: August 4, 2007, 9:29 pm

It sounds very good. Would you be able to have this complement the general search engine so that everyone can do this type of search, or do you need to segment your area within the internet where it can do its specific search to find the most relevant information from a set number of sites? I guess that I am not 100% clear on it yet, especially since you mention that you can run it on your own computer, which means that it only does a limited pre-ranking of sites? Would that mean that this would initially be a search application for the new vertical search engines that are being developed? I am just wondering how it would be deployed.

Dutch_Vincent
Dutch_Vincent Posted: August 5, 2007, 3:51 am

Great idea, but I think it will cover much more than just online search. You would need software to be able to dissect a language (yes, different algorithms for different languages) and I think this is a very tough road to go. If it works however, it might turn into the next Google.

PhilipH
PhilipH Posted: August 5, 2007, 6:39 am

It's a nice idea, but breaking into the existing search engine market will be VERY tough. You'd have to come up with something a little less obvious and hence that the big boys aren't already working on, I think. Even then, they have the experience and expertise to become very competitive very quickly...

Literate_saint
Literate_saint Posted: August 5, 2007, 11:37 am

tough one indeed!

ccozad
ccozad Posted: August 7, 2007, 3:43 pm

I wouldn't fight Google on their bread and butter. You would probably make more money getting a job at Google than trying to oust them from their position.

Of course, if you can come up with the algorithms, you stand a fighting chance. Create the algorithms and then consider pursuing the idea.

bentobox
bentobox Posted: August 7, 2007, 10:34 pm

You might want to check out Apache Lucene (http://lucene.apache.org/java/docs/) and Grub (http://www.grub.org/). Grub is being released as Open Source by Wikia.

siliconglen
siliconglen Posted: August 9, 2007, 3:01 am

I was thinking more along the lines that this could be interesting enough to Google (or a competitor) that they would want to buy it (like mylivesearch), and also that it's potentially disruptive to eBay: if people can list structured data for free, the usefulness of having everything hosted by eBay is certainly less.

Summertime
Summertime Posted: April 30, 2008, 1:43 am

Semantic enhanced search sounds really good to me. You did say that it would enhance Google and the others. I think you could stay under their radar, unless and until it becomes a success. Excellent point to cut into Ebay's action; that could make it a big winner for a Google purchase. Many long-time Ebay users are looking to reinvent it without the extortionate fees. I don't like the idea of dedicating my small computing resources to searching. That almost escaped me at first. Are we misinterpreting how the spider works?

DennisJ
DennisJ Posted: April 30, 2008, 2:07 pm

Good luck on this. Anything that can weaken eBay's hold on the market has to be a good idea.

landsky
landsky Posted: April 30, 2008, 5:51 pm

I'm not competent to comment. As a tech outsider, I only can say that this feels like a good one. Can you simplify the search portion by controlling the way the input language is constructed? There, that's my entire body of knowledge!

daraddishman
daraddishman Posted: May 1, 2008, 12:46 am

I notice this is a fairly old idea that has been around a few times. What is the current status of the idea? Where is it at, still looking for resources? And what kind of resources are you looking for?

Kevin_Cox
Kevin_Cox Posted: May 3, 2008, 5:16 am

Is the prototype available for review? If so, I would be interested in a link.

Major search companies probably won't be able to use much of your spider to their advantage, due to the massive scale of things. It would be like telling Google to use bubble sort to rank results. (For all who don't speak geek: Bubble Sort = Slow, we're talking snail speed.)

But I think if you focused on in-house, on-site search and marketing company solutions, you would have a greater chance of success.

There are a few open source spiders around.
One example:
http://www.openwebspider.org/

BizFunder
BizFunder Posted: May 3, 2008, 12:54 pm

Well, there is a lot of guessing taking place, but you need to launch your prototype to convince more people. In any case, it won't hurt to roll out an alternative search engine that has the merits to compete.

Kevin_Cox
Kevin_Cox Posted: May 3, 2008, 7:43 pm

The real key would be using it for applications other than searching the entire web, since that market is more than full.

vanhees
vanhees Posted: May 4, 2008, 3:24 pm

this is an old idea

Kevin_Cox
Kevin_Cox Posted: May 6, 2008, 12:03 am

"I have already built a prototype for this idea."
So, where is that prototype? I have asked before and I don't see it.

folamour
folamour Posted: May 6, 2008, 1:37 pm

web 3.0! we need a very 'intelligent' program, or a way for users to mark the pages they visit with tags. A plugin in the browser could help do that.

Brenden
Brenden Posted: May 6, 2008, 2:48 pm

old

Maxman
Maxman Posted: May 8, 2008, 12:45 am

Lost me for sure, not hard to do. Anything that would clear up my screen would be a plus. Simplify, yes!

Bazekok
Bazekok Posted: May 10, 2008, 11:52 pm

I agree that current search tools have problems but, in spite of many ideas and working prototypes, the search GUI used by the majority of people today is still that one textbox.

Some time ago I had an idea like yours, but then I thought it would eventually just get us a new semantic Wikipedia thing,
cheers

siliconglen
siliconglen Posted: May 27, 2008, 9:09 am

It may be an old idea, but it's still relevant and getting good reviews. So, what I need is funding and development to take it forward. Also, for those of you wanting to see the prototype, it isn't available online at the moment.

So what happens next to this idea, does it just sit here being ignored?

Sorry it's taken me a while to respond, it would be great if I was automatically notified when comments were posted.

siliconglen
siliconglen Posted: May 27, 2008, 10:57 am

Have fixed the auto notification so I will be a bit prompter at replying in future!

 
