Smeggy's Forums

Forums where you CAN vent!


Trawling the "DEEP" Web

Got any cool links? Or even links to avoid? Let us know.

Top Forum Index Page New Posts

Posted on

      

I debated where to put this and decided that Cool Websites was the nearest match.

I've been thinking about the fact that when any of us do a web search, and for most of us I'm guessing that's usually through Google, we only ever browse through maybe the top 50 or so pages at a push. And sometimes that can be the top 50 results out of many millions!

Well I was wondering, does anyone know how to get "deeper"? You know, get past the top 50, 100, or even 10,000 pages?

I mean, there must be tonnes of great information out there that never reaches the top of the search results, and I want to be able to view some of it!

So... Short of browsing 10 pages at a time, which would be a very laborious way of reaching say for example page 200,000... Any other suggestions on how to do this?

"TONGUE-tied and twisted, just an earthbound misfit, I"

Haunting Beauty
Posts: 4600
Joined: Fri Feb 20, 2009 4:20 pm
Location: Scotland
Current Mood: Spine chillingly spooky


Dunno, it's a difficult one. Maybe try refining the search by adding more terms, or by excluding terms by putting a minus sign in front of them.

For example, when searching for images for the Let's Count Thread, it often comes up with a billion pictures of a Nokia phone. So I'll refine the search from, say,

6303
to
6303 -nokia
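
If you ever want to build that kind of refined query in a script, here is a minimal Python sketch. The helper name `exclude_terms` is made up for illustration; it only assembles the query text and does not contact any search engine.

```python
def exclude_terms(query, *unwanted):
    """Append each unwanted term to the query with a leading minus sign."""
    parts = [query.strip()]
    for term in unwanted:
        term = term.strip()
        if term:
            # The minus sign touches the term and is preceded by a space.
            parts.append("-" + term)
    return " ".join(parts)


print(exclude_terms("6303", "nokia"))  # prints: 6303 -nokia
```

Each extra term you pass gets its own minus sign, so the same call works for several unwanted words at once.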

-----|0| None are more hopelessly enslaved than those who falsely believe they are free. |0|-----

"Capitalism profits from War - Humanity profits from Peace."

Aliens Ate My Chicken!
Posts: 118653
Joined: Sat Dec 15, 2007 8:32 am
Location: Smegland
How Hot Are You?: The Big Bang!!
Current Mood: Won Tons Mons


On occasions, I've had to go deep into Google when looking for something. What I have noticed is the deeper you go, the more obscure it becomes. On some occasions I've finished up with one word in English, surrounded by Japanese/Korean/Chinese characters (delete as appropriate).
The only way I could find to do it was to jump to the end of each page block, and keep working from there, but to be honest it wasn't very helpful, so I came to the conclusion that what was left probably wasn't really worth looking at.
It reminds me of the British Library. All those books, and who knows what information might be contained in them, so I understand your curiosity.
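
For anyone who would rather jump straight to a deep results page than click through the page blocks, this rough Python sketch works out the address. It assumes the engine takes the search text in a `q` parameter and a zero-based result offset in a `start` parameter, with ten results per page; those parameter names and the function name `results_page_url` are assumptions, and whether the engine will actually serve pages that deep is a separate question.

```python
from urllib.parse import urlencode

RESULTS_PER_PAGE = 10  # assumption: ten results per page


def results_page_url(query, page, base="http://www.google.co.uk/search"):
    """Return the URL for a 1-based results page of `query`."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Page 1 starts at offset 0, page 2 at offset 10, and so on.
    params = {"q": query, "start": (page - 1) * RESULTS_PER_PAGE}
    return base + "?" + urlencode(params)


print(results_page_url("british library", 5))
# prints: http://www.google.co.uk/search?q=british+library&start=40
```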

War is God's way of teaching Americans geography. Ambrose Bierce (1842-1913)
-----------------------------------------------------------------------------
They used to say if an infinite number of chimps typed we would get the works of Shakespeare. The internet has proved that this is not the case...

Grand Master
Posts: 6578
Joined: Fri May 30, 2008 7:45 pm


ghostgirl wrote:So... Short of browsing 10 pages at a time, which would be a very laborious way of reaching say for example page 200,000... Any other suggestions on how to do this?



Search engine optimisers (SEOs), not always real people either, will stop most people from delving into the past or reaching sites that may be unpopular or not quite up to date. And of course your ISP will have its own link limits, set by its own staff and preferences, which you cannot disable.

I take it you have de-restricted your searches in whatever engine you are using?

Have you tried the Wayback Machine (good for any site or page over six years old)?

The best facial is a daily facial.

Insane Poster
Posts: 23634
Joined: Sat Apr 18, 2009 6:40 pm
How Hot Are You?: Siberia
Current Mood: unknown


Simple answer: don't use Google, use Dogpile or MetaCrawler.

Or you can spy on what everyone else is searching the web for here. Uncheck 'omit adult terms' for some really interesting results!


Loves Smeggy's
Posts: 1418
Joined: Thu Apr 23, 2009 5:03 am
Location: Edinburgh
How Hot Are You?: The Big Bang!!
Current Mood: Electric!


Channel Hopper wrote:
Have you tried the Wayback machine (good for any site/pages over six years old ?)


:thumb: Interesting, thanks CH. The Wayback Machine sounds useful for at least some of the trawling I want to do. I also limit my searches to particular file types sometimes, which can be helpful as well, and/or attempt to predict precise file names, such as "mohamedatta filetype:swf" or "chathamhouse filetype:ppt" etc., or else search directly for "index-femacamps.doc", for example. Such searches often produce interesting results.

I'd appreciate any other similar methods you can think of. :)
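
The file-type trick above is easy to repeat across several formats. Here is a small Python sketch that writes out one query per extension; the function name `filetype_queries` and the list of extensions are just examples, and nothing here sends a search anywhere.

```python
def filetype_queries(keyword, extensions):
    """Build one 'keyword filetype:ext' query per extension."""
    queries = []
    for ext in extensions:
        # Accept 'ppt', '.ppt' or 'PPT' and normalise them all to 'ppt'.
        ext = ext.strip().lstrip(".").lower()
        if ext:
            queries.append(f"{keyword} filetype:{ext}")
    return queries


for q in filetype_queries("chathamhouse", ["ppt", ".pdf", "DOC"]):
    print(q)
# prints:
# chathamhouse filetype:ppt
# chathamhouse filetype:pdf
# chathamhouse filetype:doc
```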

Haunting Beauty
Posts: 4600
Joined: Fri Feb 20, 2009 4:20 pm
Location: Scotland
Current Mood: Spine chillingly spooky

I didn't know you could specify a filetype. Is there a help page showing a complete list of search parameters?

Aliens Ate My Chicken!
Posts: 118653
Joined: Sat Dec 15, 2007 8:32 am
Location: Smegland
How Hot Are You?: The Big Bang!!
Current Mood: Won Tons Mons


Yes, you can search by file type. Using chathamhouse as the example, the format is: chathamhouse filetype:ppt
Click this link to see the results. As you can see, only ppt files (PowerPoint presentations) are listed.
http://www.google.co.uk/search?hl=en&as ... afe=images
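
A query like that has to be encoded before it can go into a link. This Python sketch shows the idea using only the `q` parameter; it is not a reconstruction of the shortened link above, and the function name `search_url` is invented for the example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def search_url(query, base="http://www.google.co.uk/search"):
    """Percent-encode the query and attach it as the `q` parameter."""
    return base + "?" + urlencode({"q": query})


url = search_url("chathamhouse filetype:ppt")
print(url)
# prints: http://www.google.co.uk/search?q=chathamhouse+filetype%3Appt

# Round trip: decode the URL again to recover the original query text.
recovered = parse_qs(urlsplit(url).query)["q"][0]
print(recovered)
# prints: chathamhouse filetype:ppt
```

The space becomes a plus sign and the colon becomes %3A, which is why search links look so different from what you typed.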


I don't know where there's a complete list of parameters, but I can give you the ones that I know. :)



To search for web pages that have similar content to a given site, type "related:" followed by the website address into the Google search box. For example: related:www.bbc.co.uk


You can also get Google to ‘fill in the blank’ by typing an asterisk (*) into the search box at the part of the sentence or question that you want finished. For example: Nikola Tesla discovered *. Or try the query [ Obama voted * on the * bill ] to get stories about different votes on different bills.


If you want to search not only for your search term but also for its synonyms, place the tilde sign (~) immediately in front of your search term. For example: ~fast food


By putting double quotes around a set of words ("great wall of china"), you are telling Google to consider the exact words in that exact order without any change.


You can also specify that your search results must come from a given website. For example, the query [ iraq site:nytimes.com ] will return pages about Iraq but only from nytimes.com. Or specify a whole class of sites, for example [ iraq site:.gov ] will return results only from a .gov domain and [ iraq site:.iq ] will return results only from Iraqi sites.


Attaching a minus sign immediately before a word indicates that you do not want pages that contain this word to appear in your results. The minus sign should appear immediately before the word and should be preceded with a space. For example, [ anti-virus -software ] will search for the words 'anti-virus' but exclude references to software.


Google's default behavior is to consider all the words in a search, but if you want to specifically allow either one of several words, you can use the OR operator (note that you have to type 'OR' in ALL CAPS). For example, [ San Francisco Giants 2004 OR 2005 ] will give you results about either one of these years.
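
To show how several of these operators fit together in one query, here is a small Python sketch. The `Query` class and its field names are invented for this example; it only produces the text you would type into the search box, using the quote, OR, minus and site: rules described above.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    words: list = field(default_factory=list)    # plain terms, all required
    phrase: str = ""                             # exact phrase, gets quoted
    any_of: list = field(default_factory=list)   # alternatives joined by OR
    exclude: list = field(default_factory=list)  # terms prefixed with a minus
    site: str = ""                               # domain or suffix for site:

    def build(self):
        parts = list(self.words)
        if self.phrase:
            parts.append(f'"{self.phrase}"')
        if self.any_of:
            parts.append(" OR ".join(self.any_of))  # OR in capitals
        parts.extend(f"-{term}" for term in self.exclude)
        if self.site:
            parts.append(f"site:{self.site}")
        return " ".join(parts)


print(Query(words=["iraq"], site="nytimes.com").build())
print(Query(words=["San", "Francisco", "Giants"], any_of=["2004", "2005"]).build())
print(Query(words=["anti-virus"], exclude=["software"]).build())
print(Query(phrase="great wall of china").build())
# prints:
# iraq site:nytimes.com
# San Francisco Giants 2004 OR 2005
# anti-virus -software
# "great wall of china"
```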

Haunting Beauty
Posts: 4600
Joined: Fri Feb 20, 2009 4:20 pm
Location: Scotland
Current Mood: Spine chillingly spooky


still using Zionist Google then I see :chin:


Loves Smeggy's
Posts: 1418
Joined: Thu Apr 23, 2009 5:03 am
Location: Edinburgh
How Hot Are You?: The Big Bang!!
Current Mood: Electric!

Cool :) I knew about the minus sign to exclude terms.

The other one I use often is 'define:' when you want a dictionary-type definition of a word.

For example ...

define: farting

gives ... 'fart: a reflex that expels intestinal gas through the anus' and other definitions
Related phrases are "walter the farting dog farting preacher farting fire farting through the ages amber the farting amputee history of farting" :rofl:

Aliens Ate My Chicken!
Posts: 118653
Joined: Sat Dec 15, 2007 8:32 am
Location: Smegland
How Hot Are You?: The Big Bang!!
Current Mood: Won Tons Mons


Best one I had was

Search - 'Salmonella'

Result - 'Ebay. get Salmonella today'

Insane Poster
Posts: 23634
Joined: Sat Apr 18, 2009 6:40 pm
How Hot Are You?: Siberia
Current Mood: unknown


http://www.google.com/help/cheatsheet.html

Interesting. You can even use Google as a calculator.

Aliens Ate My Chicken!
Posts: 118653
Joined: Sat Dec 15, 2007 8:32 am
Location: Smegland
How Hot Are You?: The Big Bang!!
Current Mood: Won Tons Mons

