Difference between revisions of "Invisible Web"
Latest revision as of 13:38, 8 February 2010
The Invisible Web (also known as the Deep Web, Deepnet, Dark Web or the Hidden Web) refers to World Wide Web content that is not part of the surface Web, the portion that is indexed by standard search engines. Accessing the Invisible Web is an important part of the repertoire of Investigative Research.
Techniques
Searching for filetypes
Many researchers search for PDF files by using the search limiter filetype:pdf. The search engine OSUN.org provides an interface for searching PDF, DOC and PPT files.
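As a rough illustration of the filetype limiter, the sketch below appends filetype:pdf (or another extension) to a set of search terms and URL-encodes the result. The function names, the base URL https://search.example.com/search and the query parameter q are placeholders chosen for this example, not the interface of any particular search engine, and support for the filetype: limiter varies between engines.

```python
from urllib.parse import urlencode


def filetype_query(terms, filetype="pdf"):
    """Append a filetype: limiter to free-text search terms."""
    return f"{terms} filetype:{filetype}"


def search_url(base_url, terms, filetype="pdf", param="q"):
    """Build a search URL for the limited query.

    base_url and param are assumptions for illustration; substitute the
    values used by the search engine you actually query.
    """
    query = filetype_query(terms, filetype)
    return f"{base_url}?{urlencode({param: query})}"


if __name__ == "__main__":
    print(filetype_query("deep web crawling"))
    print(search_url("https://search.example.com/search",
                     "deep web crawling", "ppt"))
```

The same pattern covers DOC and PPT files by passing a different extension as the filetype argument.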
Reading on the Invisible or Deep Web
- Invisible or Deep Web: What it is, How to find it, and Its inherent ambiguity UC Berkeley - Teaching Library Internet Workshops
- Wright, Alex Exploring a 'Deep Web' That Google Can’t Grasp, New York Times 2009-02-22.
- Bergman, Michael K. The Deep Web: Surfacing Hidden Value The Journal of Electronic Publishing, August 2001, volume 7 issue 1, http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0007.104 doi=10.3998/3336451.0007.104
- Garcia, Frank Business and Marketing on the Internet Masthead volume 9 issue 1, January 1996
- Lesk, Michael How much information is there in the world?
- Sriram Raghavan, Hector Garcia-Molina Crawling the Hidden Web Stanford Digital Libraries Technical Report, 2000
- Ntoulas, Alexandros, Petros Zerfos, and Junghoo Cho Downloading Hidden Web Content UCLA Computer Science 2005
- Barbosa, Luciano and Juliana Freire An Adaptive Crawler for Locating Hidden-Web Entry Points WWW Conference 2007, 2007
- Barbosa, Luciano and Juliana Freire Searching for Hidden-Web Databases. WebDB 2005, 2005
- Madhavan, Jayant, David Ko, Łucja Kot, Vignesh Ganapathy, Alex Rasmussen, Alon Halevy Google’s Deep-Web Crawl VLDB Endowment, ACM, 2008
- Cohen, Laura Internet Tutorials: The Deep Web
- Barker, Joe (Jan 2004). Invisible Web: What it is, Why it exists, How to find it, and its inherent ambiguity UC Berkeley - Teaching Library Internet Workshops.
- Gruchawka, Steve (June 2006). How-To Guide to the Deep Web TechDeepWeb.com, http://TechDeepWeb.com
- Hamilton, Nigel (2003). The Mechanics of a Deep Net Metasearch Engine - 12th World Wide Web Conference poster.
- He, Bin; Chang, Kevin Chen-Chuan 2003 Statistical Schema Matching across Web Query Interfaces (http://eagle.cs.uiuc.edu/pubs/2003/unifiedschema-sigmod03-hc-mar03.pdf) Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data
- He, Bin; Patel, Mitesh; Zhang, Zhen; Chang, Kevin Chen-Chuan Accessing the Deep Web: A Survey Communications of the ACM (CACM), 94–101, 2007 May volume 50 issue 2 doi=10.1145/1230819.1241670
- Ipeirotis, Panagiotis G.; Gravano, Luis; Sahami, Mehran 2001 Probe, Count, and Classify: Categorizing Hidden-Web Databases Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data pages 67-78.
- King, John D.; Li, Yuefeng; Tao, Daniel; Nayak, Richi Mining World Knowledge for Analysis of Search Engine Content Web Intelligence and Agent Systems: an International Journal pages 233–253, 2007 November volume 5 issue 3
- McCown, Frank; Liu, Xiaoming; Nelson, Michael L.; Zubair, Mohammad Search Engine Coverage of the OAI-PMH Corpus IEEE Internet Computing pages 66–73 2006 Mar/Apr volume 10 issue 2 doi=10.1109/MIC.2006.41
- Price, Gary; Sherman, Chris The Invisible Web: Uncovering Information Sources Search Engines Can't See 2001 July CyberAge Books isbn=0-910965-51-X
- Shestakov, Denis (June 2008). Search Interfaces on the Web: Querying and Characterizing. TUCS Doctoral Dissertations 104, University of Turku
- Wright, Alex (Mar 2004). In Search of the Deep Web, Salon.com, http://www.salon.com/tech/feature/2004/03/09/deep_web/
- Firms Push for a More Searchable Federal Web - article in the Washington Post, Thursday, December 11, 2008; Page D01 by Peter Whoriskey.
- DeepDyve A research search engine to access content in the Deep Web
- DeepPeep a search engine project to "discover the hidden web"