Fetch a page from Google
Tag(s): Networking


You can't fetch a page from Google directly: Google checks that the request comes from a "real" browser, so a plain Java program receives an HTTP 403 (Forbidden) response.

You need to fool Google by pretending to be a legitimate browser, which means sending a browser-like User-Agent header with the request.

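import java.io.*;
import java.net.*;
...
String search = "What you want to search for";
// encode the query so spaces and special characters survive in the URL
String google = "http://www.google.ca/search?q="
   + URLEncoder.encode(search, "UTF-8") + "&hl=en&ie=UTF-8&oe=UTF8";

URL urlObject = new URL(google);
URLConnection con = urlObject.openConnection();
// must be set before the connection is actually opened by getInputStream()
con.setRequestProperty
  ( "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" );
BufferedReader webData =
  new BufferedReader(new InputStreamReader(con.getInputStream()));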
...
Note: You may want to take a look at http://www.google.com/apis/ to learn how to interact with Google through the official APIs.
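
For reference, here is a minimal, self-contained sketch of the same technique. The class name GoogleFetch and the helper methods buildSearchUrl and openAsBrowser are hypothetical names chosen for this illustration; the URL and the User-Agent string are the ones used above. It assumes Java 10 or later for the URLEncoder.encode(String, Charset) overload.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class GoogleFetch {
    // Same browser-like agent string as in the snippet above.
    static final String USER_AGENT =
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)";

    // Builds the search URL; the query is form-encoded (spaces become '+').
    static String buildSearchUrl(String search) {
        return "http://www.google.ca/search?q="
            + URLEncoder.encode(search, StandardCharsets.UTF_8)
            + "&hl=en&ie=UTF-8&oe=UTF8";
    }

    // Prepares the connection and overrides the User-Agent header.
    // Nothing is sent until the input stream is requested.
    static URLConnection openAsBrowser(String url) throws IOException {
        URLConnection con = URI.create(url).toURL().openConnection();
        con.setRequestProperty("User-Agent", USER_AGENT);
        return con;
    }

    public static void main(String[] args) throws IOException {
        URLConnection con =
            openAsBrowser(buildSearchUrl("What you want to search for"));
        // getInputStream() performs the actual HTTP request.
        try (BufferedReader webData = new BufferedReader(
                new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = webData.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Keeping the URL building apart from the connection setup makes it easy to check the encoded URL and the header without touching the network.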

As seen above, you can override the HTTP header directly. Another way is to start the program with a modified http.agent system property.

>java 
  "-Dhttp.agent=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" MyClass
or in your program
System.setProperty
  ("http.agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
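
A small sketch of the in-program variant follows. The class name AgentProperty and the method installAgent are hypothetical. It rests on two assumptions worth verifying on your own JVM: that the standard HTTP handler reads http.agent only once, when it is first loaded (so the property should be set before any connection is created), and that the JDK may append its own Java/<version> token to the value you supply.

```java
public class AgentProperty {
    static final String BROWSER_AGENT =
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)";

    // Sets http.agent only when it was not already supplied with -D,
    // so a command-line value keeps priority. Returns the effective value.
    static String installAgent() {
        if (System.getProperty("http.agent") == null) {
            System.setProperty("http.agent", BROWSER_AGENT);
        }
        return System.getProperty("http.agent");
    }

    public static void main(String[] args) {
        // Call this first, before any URL connection is created.
        System.out.println("http.agent = " + installAgent());
    }
}
```

Unlike setRequestProperty, which affects a single connection, the system property applies to every HTTP connection the JVM opens afterwards.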
