Extracting the HTML source from a website URL


I wanted to try something short and sweet, so here is a snippet for extracting the HTML source of a page from an entered URL.
I have not declared the namespaces at the top with using directives. Instead, every type is fully qualified in the code, to make it clearer which namespace each object comes from.

The code is self-explanatory, so I won't add any further explanation here.

/// <summary>
/// Extracts the source from the url entered.
/// </summary>
/// <param name="url">url to fetch the source from.</param>
/// <returns>string: source for the url entered, or an empty string if the request fails.</returns>
public static string GetHtmlPageSource(string url)
{
    try
    {
        // make a Web request
        System.Net.WebRequest req = System.Net.WebRequest.Create(url);

        // get the response and read from the result stream;
        // the using blocks close the response, stream & reader objects,
        // even when the request or the read throws
        using (System.Net.WebResponse resp = req.GetResponse())
        using (System.IO.Stream st = resp.GetResponseStream())
        using (System.IO.StreamReader sr = new System.IO.StreamReader(st))
        {
            // read all the text in it
            return sr.ReadToEnd();
        }
    }
    catch (System.Exception)
    {
        // the request or the read failed: return an empty string
        return string.Empty;
    }
}



UPDATE:

If you need to authenticate the request, add the following just before you call GetResponse() to read the source. The username and password are the credentials passed in by the caller.

// authenticate using the credentials passed for getting access to the page.
if (username != null && password != null)
req.Credentials = new System.Net.NetworkCredential(username, password);
// get the response and read from the result stream
.
.
.
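Putting the update together with the original method, a complete version might look like the sketch below. The class name PageSourceFetcher and the three-parameter overload are my own assumptions for illustration and are not part of the original snippet. Passing null for either credential skips authentication.

```csharp
public static class PageSourceFetcher
{
    // Hypothetical overload: username and password are optional,
    // pass null for either one to make an unauthenticated request.
    public static string GetHtmlPageSource(string url, string username, string password)
    {
        try
        {
            // make a Web request
            System.Net.WebRequest req = System.Net.WebRequest.Create(url);

            // authenticate only when both credentials were supplied
            if (username != null && password != null)
                req.Credentials = new System.Net.NetworkCredential(username, password);

            // get the response and read from the result stream;
            // the using blocks dispose everything even if reading fails
            using (System.Net.WebResponse resp = req.GetResponse())
            using (System.IO.Stream st = resp.GetResponseStream())
            using (System.IO.StreamReader sr = new System.IO.StreamReader(st))
            {
                return sr.ReadToEnd();
            }
        }
        catch (System.Exception)
        {
            // invalid url, network error or denied access: return an empty string
            return string.Empty;
        }
    }
}
```

Calling PageSourceFetcher.GetHtmlPageSource(url, null, null) behaves like the original method, while supplying both a username and a password sends them with the request. Because every failure is turned into an empty string, the caller cannot tell a bad url from a network error. If that distinction matters, it may be better to let the exception propagate.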
  1. Mike

    August 25, 2009 at 1:05 PM

    Thanks Mate was looking for a similar code for my piece


Copyright © Shounak S. Pandit. Powered by Blogger.