Perl :: LWP + MIME or how to send a web-page by e-mail 

by Temzupin de Rabzentalf, 12/17/03


In this tutorial I'll show how to download an HTML page, including all of its images, and send it by e-mail using the Perl modules LWP::UserAgent and MIME::Lite. Personally, I use this method to receive fresh thumbnails from www.deviantart.com every day, because I don't want to download and save these pages manually. :-) The script needs the following modules (all of them can be downloaded from http://www.cpan.org):

LWP::UserAgent - WWW user agent class
MIME::Lite - lightweight MIME message generator
URI::URL - to work with URLs
HTML::LinkExtor - to extract a list of all URLs in a document
Time::Local - to convert a date and time to seconds since the epoch

As an example, we'll download all of one day's thumbnails from the sci-fi wallpapers section of http://www.deviantart.com.

The site is constantly refreshed, so it is better to download only yesterday's thumbnails, because they no longer change. The URL of the page we want has this form:
http://browse.deviantart.com/wallpaper/scifi/?startts=<start_time_stamp>&endts=<end_time_stamp>
For example, the page http://browse.deviantart.com/wallpaper/scifi/?startts=1071648000&endts=1071734400 will include all the sci-fi wallpaper thumbnails for December 17, 2003.
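
As a rough sketch of how such a pair of time stamps could be computed: the numbers in the example URL correspond to midnight at UTC-8 rather than midnight UTC, so the snippet below assumes that the site's day starts at 08:00 UTC. That offset is inferred from the arithmetic only, not from anything the site documents, and the helper name day_window is hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Assumption: the site's day starts at midnight UTC-8, i.e. 08:00 UTC.
my $DAY_START_HOUR_UTC = 8;

# Hypothetical helper: return (start, end) epoch seconds for a calendar day.
sub day_window {
    my ($year, $month, $day) = @_;
    # timegm expects a zero-based month
    my $start = timegm(0, 0, $DAY_START_HOUR_UTC, $day, $month - 1, $year);
    return ($start, $start + 86400);
}

my ($start, $end) = day_window(2003, 12, 17);
my $url = "http://browse.deviantart.com/wallpaper/scifi/?startts=$start&endts=$end";
print "$url\n";
```

For December 17, 2003 this reproduces the two time stamps of the example URL above. Aligning the window to a day boundary, rather than to "now minus 24 hours", keeps every run on whole calendar days.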

Now, a few words for those who have just begun to learn Perl.
How do I download a web page?

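require LWP::UserAgent;
$ua = LWP::UserAgent->new;

# route HTTP and FTP requests through a proxy (omit this line for a direct connection)
$ua->proxy(['http', 'ftp'], 'proxy-server address');
$req = HTTP::Request->new('GET' => 'page to be downloaded');
# send the request; this produces the response object tested below
$res = $ua->request($req);

if ($res->is_success) { $page = $res->content; }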
How do I send an e-mail with an attachment?
require MIME::Lite;
$msg = MIME::Lite->new(
   From =>'your@address.com',
   To =>'recipient@address.com',
   Subject =>'Subject',
   Type => 'multipart/related');

# first part: the message text
$msg->attach(
   Type =>'text/plain; charset=windows-1251',
   Data => $message_text);

# second part: the attached file, read from disk
$msg->attach(
   Type => 'image/gif',
   Path => $path_to_file,
   Filename =>'img.gif');
$msg->send();



Now let's see how the script works, step by step:
1. Determine the URL of the document
2. Download the web page content
3. Find all images on the page and download them
4. Rewrite relative links in the document as absolute URLs
5. Inline the external CSS and JavaScript files
6. Encode all images and assemble the MIME object
7. Send the message by e-mail

I'll describe the implementation of the script only schematically; if anything is unclear, see the script itself.
First, determine the time stamps for yesterday and for the day before yesterday.

$yesterday = time() - 86400;
$before_yesterday = time() - 2 * 86400;


Build the URL of the page to be downloaded according to the template above, with the earlier time stamp as the start.
$url_page="http://browse.deviantart.com/wallpaper/scifi/?startts=".$before_yesterday."&endts=".$yesterday;

Now download the page contents using the LWP module:

 if ($url_page && $url_page=~/^(https?|ftp|file|nntp):\/\//)
 {
  my $req = HTTP::Request->new('GET' => $url_page);
  my $res = $ua->request($req);
  # keep the body only if the request succeeded
  $gabarit = $res->content if $res->is_success;
 }


Include external CSS and JavaScript. I'll show this in a very simplified form: download each file the page refers to and insert its contents at the corresponding place in the HTML file.

 # $css_text and $js_text hold the contents of the downloaded files
 $css_block = '<style type="text/css">'."\n".'<!--'."\n". $css_text ."\n-->\n</style>\n";
 $gabarit =~s/<link([^<>]*?)href="?([^\" ]*)"?([^>]*)>/ $css_block /iegmx;
 $js_block = '<script><!--'."\n". $js_text ."\n-->\n</script>\n";
 $gabarit =~s/<script([^>]*)src="?([^\" ]*js)"?([^>]*)>/ $js_block /iegmx;
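
The substitution above relies on the /e modifier, which evaluates the replacement as Perl code. A minimal, self-contained sketch of that idea follows. The %fetched hash stands in for files that the real script would download, and the helper css_block is hypothetical; neither comes from the original script.

```perl
use strict;
use warnings;

# Hypothetical stand-in for downloaded files: href => file contents.
my %fetched = (
    'style.css' => "body { color: #333; }",
);

# Wrap the stylesheet text in a <style> element, hidden in an HTML comment.
sub css_block {
    my ($href) = @_;
    my $css = exists $fetched{$href} ? $fetched{$href} : '';
    return qq{<style type="text/css">\n<!--\n$css\n-->\n</style>\n};
}

my $html = qq{<head><link rel="stylesheet" href="style.css"></head>};

# /e evaluates the replacement as code, so each <link> tag is replaced
# by the result of css_block() for its own href (captured in $2).
$html =~ s/<link([^<>]*?)href="?([^" ]*)"?([^>]*)>/css_block($2)/ieg;

print $html, "\n";
```

Calling a function from the replacement lets each tag be replaced with the contents of its own file, rather than with one fixed block.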


Now walk over all of the links and replace relative paths with absolute ones. This ensures that each link in the e-mail leads to exactly the same location as it did on the original web page.

 my $analyseur = HTML::LinkExtor->new;
 $analyseur->parse($gabarit);
 # each entry is [tag, attribute name, URL, ...]
 my @l = $analyseur->links;
  
 foreach my $url (@l)
 {
  # resolve the link against the base URL of the page
  my $urlAbs = URI::WithBase->new($$url[2],$racinePage)->abs;
  chomp $urlAbs;
  # rewrite only relative <a href="..."> links
  if ( ($$url[0] eq 'a') && ($$url[1] eq 'href') && ($$url[2]) && (($$url[2]!~m!^https?://!) && ($$url[2]!~m!^mailto:!)) )
  {
   # \Q...\E keeps characters such as "?" and "." in the URL from being read as regex syntax
   $gabarit=~s/\s href= [\"']? \Q$$url[2]\E [\"']?/ href="$urlAbs"/gimx;
  }
 }
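
To see what the resolution step does on its own, here is a small sketch that uses the URI module's new_abs constructor instead of URI::WithBase. The base URL and the sample links are made up for illustration, and the URI module must be installed (it comes along with LWP).

```perl
use strict;
use warnings;
use URI;

# Hypothetical base URL of the downloaded page
my $base = 'http://browse.deviantart.com/wallpaper/scifi/';

for my $link ('../photos/', 'view/123/', '/help/', 'http://www.cpan.org/') {
    # new_abs resolves $link against $base; absolute URLs pass through unchanged
    my $abs = URI->new_abs($link, $base);
    print "$link -> $abs\n";
}
```

Relative paths are resolved against the directory of the base URL, a leading slash resolves against the host, and a link that is already absolute is left alone.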


Now we locate all the images in the document, download them, determine their types, and return them as MIME-encoded parts.

 # inside the loop over the links: collect a MIME part for every image
 if ( ((lc($$url[0]) eq 'img') || (lc($$url[0]) eq 'src')) )
 {
  push(@mail, create_image_part($urlAbs));
 }

 # body of create_image_part(); $ur is the absolute URL of the image
 if (lc($ur)=~/gif$/) {$type="image/gif";}
 elsif (lc($ur)=~/jpe?g$/) {$type = "image/jpeg";}
 else { $type = "application/x-shockwave-flash"; }
 my $res2 = $ua->request(HTTP::Request->new('GET' => $ur));
 $buff1=$res2->content;
 # the file name is everything after the last slash
 $file_name = substr($ur,rindex($ur,"/")+1);
  
 # encode next image
 my $mail = MIME::Lite->new( Data => $buff1, Encoding =>'base64', 'Filename'=>$file_name);
 $mail->attr('Content-type'=>$type);
 $mail->attr('Content-Location'=>$ur);
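
The type detection above can also be written as a lookup table, which is easier to extend. The following is only a sketch: the helper describe_image and the sample URL are hypothetical, and unlike the script above it falls back to application/octet-stream for unknown extensions.

```perl
use strict;
use warnings;

# Extension => MIME type; extend as needed
my %type_for = (
    gif  => 'image/gif',
    jpg  => 'image/jpeg',
    jpeg => 'image/jpeg',
    png  => 'image/png',
    swf  => 'application/x-shockwave-flash',
);

# Hypothetical helper: return (file name, MIME type) for an image URL
sub describe_image {
    my ($url) = @_;
    (my $path = $url) =~ s/[?#].*//s;           # drop query string and fragment
    my ($file_name) = $path =~ m{([^/]*)\z};    # text after the last slash
    my ($ext) = lc($file_name) =~ /\.([a-z0-9]+)\z/;
    my $type = defined $ext && exists $type_for{$ext}
        ? $type_for{$ext}
        : 'application/octet-stream';           # generic fallback
    return ($file_name, $type);
}

my ($name, $type) = describe_image('http://example.com/thumbs/Ship.JPG?size=150');
print "$name: $type\n";
```

Stripping the query string first matters for thumbnail URLs that carry parameters, because otherwise the extension test would look at the parameters instead of the file name.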

Create the MIME object and fill in the "From", "To" and "Subject" fields. If there were no images on the page, the message has the type "text/html"; otherwise it has the type "multipart/related".

$mail = MIME::Lite->new(
   'From' => 'somebody@somewhere.com',
   'To' => $to_email,
   'Subject' => $url_page,
   'Data' => $html);
 # $content_type is "text/html" for a page without images
 $mail->attr("Content-type" => $content_type);
 if (@mail)
 {
  $mail->replace("Type" => "multipart/related");
 # attach every image
  foreach (@mail) {$mail->attach($_);}
 }

Now send the page by e-mail.
MIME::Lite->send('smtp', "SMTP-server address", Timeout=>60);
$mail->send();
 

Script execution

Place the script in a folder where execution of CGI scripts is enabled, and make the file executable:

 chmod 750 /usr/local/www/cgi-bin/html_on_email3.pl

To automate the process entirely, we can run the script from cron. To do that, add one line to the file /etc/crontab:

 0 9 * * * root /usr/local/www/cgi-bin/html_on_email3.pl

Every morning we will then have a fresh set of sci-fi wallpaper thumbnails in the mailbox, with working links to the actual images as well.
