Need help with creating a script


jman3451
Posts: 3
Joined: Sat Apr 30, 2011 11:56 pm UTC

Postby jman3451 » Sun May 01, 2011 12:10 am UTC

I want to create a script that will look at a specific value on a webpage and will refresh it at a specific time interval (maybe 5 to 10 minutes) and will create a popup window alerting me if the value changes.

I know Java and Python pretty well but am still not sure about how to approach this problem (I'm thinking that Python would be the better language to use in this situation).

I feel like my main issue with this is accessing the data on the webpage and refreshing it.

If someone can get me to a starting point that'd be great! Thanks!

naschilling
Posts: 142
Joined: Wed Apr 06, 2011 2:52 pm UTC

Re: Need help with creating a script

Postby naschilling » Sun May 01, 2011 3:49 am UTC

This will mostly depend on what OS you have and what the web page is written in. First, I hope the website is in XHTML Strict. That will make parsing it much simpler. You can use an XML parser, determine the XPath of the object you want to monitor, and query it periodically.

If you are running Linux, you can create a simple Perl script to do it and run that as a cron job. I bet someone out there can do it in 3 lines of code.

Windows will be a bit trickier. You can write a command line Java program pretty quickly. It won't be as quick or pretty as the Linux solution.

starfruitinc
Posts: 35
Joined: Sat Dec 26, 2009 4:13 am UTC

Re: Need help with creating a script

Postby starfruitinc » Sun May 01, 2011 4:35 am UTC

Python has an HTTP library (urllib, I think). I have used it before, and it returns the entire HTML source of the specified page, so it may be of use to you. The data can be parsed with an XML library, which I think Python does have (http://docs.python.org/library/xml.dom.html). Or you could use a regex. Either way would work.
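A minimal sketch of the fetch-then-regex approach described above, assuming Python 3, where the urllib/urllib2 functionality lives in `urllib.request`. The function names, the example URL, and the `<title>` pattern are placeholders for illustration, not anything from the page being monitored.

```python
import re
import urllib.request


def fetch_html(url, timeout=30):
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_value(html, pattern):
    """Return the first capture group of `pattern` in `html`, or None."""
    match = re.search(pattern, html, re.DOTALL)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    # Hypothetical URL and pattern; substitute the real page and markup.
    page = fetch_html("http://example.com/")
    print(extract_value(page, r"<title>(.*?)</title>"))
```

Keeping the download and the extraction in separate functions means the extraction can be tried out on a saved copy of the page before any polling is added.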

Ankit1010
Posts: 135
Joined: Fri Feb 11, 2011 11:32 am UTC

Re: Need help with creating a script

Postby Ankit1010 » Sun May 01, 2011 5:03 am UTC

I think Perl would be really well suited to this; it's awesome at dealing with webpages.
Use the "LWP::Simple" package. That will enable you to use the "get" command, which returns the source code of the page you give it as a string. Then you can use a regex to get the exact data you want. Implementing the whole thing in a timer within a loop should be easy after that.

Like the previous posters said, you could do it easily in a few lines of code in Perl.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Re: Need help with creating a script

Postby EvanED » Sun May 01, 2011 5:41 am UTC

naschilling wrote:If you are running Linux, you can create a simple Perl script to do it and run that as a cron job. I bet someone out there can do it in 3 lines of code.

I disagree that a cron job is the best way; I think better would be to have one process running in the background, either with a timer interrupt scheduled or just sleeping for 10 sec or something at a time checking if 5 minutes has passed each time it wakes up.

If you have cron do it, you need some way of getting the information from one invocation to the next, which means there will be some file on the hard drive; this is unnecessary if you just keep the process running. It would probably be also slightly more involved to activate and deactivate it.
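The single-process approach could be sketched roughly as follows, assuming Python 3. `watch`, `read_value`, and `on_change` are hypothetical names; `read_value` stands in for whatever function fetches and extracts the value, and `max_checks` exists only so the loop can be stopped in a trial run.

```python
import time


def watch(read_value, on_change, interval_seconds=300,
          sleep=time.sleep, max_checks=None):
    """Call read_value() repeatedly and report when the result changes.

    The previous value lives in a local variable, so nothing has to be
    written to disk between checks.
    """
    last = read_value()
    checks = 0
    while max_checks is None or checks < max_checks:
        sleep(interval_seconds)
        current = read_value()
        if current != last:
            on_change(last, current)
            last = current
        checks += 1
    return last
```

With the defaults this checks every 5 minutes and runs until the process is killed; a real script would probably also want to catch network errors inside `read_value` so that one failed download does not end the loop.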

Windows will be a bit trickier. You can write a command line Java program pretty quickly. It won't be as quick or pretty as the Linux solution.

Both Python and Perl run on Windows just fine; if you take my advice and do it in one process, it'll probably look exactly the same. If you want to go with the cron solution, you can use Windows's task scheduler.

jman3451
Posts: 3
Joined: Sat Apr 30, 2011 11:56 pm UTC

Re: Need help with creating a script

Postby jman3451 » Sun May 01, 2011 6:48 am UTC

Yeah I'd prefer Windows since that is where I do most of my general computing (I typically only use linux to do specific tasks or to mess around) and I'm planning on having this run for a long period of time (e.g. weeks) although not necessarily continuously.

And this will be for a webpage with just html, some css, and javascript. But all I need to get to is some text.

diabolo
Posts: 72
Joined: Fri Aug 08, 2008 4:17 pm UTC
Location: france

Re: Need help with creating a script

Postby diabolo » Sun May 01, 2011 9:12 am UTC

naschilling wrote:First, I hope the website is in XHTML Strict.

If it's not xhtml strict but not too messed up either it might be possible to use something like HTML Tidy or Tagsoup to fix it and hopefully get some valid xhtml to work with.

jman3451 wrote:And this will be for a webpage with just html, some css, and javascript. But all I need to get to is some text.

If you need to locate some specific data in the html structure, for example you're interested in the contents of <div id="foo">, use an HTTP and an XML library as advised in the previous posts.
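As a rough illustration of pulling out the contents of `<div id="foo">`, here is a sketch that uses Python 3's standard-library `html.parser` in place of a strict XML parser, since it tolerates HTML that is not valid XHTML. The class and function names are made up for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementText(HTMLParser):
    """Collect the text inside the first element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target element
        self.found = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_of(html, target_id):
    """Return the text of the element with id `target_id`, or None if absent."""
    parser = ElementText(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.found else None
```

Calling `text_of(page_source, "foo")` returns the element's text with nested tags stripped. The depth counting assumes closing tags are reasonably balanced inside the target element, so badly broken markup may still need something like HTML Tidy first.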

However if you don't care about the structure and can work with only the text of the webpage you could try dumping the page from a text-only browser. This way you get plain text you can regexp in.
Lynx is supposed to run on Windows so check if you can work with the output from something like

Code:

lynx --dump --nolist "url"

thedufer
Posts: 263
Joined: Mon Aug 06, 2007 2:11 am UTC
Location: Northern VA (not to be confused with VA)

Re: Need help with creating a script

Postby thedufer » Mon May 02, 2011 4:29 am UTC

EvanED wrote:I disagree that a cron job is the best way; I think better would be to have one process running in the background, either with a timer interrupt scheduled or just sleeping for 10 sec or something at a time checking if 5 minutes has passed each time it wakes up.


Isn't this what cron is, basically? Using cron instead of your own daemon removes the overhead of building a daemon.

EvanED wrote:If you have cron do it, you need some way of getting the information from one invocation to the next, which means there will be some file on the hard drive; this is unnecessary if you just keep the process running.


I don't see how storing a single small file is worse than building a daemon from the ground up and hoping it stays running. Plus, you want to store a file on shutdown anyway - having it all the time is just easier.
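For the cron or Task Scheduler variant, carrying the last value between runs in a small file might look like this sketch, assuming Python 3. `check_once` and the state-file path are hypothetical, and the current value is assumed to have been fetched already by some other function.

```python
import os


def check_once(current_value, state_path):
    """Compare current_value with the value saved by the previous run.

    Returns (changed, previous) and then saves current_value for the
    next run. The very first run has nothing to compare against, so it
    reports no change.
    """
    previous = None
    if os.path.exists(state_path):
        with open(state_path, "r", encoding="utf-8") as f:
            previous = f.read()
    with open(state_path, "w", encoding="utf-8") as f:
        f.write(current_value)
    changed = previous is not None and previous != current_value
    return changed, previous
```

Each scheduled run would fetch the value, call `check_once(value, "last_value.txt")`, and raise the alert only when `changed` is true.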

I'd suggest using python with urllib2 to get the page and Beautiful Soup to parse it. Parsing webpages with regex is never a good idea - Beautiful Soup is much more likely to not care about slight changes to the format, and much easier to change if the structure of the page changes significantly.

rflrob
Posts: 235
Joined: Wed Oct 31, 2007 6:45 pm UTC
Location: Berkeley, CA, USA, Terra, Sol

Re: Need help with creating a script

Postby rflrob » Tue May 03, 2011 8:57 pm UTC

thedufer wrote:Plus, you want to store a file on shutdown anyway - having it all the time is just easier.



Since no one seems to have mentioned this: to pop up an alert, you probably want to look at one of the GUI toolkits, like Tkinter, PyQt, or any of what I'd say are at least a half-dozen more that my CLI-based experience neither knows nor cares about.
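A minimal Tkinter version of the popup, assuming Python 3 (where the module is named `tkinter`) and a desktop session to display it on. The function names and message wording are only illustrative.

```python
def format_alert(old, new):
    """Build the text shown in the popup."""
    return "The watched value changed from {!r} to {!r}.".format(old, new)


def show_alert(message, title="Page changed"):
    """Show a modal message box and return once the user dismisses it."""
    # Imported here so the rest of the script still loads on a machine
    # without a display or without Tk installed.
    import tkinter
    from tkinter import messagebox

    root = tkinter.Tk()
    root.withdraw()  # hide the empty main window; only the dialog is wanted
    messagebox.showinfo(title, message)
    root.destroy()


if __name__ == "__main__":
    show_alert(format_alert("1", "2"))
```

Because `showinfo` blocks until the dialog is closed, a polling loop that calls it will pause checking while the popup is on screen.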

Steax
SecondTalon's Goon Squad
Posts: 3038
Joined: Sat Jan 12, 2008 12:18 pm UTC

Re: Need help with creating a script

Postby Steax » Wed May 04, 2011 2:59 am UTC

Do you mind having a web browser open while this process is running? (If it's your main workstation then you might as well have a browser open a lot.)

If so, consider a Greasemonkey script that uses the DOM to check said element (or jQuery for more complex elements to track), sets an interval, and calls alert() when the value changes. Not as elegant as writing an actual program for the purpose, but if you need a quick-and-dirty solution that works...

zed0
Posts: 179
Joined: Sun Dec 17, 2006 11:00 pm UTC

Re: Need help with creating a script

Postby zed0 » Thu May 05, 2011 8:18 am UTC

naschilling wrote:If you are running Linux, you can create a simple Perl script to do it and run that as a cron job. I bet someone out there can do it in 3 lines of code.

Well, it's not in perl, but I was bored:

Code:

zed0@codd:~$ while true; do wget -qO- xkcd.com | sed -ne 's/.*<title>xkcd: \(.*\)<\/title>.*/\1/p'; sleep 300; done;

This just fetches the title of the current xkcd strip every 5 minutes. (Of course, for xkcd there are JSON feeds etc. that would be preferable to use.)
Something like this could be adapted pretty easily to whatever part of a website you want without bothering to parse the entire page.

jman3451
Posts: 3
Joined: Sat Apr 30, 2011 11:56 pm UTC

Re: Need help with creating a script

Postby jman3451 » Fri May 06, 2011 9:07 pm UTC

wow thanks for the script
I think I'll still write a python script using beautifulsoup just for fun too

Thanks to everyone who gave suggestions too!

