Help needed on how to parse part of an html file


mistuk
2005.05.10, 01:51 PM
Hi there,

I'm looking to develop a Dashboard widget to display aviation weather reports (METARs). The user will input an airfield, which is sent to a remote web site (not under my control). That site runs a Perl script which sends back a complete web page. I only need to extract part of this page for the widget.

Question is, how do I go about doing this? I suspect using PHP might be a good way of doing it, but I'm very new to PHP.

Hope someone can help!

Thanks,

Kevin

OneSadCookie
2005.05.10, 03:12 PM
Perl or Ruby would be a much better place to start. I seem to recall reading somewhere that calling shell scripts from Dashboard widgets is easy, too. Start by writing the Ruby script to download the HTML page and find the bits you need and print the information to standard output. Then you can think about putting it behind a Dashboard widget :)
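
As a rough illustration of that first step (download the page, reduce it to text, print to standard output), here is a minimal sketch. It is written in Python rather than the Perl or Ruby suggested above, purely as a swapped-in language for the example. The names `TextExtractor`, `html_to_text` and `fetch` are made up for this sketch, and the page is assumed to be plain HTML that decodes as UTF-8.

```python
import sys
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping script/style bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch(url):
    """Download a page and decode it (assumption: UTF-8, bad bytes replaced)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python script.py <url>
    print(html_to_text(fetch(sys.argv[1])))
```

Once the script prints something sensible on the command line, the "find the bits you need" part can be narrowed down separately before any widget code is involved.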

iefan
2005.05.11, 08:43 AM
Widgets can use the XMLHttpRequest object, and if the web page is a properly formed XML document then it should be pretty easy to extract the data from inside the widget just using JavaScript.

Edit: Had the name of the XMLHttpRequest object wrong.

Barcode
2005.05.11, 09:00 AM
You should be able to do the whole widget with only JavaScript/CSS/HTML.

I suggest you remove the middle-tier PHP site from the design. Instead, have the user select an airport code from a predefined list (stored locally in a text or CSV file). The airport code file will contain pairs of airport code and the URL of the airport web site. When the user selects an airport, the widget reads the URL and passes it to a JavaScript XMLHttpRequest object.

http://developer.apple.com/internet/webcontent/xmlhttpreq.html

This object can read the response from the site. The hard part is then parsing the HTML for display... JavaScript has some good parsing capabilities with regular expressions.
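
To give a feel for the regular-expression step, here is a hedged sketch in Python (not the JavaScript a widget would actually use; the pattern itself should carry over with little change). It assumes the report appears in the page as a run of text starting with the ICAO code followed by a six-digit day/time group ending in `Z`, which is the usual METAR layout. The function name `find_metar` and the sample page are invented for the example.

```python
import re


def find_metar(html, icao):
    """Return the first METAR-looking line for `icao` in `html`, or None.

    Assumption: the report starts with the station code and a DDHHMMZ
    time group, and runs until the next tag or line break.
    """
    pattern = re.compile(
        r"\b" + re.escape(icao.upper()) + r"\s+\d{6}Z\b[^<\r\n]*"
    )
    match = pattern.search(html)
    return match.group(0).strip() if match else None


# Made-up sample response, for illustration only.
sample = (
    "<html><body><pre>\n"
    "KMDW 101453Z 27010KT 10SM FEW250 22/09 A3001\n"
    "</pre></body></html>"
)
print(find_metar(sample, "kmdw"))
```

If the real page wraps the report differently (for example across several lines), the trailing `[^<\r\n]*` part is the piece that would need adjusting.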

Burden
2005.05.12, 10:26 AM
Quoting Barcode: "I suggest you remove the middle-tier PHP site from the design. Instead, have the user select an airport code from a predefined list (stored locally in a text or CSV file). The airport code file will contain pairs of airport code and the URL of the airport web site."

To clarify, mistuk is probably talking about this:

http://weather.noaa.gov/cgi-bin/mgetmetar.pl?cccc=kmdw

The only unique part of that URL is the 4-letter ICAO code on the end. There's no need for an "airport code file" to form that URL from user input.
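
A small sketch of that idea, in Python for illustration: the URL is built directly from the user's input, with a basic check that it looks like a 4-letter code. The base URL is the one quoted above; the function name `metar_url` is made up, and lower-casing the code simply mirrors the example URL (whether the remote script also accepts upper case is not something this thread establishes).

```python
import re

# Base URL taken from the example in the post above.
BASE_URL = "http://weather.noaa.gov/cgi-bin/mgetmetar.pl"


def metar_url(user_input):
    """Build the report URL from a 4-letter ICAO code typed by the user."""
    code = user_input.strip().lower()
    if not re.fullmatch(r"[a-z]{4}", code):
        raise ValueError("expected a 4-letter ICAO code, got %r" % user_input)
    return BASE_URL + "?cccc=" + code


print(metar_url(" KMDW "))
```

Validating the input before building the URL also keeps stray characters out of the query string without needing any lookup file.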

jeanbreckman
2009.01.31, 01:11 AM
I also had the same question. Any more suggestions, guys?

RandiR
2009.04.10, 11:16 AM
Definitely use biterscripting for parsing HTML. Download and install it free at http://www.biterscripting.com. Install all their sample scripts with the following command:

script "http://www.biterscripting.com/Download/SS_AllSamples.txt"

Now, you can use any of their sample scripts (or develop your own customized scripts). I will show a couple of examples below.

script "SS_WebPageToCSV.txt" page("http://somepage") number(1)

The above will extract the first (1) table from the web page and put it in a CSV format.

script "SS_WebPageToText.txt" page("http://somepage")

The above will extract the plain text from that page.

To extract the information you need from a web page, determine what appears before and after that information. Let's say the string "abc" appears before it and "xyz" appears after it. "abc" and "xyz" can be anything, including HTML tags, special characters, etc. The following commands will extract the information you need.

var str page, info ; cat "http://www.somepage" > $page
# Extract everything after abc into $info.
stex "^abc^[" $page > $info
# Remove from $info everything beginning with xyz.
stex "[^xyz^" $info > null
# $info now has the info you need. Print it (or use it in other ways).
echo $info

Randi
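
For comparison, the same "text between a before-marker and an after-marker" idea can be sketched in a few lines of Python. This is not a translation of the biterscripting commands, just the general technique; the helper name `between` is invented, and returning `None` when either marker is missing is a choice made for this example.

```python
def between(text, before, after):
    """Return the text found between `before` and the next `after`.

    Returns None if either marker is missing (a choice made for this sketch).
    """
    start = text.find(before)
    if start == -1:
        return None
    start += len(before)
    end = text.find(after, start)
    if end == -1:
        return None
    return text[start:end]


print(between("<p>abcHELLOxyz</p>", "abc", "xyz"))
```

This works well when the markers are stable, but it breaks quietly if the remote site changes its layout, so checking for `None` before displaying anything is worth the extra line.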