htmlSQL
A PHP class to query the web with an SQL-like language
htmlSQL is an experimental PHP class which allows you to access HTML
values with an SQL-like syntax. This means that you don't have to write
complex functions (regular expressions) to extract specific values.
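An htmlSQL query reads much like an SQL SELECT statement: you name the
attributes you want, the tag to read them from, and a condition. Such a
query can, for example, return an array with all links that contain
the attribute class="list".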
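A query matching that description would look roughly like the one below. The exact syntax is an assumption, in particular that attributes are written as $-prefixed names inside WHERE (which fits the eval() note under Warning). Check the scripts in examples/ for the authoritative form.

```php
<?php
// Assumed query syntax: read the href and title attributes of every <a> tag
// whose class attribute equals "list".
// Single quotes matter here: they stop PHP from interpolating $class
// before the query string ever reaches htmlSQL.
$query = 'SELECT href,title FROM a WHERE $class == "list"';

echo $query, PHP_EOL;
```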
All web transfers in htmlSQL use the awesome Snoopy class
(package version 1.2.3). Snoopy is not required for file or string
queries. You will find all Snoopy-related documents (copyright, readme,
etc.) in the snoopy_data/ folder.
How to use
Just include the "snoopy.class.php" and "htmlsql.class.php" files
in your PHP scripts and look at the examples (examples/) to get an
idea of how to use the htmlSQL class. It should be very simple :-)
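As a rough sketch of those steps: the helper below includes both files, parses an HTML string, and runs one query. The function name htmlsql_links() and the sample HTML are hypothetical. The method names connect(), query() and fetch_array(), and the 'string' connection type, are assumed to match the bundled examples, so verify them against examples/ before relying on this.

```php
<?php
// Hypothetical helper: returns the matching rows, or null when the library
// files are missing or any htmlSQL step reports failure.
function htmlsql_links(string $libDir, string $html): ?array
{
    $snoopy  = $libDir . '/snoopy.class.php';
    $htmlsql = $libDir . '/htmlsql.class.php';

    // Bail out when the class files are not where we expect them.
    if (!is_file($snoopy) || !is_file($htmlsql)) {
        return null;
    }

    include_once $snoopy;   // only used for web transfers
    include_once $htmlsql;

    $wsql = new htmlsql();

    // Assumed API: 'string' mode parses HTML held in memory (no web transfer).
    if (!$wsql->connect('string', $html)) {
        return null;
    }

    // Single quotes keep PHP from interpolating $class in the query text.
    if (!$wsql->query('SELECT href,title FROM a WHERE $class == "list"')) {
        return null;
    }

    return $wsql->fetch_array();
}

$rows = htmlsql_links(__DIR__, '<a class="list" href="/a" title="A">A</a>');

if ($rows === null) {
    echo "htmlSQL files not found or query failed", PHP_EOL;
} else {
    print_r($rows);
}
```

Returning null on every failure keeps the sketch short. A real script would probably want to report which step failed.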
Background & Idea
I had this idea while extracting some data from a website. I realized
that the algorithms and functions used to extract links and other tags
are often the same, so I decided to combine them into a universally
usable class. While drinking a coffee and thinking about the problem, I
thought it would be cool to access HTML elements by using SQL. So I
started creating this class...
Warning!
The eval() function is used for the WHERE statement. Make sure that all
user data is checked and filtered against malicious PHP code.
Never trust user input!
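One way to follow this advice is to accept only values that match a strict whitelist before they are placed into a query. The helper below is a minimal sketch and not part of htmlSQL. The function name build_class_query() and the query syntax are assumptions made for illustration.

```php
<?php
// Hypothetical helper: builds a query for a user-supplied class name, or
// returns null when the value contains anything outside a small whitelist.
function build_class_query(string $userClass): ?string
{
    // Letters, digits, underscore and hyphen only. Quotes, semicolons,
    // dollar signs and parentheses are rejected, so the value cannot close
    // the quoted string or introduce PHP code into the eval()'d condition.
    if (!preg_match('/\A[A-Za-z0-9_-]{1,64}\z/', $userClass)) {
        return null;
    }

    return 'SELECT href,title FROM a WHERE $class == "' . $userClass . '"';
}

var_dump(build_class_query('list'));
var_dump(build_class_query('x"; system("id"); //'));
```

A whitelist is used instead of escaping because it is easier to reason about: anything not explicitly allowed never reaches eval().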
Todo
- enhance the HTML parser
- test htmlSQL with invalid and bad HTML files
- replace the ugly eval() method for the WHERE statement
with a custom method
- more error checks
- include the LIMIT function/method like in SQL
License
htmlSQL uses a modified BSD license. You will find the full license text
in "htmlsql.class.php".
Related projects
After I finished htmlSQL, I searched for a good name because my first idea, "webSQL", was already taken by another
project. While searching Google for an unused name, I came across several related projects.
Feedback
You can reach me by e-mail or by using the contact form on my
contact page.