Celebrating the birth of my first “WebBot”

April 13th, 2008 by ScottK | No Comments | Filed in Python

Today is a special day for me, as I have given birth to my first fully functional and interactive webbot. This momentous occasion has lifted me from creating spiders written in PHP and even JavaScript to a whole new level of software automation, one I have been tinkering with for a while. As an added bonus, I have learned a bit more Python and got the core functionality I wanted working early on.

Given that I am the primary developer for Zookoda, I know that the feed retrieval system can be a bit awkward at times. The library written in ColdFusion isn’t as adaptive as it could be, and I wanted to find another way to retrieve feeds. The fact that some members’ feeds can only be retrieved after logging in makes things more difficult still. Enter Python and the mechanize and ClientForm libraries.

Using both of these libraries I was easily able to detect whether the system I was hitting presented a login form, and to enter the credentials into it. It was easy enough, as form.click() submitted the form, and the cookie retained the session so that the authenticated request for the feed went through. So why didn’t I stop here?
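The "do I have a login form?" check can be sketched without mechanize or ClientForm at all. The version below is only an illustration using Python 3's standard html.parser, not the code ScooterBot runs. The rule it applies (a form containing a password input counts as a login form) and the names LoginFormDetector and looks_like_login_page are assumptions made for this example.

```python
from html.parser import HTMLParser


class LoginFormDetector(HTMLParser):
    """Flags a page as a login page if any <form> contains a password input.

    Assumption for this sketch: a password field is a good enough signal.
    """

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.has_login_form = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form:
            # Attribute values can be None (e.g. <input disabled>), so guard.
            input_type = (dict(attrs).get("type") or "").lower()
            if input_type == "password":
                self.has_login_form = True

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def looks_like_login_page(html):
    """Return True if the HTML appears to contain a login form."""
    detector = LoginFormDetector()
    detector.feed(html)
    return detector.has_login_form
```

With a check like this, the bot only needs to fill in credentials and submit when the page it fetched turns out to be the login page rather than the feed.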

The login problem and the subsequent retrieval were quite easy, but I wanted to go up against something way more difficult. As developers for IZEA we use campfirenow as a means of collaborating, and as it turns out that is where I found my challenge.

First, the login is a POST method on the form. Second, sending a message is also done via a POST, but the message box submits through an XMLHttpRequest, so clicking “Send Message” means you have to have JavaScript enabled. As it happened, logging in with mechanize and ClientForm was no problem, but sending a message was quite difficult.

So how does this relate? My baby ScooterBot is able to enter the configured room and, if required, log in. It can then parse the XHTML and read the message id, the person who posted the message, and the message itself. If you are a user of campfirenow then you know the XHTML changes, i.e. the rows you see. So this proves my original intent that my program can be adaptive.
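To make the parsing step concrete, here is a minimal sketch using Python 3's standard html.parser. The markup it expects is hypothetical: each message is assumed to be a table row with an id of the form message_<number>, holding one cell with class "person" and one with class "body". The real campfirenow markup may differ, and MessageRowParser and parse_messages are names invented for this example.

```python
from html.parser import HTMLParser


class MessageRowParser(HTMLParser):
    """Collects message id, person and body from rows in the assumed markup."""

    def __init__(self):
        super().__init__()
        self.messages = []     # finished rows, in page order
        self._current = None   # the row being read, or None
        self._field = None     # "person" or "body" while inside that cell

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        row_id = attrs.get("id") or ""
        if tag == "tr" and row_id.startswith("message_"):
            self._current = {"id": row_id[len("message_"):],
                             "person": "", "body": ""}
        elif tag == "td" and self._current is not None:
            css_class = attrs.get("class") or ""
            if css_class in ("person", "body"):
                self._field = css_class

    def handle_data(self, data):
        # Text can arrive in several pieces, so append rather than assign.
        if self._current is not None and self._field is not None:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._current is not None:
            self._current["person"] = self._current["person"].strip()
            self._current["body"] = self._current["body"].strip()
            self.messages.append(self._current)
            self._current = None


def parse_messages(html):
    """Return a list of {"id", "person", "body"} dicts found in the page."""
    parser = MessageRowParser()
    parser.feed(html)
    return parser.messages
```

Rows that do not carry a message id (timestamps, for instance) are simply skipped, which is one way to cope with the row layout changing between page loads.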

The second challenge I set myself was to respond to the entries (messages) entered by others. Here is where the program took a hair-pulling turn. The textarea you enter text into and its submit button are unobtrusively watched by JavaScript, and as Python scrapers don’t run JavaScript I was out of luck. Until I discovered that mechanize did support what I needed, as mechanize.urlopen(<complete_url> + "/speak", urllib.urlencode({"message": message, "t": time.time()})).

At that point the correct POST was made, with the urllib.urlencode result as the second argument, and mechanize still held my session cookie. Trust me, I tried for hours without it and only came up against the login page.
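The request itself can be sketched as below. This is a Python 3 illustration, so it uses urllib.parse.urlencode where the call quoted above uses the Python 2 spelling urllib.urlencode. The helper name build_speak_request, its now parameter, and the example URL are assumptions for the sketch. The "/speak" path and the "message" and "t" fields come from the call described above.

```python
import time
from urllib.parse import urlencode


def build_speak_request(room_url, message, now=None):
    """Return (url, body) for the POST that sends a message to a room.

    `now` exists only so the timestamp can be fixed in tests; by default
    the current time is used, as in the call described in the post.
    """
    timestamp = time.time() if now is None else now
    url = room_url.rstrip("/") + "/speak"
    # Form-encode the fields; spaces become "+", special characters are escaped.
    body = urlencode({"message": message, "t": timestamp})
    return url, body


# Supplying a body as the second argument is what turns the request into a
# POST. With mechanize installed, something along the lines of
#     mechanize.urlopen(url, body)
# would send it. Depending on the Python and mechanize versions, the body
# may need to be encoded to bytes first (an assumption worth checking).
url, body = build_speak_request("https://example.invalid/room/1", "hello world")
```

Keeping the request building separate from the sending makes it easy to check the encoded body without touching the network or needing a logged-in session.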

I won’t bore you with the “interactivity”, as it’s far from final as of yet. ScooterBot does look at commands given to it and acts on them. It even reacts to such things as “@ScooterBot: I Love you”, at which point it responds with “<person>: I am kinda partial to you too”. The beginning of sentiment is there as well, as it looks for keywords in users’ conversations and interjects a pre-programmed phrase.
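A rough sketch of that command-and-keyword logic follows. It is not ScooterBot's actual code. The "I Love you" exchange is taken from the description above, while the "deploy" keyword, its canned phrase, and the names respond, DIRECT_REPLIES and KEYWORD_REPLIES are made up for illustration.

```python
import re

BOT_NAME = "ScooterBot"

# Replies to messages addressed to the bot, keyed by lower-cased command text.
DIRECT_REPLIES = {
    "i love you": "I am kinda partial to you too",
}

# Hypothetical keyword -> pre-programmed phrase for general conversation.
KEYWORD_REPLIES = {
    "deploy": "Did somebody say deploy?",
}

# Matches "@ScooterBot: <command>" at the start of a message, any letter case.
ADDRESSED = re.compile(r"@%s:\s*(.+)" % re.escape(BOT_NAME), re.IGNORECASE)


def respond(person, message):
    """Return the bot's reply to a message, or None to stay quiet."""
    match = ADDRESSED.match(message.strip())
    if match:
        command = match.group(1).strip().lower()
        reply = DIRECT_REPLIES.get(command)
        # Address the reply to whoever spoke; unknown commands get no reply.
        return "%s: %s" % (person, reply) if reply else None
    lowered = message.lower()
    for keyword, phrase in KEYWORD_REPLIES.items():
        if keyword in lowered:
            return phrase
    return None
```

Keeping the replies in plain dictionaries means new commands or keywords can be added without touching the matching logic.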

It is still very rough though, as it’s only one day old ;) However, years ago I was one day old too, and I have learned so much since :)
