Does your favorite web service have a crappy interface? Make your own with Python, Python-Requests and BeautifulSoup!

There are some pretty useful sites out there, but some interfaces are just plain annoying.
Take POF (Plenty of Fish), for example: it has millions of users, but its interface hasn’t been touched since the beginning, and if you get lots of messages, going through them all quickly becomes a pain.
I figured it might be easier to use my hacking skills to create my own interface.
Step 1: Login
First take a look at the source code of the form on the login page:

<form action="" method="post" id="frmLogin" name="frmLogin" class="form right">
	<div id="login-box">
		<input name="url" id="url" class="title" type="hidden">
		<input name="username" id="username" class="title input" type="text" value="l33tman">
        <label class="headline txtBlue size12 label username" for="username">Username</label>
		<input name="password" id="password" class="title input" type="password">
		<label class="headline txtBlue size12 label password" for="password">Password</label>
        <script type="text/javascript">
            var nowt = new Date(),
                tempt_F = nowt.getTimezoneOffset();
            document.write('<input type="hidden" value="' + tempt_F + '" name="tfset"/>');
        </script><input type="hidden" value="300" name="tfset">
		<input name="login" id="login" class="button norm-blue submit" type="submit" value="Check Mail!">
        <input name="callback" id="callback" type="hidden" value="">
        <input name="sid" id="sid" type="hidden" value="wcqugtcmwbpb2rvn345x4mxk">
    <script type="text/javascript">
        if (document.getElementsByTagName("html").lang == undefined || document.getElementsByTagName("html").lang == null) {
            var html = document.getElementsByTagName("html")[0];
            html["lang"] = "en";
        }
    </script>
	</div>
</form>
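
The form carries several hidden inputs besides username and password (url, tfset, callback and a per-session sid). Rather than copying their values by hand, a small helper can collect every named input of a form with BeautifulSoup and let you override the ones you type in. This is a sketch under our own assumptions: the helper name form_payload and the trimmed-down HTML are ours, and it only looks at <input> elements, not <select> or <textarea>.

```python
from bs4 import BeautifulSoup


def form_payload(html, form_id, **overrides):
    """Collect name/value pairs from every named <input> inside one form.

    Hypothetical helper: hidden fields such as a per-session id are picked up
    automatically, and keyword arguments replace or add values (e.g. the password).
    """
    soup = BeautifulSoup(html, 'html.parser')
    form = soup.find('form', id=form_id)
    payload = {}
    for field in form.find_all('input'):
        name = field.get('name')
        if name:  # unnamed inputs are never submitted
            payload[name] = field.get('value', '')
    payload.update(overrides)
    return payload


# A trimmed-down copy of the login form shown above
LOGIN_FORM = """
<form action="" method="post" id="frmLogin">
  <input name="url" type="hidden">
  <input name="username" type="text" value="l33tman">
  <input name="password" type="password">
  <input type="hidden" value="300" name="tfset">
  <input name="login" type="submit" value="Check Mail!">
  <input name="sid" type="hidden" value="wcqugtcmwbpb2rvn345x4mxk">
</form>
"""

payload = form_payload(LOGIN_FORM, 'frmLogin', password='hunter2')
print(payload['sid'], payload['tfset'])  # wcqugtcmwbpb2rvn345x4mxk 300
```

In practice you would fetch the login page with the same session that submits the form and pass its text to such a helper, so a per-session value like sid comes from that session rather than from a copy of the page source.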

We will use python-requests to make all our requests with a simulated user session. See the python-requests documentation on session objects for more details.
We will start by posting all those input values to the login handler (processLogin.aspx):

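import requests

session = requests.Session()
# Field names come from the login form above. LOGIN_URL is a placeholder for the
# full address of the form's handler (processLogin.aspx), which was lost here.
payload = dict(username=username, password=password, url='', tfset='300',
               callback='', login='Check Mail!')
response = session.post(LOGIN_URL, data=payload)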

By using session.post instead of the plain requests.post, we retain all the cookie information necessary to simulate an actual logged-in user.
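
To see why the session matters, here is a self-contained sketch that runs a tiny local HTTP server standing in for the real site. Everything about the server (the FakeSite class, the /login and /inbox paths, the sessionid cookie) is invented for the demo: it hands out a cookie on login and only shows the inbox when that cookie comes back.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class FakeSite(BaseHTTPRequestHandler):
    """Toy stand-in for a web service: POST /login sets a cookie, GET /inbox checks it."""

    def do_POST(self):
        # Read (and ignore) the form body, then hand out a session cookie
        self.rfile.read(int(self.headers.get('Content-Length', 0)))
        self.send_response(200)
        self.send_header('Set-Cookie', 'sessionid=abc123; Path=/')
        self.send_header('Content-Length', '0')
        self.end_headers()

    def do_GET(self):
        # The inbox is only shown when the cookie comes back
        logged_in = 'sessionid=abc123' in self.headers.get('Cookie', '')
        body = b'inbox' if logged_in else b'please log in'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def compare_plain_and_session():
    server = HTTPServer(('127.0.0.1', 0), FakeSite)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = 'http://127.0.0.1:%d' % server.server_port
    try:
        # Module-level functions: each call starts fresh, so the cookie is lost
        requests.post(base + '/login', data={'username': 'l33tman'})
        plain = requests.get(base + '/inbox').text
        # A Session stores the cookie from /login and sends it to /inbox
        session = requests.Session()
        session.post(base + '/login', data={'username': 'l33tman'})
        kept = session.get(base + '/inbox').text
    finally:
        server.shutdown()
        server.server_close()
    return plain, kept


print(compare_plain_and_session())  # ('please log in', 'inbox')
```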
Step 2: Collect the message links
BeautifulSoup makes parsing HTML extremely simple. See the BeautifulSoup documentation for details.
Say we have an HTML string and we’d like to find all the links (<a> elements) with the “message” class. Here’s how we would do that with BeautifulSoup:

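from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
for message in soup.find_all('a', 'message'):  # <a> tags with class="message"
    print(message.get_text())  # process your message here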
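
A runnable version of that lookup, with a made-up inbox snippet standing in for real HTML. It also shows that a string passed as the second argument to find_all is matched against the CSS class (an element with several classes still matches), and that the class_ keyword and a CSS selector give the same result.

```python
from bs4 import BeautifulSoup

# Made-up inbox markup: only the <a class="message"> links should match
html = """
<ul>
  <li><a class="message" href="/read?id=1">Hi there</a></li>
  <li><a class="message unread" href="/read?id=2">Coffee?</a></li>
  <li><a class="profile" href="/user/l33tman">l33tman</a></li>
</ul>
"""

soup = BeautifulSoup(html, 'html.parser')

# A plain string as the second argument filters on the CSS class...
by_class = [a['href'] for a in soup.find_all('a', 'message')]
# ...which matches the explicit class_ keyword and the equivalent CSS selector
by_keyword = [a['href'] for a in soup.find_all('a', class_='message')]
by_selector = [a['href'] for a in soup.select('a.message')]

print(by_class)  # ['/read?id=1', '/read?id=2']
```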

In step 1 we logged into POF and got a response object. We can pass the HTML contents of this object to BeautifulSoup to begin parsing.
In our case we need the next_page link and the links to the messages (POF’s HTML is terrible, so some hackery was necessary to get at the elements properly):

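import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, 'html.parser')
next_page = soup.find('a', string='Next Page').attrs['href']  # fails on the last page, which has no Next Page link
message_links = []
for message_link in soup.find_all(attrs={'href': re.compile('viewallmessages.*')}):
    message_links.append(message_link.attrs['href'])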
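
The snippet above handles a single page. Here is a self-contained sketch of the loop that keeps following Next Page links until there are none, which is the job get_all_message_links does in the full script. The function name collect_message_links is ours, fetch is a parameter so the loop can be exercised with canned pages, and the example.com URLs and markup are made up. It resolves relative hrefs with urllib's urljoin, which is our choice; the script itself prefixes them with pof_url.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def collect_message_links(fetch, start_url):
    """Follow 'Next Page' links from start_url, gathering message links on every page.

    fetch is any callable that takes a URL and returns that page's HTML, e.g.
    lambda u: session.get(u).text with a logged-in requests session.
    """
    links, url = [], start_url
    while url:
        soup = BeautifulSoup(fetch(url), 'html.parser')
        for a in soup.find_all('a', href=re.compile('viewallmessages')):
            links.append(urljoin(url, a['href']))  # make relative links absolute
        next_link = soup.find('a', string='Next Page')
        url = urljoin(url, next_link['href']) if next_link else None
    # The same conversation can appear on several pages: dedupe, keep first-seen order
    return list(dict.fromkeys(links))


# Two canned inbox pages (hypothetical URLs and markup)
PAGES = {
    'https://example.com/inbox?page=1': '''
        <a href="viewallmessages.aspx?id=1">alice</a>
        <a href="viewallmessages.aspx?id=2">bob</a>
        <a href="inbox?page=2">Next Page</a>''',
    'https://example.com/inbox?page=2': '''
        <a href="viewallmessages.aspx?id=2">bob</a>
        <a href="viewallmessages.aspx?id=3">carol</a>''',
}

links = collect_message_links(PAGES.__getitem__, 'https://example.com/inbox?page=1')
print(links)
```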

Step 3: Collect the content
We now need to go to each link and fetch the message content and user data.
Continuing to use the session object for all requests, we get:

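def parse_all_messages(links):
    messages = []
    for link in links:
        comment_page = session.get(link)
        soup = BeautifulSoup(comment_page.text, 'html.parser')
        user = soup.find('span', 'username-inbox')
        user_image_url = soup.find('td', attrs={'width': "60px"}).img.attrs['src']
        for message in soup.find_all(attrs={'style': re.compile('width:500px.*')}):
            # Assumption: the timestamp appears in the message's text in the format
            # to_date() parses; the original extraction code was lost from this post
            date = re.search(r'\d+/\d+/\d{4} \d+:\d{2}:\d{2} [AP]M', message.get_text())
            if date is None:
                continue
            messages.append(dict(user=clean_string(user.get_text(strip=True)),
                                 user_url=link,  # stand-in: links back to the conversation
                                 user_image_url=user_image_url,
                                 message=clean_string(message.get_text(' ', strip=True)),
                                 date=date.group(0)))
    return sorted(messages, key=lambda m: to_date(m['date']), reverse=True)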
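
The two tricks this step relies on, matching an attribute against a regular expression and sorting on a parsed timestamp, can be tried offline. In this sketch the markup is invented, including the span with class "date" that holds each timestamp (where the date lives on a real page is an assumption); only the width:500px pattern and the date format come from the script.

```python
import re
from datetime import datetime

from bs4 import BeautifulSoup

# Invented conversation page: two message divs plus one that must not match
PAGE = """
<div style="width:500px">Hey, how's it going?<span class="date">02/28/2013 10:05:30 AM</span></div>
<div style="width:500px; color:#333">See you Friday!<span class="date">03/02/2013 9:15:00 PM</span></div>
<div style="width:300px">Sidebar ad, not a message</div>
"""


def to_date(date_string):
    return datetime.strptime(date_string, '%m/%d/%Y %I:%M:%S %p')


soup = BeautifulSoup(PAGE, 'html.parser')
messages = []
# A compiled regex as an attribute value is searched against that attribute
for div in soup.find_all(attrs={'style': re.compile('width:500px.*')}):
    date_tag = div.find('span', 'date').extract()  # pull the timestamp out of the div
    messages.append({'message': div.get_text(strip=True), 'date': date_tag.get_text()})

# Newest first, as in the script
newest_first = sorted(messages, key=lambda m: to_date(m['date']), reverse=True)
print([m['message'] for m in newest_first])  # ['See you Friday!', "Hey, how's it going?"]
```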

Step 4: Pretty Print the data
I have opted to use Jinja2 to render the HTML, but this is not at all necessary. Jinja2 is a simple templating library used by many Python web frameworks. See the Jinja2 documentation for a more in-depth tutorial.
It’s fairly simple to use:

>>> from jinja2 import Template
>>> template = Template('Hello {{ name }}!')
>>> template.render(name='John Doe')
u'Hello John Doe!'
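
The full script uses the same idea with a loop over a list of message dicts. Here is a small sketch with made-up data. A Template built directly is not autoescaped by default, so we pass autoescape=True: messages are written by other people and could contain HTML.

```python
from jinja2 import Template

# autoescape=True turns <, >, & and quotes in the data into HTML entities, so
# scraped text cannot inject markup into the generated page
template = Template(
    '<ul>{% for m in messages %}<li><b>{{ m.user }}</b>: {{ m.message }}</li>{% endfor %}</ul>',
    autoescape=True,
)

messages = [
    {'user': 'alice', 'message': 'Hi!'},
    {'user': 'bob', 'message': '<script>alert(1)</script>'},
]
html = template.render(messages=messages)
print(html)
# <ul><li><b>alice</b>: Hi!</li><li><b>bob</b>: &lt;script&gt;alert(1)&lt;/script&gt;</li></ul>
```

Jinja2 resolves m.user on a dict by falling back to item lookup, so the template does not care whether each message is a dict or an object.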

Be careful to encode your strings properly when using Jinja2. POF has some malformed characters, which required cleaning the strings with string.encode('ascii', 'ignore').
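
A sketch of that cleanup as a function. In Python 3, encode returns bytes, so the result is decoded again to get back a str that Jinja2 and json can work with. The second function swaps in Unicode normalization (NFKD), a common alternative that keeps the base letter of accented characters instead of dropping them; both function names are ours.

```python
import unicodedata


def ascii_only(text):
    """Drop every non-ASCII character (the approach described above)."""
    return text.encode('ascii', 'ignore').decode('ascii')


def ascii_fold(text):
    """Decompose accented letters first (é -> e + combining accent), then drop non-ASCII."""
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


sample = 'Caf\u00e9 at 8? It\u2019s a date'  # é and a curly apostrophe
print(ascii_only(sample))  # Caf at 8? Its a date
print(ascii_fold(sample))  # Cafe at 8? Its a date
```

Neither version can recover characters with no ASCII counterpart, such as curly quotes; those are simply dropped.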
Step 5: Run it!
Below is the script in its entirety.

# A simple script to scrape your pof messages and
# write them to a single HTML file. Also outputs to JSON.
# Usage:
# sudo pip install beautifulsoup4 requests jinja2
# python <script>.py <username> <password> <output_prefix>
# firefox output_prefix.html
# Author:
# Ramin Rahkhamimov
import requests
from bs4 import BeautifulSoup
import re
from jinja2 import Template
import json
import sys
from datetime import datetime
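# The site's base URL was lost from this copy of the post; fill it in before running
BASE_URL = ''
def pof_url(path):
    return BASE_URL + path
session = requests.Session()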
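def append_message_links(e, links):
    """Add the message links on one inbox page to links; return the next page URL or None."""
    soup = BeautifulSoup(e.text, 'html.parser')
    for a in soup.find_all(attrs={'href': re.compile('viewallmessages.*')}):
        links.append(pof_url(a.attrs['href']))
    next_page = soup.find('a', string='Next Page')
    return next_page and pof_url(next_page.attrs['href'])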
def get_all_message_links(username, password):
    links = []
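    # Log in with the fields from the login form; the session keeps the cookies
    payload = dict(username=username, password=password, url='', tfset='300',
                   callback='', login='Check Mail!')
    e = session.post(pof_url("processLogin.aspx"), data=payload)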
    next_page = append_message_links(e, links)
    while next_page:
        e = session.get(next_page)
        next_page = append_message_links(e, links)
    return set(links)
def clean_string(string):
    # Drop non-ASCII characters; decode so the result is a str, not bytes
    return string.encode('ascii', 'ignore').decode('ascii')
def to_date(date_string):
    return datetime.strptime(date_string, '%m/%d/%Y %I:%M:%S %p')
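def parse_all_messages(links):
    messages = []
    for link in links:
        comment_page = session.get(link)
        soup = BeautifulSoup(comment_page.text, 'html.parser')
        user = soup.find('span', 'username-inbox')
        user_image_url = soup.find('td', attrs={'width': "60px"}).img.attrs['src']
        for message in soup.find_all(attrs={'style': re.compile('width:500px.*')}):
            # Assumption: the timestamp appears in the message's text in the format
            # to_date() parses; the original extraction code was lost from this post
            date = re.search(r'\d+/\d+/\d{4} \d+:\d{2}:\d{2} [AP]M', message.get_text())
            if date is None:
                continue
            messages.append(dict(user=clean_string(user.get_text(strip=True)),
                                 user_url=link,  # stand-in: links back to the conversation
                                 user_image_url=user_image_url,
                                 message=clean_string(message.get_text(' ', strip=True)),
                                 date=date.group(0)))
    return sorted(messages, key=lambda m: to_date(m['date']), reverse=True)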
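def save_messages(messages, prefix):
    # autoescape=True escapes any HTML inside the scraped text
    template = Template("""
        <style>
            .user, .message, .date {
                display: inline-block;
                vertical-align: top;
            }
            .message {
                width: 500px;
                padding-left: 10px;
            }
        </style>
        {% for message in messages %}
        <div>
            <a href="{{message.user_url}}" class="user">
            <img src="{{message.user_image_url}}"/>
            {{message.user}}</a>
            <div class="message">{{message.message}}</div>
            <div class="date">{{message.date}}</div>
        </div>
        {% endfor %}
        """, autoescape=True)
    with open('%s.html' % prefix, 'w') as f:
        f.write(template.render(messages=messages))
    with open('%s.json' % prefix, 'w') as f:
        json.dump(messages, f)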
if __name__ == '__main__':
    if len(sys.argv) != 4:
        print("Usage: python %s <username> <password> <output_prefix>" % sys.argv[0])
        sys.exit(1)
    links = get_all_message_links(sys.argv[1], sys.argv[2])
    messages = parse_all_messages(links)
    save_messages(messages, sys.argv[3])

Install requests, beautifulsoup4 and jinja2, save the script above to a file (scrape_pof.py below is just an example name) and run it with python. Depending on your inbox size, this may take a couple of minutes. Once the script is done running, open the newly created HTML file in your favorite browser:

sudo pip install requests beautifulsoup4 jinja2
python scrape_pof.py your_username your_password output
firefox output.html

This script can be easily tweaked to be used with your favorite service provider.