
Calculating User Reputation in Your App Using Bayesian

November 3rd, 2008 by ScottK | No Comments | Filed in Programming

Almost everyone who has had the opportunity to meet you in some form or fashion, whether in person, by reading your work, or by watching you on TV, has formed an opinion about you and given you a reputation score. Likewise, I’m sure you’ve determined the reputation of everyone you’ve met as well. With reputation you can determine who the trustworthy people you know are, whom you have to distrust, and so on.

Even as you read this post, along with others of mine, you can subjectively give me a reputation as a good writer, or not; as someone knowledgeable, or not.

We know and live with the importance of reputation scoring, so how can we apply it in our programs to tell the bad members of our applications from the good ones? It’s quite easy really, once you break out the Bayesian calculation and have at least one attribute with positive and negative counts. So let’s look at how simple the calculation is as a Python function; it carries over the same way to any language.

def probability(badvotes, goodvotes, user_weight=0.5): #Written in Python BTW
    proportion = float(badvotes) / (float(goodvotes) + float(badvotes))
    S = 0.5
    n = goodvotes + badvotes
    return (float(S * user_weight) + (n * proportion)) / (S + n)

The probability function takes the negative and positive scores (bad votes first, then good votes) plus an optional weight adjustment, and produces a result between ~0.0 and ~1.0, with 0.5 being the neutral default.

proportion = float(badvotes) / (float(goodvotes) + float(badvotes))

This line gives us the proportion of bad votes out of all votes cast.

S = 0.5

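S is the weight of the Bayesian starting assumption: how many votes’ worth of evidence that assumption counts for before any real votes come in. The assumption itself is user_weight. In our example it is 0.5, meaning the writer is assumed to be neither good nor bad at writing. With user_weight=0.75 we would start from the assumption that the writer is a bad writer, and with user_weight=0.25 that the writer is a good one. Raising S makes that starting assumption harder for real votes to override.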

n = goodvotes + badvotes

Easy enough, get the total votes.

return (float(S * user_weight) + (n * proportion)) / (S + n)

Here’s the heart of the calculation. Return:
a.) The odds weight S times the weighting we assigned,
plus
b.) The number of votes made times the proportion of bad votes,
divided by
c.) The odds weight S plus the number of votes
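The three parts above reduce to a single expression, because the number of votes times the proportion of bad votes is simply the number of bad votes. Here is a minimal sketch of that simplification. The name reputation_score and the choice to expose S as a parameter are assumptions made for this example, not part of the original function. A side effect is that a member with no votes yet no longer causes a division by zero.

```python
def reputation_score(badvotes, goodvotes, user_weight=0.5, S=0.5):
    """Same arithmetic as probability(), simplified.

    n * proportion equals badvotes, so the proportion never has to be
    computed and zero votes is no longer a special case.
    (Illustrative name and signature, not the post's original function.)
    """
    n = goodvotes + badvotes
    return (S * user_weight + badvotes) / (S + n)


print(round(reputation_score(15, 75), 4))  # 15 bad votes, 75 good votes
print(reputation_score(0, 0))              # no votes: falls back to user_weight
```

With no votes at all, the score is exactly the starting assumption, which is usually what you want for a brand new member.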

All done and in four lines of code. So let’s see this in action!

You run a content delivery site and are looking for new authors to write for you. You want writers with good articles who deliver on time and make good comments on your blog. Stephen has been recommended by many users of your site. You noticed Peter because he’s written the most articles for you. Here are the stats for both to use in our calculations.

Peter:
Articles submitted: 100
Good votes: 75
Bad votes: 15
Articles on time: 20
Articles late: 80
Comments voted as good: 10
Comments voted as bad: 20

Stephen:
Articles submitted: 200
Good votes: 110
Bad votes: 30
Articles on time: 190
Articles late: 10
Comments voted as good: 50
Comments voted as bad: 70

So here are the probabilities:
Peter:
Being a good writer: 0.1685 (probability(15,75)) < .5 A good writer.
Writing articles on time: 0.7985 (probability(80,20)) > .5 Not a good reputation for delivering on time.
Being a good member: 0.6639 (probability(20,10)) > .5 Not so good as a site member.
Total overall reputation: 0.5436 (0.1685 + 0.7985 + 0.6639) / 3

Stephen:
Being a good writer: 0.2153 (probability(30,110)) < .5 A good writer.
Writing articles on time: 0.0511 (probability(10,190)) < .5 Actually a really good reputation score.
Being a good member: 0.5830 (probability(70,50)) > .5 Not so good as a site member.
Total overall reputation: 0.2831 (0.2153 + 0.0511 + 0.5830) / 3
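If you want to reproduce these scores yourself, here is a small sketch that feeds the stats for both writers through the function and takes the plain average. The STATS layout and the unweighted_scores helper are names invented for this example; only probability comes from the post.

```python
def probability(badvotes, goodvotes, user_weight=0.5):
    # The post's function, repeated so this example runs on its own.
    proportion = float(badvotes) / (float(goodvotes) + float(badvotes))
    S = 0.5
    n = goodvotes + badvotes
    return (float(S * user_weight) + (n * proportion)) / (S + n)


# (bad, good) counts per attribute, taken from the stats listed earlier.
STATS = {
    "Peter": {"writer": (15, 75), "on_time": (80, 20), "member": (20, 10)},
    "Stephen": {"writer": (30, 110), "on_time": (10, 190), "member": (70, 50)},
}


def unweighted_scores(stats):
    """Score each attribute independently, then add their plain average."""
    scores = {name: probability(bad, good) for name, (bad, good) in stats.items()}
    scores["overall"] = sum(scores.values()) / len(scores)
    return scores


for person, stats in STATS.items():
    rounded = {k: round(v, 4) for k, v in unweighted_scores(stats).items()}
    print(person, rounded)
```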

From these three stats you may be able to choose a writer. If you average the probabilities into a total overall reputation, (0.1685 + 0.7985 + 0.6639) / 3 = 0.5436 for Peter and (0.2153 + 0.0511 + 0.5830) / 3 = 0.2831 for Stephen, you would think that Stephen is the best choice, since 0.2831 is much closer to zero, while Peter’s 0.5436 is slightly above the bad-member mark of 0.5. But what we really want is a good writer who delivers good articles on time. Site interaction is really just secondary.

So now let’s treat the first two scores as the primary reputation and weight the secondary calculation using the user_weight argument. The average of the two primary scores feeds the third, secondary calculation, just as many of our assessments of other people take other factors into consideration.

So here are the probabilities:
Peter:
Being a good writer: 0.1685 (probability(15,75)) < .5 A good writer.
Writing articles on time: 0.7985 (probability(80,20)) > .5 Not a good reputation for delivering on time.
Being a good member: 0.6637 (probability(20,10, ((0.1685 + 0.7985) / 2))) > .5 Not so good as a site member.
Total overall reputation: 0.5436 (0.1685 + 0.7985 + 0.6637) / 3

Stephen:
Being a good writer: 0.2153 (probability(30,110)) < .5 A good writer.
Writing articles on time: 0.0511 (probability(10,190)) < .5 Actually a really good reputation score.
Being a good member: 0.5815 (probability(70,50, ((0.2153 + 0.0511) / 2))) > .5 Not so good as a site member.
Total overall reputation: 0.2826 (0.2153 + 0.0511 + 0.5815) / 3

Remember ~0.0 = best and ~1.0 = worst. The differences are small, but you can see that once you have found your primary considerations for reputation you can apply them to any secondary considerations for a different total outcome. Peter’s primary scores average 0.4835, barely below the neutral 0.5, so weighting the secondary attribute of comment interactivity only moves his member score from 0.6639 to 0.6637, and his overall reputation stays at about 0.5436. Stephen, on the other hand, started with a good overall reputation of 0.2831 ((0.2153 + 0.0511 + 0.5830) / 3). His primary scores average 0.1332, so his member score drops from 0.5830 to 0.5815 and he gains further reputation, 0.2826.
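The weighted pass can be sketched like this. The weighted_scores helper is a name made up for this example, and it assumes exactly two primary attributes and one secondary attribute, as in the Peter and Stephen scenario.

```python
def probability(badvotes, goodvotes, user_weight=0.5):
    # The post's function, repeated so this example runs on its own.
    proportion = float(badvotes) / (float(goodvotes) + float(badvotes))
    S = 0.5
    n = goodvotes + badvotes
    return (float(S * user_weight) + (n * proportion)) / (S + n)


def weighted_scores(writer, on_time, member):
    """Each argument is a (bad, good) pair of vote counts."""
    writer_score = probability(*writer)
    on_time_score = probability(*on_time)
    # The average of the two primary scores becomes the user_weight
    # for the secondary "member" score.
    primary_average = (writer_score + on_time_score) / 2
    member_score = probability(member[0], member[1], primary_average)
    overall = (writer_score + on_time_score + member_score) / 3
    return member_score, overall


peter = weighted_scores((15, 75), (80, 20), (20, 10))
stephen = weighted_scores((30, 110), (10, 190), (70, 50))
print("Peter:   member %.4f, overall %.4f" % peter)
print("Stephen: member %.4f, overall %.4f" % stephen)
```

Because S is only 0.5, the weight counts for half a vote, which is why the secondary score moves so little once a member has a few dozen votes.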

So there you have it: building a reputation system is relatively easy. All you need is four lines of code and at least one attribute for which you can count negatives and positives. Identify the primary attributes, average them, and feed that average to the sub-attributes as their weight; those in turn can feed further sub-attributes, and so on. The choice of design is completely yours for detecting who your best and worst members are.
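As one possible way to generalise this to any number of levels, here is a sketch in which each tier of attributes hands its average down to the next tier as the weight. The chained_reputation function, the tier layout, and the final plain average over all scores are assumptions made for this example, not the only way to design it.

```python
def probability(badvotes, goodvotes, user_weight=0.5):
    # The post's function, repeated so this example runs on its own.
    proportion = float(badvotes) / (float(goodvotes) + float(badvotes))
    S = 0.5
    n = goodvotes + badvotes
    return (float(S * user_weight) + (n * proportion)) / (S + n)


def chained_reputation(tiers):
    """tiers: list of tiers, most important first; each tier is a list
    of (bad, good) vote pairs. Returns the average of every score."""
    weight = 0.5  # neutral starting weight for the first tier
    all_scores = []
    for tier in tiers:
        scores = [probability(bad, good, weight) for bad, good in tier]
        all_scores.extend(scores)
        # This tier's average becomes the weight for the next tier.
        weight = sum(scores) / len(scores)
    return sum(all_scores) / len(all_scores)


members = {
    "Peter": [[(15, 75), (80, 20)], [(20, 10)]],
    "Stephen": [[(30, 110), (10, 190)], [(70, 50)]],
}
# Lower is better, so an ascending sort puts the best member first.
ranked = sorted(members, key=lambda name: chained_reputation(members[name]))
print(ranked)
```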


CherryPy, The Setup

September 27th, 2008 by ScottK | No Comments | Filed in CherryPy

When the requirement came in to create a web application that was little more than an API service, CherryPy was chosen. CherryPy is an HTTP framework without all the bells and whistles of database and templating libraries. As a web service provider it has turned out to be very fast because it lacks those other systems.

However, what is the use of a web service that doesn’t have a connected database and some minor templating views? Since Ruby on Rails is the predominant framework in our shop and only a few of us work with Python, I wanted to use CherryPy in the style of Ruby on Rails so that others transitioning from RoR would find it familiar.

While I’m certainly not an evangelist for RESTful resources, I do find that the approach has its place, and it certainly helps me think through what is going to happen when a request comes in with a given method (POST, GET, PUT, etc.). CherryPy’s routes dispatcher uses the Routes library, which fully supports RESTful controllers.
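To make the idea concrete, here is a plain-Python sketch of the Rails-style convention that maps an HTTP method and a path onto a controller action. It does not use CherryPy or the Routes library; RESOURCE_ACTIONS, resolve_action, and the articles resource are hypothetical names used only to illustrate the convention.

```python
# Rails-style mapping from (HTTP method, path has an id?) to an action name.
RESOURCE_ACTIONS = {
    ("GET", False): "index",
    ("POST", False): "create",
    ("GET", True): "show",
    ("PUT", True): "update",
    ("DELETE", True): "destroy",
}


def resolve_action(method, path):
    """Return (resource, action, id) for paths like /articles or /articles/7."""
    parts = [p for p in path.split("/") if p]
    if not 1 <= len(parts) <= 2:
        raise ValueError("unsupported path: %r" % path)
    resource = parts[0]
    item_id = parts[1] if len(parts) == 2 else None
    try:
        action = RESOURCE_ACTIONS[(method.upper(), item_id is not None)]
    except KeyError:
        raise ValueError("no action for %s %s" % (method, path))
    return resource, action, item_id


print(resolve_action("GET", "/articles"))
print(resolve_action("PUT", "/articles/7"))
```

A real dispatcher does much more (named routes, nested resources, format suffixes), but the method-plus-path lookup is the part that makes a request predictable.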

Another consideration is how to set up your environments: development, QA, production, etc. This was kind of tricky, as a few others I talked to used the PyYAML library to inject global variables throughout their libraries. I found that this doesn’t have to be the case. Why bloat your application with another library when knowing when to import your controllers gives you all that you need within the standard CherryPy config?
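As a rough illustration of keeping environments inside a plain config dict, here is a minimal sketch. It is not the deploy.py from the download; the environment names, the APP_ENV variable, and all values are assumptions. The keys follow CherryPy’s usual config naming, but check them against the version you have installed.

```python
import os

# Settings shared by every environment (placeholder values).
BASE = {"server.socket_host": "127.0.0.1", "server.socket_port": 8080}

# Per-environment overrides (hypothetical environment names and values).
OVERRIDES = {
    "development": {"log.screen": True, "engine.autoreload.on": True},
    "qa": {"log.screen": True, "engine.autoreload.on": False},
    "production": {
        "log.screen": False,
        "engine.autoreload.on": False,
        "server.socket_host": "0.0.0.0",
    },
}


def build_config(env_name):
    """Return one flat dict: shared settings plus the chosen overrides."""
    if env_name not in OVERRIDES:
        raise ValueError("unknown environment: %r" % env_name)
    config = dict(BASE)  # copy, so BASE itself is never modified
    config.update(OVERRIDES[env_name])
    return config


# APP_ENV is a made-up variable name; a command-line flag would work too.
config = build_config(os.environ.get("APP_ENV", "development"))
# In a start script this dict would be handed to cherrypy.config.update(config)
# before the controllers are imported, so they see the final settings.
print(config)
```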

When setting up your CherryPy application in the manner of an MVC framework, you need a matching folder structure. I’ve included a downloadable zip that contains the folders necessary for this post and subsequent posts. You’ll see that it arranges every part of the system rationally for easy lookup and organisation. The key, however, is the deploy.py file, as it is the start and stop script for your application.

Feel free to download the CherryPy Example 1 file and get a feel for where I am going with the subsequent posts. They will cover everything from A to Z on setting up a RESTful CherryPy application that you can daemonize for a really fast API server that is light on the templating side.
