There has been a lot of recent debate about how to improve quality control on HackerNews (HN), and, to his credit, Paul Graham (pg) has tried a number of tactics. There is a very clear set of HN guidelines, which few members these days have probably read. For a while, pg experimented with the karma formula and, even if I disagree about how karma should be measured, at least he gave it an effort. He also hid comment karma from everyone but the author to help slow the demonstrable deterioration of the discussion section; by pg's own observations, this has been successful. Nevertheless, I believe we are seeing a continuing downward trend in overall article quality on the front page [1].
In this post, I present a honeypot approach to combating group-think and quality deterioration in article selection on social news sites.
A honeypot is an article that is link-bait or otherwise in direct violation of the site's guidelines, but is intentionally submitted by an admin as a test to see whether users inappropriately upvote it [2]. For each user, three scores are tracked: the number of honeypots seen, the number of honeypots upvoted, and the number of honeypots flagged. A user's seen count increments when they load a page with a honeypot article displayed [3]. If a user upvotes a honeypot, their upvoted score is incremented; if the user flags it, their flagged score is incremented. We then take the difference between the flagged and upvoted scores, divided by the seen score, to get a honeypot ratio, h:

h = (f_u - v_u) / s_u

where:

u is the target user
f_u is the number of honeypots the user flagged
v_u is the number of honeypots the user upvoted
s_u is the total number of honeypots seen by the user
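As a quick sketch, the basic ratio can be computed directly from the three per-user counters (the zero-seen guard is my addition, since the ratio is undefined before a user has encountered any honeypots):

```python
def honeypot_ratio(flagged, upvoted, seen):
    """Basic honeypot ratio: h = (f_u - v_u) / s_u."""
    if seen == 0:
        return 0.0  # no honeypots seen yet; treat the user as neutral
    return (flagged - upvoted) / seen

# A user who saw 4 honeypots, flagged 3, and upvoted 1:
# h = (3 - 1) / 4 = 0.5
```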
This produces a ratio in the range [-1, 1]. We may want to punish people who excessively flag everything; otherwise they'll always have a max h score [4]:

h = (f_u - v_u) / s_u - (F_u - f_u) / F_u

where F_u is the total number of articles the user has flagged.
The range of this h-ratio is [-2, 1). However, because the total number of articles a user flags is very likely to be much larger than the number of honeypots they flag, we would expect that no user will ever receive a score near 1 in practice. If a user's h-ratio falls below an admin-specified threshold, we flag the user as detrimental to the overall quality of the site, and their upvotes are either discounted or ignored entirely.
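The flag-penalized score and the admin threshold might be realized as follows. This is a sketch under my reading of the penalty, namely the fraction of the user's total flags that were not honeypots; the function and parameter names are hypothetical:

```python
def adjusted_ratio(hp_flagged, hp_upvoted, hp_seen, total_flagged):
    """Honeypot ratio with a penalty for excessive flagging."""
    if hp_seen == 0:
        return 0.0  # neutral until the user has seen a honeypot
    h = (hp_flagged - hp_upvoted) / hp_seen
    # Penalty: fraction of the user's flags that were NOT honeypots.
    # A user who flags everything has mostly non-honeypot flags,
    # so the penalty approaches 1 and cancels their perfect h score.
    if total_flagged > 0:
        h -= (total_flagged - hp_flagged) / total_flagged
    return h

def is_detrimental(hp_flagged, hp_upvoted, hp_seen, total_flagged, threshold):
    """True if the user's adjusted ratio falls below the admin threshold."""
    return adjusted_ratio(hp_flagged, hp_upvoted, hp_seen, total_flagged) < threshold
```

For example, a user who flagged all 4 honeypots they saw but also flagged 36 other articles scores 1.0 - 36/40 = 0.1, well short of the maximum.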
Since it's not always feasible to expect admins to find or label honeypots, it would be nice to have some way to crowdsource honeypots implicitly. To do this, we take the top N users, ranked by their explicit honeypot ratio, and label them "super flaggers". For each article, we track the percentage of super flaggers who have seen it that also flagged it; if this percentage is above an admin-specified threshold, we automatically label the article as a honeypot:
```
function getSuperFlaggers(users, superFlaggersCount):
    # Order users by their honeypot ratio, in descending order
    rankedUsers = users.sortDescending(u => u.honeypotRatio)
    return rankedUsers.subset(0, superFlaggersCount)

function isHoneypot(article, threshold, superFlaggers):
    seenCount = 0
    flagCount = 0
    foreach sf in superFlaggers:
        if seen(sf, article):
            seenCount += 1
            if flagged(sf, article):
                flagCount += 1
    if seenCount == 0:
        return false  # no super flagger has seen the article yet
    return (flagCount / seenCount) >= threshold
```
It would make sense to check this value every time an article hits a certain number of upvotes. One would also likely want to ensure that enough super flaggers have seen the article for the percentage to be significant before checking. If an article is labeled as a honeypot, then all users who have already seen it should have their honeypot scores retroactively updated.
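The retroactive update might look like the sketch below. The data model is hypothetical: each user carries sets of seen, upvoted, and flagged article ids alongside the three honeypot counters.

```python
def relabel_as_honeypot(article_id, users):
    """Retroactively update honeypot counters once an article is
    crowd-labeled as a honeypot.  Each user is a dict with 'seen',
    'upvoted', and 'flagged' sets of article ids plus the three
    honeypot counters (a hypothetical data model)."""
    for user in users:
        if article_id in user["seen"]:
            user["hp_seen"] += 1
            if article_id in user["upvoted"]:
                user["hp_upvoted"] += 1
            elif article_id in user["flagged"]:
                user["hp_flagged"] += 1
```

For instance, a user who had seen and upvoted the newly labeled article would have both their seen and upvoted honeypot counts bumped, lowering their h-ratio on the next recomputation.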
This post presented two reasonable approaches to improving the quality of the front page on social news sites like HackerNews. Social news sites have long been believed to suffer from deterioration over time, with recent evidence supporting that belief. While anecdotal evidence suggests hiding comment karma has helped improve discussion quality, article selection quality has remained largely unchanged [5]. The honeypot approach described above may help stem the flow of upvotes to link-bait and inappropriate articles by enabling the community to moderate itself implicitly.
If you liked this article, please vote it up on HackerNews, assuming you think it meets the quality guidelines.
[1] For instance, see this front-page article from 10/23/11.

[2] In practice, it may be more reasonable for admins to label user submissions that they see hit the front page and judge to be violating the guidelines.

[3] If multiple honeypots are displayed, the count is incremented by the number of honeypots seen for the first time by the user.

[4] Technically, this is only true if they flag everything and upvote nothing.

[5] The exception here may be voting-ring and other clique-detection algorithms. However, such algorithms are designed to prevent manipulation of the system rather than to enforce quality guidelines.