Ratings: Weighting is harming Prog Archives
Dean
Special Collaborator Retired Admin and Amateur Layabout Joined: May 13 2007 Location: Europe Status: Offline Points: 37575
Posted: January 06 2009 at 04:24
Low is something below the average - I don't know exactly what the average is, but low is not 900 votes; the average is probably 100 or so. I don't accept that weighting which gives a 9% difference on 43 votes is "a large skew" - what you don't know (and cannot tell) is how skewed the results would be without weighting.

I don't know about you, but when people give an album a rating that is below the average for that album I don't automatically see sabotage - I see someone who simply didn't like it, so I'd like to know what they didn't like about it.

You've already said it is possible to automatically monitor voting patterns for sabotage - I've asked for details on how this can be done on 21,000 members, considering that a lot of "sabotage" is done using multiple accounts with proxy IP addresses or dynamically allocated IP addresses - it is difficult enough keeping track of people who set up multiple accounts with fixed IP addresses. Beardfish was a poor example - look at Pendragon. I know that Pure has been sabotaged and I'm fairly confident that Sleeping In Traffic has not - please examine the ratings for these two albums and tell me where the sabotage is. I can assure you that simple analysis of voting trends will not find or reveal it.
I really don't get the "in-crowd" and the "parochial and disingenuous" jibes. But I guess I'm on the inside looking out.
Because this is a multinational site where we insist the reviews are written in English, ratings-only voting allows non-English speakers the opportunity to share in the rating of their favourite Prog albums. It would be parochial (though not disingenuous) for us to exclude these voters.

Unfortunately that opens up the site to abuse by people who want to hype their favourites, bash their pet hates and attempt to manipulate the Top-XX charts. We have seen this enough times to know it happens on a regular basis, and not just for popular or contentious albums.

Regrettably that penalises honest ratings-only contributors such as yourself.

Of course the weighting system does not prevent people who can write a mere 100 words on a particular release from abusing the system, but it is more difficult to do that consistently and not get caught out.
Edited by Dean - January 06 2009 at 04:28
What?

Finnforest
Special Collaborator Honorary Collaborator Joined: February 03 2007 Location: The Heartland Status: Offline Points: 16913
Posted: January 06 2009 at 05:16
No Mark, there is no insult here. You just don't like the fact that not everyone buys your theory that PA is going to crash and burn if we don't follow your advice. To the contrary, the site is doing quite well and the reasons for Max's set-up are solid. But don't play the victim today - I didn't "insult" you in this post. The injustice as you see it is a perception issue, an opinion, not a fact. Pointing that out after 5 pages of your argument does not merit the "black eye" emoticon. You've been treated well here by all, despite my defensiveness over the work of our Collabs. I've seen no one truly attack you; I wonder if that would be the case if you waltzed into PE or a similar prog site and proclaimed their ratings useless. Thanks.

Edited by Finnforest - January 06 2009 at 06:32
Windhawk
Special Collaborator Honorary Collaborator Joined: December 28 2006 Location: Norway Status: Offline Points: 11401
Posted: January 06 2009 at 05:18
Interesting, when even IMDb has gone over to using weighted ratings. I would assume they have their reasons for that - and the main argument of the thread starter appears to be somewhat busted here now.

A continued discussion about how much weighting there should be might be appropriate - but if the admin's calculations are correct and the difference is in the 10-15% range at most, what's the problem? As far as I know, when people are looking to buy music they will look it up in a number of places and read several reviews before deciding - at least when shopping on the net. Most will seek out samples too these days. As far as ratings go, they give an indication of popularity in terms of breadth of appeal, and of the general appeal among those who own the album. And so far in life I don't think I've ever encountered people buying an album based on ratings alone...
Websites I work with:
http://www.progressor.net http://www.houseofprog.com My profile on Mixcloud: https://www.mixcloud.com/haukevind/ |
Mr ProgFreak
Forum Senior Member Joined: November 08 2008 Location: Sweden Status: Offline Points: 5195
Posted: January 06 2009 at 06:17
^ I think that in the case of IMDb they also use the reviews to identify raters who can be trusted. They also have that feature of "rating reviews". Of course that can be used to compute a "trust level" for reviewers - together with other factors, like for example whether people have been consistently submitting trustworthy ratings over an extended period of time. Most of the manipulative votes come in "bursts".
Dean
Special Collaborator Retired Admin and Amateur Layabout Joined: May 13 2007 Location: Europe Status: Offline Points: 37575
Posted: January 06 2009 at 06:41
^ IMDb also only use ratings from regular reviewers when computing their Top-100 ... and they give no indication of what constitutes a "regular reviewer".
What?

Mr ProgFreak
Forum Senior Member Joined: November 08 2008 Location: Sweden Status: Offline Points: 5195
Posted: January 06 2009 at 06:49
^ Yes, I remember reading about that. Apparently your ratings become more important if you submit reviews over an extended period of time. That makes a lot of sense to me, and maybe I will implement something like that at PF some day. However, I would make it more transparent, and I also think I would limit the range of weights to a factor of 2 or maybe 3.
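To sketch what I mean (a rough illustration only - the trust weights and the cap of 2 are my own assumptions here, not PF's or PA's actual values):

# Weighted mean with a per-user "trust" weight capped at a factor of 2.
# The weights (1.0 for new raters, up to 2.0 for long-time reviewers)
# are illustrative assumptions, not any site's real values.
def weighted_album_average(ratings):
    # ratings: list of (stars, user_weight) pairs, weight between 1.0 and 2.0
    total_weight = sum(w for _, w in ratings)
    return sum(stars * w for stars, w in ratings) / total_weight

ratings = [(5, 1.0), (4, 1.0), (2, 2.0)]          # the 2-star vote counts double
print(round(weighted_album_average(ratings), 2))  # 3.25, versus a plain mean of 3.67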
Angelo
Special Collaborator Honorary Collaborator / Retired Admin Joined: May 07 2006 Location: Italy Status: Offline Points: 13244
Posted: January 06 2009 at 11:33
I'm with stupid, err, I mean Bob - this has entered the yes/no stage, so I'm off to warmer places (it's -9°C here now - only people like Peter enjoy a cold beer at those temperatures)
ISKC Rock Radio
I stopped blogging and reviewing - so I won't be handling requests. Promos for airplay can be sent to [email protected]
Uncle Spooky
Forum Groupie Joined: July 31 2007 Location: UK Status: Offline Points: 59
Posted: January 09 2009 at 04:45
Just to clear up the confusion: IMDb's "weighting" refers to filters against vote stuffing and lazy voting, plus the usual statistical methods for weighting individual entries across larger samples - not to assigning weight to individual users.

Mark

Edited by Uncle Spooky - January 09 2009 at 04:53
Uncle Spooky
Forum Groupie Joined: July 31 2007 Location: UK Status: Offline Points: 59
Posted: January 09 2009 at 04:47
This simply means that voters have to pass a certain threshold in the number of votes cast before they are included in the Top charts. Again, no weighting is applied to those included in the top charts.

Cheers, Mark
Dean
Special Collaborator Retired Admin and Amateur Layabout Joined: May 13 2007 Location: Europe Status: Offline Points: 37575
Posted: January 09 2009 at 05:55
There is no confusion - neither site uses a simple arithmetic average of all votes cast. IMDb has the luxury of large sample sizes, so statistical weighting has a reasonable level of confidence. Unfortunately we do not have large sample sizes, so statistical analysis would be so inaccurate as to be meaningless. If we applied IMDb's methods then most albums would have zero ratings and many people who submitted ratings-only would be excluded completely. The system isn't perfect, but we do try to include everybody's opinion.
However, both sites do use the same Bayesian algorithm when computing the Top 100.
What?

debrewguy
Special Collaborator Honorary Collaborator Joined: April 30 2007 Location: Canada Status: Offline Points: 3596
Posted: January 09 2009 at 21:04
But does this mean that some albums reviewed a hundred times or so are not as good or bad as they're rated?

And if so, how do we move another hundred people to review the same album to see if the previous hundred reviewers got it all wrong? And having done that, would we get still another hundred people to review the reviews and the albums and vote on which set of reviewers is kinda right? Heck, let's save time: me & T rate the RIO/Avant-Garde; Rocktopus takes care of the prog metal, Sean Trane does the Neo, Mandrakeroot does Raga rock, and admin strip all Symph albums of their ratings so we can start all over; then we get Baldfriede to handle the crossover, with Raff eliminating the eclectic & jazz fusion genres until the Electronic prog lovers notice that Kraft has split from Werk. Then, after our 11th beer, me & VB admit that the site is really a put-on by the staff of Kerrang.
"Here I am talking to some of the smartest people in the world and I didn't even notice,” Lieutenant Columbo, episode The Bye-Bye Sky-High I.Q. Murder Case.
Mr ProgFreak
Forum Senior Member Joined: November 08 2008 Location: Sweden Status: Offline Points: 5195
Posted: January 10 2009 at 01:58
^
Of course you have a point - we shouldn't take all this too seriously. However, when a website implements a system which gives different weights to the votes depending on the user's status, I think it's important for the website to try to be transparent about the algorithm. Especially when people submit their rating and the new album average does not change in the expected way, there should be some way for them to find out how it works. Which reminds me that I should add/update those explanations at PF too ...
Atavachron
Special Collaborator Honorary Collaborator Joined: September 30 2006 Location: Pearland Status: Offline Points: 65258
Posted: January 10 2009 at 02:15
ahh, the Bayesian algorithm, use it every day
Mr ProgFreak
Forum Senior Member Joined: November 08 2008 Location: Sweden Status: Offline Points: 5195
Posted: January 10 2009 at 02:59
^ Actually I'm wondering who brought that up ... I'm pretty sure that PA doesn't use Bayesian filters. You couldn't apply them to ratings ... only to reviews, and PA is monitoring those manually.
Dean
Special Collaborator Retired Admin and Amateur Layabout Joined: May 13 2007 Location: Europe Status: Offline Points: 37575
Posted: January 10 2009 at 05:23
Not Bayesian filters - you were the only person to mention filters.
Bayesian Weighting is not filtering:
br = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes)
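In code, that calculation would look something like this (a minimal sketch with made-up numbers; avg_num_votes and avg_rating stand for the site-wide averages the formula uses as its prior):

# Bayesian (damped) average: pulls an album's rating towards the
# site-wide mean until the album has accumulated enough votes of its own.
def bayesian_rating(this_num_votes, this_rating, avg_num_votes, avg_rating):
    return ((avg_num_votes * avg_rating) + (this_num_votes * this_rating)) / (avg_num_votes + this_num_votes)

# Illustrative numbers only: an album rated 4.67 by 6 voters, on a site
# where the typical album has 100 votes averaging 3.5.
print(round(bayesian_rating(6, 4.67, 100, 3.5), 2))    # -> 3.57
print(round(bayesian_rating(900, 4.67, 100, 3.5), 2))  # -> 4.55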
What?

Mr ProgFreak
Forum Senior Member Joined: November 08 2008 Location: Sweden Status: Offline Points: 5195
Posted: January 10 2009 at 05:31
^ and now you introduced "Bayesian Weighting" ...
Actually "Weighted Mean" or "Weighted Average" means something different - it means applying weights to all the ratings. Maybe M@x should remove the link on the charts page to http://en.wikipedia.org/wiki/Weighted_average#Example. The thing you're describing ... I've never heard it being referred to as "Bayesian", but I guess you're right. The principle is explained here: http://en.wikipedia.org/wiki/Bayesian_average, so that's the link which should be used on the charts page. |
Dean
Special Collaborator Retired Admin and Amateur Layabout Joined: May 13 2007 Location: Europe Status: Offline Points: 37575
Posted: January 10 2009 at 06:25
^ True on all counts - I originally used the term Bayesian algorithm, which can apply to either filtering or weighting; however, since I said it was used to calculate the Top 100, it implies weighting. Yes, the weighted-averages link should be removed - it applied to the previous algorithm used to calculate individual album averages and is no longer used. Bayesian weighting is only used to calculate chart position, not the displayed average value, which is why CTTE has a lower average than WYWH but a higher chart position.

Of course any statistical, probability-based system is doomed to failure on the small sample populations we have here. Analysis of an album with only 6 votes is meaningless; even a straight arithmetic mean is pointless - if 3 people love it and 3 people hate it that does not make the album "average", quite the reverse in fact. No amount of weighting will give a meaningful number because there isn't one. Even for albums with 900 votes the average tells you nothing, because it does not take into account your personal taste or predilection.
The best computer to analyse a set of ratings is still the human brain, the numbers are just numbers.
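To illustrate the chart-position point above with invented numbers (these are not the real vote counts or averages for either album, and the site-wide prior is also assumed):

# Why an album with a lower arithmetic mean can still rank higher in the chart.
# The site-wide prior (100 votes, 3.5 average) is an assumed figure.
def bayesian_rating(num_votes, rating, avg_num_votes=100, avg_rating=3.5):
    return (avg_num_votes * avg_rating + num_votes * rating) / (avg_num_votes + num_votes)

album_a = bayesian_rating(900, 4.28)  # many votes, mean 4.28 -> ~4.20
album_b = bayesian_rating(150, 4.35)  # fewer votes, higher mean -> ~4.01
# album_a outranks album_b despite its lower displayed average.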
What?

Mr ProgFreak
Forum Senior Member Joined: November 08 2008 Location: Sweden Status: Offline Points: 5195
Posted: January 10 2009 at 06:35
It would be interesting for the users to see the Bayesian average along with the arithmetic mean, but from my own website I can say that it's a bit difficult to implement. However, I'll try to do that.

Well, I think that the numbers are quite useful. Of course they don't represent the "true" rating of the album ... there is no such thing. As far as I'm concerned, ratings are useful because they enable the system to provide suggestions - even if only two people rate something highly, I might want to check it out.

BTW: I already thought of what you're describing in the highlighted section. At PF I'm calculating the standard deviation for each album, and here you can see the albums with the highest values. For large numbers of ratings with roughly equal numbers of "haters" and "lovers", it might even make sense to tweak the resulting average in some way. At PF I'm doing that by also factoring the median value into the resulting average.
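Roughly what I mean, as a sketch (the ratings list is invented, and the blend rule is just one possible choice):

import statistics

ratings = [5, 5, 1, 1, 5, 1, 2, 5]    # a "love it or hate it" album

mean = statistics.mean(ratings)       # 3.13 - looks merely "average"
spread = statistics.pstdev(ratings)   # ~1.90 - reveals the lovers/haters split
median = statistics.median(ratings)   # 3.5

# One possible tweak: blend the mean with the median when the spread is large.
displayed = (mean + median) / 2 if spread > 1.5 else mean   # -> ~3.31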
Dean
Special Collaborator Retired Admin and Amateur Layabout Joined: May 13 2007 Location: Europe Status: Offline Points: 37575
Posted: January 10 2009 at 07:15
Doubly so if those two people have similar tastes to you.
The question then is which way to tweak the average. Do you tweak it in favour of the "lovers" or the "haters"? Common sense says towards the "lovers" ... (a low rating by a "hater" is in effect a high rating) ... but the problem there is: what if the low ratings were from people who love the genre/artist but hate the album?

Standard deviation does give more information - we could flood the page with numbers, but that would be a distraction and would open us to even more criticism from people who would not appreciate what the numbers mean. We do plot the distributions on each album page - people should be using that graph to draw their own conclusions rather than concentrating on the individual scores (sorry, they don't display properly here, but in essence the 3.74 rating is better explained by the 44% of people who gave the album 4 stars):
[Ratings distribution for the album: 3.74 average from 97 ratings; 22% rated it 5 stars ("Essential: a masterpiece of progressive music"), 44% rated it 4 stars; the remaining bars are not reproduced here.]
What?

Desoc
Forum Senior Member Joined: December 12 2006 Location: Oslo, Norway Status: Offline Points: 216
Posted: April 10 2009 at 09:20
Well, I realize that this thread has been inactive for some weeks now, but I feel the need to raise the question again, partly because I feel the debate was largely inconclusive.
I didn't join the crowd the last time around, but I must admit that the debate puzzled me. I have the deepest respect for most of the collaborators and the time and effort they put into this site. But this thread was a curious showcase.
Regarding the debate
To my eyes, the debate consisted mainly of non-collaborators (in particular, but not limited to, one single person) who were questioning a particular (and very visible and impactful) feature of the site, against a massive load of collaborators who (with a couple of exceptions) went right down into the trenches to defend their privileges. I don't think these privileges are the reason they are active, so the overall reaction was peculiar, and it certainly stopped me from engaging in the debate.

Well, this is not meant to be a rant against collabs, whose efforts - as I said - I admire. But this thread leaves the impression that there is a certain defensiveness towards the common users here, which is an impression that benefits no one, regardless of its accuracy. (Something similar can be found in this thread: http://www.progarchives.com/forum/forum_posts.asp?TID=55758 and this: http://www.progarchives.com/forum/forum_posts.asp?TID=55741&PN=2) I think Mark had valid points, and I was surprised at how he was met. Take it as friendly advice.
Reviews vs ratings
I believe that collabs in general write better reviews than non-collabs. Thus, I think the exposure given to their reviews should reflect this. For my part, the front-page feed could consist of collab reviews only, and collabs should be rewarded many times over for their efforts in various ways.

But ratings are an entirely different issue. Being a good reviewer doesn't mean that your opinion is more qualified. And what is the point of the rating system? First and foremost it is to show the standing of an album amongst the community at large. As such, the current system must be said to be misleading.
Possible changes?
When I say that the debate was largely inconclusive, I refer to the fact that most of the defenders were people who "gained" from the current system, and those few who raised their voices were (with a couple of exceptions) not. But there were a few concrete proposals that hardly anyone commented on.

I'm curious to know whether displaying different averages is something that could be considered, or alternatively why not. Even if the current system remains the "standard", would anything be lost if people could additionally choose a basic non-weighted average, collab-only ratings, etc.?
And what about having a filter on the chart pages that removes the weighting? Or is that technically impossible?
A specific example
I'd like to close with an example of how the current system works. Let me attract your attention to this album: http://www.progarchives.com/album-reviews.asp?id=12217
This album had a rating of 4.67 or thereabouts and appeared on the top 100 chart if the minimum number of ratings was lowered. Then one collab gave it a 1-star rating, without a review, and it dropped like a rock. First of all, this shows that it's dead wrong to say it's the quality of reviews that matters. Secondly, it shows that this system doesn't primarily reward collabs - it primarily punishes the other 17 raters, many of them with well-crafted reviews. And this would have been the case even if the collab in question had written a review alongside.
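Just to illustrate the arithmetic of what happened (the collab weight of 5 is purely my guess for the sake of the example - the real weights aren't published):

# How a single heavily weighted 1-star vote sinks a small-sample average.
# Treating all 17 ordinary ratings as 4.67 is a simplification, and the
# collaborator weight of 5 is an assumption for illustration only.
ratings = [(4.67, 1.0)] * 17          # 17 ordinary raters
ratings.append((1.0, 5.0))            # one collab 1-star vote, weight 5

weighted = sum(r * w for r, w in ratings) / sum(w for _, w in ratings)
plain = sum(r for r, _ in ratings) / len(ratings)
# weighted ~= 3.84, plain ~= 4.47 - the weighted drop is far steeper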
So in conclusion, if the site owners feel that weighting which mixes reviews and ratings is important for giving people an incentive to write reviews, then I will hold my peace. But then my advice as a regular user would be to at least treat all reviews equally, and to give collabs other kinds of bonuses instead. At any rate, the current weighting is - sorry - ridiculously biased.