Ratings: Weighting is harming Prog Archives - Progressive Rock Music Forum

Author

Message

Topic Search

Topic Options

Dean

Special Collaborator

Retired Admin and Amateur Layabout

Joined: May 13 2007
Location: Europe
Status: Offline
Points: 37575

Posted: January 06 2009 at 04:24

Uncle Spooky wrote:

Dean wrote:

Uncle Spooky wrote:

Weighting skews the results towards the Collaborators votes because that is what it is designed to do

Yes, that is apparent.

glad we sorted that out then

Uncle Spooky wrote:

[quote]
and they are only effective on albums with a low total number of ratings to prevent nefarious people hyping-up an album - the weighting on albums with hundreds of ratings is less relevant.

Define "low". As I showed with the Beardfish album, with 43 votes, there's a large skew caused by weighting towards collabs. It's even worse with albums with low votes that contain collab reviews.

Low is something below an average - I have no I care what the average is, low is not 900 votes, average is probably 100 or so. I don't accept that weighting giving a 9% difference on 43 votes is "a large skew" - what you don't know (and cannot tell) is how skewed the results are without weighting.

I don't know about you, but when people give any album a rating that is below the average for that album I don't automatically see sabotage, but someone who simply didn't like it, so I'd like to know what they didn't like about it.

Uncle Spooky wrote:

And as I've already said it's possible to automatically monitor voting patterns for sabotage and flag up suspect entries. Sabotage voters can be transparently blocked, or percentages of votes can be dropped for all entries to account for sabotage and so on. It doesn't take 24/7 effort once implemented.

You've already said it is possible to automatically monitor voting patterns for sabotage - I've asked for details on how this can be done - on 21,000 members - considering that a lot of "sabotage" is done using multiple accounts with proxy IP addresses or dynamically allocated IP address - it is difficult enough keeping track of people who set up multiple accounts with fixed IP addresses. Beardfish was a poor example - look at Pendragon. I know that Pure has been sabotaged and I'm fairly confident that Sleeping In Traffic has not - please examine the ratings for these two albums and tell me where the sabotage is. I can assure you that simple analysis of voting-trends will not find or reveal it.

Uncle Spooky wrote:

Skewing for the in-crowd is parochial and disingenuous.

I really don't get the "in-crowd" and the "parochial and disingenuous" jibes. But I guess I'm on the inside looking out.

Uncle Spooky wrote:

And if non-collab ratings are distrusted so much, why allow them? It seems to me you should only allow ratings with reviews, that way you get to decide who is worthy?

Mark

Because this is a multinational site where we insist the reviews are written in English - ratings-only allows non-English speakers the opportunity to share in the rating of their favourite Prog albums. It would be parochial (though not disingenuous) for us exclude these voters.

Unfortunately that opens up the site for abuse by people who want to hype their favourites, bash they're pet-hates and attempt to manipulate the Top-XX charts. We have seen this enough times to know it happens on a regular basis, and not just for popular or contentious albums.

Regretably that penalises honest rater-onlys such as yourself.

Of course the weighting system does not prevent people who can write a mere 100 words on a particular release from abusing the system, but it is more difficult to do that consistently and not get caught-out.

Edited by Dean - January 06 2009 at 04:28

What?

Finnforest View Drop Down

Special Collaborator

Honorary Collaborator

Joined: February 03 2007
Location: The Heartland
Status: Offline
Points: 16913

Posted: January 06 2009 at 05:16

Uncle Spooky wrote:

Finnforest wrote:

Mark, forgive the "whiner" comments, apologies for that.

No problem.

It's just that we go through some of these same issues over and over again. Folks who have contributed little or nothing to the site come by and tell Collabs, who have worked unbelievably hard over many long hours, that what we are doing is wrong and worth little.

Again, I understand. I've done my time on the front line and appreciate how stressful it can be and how easy it is to feel under appreciated, etc but the truth is if the site wasn't so important I wouldn't be here putting my case.

Ya know, that is what the real "insult" here is, my friend. Give the site a little credit please for the good it does rather than harping about your perceived injustices.

Heh, and now you're being insulting again Ouch

That anybody cares enough to stand up in front of you guys and risk the flak should be taken as a compliment...

Look, I am being totally reasonable in my arguments. I've not presented anything in a whiny, aggressive, lame manner. Just reasoned observation. I appreciate that this has possibly been discussed before, but that shouldn't stop people speaking their mind in a reasonable way when something is perceived as wrong. USA invades IRAQ? It's happening as planned, why should anybody stand up and speak out against it? Women don't get to vote? Whatever? etc. OK, these are slightly different extremes, but the principle remains.

If a reasonable answer is given then people shut up and go away, but so far the only honest answer I've seen is that the system is designed to give collaborators weighting. The inference being that non-collabs are at best mistrusted, at worst seen as saboteurs with hidden agendas.

I find this unacceptable and have presented an argument for why.

Discussion is nothing to be feared unless the answers are painful to give...

Mark

No Mark, there is no insult here. You just don't like the fact that not everyone buys your theory that PA is going to crash and burn if we don't follow your advice. To the contrary, the site is doing quite well and the reasons for Max's set-up are solid. But don't play the victim today--i didn't "insult" you in this post. The injustice as you see it is a perception issue, an opinion. Not a fact. Pointing that out after 5 pages of your argument does not merit the "black eye" emoticon. You've been treated well here by all despite my defensiveness over the work of our Collabs. I've seen no one truly attack you, I wonder if that would be the case if you waltzed into PE or similar prog site and proclaimed their ratings useless.
Thanks.

Edited by Finnforest - January 06 2009 at 06:32

Windhawk

Special Collaborator

Honorary Collaborator

Joined: December 28 2006
Location: Norway
Status: Offline
Points: 11401

Posted: January 06 2009 at 05:18

Interesting. When even IMDB has gone over to using weighted ratings. I would assume they have their reasons for that - and the crown argument of the thread starter appears to be somewhat busted here now.

A continud discussion as to how much or not a weighting should be might be appropriate - but if the admins calculations are correct here and the difference is in the 10-15% range at max; what's the problem?

As far as I know, when people are looking around to buy music they will look it up in a number of places; and read several reviews as well before deciding - at least when shopping on the net. Most will seek out samples too these days.

As ratings go, they show an indication of popularity in terms of broadness of appeal and the general appeal amongst the scope of those who have it. And so far in life I don't think I've ever encountered people buying an album based on ratings alone...

Websites I work with:

http://www.progressor.net
http://www.houseofprog.com

My profile on Mixcloud:
https://www.mixcloud.com/haukevind/

Mr ProgFreak View Drop Down

Forum Senior Member

Joined: November 08 2008
Location: Sweden
Status: Offline
Points: 5195

Posted: January 06 2009 at 06:17

^ I think that in the case of IMDB they also use the reviews to identify raters who can be trusted. They also have that feature of "rating reviews". Of course that can be used to to compute a "trust level" for reviewers - together with other factors, like for example whether people are consistently submitting trustworthy ratings over an extended period of time. Most of the manipulative votes come in "bursts".

Dean

Special Collaborator

Retired Admin and Amateur Layabout

Joined: May 13 2007
Location: Europe
Status: Offline
Points: 37575

Posted: January 06 2009 at 06:41

^ IMDb also only use ratings from regular reviewers when computing their Top-100 ... and they give no indication of what constitutes a "regular reviewer".

What?

Mr ProgFreak View Drop Down

Forum Senior Member

Joined: November 08 2008
Location: Sweden
Status: Offline
Points: 5195

Posted: January 06 2009 at 06:49

^ yes, I remember reading about that. Apparently your ratings become more important if you submit reviews over an extended period of time. That makes a lot of sense to me, and maybe I will implement something like that at PF some day. However, I would make it more transparent, and I also think that I would limit the range of weights to the factor of 2 or maybe 3.

Angelo

Special Collaborator

Honorary Collaborator / Retired Admin

Joined: May 07 2006
Location: Italy
Status: Offline
Points: 13244

Posted: January 06 2009 at 11:33

I'm with stupid, err, I mean Bob - this has entered the yes no stage, so I'm off to warmer places (it's -9 C here now - only people like Peter enjoy a cold beer at those temperatures)

ISKC Rock Radio
I stopped blogging and reviewing - so won't be handling requests. Promo's for ariplay can be sent to [email protected]

Uncle Spooky View Drop Down

Forum Groupie

Joined: July 31 2007
Location: UK
Status: Offline
Points: 59

Posted: January 09 2009 at 04:45

Windhawk wrote:

Interesting. When even IMDB has gone over to using weighted ratings. I would assume they have their reasons for that - and the crown argument of the thread starter appears to be somewhat busted here now.

Just to clear up confusion here, IMDb's "weighting" here refers to active vote stuffing/lazy voting filters and the usual statistical methods for weighting individual entries across larger samples, not assigning weight to individuals.

Mark

Edited by Uncle Spooky - January 09 2009 at 04:53

http://last.fm/user/Mark_H

Uncle Spooky View Drop Down

Forum Groupie

Joined: July 31 2007
Location: UK
Status: Offline
Points: 59

Posted: January 09 2009 at 04:47

Dean wrote:

^ IMDb also only use ratings from regular reviewers when computing their Top-100 ... and they give no indication of what constitutes a "regular reviewer".

This simply means that voters have to pass a certain threshold of number of votes cast before they are included in the Top charts. Again, no weighting is applied to those included in the top charts.

Cheers,

Mark

http://last.fm/user/Mark_H

Dean

Special Collaborator

Retired Admin and Amateur Layabout

Joined: May 13 2007
Location: Europe
Status: Offline
Points: 37575

Posted: January 09 2009 at 05:55

Uncle Spooky wrote:

Dean wrote:

^ IMDb also only use ratings from regular reviewers when computing their Top-100 ... and they give no indication of what constitutes a "regular reviewer".

There is no confusion - neither site uses a simple arithmetic average of all votes cast. IMDb has the luxury of large sample sizes so statistical weighting has a reasonable level of confidence. Unfortunately we do not have large sample sizes so statistical analysis would be so inaccurate as to be meaningless. If we applied IMDb methods then most albums would be have zero ratings and many people who submitted ratings-only would be excluded completely. The system isn't perfect, but we do try to include everybody's opinion.

However, both sites do use the same Bayesian algorithm when computing the Top 100.

What?

debrewguy

Special Collaborator

Honorary Collaborator

Joined: April 30 2007
Location: Canada
Status: Offline
Points: 3596

Posted: January 09 2009 at 21:04

But does this mean that some albums reviewed a hundred times or so are not as good or bad as they're rated ? Confused

And if so, how do we move another hundred people to review the same album to see if the previous hundred reviewers got it all wrong ?
And having done that, would we get still another hundred people to review the reviews and the albums and vote on which set of reviewers is kinda right ?
Heck, let's save time, me & T rate the RIO/Avant-Garde; Rocktopus takes care of the prog metal, Sean Trane does the Neo, Mandrakeroot does Raga rock, and admin strip all Symph albums of their ratings so we can start all over, then we get Baldfriede to handle the crossover, with Raff eliminating the eclectic & jazz fusion genres until the Electronic prog lovers notice that Kraft has split from Werk.
Then, after our 11th beer, me & VB admit that the site is really a put on by the staff of Kerrang.

"Here I am talking to some of the smartest people in the world and I didn't even notice,” Lieutenant Columbo, episode The Bye-Bye Sky-High I.Q. Murder Case.

Mr ProgFreak View Drop Down

Forum Senior Member

Joined: November 08 2008
Location: Sweden
Status: Offline
Points: 5195

Posted: January 10 2009 at 01:58

Of course you have a point - we shouldn't take this all too serious. However, when a website implements a system which gives different weights to the votes depending on the users's status ... in that case I think it's important for the website to try to be transparent about the algorithm. Especially when people submit their rating and the new album average does not change in the expected way, there should be some way for them to find out how it works.

Which reminds me that I should add/update those explanations at PF too ... Embarrassed

Atavachron View Drop Down

Special Collaborator

Honorary Collaborator

Joined: September 30 2006
Location: Pearland
Status: Offline
Points: 65258

Posted: January 10 2009 at 02:15

ahh, the Bayesian algorithm, use it every day

Mr ProgFreak View Drop Down

Forum Senior Member

Joined: November 08 2008
Location: Sweden
Status: Offline
Points: 5195

Posted: January 10 2009 at 02:59

^ actually I'm wondering who brought that up ... I'm pretty sure that PA doesn't use Bayesian filters. You could not apply them to ratings ... only reviews, but PA is monitoring them manually.

Dean

Special Collaborator

Retired Admin and Amateur Layabout

Joined: May 13 2007
Location: Europe
Status: Offline
Points: 37575

Posted: January 10 2009 at 05:23

Mr ProgFreak wrote:

^ actually I'm wondering who brought that up ... I'm pretty sure that PA doesn't use Bayesian filters. You could not apply them to ratings ... only reviews, but PA is monitoring them manually.

Not Bayesian Filters, you were the only person to mention Filters.

Bayesian Weighting is not filtering:

br = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes)

What?

Mr ProgFreak View Drop Down

Forum Senior Member

Joined: November 08 2008
Location: Sweden
Status: Offline
Points: 5195

Posted: January 10 2009 at 05:31

^ and now you introduced "Bayesian Weighting" ... Wink

Actually "Weighted Mean" or "Weighted Average" means something different - it means applying weights to all the ratings. Maybe M@x should remove the link on the charts page to http://en.wikipedia.org/wiki/Weighted_average#Example.

The thing you're describing ... I've never heard it being referred to as "Bayesian", but I guess you're right. The principle is explained here: http://en.wikipedia.org/wiki/Bayesian_average, so that's the link which should be used on the charts page. Smile

Dean

Special Collaborator

Retired Admin and Amateur Layabout

Joined: May 13 2007
Location: Europe
Status: Offline
Points: 37575

Posted: January 10 2009 at 06:25

^ True on all counts - I originally used the term Bayesian Algorithm, which can apply to either filtering or weighting, however since I said it was used to calculate the Top 100 it infers weighting.

Yes, the weighted averages link should be removed - it applied to the previous algorithm used to calculate individual album averages and is no longer used. Bayesian weighting is only used to calculate chart position and not the displayed average value, which is why CTTE has a lower average than WYWH but has a higher chart position.

Of course any statistical probablity based system is doomed to failure on the small sample populations we have here. Analysis of an album with only 6 votes is meaningless, even the a straight arithmetic mean is pointless - if 3 people love it and 3 people hate it that does not make the album "average", quite the reverse in fact. No amount of weighting will give a meaningful number because there isn't one. Even for albums with 900 votes the average tells you nothing because it does not take into account your personal taste or predilection.

The best computer to analyse a set of ratings is still the human brain, the numbers are just numbers.

What?

Mr ProgFreak View Drop Down

Forum Senior Member

Joined: November 08 2008
Location: Sweden
Status: Offline
Points: 5195

Posted: January 10 2009 at 06:35

Dean wrote:

^ True on all counts - I originally used the term Bayesian Algorithm, which can apply to either filtering or weighting, however since I said it was used to calculate the Top 100 it infers weighting.

It would be interesting for the users to see the bayesian average along with the arithmetic mean, but from my own website I can say that it's a bit difficult to implement. However I'll try to do that.

Dean wrote:

Of course any statistical probablity based system is doomed to failure on the small sample populations we have here. Analysis of an album with only 6 votes is meaningless, even the a straight arithmetic mean is pointless - if 3 people love it and 3 people hate it that does not make the album "average", quite the reverse in fact. No amount of weighting will give a meaningful number because there isn't one. Even for albums with 900 votes the average tells you nothing because it does not take into account your personal taste or predilection.

The best computer to analyse a set of ratings is still the human brain, the numbers are just numbers.

Well, I think that the numbers are quite useful. Of course they don't represent the "true" rating of the album ... there is no such thing. As far as I'm concerned, ratings are useful because they enable the system to provide suggestions - even if only two people rate something highly, I might want to check it out.

BTW: I already thought of what you're describing in the highlighted section. At PF I'm calculating the standard deviation for each album, and here you can see the album with the highest values. For large numbers of ratings with about equal "haters" and "lovers", it might even make sense to tweak the resulting average in some way. At PF I'm doing that by also considering the median value in the resulting average.

Dean

Special Collaborator

Retired Admin and Amateur Layabout

Joined: May 13 2007
Location: Europe
Status: Offline
Points: 37575

Posted: January 10 2009 at 07:15

Mr ProgFreak wrote:

Dean wrote:

if 3 people love it and 3 people hate it that does not make the album "average", quite the reverse in fact.

Doubly so if those two people have similar tastes to you.

Mr ProgFreak wrote:

BTW: I already thought of what you're describing in the highlighted section. At PF I'm calculating the standard deviation for each album, and here you can see the album with the highest values. For large numbers of ratings with about equal "haters" and "lovers", it might even make sense to tweak the resulting average in some way. At PF I'm doing that by also considering the median value in the resulting average.

The question then is which way to tweak the average. Do you tweak it in favour of the "lovers" or "haters"? Common sense says towards the "lovers" ... (a low rating by a "hater" is in effect a high rating

) ... but the problem there is what if the low ratings were from people who love the genre/artist but hate the album.

Standard deviation does give more information - we could flood the page with numbers, but that is a distraction which would open us to even more criticism by people who would not appreicate what the numbers mean. We do plot the distributions on each album page - people should be using that graph to draw their own conclusions rather than concentrating on the individual scores (sorry they don't display properly here, but in essence the 3.74 rating is better explained by the 44% of people who gave the album 4-stars):

3.74
(97 ratings)

Essential: a masterpiece of progressive music (22%)

Excellent addition to any prog music collection (44%)

Good, but non-essential (21%)

Collectors/fans only (10%)

Poor. Only for completionists (3%)

What?

Desoc

Forum Senior Member

Joined: December 12 2006
Location: Oslo, Norway
Status: Offline
Points: 216

Posted: April 10 2009 at 09:20

Well, I realize that this thread has been inactive for some weeks now, but I feel the need to make the question reappear, partly because I feel the debate was largely inconclusive.

I didn't join the crowd the last time around, but I must admit that the debate puzzled me. I have the deepest respect for most of the collaborators and the time and effort they put into this site. But this thread was a curious showcase.

Regarding the debate

To my eyes, the debate consisted mainly of non-collaborators (in particular, but not limited to one single person) that was questioning a particular (and very visible and impactable) feature of the site, against a massive load of collaborators who (with a couple of exceptions) went right down in the trenches to defend their privileges. I don't think these privileges are the reason they are active, so the total reaction was peculiar, and it certainly stopped me from engaging in the debate.

Well, this is not meant to be a rant against collabs, whose efforts - as I said - I admire. But this thread leaves the impression that there is a certain defiance against the common people here, which is an impression that gains noone, regardless of its accuracy. (Something similar can be found in this thread: http://www.progarchives.com/forum/forum_posts.asp?TID=55758 and this: http://www.progarchives.com/forum/forum_posts.asp?TID=55741&PN=2) I think Mark had valid points, and I was surprised at how he was met. Take it as friendly advice.

Reviews vs ratings

I believe that collabs in general write better reviews than non-collabs. Thus, I think the exhibition of their reviews should reflect this. For my part, the frontpage feed could consist of collab reviews only. And collabs should be rewarded manyfold for their efforts in various ways.

But ratings are an entirely other issue. Being a good reviewer doesn't mean that your opinion is more qualified. And what is the point of the rating system? First and foremost it is to show the standing of an album amongst the community at large. As such, the current system must be said to be misleading.

Possible changes?

When I say that the debate was largely inconclusive, I refer to the fact that most of the defendants where people who "gained" on the current system, and those few who raised voices were (with a couple of exceptions) not. But there were a few concrete proposals that hardly anyone commented.

I'm curious to know if displaying different averages is something that could be considered, or alternatively why not. Even if the current system remains the "standard", would anything be lost if people could additionally chose basic average, non-weighted ratings, collab-only ratings etc?

And what about having a filter on the chart pages that removes the weighting? Or is that technically impossible?

A specific example

I'd like to close with an example of how the current system works. Let me attract your attention to this album: http://www.progarchives.com/album-reviews.asp?id=12217

This album had a rating of 4,67 or something and was on the top 100 chart if minimum # of ratings was lowered. Then one collab gave it a 1-star rating, without review, and it dropped like a rock. First of all, this shows that it's dead wrong that it's the quality of reviews that matters. Secondly, it shows that this system doesn't primarily reward collabs - it primarily punishes the other 17 raters, many of them with well-crafted reviews. And this would have been the case also if the collab in question wrote a review alongside.

So in conclusion, if the site owners feel that weighting by mixing reviews and ratings is important for giving incentives to writing reviews, then I will hold my peace. But then my advice as a regular user would be to at least treat all reviews equal, and rather give other kinds of bonuses to collabs. At any rate, the current weighting is - sorry - ridiculously biased.