StopGame ratings: a statistical check
Hi all! I wanted to statistically compare StopGame’s ratings with some big data: how do they relate? Is there an interdependence? Is it even possible to check this?
I present a small study that set no lofty goals and was done mostly for fun.
Data preparation, or how to avoid monkey work
First, we need to assemble a database of games to play with.
The page that lists all reviews by rating has to be turned into a boring table. There are 73 pages with about 30 games on each, which makes 2,190 games. Allow two minutes per game to retype the title by hand and take a breather, and that is 73 hours!
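The estimate is plain arithmetic. A quick sketch in Python (the variable names are mine, the numbers come from the text):

```python
# Back-of-the-envelope estimate of the manual effort (numbers from the text).
pages = 73              # pages in the review listing
games_per_page = 30     # average number of games per page
minutes_per_game = 2    # assumed time to retype one title, rest included

total_games = pages * games_per_page
total_hours = total_games * minutes_per_game / 60

print(total_games)  # 2190
print(total_hours)  # 73.0
```

The whole figure hinges on `minutes_per_game`: halve it and the total halves too.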
Something else is needed. Every review title ends with the word “Review”, which suggested an idea: open the page source, paste it into Excel (it looks terrible, but I don’t know any other way), and filter the rows that contain that same “Review”. No sooner said than done: a measly 40 minutes later, the database of StopGame’s ratings was ready.
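The same filter is easy to express in code. This is only a sketch of the idea in Python, not what was actually done (that was Excel). The function name, the `": Review"` marker with its colon, and the sample lines are my assumptions about how the pasted text looks:

```python
def extract_review_titles(lines, marker=": Review"):
    """Return game titles from the lines that end with the review marker."""
    titles = []
    for line in lines:
        text = line.strip()
        if text.endswith(marker):
            # Cut the trailing marker off, keep the rest as the title.
            titles.append(text[: -len(marker)])
    return titles

# A few illustrative lines: titles mixed with dates and counters.
sample = [
    "Cyber Shadow: Review",
    "07 January 18 +15",
    "Unto The End: Review",
    "December 22, 2020. 10 +5",
    "13 Sentinels: AEGIS RIM: Review",
]

titles = extract_review_titles(sample)
print(titles)  # ['Cyber Shadow', 'Unto The End', '13 Sentinels: AEGIS RIM']
```

Only the trailing marker is removed, so titles that contain a colon of their own survive intact.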
Now we need something to compare the ratings with. On Reddit I found a post where someone had shared a huge file with the Game Rankings database, saved before the site closed (the site itself is now just a stub pointing to Metacritic). The file dates from December 2019, so a year and a half of newer games has to be left out.
The result: 1,287 games released in 2010–2019.
Visualization of the data obtained.
1. Distribution of critic scores by SG rating. The dependence is visible: the median (read: average) score grows with each rating.
2. Number of games by score and rating. The critics are clearly benevolent; most scores fall between 70 and 80 points.
3. Dry statistics.
4. Just a screenshot of what the database looks like.
Modeling (not 3D, but statistical)
For a scientifically grounded analysis of the relationship, we use multinomial logistic regression.
Regression analysis studies the statistical relationship between one dependent variable and one or more independent variables. It shows whether a relationship exists and how strong it is, and then lets you predict one variable from the others. For example, having studied how a player’s age relates to the hours spent at the computer per day, you can estimate the number of hours for any age (with a bunch of nuances, of course, that we will not go into).
The most common regression is linear, which explores the relationship between numeric variables. In our case we are studying the dependence between a game’s critic score and the SG ratings, which are non-numeric and can take only four values. So linear regression does not suit us; we need logistic regression, which handles non-numeric variables.
We load our table into the statistics package and, in a mere 28 lines of code, get a statistically significant regression that analyzed the loaded database and revealed an interdependence between SG ratings and game scores. For clarity, I plotted the model’s probability of each SG rating depending on the critics’ score.
The probability of each SG rating for each critic score. For example, the blue sector is the “garbage” rating. As the game’s score grows, the probability of “garbage” falls, because the quality of the game is rising. And a game with 72 points will most likely (with a probability of 63%) receive “commendable”.
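For those curious what sits behind such a model, here is a minimal sketch of multinomial logistic regression in plain Python. It is not the 28 lines from the study: the toy data is invented, and the fitting is ordinary gradient descent rather than whatever a statistics package does under the hood. All names here are hypothetical.

```python
import math

RATINGS = ["garbage", "passable", "commendable", "amazing"]

# Invented toy data: (critic score, index into RATINGS). Not the real sample.
DATA = [
    (30, 0), (38, 0), (45, 0), (50, 0), (56, 0),
    (48, 1), (55, 1), (60, 1), (64, 1), (68, 1), (71, 1),
    (62, 2), (68, 2), (72, 2), (75, 2), (78, 2), (80, 2), (84, 2),
    (79, 3), (84, 3), (87, 3), (90, 3), (93, 3), (96, 3),
]

def scale(score):
    """Center and shrink the score so gradient descent behaves well."""
    return (score - 70.0) / 10.0

def softmax(logits):
    """Turn one logit per rating into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(params, score):
    """Probability of each rating for a given critic score."""
    x = scale(score)
    return softmax([w * x + b for w, b in params])

def fit(data, n_classes, lr=0.2, epochs=3000):
    """Fit one (weight, bias) pair per rating by batch gradient descent."""
    params = [[0.0, 0.0] for _ in range(n_classes)]
    for _ in range(epochs):
        grads = [[0.0, 0.0] for _ in range(n_classes)]
        for score, label in data:
            x = scale(score)
            probs = softmax([w * x + b for w, b in params])
            for k in range(n_classes):
                # Gradient of the cross-entropy loss for class k.
                err = probs[k] - (1.0 if k == label else 0.0)
                grads[k][0] += err * x
                grads[k][1] += err
        for k in range(n_classes):
            params[k][0] -= lr * grads[k][0] / len(data)
            params[k][1] -= lr * grads[k][1] / len(data)
    return params

params = fit(DATA, len(RATINGS))
for score in (40, 60, 72, 90):
    probs = predict_proba(params, score)
    print(score, {r: round(p, 2) for r, p in zip(RATINGS, probs)})
```

On this toy data the probability of “garbage” falls as the score grows, just as in the chart. The concrete percentages will not match the ones in the article, because the data is made up.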
Conclusions
There is a positive relationship between the SG rating and the game’s score: the better the rating, the higher the average score (who would have doubted it).
In general, critics give most games between 70 and 80 points. In this range the probability of “commendable” is 66%, “amazing” 19%, and “passable” 12%. That is why we have so many “commendable” games.
And now what seems to me the most interesting part. If we assume that “garbage” and “passable” are broadly “bad” games and the rest are “good”, then the probability that any unknown game is “bad” is 51% (and “good”, accordingly, 49%). An almost perfect balance of StopGame objectivity! But beyond that StopGame favors the developers: a “bad” game will receive “garbage” with a probability of 34%, while a “good” one will receive “amazing” with 41%.
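If the 34% and 41% are read as conditional probabilities (my assumption; the text does not spell this out), the product rule gives the four-way split these figures imply. A small sketch in Python with hypothetical variable names:

```python
# Figures quoted in the text.
p_bad = 0.51                 # P("garbage" or "passable")
p_good = 0.49                # P("commendable" or "amazing")
p_garbage_given_bad = 0.34   # assumed to mean P("garbage" | bad)
p_amazing_given_good = 0.41  # assumed to mean P("amazing" | good)

# Product rule: P(rating) = P(rating | group) * P(group).
shares = {
    "garbage": p_garbage_given_bad * p_bad,
    "passable": (1 - p_garbage_given_bad) * p_bad,
    "commendable": (1 - p_amazing_given_good) * p_good,
    "amazing": p_amazing_given_good * p_good,
}

for rating, share in shares.items():
    print(f"{rating}: {share:.1%}")
```

Under that reading, roughly 17% of “unknown” games would land in “garbage” and about 20% in “amazing”.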
Most importantly, StopGame’s ratings are, on the whole, sound, and the critics’ opinion agrees with them, which is almost scientifically proven 🙂
That is probably all. Thank you for being with us; I hope someone found it interesting.
The study used Excel, RStudio, Power BI, Chrome and black tea.
The best comments
First comment, it seems
Good unknown games are trickier: you need to look at why they do not get reviewed.
Well, because they are unknown)
Or you can just ask Dotterian to pull any statistics straight from the database in a couple of minutes. :D
There are 73 pages with about 30 games on each, which makes 2,190 games. Allow two minutes per game to retype the title by hand and take a breather, and that is 73 hours!
This part of the article raised the most questions for me. I tried to picture it this way and that, but could not understand how copying down the names of the games and their ratings could take an hour.
Two minutes, is that for one game? O_O
(I am now going to throw out, off the top of my head, some ways to solve the problem of getting the information into a table.)
Firstly, if you can type fast (or have a friend who can), you can team up: one reads the game’s name and rating aloud, the other types. In that case a page should take a couple of minutes at most. Moreover, while the names obviously have to be written in full, the ratings can easily be shortened to “raisins”, “phv”, “zh”, “mus” (though that is only a modest saving in keystrokes, of course). Taken to the extreme, replace “garbage” with “1” and “raisins” with “4”.
Secondly, if you lack both the typing skill and a proper command of English (to avoid typos when entering names, since you will have to match them later), you can simply copy and paste the names of the games. And then shorten the ratings again.
But even if you type out the names and ratings in full, I do not believe each game would take a whole two minutes to fill in the two columns “Game” and “Rating”.
PS: With a little more time, one could also record the authors and run statistics on them, not just on the ratings! (But I am not calling for action; I understand that even in speedrun mode, filling in such a database would easily eat up a couple of hours.)
Scientifically proven: a fifth rating is not needed, thanks, we are already perfectly objective!
In general, it would be interesting to look not only at the medians but also at some extreme points (the highest-scored games that received “garbage” or “passable” and, vice versa, the lowest-scored games that received “commendable” or “amazing”).
Plus, perhaps, a short analysis and an opinion on why it turned out that way x)
Judging by the histogram, there are not that many games in the 55-65 range; the bulk of the data sits to the right, after 70 points. Is it worth introducing a new rating for a small number of games? Right now they can be roughly divided into “bad” and rather “good”, with a minimum of uncertainty. And if you start adding new ratings, when do you stop?)
But the observation is very interesting
What kind of game got 50 points and “amazing” at the same time?)
It would be interesting to look at the whole sample, in Google Sheets for example.
P.S. Respect for the work done; I like poking around in numbers)
It should also be borne in mind that reviews are not written for all games, only for good or hyped ones. So good unknown games, and trash that nobody needs, pass by. That should also have some effect, shouldn’t it?
It is easier to ask me for the stats :D
You can also use the print version: it usually gives a fairly simple result for parsing. Or there is the good old scraping of the site with scripts, when the data has to be collected 🙂
Well, in a nutshell: say we have a database of 1,000 schoolchildren with info about their age and hours at the computer per day. We build a chart and plot the points, with age along the X axis and hours along the Y axis. And we see that the greater the age, the more hours (that is, our points stretch out like a snake). See the example in the picture, based on randomly generated data.
The dashed line is the linear regression equation, that is, the equation that best fits a linear dependence to the snake of points. Y is hours, x is age. We get y = 1.8003x - 14.926, so by plugging in an age for x you can get an estimate of the number of hours.
And then the nuances begin:
– Regression simply shows that there is a connection between x and y. Is it age that affects the hours? Or maybe we live in a parallel universe where the longer you sit at the computer, the younger you get (hours affect age). Regression cannot prove a causal relationship.
– What if the number of hours at the computer actually depends not on age but on the school grade the student is in? Then you need to build a new regression and see what happens. This cannot be understood from the regression itself.
– Everything depends on the data originally loaded into the model. In the hours example, if you plug in an age of 30, it turns out that such a person spends 39 hours a day at the computer. Obviously nonsense)) So such a model can be used for ages up to, say, 18, and for the rest a new model has to be built (maybe not a linear one).
That is why a real regression is accompanied by a bunch of tests, checks, hypotheses, probabilities, significance analyses, confidence intervals and other tinsel, proving that this particular regression is good and everything has been accounted for.
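The fitted line is easy to poke at in code. A minimal sketch in Python (the function name is mine; the coefficients are the ones quoted for the randomly generated example):

```python
def predicted_hours(age):
    """Hours per day from the fitted line y = 1.8003x - 14.926."""
    return 1.8003 * age - 14.926

for age in (10, 18, 30):
    print(age, round(predicted_hours(age), 1))

# Age 30 gives about 39 hours a day, more than a day holds:
# the line is only usable inside the range of ages it was fitted on.
print(predicted_hours(30) > 24)  # True
```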
It is easier to write a program that will crawl the site by itself and pull out all the necessary information in any convenient format; then you can calmly add the user rating, genre and developer and try to build correlations from that data x)
Manual work nowadays, in an age of technology that can greatly simplify it, if not remove the need for it entirely, is somehow not great x)
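A rough sketch of the parsing half of such a program, using only Python’s standard `html.parser`. The markup below is hypothetical (the real page structure is not shown anywhere in the post), and downloading the pages is left out entirely:

```python
from html.parser import HTMLParser

class ReviewLinkParser(HTMLParser):
    """Collect the text of <a> elements that ends with ': Review'."""

    def __init__(self):
        super().__init__()
        self._in_link = False
        self._buffer = []
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            text = "".join(self._buffer).strip()
            if text.endswith(": Review"):
                self.titles.append(text[: -len(": Review")])
            self._in_link = False

# Hypothetical markup, made up for illustration only.
html = """
<div><a href="/example-1">Spelunky 2: Review</a><span>September 19, 2020</span></div>
<div><a href="/example-2">Eternal Hope: Review</a><span>September 10, 2020</span></div>
<div><a href="/about">About</a></div>
"""

parser = ReviewLinkParser()
parser.feed(html)
parser.close()
print(parser.titles)  # ['Spelunky 2', 'Eternal Hope']
```

A real scraper would also need to fetch each of the listing pages and read the rating, which depends on markup I can only guess at.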
After IDDQD you become Rinat (and the world crashes to a blue screen, for Rinat is one and indivisible)
In fact, if you look at the chart, you can see that the only real problem is with games in the 55-65 range, where the probabilities of both “passable” and “commendable” are close to 50%, even though the author counts “passable” as a bad rating and “commendable” as a good one. So maybe some kind of in-between “praise” rating would not hurt x)
Thanks) the link to the game is attached
I will tidy up the sample and think about where to post it.
Merci. Indeed, it turns out that the probabilities obtained apply to games that have passed a certain “selection” for review. One could look at what affects whether a game gets reviewed or not (the same ratings, platforms, something else), but that is a job in its own right. For now it seems that adding the trash (which most likely has low scores) would make things more precise. Good unknown games are trickier: you need to look at why they do not get reviewed.
Even by hand you can get through it fairly quickly. Or knock together a script that automates the process, after which you will have a neat list of games with their ratings and definitely no typos. In any case, 73 hours is a wild exaggeration)
And what kind of “garbage” has as many as 74 points? That is a very wide spread.