Three Statistical Tests Every Game Developer Should Know

305,282 views

GDC
6 years ago

In this 2016 GDC session, Insomniac Games' Elan Ruskin gives a how-to on statistics for answering questions like "does this new camera control scheme make players happier?", "how many players do I need to test this design change on to prove whether it works better?" and "does the framerate really get faster when I do this thing or is it just a fluke of measurement?"
Register for GDC: ubm.io/2gk5KTU
Join the GDC mailing list: www.gdconf.com/subscribe
Follow GDC on Twitter: twitter.com/official_gdc
GDC talks cover a range of developmental topics including game design, programming, audio, visual arts, business management, production, online games, and much more. We post a fresh GDC video every day. Subscribe to the channel to stay on top of regular updates, and check out GDC Vault for thousands of more in-depth talks from our archives.
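The description's framerate question ("is it really faster, or a fluke of measurement?") is the talk's two-sample t-test case. For readers who want to poke at it outside Excel, here is a minimal sketch in Python using SciPy; the frame-time numbers are invented, and Welch's unequal-variance variant is used as a reasonably safe default rather than anything prescribed by the talk.

```python
from scipy import stats

# Hypothetical frame-time samples (ms) before and after an optimization.
# The numbers are made up for illustration; the talk's data lives in its slides.
before = [33.1, 34.0, 32.8, 35.2, 33.9, 34.4]
after  = [31.9, 32.5, 33.0, 31.4, 32.2, 32.8]

# Welch's t-test: does not assume equal variance between the two runs.
t_stat, p_value = stats.ttest_ind(before, after, equal_var=False)
print(f"t = {t_stat:.2f}, two-tailed p = {p_value:.4f}")
# A small p suggests the change in frame time is unlikely to be a fluke of measurement.
```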

COMMENTS: 225
@PrimerBlobs
@PrimerBlobs 2 роки тому
"Any actual statisticians are totally cringing." Yep. It's not just pedantry. People will literally not know what their test means, and then they will judge whatever change they make in hindsight anyway.
@aleksaa24
@aleksaa24 6 місяців тому
funny seeing you here, love your vids
@Attewir
@Attewir 4 місяці тому
Easier-to-digest and more accurate statistics content is on PrimerBlobs's channel, and the currently 1.7 million subscribers agree.
@ToriTheChicken
@ToriTheChicken 6 років тому
Some of the GDC talks are very badly presented for YouTube videos. Not this one. This was great, in just about every way.
@iestynne
@iestynne 2 роки тому
I'm proud to have worked with Elan for several years. As you can tell, he always puts a great deal of effort into preparing for his presentations. Amazingly though, this is actually his normal level of conversational speed, clarity and humor :)
@snarfymcsnarfface2323
@snarfymcsnarfface2323 2 роки тому
I thought he was just nervous or trying to fit it all into a small time slot lol
@0netwoguy54
@0netwoguy54 2 роки тому
Wait what do you mean "normal"? Does he have a turbo mode???
@Dekharen
@Dekharen 2 роки тому
@@0netwoguy54 GAS GAS GAS
@_lime.
@_lime. 2 роки тому
13:00, this is a really good one. With Minecraft, Mojang came to a realization that very few players had ever been to the Nether (based on the percent of the population that had the achievement "We Need to Go Deeper!", which is received upon entering the Nether). They ended up realizing that very few non-hardcore players (players that didn't consume game-related content outside of the game, like videos, guides, articles, etc...) knew that the Nether existed. This is why they added obsidian monoliths and broken portals around the Overworld to give you hints.
@Sarmachus
@Sarmachus 2 роки тому
Where did they say this? I’m having a hard time finding it.
@_lime.
@_lime. 2 роки тому
@@Sarmachus Sorry, I can't remember exactly. I saw it in a game development video a year or so ago, and I believe he flashed a tweet from one of the MC devs on the screen. Regardless of its authenticity, it still serves as a good example and valuable lesson.
@Sarmachus
@Sarmachus 2 роки тому
@@_lime. Thanks for clarifying
@ReeseEifler
@ReeseEifler 6 років тому
Not only is this an amazingly useful talk, it's essentially a perfect presentation. Dope shit.
@WhiteThunder121
@WhiteThunder121 2 роки тому
@CruzZ fake news
@dontfk
@dontfk 2 роки тому
@CruzZ what are you talking about. This guy provided a ton of real world examples where statistics could help solve a problem. That doesn’t mean people will always use statistics for good though, he even mentions that in the presentation with an example. Just because big gaming companies suck at stats doesn’t mean his presentation wasn’t phenomenal!
@ailurusfulgens1849
@ailurusfulgens1849 2 роки тому
@@dontfk Big gaming companies most definitely do not suck at stats; if anything, that's the one thing they master above all else. It's just that most statistics are not relevant to the players' enjoyment. They are very relevant to shareholders tho.
@dontfk
@dontfk 2 роки тому
@@ailurusfulgens1849 You're right, I used poor word choice there. What I meant by that was that they don't always use their stats for good intentions
@infcaat
@infcaat 5 років тому
wow, he is a fantastic speaker. charismatic, to-the-point, funny and practical.
@kittykittylicization
@kittykittylicization 2 роки тому
As a Biologist (MS)... I was indeed shouting at my screen when you were talking about p-values... and then you called it out, so I'm happy now.
@NunSuperior
@NunSuperior 6 років тому
Thanks for the talk. I had to learn this stuff on the job at Big Software, Inc. when we started measuring PC boot time impact. There were large variances between each boot.
@colinstreck710
@colinstreck710 2 роки тому
That was fantastic. Their presentation skills are off the charts.
@Discipol
@Discipol 6 років тому
Excellent presentation. THIS was simplified? I am afraid of the scenic route xD I wish to know more, and more practical applications in game dev.
@2Cerealbox
@2Cerealbox 6 років тому
Extremely simplified. Statistics is, like, a whole field of mathematics.
@stephenborntrager6542
@stephenborntrager6542 6 років тому
It matters a lot for procedural generation, as statistical distribution is a huge part of random number generation. It can also be used to approximate various things... replacing physics in some cases. Sometimes called an "analytical" solution, you can see this show up in some games' oceans, etc. The ocean is based on statistical analysis of real oceans, instead of trying to actually simulate fluid dynamics. I'm sure there are more uses than that, especially outside the game.
@LoudSodaCaleb
@LoudSodaCaleb 5 років тому
Yeah, that thing he did was called hypothesis testing. That took me a good 20 hours during a single week to figure out how to do it by hand at school. Finding out that it could be done in a minute in excel blew my mind.
@joshelguapo5563
@joshelguapo5563 2 роки тому
As a data scientist... it's a LOT. But really, you don't need all the math to do it practically. You really just need to know the basic definitions, and what the test does. And there you go you got analysis. If you're a game dev, assuming you got some programming experience, you can already do a lot of these things in the language R, with very little effort, and even very easily build some machine learning models.
@jonaza2105
@jonaza2105 2 роки тому
I essentially got most of this stuff during my semester of statistics class. As he said, he pretty much blazes through it; you mostly need time to understand what is used when, why to use it, what the downsides of using it are, etc., and lastly, of course, HOW to use it.
@KillerBearsaw
@KillerBearsaw 2 роки тому
Absolutely fantastic presentation, would love to hear him speak more
@maximeflageole770
@maximeflageole770 5 років тому
The most interesting and useful presentation about statistics I've ever watched.
@ZZaarraakkii
@ZZaarraakkii 2 роки тому
@12:53 A thing to note is that in that example, people have been playing the "hard" puzzle before and the "easy" puzzle is a novelty, which may cause players to spend more time on it for the experiment, without it being the better solution long term.
@PR-cj8pd
@PR-cj8pd 9 місяців тому
Eiði
@ArsenicDrone
@ArsenicDrone 2 роки тому
While I wouldn't do everything identically, I didn't have any large complaints, which is not generally what happens listening to quick statistics intros. A good talk.
@Vospi
@Vospi 6 років тому
As an educator and a grateful listener: that was bril-li-ant.
@zikarisg9025
@zikarisg9025 2 роки тому
Excellent, used this to explain the p-Value to some colleagues, since our data science team is not able to explain their models that well...
@jonasnockert
@jonasnockert 4 роки тому
Love this talk! I spent quite some time trying to derive the 8.14 confidence interval in the first example and finally had to install Excel to verify. I couldn't see it at first but the slides actually mix five and six observations. At ~7:38 there are five observations. At 8:19, the confidence interval is calculated using six observations, i.e. T_DIFFMEANS(A2:A7, ...) rather than the 2 x 5 observations shown on the left.
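For anyone retracing the same derivation: T_DIFFMEANS appears to be a helper from the talk's own spreadsheet, so here is a rough, generic way to compute a 95% confidence interval for a difference of two means in Python. It uses the classic pooled-variance form (Welch's version would adjust the degrees of freedom), and the timings below are placeholders rather than the slide's data.

```python
import numpy as np
from scipy import stats

def diff_means_ci(a, b, confidence=0.95):
    """Confidence interval for mean(a) - mean(b), pooled-variance two-sample form."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n1, n2 = len(a), len(b)
    # Pooled sample variance and standard error of the difference in means.
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    tcrit = stats.t.ppf(0.5 + confidence / 2, df=n1 + n2 - 2)
    d = a.mean() - b.mean()
    return d - tcrit * se, d + tcrit * se

# Placeholder build times (s); with the five-vs-six observation mix-up described
# above, the slide's 8.14 interval would not reproduce exactly anyway.
hdd = [112.0, 118.5, 109.7, 121.3, 115.2]
ssd = [101.2, 99.8, 104.5, 98.9, 102.7]
print(diff_means_ci(hdd, ssd))
```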
@RglMrn
@RglMrn 7 місяців тому
Incredible talk. Thank you so much!
@lookatnow5730
@lookatnow5730 6 років тому
Wonderful talk
@Joeofiowa
@Joeofiowa 6 років тому
Absolutely brilliant.
@robelbelay4065
@robelbelay4065 2 роки тому
Great talk and amazing delivery :)
@phillipA123
@phillipA123 4 роки тому
a semester of stats in 30min. thanks guy.
@hamsandwich780
@hamsandwich780 4 роки тому
One of the best explanations of the T test I have ever seen, read, or perceived in any medium.
@JohnDoe-mx1sq
@JohnDoe-mx1sq 2 роки тому
This video has existed for almost 4 years and it feels like not a single game dev has ever watched it. Their sales division has warehouses of supercomputers simulating human brain functions trying to figure out how crap a game can be before you will buy it, and just how much you will spend on DLC just to play the game at all.
@TomiTapio
@TomiTapio 4 роки тому
Worth a listen.
@lan1ord
@lan1ord Рік тому
The first talk where I needed to decrease the playback speed instead of increasing. Great material! =)
@elizaknight6980
@elizaknight6980 6 років тому
This is enjoyable, thanks :)
@Fmlad
@Fmlad 2 роки тому
Incredible talk
@Brindlebrother
@Brindlebrother 2 роки тому
People are awful at five-star ratings whether that be a game, book, movie, show, item, etc. Basically, people will give 4-5 if the product was at all fun or engaging, or a 1 if there was a problem/complaint/issue or any offense taken. Good video. Statistics are fun.
@SakuraWulf
@SakuraWulf 2 роки тому
Chik-fil-a is not a five-star establishment, people >_>
@KrossX
@KrossX 6 років тому
Happy new year!
@jonwatte4293
@jonwatte4293 5 років тому
"p values" aren't just complicated; they're a root cause of reproduction problems in studies with small sample sizes, and a general frequentist foible. Bayesians of the world, unite! (Interestingly, the "pick sub-samples" illustrations could lead to an IMO much better solution!)
@hamm8934
@hamm8934 2 роки тому
Bayesians can play around with their Bayes factors all they like, but at the base, they’re still operating under a frequentist model if theyre gunna do any form of null hypothesis testing. Without a criteria to reject the null (p val), you can’t falsify a hypothesis. So collect all the data you want and build up those Bayes factors, but you’re not escaping the problem of induction. :) Frequentists of the world, unite (and not be undermined by a single black swan)!
@jonwatte4293
@jonwatte4293 2 роки тому
@@hamm8934 The belief that you can "reject" the null hypothesis based on a single yes/no measurement IS THE PROBLEM. (Sorry, got a little loud there.) Look at the PDF. Draw conclusions about underlying behaviors. Make better predictions and test again. Do not pretend that "there's a 96% probability in this case" and "there's a 94% probability in this case" are vastly different, binary outcomes.
@hamm8934
@hamm8934 2 роки тому
@@jonwatte4293 what statistician or scientist worth their salt believes that a single positive or negative outcome is sufficient? That’s a bit of a straw man. Of course you either (1) directly replicate the result or (2) perform an extension with a different operationalization of the same hypothesis. If it isn’t replicating approximately 95% of the time, it’s quite safe to say the effect isn’t there (assuming adequate power). If it is replicating approximately 95% of the time, it’s quite safe to say the effect is there. The point I (and other frequentists) make is you have to have a criteria of falsification for null hypothesis testing. If you don’t, the very logic of hypothesis testing collapses as you are no longer able to discern a success from a failure. You have to make a judgement call for null hypothesis testing to exist. This whole notion that Bayesian stats somehow avoids or overcomes this judgement call is a complete failure to acknowledge that you are still making a judgement call, just with a different threshold. (See chp 1 and 2 of The Logic of the Scientific Discovery). Get those Bayes factors as juicy as you want. It just takes 1 falsification for them to be undone. We’ll see which method is more fruitful :)
@neur0leptic782
@neur0leptic782 2 роки тому
​@@hamm8934 bruh I feel like you're still being incredibly disingenuous about this whole thing. The key issue with NHST is that a p-value *only* tells you p(Data | H0 = TRUE)-that's it, full stop. The far more interesting question is p(H | D), and that's entirely beyond the realm of classical frequentist methods. 'Rejecting the null' with p < .05 doesn't mean that there's a 95% chance the null is indeed false, or that the alternative is actually true. What we should be doing is systematically pitting models against each other, and this, I think, is something Bayesian methods are exquisitely well-suited for. And sure, there are some rules of thumb when you're doing Bayesian model comparison and trying to figure out how 'meaningful' the difference between models is, but it's a laughably false equivalence to say that the process of multi-model inference (literally comparing the evidence in favor of competing models) is anything close to a binary NHST decision based on differences in means or a correlation. Not to mention you can compare models based not only on the parameters you include, but on your priors, or the underlying likelihood function... Shit, you don't even need to use Bayes Factors-it's super trivial to compare models via their posterior predictive densities using Bayesian cross-validation with PSIS-LOO. All of this ranting is basically just to say that 'all models are wrong, but some are useful'-and I think if we really want to find the best models that explain (or even better, can *generate*) our data, you're gunna have a bad time with frequentist NHST.
@PrimerBlobs
@PrimerBlobs 2 роки тому
@@neur0leptic782 Preach
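Neither side of this thread shows what the Bayesian route looks like in practice. A common minimal version for an A/B conversion comparison is a Beta-Binomial posterior check; the counts and the uniform Beta(1, 1) priors below are assumptions for illustration only, not anything from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical results: conversions / players in each variant.
conv_a, n_a = 48, 1000
conv_b, n_b = 61, 1000

# Beta(1, 1) prior gives a Beta(successes + 1, failures + 1) posterior per rate.
post_a = rng.beta(conv_a + 1, n_a - conv_a + 1, size=100_000)
post_b = rng.beta(conv_b + 1, n_b - conv_b + 1, size=100_000)

# Posterior probability that variant B's conversion rate beats A's.
print("P(variant B beats A):", (post_b > post_a).mean())
```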
@summonsays2610
@summonsays2610 2 роки тому
God, statistics is why I can't ever tell anyone I am sure of something. "Hey does this code work like X?" "Well, I was there during requirements gathering, I wrote the code, deployed it, and no one has changed it since. So I think so!" "Yes or no?" .... uhhhhhh
@CineGoodog
@CineGoodog 2 роки тому
I took an entire statistics course in college and I can remember almost everything he said
@Adaministrator
@Adaministrator 3 роки тому
excellent talk
@kaloqnchyyy
@kaloqnchyyy 2 роки тому
the best presentation I have ever seen
@inguanara
@inguanara 6 років тому
that was awesome
@perfectloveweddings
@perfectloveweddings 5 років тому
You talk exactly like Jesse Eisenberg from the Social Network when he's coding. It's fantastic.
@FreekHoekstra
@FreekHoekstra 6 років тому
At 19:00 mins, do you care about the median? I think that's a rather brazen assumption! Sometimes it's better to have some people who are really invested and really care, and thus are willing to spend on your product, rather than a lot of people that will play for free but don't care enough to spend money, or come back repeatedly. Great talk overall though!!
@AngleSideSideThm
@AngleSideSideThm 4 роки тому
This depends on assumptions; the assumption here probably is "I am optimizing my game for ability for at least most of the initial group to make it through".
@Aidiakapi
@Aidiakapi 4 роки тому
That wasn't actually the point he was trying to make. Especially with a small sample size, outliers greatly skew the mean. As for the point of a few dedicated players willing to spend money, that only works if it's a game that does not depend on having an active online community.
@stuartconrod8364
@stuartconrod8364 2 роки тому
I know I'm really late to finding your comment, but I thought the same thing! Also, Mark Rosewater (of Magic: The Gathering) has a presentation on YouTube about Game Design and in HIS opinion, that highly polarized distribution is better. It's better to make something that SOME people love even if some other people hate it, instead of something that everyone gives a 'meh' to. In game design, I think it's the difference between "cult classic that some people love and play forever" and "totally forgettable game that disappears in two weeks". If at least some group loves it, it can spread by word-of-mouth and certain reviews. Provided your budget was appropriate to build a niche game, you can have a success... while some game that everyone merely tolerates probably makes no impact and loses money.
@FreekHoekstra
@FreekHoekstra 2 роки тому
@@stuartconrod8364 exactly :) Better to be hated by 90%, ignored by 5% and loved by 5% than hated by 20%, ignored by 80% and loved by none. Who is going to spend money on a product they don't love when they have so many alternatives? Plus all those haters are free press too! I think we should lean into the fans more; look at Dark Souls, it's brutal, unforgiving and very niche, but clearly doing fine. League of Legends: unforgiving, brutal player interactions, but doing fantastically well. Counter-Strike, same thing. Yes, I do think we should keep games accessible, but not at the cost of what the fans love. I think, for example, what Halo Infinite is doing is great, bringing back bots to practice offline before going into the fray. It allows the multiplayer to be as cutthroat and great as it always was, not with unlockable weapons that give you an edge at the start of the round. No, everyone starts with the same weapons, and you need to earn and fight over better ones, so it's a true skill matchup. That's why it's so unforgiving to new players, but also why it's so incredibly good.
@ferinzz
@ferinzz 2 роки тому
@@FreekHoekstra it really REALLY depends on how you make money off your game. If it's some recurring revenue, then you need to retain a decent number of players. If it's a game which has interaction between users, then you need a decent player pool. If it's a one-time purchase, you can keep it mediocre across the board. If it's for e-sport publicity, you better make that as balanced as possible. Make the goals easy to understand and controls simple enough to get players pouring in. Overall, no matter the game, a larger pool of players will bring more potential spenders, and of those players only 20% of them will be providing your entire income. Money keeps a business going. So making a game for only 2 people is a ridiculous endeavor unless each piece of content is a guaranteed buy and they cannot continue into the new 'season' without making their purchases... Though if only two people are playing they'll need to be spending hundreds of thousands each time you release content. in a free to play game competition drives purchases. You need some fodder for the big spenders to show off their purchases/power to, or they have no reason to buy the newest released item/cosmetic the day it comes out.
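As the reply above about outliers suggests, a rank-based test such as Mann-Whitney U is one way to compare playtime distributions without letting a few marathon players drag the mean around. A sketch with invented hours-played data:

```python
from scipy import stats

# Hypothetical hours played per player in two builds; note the outliers in build A.
build_a = [1.5, 2.0, 0.5, 1.0, 2.5, 40.0, 55.0]
build_b = [3.0, 2.5, 4.0, 3.5, 2.0, 3.0, 4.5]

u_stat, p_value = stats.mannwhitneyu(build_a, build_b, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")
# A t-test on the raw hours would be dominated by the two marathon players.
```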
@LoudSodaCaleb
@LoudSodaCaleb 5 років тому
His style reminds me of the professor that made me fall in love with stats.
@jarrakul
@jarrakul 2 роки тому
Very good talk, even if I'm kind of screaming at the use of p-values as "the chance that Fred is right." But you clearly know that, and are simplifying because p-values are confusing and don't actually measure quite what we use them to measure. Which is a good reason to switch to subjectivist statistics, but you can hardly explain how to responsibly use priors in a 30-minute talk.
@franksonjohnson
@franksonjohnson 4 роки тому
Watched the Spiderman talk then this one. Just, damn, passion. Awesome.
@buttonasas
@buttonasas Рік тому
Hours played being polarised between different versions is pretty normal, and there are often very good reasons for that, because games have lots of humps or steep curves or brick walls. There might be something _terribly_ wrong in the tutorial that makes x% of people just not get past it. And, honestly, I prefer 20% of players going "this is amazing" and the rest "bad game" to everyone saying it was "just ok".
@aakk100011
@aakk100011 4 роки тому
20:12 You say the chance of Fred being right is 3%, but we are using a two-tailed test. I think the conclusion should be that the orange version is different from the old version; it's either better or worse.
@ArneBab
@ArneBab 2 роки тому
Actually your boss wants to know how large the probability is of being wrong: that you pay more than you save. So you want the t-test of the SSDs compared to (HDD minus the time difference needed to pay for the SSDs). You're not below 0.05 for that with your 4 runs, so your boss cannot be sure enough that she'll be right. But that's nitpicking and I really like your video :-)
@raventhorX
@raventhorX 6 років тому
this guy is my new idol lol.
@dominicparker6124
@dominicparker6124 2 роки тому
how he answered that first question was amazing, you can see he knows his shit.
@yottawatts9470
@yottawatts9470 2 роки тому
I didn't even watch this but scrolled through a few times and could tell this is an amazing presentation. Will watch later bravo.
@simlife445
@simlife445 2 роки тому
It is... but it's also a video on how poor F2P gamers flock in for lack of money and how to get them to spend more... and about how bad SSDs are... in 2016, when they are now 40-60% cheaper per gigabyte and much, much faster... bravo for skipping the description and the basic computer improvements of the last 5-6 years...
@yottawatts9470
@yottawatts9470 2 роки тому
@@simlife445 Moron alert. You confirmed that it is indeed a good presentation then went on some personal rant of the content you didn't like? I don't give a damn sheesh.
@alfredoeleazarorozcoquesad2988
@alfredoeleazarorozcoquesad2988 2 роки тому
Hi! Great talk, thanks!! A QUICK TIP for A/B testing (I'm an economist): you could randomly choose who goes into the experimental/control group :) That way you don't have to switch, you just apply the procedure to many people once, like this: 1) a new player enters; 2) you generate a random number (between 0 and 1, say); 3) is it greater than 0.5? experimental; if not, control; 4) register their group and their target number :D Even if they play only once (you don't need multiple rounds), you can compare the means between those groups ;) Thanks again for the talk!
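A minimal version of that random-assignment recipe might look like the following; the 50/50 split and the hash-the-player-id variant are illustrative choices, not anything from the talk.

```python
import hashlib
import random

def assign_group(p_experimental: float = 0.5) -> str:
    """Roll once when a new player first appears, then record the result."""
    return "experimental" if random.random() < p_experimental else "control"

def assign_group_by_id(player_id: str) -> str:
    """Deterministic variant: hash the player id so the same player always
    lands in the same bucket without storing anything extra."""
    digest = hashlib.sha256(player_id.encode()).digest()
    return "experimental" if digest[0] < 128 else "control"

print(assign_group(), assign_group_by_id("player-1234"))
```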
@gabrote42
@gabrote42 2 роки тому
I always tell people that basic statistics and sourcing should be taught at age 11. Would reduce the number of no-argument-freds and would reduce the fake news plausibility rate
@SomethingEternal
@SomethingEternal 2 роки тому
24:14 I dunno, I love my coffee black and I think that study has a point xD
@xGriffy93
@xGriffy93 2 роки тому
But Fred didn't hypothesise that SSDs don't make any difference to build times, he was questioning the return on investment the SSDs would bring. Or am I off the mark here?
@zacsnowbank7632
@zacsnowbank7632 2 роки тому
He needed to prove SSDs had any improvement at all first. After that he had a good idea on how much it improved, and eventually he proved Fred right. It would take too many daily builds for SSDs to be worth it. But before that, he needed to know what the difference even was, and after that he used a simple formula to see how much money it saved. Poor Fred just had some words put in his mouth to make the presentation go a little smoother at the beginning.
@donanderson3653
@donanderson3653 2 роки тому
To be fair, that wasn't daily builds, it was total builds, since SSDs are a one-time investment. Getting even the lowball estimate of 210 builds out of the lifetime of the SSD is probably easily achievable, so SSDs would be a worthwhile investment.
@nlb137
@nlb137 2 роки тому
He covered that briefly with the discussion of dev time cost and how many builds you'd have to do for the SSD to pay for itself. You have to have a null hypothesis to test, and "X isn't worth it" isn't possible, IIRC. It's been a while, but I think your test *has* to basically 'touch zero'; either x=0, x>0, etc. An "even if does save time, does it save *enough* time" hypothesis requires a test that is basically "is x >= y" (where y is the 'threshold' where SSDs pay for themselves). It's either easier to first prove that there *is* a time difference, then calculate the 'value' of the time difference, or it's not even possible to do it the other way (or at least not with 101 statistics).
@tomasxfranco
@tomasxfranco 2 роки тому
@@donanderson3653 Also, SSDs can speed up OS and App boot times as well as many other tasks, so it's ignoring a lot of the other benefits they give.
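One way to phrase Fred's actual question ("is the saving big enough to pay for the drives?") as a test is to subtract an assumed break-even saving from each measured improvement and run a one-sample t-test against zero. The numbers below are made up; the break-even figure would come from your own SSD price and developer-time cost.

```python
import numpy as np
from scipy import stats

# Hypothetical per-build time savings (minutes) measured after the SSD swap.
savings = np.array([4.8, 6.1, 5.5, 4.2, 6.4, 5.0])

# Assumed break-even saving per build for the drives to pay for themselves
# over their lifetime (pure placeholder arithmetic).
break_even = 3.0

t_stat, p_two_sided = stats.ttest_1samp(savings - break_even, 0.0)
print(f"t = {t_stat:.2f}, two-sided p = {p_two_sided:.4f}")
# A significant result with positive t says the saving exceeds the break-even
# point, not merely that it differs from zero.
```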
@julio1148
@julio1148 22 дні тому
Great intro, but as an artist, I WISH it took a year to be Rembrandt lol Great talk too!
@andrewneedham3281
@andrewneedham3281 4 роки тому
It was great, right up to the "always use the 2-tailed value." Tons of circumstances where it's better to use a one-sided t-test.
@davidfoley8546
@davidfoley8546 2 роки тому
In fact, his own first example should have been a one-tailed test.
@richardsejour7731
@richardsejour7731 2 роки тому
Under what circumstances would a 1 tailed test be more useful? The 2 tailed t test is more stringent and the test statistic will tell you the direction of the effect.
@andrewneedham3281
@andrewneedham3281 2 роки тому
@@richardsejour7731 A 2 tailed test splits your significance level on both tails, so it's only half as strong as a one tailed test when showing a difference between groups IN A SPECIFIC DIRECTION. Frankly, a 2-tailed test is a sloppy but acceptable way to test, but it really shouldn't be used when you have a specific direction of difference between the groups in mind. A 1 tailed test has more power at the same alpha level. It's basically weakening your hypothesis to hedge your bets by using a 2-tailed test when you should be using one. That's why I don't like this lecture. It's a computer programmer with a SINGLE statistical tool he knows, so everything looks good to apply that tool on. It's like that old adage that if you have only a hammer, everything looks like a nail. If he were a statistician, he'd know better. But he's sitting there spouting off like he does, when in fact he's dead wrong.
@richardsejour7731
@richardsejour7731 2 роки тому
@@andrewneedham3281 I didn't interpret it as such. He was trying to create an antagonistic force between him (supporting the alternate hypothesis) and the...wtf animal that was (supporting the null). He was more interested in testing the efficacy of the ssd which is tested using a 2 tail test. The 2 tailed test is more conservative in general and will give you the direction of the effect which is why it's generally more preferred. One sided tests are rarely used, and are often associated with p hacking because there is rarely a scenario to assume the directionality of effect. Yes the one tailed test gives more statistical power, but that's only if you are certain that you won't see any effect in the opposite tail, which is incredibly rare. His wording was off because he should have never assumed that the ssd can only be better or the same, when the ssd could be worse. However, his approach to use the 2 tailed test was spot on for this type of question.
@andrewneedham3281
@andrewneedham3281 2 роки тому
@@richardsejour7731 Sure. I never said that he shouldn't use a 2-tailed test in that situation. I merely said that it's foolish to say "Always use the 2-tailed value." Edit: In science, if you have a hypothesis, your hypothesis generally has directionality to it, or you've written a piss-poor hypothesis. So, frankly, I'm often using 1-tailed tests to show that X is strictly less than/strictly greater than, on some real life data, such as, "Are female babies truly smaller than male babies?" or "Did the biodiversity index for the Upper Nooksack area truly increase due to our conservation measures?" In those cases, as a scientist trying to get published in a peer reviewed paper, I'd get laughed right out of publication for trying to use a 2-tailed test in those or many other situations where I find myself relying on statistical inference. Just saying.
@iwersonsch5131
@iwersonsch5131 2 роки тому
23:17 That's 45 two-sided tests so you go look for p values below 0.00056. That gives you a 5% false positive rate overall, but I can tell you that you're almost guaranteed to find a true positive unless the classes are carbon copies of one another
@droidBasher
@droidBasher 2 роки тому
That works if you want all 1 vs 1 fights to be mostly fair. Think of something like Street Fighter where you can't change your character mid-match. A rock paper scissors relationship would be fair but then if you are playing rock and the opponent is paper then the match isn't a good test of skill, the game was over at the character select screen. Depending on your context (something like Team Fortress or StarCraft) you might need to instead find the Nash equilibrium to make sure all units have their niche. But looking purely at win rates might mislead you if your player base is not playing optimally. Even if you can trust your win rate statistics, finding the Nash equilibrium is NP complete, meaning that each new character class exponentially increases the complexity of the problem. And there's probably units like the SCV where the kill death ratio is exceedingly bad but you can't win without them because their role is non-combat. Or a unit like the carrier (maybe? I'm not a pro) that isn't resource efficient but is a way to force the game to end if you are already ahead in resources and tech. If that's true and you analyze the carrier per unit, it might look overpowered, if you look at it per resource it might look underpowered, but it still has a niche. I guess that all I'm saying is that it's a hard problem, and game theory might be useful, but could still be difficult to apply if you have a game that is interestingly complex.
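For the pairwise class match-ups discussed above, the bookkeeping for a Bonferroni-style correction is short. The win/loss records below are invented, and only five classes are used for brevity (the same loop covers the 45 pairs of ten classes); Holm's step-down procedure is a slightly less conservative alternative.

```python
from itertools import combinations
from scipy import stats

# Hypothetical per-class (wins, losses) across mirror-free matches.
records = {
    "warrior": (520, 480), "mage": (560, 440), "rogue": (495, 505),
    "cleric": (530, 470), "ranger": (470, 530),
}

pairs = list(combinations(records, 2))
alpha_per_test = 0.05 / len(pairs)  # Bonferroni: keeps the family-wise rate near 5%

for a, b in pairs:
    # Chi-square test on the 2x2 wins/losses table for this pair of classes.
    chi2, p, dof, _ = stats.chi2_contingency([records[a], records[b]])
    flag = "SIGNIFICANT" if p < alpha_per_test else ""
    print(f"{a} vs {b}: p = {p:.4f} {flag}")
```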
@georhodiumgeo9827
@georhodiumgeo9827 4 роки тому
This makes me so happy! Great talk, I learned a lot. We had 100 barrels at work that were documented to have 50 kg in each. You could quickly tell none of them were empty and it looked like our written inventory was close. The accountant (not my boss) told me to measure all of them to see how accurate we were. I measured 8 and calculated the standard deviation. Joke's on you, I'm not going to break my back and work my ass off to learn something I already know. I'm sorry if you don't understand what I'm doing; I'll send you a Wikipedia link after I'm done.
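For the barrel story, here is a sketch of the kind of estimate a handful of measurements buys you; the weights are invented, and the finite-population correction is ignored for brevity.

```python
import numpy as np
from scipy import stats

# Hypothetical weights (kg) of 8 barrels sampled from 100 documented at 50 kg each.
sample = np.array([49.2, 50.5, 50.1, 48.8, 51.0, 49.7, 50.3, 49.9])

n = len(sample)
mean, sd = sample.mean(), sample.std(ddof=1)
tcrit = stats.t.ppf(0.975, df=n - 1)
half_width = tcrit * sd / np.sqrt(n)

print(f"mean = {mean:.2f} kg +/- {half_width:.2f} kg (95% CI)")
print(f"estimated total for 100 barrels = {100 * mean:.0f} kg "
      f"+/- {100 * half_width:.0f} kg")
```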
@KHamurdik
@KHamurdik 4 роки тому
I feel educated
@Weckacore
@Weckacore 6 років тому
This is probably very helpful, but just forget everything he said if you're taking a class on stats... EDIT: This does an amazing job of teaching intuition and importance, good talk
@mrichards
@mrichards 2 роки тому
Wasn't he wrong in choosing a two-tailed t-test? Since he is testing whether SSDs are faster, not just that SSD load times come from a different population than HDDs'.
@davidfoley8546
@davidfoley8546 2 роки тому
Yes.
@ArsenicDrone
@ArsenicDrone 2 роки тому
Fair question. His reasoning was pretty sound. He would want the one-tailed t-test if it were a safe assumption that SSDs are always either faster or the same (an assumption about the underlying distribution). Making that assumption (which is a bad assumption) is not the same as being mostly interested in finding out if they are faster (which is valid, but does allow for them being slower). His test concluded that they were different distributions, and he could also see that the difference was to SSDs' benefit.
@mrichards
@mrichards 2 роки тому
​@@ArsenicDrone The boss was specifically asking if SSDs were worth it (i.e. sufficiently faster that their mean speeds come from a different, faster, population than HDD mean load speeds). Wouldn't it be a mistake to intentionally test a broader hypothesis than you require just to verify your actual, narrower hypothesis by observation at the end?
@ArsenicDrone
@ArsenicDrone 2 роки тому
@@mrichards Ah, one of many not-so-intuitive things about statistics. It really comes down to only making the assumptions that you can justify. What the boss was interested in doesn't determine what's possible to test or what assumptions are valid. Notice that his p-value is half as large for the one-tailed test (the result is even more significant). The test got substantially more powerful, but that power doesn't come for free, it comes by making this unjustified assumption. (It's not justified because before he runs the test, he really doesn't know which outcome will happen, and it could actually be slower.)
@davidfoley8546
@davidfoley8546 2 роки тому
​@@ArsenicDrone No, he really is mistaken. Whether or not it is a safe assumption that SSDs are always faster is actually irrelevant. What is relevant is that the hypothesis he's testing is a one-sided hypothesis--that SSDs are faster. If he had measured SSDs to be slower, by any magnitude, the hypothesis would have been rejected.
@stefanomaggio5109
@stefanomaggio5109 3 роки тому
Pls tell me the name of the book where I can find all this shit in detail, specifically applied to game cases
@tanagato3721
@tanagato3721 2 роки тому
Damn, I'm not a game developer. I have never googled this topic. I just wrote down the idea of some computer game that accidentally came to mind and described the game mechanics in the note app on my Android smartphone, and YouTube immediately recommended this video to me. Coincidence? Now I do not know whether it is good or bad...
@neruba2173
@neruba2173 2 роки тому
I'll throw out a question that's out of fashion these days: how many players are having fun with my game, and thus eager to buy anything at all from my shop?
@MrDavidCollins
@MrDavidCollins 2 роки тому
If your game has a shop you've already failed.
@drumer960
@drumer960 2 роки тому
@@MrDavidCollins That's just objectively wrong; lots of incredibly good and fun games have shops.
@roeyshapiro4878
@roeyshapiro4878 3 роки тому
Did anyone else look at the picture of Rembrandt that he had up there and think that it looked peculiarly similar to him?
@joshuahaag7644
@joshuahaag7644 4 роки тому
nice
@slavskee
@slavskee 3 роки тому
God-like speaker
@lushen952
@lushen952 2 роки тому
Problem with your cupcake mode example. Making the game easier may have a positive impact in the short term and may have a negative impact long term. Short term statistics can only measure short term results.
@jacobb5484
@jacobb5484 2 роки тому
The test was simply to determine whether difficulty had an effect on time played in either direction greater than the margin of error for the sample size. These are great as backup tests to ensure the results aren't just a fluke, without an unreasonably large sample size.
@lushen952
@lushen952 2 роки тому
@@jacobb5484 Doesn't matter. If I'm a tester only testing the game for 10-15 minutes and it's too hard, I'm going to report that it's too hard. If the game gets made easier and released, and I pick it up and find that 30 mins in it's too easy, I'm going to get bored and quit. I think he oversimplifies the situation.
@jacobb5484
@jacobb5484 2 роки тому
@@lushen952 It's a simple example of a t-test on a paired sample. This isn't for small, engaged focus groups with detailed subjective data, but rather for big-data statistics, such as the example of a sub mode being beta tested. In this example the t-test gives a percentage chance of either: A. the change had the effect of increasing OR decreasing what's being measured by a notable amount, or B. the data is probably skewed due to bad sampling and falls within the margin of error. Once you rule that out, you can make further changes and run detailed tests to actually make an improvement.
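The paired form mentioned above (the same players measured before and after a change) is `ttest_rel` in SciPy; the session lengths below are placeholders.

```python
from scipy import stats

# Hypothetical minutes played by the same ten players before/after the easier puzzle.
before = [22, 35, 18, 40, 27, 31, 25, 29, 33, 20]
after  = [28, 37, 25, 41, 30, 36, 27, 35, 34, 26]

t_stat, p_value = stats.ttest_rel(before, after)
print(f"paired t = {t_stat:.2f}, two-tailed p = {p_value:.4f}")
# Pairing removes player-to-player variation, so the test needs fewer samples
# than comparing two independent groups of different players.
```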
@GameTesterBootCamp
@GameTesterBootCamp 11 місяців тому
As a math dummy, this talk made my brain implode.
@Alex-re3qm
@Alex-re3qm 2 роки тому
This kinda stuff is what game dev tycoon is missing
@nimm90
@nimm90 2 роки тому
I still have no idea how Fred is not convinced with an upgrade that generates $34 per 100 players of profit.
@motbus3
@motbus3 Рік тому
With moderate power comes moderate responsibility
@jerrygreenest
@jerrygreenest 2 роки тому
12:37 negative less? Wait, that's more!
@AdrianTache
@AdrianTache 2 роки тому
Statistics are a fun way to compare datasets but unfortunately sample size and methodology usually mean that whatever conclusions you draw might be completely irrelevant. And as he's saying, the more questions you ask, the more likely you are to be completely wrong.
@YT775
@YT775 2 роки тому
@15:30 "As opposed to 20 to 22", doesnt he mean 21% instead of 22% or am I missing something?
@dezimal9143
@dezimal9143 2 роки тому
If you have 20% of something... let's say all IBM shares, and you increase your holdings by 5% = now you have 22%. But when you say you have increased it by 5% percentage POINT you went from 20%=>25%.
@YT775
@YT775 2 роки тому
​@@dezimal9143 I bamboozled myself. meant to say 21% sry. How is 5% of 20 = 2 ?
@dezimal9143
@dezimal9143 2 роки тому
@@YT775 Actually it isn't 2% I didn't check the math xD. And you are right it should be 21 vs 25%.
@YT775
@YT775 2 роки тому
Thanks, so I guess there's no hidden meaning, it was just a minor error/inaccuracy of the speaker. :)
@laureven
@laureven 2 роки тому
Gold on YouTube :)
@gabrieldta
@gabrieldta 3 роки тому
Sony is following the monocle example right now by giving 10 USD of credit to random accounts. Fo'sure that's Sony's ulterior motive: measure how much more likely people are to engage with the store and (if they're lucky) top up that 10 USD to buy more expensive games... =)
@PoppyGaming43
@PoppyGaming43 2 роки тому
youtube: *recommends me this video* me, who's literally never gonna use any of this: *interesting*
@Preaplanes
@Preaplanes 2 роки тому
Guy dismissed me in the first 21 seconds. Won't pretend I'm not tempted to continue watching. Statistics as a science (rather than bad statistics as a political tool) is the only kind of math I can say I greatly enjoy.
@QuietSnake-xs5vx
@QuietSnake-xs5vx 3 роки тому
I understood only half....need to brush up on my probability
@brandonwilbur2146
@brandonwilbur2146 2 роки тому
Okay YouTube recommendations, I clicked it.
@IntrusiveThot420
@IntrusiveThot420 2 роки тому
Any presentation that's got a 538 joke in there is a good presentation
@Hlkpf
@Hlkpf 6 років тому
super cute!
@garryiglesias4074
@garryiglesias4074 2 роки тому
14:36 - Historical and linguistic horror: sans-culottes actually meant WITH pants, i.e. long trousers rather than knee-breeches...
@andrewcamden
@andrewcamden 2 роки тому
More often than not the data you do NOT have is more important than the data you do have. For instance, I and probably millions of other people didn't buy Dead Space 3 BECAUSE it was infested with microtransactions. There is no data for that though since a lost sale literally doesn't show up on the balance sheet. Game devs who decide NOT to "leave money on the table" by making real games without microtransactions are actually leaving a great deal of money on the table in lost sales for which they don't have any data. Game devs need leadership, empathy (essential for understanding customers even if you have no moral concerns whatsoever) and common sense to make good decisions. There isn't any amount of data that can substitute for these attributes.
@Parker--
@Parker-- 2 роки тому
17:27 Watching him shit on gullible health journalists in the COVID timeline. It's like he knew.
@mano_lamancha4716
@mano_lamancha4716 2 роки тому
The question posed in the thumbnail says everything about why video game quality has plummeted in the past decade.
@lmartinson6963
@lmartinson6963 2 роки тому
I'm pretty sure people who drink their coffee black being sociopathic is entirely factual
@frostknight7687
@frostknight7687 2 роки тому
Send this to chris
@Daniels2l
@Daniels2l 3 роки тому
If I were this guy's boss I'd be like: OK, if I buy the f*ck*ng SSDs will you shut up?!!
@CraigNicoll
@CraigNicoll 6 років тому
Holy sh*$ a math talk I could ACTUALLY follow! BEST math GDC talk of all time. *awards him Golden YouTube User ClickedCookie.
@yungthunder2681
@yungthunder2681 2 роки тому
If you're a game developer, and didn't take AP statistics, please tell me how you became a game developer?
@jacobb5484
@jacobb5484 2 роки тому
lots of practice by making mods, level design, digital modeling, etc.?
@MrDavidCollins
@MrDavidCollins 2 роки тому
I took statistics and didn't become a game developer (at a company, I just make it all myself now). College costs too much
@Brodysseus113
@Brodysseus113 2 роки тому
Something I'd like to add to the graph at 19:00, the blue analytics are healthier because it produced a stronger reaction. Those are the people who are willing to put money into your game.
@Intrexa
@Intrexa 2 роки тому
"The relative risk of somebody in control group b buying pants.." Relative risk of buying pants? What are you making these pants out of?
@tach5884
@tach5884 2 роки тому
Kryptonite.
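Relative risk from an A/B table is simple arithmetic, and a chi-square test checks whether the difference is more than noise. The pants-purchase counts below are made up for illustration.

```python
from scipy import stats

# Hypothetical: rows are groups A and B, columns are (bought pants, did not buy).
table = [[30, 970],
         [45, 955]]

risk_a = table[0][0] / sum(table[0])
risk_b = table[1][0] / sum(table[1])
print(f"relative risk (B vs A) = {risk_b / risk_a:.2f}")

chi2, p, dof, _ = stats.chi2_contingency(table)
print(f"chi-square p = {p:.4f}")
```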
@tomasxfranco
@tomasxfranco 2 роки тому
Not that it's the point of the presentation, but this misses the other marginal benefits of working on SSDs all the time, not just in builds. Additionally, if build time doesn't change when moving to SSDs, then the bottleneck is elsewhere and could be tackled via a different component or algorithmic improvement.
@simlife445
@simlife445 2 роки тому
Or that this is a 5-year-old GDC session (read the description), so this data is insanely old; SSDs are 60% cheaper per gig and much faster now.
@13b78rug5h
@13b78rug5h 2 роки тому
Yeah, and long build times are actually one of the biggest blockers to CI/CD; the lack of CI/CD is usually the best indicator of long lead time, which is the best indicator of slow development: more resources trapped inside the system, more bugs, less feedback, less data, less experimentation and less revenue. Overall that means slower delivery and a lower-quality product, and/or more resources needed to deliver. And in the end you should not generally build on your local machine, but do it automatically on a build server.
@crg78lf
@crg78lf 2 роки тому
Don't forget: if you buy an SSD to improve build time, make sure to put your swap memory on the SSD. If you don't have a lot of RAM, the extra memory used by the compiler/linker will then go to the SSD as well, drastically improving your build time.
@anonymoususer3561
@anonymoususer3561 2 роки тому
This could have been half as long, probably
@Skronkful
@Skronkful 2 роки тому
The only thing that made me cringe was when he said people should ignore the one-sided p-value, when his example (and most things you'd want to test in real life) is a one-sided hypothesis. It's not necessarily that we assume/know that SSDs are faster, it's that if we find that SSDs are significantly slower, we shouldn't be rejecting the test. He is actually doing a test of size 2.5% instead of 5%.
@f.p.5410
@f.p.5410 2 роки тому
If only that was the only cringe part... His explanation of p-values is statistical illiteracy 101. I was really surprised when I heard him making that mistake, interpreting p-values as Pr(H0). I thought that this statistical concept entered pop culture (kind of like "correlation is not causation" already did)... Amazing that people like him have the confidence to give talks on statistics.
@richardsejour7731
@richardsejour7731 2 роки тому
You can tell the direction and magnitude of the effect using a 2 tailed t test. The limitations of a 1 tailed t test is that it is less stringent and in most cases you don't have any justification to assume that an effect is greater or lower.
@richardsejour7731
@richardsejour7731 2 роки тому
@@f.p.5410 what did he say wrong? I missed it.
@f.p.5410
@f.p.5410 2 роки тому
@@richardsejour7731 he always says that the p-value is the probability of the boss (can't remember the name) being right. In other words, the probability of H0 being true. So that p
@richardsejour7731
@richardsejour7731 2 роки тому
@@f.p.5410 I see what you mean. In a lot of schools and even on some online sources, the p value is described as the probability of incorrectly rejecting your null hypothesis, which some people can misconstrue as "proving" whether the null is correct or not. In practice, you can never really prove a null, or really any hypothesis. I think that his attempt at making the presentation lighthearted muddied the waters a little, but the general premise is still there. Speaking of which, I was sloppy in my wording. The 2 sided test is more conservative since the cutoff is 0.025 in either tail instead of 0.05 in 1 direction. Usually, when a two tailed test is statistically significant, a one tailed test is significant in that direction, but there are many cases when a one tailed test is significant, but a two tailed test won't be significant. There are very few circumstances to assume directionality before doing a statistical test, which is why one sided tests are so rare. Most researchers will just do a 2 sided test to be conservative and then calculate effect sizes, and will consider one sided tests as p hacking. For the research question that he was asking (the efficacy of the ssd) he can't assume that the ssd can only be better or the same, when it's entirely possible that it could be worse. From his presentation of the question, he understands this, but used sloppy wording to make things simple. Regardless, the 2 tailed test is the best approach here.
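For what it's worth, recent SciPy versions let you state the direction of the alternative explicitly, which makes the one- versus two-tailed choice visible in code rather than implicit; the build times below are invented.

```python
from scipy import stats

hdd = [118.2, 121.5, 116.9, 123.0, 119.4]   # hypothetical build times (s)
ssd = [101.3, 99.8, 104.2, 100.5, 102.9]

# Two-sided: "the two distributions differ" (the talk's recommendation).
print(stats.ttest_ind(hdd, ssd, equal_var=False))

# One-sided: "HDD builds are slower than SSD builds". When the observed effect
# is in the hypothesized direction, this p is roughly half the two-sided one.
print(stats.ttest_ind(hdd, ssd, equal_var=False, alternative="greater"))
```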
@sirnathan8417
@sirnathan8417 6 років тому
Arms lol
@poyi1013
@poyi1013 2 роки тому
I’m more confused after the video~~~
@jbluepolarbear
@jbluepolarbear 2 роки тому
I think the hard drive example falls flat. Build times alone, with no other data, don't show anything of importance; analyzing the stages of the build would be more beneficial. It might have shown results that point to an issue in a build step as opposed to HDD vs SSD performance.
@13b78rug5h
@13b78rug5h 2 роки тому
The time of the build itself isn't the only thing you save with faster build times. The longer builds take, the less often you make them; that increases your lead time from feature idea to working feature, therefore trapping value inside the system and slowing down the feedback of data, or in some cases revenue. Also, opening up your project and files on a slow drive decreases developer productivity and gets on developers' nerves. But in the end, all this is a false dichotomy, as you should have a build server that does all the builds automatically and not rely on local manual builds. Continuous integration and delivery are a cornerstone of all high-performing engineering cultures for a damn good reason.
@andrewandersson
@andrewandersson 4 роки тому
P is not the probability that the null is true; p-values are the most misunderstood aspect of frequentist statistics there is. The p-value means that if you did an infinite number of these experiments with the null hypothesis true, p% of them would have values as extreme or more extreme. In 1 out of 20 experiments you will get a p-value that extreme. I wouldn't say 1/20 is very unlikely.
@richardsejour7731
@richardsejour7731 2 роки тому
If you were to perform a test 1000 times, then wouldn't the distribution of the p value be skewed to the right assuming that you reject the null hypothesis? If this is the case then you are bound to have some p values that are not as extreme as others.
@andrewandersson
@andrewandersson 2 роки тому
@@richardsejour7731 I'm not sure, but I don't think the distribution of the p value is affected by rejecting or assuming the null hypothesis is correct. And of course you will have values that are less extreme. The first point in the clarifications en.wikipedia.org/wiki/Misuse_of_p-values is basically what I tried to get through, p-values aren't about probablities.
@richardsejour7731
@richardsejour7731 2 роки тому
@@andrewandersson p values are about probabilities because they are tied to a normal distribution. More specifically, p values are supposed to represent the probability of getting a result at least that extreme assuming that you accept the null. My confusion was that I wasn't sure if p values are supposed to reflect an expected outcome if you repeated it multiple times. I understood p values as a test of the null hypothesis for that specific test/data. If you did repeat that test 1000 times, you would see a range of p values, but I don't think you could determine where most of the values would lie.
@andrewandersson
@andrewandersson 2 роки тому
@@richardsejour7731 It depends on the data and the test; not all of them have normal distributions. If you assume a different model you get a different distribution, so I guess you are right, but it is important to distinguish the probability of the hypothesis being true (which frequentist statistics does not address) from the probability of seeing a result as extreme or more extreme assuming the hypothesis is true.
@richardsejour7731
@richardsejour7731 2 роки тому
@@andrewandersson you are right, the p value is not about whether a null or alternate hypothesis is true, or wrong. It's about the probability of getting at least your observed value assuming if there was no difference between the means of your groups (aka the null hypothesis). Under this definition, you have to apply your p value in context to the distribution of a null hypothesis. So if you get a small p value, then that means that the likelihood of your data supporting the null hypothesis is so low that it cannot be an accurate representation of the null hypothesis, and thus you would reject the null since it conflicts with your data. Conversely, if you have a high p value this finding would be acceptable and not uncommon if you follow the distribution of a null hypothesis, or in other words, the null hypothesis doesn't disagree with your data, so you fail to reject the null. Or at least that is how I interpret the p value as.
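The "1 in 20" intuition from earlier in the thread is easy to check by simulation: when the null is true by construction, p-values come out roughly uniform, so about 5% land below 0.05. A quick sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_experiments, n_per_group = 10_000, 30

p_values = []
for _ in range(n_experiments):
    # Both groups drawn from the same distribution: the null is true by construction.
    a = rng.normal(0, 1, n_per_group)
    b = rng.normal(0, 1, n_per_group)
    p_values.append(stats.ttest_ind(a, b).pvalue)

p_values = np.array(p_values)
print("fraction of experiments with p < 0.05:", (p_values < 0.05).mean())  # about 0.05
```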