Need Statistical Help! [Archive]

Estu

04-18-2013, 09:43 AM

So here's the deal. I've been collecting loot data on monsters in-game and updating the wiki with their info (this has been for low-level monsters so far since they are the fastest to kill). Here's an example of one of the monsters I have the most data for: http://wiki.project1999.org/A_Decaying_Dwarf_Skeleton

That page's loot data are from 408 kills (I was farming bone chips for faction). Usually I just collect 100 kills and move on to other monsters. Even for this page, though, we can see a big (at least relative to the percentages) difference in loot percentages in items that likely have the same actual probability of dropping: the 'common loot' likelihood of dropping varies from 0.2% to 2.2%, meaning that some items appear to drop ten times as often as others, though each item, in my opinion, probably just has a 1% chance of dropping, and I just happened to get more of some items than others.

I was thinking about this, and it occurred to me that the percentages I put up on the wiki might be misleading. Yeah, it's definitely better to have P1999 data than EQEmu data which is almost always wrong, but people might look at a list of loot data and get the wrong idea. For instance, say you want to farm bone chips. You look at the page for a decaying skeleton (http://wiki.project1999.org/A_decaying_skeleton) and see that they drop bone chips 72.9% of the time. Then you look at the page for a dwarf skeleton (http://wiki.project1999.org/A_dwarf_skeleton) and see that they drop bone chips 67.9% of the time. Maybe you conclude that decaying skeletons drop them a little more often, so you should farm them (let's forget for now that decaying skeletons are actually easier to kill in large quantities than dwarf skeletons), even though the data is only based off of about 100 kills for each monster, meaning that we can't actually say with confidence that the real chance to drop bone chips is very close to the given percentages.

I ran some numbers the other day. If an item has a 30% chance to drop and you kill the monster that drops it 100 times, then with a likelihood of about 1%, you will find the item dropping below 20% of the time, and with a likelihood of about 1%, you will find it dropping over 40% of the time. So roughly one out of fifty such items you see on the wiki will have its drop data off by over 10 percentage points. The way I got these numbers was arduous; I looked at the associated binomial distribution, calculated the probability of getting each number of drops between 20 and 40 (from 100 trials), and added them up.
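Sums like this don't have to be done by hand. Here is a minimal Ruby sketch (standard library only) that adds up exact binomial probabilities for the 100-kill, 30% example. The helper names `binomial_pmf` and `binomial_range_prob` are made up for illustration.

```ruby
# Exact binomial probability of exactly k drops in n kills.
# Works in log space (log-gamma) so the big combinations never overflow.
def binomial_pmf(n, k, rate)
  log_choose = Math.lgamma(n + 1)[0] - Math.lgamma(k + 1)[0] - Math.lgamma(n - k + 1)[0]
  Math.exp(log_choose + k * Math.log(rate) + (n - k) * Math.log(1 - rate))
end

# P(lo <= X <= hi) for X ~ Binomial(n, rate), assuming 0 < rate < 1.
def binomial_range_prob(n, rate, lo, hi)
  (lo..hi).sum { |k| binomial_pmf(n, k, rate) }
end

kills = 100
rate = 0.3
below  = binomial_range_prob(kills, rate, 0, 19)      # observed rate under 20%
inside = binomial_range_prob(kills, rate, 20, 40)     # observed rate from 20% to 40%
above  = binomial_range_prob(kills, rate, 41, kills)  # observed rate over 40%

puts format("P(under 20%%)   = %.4f", below)
puts format("P(20%% to 40%%)  = %.4f", inside)
puts format("P(over 40%%)    = %.4f", above)
```

Changing `kills` and `rate` shows how quickly the tails shrink as the number of kills grows.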

Here's what I'm looking to do: if I kill a monster 'n' times and it drops some item 'k' times, I want to generate a 95% confidence interval for that item's actual drop rate, i.e. I want to say that there is a 95% chance that it drops between x% of the time and y% of the time. I haven't taken stats for a while so I don't know the best way to do this, and I worry that using smooth approximations of discrete distributions will give me intervals that are inaccurate (however, I also want to be able to do a lot of these computations (hundreds of monsters, each with over 100 kills) quickly). My assumptions are that 'n' is at least 100, and items usually drop at least 1% of the time.

There's also a slight complication: most of the time items will drop either not at all or just once, but sometimes (e.g. with bone chips), they may drop twice or more (spider silks may drop five times off of spiders in East Karana). The mechanism by which this happens, I believe, is that there is a set probability 'p' used for each one dropping, and it's tested however many times. So maybe bone chips have a 50% chance to drop in each of the two times they might drop, so 25% of the time we get no bone chips, 50% of the time we get one, and 25% of the time we get two. What my parser does is it doesn't give the actual probability 'p' of each one dropping, or of at least one dropping, but rather the expected (average) number of items dropped, since this is easy to compute and in my opinion the most useful piece of information. So in the previous case, each one has a 50% chance to drop and 75% of the time we get at least one bone chip, but the expected number of bone chips is 0*25% + 1*50% + 2*25% = 1, so the wiki would show a "likelihood" of 100%. These cases would, I'm assuming, complicate the confidence intervals, though most pieces of loot that drop can only drop once, so it's more straightforward.
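The bone chip arithmetic can be sketched in a few lines of Ruby. This assumes the mechanism guessed at above (a fixed number of independent rolls with the same chance each), which is not confirmed server behavior. The name `drop_distribution` is made up for illustration.

```ruby
# Probability of each possible number of drops when the game makes `rolls`
# independent checks, each succeeding with probability `chance`.
# (Assumed mechanism, not confirmed server behavior.)
def drop_distribution(rolls, chance)
  (0..rolls).map do |k|
    # Number of ways to pick which k of the rolls succeed: C(rolls, k)
    ways = (1..k).reduce(1) { |acc, i| acc * (rolls - k + i) / i }
    [k, ways * chance**k * (1 - chance)**(rolls - k)]
  end.to_h
end

dist = drop_distribution(2, 0.5)
expected = dist.sum { |count, prob| count * prob }  # average items per kill
at_least_one = 1 - dist[0]                          # chance of seeing any at all

dist.each { |count, prob| puts format("%d dropped: %.0f%%", count, 100 * prob) }
puts format("expected per kill: %.2f", expected)
puts format("at least one: %.0f%%", 100 * at_least_one)
```

Calling `drop_distribution(5, chance)` would cover the five-silk spider case, with `chance` being whatever per-roll probability you want to try.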

Any stats nerds wanna help me out? Thanks!

Swish

04-18-2013, 10:12 AM

I make wiki changes where I come across errors but it's more to do with things like necro research trivials and stuff like that. I really wish I was better with statistics but my mind goes back to some really dry maths lessons at school years ago... could never wait to get out of there :D

Admire what you're doing though, it's a shame that a lot of people slate the wiki for inaccuracies rather than making the necessary changes as they come across them :/

seped

04-18-2013, 02:43 PM

... And the forum ate my reply, damnit.

Long story short, most programming languages have stats packages that will easily crunch this data for you and can automate pretty much the whole process. You just want to give them the drop records you experienced (if you kill three skeletons and they drop 0, 1, and 2 bone chips, you'd give them "0,1,2") and let them give you the mean (expected outcome from one kill) and standard deviation. 95% confidence of where the actual mean resides is within 2 standard deviations either way, so Mean +/- 2*stddev. Alternatively, StdDev is simple to calculate yourself: just look up the formula on Wikipedia and use the sample formula, not the population one.

You won't have any issues with items that drop over '100%' as you're describing the expected outcome in truth, not the chance of getting 1 of the item, so just treat it like anything else and people will understand that a 160%+/-10% drop rate means they should expect 1.6 silks per kill and that you're fairly confident the real average is somewhere between 150-170%.

Estu

04-18-2013, 02:52 PM

Is it really as simple as using the standard deviation? Doesn't that assume a normal distribution?

seped

04-18-2013, 04:08 PM

Not really, this seems like a perfect example of the central limit theorem: the average of many independent random variables drawn from an unknown distribution will be approximately normally distributed.

That aside, do we have any strong assumption the drop rates aren't independent and normal?

Also, I goofed in my above post: you want +/- 2 * the std error, not the std deviation. The std error is just stddev divided by sqrt(sample size). For example, in Ruby:

irb(main):001:0> require 'statsample'
=> true
irb(main):002:0> test = 100.times.map { rand < 0.1 ? 1 : 0 }
=> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
irb(main):004:0> test = test.to_scale
=> Vector(type:scale, n:100)[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0]
irb(main):005:0> test.mean
=> 0.11
irb(main):006:0> test.sd
=> 0.3144660377352203
irb(main):007:0> t_1 = Statsample::Test::T::OneSample.new(test, {:u => test.mean})
irb(main):009:0> puts t_1.summary
= One Sample T Test
Sample mean: 0.1100 | Sample sd: 0.3145 | se : 0.0314
Population mean: 0.1100
t(99) = 0.0000, p=1.0000 (both tails)
CI(95%): -0.0624 - 0.0624
=> nil
irb(main):010:0>

You can see that in this sample we'd report that the drop rate was 11%, plus or minus 6% which is dead on with the underlying mean of 10%.
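If you'd rather not install a gem, the same quantities are easy to compute by hand. Below is a minimal sketch in plain Ruby. The `summarize` helper is a made-up name, and it uses a flat 2 standard errors rather than the t-based multiplier that statsample reports, so the bounds come out slightly wider.

```ruby
# Mean, sample standard deviation, standard error, and a rough
# mean +/- 2*SE range for a list of per-kill drop counts.
def summarize(records)
  n = records.size
  mean = records.sum.to_f / n
  variance = records.sum { |x| (x - mean)**2 } / (n - 1)  # sample formula, not population
  sd = Math.sqrt(variance)
  se = sd / Math.sqrt(n)
  { mean: mean, sd: sd, se: se, low: mean - 2 * se, high: mean + 2 * se }
end

# Same shape as the sample above: 11 drops in 100 kills.
records = [1] * 11 + [0] * 89
s = summarize(records)
puts format("mean %.4f | sd %.4f | se %.4f", s[:mean], s[:sd], s[:se])
puts format("rough 95%% range: %.4f to %.4f", s[:low], s[:high])
```

Records with multiple drops per kill (e.g. `[0, 1, 2, 1, 0]`) go through the same function unchanged, since it only works with the per-kill counts.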

Estu

04-18-2013, 04:20 PM

Certainly we can assume the drop rates are independent. Of course they're not normal; they are a binomial distribution. My worry with the central limit theorem is that the binomial distribution would not be close enough to normal, especially for small values of 'p' (e.g. 1%, which we do see a lot of), but maybe it would be OK, or at least good enough for the wiki. Thanks for the help!

seped

04-18-2013, 04:34 PM

Sorry I misspoke, meant to just say independent.

Originally Posted by Estu:
but maybe it would be OK, or at least good enough for the wiki. Thanks for the help!

Well that's exactly the uncertainty the error is supposed to account for. It will be pretty obvious when you just don't have enough data. If you run a 2% drop 100 times and get 4 drops, the error bounds you'll generate will be 4% +/- 3.9%. Conversely, if you ran 1000 trials, your error bounds would be down around .7%. It answers the question of whether your sample is large enough at the same time.
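When all you have is a drop count and a kill count, the margin comes straight from the proportion. Here is a minimal sketch, with `margin_of_error` as a made-up name and the 1000-kill count being a hypothetical figure for comparison only.

```ruby
# Rough 95% margin of error for an observed drop rate:
# 2 standard errors of a binomial proportion (normal approximation).
def margin_of_error(drops, kills)
  rate = drops.to_f / kills
  2 * Math.sqrt(rate * (1 - rate) / kills)
end

# 4 drops in 100 kills, as in the example above.
puts format("4 in 100:   +/- %.1f%%", 100 * margin_of_error(4, 100))
# Hypothetical: 20 drops in 1000 kills of the same 2% item.
puts format("20 in 1000: +/- %.1f%%", 100 * margin_of_error(20, 1000))
```

The margin shrinks roughly with the square root of the number of kills, so ten times the kills gives only about a third of the error.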

Useful link http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval What we're doing here is explicitly finding a confidence interval for a binomial test.

Estu

04-18-2013, 04:41 PM

Originally Posted by seped:
It answers the question of whether your sample is large enough at the same time.

Well, here's the thing: if you look at this page http://en.wikipedia.org/wiki/Binomial_distribution#Normal_approximation

Then it seems that if we set n=100 and p=0.01, it fails all the given tests for whether 'n' is high enough for the normal distribution to be a good approximation (on the other hand, if n=100 and p=0.3, then we seem to get a good approximation).
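Those checks are quick to script. The sketch below assumes the tests in question are two common rules of thumb: that n*p and n*(1-p) are both at least 5, and that n is greater than both 9(1-p)/p and 9p/(1-p). Other sources use different cutoffs, and `normal_approx_checks` is a made-up name.

```ruby
# Two common rules of thumb for whether a binomial with n trials and
# success probability `rate` is reasonably close to a normal distribution.
# (Assumed choice of rules; cutoffs vary between textbooks.)
def normal_approx_checks(n, rate)
  {
    expected_counts: n * rate >= 5 && n * (1 - rate) >= 5,
    three_sd_rule: n > 9 * (1 - rate) / rate && n > 9 * rate / (1 - rate)
  }
end

[[100, 0.01], [100, 0.3]].each do |n, rate|
  checks = normal_approx_checks(n, rate)
  puts "n=#{n}, p=#{rate}: #{checks}"
end
```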

That Wilson score interval looks pretty good, though. Maybe I'll use that. Thanks for the link!

seped

04-18-2013, 04:46 PM

Estu

04-18-2013, 04:57 PM

Originally Posted by seped:
Indeed, but the downside of getting a bad approximation is nothing other than getting something unhelpful like "Item drops 4% (+-4%)", which is still useful for knowing that the wiki doesn't have enough data to say anything more than "this item probably has a drop rate below 8%".

It seems to me that the downside of getting a bad approximation is that the percentages don't tell you anything. That +-4% is coming from the assumption that your binomial distribution is acting pretty much like a normal distribution; it's saying, if we had a normal distribution with the same mean and standard deviation of the binomial one, we'd have a +-4% confidence interval. But if our 'n' is too low and our 'p' is too close to 0 or 1, then the binomial distribution does not act like a normal distribution, so those percentages are worthless. We can brute-force calculate an actual confidence interval by looking at the binomial distribution itself (this would involve taking a lot of big combinations and exponents) and we wouldn't get +-4%.
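One way to do the brute-force version is sketched below. It uses the "exact" (Clopper-Pearson) construction: search for the drop rates at which the observed count sits right at the 2.5% tail on each side. Whether this is the particular brute-force method meant above is an assumption, and all helper names are made up for illustration.

```ruby
# Exact binomial probability of exactly k drops in n kills (log space).
def binomial_pmf(n, k, rate)
  log_choose = Math.lgamma(n + 1)[0] - Math.lgamma(k + 1)[0] - Math.lgamma(n - k + 1)[0]
  Math.exp(log_choose + k * Math.log(rate) + (n - k) * Math.log(1 - rate))
end

# P(X <= k) for X ~ Binomial(n, rate).
def binomial_cdf(n, k, rate)
  return 0.0 if k < 0
  (0..k).sum { |i| binomial_pmf(n, i, rate) }
end

# Bisection on rate in (0, 1). The block must return a value that
# decreases as rate increases; we find where it crosses `target`.
def solve_decreasing(target)
  lo = 0.0
  hi = 1.0
  60.times do
    mid = (lo + hi) / 2
    if yield(mid) > target
      lo = mid
    else
      hi = mid
    end
  end
  (lo + hi) / 2
end

# Clopper-Pearson interval for k drops in n kills.
def exact_interval(k, n, alpha = 0.05)
  # Lower bound: rate where P(X >= k) = alpha/2, i.e. P(X <= k-1) = 1 - alpha/2.
  lower = k.zero? ? 0.0 : solve_decreasing(1 - alpha / 2) { |r| binomial_cdf(n, k - 1, r) }
  # Upper bound: rate where P(X <= k) = alpha/2.
  upper = k == n ? 1.0 : solve_decreasing(alpha / 2) { |r| binomial_cdf(n, k, r) }
  [lower, upper]
end

low, high = exact_interval(4, 100)
puts format("4 drops in 100 kills: %.1f%% to %.1f%%", 100 * low, 100 * high)
```

The resulting interval is not symmetric around the observed rate, which is exactly what a plain +/- figure cannot express.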

sambal

04-18-2013, 05:08 PM

Originally Posted by Estu:
So in the previous case, each one has a 50% chance to drop and 75% of the time we get at least one bone chip, but the expected number of bone chips is 0*25% + 1*50% + 2*25% = 1, so the wiki would show a "likelihood" of 100%. These cases would, I'm assuming, complicate the confidence intervals, though most pieces of loot that drop can only drop once, so it's more straightforward.

Any stats nerds wanna help me out? Thanks!

I don't think you're 100% correct here.

The skeleton is drawing from the possibility of dropping two bones, each with the probability of 50%. This is the way almost every video game works, which does wind up yielding a normal distribution. It's much easier for the server to compute these straightforward loot tables, and they only appear convoluted to the player.

p_Bone_A = 0.5
p_Bone_B = 0.5

Probability that bone_A does not drop, P(A'): 0.5 (no bone_A, irrespective of bone_B)
Probability that bone_B does not drop, P(B'): 0.5 (no bone_B, irrespective of bone_A)
Probability that bone_A or bone_B drops, but not both: 0.5 (one bone)
Probability that bone_A and bone_B both drop, P(A∩B): 0.25 (two bones)
Probability that bone_A and/or bone_B drops, P(A∪B): 0.75 (one or two bones)
Probability that neither bone_A nor bone_B drops, P(A'∩B'): 0.25 (zero bones)

sambal

04-18-2013, 05:16 PM

p_bone = 1 works because over enough kills (more than about 30 observations), the player will average out at 1 bone chip for every skeleton killed.
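A quick simulation shows the averaging-out. This is a sketch under the two-rolls-at-50% assumption from this thread, with a fixed seed so runs are repeatable. `simulate_average_drops` is a made-up name.

```ruby
# Simulate `kills` skeletons, each making `rolls` independent checks that
# succeed with probability `chance`, and return the average drops per kill.
def simulate_average_drops(kills, rolls: 2, chance: 0.5, rng: Random.new(42))
  total = 0
  kills.times do
    rolls.times { total += 1 if rng.rand < chance }
  end
  total.to_f / kills
end

[30, 100, 10_000].each do |kills|
  puts format("%5d kills: %.3f bone chips per kill", kills, simulate_average_drops(kills))
end
```

Passing a different `rng` (e.g. `Random.new(7)`) gives a feel for how much the small-sample averages move around from one farming session to the next.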

Estu

04-18-2013, 05:33 PM

How is this different from what I said?

sambal

04-18-2013, 06:03 PM

Originally Posted by Estu:
How is this different from what I said?

I am saying don't focus on the probability of getting 0, 1 or 2. It doesn't really make sense. You don't even need to focus on the probability of each event happening independently of each other.

However, you do need to focus on the probability of getting one bone chip. That's 1. The underlying mechanism doesn't really matter to the player. Why bring it up?

On top of all that, if you really have 100 observations, observed loot will be very close to the actual loot tables.

seped

04-18-2013, 06:59 PM

Bugged my statistician friend, he pointed out an item from the page I linked earlier: http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Wilson_score_interval That is specifically designed for smaller sample sizes or extremes of probability. That might be just what you want.

Estu

04-18-2013, 07:01 PM

koros

04-18-2013, 09:50 PM

This is my shit right here. I'll write up a post on the best way to do this if I get time this weekend.

Estu

04-18-2013, 10:09 PM

Originally Posted by koros:
This is my shit right here. I'll write up a post on the best way to do this if I get time this weekend.

Thanks, but I've already coded up a method using the Wilson score interval, and it looks pretty good. It'll be in the wiki pending approval from Ravhin. I'd still welcome more discussion but I'm enjoying the Wilson interval at the moment.

Splorf22

04-18-2013, 10:23 PM

Estu

04-18-2013, 10:47 PM

Originally Posted by Splorf22:
So here is something to ponder Estu. Take your Dwarf Skeleton example. The Wilson test will say 1.5-2.7% probability for the Rusty Bastard Sword, and 0.1-0.5% probability for the Cloth Cap, or something like that. On the other hand, I look at that table and I'm guessing that 50% of skeletons drop a random rusty weapon or cloth dagger, i.e. that all of the items have the same drop probability. Can you really see Nilbog sitting there saying ho ho, 2.7% for the Rusty Bastard Sword and 0.2% for the Cloth cap?

I'm guessing that a Bayesian approach will do well here for stuff that drops many different possible items.

I assume you're talking about 'a decaying dwarf skeleton' based on the numbers you're giving. I've run it through my program, and here is what it says for those two items:

Number of decaying dwarf skeletons killed: 415
Proportion of them holding rusty bastard swords: 2.2%
Proportion of them holding small cloth caps: 0.2%

95% Wilson confidence interval for rusty bastard swords: 1.1%-4.1%
95% Wilson confidence interval for small cloth caps: 0%-1.4%

A user looking at this could conclude that there is a 1% drop rate for all the 'common' items (that 1.1% lower bound on the rusty bastard swords is pretty close to 1%, and given that there are about 20 common items, it's not surprising that one or two would fall outside of the 95% confidence interval). My conclusion is that the Wilson confidence intervals give good results that are consistent with reasonable assumptions (i.e. that all these items in reality drop at the same rate). However, I don't know the theoretical or practical differences between Wilson intervals and a Bayesian approach, so I'd be interested in hearing about it.
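For anyone who wants to reproduce these numbers, the Wilson score formula is only a few lines. The sketch below is plain Ruby and is not the actual parser code. The drop counts of 9 and 1 are my inference from 2.2% and 0.2% of 415 kills, so treat them as assumptions.

```ruby
# 95% Wilson score interval for k drops in n kills (z = 1.96).
def wilson_interval(k, n, z = 1.96)
  phat = k.to_f / n
  denom = 1 + z**2 / n
  center = (phat + z**2 / (2 * n)) / denom
  half = z * Math.sqrt(phat * (1 - phat) / n + z**2 / (4.0 * n**2)) / denom
  # Clamp to [0, 1] to guard against floating-point overshoot.
  [[center - half, 0.0].max, [center + half, 1.0].min]
end

kills = 415
{ "rusty bastard sword" => 9, "small cloth cap" => 1 }.each do |item, drops|
  low, high = wilson_interval(drops, kills)
  puts format("%-20s %.1f%%-%.1f%%", item, 100 * low, 100 * high)
end
```

Unlike mean +/- 2 standard errors, the interval never goes below 0% and stays sensible even when an item has dropped only once or not at all.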

Skope

04-19-2013, 02:45 PM

Isn't this sort of stuff quantized? What I mean is that loot occupies specific tiers, each with its own formula (read: drop rate). These tiers would be shared across monsters for their respective loot tables.

For example:

Monster A has three drops (some may have more/less); X the common; Y the rare; Z the very rare. X has a rate of n%; Y of m%; and Z of p%.

Therefore pinning down the drop rate of one of these tiers would accurately represent the drop rates across a whole slew of NPCs.

I highly doubt the EQ coders (and Nilbog+Rogean and friends) have a completely separate equation for each piece of loot. That's idiotic and defeats the purpose of programming. Instead, new pieces of loot (like Velious) would be simply plugged into specific tiers. This should make it easier to accurately represent the loot tables of NPCs. Rather than killing everything in the game in order to get a good %, all you'd need is a rough idea of what's common/rare/ultra-rare (+ additional tiers, if they exist) and a few mobs that drop common, rare, and ultra-rare drops. You wouldn't need to kill Trakanon a thousand times to determine the % of his drops when a lvl 20 mob with the same tiers would suffice just as well.

Estu

04-19-2013, 02:56 PM

I'm not sure exactly how they have it implemented. If we knew that there were only a few possibilities for drop rates, it might make sense to try and pin those down. Certainly it seems like when a monster drops a ton of different things (e.g. research components), many of them have the same chance to drop, but it also seems like that chance can be different between different mobs. Anyway, I have no idea how it works, I'm just trying to get some good statistical results from observation.

I've now updated all my entries in the wiki with confidence intervals (these are mostly for low-level mobs and I'm still in the process of getting more data, and will be for a long time). Here is an example of a monster with 415 kills: http://wiki.project1999.org/A_decaying_dwarf_skeleton

radditsu

04-19-2013, 03:33 PM

I assume it works like the basic loot tables that PEQ uses, tweaked for p99 era.

It has various tables with certain drop percentages. Such as the words/runes are on a table, quest items on a table, then main loot on a table. They even have a frontend!

http://www.peqtgc.com/edit/

It's pretty interesting stuff if you are into databases.

Estu

04-19-2013, 03:39 PM

radditsu

04-19-2013, 03:45 PM

Originally Posted by Estu:
Based on this page: http://www.peqtgc.com/edit/index.php?editor=loot&z=blackburrow&zoneid=23&npcid=17002

I'm seeing a bunch of different percentages rather than something as simple as Skope described.

Keep in mind that p99 can be wildly different from this, but it did share a base at some point.

August

04-19-2013, 05:19 PM

You guys are doing God's work - keep it up.

I am inclined to agree w/ Splorf and Skope on the shared loot table. The idea of coding each mob to have independent tables for each piece of cloth, rusty, etc. seems overwhelming, especially when you add in research components, etc. It seems way more likely that there is a specific research loot table that can just be tacked onto a mob - same w/ the cash drops, gems, spells, armor tiers, weapon tiers, etc. Certainly some mobs may have a modified table (say guards that only drop FS SS + Shield) but I can't imagine such a wasteful approach to coding up loot tables if done wholesale for every mob in Norrath.