A Statistical analysis of Charm Duration - Page 4

charleski · #1 04-14-2025, 02:38 PM

I started playing p1999 about six weeks ago, having left EQ in early 2005 after Arch Overseers largely collapsed due to the great OoW burnout, and expected to find a community that had min-maxed the system up to the hilt. But it's clear that a lot of the folklore and baseless dogma that characterised player knowledge back then still persists. One critical element in that seems to be the subject of charm duration. In this post I will set about deriving a proper statistical analysis of charm duration and provide some tools that can be used to test it. I'm not formally a statistician, though I use stats in my work and am pretty sure that the approach used here is correct. Still, if you spot any errors, please feel free to let me know.

The first step is to be absolutely clear about what we mean by 'charm duration'.
It's generally accepted that the game's global timeline is divided up into 'ticks', each six seconds long, and that any decision about charm breaking is made at the boundary point between ticks. So we have the situation depicted in the diagram below:
[You must be logged in to view images. Log in or Register.]

The charm spell will land on a mob at some random time, which is unlikely to be exactly at a tick boundary. Providing it is not resisted there will be an initial period of less than 6 seconds before a charm-break test is applied. Subsequently, every six seconds, another test will be performed. As long as these tests keep being passed, the charm will continue, until finally a test is failed and the charm breaks. In my experience the initial cast of charm is only ever resisted if the mob is of a higher level than the spell is capable of charming.

As shown by the diagram, a charm that lasts six ticks (plus the initial period) will reflect a sequence of six consecutive passes followed by one failure. In the rest of this discussion we will ignore the initial period, which is irrelevant to the analysis. In general, a charm that last n ticks will be the result of n charm-break successes.

Important assumption: I assume that each test is performed independently. That is, the probability of success on each tick is determined without reference to the number of preceding successes. This is a reasonable assumption and simplifies things considerably. There may be a limiting factor in terms of maximum charm duration, this is unknown. The maximum charm duration I've seen so far is 818 seconds, which is over 13 and a half minutes. I expect most enchanters will be used to seeing charms last over ten minutes, such that tash wears off even though you reapply tash on each break.

So to start off, we ask, "What is the probability that a charm will last n ticks?" This is actually a very simple question and relies on the underlying probability, p, of success on each individual tick. If we toss a coin, the chance of it coming up heads is 50% (p=0.5). What is the probability of it coming up heads each time if we toss it twice? The answer is to multiply the probabilities, so p = 0.5*0.5 = (0.5)^2 = 0.25. If you toss it twice, the chance of getting two heads is 1 in 4. Likewise, if you toss it three times, the chance of getting heads three times in a row will be (0.5)^3, 1 in 8, etc. So,
The probability that a charm will last n ticks is p^n.

This is an exponential curve, and matches what we see if we just dump a whole load of individual charms together and graph the distribution. I wrote a python program (attached below as CharmParsing.py) that parses a log file, extracts the duration of charms and performs some stats that will be explained later. Usage is simply
py -m CharmParsing <path to log file>
It will create a .csv file in the same location as the log that contains the stats and a list of individual charm durations in seconds and ticks. I took a log covering a bit over 2 weeks and fed it into the program, then graphed the duration of the 427 charms. This is uncontrolled data, so merely indicative, but it does show an exponential distribution.
[You must be logged in to view images. Log in or Register.]

At this point it's important to note that, given the exponential distribution, any talk of 'average' charm duration is both misleading and completely meaningless. The average only applies to samples that are drawn from a normally-distributed population, which is very much not the case here. If we're going to analyse charm duration we need a different statistic, we need an estimate of the underlying probability (p) of success on each tick.

This can be performed by recognising that each tick during the charm duration represents a separate independent trial. A charm that lasts six ticks represents six successes and one failure. As long as we have enough data and they're properly controlled (i.e. on the same mob, with no changes that might alter the underlying probability) we can group all these trials together and use them to calculate the Wilson Score (see references below), a binomial statistic that provides an estimate of underlying probability at a specified confidence level:
[You must be logged in to view images. Log in or Register.]

where p0 = upper and lower bounds of the Wilson Score; n = total number of trials, p_hat (p with a circumflex) = number of successes/total number of trials, and c = the critical value corresponding to the level of confidence required (1.96 for 95% confidence).
This is used in CharmParsing.py to report a central probability along with its upper and lower bounds.

We're now getting somewhere. We can generate estimates of the underlying probability of charm success on each tick. But what we really want is a way to compare these values between different conditions (i.e. changes in level-difference, CHA and magic resist). We could just see if the ranges given by the Wilson Score overlap, but in some cases that might produce a Type II error (false-negative). A better method is to compare the difference between central probabilities (p2-p1) to the Newcombe-Wilson difference interval (also covered in the references below).
The test is given by:
[You must be logged in to view images. Log in or Register.]
where p1 = central probability for condition1, w(1,-) = lower bound to the Wilson Score for condition 1, w(1,+) = upper bound to the Wilson Score for condition1, etc

This is performed by CharmDiff.py, also attached. To use this, cut your log file up into separate segments, each corresponding to one controlled value for the condition being tested. Usage is
py -m CharmDiff <95¦99> <path to first log file> <path to second log file>
The first argument must be 95 or 99, the level of confidence which you wish to use. It will print results to the terminal, but this can be redirected to a file using the > operator.

We now have the tools needed to investigate charm duration in a meaningful manner, and the following posts will show some results I've gathered. I strongly encourage anyone interested to try this for themselves and post the results they get. Any scientific data are only meaningful to the extent that they can be replicated. If you're looking to compare conditions, just make sure that the logs used are properly controlled. The code used only works for L12 Charm, Beguile and Cajoling Whispers, because I'm level 51 and don't have Allure. If you want to include those, it's easy to add them to the regexps and cast duration constants at the start of the files.

It's important to note that anecdotal evidence is completely worthless here. We've all had pets that seem unusually unruly ('OMG I just recharmed the bloody thing and it's broken again!'), but that means nothing. The First Rule of Statistics is: Shit Happens. My personal gut feeling is that charm breaks are more likely to happen when I've taken my hand off the keyboard to pick up a drink, but I haven't worked out a way to test that reliably yet... If you want to generate useful data you need to keep the conditions controlled and record a sufficiently large number of trials - I would recommend at least an hour or so's worth, i.e. 600 or more trials.

Q: You use big wurds and sumz! Y u do sumz? Dey make brane hurt!
A: You've been using Illusion:Troll too much. Here's 10pp, go buy yourself a drink.

The following posts will concern different conditions that are reputed to affect charm duration. If you can't wait, I'm coming to the conclusion that charm success per tick has a fixed probability close to 0.98.

References:
The Wilson Confidence Interval for a Proportion
Binomial Confidence Intervals and Contingency Tests
Interval estimation for the difference between independent proportions
Plotting the Newcombe-Wilson distribution
Sean Wallis, Statistics in Corpus Linguistics Research, 2021