A Statistical Analysis of Charm Duration

charleski · #1 04-14-2025, 02:38 PM

I started playing p1999 about six weeks ago, having left EQ in early 2005 after Arch Overseers largely collapsed due to the great OoW burnout, and expected to find a community that had min-maxed the system to the hilt. But it's clear that a lot of the folklore and baseless dogma that characterised player knowledge back then still persists. One critical example seems to be charm duration. In this post I will derive a proper statistical analysis of charm duration and provide some tools that can be used to test it. I'm not formally a statistician, though I use stats in my work and am pretty sure that the approach used here is correct. Still, if you spot any errors, please feel free to let me know.

The first step is to be absolutely clear about what we mean by 'charm duration'.
It's generally accepted that the game's global timeline is divided up into 'ticks', each six seconds long, and that any decision about charm breaking is made at the boundary point between ticks. So we have the situation depicted in the diagram below:
[Diagram not available: timeline of a charm across six-second ticks, with a charm-break test at each tick boundary.]

The charm spell will land on a mob at some random time, which is unlikely to be exactly at a tick boundary. Provided it is not resisted, there will be an initial period of less than 6 seconds before a charm-break test is applied. Subsequently, every six seconds, another test will be performed. As long as these tests keep being passed, the charm will continue, until finally a test is failed and the charm breaks. In my experience the initial cast of charm is only ever resisted if the mob is of a higher level than the spell is capable of charming.

As shown by the diagram, a charm that lasts six ticks (plus the initial period) will reflect a sequence of six consecutive passes followed by one failure. In the rest of this discussion we will ignore the initial period, which is irrelevant to the analysis. In general, a charm that lasts n ticks will be the result of n consecutive passed charm-break tests followed by one failed test.

Important assumption: I assume that each test is performed independently. That is, the probability of success on each tick is determined without reference to the number of preceding successes. This is a reasonable assumption and simplifies things considerably. There may be a limiting factor in terms of maximum charm duration, but this is unknown. The maximum charm duration I've seen so far is 818 seconds, which is over 13 and a half minutes. I expect most enchanters will be used to seeing charms last over ten minutes, such that tash wears off even though you reapply tash on each break.

So to start off, we ask, "What is the probability that a charm will last n ticks?" This is actually a very simple question and relies on the underlying probability, p, of success on each individual tick. If we toss a coin, the chance of it coming up heads is 50% (p=0.5). What is the probability of it coming up heads each time if we toss it twice? The answer is to multiply the probabilities, so the probability is 0.5*0.5 = (0.5)^2 = 0.25. If you toss it twice, the chance of getting two heads is 1 in 4. Likewise, if you toss it three times, the chance of getting heads three times in a row will be (0.5)^3, 1 in 8, etc. So,
The probability that a charm will last at least n ticks is p^n. (The probability that it lasts exactly n ticks, i.e. n passes followed by one failure, is p^n * (1-p), which falls off in the same way.)
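As a quick numerical check of the p^n rule, here is a minimal Python sketch. It is not part of the attached programs; the function names and the value p = 0.98 are only illustrative, and it assumes every tick is an independent test with the same p.

```python
def p_survive(p: float, n: int) -> float:
    """Probability that a charm passes n consecutive tick tests."""
    return p ** n


def p_exactly(p: float, n: int) -> float:
    """Probability of n passed tests followed by a failure on the next one."""
    return p ** n * (1 - p)


if __name__ == "__main__":
    p = 0.98  # illustrative per-tick success probability, not a measured value
    for minutes in (1, 5, 10):
        n = minutes * 10  # ten six-second ticks per minute
        print(f"{minutes:>2} min ({n:>3} ticks): "
              f"P(still charmed) = {p_survive(p, n):.3f}")
```

With p = 0.98 the chance of passing 100 tests in a row (ten minutes) comes out at roughly 13%, so long charms are uncommon but far from freakish.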

This is an exponential curve, and matches what we see if we just dump a whole load of individual charms together and graph the distribution. I wrote a Python program (attached below as CharmParsing.py) that parses a log file, extracts the duration of charms and performs some stats that will be explained later. Usage is simply
py -m CharmParsing <path to log file>
It will create a .csv file in the same location as the log that contains the stats and a list of individual charm durations in seconds and ticks. I took a log covering a bit over 2 weeks and fed it into the program, then graphed the duration of the 427 charms. This is uncontrolled data, so merely indicative, but it does show an exponential distribution.
[Graph not available: distribution of the durations of the 427 charms.]

At this point it's important to note that, given the exponential distribution, talk of 'average' charm duration is misleading. The distribution is heavily skewed, so the mean is dragged upwards by a small number of very long charms and most charms are shorter than the 'average'; it does not describe a typical charm the way the mean of a normally-distributed population would. If we're going to analyse charm duration we need a different statistic: an estimate of the underlying probability (p) of success on each tick.
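To put numbers on how far the 'average' sits from a typical charm, here is a small Python sketch under the same independent-tick model. The helper names are made up for illustration and p = 0.98 is just an example value.

```python
import math


def mean_ticks(p: float) -> float:
    """Expected number of passed tests before the first failure."""
    return p / (1 - p)


def median_ticks(p: float) -> int:
    """Smallest n such that at least half of all charms have n or fewer passes."""
    return math.ceil(math.log(0.5) / math.log(p)) - 1


def fraction_shorter_than_mean(p: float) -> float:
    """Share of charms whose number of passes is below the mean."""
    return 1 - p ** math.ceil(mean_ticks(p) - 1e-9)


if __name__ == "__main__":
    p = 0.98  # illustrative value only
    print(f"mean   = {mean_ticks(p):.1f} ticks")
    print(f"median = {median_ticks(p)} ticks")
    print(f"share shorter than the mean = {fraction_shorter_than_mean(p):.2f}")
```

For p = 0.98 the mean is 49 ticks but the median is only 34, and roughly 63% of charms fall short of the mean.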

This can be performed by recognising that each tick during the charm duration represents a separate independent trial. A charm that lasts six ticks represents six successes and one failure. As long as we have enough data and they're properly controlled (i.e. on the same mob, with no changes that might alter the underlying probability) we can group all these trials together and use them to calculate the Wilson Score (see references below), a binomial statistic that provides an estimate of underlying probability at a specified confidence level:
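p0 = ( p_hat + c^2/(2n) ± c * sqrt( p_hat*(1 - p_hat)/n + c^2/(4n^2) ) ) / ( 1 + c^2/n )
(standard form of the Wilson Score interval; the original image is not available)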

where p0 = upper and lower bounds of the Wilson Score; n = total number of trials, p_hat (p with a circumflex) = number of successes/total number of trials, and c = the critical value corresponding to the level of confidence required (1.96 for 95% confidence).
This is used in CharmParsing.py to report a central probability along with its upper and lower bounds.
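A minimal Python sketch of the pooling step and the Wilson Score interval follows. It is not the attached CharmParsing.py: the function names are hypothetical, the example durations are made up, it assumes passes = duration // 6 (ignoring the initial partial tick), and it leaves out the continuity correction that the attached programs apply by default.

```python
import math

TICK_SECONDS = 6


def trials_from_durations(durations_s):
    """Pool charm durations (seconds) into (successes, trials).

    Each charm contributes one passed test per full tick plus one failed test.
    """
    successes = sum(int(d // TICK_SECONDS) for d in durations_s)
    failures = len(durations_s)
    return successes, successes + failures


def wilson_interval(successes, trials, c=1.96):
    """Return (lower, centre, upper) of the uncorrected Wilson Score interval."""
    p_hat = successes / trials
    denom = 1 + c * c / trials
    centre = (p_hat + c * c / (2 * trials)) / denom
    half = c * math.sqrt(p_hat * (1 - p_hat) / trials
                         + c * c / (4 * trials * trials)) / denom
    return centre - half, centre, centre + half


if __name__ == "__main__":
    durations = [63, 310, 818, 45, 127]  # made-up example durations in seconds
    s, n = trials_from_durations(durations)
    lo, mid, hi = wilson_interval(s, n)
    print(f"{s} successes in {n} trials: {lo:.4f} < {mid:.4f} < {hi:.4f}")
```

Because the sketch omits continuity correction, its bounds will be slightly narrower than those printed by the attached program for the same data.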

We're now getting somewhere. We can generate estimates of the underlying probability of charm success on each tick. But what we really want is a way to compare these values between different conditions (i.e. changes in level-difference, CHA and magic resist). We could just see if the ranges given by the Wilson Score overlap, but in some cases that might produce a Type II error (false-negative). A better method is to compare the difference between central probabilities (p2-p1) to the Newcombe-Wilson difference interval (also covered in the references below).
The test is given by:
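The difference is not significant if
-sqrt( (p1 - w(1,-))^2 + (w(2,+) - p2)^2 ) < (p2 - p1) < sqrt( (w(1,+) - p1)^2 + (p2 - w(2,-))^2 )
(standard form of the Newcombe-Wilson test; the original image is not available)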
where p1 = central probability for condition1, w(1,-) = lower bound to the Wilson Score for condition 1, w(1,+) = upper bound to the Wilson Score for condition1, etc
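The Newcombe-Wilson test can be sketched in a few lines of Python. This is not the attached CharmDiff.py; the function names are hypothetical, and the example feeds in the rounded four-decimal figures from the CHA comparison later in the thread, so the result only agrees with the reported interval to within rounding.

```python
import math


def newcombe_wilson(p1, lo1, hi1, p2, lo2, hi2):
    """Return (difference, lower, upper) for the difference p2 - p1.

    lo/hi are the Wilson Score bounds for each condition; the interval is
    centred on zero, so a difference inside it is not significant.
    """
    diff = p2 - p1
    lower = -math.sqrt((p1 - lo1) ** 2 + (hi2 - p2) ** 2)
    upper = math.sqrt((hi1 - p1) ** 2 + (p2 - lo2) ** 2)
    return diff, lower, upper


def is_significant(diff, lower, upper):
    """True if the difference falls outside the Newcombe-Wilson interval."""
    return diff < lower or diff > upper


if __name__ == "__main__":
    # Rounded values reported for CHA 115 (condition 1) and CHA 226 (condition 2).
    diff, lower, upper = newcombe_wilson(0.9742, 0.9647, 0.9836,
                                         0.9847, 0.9776, 0.9915)
    print(f"difference {diff:.4f}, interval {lower:.4f} to {upper:.4f}")
    print("significant" if is_significant(diff, lower, upper) else "not significant")
```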

This is performed by CharmDiff.py, also attached. To use this, cut your log file up into separate segments, each corresponding to one controlled value for the condition being tested. Usage is
py -m CharmDiff <95|99> <path to first log file> <path to second log file>
The first argument must be 95 or 99, the level of confidence which you wish to use. It will print results to the terminal, but this can be redirected to a file using the > operator.

We now have the tools needed to investigate charm duration in a meaningful manner, and the following posts will show some results I've gathered. I strongly encourage anyone interested to try this for themselves and post the results they get. Any scientific data are only meaningful to the extent that they can be replicated. If you're looking to compare conditions, just make sure that the logs used are properly controlled. The code used only works for L12 Charm, Beguile and Cajoling Whispers, because I'm level 51 and don't have Allure. If you want to include Allure or other charm spells, it's easy to add them to the regexps and cast duration constants at the start of the files.

It's important to note that anecdotal evidence is completely worthless here. We've all had pets that seem unusually unruly ('OMG I just recharmed the bloody thing and it's broken again!'), but that means nothing. The First Rule of Statistics is: Shit Happens. My personal gut feeling is that charm breaks are more likely to happen when I've taken my hand off the keyboard to pick up a drink, but I haven't worked out a way to test that reliably yet... If you want to generate useful data you need to keep the conditions controlled and record a sufficiently large number of trials - I would recommend at least an hour or so's worth, i.e. 600 or more trials.

Q: You use big wurds and sumz! Y u do sumz? Dey make brane hurt!
A: You've been using Illusion:Troll too much. Here's 10pp, go buy yourself a drink.

The following posts will concern different conditions that are reputed to affect charm duration. If you can't wait, I'm coming to the conclusion that charm success per tick has a fixed probability close to 0.98.

References:
The Wilson Confidence Interval for a Proportion
Binomial Confidence Intervals and Contingency Tests
Interval estimation for the difference between independent proportions
Plotting the Newcombe-Wilson distribution
Sean Wallis, Statistics in Corpus Linguistics Research, 2021

charleski · #2 04-14-2025, 02:39 PM

The topic that's generated the most confusion seems to be the importance of the CHA stat.
I'm going to start off here by saying that it is, as a general principle, impossible to prove a negative. I can't prove to you that CHA has no effect on charm duration. The way statistics works is to start with a null hypothesis (that there is no difference between two conditions) and test whether the evidence shows that you can reject that hypothesis with a specified level of confidence. I.e. you are testing whether or not you can prove a positive effect.

TLDR, the result is:
Unable to reject the null hypothesis that CHA has no effect at the 95% level.

To test this I grabbed a Greater Spurbone in Emerald Jungle and parked it on the south wall. I was level 50, the mob was blue-con, resisted Beguile twice and hit for an observed max of 100, so probably level 38-39. I took off all gear with any CHA on it to take my CHA stat down to its base of 115 (because I followed the folklore and made my character according to the guide …) and proceeded to recharm on each break for around 2 hours. I then put on all the CHA gear I could find, applied the CHA buff to take my CHA up to 226 and repeated the process for another 2 hours or so. This resulted in around 1200 individual tick trials for each condition, and the results are given below:
Input data 1:
File: L50 EJ Gt Spurbone CHA 115.txt
Total trials: 1157
p charm success (per tick): 0.9742
Wilson Score lower bound: 0.9647
Wilson Score upper bound: 0.9836
Input data 2:
File: L50 EJ Gt Spurbone CHA 226.txt
Total trials: 1294
p charm success (per tick): 0.9847
Wilson Score lower bound: 0.9776
Wilson Score upper bound: 0.9915

probability difference: 0.0104

Newcombe-Wilson difference interval: -0.0117, 0.0117
Not significant at 95% level

To illustrate this further, here's a diagram showing the extent to which the two Wilson Scores overlap:
[Diagram not available: overlap of the Wilson Score intervals for the CHA 115 and CHA 226 conditions.]

Now I know that some will be tempted to carp that the increased CHA did show a minor increase in the central success probability number. Unfortunately this fails to appreciate the actual nature of the Wilson Score. The actual probability for each condition may lie at any point within the upper and lower bounds, and these overlap substantially. Furthermore, 95% confidence (2 sigma) is pretty weak-sauce as far as confidence goes and represents the lowest level at which we can start talking about any real difference.

One factor that might come into play is an adjustment called continuity correction. This compensates for approximating the discrete binomial distribution with a continuous curve, and becomes more important with small samples and as the probabilities approach 0 or 1 (as seen here). The programs use continuity correction in their default state, but this does lead to slightly wider Wilson Score intervals. I turned continuity correction off (you just need to change a boolean constant in the code), but it still failed to produce a significant result.

My personal feeling is that the results for the 115 CHA condition just happen to be on the lower part of the range - all my preliminary results were closer to 0.98. But it doesn't really matter: the statistics involved are capable of taking that into account, and do so here.

As I said above, I encourage you to test this for yourselves. Getting the data is rather boring, but the more data the better.

How does this affect standard guidelines? Newbie enchanters should put their points into STR, so they can carry more fine steel back to town to sell.

charleski · #3 04-14-2025, 02:40 PM

What about level? Surely the difference in level affects charm duration?

Result:
Unable to reject the null hypothesis that level difference has no effect at the 95% level.

A few days before performing the experiment with CHA I went to GFay and faced off against the mighty level 2 orc pawn (confirmed white-con to a level 2 player). This test was also performed with a CHA of 115. I was 48 at the time, so the difference was 46 levels, as opposed to 11-12 levels against the greater spurbone in EJ.

Input data 1:
File: L50 EJ Gt Spurbone CHA 115.txt
Total trials: 1157
p charm success (per tick): 0.9742
Wilson Score lower bound: 0.9647
Wilson Score upper bound: 0.9836
Input data 2:
File: orc_pawn_CHA115.txt
Total trials: 2053
p charm success (per tick): 0.9840
Wilson Score lower bound: 0.9784
Wilson Score upper bound: 0.9895

probability difference: 0.0098

Newcombe-Wilson difference interval: -0.0110, 0.0109
Not significant at 95% level

There is one possibility that is not tested here. It may be that level difference only has a manifest effect when it is very small, possibly via an exponential factor that has saturated by the time level difference is 10 or more. Unfortunately I'm unable to test this on my own. If you know of a healer or another high-level enchanter who's willing to spend hours doing nothing other than help getting a L50 mob under control again on breaks, then drop me a line. But be warned: doing this for several hours in a row is not the most exciting experience.

charleski · #4 04-14-2025, 02:40 PM

Finally, what about magic resist? Does that have an effect on charm duration?

I haven't formally tested this yet. This is largely because, as mentioned earlier, I frequently notice charms that last so long that tash wears off, which obviously will have a confounding effect on the succeeding trials, and I'm not sure how to handle these instances.

But frankly, the question of magic resist is largely moot anyway as we know for sure that MR is important in handling charm breaks. When charm breaks you go through the stun-L4mez-reTash-reCharm cycle (strung together with the clicky exploit) and landing the stun and mez are essential components in making that happen smoothly. Successful charming means successful handling of charm breaks, and keeping the mob's MR low is a critical factor in that.

So those Rusty Spiked Shoulderpads are indeed a useful addition, just not in terms of increasing charm duration.

Jimjam · #5 04-14-2025, 03:00 PM

Quote:

Originally Posted by charleski

I haven't formally tested this yet. This is largely because, as mentioned earlier, I frequently notice charms that last so long that tash wears off, which obviously will have a confounding effect on the succeeding trials, and I'm not sure how to handle these instances.

If you duel a conspirator they can refresh tash on your pet before it fades, removing that confounding effect.

bcbrown · #6 04-14-2025, 03:45 PM

Very nice work. It'll take me a couple readings to fully grasp, but on a preliminary basis your conclusions look well-founded. A couple initial comments:

You suggest that charisma does not impact charm duration, and therefore is overvalued as a stat. I don't play an enchanter, but my understanding has always been that high charisma is mainly valued for the impact on the lull line of spells, not charms.

Quote:

Originally Posted by charleski

I haven't formally tested this yet. This is largely because, as mentioned earlier, I frequently notice charms that last so long that tash wears off, which obviously will have a confounding effect on the succeeding trials, and I'm not sure how to handle these instances.

Since every tick is an independent trial, when parsing the log couldn't you track the log entry for tash wearing off and group subsequent trials separately from tash-active trials?

The test comparing durations for the orc pawn versus the EJ skeleton is interesting. I would expect that a 46-level difference would completely saturate any level-dependent effect, but a 10-12 level difference I would have expected to be small enough to show an impact if there was one. I'm happy to volunteer two hours supporting further testing with either a 60 druid or 55 cleric.

charleski · #7 04-15-2025, 12:58 PM

Quote:

Originally Posted by bcbrown

Since every tick is an independent trial, when parsing the log couldn't you track the log entry for tash wearing off and group subsequent trials separately from tash-active trials?

That is certainly possible. I'll admit that I haven't really got round to working on the problem properly. This is partially because, as I mentioned, you definitely do want to keep mob MR low in order to handle the charm breaks smoothly. But this is something to keep in mind for future study.
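A rough Python sketch of that grouping idea follows. It is purely illustrative: the function name and its inputs (charm duration and tash fade time, both in seconds after the charm landed) are hypothetical, it does no log parsing, and it assumes the break coincides with a tick test so that earlier tests sit at six-second steps before it.

```python
TICK_SECONDS = 6


def split_trials_by_tash(duration_s, tash_fade_s=None):
    """Split one charm's tick tests into tash-active and tash-faded groups.

    Returns {"tash": [passes, failures], "no_tash": [passes, failures]}.
    tash_fade_s is None when tash never wore off during the charm.
    """
    passes = int(duration_s // TICK_SECONDS)
    counts = {"tash": [0, 0], "no_tash": [0, 0]}
    for i in range(passes + 1):
        # i = 0 is the failed test at the break; i >= 1 are the earlier passes.
        test_time = duration_s - i * TICK_SECONDS
        tash_active = tash_fade_s is None or test_time < tash_fade_s
        group = "tash" if tash_active else "no_tash"
        if i == 0:
            counts[group][1] += 1
        else:
            counts[group][0] += 1
    return counts


if __name__ == "__main__":
    # A 63-second charm where tash faded 40 seconds in (made-up numbers).
    print(split_trials_by_tash(63, 40))
```

The two groups could then be pooled across many charms and compared in the same way as any other pair of conditions.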

Quote:

The test comparing durations for the orc pawn versus the EJ skeleton is interesting. I would expect that a 46-level difference would completely saturate any level-dependent effect, but a 10-12 level difference I would have expected to be small enough to show an impact if there was one. I'm happy to volunteer two hours supporting further testing with either a 60 druid or 55 cleric.

One possibility that occurred to me is that level difference is factored in an exponential manner, such that small level differences have an appreciable effect, but beyond 10 levels or so it has shrunk to a vanishingly small level.

If you're happy spending several hours doing nothing other than staring at a mob waiting for it to break, drop Feressa a tell in the game. But it is pretty boring.

bcbrown · #8 06-29-2025, 09:04 PM

I've been doing a little digging into Torven's writeups on equemu, and I think he can explain why you didn't see an effect from level difference here:

Quote:

Originally Posted by charleski

A few days before performing the experiment with CHA I went to GFay and faced off against the mighty level 2 orc pawn (confirmed white-con to a level 2 player). This test was also performed with a CHA of 115. I was 48 at the time, so the difference was 46 levels, as opposed to 11-12 levels against the greater spurbone in EJ.

Quote:

Originally Posted by Torven

What we can determine from this is that there is a limit as to how much advantage a player gets from being a higher level than the NPC-- that limit is 20% resist rate, or 40 resist value. Resist rate flattens starting at +9 levels above the NPC-- the level advantage players get over NPCs is capped at 9 levels, and 9 levels corresponds with 20% resist rate. The second item of note is that the rate of change is not linear, and gets stronger the farther away from the NPC's level one is. The warlord and soldier resist rates hit 100% at their two points due to the maximum allowable level hit range.

from https://web.archive.org/web/20200813...ad.php?t=38673

Quote:

Originally Posted by Torven

The resist modifier from level difference is: diff^2 / 2, capped at -40. Charm ticks add +4 to the caster's level here. For example, the Crystalline golem's effective MR vs a level 65 would be: 50 - INT((65-62)^2 / 2) = 46. But for charm ticks: 50 - INT((69-62)^2 / 2) = 26.

from https://www.eqemulator.org/forums/sh...ad.php?t=43370

This suggests there is a level cap, which explains why this data doesn't show any impact from level difference (11-12 levels vs 46 levels)
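To make the arithmetic in the quoted formula concrete, here is a rough Python sketch. The function name and constants are made up for illustration, it only covers a caster at or above the NPC's level, and it assumes the EQEmu figures carry over to P1999, which is not confirmed here.

```python
CHARM_TICK_LEVEL_BONUS = 4   # "Charm ticks add +4 to the caster's level"
MAX_LEVEL_MODIFIER = 40      # the quoted cap on the level advantage


def level_resist_modifier(caster_level, npc_level, charm_tick=False):
    """Reduction in the NPC's effective resist value from level advantage.

    Implements diff^2 / 2 (rounded down), capped at 40, as quoted above.
    """
    effective = caster_level + (CHARM_TICK_LEVEL_BONUS if charm_tick else 0)
    diff = effective - npc_level
    if diff < 0:
        raise ValueError("the quoted formula only covers a caster at or above the NPC's level")
    return min(diff * diff // 2, MAX_LEVEL_MODIFIER)


if __name__ == "__main__":
    # The crystalline golem example from the quote: base MR 50, level 62.
    print(50 - level_resist_modifier(65, 62))                   # initial cast
    print(50 - level_resist_modifier(65, 62, charm_tick=True))  # charm tick
    # The two conditions tested in this thread.
    print(level_resist_modifier(50, 39, charm_tick=True))
    print(level_resist_modifier(48, 2, charm_tick=True))
```

Under this formula both tests in the thread are already at the cap: level 50 against a level 38-39 spurbone and level 48 against a level 2 orc pawn each give the full 40-point reduction on charm ticks.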

Samoht · #9 04-17-2025, 06:09 PM

Quote:

Originally Posted by charleski

How does this affect standard guidelines? Newbie enchanters should put their points into STR, so they can carry more fine steel back to town to sell.

While I support this sentiment 100% for other caster classes, it's just not true for enchanters.

There have already been people who pointed out that we need CHA for lulls.

And the OP has already learned that they had a flawed assumption that charm lands 100% of the time. CHA is assumed to reduce initial check. As is level. As is MR.

So this whole conversation about whether or not CHA affects charm duration might be cool to figure out, but from the two statements above, it is moot because you will have CHA already. And Wilson might say that the tests from the first post show no statistical difference, but this is magic, baby, and the enchanter sees the difference between the two graphs.

So, ultimately, my statement to Wilson is that I am in agreement with the following:

Quote:

Originally Posted by kjs86z2

who cares?

shovelquest · #10 04-14-2025, 03:26 PM

Charms should only last 8 minutes (max), mountains of proof:

https://project1999.com/forums/showp...3&postcount=81

I hope OP's science (that is above my pay grade) and this can put a rest to the debate and crush a bunch of people's joy.