**This section of sample problems and solutions is a part of The Actuary’s Free Study Guide for Exam 4 / Exam C, authored by Mr. Stolyarov. This is Section 24 of the Study Guide. See an index of all sections by following the link in this paragraph.**

Some of the problems in this section were designed to be similar to problems from past versions of Exam 4/C, offered jointly by the Casualty Actuarial Society and the Society of Actuaries. They use original exam questions as their inspiration – and the specific inspiration is cited to give students an opportunity to see the original. All of the original problems are publicly available, and students are encouraged to refer to them. But all of the values, names, conditions, and calculations in the problems here are the original work of Mr. Stolyarov.

This section will also serve to give some practice with the **Chi-square goodness of fit test.**

The **Chi-square distribution ($\chi^2$ distribution)** can be defined as follows:

The p.d.f. of $U = \sum_{j=1}^{m} Z_j^2$, where each $Z_j$ is a standard normal random variable independent of all the other $Z_j$’s, is called the **Chi-square distribution with m degrees of freedom** (Larsen and Marx 2006, p. 474).

It is easy to remember the Chi-square distribution as the p.d.f. that results from adding the *squares* of m *independent standard normal random variables,* where m is the number of degrees of freedom for the Chi-square distribution.
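The definition can be illustrated with a small simulation. The sketch below draws sums of squared standard normal values using only the Python standard library; the seed, the sample size, and the choice m = 3 are arbitrary assumptions made for illustration. It relies on the fact that a Chi-square random variable with m degrees of freedom has mean m.

```python
import random

def chi_square_draw(m, rng):
    """One draw of U = Z_1^2 + ... + Z_m^2 with independent standard normal Z_j."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))

rng = random.Random(2024)   # fixed seed so the run is repeatable (arbitrary choice)
m = 3                       # degrees of freedom (arbitrary choice)
draws = [chi_square_draw(m, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print(f"sample mean of U with m={m}: {sample_mean:.3f}")
```

The printed sample mean should land close to m, which is a quick way to convince yourself that the sum-of-squares description and the degrees-of-freedom parameter refer to the same thing.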

The chi-square distribution has **critical values** associated with a given **significance level** and a number of degrees of freedom. You can look up these critical values in a Table of Critical Values for the Chi-Square Distribution.
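If no table is at hand, a critical value can also be approximated numerically. The following is a minimal sketch, not a substitute for the exam tables: it evaluates the Chi-square CDF through the standard series for the regularized lower incomplete gamma function and then finds the critical value by bisection. The function names `chi2_cdf` and `chi2_critical_value` are illustrative, and the series is only intended for the moderate arguments that appear in typical tables.

```python
import math

def chi2_cdf(x, df):
    """CDF of the chi-square distribution via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / math.gamma(a + 1.0)   # n = 0 term: y^0 / Gamma(a + 1)
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= y / (a + n)            # now equals y^n / Gamma(a + n + 1)
        total += term
    return total * math.exp(a * math.log(y) - y)

def chi2_critical_value(alpha, df):
    """Value c with P(chi-square_df <= c) = 1 - alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:   # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(chi2_critical_value(0.05, 2), 3))
```

For 2 degrees of freedom the CDF reduces to $1 - e^{-x/2}$, so the result can be checked by hand against $-2\ln(0.05)$, which agrees with the tabulated 5.991.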

Larsen and Marx define a **goodness of fit test** as “any procedure that seeks to determine whether a set of data could reasonably have originated from some given probability distribution, or *class* of probability distributions” (Larsen and Marx 2006, p. 599).

The **Chi-square goodness of fit test** can be phrased as follows.

“Let $r_1, r_2, \ldots, r_t$ be the set of possible outcomes (or ranges of outcomes) associated with each of n independent trials, where $P(r_i) = p_i$, for $i = 1, 2, \ldots, t$. Let $X_i$ = the number of times $r_i$ occurs. Then the following holds.

a. The random variable $D = \sum_{i=1}^{t} \frac{(X_i - np_i)^2}{np_i}$ has approximately a $\chi^2$ distribution with $t-1$ degrees of freedom. For the approximation to be adequate, the t classes should be defined so that $np_i \geq 5$ for all i.

b. Let $k_1, k_2, \ldots, k_t$ be the observed frequencies for the outcomes $r_1, r_2, \ldots, r_t$, respectively, and let $np_{1_o}, np_{2_o}, \ldots, np_{t_o}$ be the corresponding expected frequencies based on the null hypothesis. At the $\alpha$ level of significance, $H_0: f_Y(y) = f_o(y)$ (or $H_0: p_X(k) = p_o(k)$) is rejected if

$d = \sum_{i=1}^{t} \frac{(k_i - np_{i_o})^2}{np_{i_o}} \geq \chi^2_{1-\alpha,\,t-1}$.” (Larsen and Marx 2006, p. 616)

For instance, if we have a set of observations sorted into t types, and we know both the expected and the actual number of observations of each type, then we can let $k_i$ be the actual number of observations of type i and $np_{i_o}$ be the expected number of observations of type i. Then we can find $d = \sum_{i=1}^{t} \frac{(k_i - np_{i_o})^2}{np_{i_o}}$ and compare it to

$\chi^2_{1-\alpha,\,t-1}$, the critical value of the chi-square distribution with $t-1$ degrees of freedom at significance level $\alpha$ (that is, the $100(1-\alpha)$th percentile of that distribution).

If $d \geq \chi^2_{1-\alpha,\,t-1}$, then we reject the null hypothesis $H_0$.
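The decision rule can be sketched in a few lines of Python. The function name `chi_square_gof` and the die-rolling counts are hypothetical and serve only to illustrate the procedure; the critical value 11.070 is the tabulated value for α = 0.05 and 5 degrees of freedom, passed in by hand as one would read it from a table.

```python
def chi_square_gof(observed, expected, critical_value):
    """Return (d, reject) for the chi-square goodness-of-fit test.

    observed: the counts k_i; expected: the counts n*p_i under H0;
    critical_value: the table value for the chosen alpha and t-1 df.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e < 5 for e in expected):
        raise ValueError("each expected count should be at least 5")
    d = sum((k - e) ** 2 / e for k, e in zip(observed, expected))
    return d, d >= critical_value

# Hypothetical data: 120 rolls of a die, with H0 = "the die is fair".
observed = [25, 17, 15, 23, 24, 16]
expected = [120 / 6] * 6
# 11.070: tabulated critical value for alpha = 0.05 and 6 - 1 = 5 degrees of freedom.
d, reject = chi_square_gof(observed, expected, critical_value=11.070)
print(d, reject)
```

With these made-up counts the statistic is about 5, which is below the critical value, so the null hypothesis would not be rejected. The check on expected counts mirrors the condition $np_i \geq 5$ stated in part (a) of the theorem.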

**Sources:**

Broverman, Sam. Actuarial Exam Solutions – CAS Exam 3 – Fall 2006.

Exam C Sample Questions and Solutions from the Society of Actuaries.

Larsen, Richard J. and Morris L. Marx. *An Introduction to Mathematical Statistics and Its Applications.* Fourth Edition. Pearson Prentice Hall: 2006. pp. 474, 599, 616.

**Original Problems and Solutions from The Actuary’s Free Study Guide**

**Problem S4C24-1. Similar to Question 11 of the** **Exam C Sample Questions from the Society of Actuaries.** A Pareto distribution has pdf $f(x \mid \theta) = \frac{2\theta^2}{(x+\theta)^3}$. There is a 1/3 probability that θ = 2 and a 2/3 probability that θ = 6. Variable X follows this distribution and denotes the number of blue slugs that will land on the roof of Mr. Zolatrax’s house in a given month. In April, it is known that 4 blue slugs landed on the roof of Mr. Zolatrax’s house. Given this information, find the probability that more than 10 slugs will land on the roof of Mr. Zolatrax’s house in May. (This is called a **posterior probability**.)

**Solution S4C24-1.** First, we find what the probability of 4 slugs having landed on Mr. Zolatrax’s roof is, given each of the possible values of θ:

$f(4 \mid 2) = \frac{2 \cdot 2^2}{(4+2)^3} = \frac{1}{27}$

$f(4 \mid 6) = \frac{2 \cdot 6^2}{(4+6)^3} = \frac{9}{125}$

Now we want to find the probability that θ = 2, *given* our past observation, X = 4:

$\Pr(\theta = 2 \mid X = 4) = \frac{\Pr(\theta = 2)\Pr(X = 4 \mid \theta = 2)}{\Pr(X = 4)} =$

$\frac{\Pr(\theta = 2)\Pr(X = 4 \mid \theta = 2)}{\Pr(\theta = 2)\Pr(X = 4 \mid \theta = 2) + \Pr(\theta = 6)\Pr(X = 4 \mid \theta = 6)}$.

We know that $\Pr(\theta = 2) = 1/3$ and $\Pr(X = 4 \mid \theta = 2) = f(4 \mid 2) = 1/27$.

We also know that $\Pr(\theta = 6) = 2/3$ and $\Pr(X = 4 \mid \theta = 6) = f(4 \mid 6) = 9/125$.

Thus, $\Pr(\theta = 2 \mid X = 4) = \frac{(1/3)(1/27)}{(1/3)(1/27) + (2/3)(9/125)} = \frac{125}{611} = 0.2045826514$.

From this it follows that $\Pr(\theta = 6 \mid X = 4) = 1 - \Pr(\theta = 2 \mid X = 4) = \frac{486}{611} = 0.7954173486$.

In calculating $S_X(10)$, we will thus assume that there is a 125/611 probability that θ = 2 and a 486/611 probability that θ = 6.

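Now we need to find the survival function for a given θ: $S_X(x \mid \theta) = 1 - \int_0^x \frac{2\theta^2}{(t+\theta)^3}\,dt = 1 - \left[-\frac{\theta^2}{(t+\theta)^2}\right]_0^x = 1 - \left(1 - \frac{\theta^2}{(x+\theta)^2}\right) = \frac{\theta^2}{(x+\theta)^2}$.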

Hence, $S_X(10) = \frac{125}{611} \cdot \frac{2^2}{(10+2)^2} + \frac{486}{611} \cdot \frac{6^2}{(10+6)^2}$, so **$S_X(10) = 0.1175384161$**.
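The arithmetic of this solution can be reproduced with exact fractions. The sketch below follows the same steps (densities at the observed value, Bayes' theorem, then the posterior-weighted survival function); the helper names `pareto_pdf` and `pareto_sf` are illustrative.

```python
from fractions import Fraction

def pareto_pdf(x, theta):
    """f(x | theta) = 2 theta^2 / (x + theta)^3."""
    return Fraction(2 * theta**2, (x + theta) ** 3)

def pareto_sf(x, theta):
    """S(x | theta) = theta^2 / (x + theta)^2."""
    return Fraction(theta**2, (x + theta) ** 2)

prior = {2: Fraction(1, 3), 6: Fraction(2, 3)}
observed_x = 4

# Bayes' theorem: posterior weight is proportional to prior times density at x = 4.
joint = {theta: p * pareto_pdf(observed_x, theta) for theta, p in prior.items()}
marginal = sum(joint.values())
posterior = {theta: w / marginal for theta, w in joint.items()}

# Posterior-weighted probability that next month's count exceeds 10.
s10 = sum(posterior[theta] * pareto_sf(10, theta) for theta in prior)
print(posterior[2], posterior[6], float(s10))
```

Working in `Fraction` keeps 125/611 and 486/611 exact, so any disagreement with the hand calculation would point to an algebra slip rather than to rounding.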

**Problem S4C24-2. Similar to Question 13 of the** **Exam C Sample Questions from the Society of Actuaries.** There are three colors of cat-dogs: yorange, grue, and rurple. The following are their historical probabilities of occurring: yorange: 0.43; grue: 0.24; rurple: 0.33. In a sample of 1000 cat-dogs, it is observed that 465 are yorange, 198 are grue, and 337 are rurple. You are testing the hypothesis that the observed probabilities of each of the three colors are the same as the historical probabilities. Find the Chi-square goodness of fit statistic for this test.

**Solution S4C24-2.** Our test statistic is calculated as follows: $\sum_{i=1}^{t} \frac{(X_i - np_i)^2}{np_i}$. Here, the expected numbers of each color of cat-dog, based on the historical probabilities, are as follows:

Yorange: 0.43*1000 = 430; Grue: 0.24*1000 = 240; Rurple: 0.33*1000 = 330.

These are our values of $np_i$ in each term of the formula calculating the Chi-square statistic:

$\chi^2 = \frac{(465-430)^2}{430} + \frac{(198-240)^2}{240} + \frac{(337-330)^2}{330}$, so **$\chi^2 = 10.34732206$**.
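The statistic is easy to recompute from the problem data. The sketch below also reports a p-value, which the problem does not ask for; it is included only as an optional extra and uses the fact that, with $t - 1 = 2$ degrees of freedom, the chi-square survival function is exactly $e^{-d/2}$.

```python
import math

historical = {"yorange": 0.43, "grue": 0.24, "rurple": 0.33}
observed = {"yorange": 465, "grue": 198, "rurple": 337}
n = sum(observed.values())

# Sum of (observed - expected)^2 / expected over the three colors.
statistic = sum(
    (observed[c] - n * p) ** 2 / (n * p) for c, p in historical.items()
)
# With 2 degrees of freedom, P(chi-square > d) = exp(-d / 2) exactly.
p_value = math.exp(-statistic / 2)
print(round(statistic, 8), p_value)
```

The closed form for the p-value holds only for 2 degrees of freedom; for other cases a table or a numerical routine would be needed.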

**Problem S4C24-3. Similar to Question 19 of the** **Exam C Sample Questions from the Society of Actuaries.** A random variable X follows a probability distribution with the following pdf: $f_X(x) = \frac{1}{4} \cdot \frac{1}{\theta} e^{-x/\theta} + \frac{3}{4} \cdot \frac{1}{\sigma} e^{-x/\sigma}$. A sample of 200 values is obtained, with the following sample data: $\sum_{i=1}^{200} x_i = 96000$; $\sum_{i=1}^{200} x_i^2 = 80000000$. It is known that θ > σ. Assuming that the sample is representative of the population, estimate the value of θ by using the moments of the sample distribution. Use the Exam 4 / C Tables where necessary.

**Solution S4C24-3.** We note that the distribution in question is a mixture of two exponential distributions. The exponential distribution with mean θ has weight (1/4), and the exponential distribution with mean σ has weight (3/4). Thus, E(X) = 0.25θ + 0.75σ, which, using the first moment of the sample, we know to be 96000/200 = 480.

The second moment of an exponential distribution with mean θ is $2\theta^2$, so in this case, $E(X^2) = 0.25 \cdot 2\theta^2 + 0.75 \cdot 2\sigma^2 = 0.5\theta^2 + 1.5\sigma^2$. Using the second moment of the sample, we know this to be equal to 80000000/200 = 400000. Thus, we must solve the following system of equations:

$0.25\theta + 0.75\sigma = 480$

$0.5\theta^2 + 1.5\sigma^2 = 400000$.

0.25θ + 0.75σ = 480 → σ = (480 – 0.25θ)/0.75 = 640 – θ/3.

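This means that $400000 = 0.5\theta^2 + 1.5(640 - \theta/3)^2$ →

$400000 = 0.5\theta^2 + 1.5(409600 - (1280/3)\theta + \theta^2/9)$ →

$0 = (2/3)\theta^2 - 640\theta + 214400$.

The discriminant of this quadratic is $640^2 - 4(2/3)(214400) = 409600 - 571733.33 = -162133.33 < 0$, so the Quadratic Formula yields no real value of θ. This is consistent with a general property: any mixture of exponential distributions has $E(X^2) \geq 2E(X)^2$, which here would require a second moment of at least $2(480)^2 = 460800$, whereas the sample second moment is only 400000.

Thus, for the sample data as given, **the moment equations have no real solution, and the method of moments does not produce an estimate of θ**.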
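A simple safeguard when solving a system like this one is to substitute any candidate pair (θ, σ) back into both moment equations. The sketch below does that for the mixture in this problem; the function names `mixture_moments` and `residuals` are illustrative, and the trial point is arbitrary rather than a proposed answer.

```python
def mixture_moments(theta, sigma, w_theta=0.25, w_sigma=0.75):
    """First and second raw moments of the two-exponential mixture."""
    mean = w_theta * theta + w_sigma * sigma
    second = w_theta * 2 * theta**2 + w_sigma * 2 * sigma**2
    return mean, second

def residuals(theta, sigma, n=200, sum_x=96000, sum_x2=80000000):
    """Model moments minus sample moments; both are zero at an exact solution."""
    mean, second = mixture_moments(theta, sigma)
    return mean - sum_x / n, second - sum_x2 / n

# Arbitrary trial point (not a proposed answer): theta = 960, sigma = 320.
print(residuals(960, 320))
```

A pair that solves the system makes both residuals zero; a nonzero second residual alongside a zero first residual, as at this trial point, means the pair matches the sample mean but not the sample second moment.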

**Problem S4C24-4. Similar to Question 25 of the** **Exam C Sample Questions from the Society of Actuaries.** You are modeling the number of earthquakes in a variety of counties per year. A sample of 100 counties produced the following results:

53 counties had 0 earthquakes.

23 counties had 1 earthquake.

13 counties had 2 earthquakes.

4 counties had 3 earthquakes.

3 counties had 4 earthquakes.

3 counties had 5 earthquakes.

1 county had 6 earthquakes.

Which of the following distributions would be best for modeling this data?

(a) Poisson

(b) Negative binomial

(c) Binomial

(d) Discrete uniform

(e) Either Poisson or binomial.

**Solution S4C24-4.** We compare the variance to the mean. We find $E(X) = (0 \cdot 53 + 1 \cdot 23 + 2 \cdot 13 + 3 \cdot 4 + 4 \cdot 3 + 5 \cdot 3 + 6 \cdot 1)/100 = 0.94$.

$E(X^2) = (0^2 \cdot 53 + 1^2 \cdot 23 + 2^2 \cdot 13 + 3^2 \cdot 4 + 4^2 \cdot 3 + 5^2 \cdot 3 + 6^2 \cdot 1)/100 = 2.7$.

$Var(X) = E(X^2) - E(X)^2 = 2.7 - 0.94^2 = 1.8164$.

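Here, the variance is almost twice as large as the mean. Earthquakes in the sample are clearly not uniformly distributed between 0 and 6, since more than half of the counties had 0 earthquakes. Of the remaining distributions listed, the Poisson has a variance equal to its mean and the binomial has a variance smaller than its mean; only the negative binomial has a variance larger than its mean. Thus, the correct answer is **(b) Negative binomial**.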
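The mean-versus-variance comparison can be sketched as follows. The selection rule at the end is only the rough heuristic used in this solution (variance above, below, or equal to the mean), not a formal test, and an exact equality between a sample mean and sample variance would rarely occur in practice.

```python
# Number of earthquakes -> number of counties, from the problem statement.
counts = {0: 53, 1: 23, 2: 13, 3: 4, 4: 3, 5: 3, 6: 1}
n = sum(counts.values())

mean = sum(k * c for k, c in counts.items()) / n
second_moment = sum(k**2 * c for k, c in counts.items()) / n
variance = second_moment - mean**2

# Heuristic: compare the sample variance with the sample mean.
if variance > mean:
    suggestion = "negative binomial"
elif variance < mean:
    suggestion = "binomial"
else:
    suggestion = "Poisson"
print(mean, second_moment, round(variance, 4), suggestion)
```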

**Problem S4C24-5. Similar to Question 28 of the** **Exam C Sample Questions from the Society of Actuaries.** 100 giraffes are measured for their height (X) in centimeters. The following data are observed:

34 giraffes are between 300 and 400 centimeters in height.

54 giraffes are between 400 and 500 centimeters in height.

12 giraffes are between 500 and 600 centimeters in height.

Assume that the heights are uniformly distributed within each interval.

Find $E(X^2) - E((X \wedge 570)^2)$.

**Solution S4C24-5.**

We use the formulas $E(X^k) = \int_{-\infty}^{\infty} x^k f(x)\,dx$ and $E((X \wedge u)^k) = \int_{-\infty}^{u} x^k f(x)\,dx + u^k(1 - F(u))$.

Thus, $E(X^2) - E((X \wedge 570)^2) = \int_{300}^{600} x^2 f(x)\,dx - \int_{300}^{570} x^2 f(x)\,dx - 570^2(1 - F(570))$ →

$E(X^2) - E((X \wedge 570)^2) = \int_{570}^{600} x^2 f(x)\,dx - 570^2(1 - F(570))$.

The uniform distribution on the interval from 500 to 600 has a pdf of (1/100), multiplied by the weight of (12/100) – as only 12 of the 100 observed giraffes are within the interval in question. Thus, the relevant value of f(x) is 12/10000 = 0.0012.

1 – F(570) is the same as S(570) = the probability that a giraffe is taller than 570 centimeters. Because we assume a uniform distribution within each interval, 12(600-570)/(600-500) = 3.6 of 100 giraffes can be expected to be taller than 570 centimeters. Therefore, 1 – F(570) = 0.036, and so

$E(X^2) - E((X \wedge 570)^2) = \int_{570}^{600} 0.0012x^2\,dx - 570^2(0.036) = 12322.8 - 11696.4$, so

**$E(X^2) - E((X \wedge 570)^2) = 626.4$**.
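The same result can be obtained by computing both limited second moments in full and subtracting. The sketch below does this with exact fractions under the stated assumption of uniform heights within each interval; the function name `limited_second_moment` is illustrative.

```python
from fractions import Fraction

# (lower bound, upper bound, number of giraffes) for each height interval
intervals = [(300, 400, 34), (400, 500, 54), (500, 600, 12)]
n = sum(count for _, _, count in intervals)

def limited_second_moment(u):
    """E[min(X, u)^2] when heights are uniform within each interval."""
    total = Fraction(0)
    for a, b, count in intervals:
        density = Fraction(count, n * (b - a))
        top = min(b, u)
        if top > a:
            # integral of x^2 * density over [a, top]
            total += density * Fraction(top**3 - a**3, 3)
        if u < b:
            # probability mass of this interval lying above u, valued at u^2
            total += u**2 * density * (b - max(a, u))
    return total

# A limit of 600 is the top of the data, so it gives the unlimited E(X^2).
difference = limited_second_moment(600) - limited_second_moment(570)
print(float(difference))
```

Computing each moment separately is more work than the shortcut in the solution, but it confirms that everything below 570 cancels and only the top interval contributes to the difference.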
