Econ 4650 — Assignment 1

Due Monday, Feb. 1

1. (Labor force participation) I created the table below from the March 2015 Current Population Survey. It shows the joint probability distribution of two variables, F and L. F indicates whether the person is female (F = 1 for female), and L indicates whether the person is in the labor force (L = 1 for being in the labor force). Being in the labor force means you’re either employed or looking for employment. Also, only civilians of working age (16+) can be in the labor force.

          L = 0   L = 1
  F = 0   0.15    0.33
  F = 1   0.22    0.29

Using the table ...
(a) Compute the labor force participation rate (the probability of being in the labor force). What was the official labor force participation rate published by the Bureau of Labor Statistics for March 2015? Even if you compute it correctly from the table, your number won’t exactly match the BLS figure because of rounding in the table and because the BLS number includes a seasonal adjustment, but it should be within 1 percentage point.
(b) Is the labor force participation rate you computed in (a) equal to \(E(L)\)? Why or why not?
(c) Compute \(\Pr(L = 1 \mid F = 0)\) and \(\Pr(L = 1 \mid F = 1)\). What do these represent?
(d) Show how to use the law of iterated expectations to compute the overall labor force participation rate.
(e) Compute \(\sigma_{FL}\), the covariance between labor force participation and whether female.
(f) Is being in the labor force statistically independent of being female? How can you tell?

2. (Standardizing) The random variable Y has a mean of 1 and a variance of 4. Let \(Z = \frac{1}{2}(Y - 1)\). Show that \(\mu_Z = 0\) and \(\sigma_Z^2 = 1\).

3. (Looking up probabilities) Recall how to look up normal and t probabilities from a table.
(a) If Y is distributed N(1, 4) (mean is 1, standard deviation is 4), find \(\Pr(Y \le 3)\). Use Table 1 in the appendix.
(b) If Y is distributed \(t_{90}\) (t with 90 degrees of freedom), find \(\Pr(-1.99 \le Y \le 1.99)\). Use Table 2 in the appendix.
(c) If Y is N(0, 1), find \(\Pr(-1.99 \le Y \le 1.99)\).
(d) Why are the answers to 3b and 3c approximately the same?

4. (Central limit theorem) \(Y_i\), \(i = 1, \ldots, n\), are i.i.d. Bernoulli random variables with p = 0.5, and \(\bar{Y}\) denotes their sample mean. Using the central limit theorem, find \(\Pr(0.45 \le \bar{Y} \le 0.55)\) when n = 10 and when n = 100. What is the minimum n such that \(\Pr(0.45 \le \bar{Y} \le 0.55) > 0.99\)?

5. (Chi-squared) Suppose \(Y_i\) is distributed i.i.d. \(N(0, \sigma^2)\) for \(i = 1, 2, \ldots, n\).
(a) Show that \(E(Y_i^2 / \sigma^2) = 1\). Hint: Use what I showed in class about the expected square of a random variable: \(E(Y^2) = \mu_Y^2 + \sigma_Y^2\). Also, realize that \(\sigma^2\) is a constant. Think of it as a scaling factor.
(b) Show that \(W = \frac{1}{\sigma^2} \sum_{i=1}^{n} Y_i^2\) is distributed \(\chi^2_n\) (chi-squared with n degrees of freedom). Hint: Check the definition of a chi-squared with n degrees of freedom. Why does W fit that definition?
(c) Show that \(E(W) = n\). Hint: W is a linear combination. In class we’ve looked at how to compute the expected value of a linear combination. Also, from part 5a you know the expected value of each component of the linear combination.

6. (Benefits of diversification) Suppose you have some money to invest (for simplicity, $1) and you are planning to put a fraction w into a stock market mutual fund and the rest, 1 - w, into a bond mutual fund. Suppose that $1 invested in a stock fund yields \(R_s\) after 1 year and that $1 invested in a bond fund yields \(R_b\). Suppose that \(R_s\) is random with mean 0.08 (8%) and standard deviation 0.07, and that \(R_b\) is random with mean 0.05 (5%) and standard deviation 0.04. The correlation between \(R_s\) and \(R_b\) is 0.25. If you place a fraction w of your money in the stock fund and the rest, 1 - w, in the bond fund, then the return on your investment is \(R = wR_s + (1 - w)R_b\).
(a) Suppose that w = 0.5. Compute the mean and standard deviation of R.
(b) Suppose that w = 0.75. Compute the mean and standard deviation of R.
(c) What value of w makes the mean of R as large as possible? What is the standard deviation of R for this value of w?
(d) What is the value of w that minimizes the standard deviation of R?

7. (Basic descriptive statistics with R) The dataset cps8.csv (on Canvas) is from the Current Population Survey, a monthly survey of workers carried out by the U.S. Bureau of Labor Statistics. This data is from the March 2008 survey. Each row in the data set corresponds to a person who responded to the survey.

DATA DESCRIPTION: ahe is the average hourly earnings reported by the respondent; yrseduc is the number of years of education; female is a 0-1 indicator of whether the respondent is female (0 = no, 1 = yes); age is the respondent’s age; and northeast, midwest, south, and west are 0-1 indicators of whether the respondent works in the northeastern, midwestern, southern, or western part of the U.S.

Note: Parts (c) through (k) require R. Turn in the R code you used, all in one file please, to Canvas. I should be able to reproduce your results from your code, so before uploading please check that your code really works and is consistent with the answers you’ve given.

(a) Is this a cross-sectional, time series, or panel data set? Or have I not given you enough information to tell? Explain your answer.
(b) How many people/respondents are represented in this survey?
(c) What proportion of respondents are female?
(d) What is the average ahe among females? Among males?
(e) What is the variance of ahe among females? Among males?
(f) Plot ahe (y-axis) versus age (x-axis). Does there appear to be an association between ahe and age? Describe what you see.
(g) Compute the mean ahe for each value of age and add these to the plot.
(h) Compute the covariance and correlation between ahe and age.
(i) Plot ahe (y-axis) versus yrseduc (x-axis). Does there appear to be an association between ahe and yrseduc? Describe what you see.
(j) Compute the mean ahe for each value of yrseduc and add these to the plot.
(k) Compute the covariance and correlation between ahe and yrseduc.

8. (DataCamp) Complete the DataCamp course “Introduction to R.” Upon completion you’ll get a certificate. Upload your DataCamp certificate to Canvas to receive full credit for this exercise.
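
For the R parts of problem 7, the sketch below shows one possible pattern for group-wise summaries and for overlaying group means on a scatter plot. It runs on a small made-up data frame: the values are hypothetical and are not taken from cps8.csv, and only the column names come from the data description. The object names (cps, mean_ahe_by_sex, and so on) are also just illustrative choices. Treat it as a starting point under those assumptions rather than as the expected solution.

```r
# Hypothetical toy data that borrows the column names from the data description.
# The numbers are invented for illustration. With the real file you would
# instead use something like: cps <- read.csv("cps8.csv")
# (this assumes cps8.csv is in your working directory).
cps <- data.frame(
  ahe     = c(12.5, 20.0, 15.0, 30.0, 18.0, 22.0),
  yrseduc = c(12, 16, 12, 18, 14, 16),
  female  = c(1, 0, 1, 0, 1, 0),
  age     = c(25, 30, 25, 30, 34, 34)
)

# Number of rows and the share of rows with female == 1
n_rows <- nrow(cps)
share_female <- mean(cps$female)

# tapply() applies a function to ahe within each level of a grouping variable
mean_ahe_by_sex <- tapply(cps$ahe, cps$female, mean)
var_ahe_by_sex  <- tapply(cps$ahe, cps$female, var)

# Scatter plot of ahe against age, with the mean ahe at each age overlaid
plot(cps$age, cps$ahe, xlab = "age", ylab = "ahe")
mean_ahe_by_age <- tapply(cps$ahe, cps$age, mean)
points(as.numeric(names(mean_ahe_by_age)), mean_ahe_by_age,
       pch = 19, col = "red")

# Covariance and correlation between two columns
cov_ahe_age <- cov(cps$ahe, cps$age)
cor_ahe_age <- cor(cps$ahe, cps$age)

print(mean_ahe_by_sex)
print(var_ahe_by_sex)
print(c(cov = cov_ahe_age, cor = cor_ahe_age))
```

Note that var() and cov() in R divide by n - 1, so they report the sample variance and sample covariance. The same tapply() and points() pattern should carry over to yrseduc by swapping the grouping column.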