Researchers often deal with proportional dependent variables that are logit-linear. In other words,
$$\operatorname{logit}(y) = \ln\left(\frac{y}{1-y}\right) = X\beta + \varepsilon$$
where y is the observed number within some group (n) divided by the total number of observations (N). As an empirical example, in my own work, y is commonly the proportion of seats held by females in the national legislature, i.e. the number of women divided by the number of seats.
Here’s a hypothetical distribution of a logit-normal variable that was generated using Stata’s random number generator:
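A variable like this can be simulated along the following lines. This is a minimal sketch rather than the code behind the figure: the seed, sample size, mean and standard deviation are illustrative assumptions, and the names `xb` and `y` are hypothetical. It draws normal values on the logit scale and maps them back to the (0, 1) interval.

```stata
* Run as a do-file. All numeric settings below are illustrative assumptions.
clear
set seed 12345
set obs 1000
generate double xb = rnormal(-1, 1)   // normal draws on the logit scale
generate double y  = invlogit(xb)     // map back to the (0, 1) interval
summarize y
histogram y, bin(30)
```

Because `y` is the inverse logit of a normal variable, `logit(y)` is normally distributed by construction.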
Note that the logit transformation is only defined for values strictly between zero and one; the logit of exactly zero or one is undefined. Thus, researchers working with proportional data on the closed interval [0, 1] will need to ‘winsorize’ their observations, replacing zeros with a value slightly above zero and ones with a value slightly below one. Because the choice of replacement value is arbitrary, this introduces some bias into the model.
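The adjustment described above might look like this in Stata. It is a sketch on a toy dataset: the variable names `y` and `y_w` are hypothetical, and the bounds 0.01 and 0.99 are assumptions chosen for illustration, not a recommended standard.

```stata
* Toy data with values at both boundaries (hypothetical example)
clear
set obs 5
generate double y = (_n - 1) / 4              // 0, .25, .5, .75, 1

* Replace boundary values before taking the logit
generate double y_w = y
replace y_w = 0.01 if y <= 0
replace y_w = 0.99 if y >= 1 & !missing(y)    // missing counts as large in Stata

generate double logit_y = logit(y_w)
list y y_w logit_y
```

Without the two `replace` lines, `logit(y)` would be missing for the first and last observations.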
A common problem, then, is interpreting the coefficients from this model: we usually want the impact of a one-unit change in x on the dependent variable y itself, rather than on the logit transformation of y.
This is particularly complicated if we continue to think in terms of proportions rather than ratios. In effect, the logit transformation of any proportion reduces to the natural log of the ratio of some in-group to some out-group. The in-group is the group we are interested in studying (n). In the previous example, females are the in-group. The out-group is everyone else, i.e. the reference category (z = N − n). When looking at female representation, the out-group is males.
Some simple rearranging will make it clear. Let n denote the number of members within the in-group, let z denote the number of members in the out-group, and let N represent the total number of observations (i.e. both groups combined).
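Using only the definitions of n, z and N given here, the rearrangement can be written out as follows:

```latex
\begin{aligned}
y &= \frac{n}{N}, \qquad 1 - y = \frac{N - n}{N} = \frac{z}{N} \\[4pt]
\operatorname{logit}(y) &= \ln\left(\frac{y}{1 - y}\right)
  = \ln\left(\frac{n/N}{z/N}\right)
  = \ln\left(\frac{n}{z}\right)
\end{aligned}
```

The total N cancels, so the logit of the proportion is simply the log of the ratio of the in-group to the out-group.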
Example using Stata
In 2008, Tripp & Kang published an article in Comparative Political Studies that included data on women’s representation in 153 countries in 2006. They utilized a logit-linear model with the proportion of seats held by women as the dependent variable:
$$\operatorname{logit}(\text{rep2006}) = X\beta + \varepsilon$$
where rep2006 is the proportion of seats held by women in 2006. Note that, like many others, Tripp and Kang observe zeros and thus winsorize them to 0.01. Their model reduces to a log-linear model of the ratio of females to males in the legislature:
$$\ln\left(\frac{\text{females}}{\text{males}}\right) = X\beta + \varepsilon$$
Using data from Kang’s website, I re-estimate their “Model C” in Stata. The raw output is below.
Of course, these are identical to the results reported by Tripp & Kang in Table 2 of their publication. Their dataset contains the logit-transformed variable, rep06. Alternatively, one can create a logit-transformed dependent variable with the following command:
. gen logit_y = logit(y)
where y is the observed proportional dependent variable. Stata can calculate the inverse logit of a single value using the display command:
. display invlogit([value])
and can also calculate an inverse logit variable using the generate command:
. gen invlogit_y = invlogit(logit_y)
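As a quick check that the two functions undo each other, the round trip can be tried on a toy variable. This is a sketch on made-up values; the names `y` and `y_back` are hypothetical.

```stata
* Toy proportions strictly inside (0, 1)
clear
set obs 9
generate double y = _n / 10                   // 0.1, 0.2, ..., 0.9

generate double logit_y = logit(y)            // to the logit scale
generate double y_back  = invlogit(logit_y)   // and back to a proportion

* The round trip should recover y up to floating-point error
assert abs(y_back - y) < 1e-12

display invlogit(0)                           // a logit of 0 is a proportion of .5
```

Storing the variables as `double` keeps rounding error small enough for the `assert` to pass.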
Rather than using Clarify to simulate predicted changes in the proportion of seats held by women, I recommend interpreting these results in terms of the predicted percentage change in the ratio of females to males.
The two key variables in Model C are quota, a dummy variable coded as one (1) if the country had any sort of gender quota in place during the previous election, and prelect, which is coded as one (1) if the country utilized proportional representation during the previous election. Following the equation above, the coefficients for these variables can be interpreted in terms of a percentage change in the ratio of females to males.
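For a dummy variable in a model of the log ratio, the standard conversion from a coefficient to a percentage change can be written as below. This is offered as a sketch of the general log-linear result, on the assumption that it is the calculation intended here; the symbol β_d for the dummy's coefficient is introduced only for illustration.

```latex
\begin{aligned}
\ln\left(\frac{n}{z}\right) &= \beta_0 + \beta_d\, d + \dots + \varepsilon,
  \qquad d \in \{0, 1\} \\[4pt]
\frac{(n/z)\big|_{d=1}}{(n/z)\big|_{d=0}} &= e^{\beta_d} \\[4pt]
\%\Delta\left(\frac{n}{z}\right) &= 100\,\bigl(e^{\beta_d} - 1\bigr)
\end{aligned}
```

Switching the dummy from zero to one multiplies the ratio of females to males by the exponentiated coefficient, holding the other covariates fixed.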
Countries with proportional representation have, on average, a 53.8% higher ratio of females to males in the legislature, all else being equal.
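After estimation, percentage changes of this kind can be computed directly from the stored coefficients. The sketch below runs on simulated data, not the Tripp & Kang dataset: the variable names quota and prelect follow the text, but the data-generating values are made up, the model omits the other Model C covariates, and it assumes the percentage change is computed as 100 × (exp(b) − 1).

```stata
* Simulated stand-in data (NOT the Tripp & Kang data; values are hypothetical)
clear
set seed 2006
set obs 153
generate byte quota   = runiform() < 0.4
generate byte prelect = runiform() < 0.5
generate double logit_y = -2 + 0.5*quota + 0.4*prelect + rnormal(0, 0.5)

* Linear regression on the logit-transformed dependent variable
regress logit_y quota prelect

* Percentage change in the in-group/out-group ratio for each dummy
display "quota:   " 100 * (exp(_b[quota])   - 1)
display "prelect: " 100 * (exp(_b[prelect]) - 1)
```

With real data, the `regress` line would be replaced by the full model specification, and the two `display` lines would stay the same.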
To wrap up, here’s a graph comparing the observed values to the fitted values in each interpretation method. As you can see, the slopes of the two lines are nearly identical, with some slight differences due to error.