Edited by: Holmes Finch, Ball State University, United States

Reviewed by: Seock-Ho Kim, University of Georgia, United States; Kuan-Yu Jin, Hong Kong Examinations and Assessment Authority, Hong Kong, SAR China

†Deceased

This article was submitted to Quantitative Psychology and Measurement, a section of the journal Frontiers in Psychology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Compositional items – a form of forced-choice items – require respondents to allocate a fixed total number of points to a set of statements. To describe the responses to these items, the Thurstonian item response theory (IRT) model was developed. Despite its prominence, the model requires that items composed of parts of statements result in a factor loading matrix with full rank. Without this requirement, the model cannot be identified, and the latent trait estimates would be seriously biased. Besides, the estimation of the Thurstonian IRT model often results in convergence problems. To address these issues, this study developed a new version of the Thurstonian IRT model for analyzing compositional items – the lognormal ipsative model (LIM) – that would be sufficient for tests using items with all statements positively phrased and with equal factor loadings. We developed an online value test following Schwartz’s values theory using compositional items and collected response data from a sample size of

Test of non-cognitive constructs, such as personality traits (

Most self-report questionnaires use the single-stimulus format to assess multidimensional, non-cognitive, latent traits. For example, the NEO Personality Inventories measure five personality traits using 240 items, each of which contains five response options that range from

The classical scoring of forced-choice format yields ipsative scores (

The fundamental differences between normative and ipsative scores are the referenced criterion and the explanation of the scores (

Several types of forced-choice items have been described in the literature, such as pairwise comparison (

Example of compositional items.

An empirical example of using compositional items in the social sciences is the Organizational Culture Assessment Instrument (OCAI), through which respondents are asked to allocate 100 points to four statements in each of the six items. Each statement in an item is designed to measure a distinct dimension of organizational culture (i.e., clan, adhocracy, hierarchy, and market). Aside from its popularity in assessing organizational culture, the OCAI has been revised for assessing classroom culture (

Several models have been developed to analyze ipsative tests with categorical data, including the Thurstonian item response theory models for forced-choice items (Thurstonian IRT models;

Although the TMC (

One way for test developers to overcome the problem of a non-full-rank loading matrix in Thurstonian IRT is to increase the number of latent traits measured in the test (

Including negatively keyed statements in forced-choice items is another way to satisfy the full-rank requirement. However, the identification problem can still arise when the number of dimensions is low.

In contrast, the Rasch ipsative model (RIM;

In the present study, we follow the approach of the RIM study but apply it to compositional data. More specifically, this study aims to develop a measurement model for multidimensional compositional items as an alternative version of the Thurstonian IRT models. The new model is mathematically nested in the TMC (

The remainder of this article is organized as follows: We first discuss the theoretical background and introduce the compositional analysis, along with

Compositional data are defined as a vector of D components, [X_{1}, …, X_{D}], where the sum of the components is a constant

To parametrically model compositional data, _{1}, …, _{D}] to the logarithm of the remaining
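As a minimal sketch of the additive log ratio transformation referred to here, the following Python snippet maps a D-part allocation to D − 1 log ratios and back. It assumes the last part is the reference and that all parts are strictly positive; the function names and the example allocation are hypothetical and are not taken from the article.

```python
import math

def alr(parts):
    """Additive log ratio: log of each part relative to the last part."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    reference = parts[-1]
    return [math.log(p / reference) for p in parts[:-1]]

def alr_inverse(log_ratios, total=100.0):
    """Map D-1 log ratios back to D parts that sum to `total`."""
    expanded = [math.exp(y) for y in log_ratios] + [1.0]  # 1.0 is the reference part
    scale = total / sum(expanded)
    return [scale * e for e in expanded]

points = [40.0, 30.0, 20.0, 10.0]  # hypothetical allocation of 100 points
log_ratios = alr(points)
restored = alr_inverse(log_ratios)
print(log_ratios)
print(restored)
```

The round trip shows that the D − 1 log ratios carry the same information as the original constant-sum allocation.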

In psychological measurements,

As mentioned, the TMC (_{d}_{1}, …, _{D} denote the responses to statements 1, …, _{k} (_{D} can be written as follows:

where _{kD} is the additive log ratio transformation and is assumed to follow a multivariate normal distribution (_{kD} as follows:

where δ_{k}_{_}_{D}_{k}_{D}_{k}_{D}_{k}_{_}_{D}

Essential to the analysis of ipsative data is that only within-person comparisons, rather than between-persons comparisons, can be concluded from the scores (

The models in the Thurstonian IRT model framework (

The RIM has been used only for analyzing discrete ipsative response data. In this section, a new model for the analysis of compositional items under the RIM framework is introduced. We describe the model, namely the lognormal ipsative model (LIM), the parameter estimation method, the calculation of the approximate standard error, and the Fisher information function. The ipsative explanation of point estimates is then given. Subsequently, we compare the new model with the Thurstonian model and present a method for evaluating the fit of the new model.

According to the additive log ratio transformation (_{1}, …, _{D}] with _{D} is arbitrarily selected from _{1}, …, _{D}. We can express _{k} and _{D} should be decided by the three effects of a person’s latent trait θ in the corresponding dimension, statement utility δ, and random error ε. For simplicity, we do not index the persons and items until necessary. Following the argument above, the LIM decomposes log(

Thus,

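where θ_{k} and θ_{D} are the person’s latent traits on the dimensions measured by statements k and D, δ_{k} and δ_{D} are the utilities of statements k and D, and ε_{kD} is the random error.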

To illustrate this model simply, let us take the item composed of statements 1 and 2 (i.e., _{1} and θ_{2}, which represents a person’s pattern of latent traits. Specifically, a larger θ_{1}–θ_{2} leads to a higher expected value of the log ratio of X_{1} to X_{2} (the positive slope in _{1}–θ_{2} is expected to give a higher value to X_{1} than to X_{2} across the different δ levels. Thus, the between-persons comparison is enabled on the pattern of θs. Moreover, a larger δ_{1}–δ_{2} (the higher line in _{1}/X_{2}), which is also monotonically increasing. The item with a positive δ_{1}–δ_{2} enables persons to give a higher value to X_{1} than to X_{2} across θ levels.

Expected log ratio of X_{1} to X_{2} across different levels of θ_{1}–θ_{2} and δ_{1}–δ_{2} in the LIM.
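The relationship described in this figure can be sketched numerically. The snippet below assumes, following the verbal description of the LIM, that the expected log ratio of X_{1} to X_{2} is the sum of the trait difference and the utility difference, with a zero-mean error. The function name and the grid of values are hypothetical.

```python
def expected_log_ratio(theta_1, theta_2, delta_1, delta_2):
    """E[log(X1/X2)] under the assumed LIM form with zero-mean error."""
    return (theta_1 - theta_2) + (delta_1 - delta_2)

trait_gaps = [-2.0, -1.0, 0.0, 1.0, 2.0]  # theta_1 - theta_2
utility_gaps = [-1.0, 0.0, 1.0]           # delta_1 - delta_2

for d_gap in utility_gaps:
    # Fix the second statement at zero so each argument equals the gap itself.
    row = [expected_log_ratio(t_gap, 0.0, d_gap, 0.0) for t_gap in trait_gaps]
    print(d_gap, row)
```

Each printed row is one line of the figure: the slope over the trait gap is positive, and a larger utility gap shifts the whole line upward.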

The LIM yields the unique utility for each individual statement that is different from the Thurstonian model yielding utilities (location parameters) for pairs of statements. As can be seen in Eq. 5, δ_{k}_{D}

For the model identification, the following two constraints are necessary. The first one is

The LIM has the property of specific objectivity. The sample-free and item-free properties of specific objectivity for compositional data analysis can be found in Appendix A. Suppose persons _{kD} following from Eq. 5 is as follows:

The test-free measurement is demonstrated by comparing two persons such that

This expression is independent of the item parameters δ_{i(k)} and δ_{i(D)}, which is the requirement of the test-free property for compositional data. The measurement therefore satisfies the test-free property. Similarly, to demonstrate the sample-free property of the LIM, when person

Then,

This expression is independent of the person parameters θ_{n(k)} and θ_{n(D)}, which is the requirement of the sample-free property for compositional data. Therefore, the LIM is a sample-free model and satisfies the property of specific objectivity. Conceptually, specific objectivity in the LIM implies that the comparison between persons’ patterns of latent traits (i.e., the comparison between θ_{n(k)}–θ_{n(l)} for person n and θ_{m(k)}–θ_{m(l)} for person m) is made on a scale with the test-free measurement property (see Appendix A), and that the comparison between statement utilities (i.e., the comparison between statement k and statement l) remains stable even when different persons take the test.
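The two invariance properties can be checked numerically with a small sketch. It assumes the expected log ratio has the additive form (θ_{k} − θ_{D}) + (δ_{k} − δ_{D}); the two persons and two items below are hypothetical values chosen only for illustration.

```python
def expected_log_ratio(theta, delta, k, ref):
    """Expected log ratio of statement k to the reference statement."""
    return (theta[k] - theta[ref]) + (delta[k] - delta[ref])

theta_n = [0.8, -0.3, -0.5]   # hypothetical person n
theta_m = [-0.2, 0.6, -0.4]   # hypothetical person m
delta_a = [0.5, 1.0, -1.5]    # hypothetical item A utilities
delta_b = [-1.2, 0.2, 1.0]    # hypothetical item B utilities
k, ref = 0, 2

# Test-free: the contrast between two persons is the same on both items.
person_gap_a = (expected_log_ratio(theta_n, delta_a, k, ref)
                - expected_log_ratio(theta_m, delta_a, k, ref))
person_gap_b = (expected_log_ratio(theta_n, delta_b, k, ref)
                - expected_log_ratio(theta_m, delta_b, k, ref))

# Sample-free: the contrast between two items is the same for both persons.
item_gap_n = (expected_log_ratio(theta_n, delta_a, k, ref)
              - expected_log_ratio(theta_n, delta_b, k, ref))
item_gap_m = (expected_log_ratio(theta_m, delta_a, k, ref)
              - expected_log_ratio(theta_m, delta_b, k, ref))

print(person_gap_a, person_gap_b)
print(item_gap_n, item_gap_m)
```

In both comparisons the parameters of the other facet cancel, which is the cancellation that the specific objectivity argument relies on.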

This study used the Bayesian approach of the Markov chain Monte Carlo (MCMC) algorithm for parameter estimation in the analysis of the empirical data. The method of posterior predictive model checking (PPMC) was adopted in the evaluation of model–data fit, and it was administered effectively in the MCMC iterations (^{2} (_{2} statistic (

The MCMC estimation utilized the Bayesian framework and was sampled from the joint posterior distribution of the parameters. To make it applicable to the new model, the joint posterior distribution, given the whole data set

where _{n}

For the LIM, the likelihood function is

Using the Metropolis-Hastings algorithm with the Gibbs sampling procedure allows for the sampling and the obtaining of the full conditional distributions of parameters _{–}_{D}_{–}_{D}_{1}, …, μ_{D}_{–}_{1}]^{T} and a covariance matrix

A popular use of the expectation-maximization algorithm (EM algorithm) for the IRT model (

In Bayesian estimation, the standard error of an estimate can be obtained as the square root of the variance of the posterior distribution. In maximum likelihood estimation, the square roots of the diagonal elements of the inverse Fisher information matrix represent the approximate standard errors of the multidimensional estimates (

Given the response vector

where _{id_D} = _{id}/_{iD}) is the log ratio of response _{id} to response _{iD} in item _{d}_{D}_{id}_{iD}^{2} is the residual variance. The second derivative of the log-likelihood for the _{k}^{2} and the current test length

Note that the 2 × 2 matrix in

where

The approximate standard errors of the estimates are the square roots of the diagonal elements of the inverse Fisher information matrix:

The standard error is constant regardless of the values of θ and δ. To illustrate the standard error in the compositional responses,

The 1,000 replications of a single person’s responses to the two compositional items, δ_{A} = [0, −1.5,−1.5] and δ_{B} = [1.5, 0, 0], under the lognormal framework.
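A simulation of this kind can be sketched as follows. The utilities δ_{A} and δ_{B} are taken from the caption; the person parameters, the residual standard deviation, the seed, and the use of the last statement as the reference are assumptions made only for illustration, as is the additive form of the log ratio.

```python
import math
import random
from statistics import mean, stdev

def simulate_item(theta, delta, sigma, n_rep, rng):
    """Draw n_rep responses (log ratios and 100-point compositions) for one item."""
    draws = []
    for _ in range(n_rep):
        # Log ratios of statements 1..D-1 to the last statement, plus normal noise.
        y = [(theta[k] - theta[-1]) + (delta[k] - delta[-1]) + rng.gauss(0.0, sigma)
             for k in range(len(theta) - 1)]
        expanded = [math.exp(v) for v in y] + [1.0]
        total = sum(expanded)
        draws.append((y, [100.0 * e / total for e in expanded]))
    return draws

rng = random.Random(2021)
theta = [0.0, 0.0, 0.0]   # hypothetical person with a flat profile
sigma = 0.3               # hypothetical residual standard deviation
items = {"A": [0.0, -1.5, -1.5], "B": [1.5, 0.0, 0.0]}

summary = {}
for name, delta in items.items():
    draws = simulate_item(theta, delta, sigma, 1000, rng)
    first_ratio = [y[0] for y, _ in draws]
    summary[name] = (mean(first_ratio), stdev(first_ratio))
    print(name, summary[name])
```

Under these assumptions the spread of the simulated log ratios is governed by the residual standard deviation alone, which is consistent with a standard error that does not depend on θ or δ.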

The proposed model retains the nature of ipsative scores, that is, the sum of scores within persons is constant. The comparison between persons can be made in terms of aspects of the profile or of differentiation, and not of individual traits. The profile aspect means that the explanation is in terms of the pattern of latent traits for each person. The differentiation aspect means that the explanation is made in terms of the range of latent traits for each person. For example, one person’s personality scores may have a range of 3.0, which is larger than a range of 0.5 in another person’s personality scores.

As the explanation of compositional data is based on a person’s profile, that is, how the dimensions for the person are differentiated, the scores are measured at a ratio level. The

Note that a zero response causes convergence problems in parameter estimation. In the LIM, a log ratio that involves a zero response is either negative infinity (when X_{k} = 0, log(X_{k}/X_{D}) = −∞) or positive infinity (when X_{D} = 0, log(X_{k}/X_{D}) = ∞). One effective solution to the problem of zero responses is to use the imputation method (
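One possible form of such a replacement is sketched below. The constant 0.65 is the value reported for the empirical analysis later in the article; rescaling the non-zero responses so that the item total stays at 100 is an assumption of this sketch, and the function name is hypothetical.

```python
def replace_zeros(points, fill=0.65, total=100.0):
    """Replace zero responses by `fill` and rescale the rest to keep the total."""
    n_zero = sum(1 for p in points if p == 0)
    if n_zero == 0:
        return list(points)
    nonzero_sum = sum(p for p in points if p != 0)
    # Assumption: non-zero parts shrink proportionally to make room for the fills.
    scale = (total - n_zero * fill) / nonzero_sum
    return [fill if p == 0 else p * scale for p in points]

adjusted = replace_zeros([50, 50, 0, 0])
print(adjusted)
```

After replacement every log ratio is finite, while the ratios among the originally non-zero responses are unchanged.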

Both the TMC (

The LIM and the TMC serve different measurement purposes. The TMC aims to recover person parameters that represent normative latent traits, which can be compared between persons one trait at a time. By contrast, the LIM aims to obtain measures with the property of specific objectivity, to recover the person parameters even when all statements are positively keyed, and to converge consistently. A model that recovers parameters well under an equally keyed statement design and achieves a perfect convergence rate would be highly desirable for practitioners who face convergence problems when applying the TMC.

The TMC yields utilities (location parameters) for pairs of statements, whereas the LIM yields the unique utility for each individual statement. As can be seen in Eq. 3 for TMC, only δ_{k_D}_{k}_{D}

The drawback of the LIM is that it maintains the ipsative nature of the constant sum of latent traits, so test users must explain the test scores in the ipsative way (see the section “Explanation of Point Estimates”). Between-person comparisons of scores on a single dimension cannot be made. With the TMC, although test users can explain the scores in the normative way, there is always a risk of biased latent trait estimates under an equally keyed statement design, as well as a risk of convergence problems, even when the response data perfectly follow the TMC (

This section introduces the model fit diagnostic methods used to examine the data fit for the LIM. The PPMC method for compositional data was adopted in the data analysis. PPMC is based on Bayes’ theorem. Let π denote the parameters in the model, that is, either person ability or statement utility. The posterior distribution of π given the observed data X_{obs} can be obtained using Bayes’ theorem:

where p(π | X_{obs}) is the posterior distribution of the parameter π given the observed data X_{obs}, and p(X_{obs} | π) is the likelihood function of the fitted model. In PPMC, the posterior can be used to draw the replicated data, X_{rep}. As X_{rep} is generated based on the parameter π of the fitted model, it implies a prediction of the response data if the model is true. To assess the model fit, a discrepancy statistic ξ, which is a function of the data set, is used: ξ(X_{obs}) is the discrepancy statistic of the observed data, and ξ(X_{rep}) is that of the replicated data. PPMC demonstrates a poor model–data fit when the value of ξ(X_{obs}) is outside the credible interval of the ξ(X_{rep}) distribution, whereas it shows a good model–data fit when the value of ξ(X_{obs}) is within the credible interval of the ξ(X_{rep}) distribution. The probability P[ξ(X_{rep}) > ξ(X_{obs})] is calculated, denoted by

where _{rep}; and _{rep}) ≥ ξ(_{obs}) is true, and 0 otherwise. Generally, a
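The posterior predictive check described here can be sketched generically. The snippet assumes that a discrepancy statistic has already been computed for the observed data and for each replicated data set; the numbers are hypothetical, and the two-sided 95% rule in the helper is one common reading of the credible-interval criterion rather than the article's exact definition.

```python
def posterior_predictive_p(stat_obs, stats_rep):
    """Proportion of replicated statistics at least as large as the observed one."""
    hits = sum(1 for s in stats_rep if s >= stat_obs)
    return hits / len(stats_rep)

def within_central_interval(p_value, level=0.95):
    """True when the observed statistic is not in either tail of the replications."""
    tail = (1.0 - level) / 2.0
    return tail <= p_value <= 1.0 - tail

replicated = [9.8, 10.4, 10.9, 11.3, 11.7, 12.0, 12.6, 13.1]  # hypothetical xi(X_rep)
observed = 11.0                                               # hypothetical xi(X_obs)

p = posterior_predictive_p(observed, replicated)
print(p, within_central_interval(p))
```

In practice the replicated statistics would come from data sets drawn at each retained MCMC iteration, so the list would contain thousands of values rather than eight.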

The sum of the profile differentiation was chosen as the discrepancy statistic in this study because the purpose of compositional items is to measure the profile distribution of the traits within persons. A person

where

To illustrate the application and implication of LIM, we created an online test using compositional items to collect real responses and analyzed the collected compositional data using both the LIM and the TMC. This empirical study included the development of an online value test using compositional items, real data collection, and data analysis.

An online value test based on Schwartz’s values theory with compositional items was developed. A total of 32 statements were modified from the World Values Survey Online (

Each of the four dimensions was measured by 8 of the 32 statements. Statement numbers 1–8 measure Self-Transcendence, 9–16 measure Conservation, 17–24 measure Self-Enhancement, and 25–32 measure Openness to Change. Based on the partial linkage design, we developed 40 compositional items. The linkage design (assignment of statements to items) of this survey is presented in the table below.

Statement numbers in the online value test with compositional items.

Item number | ST | CS | SE | OC | Item number | ST | CS | SE | OC |

1 | 1 | 9 | 17 | 25 | 21 | 2 | 9 | 24 | 31 |

2 | 2 | 10 | 18 | 26 | 22 | 7 | 10 | 17 | 32 |

3 | 3 | 11 | 19 | 27 | 23 | 8 | 15 | 18 | 25 |

4 | 4 | 12 | 20 | 28 | 24 | 1 | 16 | 23 | 26 |

5 | 1 | 10 | 19 | 28 | 25 | 1 | 11 | 21 | 31 |

6 | 2 | 11 | 20 | 25 | 26 | 3 | 13 | 23 | 25 |

7 | 3 | 12 | 17 | 26 | 27 | 5 | 15 | 17 | 27 |

8 | 4 | 9 | 18 | 27 | 28 | 7 | 9 | 19 | 29 |

9 | 5 | 13 | 21 | 29 | 29 | 2 | 12 | 22 | 32 |

10 | 6 | 14 | 22 | 30 | 30 | 4 | 14 | 24 | 26 |

11 | 7 | 15 | 23 | 31 | 31 | 6 | 16 | 18 | 28 |

12 | 8 | 16 | 24 | 32 | 32 | 8 | 10 | 20 | 30 |

13 | 5 | 14 | 23 | 32 | 33 | 1 | 12 | 23 | 26 |

14 | 6 | 15 | 24 | 29 | 34 | 2 | 13 | 24 | 27 |

15 | 7 | 16 | 21 | 30 | 35 | 3 | 14 | 17 | 28 |

16 | 8 | 13 | 22 | 31 | 36 | 4 | 15 | 18 | 29 |

17 | 6 | 13 | 20 | 27 | 37 | 5 | 16 | 19 | 30 |

18 | 3 | 14 | 21 | 28 | 38 | 6 | 9 | 20 | 31 |

19 | 4 | 11 | 22 | 29 | 39 | 7 | 10 | 21 | 32 |

20 | 5 | 12 | 19 | 30 | 40 | 8 | 11 | 22 | 25 |

Convenience sampling was used to administer the surveys. We requested student helpers to complete the test and distribute the survey website link to their peers. Seven student helpers who were enrolled in an undergraduate degree program at the Education University of Hong Kong at the time of the study were hired. Each helper was requested to distribute the survey link to at least 60 friends. The total number of participants was 577 persons aged 12–52 years. The sample comprised 190 males and 387 females.

In the data preprocessing, zero responses were assigned a constant value of 0.65 in accordance with

Demographic variables of the respondents in the analyzed samples.

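Variable | Category | n |

Gender | Male | 167 |

Gender | Female | 345 |

Age (years) | 11–15 | 4 |

Age (years) | 16–20 | 232 |

Age (years) | 21–25 | 255 |

Age (years) | 26–30 | 13 |

Age (years) | 31–35 | 3 |

Age (years) | 36–40 | 1 |

Age (years) | 41–45 | 1 |

Age (years) | 46–50 | 2 |

Age (years) | 51 and above | 1 |

Religion | Catholic | 98 |

Religion | Islam | 5 |

Religion | Hinduism | 7 |

Religion | Chinese tradition | 24 |

Religion | Buddhism | 15 |

Religion | No Religion | 361 |

Religion | Other | 2 |

Education | Elementary | 3 |

Education | High School | 24 |

Education | Undergraduate | 462 |

Education | Postgraduate | 15 |

Education | Other | 8 |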

After data cleaning, the real data collected from the survey were fitted to the LIM and the TMC. To apply the TMC approach and replicate

The MCMC algorithm was used for parameter estimation and implemented using the JAGS software (_{rep}.

The convergence of the MCMC estimation was examined by tracing the posterior sampling of the parameters. We calculated the potential scale reduction factor (PSR;
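For readers unfamiliar with the PSR, the following sketch computes the classic between-chain versus within-chain form of the statistic for one parameter. It is an illustration only: the chains are simulated, the function name is hypothetical, and the estimation software may use a refined variant of the formula.

```python
import random
from statistics import mean, variance

def potential_scale_reduction(chains):
    """Classic PSR for one parameter from several equal-length chains."""
    n = len(chains[0])
    within = mean([variance(c) for c in chains])        # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])   # B: scaled variance of chain means
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

rng = random.Random(7)
# Three hypothetical chains sampling the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
# Three hypothetical chains stuck at different locations (not converged).
stuck = [[rng.gauss(shift, 1.0) for _ in range(1000)] for shift in (0.0, 3.0, 6.0)]

psr_mixed = potential_scale_reduction(mixed)
psr_stuck = potential_scale_reduction(stuck)
print(psr_mixed, psr_stuck)
```

Values close to 1 indicate that the chains agree with one another, whereas values well above 1 indicate that they have not yet reached a common distribution.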

One purpose of the empirical study is to present an example of the practical interpretation of results under the proposed model. To this end, the descriptive analysis of the estimated latent traits and statement utilities under the proposed model is presented first, and then the correlations between the raw scores and the latent trait estimates in the LIM are shown to convey the meaning of the latent traits in the proposed model. We selected two persons to illustrate the values of the latent traits and their meanings in practice.

The reliability statistics were calculated from the variance of the latent trait estimates and their standard errors. The population error variance of each dimension was obtained from the squared standard errors. In accordance with the classical definition of reliability as the proportion of variance accounted for by the true score, reliability was calculated as follows:

The LIM was expected to have reliabilities above the acceptable level of 0.7.
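One common way to turn trait estimates and their standard errors into a reliability coefficient is sketched below. It assumes that the variance of the estimates stands for the true-score variance and that the mean squared standard error stands for the error variance; this is a frequently used form for posterior-mean estimates and is not necessarily the exact formula used in this study. The input values are hypothetical.

```python
from statistics import mean, variance

def estimate_reliability(estimates, standard_errors):
    """Share of total variance attributed to the trait (one common form)."""
    trait_variance = variance(estimates)                      # variance of trait estimates
    error_variance = mean([se ** 2 for se in standard_errors])  # mean squared standard error
    return trait_variance / (trait_variance + error_variance)

estimates = [-1.0, 0.0, 1.0]        # hypothetical trait estimates on one dimension
standard_errors = [0.5, 0.5, 0.5]   # hypothetical standard errors

reliability = estimate_reliability(estimates, standard_errors)
print(reliability)
```

With these hypothetical inputs the coefficient is above the 0.7 level mentioned above; smaller standard errors relative to the spread of the estimates push it toward 1.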

Four aspects of the results are presented and discussed in this section: (1) convergence of the MCMC, (2) estimates of the statement utilities, (3) correlation between the latent trait estimates and the raw scores, and (4) model–data fit statistics of the new model. The convergence of the MCMC was evaluated first.

Trace of posterior sampling for the two models in MCMC.

The results of the statement utility estimates are reported.

Utility estimates of the statements at the 97.5 and 2.5% quantiles.

The correlations between the latent trait estimates and the raw scores are 0.92 for Self-Transcendence, 0.95 for Conservation, 0.94 for Self-Enhancement, and 0.96 for Openness to Change (

Relationship between a person’s raw score and trait estimate in the lognormal ipsative model.

Summary of the means, standard deviations, and correlations of the four measures in the raw score and in the lognormal ipsative model.

Measure | Mean | SD | Correlation with 1 | Correlation with 2 | Correlation with 3 |

Raw score | | | | | |

1. Self-Transcendence | 0.27 | 0.06 | − | | |

2. Conservation | 0.22 | 0.06 | −0.01 | − | |

3. Self-Enhancement | 0.21 | 0.07 | −0.60 | −0.20 | − |

4. Openness to Change | 0.30 | 0.07 | −0.21 | −0.59 | −0.35 |

Lognormal ipsative model | | | | | |

1. Self-Transcendence | 0.13 | 0.34 | − | | |

2. Conservation | −0.13 | 0.36 | −0.05 | − | |

3. Self-Enhancement | −0.26 | 0.51 | −0.74 | −0.27 | − |

4. Openness to Change | 0.26 | 0.37 | −0.16 | −0.56 | −0.44 |

To illustrate the meaning of the latent traits, we took person no. 003 as an example. Using the estimates in the LIM, person no. 003 has the highest estimated value (1.22) for Self-Transcendence among the four traits. This means that he/she prefers to help others, loves nature, and believes in protecting the weakest members of society. However, he/she has the lowest estimated value (−1.52) for Self-Enhancement among the four traits. This finding indicates that he/she places less importance on emphasizing his/her success and dominance over others.

By contrast, person no. 410 is an example of low differentiation. His/her four trait values are close to each other. In the LIM, person no. 410 has the vector of latent traits

To assess the model–data fit, the PPMC method was employed. The results show that the model provides a good model–data fit. The observed sum of the profile differentiation across persons was located at the 30.7th percentile of the replicated sums of the profile differentiation, which is within the 95% credible interval; therefore, the proposed model has a good model–data fit according to the criterion established in the methodology section. The error variance estimate is 0.087. The reliabilities of the four dimensions in the LIM are 0.85, 0.86, 0.93, and 0.96, so all are at least 0.85.

The simulation study investigates the precision of parameter estimation for both the LIM and the TMC when the response data follow the LIM, which is a special case of the TMC with all factor loadings fixed to one and with ipsative constraints. This means that the TMC necessarily encounters the non-full-rank factor loading problem in our simulation. Since the TMC often fails to converge in this special case, the convergence rates of the two models are also compared in the simulation results.

A four-dimensional test was simulated using compositional items, each composed of four statements measuring different dimensions. The statement utility parameters were generated from a uniform distribution ranging from −1.2 to +1.2, which corresponded to the distribution of statement utilities found in the empirical data analysis in this study. For identification, the sum of the utilities within each item was set to zero. The test length was manipulated in three conditions: 10, 20, and 40 items.
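The generation of statement utilities can be sketched as follows. The uniform range and the sum-to-zero constraint come from the description above; enforcing the constraint by subtracting the within-item mean is an assumption of this sketch, and the function name and seed are hypothetical.

```python
import random

def generate_item_utilities(n_items, n_statements=4, low=-1.2, high=1.2, seed=1):
    """Draw utilities uniformly, then center them within each item."""
    rng = random.Random(seed)
    items = []
    for _ in range(n_items):
        raw = [rng.uniform(low, high) for _ in range(n_statements)]
        item_mean = sum(raw) / n_statements
        # Assumption: centering is how the within-item sum is set to zero.
        items.append([u - item_mean for u in raw])
    return items

utilities = generate_item_utilities(n_items=10)
for item in utilities[:3]:
    print([round(u, 3) for u in item])
```

Note that centering can move an individual utility slightly outside the original range, so a study that needs a strict range would have to handle the constraint differently.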

To evaluate the effect of the sample size, we manipulated sample sizes of 250, 500, and 1,000 persons. The persons were generated with four normative latent traits [θ_{1}, θ_{2}, θ_{3}, θ_{4}] following a multivariate normal distribution with means of [0, 0, 0, 0] and all standard deviations equal to one. The correlation between latent traits was manipulated in six conditions: all correlations set to (1) 0.8, (2) 0.5, (3) 0.2, (4) 0, or (5) −0.2, and (6) a real-world correlation matrix. To satisfy the ipsative constraint (θ_{1} + θ_{2} + θ_{3} + θ_{4} = 0 for each person), we subtracted the within-person mean of the latent traits from the person’s generated latent traits to obtain the true person parameters.
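The person-generation step can be sketched with the standard library alone. The snippet draws correlated standard normal traits through a hand-written Cholesky factor of an equal-correlation matrix and then subtracts each person's mean; the function names and the seed are hypothetical, and the real-world correlation condition is not covered.

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def generate_ipsative_thetas(n_persons, rho, n_dims=4, seed=1):
    """Draw normative traits with equal correlation rho, then center within person."""
    rng = random.Random(seed)
    corr = [[1.0 if i == j else rho for j in range(n_dims)] for i in range(n_dims)]
    low = cholesky(corr)
    persons = []
    for _ in range(n_persons):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        theta = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n_dims)]
        person_mean = sum(theta) / n_dims
        persons.append([t - person_mean for t in theta])  # ipsative true parameters
    return persons

persons = generate_ipsative_thetas(n_persons=250, rho=0.5)
print([round(t, 3) for t in persons[0]])
```

After centering, each person's traits sum to zero, so only the shape of the profile is retained as the true person parameter.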

The response data sets were generated by the LIM and replicated 100 times using the R2jags package (

To evaluate the recovery of item and person parameters, bias and the root mean square error (RMSE) of the estimates were employed and computed as follows:

where

In summary, this simulation experiment had 3 × 3 × 6 = 54 unique conditions across three test length conditions (i.e., 10, 20, and 40 items), three sample-size conditions (i.e., 250, 500, and 1,000 persons), and six intercorrelation conditions (i.e., 0.8, 0.5, 0.2, 0, −0.2, and a real-world correlation matrix). The bias and RMSE in Eq. 21 were averaged within conditions so that their means could be compared between conditions.
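The recovery criteria and the design grid can be sketched as follows. The bias and RMSE functions use the standard definitions (mean signed error and root mean squared error over replications), which are assumed here to match Eq. 21; the example estimates are hypothetical.

```python
import math
from itertools import product

def bias(estimates, true_value):
    """Mean signed difference between the estimates and the true value."""
    return sum(e - true_value for e in estimates) / len(estimates)

def rmse(estimates, true_value):
    """Root mean squared difference between the estimates and the true value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))

test_lengths = [10, 20, 40]
sample_sizes = [250, 500, 1000]
correlations = [0.8, 0.5, 0.2, 0.0, -0.2, "real"]
conditions = list(product(test_lengths, sample_sizes, correlations))
print(len(conditions))

replicated_estimates = [0.9, 1.1, 1.2, 0.8]  # hypothetical estimates of one parameter
print(bias(replicated_estimates, 1.0), rmse(replicated_estimates, 1.0))
```

Averaging these two quantities over parameters and replications within each of the 54 cells gives the condition-level summaries that are compared in the results.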

In the simulation study, test length, sample size, and the covariance structure between latent traits were manipulated to form the different conditions.

Convergence rate obtained by fitting the lognormal ipsative model and the Thurstonian model to simulated data. The types of lines indicate the test lengths. Cov = the covariance between latent traits. Cov = Real indicates that the variance-covariance matrix followed the empirical result in the literature.

Average bias of item parameters obtained by fitting the lognormal ipsative model and the Thurstonian model to simulated data. The types of lines indicate the test lengths. Cov = the covariance between latent traits. Cov = Real indicates that the variance-covariance matrix followed the empirical result in the literature.

Average RMSE of item parameters obtained by fitting the lognormal ipsative model and the Thurstonian model to simulated data. RMSE = root mean square error. The types of lines indicate the test lengths. Cov = the covariance between latent traits. Cov = Real indicates that the variance-covariance matrix followed the empirical result in the literature.

Bias of person parameters obtained by fitting the lognormal ipsative model and the Thurstonian model to simulated data. Cov = the covariance between latent traits. Cov = Real indicates that the variance-covariance matrix followed the empirical result in the literature.

RMSE of person parameters obtained by fitting the lognormal ipsative model and the Thurstonian model to simulated data. RMSE = root mean square error. Cov = the covariance between latent traits. Cov = Real indicates that the variance-covariance matrix followed the empirical result in the literature.

To evaluate whether the structure among traits changes when fitting the models, the relative biases and relative absolute biases of the variance-covariance estimates for both models were examined. For the TMC, the relative bias ranged from −505.901 to 434.817, and the relative absolute bias ranged from 9.431 to 525.972. This implies that (1) the TMC converges to extremely inaccurate values and (2) the TMC fails to recover the structure between traits. For the LIM, the relative bias ranged from −0.013 to 0.008 across all conditions. The relative absolute bias decreased as the sample size increased, the test length increased, and the covariance approached zero (see

Average relative absolute bias of variance-covariance estimation obtained by fitting the lognormal ipsative model to simulated data. Cov = the covariance between latent traits. Cov = Real indicates that the variance-covariance matrix followed the empirical result in the literature.

In this study, we developed a new model called the LIM for analyzing compositional items, overcoming the limitations of TMC (

This research used an online value survey comprising 40 compositional items, developed according to Schwartz’s value theory, to ascertain the applicability of the newly developed LIM to the analysis of compositional data in empirical settings. The response data came from a sample of 512 individuals and were analyzed with the proposed model because the TMC failed to converge. The examination of the model–data fit through the PPMC method showed that the LIM had an acceptable model–data fit. The reliabilities in the model were 0.85 or greater.

When the response data were generated from the LIM, in which all items have equally keyed statements, the TMC had a worse convergence rate than the LIM, especially with a small sample size (250 persons) and a short test length (10 items). The item and person parameter estimates in the TMC were biased. The LIM had a convergence rate close to 100% across conditions of different test lengths, sample sizes, and covariances between traits. This implies that using the TMC in an all-equal-keys situation carries the risk of non-convergence in model estimation, which has been concluded by

The precision of item parameter estimation increases as the sample size increases. The precision of person parameter estimation increases as the test length increases. The precision of covariance between traits rises when the test length and sample size increase. Those findings corroborate the previous results in IRT modeling for ipsative data (

The high precision of the parameter estimates obtained in the simulation study demonstrates that the proposed model allows practitioners to develop a compositional test with an equally keyed statement design, which is not feasible with the TMC because of biased estimation and convergence problems. Using tests containing equally keyed statements will help avoid many of the problems encountered when using negatively keyed statements. The first problem is the dimensionality problem, in which the two oppositely keyed statements (positively and negatively) imply different underlying factors (

Compared with the existing TMC, by which test takers can obtain their scores on a normative scale, the drawback of the new model is that it yields scores in an ipsative way, where the sum of scores across traits within persons is constrained to zero. Practitioners would prefer not to obtain test scores that only represent the differentiation or the relative locations between traits (i.e., explanation of ipsative scores) when they need to rank test takers by their scores for individual traits (i.e., explanation of normative scores). The between-person comparison in using ipsative scores, if required by practitioners, should be based on the person’s differentiation rather than scores on a single trait. In other words, in using the LIM, ranking test takers is only allowed in terms of the differentiation among traits. In choosing between the TMC and the LIM for analyzing compositional item response data, there is a trade-off between avoiding the problems associated with the TMC, namely convergence problems and estimation bias (in tests with an equal-keys design), and the challenge of explaining ipsative scores in the LIM. At the very least, when the TMC fails to converge for a data set, the LIM provides a solution for practitioners.

The LIM sacrifices the possibility of generating normative scores, which the TMC offers, but has specific objectivity. Of course, specific objectivity is not the “Holy Grail” of scale properties and, in fact, is inappropriate for the measurement purpose of Thurstonian IRT models when creating normative scores for ipsative tests with forced-choice items. The lack of specific objectivity would not be a big issue in using the TMC for compositional response data. This study does not reject the use of the TMC; instead, it provides an alternative measurement model for compositional items as an option for practitioners.

In summary, key advantages of the LIM are the feature of specific objectivity and the possibilities to overcome the convergence problem in modeling compositional data. Nevertheless, several limitations of the LIM should be noted here: First, LIM does not allow a unidimensional structure. This limitation matches Brown’s conclusion that when the number of the dimension is one (

Moreover, our study has some limitations: First, no psychological theory has yet been proposed that supports the necessity for using compositional items in tests. The issue of whether a forced-choice format can avoid the effect of social desirability has been explored and reported in the literature (

As a recommendation for future studies, the application of multilevel models would be useful in educational research. The multilevel model takes into account the nested data structure in the modeling process. For example, the Program for International Student Assessment samples schools first and then samples students nested within the schools.

The proposed LIM is a dominance model. As of today, probabilistic models for unfolded ipsative and continuous data have not yet been reported in the IRT literature. Unfolded response means that the respondents are expected to have higher scores for statements in which their latent trait values are closer to the utility. To model the continuous ipsative response data in the future, we suggest that, given the LIM, the ideal point concept for the probability of the response to statement

where _{k_D} is the log ratio of _{k} to _{D} (the response to statement _{k}_{D}_{k}_{D}_{kD}_{k}_{k}_{k} obtained in the function. This model is expected to have the same constraints as the LIM. Furthermore, the parameter estimation of the MCMC algorithm can be adopted. In future research, the parameter recovery of this new model will be evaluated using a simulation approach similar to that reported in this study.

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

The studies involving human participants were reviewed and approved by the Human Research Ethics Committee at the Education University of Hong Kong. Written informed consent to participate in this study was provided by the participants, and where necessary, the participants’ legal guardian/next of kin.

C-WC contributed to idea generation, data collection, data analysis, and writing. W-CW contributed to the discussion of ideas and revision of the manuscript. MM and RS contributed to the revision of the manuscript. All authors contributed to the article and approved the submitted version.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

The Supplementary Material for this article can be found online at: