Patients and setting
Data were collected in the Department of Physical
Medicine and Rehabilitation at the Medical School of
Ankara University, Turkey. A total of 100 outpatients
(73 females, 27 males; mean age 55.3±16.7 years; range
24 to 84 years) with LBP were included in the study.
Patients with non-mechanical back pain resulting
from inflammatory, infectious, malignant or visceral
diseases or with a history of recent surgery that
could affect assessment were excluded. The Ethical
Committee of Ankara University approved the study
and all patients gave written informed consent.
Assessment
The assessment included the administration of the
ICF Core Set for LBP, the Roland-Morris disability
questionnaire (RMDQ) for LBP[6] and the Short Form-36
Health Survey version 1.0 (SF-36).[7] The scoring of
the ICF Core Set for all patients was performed by
rehabilitation medicine specialists who were trained
in a structured one-day workshop organized by the
researchers of the WHO ICF Collaborating center at
the Ludwig-Maximilian University in Munich. The
RMDQ and SF-36 questionnaires were either self-completed
by literate patients or administered by
assessors to illiterate patients. Sociodemographic (age, gender,
years of education, employment status) and clinical
data (disease duration, etiology, disease severity) were
also recorded.
The ICF Core Set for LBP consists of 78 ICF
categories organized in four different components of
which BF contains 19 categories, BS five categories,
AP 29, and EF 25 categories. A generic qualifier
scale was used to evaluate the extent of a patient’s
problem in each of the ICF categories. The qualifier
scale of the components BF, BS and AP has five
response levels ranging from 0 to 4: no/mild/moderate/
severe/complete problem. The qualifier scale of the
component EF has nine response levels ranging from
−4 to +4. A specific EF can be a barrier (−1 to −4), or a
facilitator (+1 to +4), or can have no influence (0) on a
patient’s life. If a factor has an influence, the extent of
the influence (either positive or negative) can be coded
as mild, moderate, severe, or complete. For the Rasch analysis, EF items were rescored as 0 for −4, 1
for −3, 2 for −2, 3 for −1, 4 for 0, 5 for +1, 6 for +2, 7
for +3 and 8 for +4. In addition, there are the response
options “8 (not specified)” and “9 (not applicable)” for
all ICF categories of all components.[4] In our analysis,
“8 (not specified)” and “9 (not applicable)” responses
were treated as missing values.
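The recoding described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, and missing values are represented here by `None`, which is a choice made for the example rather than something specified in the study.

```python
# Illustrative recoding of ICF qualifiers before Rasch analysis.
# Function names are hypothetical; the mapping itself follows the text.

MISSING_CODES = {8, 9}  # "8 (not specified)" and "9 (not applicable)"


def recode_ef(qualifier):
    """Map an EF qualifier (-4..+4) to 0..8; missing codes become None."""
    if qualifier in MISSING_CODES:
        return None
    if not -4 <= qualifier <= 4:
        raise ValueError(f"unexpected EF qualifier: {qualifier}")
    return qualifier + 4  # -4 -> 0, 0 -> 4, +4 -> 8


def recode_problem(qualifier):
    """Keep a BF, BS or AP qualifier (0..4); missing codes become None."""
    if qualifier in MISSING_CODES:
        return None
    if not 0 <= qualifier <= 4:
        raise ValueError(f"unexpected qualifier: {qualifier}")
    return qualifier
```

Because the EF scale runs from −4 to +4, the codes 8 and 9 cannot collide with a valid EF qualifier before recoding, so the missing-value check can safely come first.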
Physical disability due to LBP was assessed by the
RMDQ. It includes 24 items, each with a dichotomous
response category of yes or no. The scale has a total
score ranging from 0 to 24, with higher scores indicating
greater disability. The Turkish version of the RMDQ
was used.[7]
The health-related quality of life was evaluated
using the SF-36 questionnaire.[8] It contains 36 items
that measure perceived health in eight scales (physical
functioning, role-physical, bodily pain, general health,
vitality, social functioning, role-emotional, and mental
health) with higher scores (range 0-100) reflecting
better perceived health. Additionally, two summary
scores can be obtained: the physical component
summary score and the mental component summary
score. The Turkish version of the SF-36 was used in the
study.[9]
Internal construct validity
The internal construct validity of each component
of the ICF Core Set for LBP (“BF and BS”, AP, and EF
items) was assessed by Rasch analysis. Rasch analysis is
the formal testing of an assessment or a scale against
a mathematical measurement model which defines
how interval scale measurements can be derived
from ordinal questionnaires.[10-12] The Rasch model
assumes that the probability of a given respondent
affirming an item is a logistic function of the difference
between the item difficulty and the person ability
parameter. Masters’ partial credit model (PCM), which
is an extension of the dichotomous Rasch model to
polytomous items (those with more than two response categories),
was used in this study.[13]
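For reference, the two models named above can be written in their standard textbook form. The notation is an assumption introduced for this illustration and is not taken from the study: θ_n is the ability of person n, b_i the difficulty of item i, and τ_ik the k-th threshold of item i, which has categories 0 to m_i.

```latex
% Dichotomous Rasch model: person n with ability \theta_n, item i with difficulty b_i
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}

% Partial credit model: item i with categories x = 0, 1, \dots, m_i and thresholds \tau_{ik}
P(X_{ni} = x \mid \theta_n) =
  \frac{\exp\left(\sum_{k=1}^{x} (\theta_n - \tau_{ik})\right)}
       {\sum_{j=0}^{m_i} \exp\left(\sum_{k=1}^{j} (\theta_n - \tau_{ik})\right)},
\qquad \sum_{k=1}^{0} (\cdot) \equiv 0
```

With a single threshold (m_i = 1), the second expression reduces to the first, with τ_i1 playing the role of b_i.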
Common fundamental attributes of the Rasch model
were assessed.[14] These are (i) the appropriate stochastic
ordering of response categories; (ii) fit of items and
persons to the model; (iii) test of the assumption of
the local independence of items, including response
dependency and unidimensionality; and (iv) the
presence of differential item functioning (DIF).
One of the most common sources of item
misfit is respondents’ inconsistent use of the
response categories. Where polytomous items are involved, the categories should therefore be
examined for correct ordering of thresholds before
item fit is evaluated. For an item with an appropriate ordering
of thresholds, thresholds should increase in their
location in a manner consistent with the increase in
the underlying trait being measured. When this does
not occur, the thresholds are said to be disordered, and
adjacent categories may have to be collapsed to restore
correct ordering.[15]
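The threshold check described above amounts to verifying that an item's estimated thresholds increase strictly, and recoding responses when they do not. The sketch below is a minimal illustration; the function names, the threshold values in the usage note and the collapsing scheme are hypothetical and are not taken from the study.

```python
# Minimal sketch of a threshold-ordering check for one polytomous item.
# Thresholds are assumed to be given in logits, in category order.


def disordered_thresholds(thresholds):
    """Return positions k where threshold k+1 is not above threshold k."""
    return [
        k
        for k in range(len(thresholds) - 1)
        if thresholds[k + 1] <= thresholds[k]
    ]


def is_ordered(thresholds):
    """True when every threshold lies above the previous one."""
    return not disordered_thresholds(thresholds)


def collapse_categories(responses, mapping):
    """Recode responses with a category mapping; None (missing) is kept."""
    return [None if r is None else mapping[r] for r in responses]
```

For example, an item with thresholds `[-1.2, 0.9, 0.3, 2.0]` would be flagged at position 1, and one possible remedy is to merge categories 1 and 2 with the mapping `{0: 0, 1: 1, 2: 1, 3: 2, 4: 3}`. Which categories to merge is a judgment made per item, so the mapping is left as an input.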
A range of fit statistics is used to test if the data
conform to Rasch model expectations. Two are
item-person interaction statistics transformed to
approximate a Z score representing a standardized
normal distribution. If the items and persons
fit the model, we would expect to see a mean
of approximately zero and a standard deviation
(SD) of one. The third is a summed chi-square
within groups defined by their position on the
trait where the overall chi-square for items is
summed to give the item-trait interaction statistic.
This tests the property of invariance across the
trait. A significant chi-square indicates that the
hierarchical ordering of the items varies across the
trait, which compromises the required property of
invariance. In addition to these overall summary fit
statistics, individual person- and item-fit statistics
are presented as (i) residuals (a summation of
individual person and item deviations) and (ii) as a
chi-square statistic. Fit residuals within ±2.5 are
deemed to indicate adequate fit. The residuals are summed within
ability groups to provide the basis of the analysis of
variance (ANOVA).[14,15]
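As an illustration of how the summary criteria above can be applied, the sketch below takes a list of standardized fit residuals, assumed to have been exported from the Rasch software, and reports their mean, their SD, and the entries falling outside ±2.5. It does not compute the residuals themselves, and the function name is hypothetical.

```python
import statistics

# Summarize standardized fit residuals for items (or persons).
# Under good fit the mean is expected near 0 and the SD near 1.


def summarize_fit(residuals, limit=2.5):
    """Return (mean, sd, indices of residuals outside +/- limit)."""
    mean = statistics.fmean(residuals)
    sd = statistics.stdev(residuals)  # sample standard deviation
    misfitting = [i for i, r in enumerate(residuals) if abs(r) > limit]
    return mean, sd, misfitting
```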
A formal test of the assumption of
unidimensionality is undertaken by performing a
principal component analysis (PCA) of the residuals.
Items with the highest positive and negative
correlations on the first residual PC are used to
construct two smaller scales that are anchored to
the item difficulties of the main analysis.[16] The
person estimates derived from these two subsets of
items are contrasted for each individual by a t-test.
A significant difference would be expected to occur
by chance in 5% of the cases. Consequently, the
percentage of t-tests falling outside the range ±1.96 is
reported together with a 95% binomial confidence
interval. If this interval includes 5%, the finding is non-significant
and unidimensionality is supported.
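The arithmetic behind this test can be sketched as follows. Two assumptions are made for the example and are not stated in the text: each person's t value is formed from the two subset estimates and their standard errors, and the binomial interval uses the normal approximation. All function names are hypothetical.

```python
import math

# Sketch of the t-test-based unidimensionality check.


def person_t(est_a, se_a, est_b, se_b):
    """t value contrasting one person's estimates from the two item subsets."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def proportion_significant(t_values, critical=1.96):
    """Proportion of |t| > critical with an approximate 95% binomial CI."""
    n = len(t_values)
    k = sum(1 for t in t_values if abs(t) > critical)
    p = k / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def supports_unidimensionality(t_values, expected=0.05):
    """True when the lower CI bound does not exceed the 5% expected by chance."""
    _, lower, _ = proportion_significant(t_values)
    return lower <= expected
```

With 100 persons of whom 8 show a significant difference, the proportion is 8% and the approximate interval runs from about 2.7% to 13.3%, which includes 5%, so the check would pass in that case.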
The assumption of local independence implies
that when the ‘Rasch factor’ has been extracted,
there should be no leftover patterns in the residuals. This assumption was tested by a PCA of the residuals obtained
from the PCM. If a pair of items had
a residual correlation of 0.30 or more, the item of the pair
that showed the higher accumulated residual correlation
with the remaining items was eliminated.[17]
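The screening step for response dependency can be illustrated with a short sketch that correlates item residuals pairwise and flags pairs at or above the 0.30 cutoff. The data layout (a dictionary of residual lists in the same person order) and the function names are assumptions made for this example. The choice of which item of a flagged pair to drop is not implemented here.

```python
import math
from itertools import combinations

# Flag item pairs whose person residuals correlate at or above a cutoff.


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dependent_pairs(residuals, cutoff=0.30):
    """residuals: dict mapping item name -> list of person residuals."""
    flagged = []
    for item_a, item_b in combinations(residuals, 2):
        r = pearson(residuals[item_a], residuals[item_b])
        if r >= cutoff:
            flagged.append((item_a, item_b, r))
    return flagged
```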
Items are also tested for DIF. In the framework
of Rasch measurement, the scale should be free of
item bias or DIF.[18] Differential item functioning
occurs when different groups within the sample
(e.g., younger and older persons) respond in a
different manner to an individual item, despite
having equal levels of the underlying characteristic
being measured. For example, younger and older
patients with equal levels of disability may respond
systematically differently to a self-care item such as
getting dressed. DIF can be detected both statistically
and graphically. In the current analysis, DIF was
tested by age, gender, years of education and disease
duration.
Reliability
The internal consistency reliability
of the ICF item sets was estimated by both Cronbach’s
alpha[19] and the person separation index (PSI) from the
Rasch analysis.[20] The PSI is interpreted in the same way as Cronbach’s
alpha. Usually, a reliability of 0.70 is required for
analysis at the group level, and values of 0.85 or
higher for individual use.[21]
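For readers unfamiliar with the statistic, Cronbach's alpha can be computed from a complete persons-by-items score matrix as sketched below, using the standard formula based on item variances and the variance of the total score. The function names are hypothetical, and the handling of missing responses is left out of this illustration. The PSI comes from the Rasch software and is not reproduced here.

```python
import statistics

# Cronbach's alpha from complete item scores (one row per person).


def cronbach_alpha(scores):
    """scores: list of per-person lists of item scores, no missing values."""
    k = len(scores[0])
    item_variances = [
        statistics.variance([row[i] for row in scores]) for i in range(k)
    ]
    total_variance = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def reliability_use(value):
    """Classify a coefficient against the 0.70 and 0.85 thresholds in the text."""
    if value >= 0.85:
        return "individual"
    if value >= 0.70:
        return "group"
    return "insufficient"
```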
External construct validity
The external construct validity was assessed by
testing for expected associations of ICF item sets with
RMDQ and SF-36 through the process of convergent
construct validity.[22] The degree of association with
these outcome measures was analyzed using Spearman’s
rank correlation coefficient.
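Spearman's coefficient is the Pearson correlation of the ranks of the two variables, with tied values given their average rank. The sketch below shows that computation in plain Python. The function names are hypothetical, and in practice a statistical package would be used, as it was in this study.

```python
import math

# Spearman's rank correlation with average ranks for ties.


def average_ranks(values):
    """Rank values from 1 upward, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```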
Sample size and statistical software
For the Rasch analysis, a sample size of 100 patients
allows item difficulty to be estimated to within ±0.39 logits
at an alpha of 0.05.[23] This sample size is also sufficient
to test for DIF: at an alpha of 0.05, a difference of
0.39 within the residuals can be detected between any two
groups with a beta of 0.20. A Bonferroni correction was
applied to both fit and DIF statistics because of multiple
testing.[24] Statistical analysis was undertaken with SPSS
for Windows version 11.5 (SPSS Inc., Chicago, Illinois,
USA), and Rasch analysis with the RUMM2020 package.[25]
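The Bonferroni correction mentioned above divides the nominal alpha by the number of tests performed. The sketch below shows the adjustment. The function names and the p values used in the usage note are hypothetical and do not come from the study.

```python
# Bonferroni adjustment of the significance level for multiple tests.


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which p values stay significant at the corrected level."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]
```

For instance, with 25 items tested at a nominal alpha of 0.05, each item-fit test would be judged against 0.05 / 25 = 0.002.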