Patients and setting
Data were collected in the Department of Physical
Medicine and Rehabilitation at the Medical Faculty of
Ankara University, Turkey. A total of 215 outpatients
diagnosed with knee osteoarthritis (OA) according to the
American College of Rheumatology criteria for the
classification and reporting of OA of the knee were
included in the study [8].
Patients with concomitant uncontrolled or severe systemic
diseases that might affect their health status were
excluded. The study was approved by the Ethical
Committee of the Faculty of Medicine, Ankara University.
All patients gave informed consent, and the study was
carried out in compliance with the Declaration of Helsinki.
Outcome measures
The assessment included the administration of the
HAQ-DI, MHAQ, the Western Ontario and McMaster
Universities Index of Osteoarthritis (WOMAC), the World
Health Organization Disability Assessment Schedule II
(WHODAS-II) and the Nottingham Health Profile (NHP).
The HAQ-DI contains 20 questions classified into eight
domains (items): dressing and grooming, arising, eating,
walking, hygiene, reach, grip and activities. There are four
possible responses for each question: without any
difficulty (0), with some difficulty (1), with much difficulty
(2), unable to do (3). The highest score reported by the
patient for any component question of each domain
determines the score for that domain unless aids or
devices are required. If aids or devices are needed,
a domain score of 0 or 1 is automatically raised to 2.
The HAQ-DI score is then calculated as the average
of the 8 domain (item) scores and ranges between 0 and 3,
with higher scores indicating more disability. The Turkish
adaptation was used in the study [9].
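The HAQ-DI scoring rule described above can be sketched as follows. This is a minimal illustration, not the scoring code used in the study; the function name, the data layout, and the example responses are assumptions made for the sketch.

```python
from statistics import mean

def haq_di_score(domain_items, domains_with_aids=()):
    """Compute a HAQ-DI score.

    domain_items: dict mapping domain name -> list of item scores (0-3).
    domains_with_aids: domains in which aids or devices are required.
    (Both argument names are hypothetical, chosen for this sketch.)
    """
    domain_scores = []
    for domain, items in domain_items.items():
        score = max(items)  # highest item score determines the domain score
        if domain in domains_with_aids and score < 2:
            score = 2  # aids or devices raise a 0 or 1 to 2
        domain_scores.append(score)
    return mean(domain_scores)  # average of the 8 domain scores, range 0-3

# Hypothetical responses for one patient (20 questions in 8 domains).
example = {
    "dressing and grooming": [0, 1],
    "arising": [1, 1],
    "eating": [0, 0, 0],
    "walking": [2, 1],
    "hygiene": [0, 1, 0],
    "reach": [1, 0],
    "grip": [0, 0, 0],
    "activities": [1, 2, 1],
}
print(haq_di_score(example, domains_with_aids={"arising"}))  # 1.125
```

In this example the "arising" domain would score 1 from its items alone, but the reported aid raises it to 2 before averaging.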
The MHAQ is a subset of 8 questions taken from the 8
domains of the original HAQ-DI. It is scored by taking the
average of the 8 question scores, with a range of 0-3.
Each question is scored as in the HAQ-DI,
except that the MHAQ does not consider aids or devices in
the scoring process. In the present study, “level of
difficulty” was assessed for the MHAQ. The MHAQ was
not administered as a separate questionnaire but was
scored from the HAQ-DI.
The WOMAC is a disease-specific index developed for
OA of the knee or hip [10]. It consists of 24 items in three
domains: pain (5 items), stiffness (2 items), and physical
function (17 items). There are five response options for
every question (‘0’ none, ‘1’ mild, ‘2’ moderate, ‘3’ severe
and ‘4’ extreme) in Likert form. The maximum score is 20
for pain, 8 for stiffness, 68 for physical function and 96
for the total WOMAC. Higher scores indicate worse
symptoms, greater limitations, and poorer health.
The Turkish version of the WOMAC (version 3.1) scale was
used in this study [11].
The WHODAS-II is a generic, multidimensional
disability questionnaire that includes 36 items in six life
domains: understanding and communicating (6 items),
getting around (5 items), self-care (4 items), getting
along with people (5 items), life activities (8 items), and
participation in society (8 items) [12]. It employs a
five-point rating scale on all items, in which ‘1’ indicates
no difficulty and ‘5’ indicates extreme difficulty or inability
to perform the activity. Raw scores are transformed into
standardized scores. Total and subscale scores range
between 0 and 100, with higher scores reflecting greater
disability. The adapted Turkish version of the WHODAS-II
instrument was used [13].
The NHP is a generic health status measure developed
to record the perceived distress of patients in physical,
emotional, and social domains [14]. It comprises 38
statements (answered ‘yes’ or ‘no’) that form six sections:
physical mobility (8 items), pain (8 items), sleep (5 items),
emotional reactions (9 items), social isolation (5 items),
and energy level (3 items). The score on each section of
the NHP is the percentage of items affirmed by the
respondent (i.e., the number of ‘yes’ responses multiplied
by 100 and divided by the number of items in that
section). Scores range from 0 to 100, with a
higher score indicating greater distress. The Turkish
version of the NHP was used [15].
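As a small worked example of the NHP section score described above (the function name and the responses are hypothetical):

```python
def nhp_section_score(answers):
    """Percentage of items affirmed in one NHP section.

    answers: list of booleans, True for a 'yes' response.
    """
    return 100 * sum(answers) / len(answers)

# Hypothetical physical mobility section: 3 'yes' responses out of 8 items.
physical_mobility = [True, False, True, False, False, True, False, False]
print(nhp_section_score(physical_mobility))  # 37.5
```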
Internal construct validity
Internal construct validity of the HAQ-DI and MHAQ was
assessed by Rasch analysis. Rasch analysis is the formal
testing of an assessment or an outcome measure against
a mathematical measurement model which defines how
interval scale measurement can be derived from ordinal
questionnaires [16,17]. The Rasch model assumes that the
probability of a given respondent affirming an item is a
logistic function of the relative distance between the
item difficulty and the person ability on a linear scale. The
model estimates person ability independent of the
distribution of the population, and item difficulty
independent of the person ability [18]. These are
requirements for obtaining interval scale estimates [19].
Masters' partial credit model (PCM), an extension of
the dichotomous Rasch model for polytomous items (more
than two response categories), was used in this
study [20].
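To make the model concrete, the following sketch computes category probabilities under the standard formulation of the partial credit model, in which the probability of category k is proportional to the exponential of the sum of (ability minus threshold) over the first k thresholds. The function name and the threshold values are assumptions for illustration; this is not the estimation software used in the study.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Category probabilities for one polytomous item under the PCM.

    theta: person ability in logits.
    thresholds: list of item thresholds in logits (m thresholds give
                m + 1 response categories, scored 0..m).
    """
    # Cumulative sums of (theta - tau_j); the empty sum for category 0 is 0.
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - tau))
    weights = [math.exp(value) for value in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical item with four categories (0-3), as in the HAQ-DI response scale.
probs = pcm_probabilities(theta=0.0, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

With a single threshold the function reduces to the dichotomous Rasch model, where the probability of affirming the item is a logistic function of ability minus difficulty.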
Fundamental aspects common to the Rasch model
were assessed [21]. These are 1) the appropriate ordering
of response categories for polytomous items, and any
necessary rescoring; 2) fit of items and persons to
the model; 3) the assumption of local
independence of items, including response dependency
and unidimensionality; and 4) the presence of differential
item functioning (DIF).
Before evaluation of item fit, where polytomous
items are involved, the response categories should be
examined for correct ordering. For an item with an
appropriate ordering of thresholds, each response option
would demonstrate the highest probability of
endorsement at a specific range of the scale, with
successive thresholds found at increasing levels of the
construct being measured. Inconsistent use of response
options by respondents results in disordered thresholds;
in these circumstances, collapsing categories usually
improves overall fit to the model [22].
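The ordering requirement amounts to checking that an item's estimated thresholds increase strictly along the scale. A minimal sketch, with hypothetical threshold estimates in logits:

```python
def thresholds_ordered(thresholds):
    """Return True if each threshold lies above the previous one."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))

# Hypothetical threshold estimates for two four-category items.
print(thresholds_ordered([-1.2, 0.3, 1.8]))  # ordered
print(thresholds_ordered([-0.5, 1.1, 0.4]))  # disordered: candidate for collapsing categories
```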
A range of fit statistics is used to test whether the data
conform to Rasch model expectations. Two are
item-person interaction statistics transformed to approximate
a z score, representing a standardized normal distribution.
If the items and persons fit the model, we would expect
to see a mean of approximately zero and a standard
deviation (SD) of one. The third is a summed chi-square
within groups defined by their position on the trait,
where the overall chi-square for items is summed to give
the item trait interaction statistic, testing the property of
invariance across the trait. A significant chi-square
indicates that the hierarchical ordering of the items varies
across the trait, so compromising the required property
of invariance. In addition to these overall summary fit
statistics, individual person- and item-fit statistics are
presented as (a) residuals (a summation of individual
person and item deviations), (b) a chi-square statistic,
and (c) an analysis of variance (ANOVA) with the
residuals summed across the main effects of class intervals.
Fit residuals within ±2.5 are deemed adequate.
The residuals are summed within ability groups to provide
the basis of the ANOVA.
A formal test of the assumption of unidimensionality
is undertaken by performing a principal component
analysis (PCA) of the residuals. Items with the highest
positive and negative correlations on the first residual
factor are used to construct two smaller scales, anchored
to the item difficulties of the main analysis [23]. The
person estimates derived from these two subsets of items
are contrasted for each individual by a t test. A significant
difference would be expected to occur by chance in 5%
of the cases. Consequently, the percentage of tests
outside the range ±1.96 is reported, together with a 95%
binomial confidence interval. If this interval includes
5%, the finding is non-significant and unidimensionality
is supported.
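The per-person comparison can be sketched as below. The sketch assumes that each person has an ability estimate and a standard error from each of the two item subsets, and it uses a normal-approximation (Wald) binomial interval; the text does not state which binomial interval was used, so that choice, like the function name and the data, is an assumption.

```python
import math

def proportion_significant_t_tests(est_a, se_a, est_b, se_b, critical=1.96):
    """Share of persons whose two subset estimates differ significantly.

    Returns (proportion, lower, upper), where lower and upper are the
    bounds of an approximate 95% binomial confidence interval.
    """
    n = len(est_a)
    significant = 0
    for a, sa, b, sb in zip(est_a, se_a, est_b, se_b):
        t = (a - b) / math.sqrt(sa ** 2 + sb ** 2)  # independent t test per person
        if abs(t) > critical:
            significant += 1
    p = significant / n
    half_width = critical * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical estimates (logits) for four persons from the two item subsets.
p, lower, upper = proportion_significant_t_tests(
    est_a=[0.0, 1.0, 2.0, -1.0], se_a=[0.5, 0.5, 0.5, 0.5],
    est_b=[0.1, 0.9, 0.0, -1.2], se_b=[0.5, 0.5, 0.5, 0.5],
)
print(p, lower, upper)
print("interval includes 5%:", lower <= 0.05 <= upper)
```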
The assumption of local independence implies that
when the ‘Rasch factor’ has been extracted, that is, the
main scale, there should be no leftover patterns in the
residuals. This assumption was tested by performing a
PCA of the residuals obtained from the PCM. If a pair
of items had a residual correlation of 0.30 or more, the
item of the pair that showed the higher accumulated residual
correlation with the remaining items was eliminated [24].
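Screening for locally dependent item pairs can be illustrated with a plain Pearson correlation between item residuals. The sketch only flags pairs at or above the 0.30 cutoff; it does not carry out the elimination step, and the residual values and item names are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def dependent_pairs(residuals, cutoff=0.30):
    """Flag item pairs whose residual correlation is at or above the cutoff.

    residuals: dict mapping item name -> list of residuals, one per person.
    """
    flagged = []
    for (name_a, res_a), (name_b, res_b) in combinations(residuals.items(), 2):
        r = pearson(res_a, res_b)
        if r >= cutoff:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical standardized residuals for three items and five persons.
residuals = {
    "item1": [0.5, -0.3, 1.2, -0.8, 0.1],
    "item2": [0.6, -0.2, 1.0, -0.9, 0.0],
    "item3": [-0.4, 0.9, -0.2, 0.3, -0.6],
}
for name_a, name_b, r in dependent_pairs(residuals):
    print(name_a, name_b, round(r, 2))
```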
Items are also tested for DIF. In the framework of
Rasch measurement, the scale should be free of item bias
or DIF [25]. DIF occurs when different groups within the
sample (e.g., males and females), despite equal levels of
the underlying characteristic being measured, respond in
a different manner to an individual item. For example,
men and women with equal levels of disability may
respond systematically differently to a self-care item such
as getting dressed. DIF can be detected both statistically
and graphically. In the current analysis, DIF was tested by
age, gender and duration of disease.
Reliability
Reliability of the HAQ-DI and MHAQ was initially tested by
internal consistency, an estimate of the degree to which
the constituent items of a scale are interrelated, assessed
by Cronbach's α [26]. Reliability was then further
tested by the person separation index (PSI) from the Rasch
analysis. The PSI is equivalent to Cronbach's α but has the
linear transformation from the Rasch model substituted
for the ordinal raw score [27]. Usually a reliability of 0.70 is
required for analysis at the group level, and values of 0.85
and higher for individual use [28].
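For reference, Cronbach's α can be computed from raw item scores with the usual formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below uses hypothetical data and does not reproduce the PSI, which requires the Rasch person estimates and their standard errors.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-by-items table of scores.

    scores: list of per-person lists, one score per item.
    """
    k = len(scores[0])
    item_variances = [variance([person[i] for person in scores]) for i in range(k)]
    total_variance = variance([sum(person) for person in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical scores (0-3) of five persons on four items.
scores = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2), "adequate for group-level use:", alpha >= 0.70)
```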
External construct validity
External construct validity was determined by testing
for expected associations of the HAQ-DI and MHAQ with the
WOMAC, WHODAS-II and NHP through the process of
convergent construct validity [29]. In this study, the
degree of association was analyzed by Spearman's
correlation coefficient.
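Spearman's coefficient is the Pearson correlation of the rank-transformed scores, with tied values given their average rank. A self-contained sketch with hypothetical scores follows; in practice a statistical package would normally be used.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation between two equal-length lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical HAQ-DI scores and WOMAC physical function scores for six patients.
haq_di = [0.5, 1.0, 1.5, 2.0, 1.0, 0.25]
womac_pf = [12, 25, 40, 55, 30, 8]
print(round(spearman(haq_di, womac_pf), 3))
```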
Sample size and statistical software
For the Rasch analysis, a sample size of 215 patients
allows item difficulty to be estimated, with α of 0.05, to
within ±0.27 logits [30]. With an operational range of 3
logits for the scale, this degree of precision would represent
approximately half of a standard deviation or, with a
6-logit range, approximately one quarter of a standard
deviation [31]. This sample size is also sufficient to test for
DIF where, at α of 0.05, a difference of 0.25 within the
residuals can be detected for any 2 groups with β of 0.20.
Bonferroni correction was applied to both fit and DIF
statistics because of multiple testing [32].
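The Bonferroni correction divides the nominal significance level by the number of tests performed. The number of tests is not stated here, so the counts in this sketch (one test per item) are assumptions for illustration only.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests

# Assumed counts: one fit test per item, for 20 and 8 items respectively.
print(bonferroni_alpha(0.05, 20))
print(bonferroni_alpha(0.05, 8))
```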