Psychometric evaluation of the altered states of consciousness rating scale (OAV)

This study conducted a psychometric evaluation of the altered states of consciousness rating scale (OAV). Data (n=591) was pooled from 43 different studies. The analyses showed that the originally proposed model did not fit the data well and new (lower order) scales were constructed, which have better psychometric properties.

Abstract

Background: The OAV questionnaire has been developed to integrate research on altered states of consciousness (ASC). It measures three primary and one secondary dimensions of ASC that are hypothesized to be invariant across ASC induction methods. The OAV rating scale has been in use for more than 20 years and applied internationally in a broad range of research fields, yet its factorial structure has never been tested by structural equation modeling techniques and its psychometric properties have never been examined in large samples of experimentally induced ASC.

Methodology/Principal Findings: The present study conducted a psychometric evaluation of the OAV in a sample of psilocybin (n = 327), ketamine (n = 162), and MDMA (n = 102) induced ASC that was obtained by pooling data from 43 experimental studies. The factorial structure was examined by confirmatory factor analysis, exploratory structural equation modeling, hierarchical item clustering (ICLUST), and multiple indicators multiple causes (MIMIC) modeling. The originally proposed model did not fit the data well even if zero-constraints on non-target factor loadings and residual correlations were relaxed. Furthermore, ICLUST suggested that the “oceanic boundlessness” and “visionary restructuralization” factors could be combined on a high level of the construct hierarchy. However, because these factors were multidimensional, we extracted and examined 11 new lower order factors. MIMIC modeling indicated that these factors were highly measurement invariant across drugs, settings, questionnaire versions, and sexes. The new factors were also demonstrated to have improved homogeneities, satisfactory reliabilities, discriminant and convergent validities, and to differentiate well among the three drug groups.

Conclusions/Significance: The original scales of the OAV were shown to be multidimensional constructs. Eleven new lower order scales were constructed and demonstrated to have desirable psychometric properties. The new lower order scales are most likely better suited to assess drug induced ASC.

Authors: Erich Studerus, Alex Gamma & Franz X. Vollenweider

Summary

The OAV questionnaire has been used in research on altered states of consciousness for more than 20 years, but its factorial structure has never been tested by structural equation modeling techniques.

A psychometric evaluation of the Oceanic Boundlessness and Visionary Restructuralization variables was conducted in psilocybin, ketamine, and MDMA induced ASC. The results indicated that the new factors had improved homogeneities, satisfactory reliabilities, and discriminant and convergent validities, and that they differentiated well among the three drug groups.

The OAV was shown to be multidimensional and eleven new lower order scales were constructed.

Studerus, Gamma, and Vollenweider (2010) evaluated the OAV.

e12412. Editor: Vaughan Bell

This article is an open-access article distributed under the terms of the Creative Commons Attribution License. The authors declare that no competing interests exist.

Introduction

Altered states of consciousness (ASC) are short-lasting deviations from normal waking consciousness, and are usually self-induced (eg, by hallucinogenic drugs, meditation, hypnosis). They are often accompanied by unusual mental experiences and are used to generate hypotheses for psychiatric research.

Dittrich’s ASC questionnaires have been used in approximately 70 experimental studies to assess ASC induced by psychoactive drugs, as well as ASC induced by endogenous psychosis, sensory deprivation, mind machines, and monochrome sounds. The APZ contains 158 dichotomous items covering a broad range of phenomena potentially occurring during ASC. It was originally developed to test the hypothesis that ASC have features in common that can be parsimoniously described on stable (ie, etiology-independent) major dimensions.

Dittrich tested his hypothesis in a series of experimental studies, in which healthy volunteers were treated with hallucinogens, nitrous oxide, sensory deprivation, and sensory overload. Dittrich [3,4] identified 72 items that were etiology-independent and that differentiated significantly between the treatment and control conditions. These items were then factored into three oblique primary and one secondary etiology-independent dimensions, which were termed oceanic boundlessness (OBN), dread of ego dissolution (DED), and visionary restructuralization (VRS). The APZ secondary scale consists of 72 items and measures consciousness alteration in subjects who have experienced ASC within the past 12 months.

Although reliabilities and validities of APZ scales were considered to be acceptable, several weaknesses were also recognized. Therefore, a psychometrically improved version called OAV was developed. The OAV was primarily derived from 72 etiology-independent items of the APZ, but the response format was changed to visual analogue and several items were re-worded. The VRS dimension was conceptually widened by incorporating items that measure an increased imagination, associations, and memory retrieval. The original OAV validation study indicated that the questionnaire revision successfully improved several psychometric properties, and that the scales measured similar constructs in both questionnaires.

Although dimensional analyses of the APZ and OAV questionnaires revealed three primary “etiology-independent” dimensions of ASC, further dimensions were found to be specific to certain ASC-inducing agents. These additional dimensions included acoustic alterations and hallucinations. Dittrich hypothesized that two etiology-dependent dimensions, auditory alterations and vigilance reduction, could be reliably measured, and a questionnaire was constructed and validated to measure these two dimensions.

After the reliabilities and validities of Dittrich’s ASC questionnaires had been demonstrated, a new version, the 5D-ASC, was published in 1999. It contains 16 and 12 BETA items measuring the auditory alterations (AUA) and vigilance reduction (VIR) dimensions, respectively.

Although Dittrich concluded that his original hypotheses on ASC have survived considerable falsification testing, the APZ questionnaire has serious methodological limitations, which may have led to the extraction of pseudofactors that reflect similar item difficulty rather than similar item content. Dittrich’s original investigation was limited to examining the similarity of factor loadings across different ASC induction methods and languages, and thus could only provide evidence for the weakest forms of factorial invariance. However, even for the assessment of these weak forms of factorial invariance, the use of similarity measures is problematic. Dittrich [21] used item aggregates to assess factor pattern similarity across different ASC induction methods, but this is problematic because it ignores item dimensionality.

Dittrich’s original investigation of ASC may have been biased by the use of a specific set of items, and his investigation was never repeated in other independent sets of items.

Studies that re-examine the psychometric properties of Dittrich’s ASC rating scales after their first publication are scarce, and those that exist were based on very limited sample sizes. Furthermore, no studies have previously used modern latent-variable approaches to analyze these scales.

Previous psychometric investigations have analyzed only major dimensions, and have not explored potential lower order factors or so-called facets. This research explores these issues, and provides two validated instruments that measure similar subjective experiences.

To overcome methodological limitations of previous investigations, we performed a psychometric evaluation of the OAV in a relatively large sample of subjects describing experiences of ASC that were experimentally induced by psilocybin, ketamine, or MDMA. The newly constructed subscales were compared with the original scales.

Ethics Statement

All pooled studies were approved by the ethics committee and all subjects gave their written informed consent prior to participation.

Samples and Data Collection Procedures

The samples used in the present investigation were obtained from 43 experimental studies conducted at our research facility between 1992 and 2008 involving psilocybin, ketamine, or MDMA administration to healthy volunteers. All subjects were carefully screened before admission to the studies.

In 22 studies, subjects received placebo and 1–4 different doses or combinations of psychoactive drugs in 2–5 experimental sessions. The present investigation used data from the 591 experimental sessions in which psilocybin (n = 327), ketamine (n = 162), or MDMA (n = 102) was administered alone.

Because some studies involved multiple drug sessions and some subjects participated in more than one study, the pooled data contained non-independent observations. Since some statistical procedures rely on the assumption of independence of observations, we also analyzed samples that included only one experimental session per subject.

Measures

In experiments, subjects were asked to rate their experience of drug induced altered states of consciousness using the OAV or 5D-ASC questionnaires. The OAV and 5D-ASC contain the same items, but the 5D-ASC contains additional items.

Subjects responded to statements describing experiences of ASC by placing marks on horizontal visual analogue scales of 100 millimeters length. The responses were scored by measuring the millimeters from the low end of the scale to the subject’s mark.

In most studies, the OAV and 5D-ASC were completed during or shortly after the drug effects peaked. In some studies, the questionnaires were completed at multiple time points.

The EWL-60-S is a German self-report rating scale composed of 60 adjectives that can be grouped into 15 subscales. It has been found to be well suited to measure short-term changes of mental states induced by psychoactive drugs, psychological stress, and embodying of emotion.

In the present investigation, the EWL-60-S was used to assess the convergent and discriminant validity of the OAV scales. The internal consistencies of the EWL-60-S subscales were mostly good to excellent.

The State-Trait-Anxiety Inventory – State version (Form X; STAI-S) is a popular self-report rating scale that contains 10 items describing symptoms of anxiety and 10 items describing the absence of anxiety. However, the STAI-S has been criticized for its inability to adequately discriminate between symptoms of anxiety and depression.

The OAV contains subscales that assess symptoms of anxiety and well-being. The STAI-S was used to assess convergent and discriminant validity of the OAV in 56 experimental drug sessions.

Statistical Analysis

The original hypothesized factorial structure of the OAV was tested by CFA and ESEM using Mplus Version 5.2. ESEM allows for an exploratory factor analysis (EFA) while at the same time taking method effects into account.

In order to explore the adequacy of the hypothesized three-dimensional solution, the appropriate numbers of factors were examined using several tests and algorithms.

Because it was impossible to achieve a well fitting simple structure CFA model with clearly defined factors using the traditional EFA approach, we used cluster analysis as an alternative heuristic for initial CFA model specification.

We applied Revelle’s ICLUST procedure, which hierarchically clusters questionnaire items using correlations corrected for attenuation as a proximity measure and the size of the reliability coefficients Cronbach’s α and Revelle’s β as stopping rules. The ICLUST procedure provides unique diagnostic and interpretative information not available in conventional approaches of scale construction.
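The proximity measure mentioned above, a correlation corrected for attenuation, can be illustrated with the classical disattenuation formula. This is only a minimal sketch: the function name and the example values are hypothetical, and it does not reproduce the full ICLUST algorithm or its stopping rules.

```python
import math


def disattenuated_correlation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Correct an observed correlation for the unreliability of both measures.

    r_xy  : observed correlation between two items or clusters
    rel_x : reliability estimate of the first measure (must be > 0)
    rel_y : reliability estimate of the second measure (must be > 0)
    """
    if rel_x <= 0 or rel_y <= 0:
        raise ValueError("reliabilities must be positive")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values, not taken from the study:
# observed r = 0.42, reliabilities 0.70 and 0.80.
r_corrected = disattenuated_correlation(0.42, 0.70, 0.80)
```

Because both reliabilities are below 1, the corrected value is larger than the observed correlation, which is why clusters with low reliabilities can look closer to each other than their raw correlation suggests.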

A CFA model was initially specified with correlated latent factors and refined by dropping items with high cross-loadings.

After establishing a well fitting CFA model, we used MIMIC modeling to examine population heterogeneity and differential item functioning across different drugs, questionnaires, settings, and sexes. If measurement non-invariance is present, group comparisons cannot be meaningfully interpreted.

In the present study, we first examined a MIMIC model in which only the latent factors were regressed on the covariates. The model included five binary variables and a three-level nominal variable, and the minority or focal group contained at least 100 cases.

We used the free baseline designated anchor approach to detect items showing differential item functioning (DIF). This procedure involved two steps: first, anchor items were identified by regressing one item at a time on the five grouping variables, and second, all other items were tested for DIF using likelihood ratio difference tests. After identifying all DIF items, a MIMIC model was fitted, but non-significant direct effects were dropped.
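A likelihood ratio difference test between a model without and a model with a direct effect might be sketched as follows. The text does not say which form of the test was used; the sketch assumes the scaled difference test commonly applied to log-likelihoods from robust (MLR) estimation, and all input numbers are hypothetical.

```python
def scaled_lr_difference(ll_nested, c_nested, p_nested, ll_full, c_full, p_full):
    """Scaled log-likelihood difference test for two nested models.

    ll_* : log-likelihood of each model
    c_*  : scaling correction factor reported for each model
    p_*  : number of free parameters in each model
    Returns the scaled test statistic and its degrees of freedom.
    """
    df = p_full - p_nested
    if df <= 0:
        raise ValueError("the full model must have more free parameters")
    # Difference-test scaling correction.
    cd = (p_nested * c_nested - p_full * c_full) / (p_nested - p_full)
    stat = -2.0 * (ll_nested - ll_full) / cd
    return stat, df


# Hypothetical example: adding one direct effect of a covariate on one item.
stat, df = scaled_lr_difference(
    ll_nested=-10250.0, c_nested=1.30, p_nested=150,
    ll_full=-10244.0, c_full=1.31, p_full=151,
)
CHI2_CRIT_DF1 = 3.841  # 5% critical value of chi-square with 1 df
shows_dif = stat > CHI2_CRIT_DF1
```

In a DIF search, this comparison would be repeated for each non-anchor item, ideally with some correction for multiple testing.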

Because the data set contained non-independent observations and was positively skewed and kurtotic, latent factor models were fitted using the Robust Maximum Likelihood (MLR) estimator in combination with the “Complex” option in Mplus.

Most OAV items were positively skewed and kurtotic, and piled up at the lower end of the scale and modestly at the upper end. The results may be biased by strong floor- or ceiling effects, so we cross-checked our results.

The fit of the latent factor models was evaluated by Bentler’s comparative fit index, Tucker-Lewis index, and RMSEA. A cut-off value close to 1.0 was considered suitable for the weighted root mean square residual.
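For readers unfamiliar with these indexes, the following sketch computes CFI, TLI, and RMSEA from the chi-square statistics of a fitted model and its baseline (independence) model using the standard textbook formulas. The input values are hypothetical, and software packages differ in details such as using N or N - 1 in the RMSEA denominator.

```python
import math


def fit_indices(chi2_model, df_model, chi2_baseline, df_baseline, n):
    """Return CFI, TLI, and RMSEA from model and baseline chi-square values."""
    d_model = max(chi2_model - df_model, 0.0)       # model noncentrality
    d_base = max(chi2_baseline - df_baseline, 0.0)  # baseline noncentrality
    denom = max(d_model, d_base)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom
    ratio_base = chi2_baseline / df_baseline
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    return {"cfi": cfi, "tli": tli, "rmsea": rmsea}


# Hypothetical chi-square values for a model fitted to n = 591 sessions.
indices = fit_indices(
    chi2_model=1500.0, df_model=764,
    chi2_baseline=12000.0, df_baseline=861,
    n=591,
)
```

CFI and TLI compare the model with the baseline model, so values near 1 indicate good fit, whereas RMSEA measures misfit per degree of freedom, so values near 0 indicate good fit.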

We did not use Cronbach’s α for assessing scale reliability because the newly constructed scales did not meet the assumption of essential tau-equivalence, and because the original OAV scales were non-congeneric. Instead, we used rSEM, which was supplemented by confidence intervals found by the delta-method. For scales not meeting the assumption of unidimensionality, scale reliability was assessed using McDonald’s ωH and ωT. Revelle’s β and Cronbach’s α were also calculated as indicators of homogeneity, and confidence intervals for ωH and ωT were calculated using the method described by Duhachek and Iacobucci.
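As a point of reference for the coefficients named above, Cronbach's alpha can be computed from raw item scores as in the sketch below. The data are invented visual analogue ratings, and the function is a plain illustration of the standard formula rather than the analysis code used in the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(n_items)]
    # Sample variance of the sum score.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("sum scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings (0 to 100 mm) of five respondents on three items.
ratings = [
    [10, 20, 15],
    [40, 35, 50],
    [70, 80, 65],
    [5, 0, 10],
    [90, 85, 95],
]
alpha = cronbach_alpha(ratings)
```

Alpha equals reliability only under essential tau-equivalence, which is the assumption the paragraph above says the new scales did not meet; that is the motivation for using model-based estimates instead.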

Criterion validities were assessed for the original and newly constructed OAV scales by correlating them with subscales of the EWL-60-S and the STAI-S.

Results

In contrast to the hypothesis, correlations between items of opposite affective valence with L-shaped bivariate distributions and between latent constructs measuring opposite affective valence were not more negative when estimated by polychoric instead of product-moment correlations.

Table 2 provides fit indexes from a series of latent factor models testing Bodmer’s originally hypothesized factorial structure of the OAV. Although the model fit significantly better than the original model, the comparative fit indexes were still unacceptable. We next tested geomin- and quartimin-rotated 3-Factor-ESEMs with and without method effects, and found that the fit was significantly improved, but the overall fit was still relatively poor. The geomin-rotated ESEM without method effects showed that 59 of 66 items had their highest loading on the hypothesized factors, and that 28 items also demonstrated significant cross-loadings. The geomin-rotated ESEM with method effects showed a considerably different pattern of factor loadings.

We tested a bi-factor model for the APZ and OAV, but the fit index was still unacceptable. We also modeled each factor separately, and the results indicated that VRS is the most heterogeneous factor.

The Optimal Number of Factors to Extract

Although several methods were used to determine the optimal number of factors to extract from the OAV questionnaire, none indicated a three-dimensional solution. The scree test, MAP-test, and VSS criterion for complexity one and two favored one- and two-factor solutions, respectively.

The optimal number of factors for the higher level scale was determined by testing the fit of ESEMs with a varying number of factors.

Construction of new OAV Scales

Although ESEMs with 11 or more factors fit reasonably well, they were not suitable for initial CFA model specification because they contained several poorly defined factors and a relatively large number of items with significant cross-loadings. Instead, 11 item clusters were detected and used for initial CFA model specification. We tried to improve model fit by dropping items showing large modification indexes for cross-loadings and ambiguous item wordings. The final model contained 42 items instead of 47.

The final CFA model had 11 factors, which were mostly assigned to different factors. The correlations between the latent factors and the original OAV scales are shown in Table 3.

A Monte Carlo analysis was performed to assure that the parameter estimates were accurate and that the statistical power was high enough to detect significant effects. The analysis demonstrated that the parameter estimates were stable and that statistical power was adequate.

MIMIC Modeling

The no-DIF MIMIC model showed only slightly reduced global model fit relative to the final CFA model, but the free baseline designated anchor approach to DIF detection revealed nine significant direct effects. The final MIMIC model fitted significantly better than the no-DIF model.

The nine direct effects were due to measurement non-invariance between the MDMA and ketamine groups, gender and questionnaire version covariates, and the two drug contrasts, but the impact of these effects on the estimated group differences can be considered low.

The effects of psilocybin and ketamine on scales measuring visual alterations, insights and spiritual experiences are shown in Table 4. Females reported more impairment in control and cognition and slightly more/stronger experiences of disembodiment and unity than males. The OAV questionnaire measured increased changed meaning of percepts, insightfulness, blissful state, and spiritual experiences compared to the 5D-ASC questionnaire. However, the effects of the PET setting might have been confounded by different drug doses. When we controlled for the effects of different drug doses in a separate MIMIC model, the results suggested that the effects of the PET setting were only slightly confounded by different drug doses.

Reliability Assessment

The results of the reliability assessment of the original and new OAV scales show that although the original scales are multidimensional, the general factors dominated these scales and explained more than 70% of the variance in the OBN, DED and VRS scales.

Because the total scale explained 60% of the variance, and the OBN, DED, and VRS scales explained even more variance, the calculation of sum scores from these scales could be justified.
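The share of sum-score variance attributable to a general factor is usually expressed through McDonald's omega coefficients. The sketch below shows how omega hierarchical and omega total follow from the loadings of a bifactor-type solution. It assumes standardized items and orthogonal general and group factors, and the loadings are hypothetical, not estimates from the OAV data.

```python
def omega_coefficients(general, groups, uniquenesses):
    """Return (omega_hierarchical, omega_total) from factor loadings.

    general      : general-factor loading of every item
    groups       : one list of group-factor loadings per group factor
    uniquenesses : unique (residual) variance of every item
    """
    general_var = sum(general) ** 2
    group_var = sum(sum(loadings) ** 2 for loadings in groups)
    total_var = general_var + group_var + sum(uniquenesses)
    omega_h = general_var / total_var
    omega_t = (general_var + group_var) / total_var
    return omega_h, omega_t


# Hypothetical six-item scale: general loadings of 0.6, two group factors
# with three items each loading 0.4, uniqueness = 1 - 0.6**2 - 0.4**2.
omega_h, omega_t = omega_coefficients(
    general=[0.6] * 6,
    groups=[[0.4] * 3, [0.4] * 3],
    uniquenesses=[0.48] * 6,
)
```

A high omega hierarchical means that most of the reliable variance of the sum score reflects the general factor, which is the kind of evidence the paragraph above uses to justify computing sum scores from multidimensional scales.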

When compared with the old OAV scales, the new OAV scales showed lower reliabilities when rSEM was used as an estimator of scale reliability, even though the new OAV scales were more homogeneous than the old OAV scales.

Validity Assessment

The new OAV scales correlated better with similar and dissimilar EWL-60-S subscales than the old OAV scales did. The new OAV scales were also more specific, with the blissful state scale correlating with heightened mood.

Pearson correlations between the OAV and STAI-S scales showed that the STAI-S total scale was significantly associated with DED, anxiety, and impaired control and cognition, whereas the STAI-S anxiety present scale correlated significantly with DED, anxiety, blissful state, and impaired control and cognition.

Figure 2 shows that the new OAV scales differentiated well among the three different drug groups.

Discussion

This study examined the factorial structure of the OAV questionnaire in a sample of drug-induced ASC and found that the three-dimensional structure originally proposed by the authors of the OAV was not supported by the data.

Although none of the three hypothesized OAV factors met criteria of unidimensionality, the results suggest that the VRS factor is the biggest source of misfit. The VRS factor had the lowest general factor saturation.

Principal component analysis with varimax-rotation found that 10 of 18 VRS items loaded highest on the OBN factor in a sample of 93 endogenous psychotic patients. The wrongly assigned VRS items in the studies of Habermeyer and Bodmer [24,25], as well as in the present study, describe experiences of changed meaning of percepts, facilitated recollection, and insightfulness. Additionally, in the studies of Habermeyer and Bodmer [24], the wrongly assigned VRS items included items measuring complex imagery.

Although the VRS dimension was reduced to a set of items tapping only visual alterations, it was still difficult to separate from the OBN dimension on a high level of the construct hierarchy, especially when potential method effects of similarly worded VRS items were taken into account.

The authors of the OAV discussed how to determine the most appropriate number of factors to extract from the OAV. They noted that psychological constructs have a hierarchical structure, and that factors have different levels of conceptual breadth.

The researchers decided to extract factors only on a high level of the construct hierarchy because they were primarily interested in the so called etiology-independent dimensions. However, they found that only two factors account for the variance between OAV items. The OAV scale showed strong general factor saturation, which is in agreement with the originally proposed general factor G-ASC. The total scale forms ambiguous correlations with other psychological constructs, but the general factor saturation justifies its use for the prediction of complex criteria.

This study has demonstrated that lower order scales can be constructed that are not only reliable, but also stable (measurement invariant) and valid. The new scales are more homogeneous than the old scales and provide a reasonably good fit to the data when modeled as congeneric factors in a simple structure CFA. Although some items showed DIF, the impact of DIF on the comparisons of latent factor means was small. This is important for the use of these scales in applied research. Although the new scales were less reliable than the old scales, they still showed relatively high reliabilities, especially when used to compare groups and not to make decisions about individuals.

The reliability index of the OAV scales was 0.8, and this was considered adequate. The correlations between the OAV scales and other psychological constructs were not affected by the lower reliability of the OAV scales. The new OAV scales had good convergent and discriminant validities, and differentiated well among the subjective effects of psilocybin, ketamine and MDMA.

The interpretation of the results with regard to Dittrich’s original hypothesis is complicated by the fact that the items of the OAV were pre-selected to be in accordance with this hypothesis. Consequently, the factorial structure of the OAV cannot provide independent evidence for the validity of Dittrich’s hypothesis.

Limitations

Because the sample was too small to split it in two halves and perform exploratory and confirmatory analyses on separate data sets, we have not cross-validated our results. Furthermore, measurement invariance and population heterogeneity of the new OAV scales should be investigated using multiple-group CFA.

The 5D-ASC contains 94 items, but this study has only analyzed the 66 items that it shares with the OAV. Future studies must clarify whether the 28 unique items can be split into many reliable and valid subscales.

Conclusions and Recommendations

We confirmed that the general factor (G-ASC) accounts for most of the common variance among OAV items, but our results only partially supported the hypothesized structure of group factors. We also demonstrated that the OBN, DED, and VRS scales are multidimensional constructs that can be split into many reliable and valid subscales.

Supporting Information

Distributional characteristics of the uncategorized OAV items, categorized OAV items, factor assignments in the exploratory structural equation model, optimal number of factors to extract, loading matrices, and Y-standardized regression coefficients of the MIMIC model with dose predictors and DIF are presented.
