The p Factor Consistently Predicts Long-Term Psychiatric and Functional Outcomes in Anxiety-Disordered Youth

Objective: Pediatric anxiety disorders can have a chronic course and are considered gateway disorders to adult psychopathology, but no consistent predictors of long-term outcome have been identified. A single latent symptom dimension that reflects features shared by all mental health disorders, the p factor, is thought to reflect mechanisms that cut across mental disorders. Whether p predicts outcome in youth with psychiatric disorders has not been examined. We tested whether the p factor predicted long-term psychiatric and functional outcomes in a large, naturalistically followed-up cohort of anxiety-disordered youth.

Method: Children and adolescents enrolled in a randomized controlled treatment trial of pediatric anxiety were followed up on average 6 years posttreatment and then annually for 4 years. Structural equation modeling was used to estimate p at baseline. Both p and previously established predictors were modeled as predictors of long-term outcome.

Results: Higher levels of p at baseline were related to more mental health disorders, poorer functioning, and greater impairment across all measures at all follow-up time points. The p factor predicted outcome above and beyond previously identified predictors, including diagnostic comorbidity at baseline. Post hoc analyses showed that p predicted long-term anxiety outcome, but not acute treatment outcome, suggesting that p may be uniquely associated with long-term outcome.

Conclusion: Children and adolescents with anxiety disorders who present with a liability toward broad mental health problems may be at a higher risk for poor long-term outcome across mental health and functional domains. Efforts to assess and to address this broad liability may enhance long-term outcome.

broad dimensions: (1) internalizing (eg, depression, fear), (2) externalizing (eg, impulsivity, antisocial behavior), and (3) thought disorder (eg, mania, psychosis) symptoms. The advantages of dimensional models over categorical models are that they better explain the shared genetic architecture of many psychiatric disorders as well as the high rates of comorbidity, within-disorder heterogeneity, and heterotypic continuity seen within psychiatry. 14 A key element in dimensional approaches to psychiatric symptoms is the inclusion of a general overarching factor that reflects features shared by all psychiatric disorders: namely, the p factor. Within a clinical context, individuals high in p are prone to experience symptoms across the psychiatric spectrum, whereas individuals low on p may experience symptoms within only a single symptom dimension. It has been suggested that p is a proxy for mechanisms that cut across disorders, such as poor emotion regulation. 15 p is highly heritable in children and adolescents 16 and is strongly related to age of onset, duration, and disorder diversity in a recent 4-decade longitudinal cohort study. 17 Although most studies examining p have been conducted with adults, there is evidence for the p factor in community populations of youth [18][19][20] and children as young as 18 months of age 21 as well as referred youth. 22 However, p has not been examined as a predictor of longitudinal outcome among youth that fulfill criteria for a psychiatric disorder. Pediatric anxiety disorders may be particularly relevant, as they are the most common mental disorders in youth, [1][2][3][4]23 onset early, 2 and predict broad mental health problems and functional impairments into adulthood. 
24 In addition, emotion dysregulation (a proposed underpinning of p) has predicted rates of nontargeted disorders (posttraumatic stress disorder, agoraphobia, panic attacks, and obsessive-compulsive disorder [OCD]) for anxious youth (N = 64) at 7 to 19 years following treatment, although emotion dysregulation was not a predictor of posttreatment responder status. 25 This pattern of findings further highlights the potential for p as an LTFU predictor. Functional outcomes may be particularly important to assess in addition to psychiatric/anxiety symptoms, as symptoms and functional impairments can interact with one another throughout follow-up in a series of developmental cascades. 26 If consistent predictors of broad outcome can be outlined, findings could be leveraged to improve or amend care (eg, by the inclusion of treatment components directly targeting consistent predictors or sustained care over time).
The present study examined whether p predicted long-term outcome in a large naturalistically followed-up sample of clinically anxious youth. These youth participated in the Child/Adolescent Anxiety Multimodal Extended Long-term Study (CAMELS). CAMELS was conducted as an LTFU of participants enrolled in a large randomized controlled trial of pediatric anxiety, the Child and Adolescent Anxiety Multimodal Study (CAMS). 27 In CAMS, youth with a principal anxiety disorder were randomized to cognitive-behavioral therapy (CBT), sertraline, CBT plus sertraline, or pill placebo. 5 In CAMELS, a subset of the CAMS participants were reassessed on average 6 years after completion of CAMS and then assessed annually for 4 consecutive years. In addition to anxiety outcomes, CAMELS assessments included broad diagnostic interviews and ratings of psychosocial functioning and impairment. An original evaluation of CAMELS predictors found that age at study entry, male sex, higher baseline functioning, positive family dynamics, and absence of social anxiety disorder were associated with a stable remission versus a chronic pattern of anxiety through follow-up. 10 The current study pursued the following hypotheses: (1) baseline p will predict long-term outcome defined as number of psychiatric diagnoses, overall functioning, and degree of day-to-day impairment; and (2) p will predict these outcomes above and beyond baseline predictors that previously have been established in CAMELS.

METHOD

Procedure
English-speaking outpatient youth (N = 488) aged 7 to 17 years with a principal social anxiety, generalized anxiety, or separation anxiety disorder who were free of anti-anxiety medications prior to baseline were randomized into the CAMS treatments and assessed at baseline and at weeks 4, 8, and 12. Of note, presence of additional comorbid diagnoses (eg, depression, externalizing disorders) was not an exclusion criterion per se, although youth with disorders that required treatment not provided in CAMS (ie, major depressive disorder, bipolar disorder, psychotic disorder, pervasive developmental disorder, uncontrolled attention-deficit/hyperactivity disorder, eating disorders, and substance use disorders) were excluded. Additional exclusion criteria in CAMS included the following: school refusal behavior in the most recent term (missing >25% of school days); suicidal or homicidal ideation; 2 previous failed SSRI trials or 1 failed trial of CBT for anxiety; sertraline intolerance; presence of a confounding medical condition; pregnancy; and IQ estimate <80. 27 After completing CAMS, interested youth/caregivers were enrolled into CAMELS and assessed yearly 4 to 12 years after CAMS 10; 65.4% of the CAMS sample (n = 319) participated in the first CAMELS assessment (numbers at each of the 4 CAMELS assessments are shown in Table 1). Sociodemographic information and scores on the outcome measures across time points are also provided in Table 1.
Previous studies have compared CAMELS participants and nonparticipants along several variables (ie, percentage of responders, baseline anxiety severity, number of baseline comorbid disorders, assigned treatment conditions), with no significant differences in presentation or treatment response. There were significant demographic differences, such that CAMELS participants were more likely to be female, to be non-Hispanic, and to report a higher socioeconomic status. CAMELS participants were also more likely to be randomized to treatment 3 months later than those who did not participate in CAMELS.

p Factor Measure
To establish p, we used the 8 narrowband subscales of the parent-reported Child Behavior Checklist (CBCL) 28 administered at CAMS baseline and as part of assessments at 4, 8, and 12 weeks. The CBCL provided dimensional scores for each participant across 8 dimensions of psychopathology: aggressive behavior, rule-breaking behavior, attention problems, thought problems, social problems, somatic complaints, withdrawn/depressed symptoms, and anxious/depressed symptoms. The CBCL has been used to reliably assess p in previous studies with youth 22,29 and is the most frequently used scale to derive p in youth samples.

Outcome Measures
The Anxiety Disorders Interview Schedule (ADIS-IV) 30 is a semi-structured diagnostic interview of DSM-IV-TR diagnoses. Reliable independent evaluators (IEs) administered the ADIS-IV separately to both parent and children/adolescents at the baseline assessment in CAMS and long-visits in CAMELS, which included Clinical Severity Ratings (CSR) for endorsed diagnoses. CSRs were provided along a scale of 0 to 8, with a CSR of 4 or higher indicating that the child/adolescent met diagnostic criteria. Intraclass correlation coefficients between CAMS IEs and quality assurance raters for 10% of ADIS assessments were excellent. Consistent with the original evaluation of CAMELS outcome, which used presence/absence of an anxiety disorder diagnosis to generate 3 responder groups, 12 we used number of psychiatric diagnoses on the ADIS-IV at each time point as our primary outcome.
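The primary outcome described above can be sketched in a few lines. This is only an illustration, assuming that the number of diagnoses is the count of disorders whose CSR reaches the diagnostic threshold of 4; the function name, the disorder labels, and the example ratings are hypothetical and are not taken from the study data or code.

```python
DIAGNOSTIC_THRESHOLD = 4  # from the text: a CSR of 4 or higher meets diagnostic criteria
CSR_MIN, CSR_MAX = 0, 8   # from the text: CSRs are rated on a scale of 0 to 8


def count_diagnoses(csr_by_disorder):
    """Count disorders whose Clinical Severity Rating meets the threshold.

    `csr_by_disorder` maps a disorder label to its CSR (hypothetical structure).
    """
    for disorder, csr in csr_by_disorder.items():
        if not CSR_MIN <= csr <= CSR_MAX:
            raise ValueError(f"CSR for {disorder!r} is outside 0-8: {csr}")
    return sum(csr >= DIAGNOSTIC_THRESHOLD for csr in csr_by_disorder.values())


# Hypothetical ratings for one participant at one assessment.
ratings = {
    "social_anxiety": 6,
    "generalized_anxiety": 4,
    "separation_anxiety": 3,
    "specific_phobia": 0,
}
n_diagnoses = count_diagnoses(ratings)
```

Here two of the four hypothetical ratings reach the threshold, so the count for this participant would be 2.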
As secondary outcomes, we evaluated broad functional outcome using 2 additional variables: the Children's Global Assessment Scale/Global Assessment of Functioning 31 (CGAS/GAF) and Health of the Nation Outcome Scales (HoNOS; child/adolescent 32 and adult 33 scales). CGAS/GAF are IE-rated scales assessing overall psychosocial functioning on a scale from 1 to 100, with lower ratings indicating poorer functioning. CGAS/GAF has demonstrated adequate psychometric properties. 34 The HoNOS is an IE-reported measure of functional impairment (eg, peer relationships, family life/relationships, activities of daily living, occupational functioning, and school performance) and has adequate psychometric properties. 32,35

Validator Measures
Criterion validity of p was examined by outlining associations between p and other baseline variables. First, associations between p and comorbidity were examined, with comorbidity being coded in 3 ways according to ADIS-IV ratings: (1) any comorbid disorder in addition to the principal anxiety disorder (yes/no); (2) any comorbid internalizing disorder (yes/no); and (3) any comorbid externalizing disorder (yes/no). Second, associations between p and overall functioning (CGAS) and overall anxiety severity (Clinical Global Impression-Severity scale [CGI-S]) were examined. Finally, associations between p and self-reported depression and anxiety were examined using the child-reported Mood and Feelings Questionnaire 36 (MFQ) and the child-reported Screen for Child Anxiety Related Disorders 37 (SCARED), with both measures having sound psychometric properties. 38,39

Predictor Measures
To control for possible confounding in the predictor analysis, we made an a priori decision to include all variables previously established as significant baseline predictors of long-term outcome in CAMELS: age at study entry, sex, baseline functioning (CGAS), family functioning and social anxiety. In accordance with previous CAMELS studies, the general scale of the parent-reported Brief Family Assessment Measure-III (BFAM-III) 40 was used as a measure of baseline family functioning. This scale has adequate internal consistency in other samples 41 and in CAMS (Cronbach α = 0.84). The CSR rating (0-8) for social anxiety disorder in ADIS was used as a measure of severity of social anxiety at baseline. Treatment condition and baseline anxiety severity were also included as predictors, with baseline anxiety severity being measured using the clinician-rated CGI-S.

Statistical Analysis
All analyses were carried out within a structural equation modeling (SEM) framework and performed in R using the library lavaan. The full statistical code is provided as an online supplement.
First, we aimed to establish a well-fitting dimensional model that included all CBCL narrowband scales. We followed a stepwise approach. First, and consistent with previous approaches, 42,43 we tested (1) a single general factor model (model 1); (2) a model with 2 correlated first-order factors (internalizing and externalizing symptoms; model 2); and (3) a bifactor model (ie, the combination of models 1 and 2, but with orthogonal internalizing and externalizing factors; model 3). Raw CBCL scores were used in all analyses because of better distributional properties, and data for all participants in CAMS were used to maximize statistical power and to minimize sampling error. If a well-fitting model was not outlined using this initial approach, we decided a priori to use modification indices (MI) to improve model fit. MI is a statistical procedure that produces an estimate of how much a model's χ 2 value is reduced (ie, the model becomes more adequate in reproducing the observed sample statistics) if a specific model restriction is removed. The MI procedure in lavaan tests the removal of all model restrictions and results in suggestions of which added parameters will most increase model fit. We chose to add only those model parameters that were theoretically justified starting with the parameter that increased model fit the most. For each added parameter, we re-examined all model fit indices. Because of increased risk of overfitting using MI, we tested our final model by fitting it to separate CBCL data, reported by parents at 4, 8, and 12 weeks into CAMS.
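The single general factor model and its MI-based extensions can be written out as a worked equation. This is a sketch in generic notation rather than the authors' specification: the symbols, and the identification choice of fixing the factor variance to 1, are assumptions made for illustration.

```latex
% One-factor measurement model for the 8 CBCL narrowband scales (j = 1, ..., 8)
% x_{ij}: score of participant i on scale j;  p_i: latent general factor
x_{ij} = \nu_j + \lambda_j \, p_i + \varepsilon_{ij},
\qquad \operatorname{Var}(p_i) = 1 \ \text{(assumed identification)}

% Model-implied covariance matrix of the 8 scales
\Sigma = \lambda \lambda^{\top} + \Theta,
\qquad \Theta_{jk} = \operatorname{Cov}(\varepsilon_{ij}, \varepsilon_{ik})

% Model 1: all off-diagonal \Theta_{jk} are restricted to 0.
% Each MI step frees one theoretically justified \Theta_{jk} (j \neq k);
% the modification index approximates the resulting drop in \chi^2.
```

Under this notation, each added parameter is a residual covariance between two scales, that is, an association between them that remains after their shared variance through p is accounted for.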
As recommended, several fit indices were examined to evaluate model fit 44 : χ² (lower value indicates better fit), comparative fit index (CFI; adequate fit indicated by >0.90), root mean square error of approximation (RMSEA; adequate fit indicated by <0.06), standardized root mean square residual (SRMR; adequate fit indicated by <0.08), and Tucker-Lewis index (TLI; adequate fit indicated by >0.90). 45,46 Criterion validity of p was examined in relation to the validator measures described above.
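To make the cutoffs concrete, the sketch below computes CFI, TLI, and RMSEA from χ² values using their standard textbook definitions and checks them against the thresholds quoted above. It is not the authors' code (their analyses used lavaan in R), the χ² values are hypothetical, and robust estimators apply corrections that are not shown. The degrees of freedom follow from counting, assuming a covariance structure only: 8 indicators give 28 df for the independence baseline, and a one-factor model with 5 freed residual covariances has 36 − 16 − 5 = 15 df.

```python
import math

# Cutoffs quoted in the text: CFI > 0.90, TLI > 0.90, RMSEA < 0.06, SRMR < 0.08.
CUTOFFS = {
    "cfi": lambda v: v > 0.90,
    "tli": lambda v: v > 0.90,
    "rmsea": lambda v: v < 0.06,
    "srmr": lambda v: v < 0.08,
}


def fit_indices(chi2_model, df_model, chi2_base, df_base, n):
    """Return CFI, TLI, and RMSEA from model and baseline chi-square values."""
    excess_model = max(chi2_model - df_model, 0.0)
    excess_base = max(chi2_base - df_base, 0.0)
    denom = max(excess_model, excess_base)
    cfi = 1.0 if denom == 0 else 1.0 - excess_model / denom
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)
    # N - 1 is one common convention; some software divides by N instead.
    rmsea = math.sqrt(excess_model / (df_model * (n - 1)))
    return {"cfi": cfi, "tli": tli, "rmsea": rmsea}


def adequate(indices):
    """Flag each index as adequate or not against the cutoffs above."""
    return {name: CUTOFFS[name](value) for name, value in indices.items()}


# Hypothetical chi-square values for illustration only (not from the paper).
example = fit_indices(chi2_model=30.0, df_model=15, chi2_base=1000.0, df_base=28, n=488)
example["srmr"] = 0.03  # SRMR needs the residual matrix, so a value is supplied here
flags = adequate(example)
```

With these made-up inputs every index falls on the adequate side of its cutoff. RMSEA grows as df and sample size shrink, which is worth keeping in mind when a model is cross-validated in smaller subsamples.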
Finally, p was used to predict long-term outcome in a line of structural predictor models. Specifically, the best-fitting p factor model was included in the overall SEM model, and the p dimension was modeled as a predictor of outcome alongside the other predictor variables described above. Each outcome was examined using a separate SEM model. These SEM models let us examine whether p predicted outcome above and beyond other predictor variables. Maximum likelihood (continuous data) or diagonally weighted least-squares (categorical data) estimation was used, dependent upon whether or not categorical variables were included in the specified model. Robust standard errors were estimated throughout. An α level of 0.05 was used as an indicator of statistical significance. No adjustment of α level was used because we examined prespecified hypotheses, reported results for all analyses, examined multiple dependent outcome variables (eg, functioning/impairment at 4 different time points), and did not test a universal null hypothesis. 47
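The structural part of each predictor model can be summarized in one equation. The notation is an assumption introduced here for illustration, and the linear form applies to continuous outcomes; how categorical or count outcomes were parameterized is not specified in the text beyond the choice of estimator.

```latex
% Outcome y for participant i at follow-up t (one model per outcome and time point)
% p_i: latent general factor from the measurement model
% z_{ik}: baseline covariates (age, sex, CGAS, BFAM-III, social anxiety CSR,
%         treatment condition, CGI-S)
y_{it} = \alpha_t + \beta_{p,t} \, p_i + \sum_{k} \gamma_{k,t} \, z_{ik} + \zeta_{it}

% "p predicts outcome above and beyond other predictors" corresponds to
% \beta_{p,t} \neq 0 with all z_{ik} in the model, tested at \alpha = 0.05.
```

Because p enters as a latent variable, the measurement model and this regression are estimated jointly, so measurement error in the CBCL scales does not attenuate the coefficient in the way it would for an observed score.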

Fitting and Testing the p Factor
All steps of fitting the p factor model are presented in Table 2. Model fit was not adequate for the single general factor model (model 1) or the model with correlated internalizing and externalizing factors (model 2). The bifactor model (model 3) had better fit, but not all indicators loaded significantly onto the externalizing and internalizing factors, and some indicators had factor loadings in the direction opposite to that which was expected (all models and loadings are represented in Figure S1, available online). Because the fitted bifactor model was not theoretically adequate, it was dropped from further analyses.
MI were used for models 1 and 2 to examine whether we could fit a theoretically justified model that adequately reproduced the observed sample statistics. For model 1, 5 theoretically justified parameters were added to achieve a well-fitting model that was also theoretically coherent (Table 2). In the final model (model 1e), all narrowband scales loaded significantly and positively onto the p factor. In addition, the scales of Aggressive Behavior and Rule-Breaking Behavior had a positive association that was not captured merely by them being indicators of p. A similar association emerged for Withdrawn/Depressed and Social Problems, whereas Anxious/Depressed was negatively associated with Attention Problems and Aggressive Behavior after accounting for their shared variance through p. All parameters in the final model were statistically significant (all p values <.001) and participants in the sample varied significantly on the p factor (p < .001).
For model 2, additional parameters identical to those for model 1 were suggested and added; very similar fit indices were found across models 1 and 2 with each added parameter. Fit indices for the final model (model 2e) are presented in Table 2. Importantly, in this model, the internalizing and externalizing factors were very highly correlated (β = 0.970), indicating that a general psychopathology factor was clearly represented in the observed sample statistics. Also, model parameters were nearly identical across the 2 models (Figure 1). Because of the high correlation between the latent factors in model 2e, we proceeded with model 1e, which better accounted for the general factor suggested by the data. Both models and their parameter estimates are presented in Figure 1.
To examine model validity and the risk of an overfitted model due to the use of modification indices, we cross-validated the final model using parent-reported CBCL data from 4, 8, and 12 weeks into CAMS. The model showed adequate fit at all time points and across fit indices with the exception of RMSEA, which is known to falsely reject models when degrees of freedom (df) and sample size are small. 48 As a sensitivity analysis, we also tested model fit across time points for model 2e. Nearly identical fit indices for models 1e and 2e were found. Because we used raw CBCL scores and not sex- and age-adjusted t scores, we tested the model fit of model 1e using t-score data at baseline and at 4, 8, and 12 weeks into CAMS. Fit indices were very similar, but the model showed slightly better fit using t-score data. Finally, we tested whether model 1e had adequate fit in the 318 CAMS participants who participated in the first follow-up in CAMELS; good fit was found. Full fit indices for all models are provided in Table S1, available online.
To further validate our final model (ie, model 1e), associations between the latent p factor in the model and CAMS baseline variables were examined. p was related to the presence of a comorbid disorder in general (β = 0.223, p < .001; n = 480) and the presence of a comorbid externalizing disorder specifically (β = 0.463, p < .001; n = 480), but not to internalizing disorders beyond the presenting principal anxiety disorder (β = 0.062, p = .318; n = 480). Significant associations were also found between p and child-reported depression (MFQ; β = 0.204, p < .001; n = 480), child-reported anxiety (SCARED-R; β = 0.149, p = .006; n = 477), IE-reported overall functioning (CGAS; β = −0.116, p = .009; n = 480), and IE-reported baseline anxiety severity (CGI-S; β = 0.186, p < .001; n = 480). The p factor was not associated with age or sex.

Table 3 presents results for p and the other predictor variables in relation to long-term outcomes. With respect to our primary outcome, higher p at baseline was associated with a larger number of diagnoses at each of the 4 follow-ups. Family functioning and severity of social anxiety at baseline were associated with number of diagnoses at select time points. In regard to our secondary outcomes, higher p at baseline was associated with lower functioning/more impairment at all follow-up assessments. Sex, severity of social anxiety, and family functioning at baseline were associated with these outcomes at select time points.

Post Hoc Analyses
Because p consistently predicted our main outcomes, a post hoc analysis was carried out to examine whether p also predicted anxiety outcomes, with these outcomes defined as the IE-rated CGI-S score for anxiety at each follow-up assessment. Results are shown in Table S2, available online. Higher levels of p at baseline were significantly associated with higher levels of anxiety severity at each follow-up time point.
To examine whether the predictive influence of p simply reflected baseline diagnostic comorbidity, a post hoc decision was made to run a model that included baseline diagnostic comorbidity (defined as 0, 1, or 2+ baseline comorbid disorders according to ADIS) in the set of predictors. The p factor was significantly associated with outcome in 14 of 16 models; baseline comorbidity was associated with outcome in 4 of 16 models. Full results are provided in Table S3, available online.
To examine whether the association between higher levels of p and poor outcome could be explained by p being associated with a poor treatment response, we examined whether p predicted CAMS short-term outcome, which was defined as CGI-S and CGAS scores directly at posttreatment and at 24 and 36 weeks posttreatment. p predicted CGI-S at 36 weeks posttreatment (β = 0.137, p = .007; n = 473), but no other significant associations emerged. Full results are provided in Table S4, available online.

The p factor model used in the present study, with the CBCL narrowband scales loading somewhat equally on p, suggests that p (as measured in the present study) may be reasonably well reflected by a single CBCL sum score. Such a measure is easily obtained in everyday clinical care and hence has clinical utility. Therefore, we examined whether a total CBCL score predicted long-term outcomes. Results indicated that a CBCL sum score predicted long-term outcome with similar but somewhat attenuated effects compared to the latent p factor in the SEM model. The CBCL total score was significantly associated with outcome in all but 1 model (where it approached significance). Full results are provided in Table S5, available online.

DISCUSSION
Does a single latent dimension that confers risk for a wide range of mental disorders (the p factor) predict long-term outcome for youth who received treatment for anxiety during childhood or adolescence? In line with hypotheses, results showed that higher levels of p at baseline predicted a larger number of psychiatric disorders, lower functioning, and more impairment at each long-term follow-up. These findings emerged above and beyond previously established baseline predictors of anxiety outcomes (eg, sex, family functioning, presence of social anxiety). Importantly, p was the only predictor consistently associated with outcomes across time and measures, also when controlling for baseline diagnostic comorbidity, indicating that p is related not only to a narrow set of outcomes but to broad outcomes, and that it is not simply a proxy for diagnostic comorbidity. Higher levels of p at baseline also predicted long-term anxiety outcomes, which adds to the extant literature on factors associated with remission versus relapse/nonresponse among anxiety-disordered youth. [10][11][12] Furthermore, p did not predict initial treatment response, suggesting that its long-term predictive effects cannot be explained by poorer immediate outcome and that the importance of p may emerge uniquely in relation to long-term outcomes. Although results should be considered as preliminary until replicated, study findings suggest that p may reflect processes/vulnerabilities that are important for understanding the transition from pediatric anxiety to adult psychopathology and impairment.
Study results are consistent with cross-sectional studies examining p in community youth populations [18][19][20] and in referred youth. 22 These studies have found that higher levels of p are associated with a range of negative sequelae, including more severe scores on a composite measure of self-harm and suicidal ideation. 22 The present findings extend previous cross-sectional work to suggest that p is associated with additional negative outcomes (ie, more psychiatric disorders, poorer functioning, and more impairment) in a longitudinal clinical sample of youth. Given that findings for clinical populations can have direct implications for clinical care, further longitudinal investigations with other youth clinical samples and additional variables (ie, age of onset, disorder duration, and diversity of different comorbid disorders over time 17 ) are warranted. Our findings are also in line with previous findings showing that emotion dysregulation (a proposed underpinning of p) predicted long-term, but not short-term, outcomes in anxious youth. 25 Taken together, these results suggest that p may be a predictor uniquely associated with long-term outcomes.
Our findings may have treatment implications. CBT, the first-line psychological therapy for youth anxiety, is disorder specific and targets primarily anxiety. Although such a focus is efficacious in changing anxiety, the present results suggest that a subset of anxious youth may need complementary or alternative approaches. Given that p did not predict acute treatment outcome, efforts to prevent relapse and the development of broad mental health problems may be more efficacious for youth high in p than efforts to improve current disorder-specific treatments. It may also be that youth high on p may benefit from treatments that dually target features suggested to underlie p (eg, emotion dysregulation, emotional impulsivity) while also including core anxiety-focused CBT components (eg, exposure). To advance the field, work is needed to better elucidate the mechanisms underlying p and how to optimally assess and address these mechanisms in youth treatments. Hopefully, such work can be leveraged to identify children and adolescents who may need broader or more long-term care, via increased treatment duration, augmented treatment content, and/or medication. In the present study, we could show that a simple CBCL sum score may act as a clinically feasible proxy for p. Future work may benefit from the inclusion of similar simpler models using measures that can be obtained and used in everyday clinical care.
Study limitations warrant consideration. First, the optimal measurement of the p factor is not yet established, and in the present study we could not use the bifactor model often used in previous studies. Future studies may want to assess p using item-level data rather than the sum scores used here. Second, item-level data were not available for any measures, and thus we could not estimate internal reliability scores and similar estimates. Third, although CAMELS attrition was similar to that in other trials, 49 retention rates were modest (65%). 10 Thus, selection bias may have affected results, which is supported by the low representation of participants from ethnic minority groups and of low socioeconomic status. Fourth, a naturalistic study design was implemented during follow-up for ethical reasons. As a result, firm causal conclusions cannot be drawn. Fifth, issues of external validity warrant consideration. Families who opted to participate in CAMELS did not differ from those who chose not to participate in terms of percentage of treatment responders, baseline anxiety severity, number of baseline comorbid disorders, or assigned treatment condition, but there were significant demographic differences (ie, biological sex, ethnicity, socioeconomic status). 10 Although findings that CAMELS participants did not differ in clinical presentation or treatment response are encouraging for external validity, results may not generalize to the entire population of youth with anxiety disorders. Future LTFUs should emphasize recruitment and retention of more diverse populations, with a particular focus on the barriers that these populations face to participation in research more generally. 50 Finally, the predictive power of p should be examined alongside predictor variables other than those used here (eg, depression 51 ) and in relation to a broader set of outcomes, and by including variables that may mediate outcome (eg, interim negative life events).
Furthermore, p may be reciprocally or causally related to some of the factors included as covariates in this study (eg, family and overall functioning). More complex models, preferably using longitudinal data, can help to identify such relations with implications for the onset and maintenance of mental disorders among youth.
Despite these limitations, the current study's prospective design with long-term follow-up allowed for a preliminary test of the hypothesis that p is a "liability" to future disorders. This extends previous work using primarily cross-sectional or retrospective assessments of p, and findings clearly indicated that youth with anxiety disorders who present with a liability toward broad mental health problems are at increased risk for poor long-term, but not immediate, outcomes across mental health and functional domains. Efforts to assess and to address this broad liability may enhance outcomes and inform updates to treatments and

Note: Boldface type indicates the best-fitting model. CFI = comparative fit index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual; TLI = Tucker-Lewis Index.