Validation of the Quality of ChatGPT's Responses to Frequently Asked Questions About Gestational and Congenital Syphilis
Keywords: artificial intelligence; frequently asked questions (FAQ); gestational syphilis; congenital syphilis.
Syphilis is a sexually transmitted infection caused by the bacterium Treponema pallidum. Although it is curable with antibiotic therapy, untreated infection may lead to severe complications. The rising rates of acquired, gestational, and congenital syphilis are associated, among other factors, with misinformation. In this context, artificial intelligence (AI) tools based on natural language processing, such as ChatGPT, emerge as potential alternatives for disseminating health information. This dissertation aimed to validate the quality of the responses provided by ChatGPT to frequently asked questions (FAQs) about gestational and congenital syphilis.

This was a validation, descriptive, cross-sectional, observational study with a quantitative approach. The methodological pathway comprised: (1) searching for and selecting FAQs from institutional websites; (2) generating responses with ChatGPT 4.0 using the prompt “respond as an FAQ”; (3) validation by expert judges and by the target population (pregnant women); and (4) readability analysis of the response texts. The expert judges evaluated clarity, relevance, accuracy, comprehensiveness, and overall satisfaction, while the target population assessed only clarity, relevance, and satisfaction. Data were collected through electronic forms (experts) and printed forms (target population). Readability was measured with the ALT software, applying classical formulas such as Flesch Reading Ease, Gulpease, Flesch-Kincaid, Gunning Fog, ARI, Coleman-Liau, and a general index.

A total of 21 health professionals and 19 pregnant women participated. Among the experts, the responses scored means above 4.0 on the Likert scale (1–5), with clarity (CVI = 94%) and relevance (CVI = 91%) standing out. However, comprehensiveness (74%) and accuracy (76%) fell below the cutoff point (CVI ≥ 78%), revealing technical gaps. The overall Cronbach’s alpha (91.4%) demonstrated high internal consistency.
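The readability indices named above follow standard published formulas. A minimal sketch using the standard coefficients (which may differ slightly from the ALT software's implementation), assuming word, sentence, letter, syllable, and complex-word counts are supplied externally:

```python
# Classical readability formulas with their standard published coefficients.
# Counts (words, sentences, syllables, letters, complex words) are assumed
# to come from a separate text-analysis step.

def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: 0-100 scale, higher = easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(words, sentences, complex_words):
    """Gunning Fog index: complex words = words with 3+ syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

def ari(letters, words, sentences):
    """Automated Readability Index (ARI): approximate US school grade."""
    return 4.71 * (letters / words) + 0.5 * (words / sentences) - 21.43

def coleman_liau(letters, words, sentences):
    """Coleman-Liau index: L = letters per 100 words, S = sentences per 100 words."""
    L = letters / words * 100
    S = sentences / words * 100
    return 0.0588 * L - 0.296 * S - 15.8

def gulpease(letters, words, sentences):
    """Gulpease index (designed for Italian): 0-100, higher = easier."""
    return 89 + (300 * sentences - 10 * letters) / words
```

For example, a text with 100 words, 5 sentences, and 140 syllables scores about 68 on Flesch Reading Ease (plain, accessible prose), consistent with the medium readability levels reported in the study.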
The target population rated the responses with an overall mean of 4.4, a CVI of 90.3%, and internal consistency of 90.3%, indicating strong acceptance of the content. Readability analysis placed most texts at medium to higher levels, with no significant differences between groups. Pearson’s correlations revealed statistically significant associations between most indices, such as ARI and Flesch-Kincaid (r = 0.897; p < 0.001) and Gulpease and the general index (r = -0.926; p < 0.001), confirming coherence across similar metrics. However, reliability analysis across three sets of generated responses showed that no index achieved good or excellent consistency (all ICC < 50%); the highest, Coleman-Liau at 49%, nevertheless lacked statistical significance.

In conclusion, ChatGPT shows potential as a complementary tool for health education, providing responses perceived as clear, relevant, and satisfactory, especially by members of the target population with secondary or higher education. Nevertheless, the limitations in accuracy and comprehensiveness underscore the need for technical supervision and critical review of the information before use in clinical and educational contexts.
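The validity and reliability statistics reported throughout (item-level CVI and Cronbach's alpha) can be sketched as follows; the agreement threshold of ≥ 4 on the 1–5 Likert scale is an illustrative assumption, not necessarily the study's exact criterion:

```python
def item_cvi(ratings, agree_threshold=4):
    """Item-level Content Validity Index: proportion of judges rating the
    item at or above the agreement threshold (assumed >= 4 on a 1-5 Likert)."""
    return sum(r >= agree_threshold for r in ratings) / len(ratings)

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of items, each a list of scores given
    by the same respondents in the same order:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(item_scores)            # number of items
    n = len(item_scores[0])         # number of respondents

    def var(xs):                    # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(item) for item in item_scores) / var(totals))
```

For instance, if 4 of 5 judges rate an item 4 or 5, the item-level CVI is 0.80, just above the 78% cutoff used in the study; two perfectly covarying items yield an alpha of 1.0.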