Development and Validation of a Risk Assessment Tool for Detecting Oral Cavity Cancer in India: A Case–control Study Design Approach

Monica Mocherla; Pushpanjali Krishnappa; Denny John

doi:10.1055/s-0046-1817795


CC BY 4.0 · Indian J Med Paediatr Oncol

Original Article


Authors

Monica Mocherla
¹Department of Public Health Dentistry, Sri Sai College of Dental Surgery, Vikarabad, Telangana, India
²Department of Public Health Dentistry, Faculty of Dental Sciences, MS Ramaiah University of Applied Sciences, Bengaluru, Karnataka, India

Pushpanjali Krishnappa
²Department of Public Health Dentistry, Faculty of Dental Sciences, MS Ramaiah University of Applied Sciences, Bengaluru, Karnataka, India

Denny John
³Faculty of Life and Allied Health Sciences, MS Ramaiah University of Applied Sciences, Bengaluru, Karnataka, India


Abstract

Introduction

Oral cancer is a significant health issue in India. It is often diagnosed late, resulting in poor outcomes and prognosis. Early identification of high-risk individuals is crucial for preventing complications, and focusing on these populations can significantly improve screening efforts. Implementing a risk assessment tool for oral cancer may enhance examination strategies across the country.

Objectives

Our objective was to develop a risk assessment model for oral cavity cancer based on a comprehensive understanding of risk factors, to ensure generalizability across Indian populations.

Materials and Methods

A multicenter case–control study was conducted from October 2022 to July 2023 across three cancer hospitals in Telangana, India, to identify oral cancer risk factors. A risk score for each predictor was derived from the respective odds ratio (OR). The predictive ability of the regression model and the cut-off risk score were determined by calculating sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Calibration plots and the Hosmer–Lemeshow goodness-of-fit test were used to assess how well each model's predicted probabilities aligned with the primary and secondary outcomes. The Brier score was used as a measure of the model's overall accuracy, and decision curve analysis evaluated the model's clinical utility and net benefit for risk prediction. The models were validated internally by bootstrap resampling and externally using pooled ORs from a systematic review.

Results

Years of smoking and smokeless tobacco use, alcohol frequency, vegetable consumption, and history of chronic oral trauma were the predictors. Risk scores for individual predictors ranged from −1 to 2. The area under the receiver operating characteristic curve (AUC) for the risk scores was good (0.76–0.84). Sensitivity was highest for the upper socioeconomic class and pooled models, whereas specificity was highest for the multivariable and bootstrapped upper socioeconomic class models. Brier scores of 0.1322 for the upper class and 0.1673 for the lower class indicated optimal model performance, whereas those for the multivariable and pooled data models indicated suboptimal performance.

Conclusion

The risk scoring model identified individuals at high risk for oral cancer and demonstrated good predictive ability for the Indian population. It requires validation in other populations before it can accurately pinpoint subgroups needing further clinical evaluation.

Keywords

case–control study - oral cavity cancer - risk factors - risk score - sensitivity - specificity

Introduction

Oral cavity cancer is a major health issue, especially in areas with high tobacco and alcohol use. Early identification of high-risk individuals is essential to prevent complications from late diagnoses.[1] In India, the India State-Level Disease Burden Initiative Cancer Collaborators reported 79,979 oral cancer deaths in 2022, ranking third in cancer mortality.[2] Tobacco, particularly smokeless varieties, and areca nut are the main causes, along with inadequate nutrition, oral HPV infection, and poor oral hygiene.[3]

Oral cancer in India poses a serious public health challenge; it is often diagnosed late, resulting in poor outcomes and unaffordable costs.[4] A lack of qualified healthcare professionals in rural areas leads to delays and advanced stages of cancer at diagnosis.[5] Early detection improves outcomes and survival and reduces costs. Oral cancer mainly affects lower socioeconomic groups with high tobacco and alcohol use.[6] India has insufficient oral cancer screening programs, which contributes to an increasing prevalence.[7] Implementing such programs is difficult because healthcare workers prioritize maternal and child health and immunization, and feel overburdened by additional oral examinations.[8] There are no clear policies on health workers' oral health responsibilities, and they lack training in oral cavity examination.[9] Targeting high-risk groups can enhance screening efforts,[10] and a risk assessment tool for identifying high-risk individuals could improve examination strategies.[11]

We reviewed risk prediction models for early oral cancer detection and identified six studies, including two from India.[12] These models grouped risk factors into four types: sociodemographic history, medical history, dental health, and behavioral history (alcohol and tobacco use, and diet). Their discrimination ranged from acceptable (AUC 0.7–0.8) to excellent (AUC 0.8–0.9), with good predictive values. Calibration was assessed only by Rao et al, using the Hosmer–Lemeshow test.

Both Indian risk assessment models were easy to use in resource-limited settings, but several research gaps remained. First, there was no inclusive risk factor model for oral cavity cancer in India: Gupta et al and Rao et al focused on oropharyngeal and upper aerodigestive tract cancers without addressing the oral cavity specifically.[13] [14] Second, generalizability is limited because of region-specific risks. Gupta et al conducted a hospital-based case–control study in Pune with 240 case–control pairs, yielding a risk score of 0 to 26, whereas Rao et al conducted an unmatched study in Karnataka with 180 cases and 272 controls, reporting scores of 0 to 28.[13] Third, Rao et al checked the reliability of their risk score models in 200 bootstrap samples,[14] which is internal rather than external validation. Fourth, both studies had a high risk of bias in the participant and analysis domains of the PROBAST tool, raising concerns about applicability.[12] Lastly, Gupta et al achieved an AUC of 0.86 with 74.6% sensitivity and 84.6% specificity, while Rao et al reported a higher AUC (0.9) with 93.5% sensitivity but a lower specificity of 71.1%.[13] [14] Such a high false-positive rate could burden healthcare systems if the tool were used to screen for oral cavity cancer.

At a 2004 workshop, the National Cancer Institute recommended that cancer risk models be revised and strengthened to improve their accuracy.[15] Our objective was therefore to create a risk assessment model for oral cavity cancer based on a detailed understanding of risk factors, aiming for generalizability across Indian populations.

Methods

Study Design

A multicenter case–control study was conducted from October 2022 to July 2023 across three cancer hospitals in Telangana, India, to identify oral cancer risk factors. The identification process involved six steps. Step 1: a literature review of studies on oral cancer risk predictors. Step 2: development of an instrument with 25 items drawn from various studies. Step 3: content validation with 16 experts (two medical oncologists, four surgical oncologists, five oral cancer surgeons, and five oral medicine specialists) through face-to-face interviews; the experts rated the relevance of each item on a Likert-type scale from 1 (not relevant) to 4 (highly relevant) to refine the questionnaire format. Step 4: calculation of the item-level content validity index (I-CVI) as the number of experts rating an item 3 or 4 divided by the total number of experts. Items with an I-CVI >0.90 were retained, those with an I-CVI between 0.70 and 0.90 were revised, and those with an I-CVI <0.70 were discarded. Step 5: training of the investigator by an oral medicine specialist to diagnose oral cancer lesions using online photographs. Step 6: a pilot study of the revised questionnaire among 20 oral cancer cases and 20 controls.
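The Step 4 rule can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names and ratings are hypothetical, and treating both 0.70 and 0.90 as falling in the "revise" band is an assumption, since the text does not say how boundary values were handled. With 16 experts, the I-CVI can only take multiples of 1/16, so neither boundary value can actually occur.

```python
from typing import Sequence


def item_cvi(ratings: Sequence[int]) -> float:
    """I-CVI: share of experts who rated the item 3 or 4 on the 1-4 relevance scale."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    if any(r not in (1, 2, 3, 4) for r in ratings):
        raise ValueError("ratings must be on the 1-4 Likert-type scale")
    relevant = sum(1 for r in ratings if r >= 3)
    return relevant / len(ratings)


def triage_item(i_cvi: float) -> str:
    """Retain (>0.90), revise (0.70-0.90, boundaries assumed inclusive), or discard (<0.70)."""
    if i_cvi > 0.90:
        return "retain"
    if i_cvi >= 0.70:
        return "revise"
    return "discard"


# Hypothetical ratings from the 16 experts for three items.
examples = {
    "item_a": [4] * 12 + [3] * 3 + [2],  # 15/16 = 0.9375
    "item_b": [3] * 12 + [2] * 4,        # 12/16 = 0.75
    "item_c": [4] * 10 + [1] * 6,        # 10/16 = 0.625
}
for name, ratings in examples.items():
    score = item_cvi(ratings)
    print(name, round(score, 4), triage_item(score))
```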

Patient and Participant Selection and Recruitment

A case was defined as a person newly diagnosed with histopathologically confirmed oral cancer who visited a designated cancer hospital during the study period. Anatomical sites C00 to C06 of the International Classification of Diseases for Oncology, 3rd edition, were included.[16] Inclusion criteria required patients to be over 18 years of age, to provide informed consent, and to have a confirmed diagnosis. Patients with cancer recurrence, cognitive impairments, or advanced metastatic disease were excluded.

Controls were participants without oral cancer, including caregivers, relatives, and hospital visitors from the same hospitals as the cases. The same criteria applied, except that controls had to show no signs of cancer and could not have tobacco- or alcohol-related malignancies, such as liver, lung, or esophageal cancer, or cognitive impairments.

The detailed patient recruitment process, sample size estimation, data collection process, ethics and informed consent are documented elsewhere.[17]

Potential Predictors

Consumption of fruits and family history of cancer were removed after consultation with subject experts because of their low frequency in the target population. After the initial round of content validation, the questionnaire had 17 questions with an I-CVI above 0.9: demographics (four items), history of adverse habits (eight items), dietary habits (two items), family history (one item), and dental history (two items). Three questions had an I-CVI between 0.7 and 0.9 and were removed after repeat consultation with the experts, and five questions had an I-CVI below 0.7 and were also removed. After the relevance of each item was assessed, the instrument-level CVI was calculated as the average I-CVI of the 17 items; the CVI for this newly developed instrument was 0.86. No changes were required after the initial pilot test, and all the variables were included in the final questionnaire ([Supplementary Material], available in the online version only).

Primary and Secondary Outcomes

The primary outcomes were the adjusted odds ratios, with 95% confidence intervals, and the corresponding risk scores for the association between each predictor and the risk of oral cavity cancer. The secondary outcomes were the performance metrics of the risk scores for these predictors in relation to the risk of oral cavity cancer.

Statistical Analysis

The association between each predictor and the risk of oral cavity cancer was assessed using multivariable binary logistic regression, controlling for age, gender, socioeconomic status, and place of residence. A significance threshold of p < 0.05 was applied. Inverse probability of treatment weighting was also performed with the IPW package in R software to evaluate the impact of the different risk factors on oral cancer.[18] Each participant's propensity score was estimated by multiple logistic regression, and the inverse probability of treatment weights were computed from these propensity scores to balance the measured baseline covariates between cases and controls. Covariates were considered balanced when the absolute standardized mean difference was less than 0.1. A weighted regression analysis then estimated odds ratios (OR) with 95% confidence intervals.
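The weighting and balance check can be illustrated with a small Python sketch. It assumes the propensity scores (each participant's estimated probability of being a case given the covariates) have already been obtained from the logistic model; it does not reproduce the R package the authors used, and the function names and toy data are hypothetical. Cases receive weight 1/ps and controls 1/(1 − ps), and balance is judged with the absolute standardized mean difference against the 0.1 threshold used in the study.

```python
import math
from typing import List, Sequence, Tuple


def iptw_weights(is_case: Sequence[int], ps: Sequence[float]) -> List[float]:
    """Inverse probability weights: 1/ps for cases, 1/(1 - ps) for controls."""
    weights = []
    for c, p in zip(is_case, ps):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / p if c == 1 else 1.0 / (1.0 - p))
    return weights


def _weighted_mean_var(x: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    total = sum(w)
    mean = sum(wi * xi for xi, wi in zip(x, w)) / total
    var = sum(wi * (xi - mean) ** 2 for xi, wi in zip(x, w)) / total
    return mean, var


def weighted_smd(x: Sequence[float], is_case: Sequence[int], w: Sequence[float]) -> float:
    """Absolute standardized mean difference of covariate x between weighted cases and controls."""
    x1 = [xi for xi, c in zip(x, is_case) if c == 1]
    w1 = [wi for wi, c in zip(w, is_case) if c == 1]
    x0 = [xi for xi, c in zip(x, is_case) if c == 0]
    w0 = [wi for wi, c in zip(w, is_case) if c == 0]
    m1, v1 = _weighted_mean_var(x1, w1)
    m0, v0 = _weighted_mean_var(x0, w0)
    pooled_sd = math.sqrt((v1 + v0) / 2.0)
    return abs(m1 - m0) / pooled_sd if pooled_sd > 0 else 0.0


# Toy data: a binary covariate (e.g. lower socioeconomic class) that is
# unevenly distributed between 4 cases and 4 controls.
covariate = [1, 1, 1, 0, 1, 0, 0, 0]
is_case = [1, 1, 1, 1, 0, 0, 0, 0]
# Here the propensity score is the exact case fraction within each covariate level.
ps = [0.75 if v == 1 else 0.25 for v in covariate]

unweighted = weighted_smd(covariate, is_case, [1.0] * len(covariate))
weighted = weighted_smd(covariate, is_case, iptw_weights(is_case, ps))
print(f"SMD before weighting: {unweighted:.3f}, after: {weighted:.3f}")
print("balanced" if weighted < 0.1 else "imbalanced")
```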

Derivation of the Risk Scores

Before including predictors in the risk assessment model, we assessed multicollinearity among them using the variance inflation factor. Stepwise backward elimination removed the predictor with the highest p-value at each step until only significant predictors (p < 0.05) remained, alongside clinically relevant ones ([Supplementary Material], available in the online version only). Risk scores were derived by combining the weight of each predictor to gauge oral cavity cancer risk: points were assigned from the regression coefficients rounded to integers, each participant's total score was the sum of the points for all predictors, and participants were then categorized into high- and low-risk groups. For external comparison, predictors whose ORs differed by more than 10% from the pooled estimate were excluded, and revised ORs were calculated.[18] Pooled ORs from the meta-analysis, transformed into regression coefficients, served as the weights for each risk factor in the overall risk score.[18]
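The point-scoring step can be sketched as follows. The coefficients are those of the final case–control model reported in the Results; the dictionary keys and function names are hypothetical, and two details are assumptions because the text does not spell them out: coefficients are rounded half away from zero, and a total score at or above the cut-off of 2 is classified as high risk. For the pooled model, a pooled OR is first converted to a coefficient by taking its natural logarithm, as the β columns of Table 1 indicate.

```python
import math
from typing import Dict, Iterable

# Coefficients of the final case-control model (reported in the Results);
# the keys are hypothetical labels for the five predictors.
COEFFICIENTS: Dict[str, float] = {
    "smoking_over_10y": 2.25,
    "smokeless_over_10y": 2.23,
    "daily_alcohol": 1.69,
    "vegetables_over_3_per_week": -0.94,
    "chronic_trauma": 1.67,
}


def to_points(beta: float) -> int:
    """Round a coefficient to an integer point value, halves away from zero (assumed)."""
    return int(math.copysign(math.floor(abs(beta) + 0.5), beta))


def coefficient_from_or(odds_ratio: float) -> float:
    """Convert a pooled OR into a regression coefficient via its natural logarithm."""
    return math.log(odds_ratio)


def total_score(present: Iterable[str], points: Dict[str, int]) -> int:
    """Sum the points of the predictors present for one participant."""
    return sum(points[name] for name in present)


def risk_group(score: int, cutoff: int = 2) -> str:
    # Assumption: scores at or above the cut-off count as high risk.
    return "high" if score >= cutoff else "low"


POINTS = {name: to_points(beta) for name, beta in COEFFICIENTS.items()}
participant = ["smoking_over_10y", "vegetables_over_3_per_week"]
score = total_score(participant, POINTS)
print(POINTS)
print(score, risk_group(score))
```

With these points, the maximum achievable total is 8 (all four adverse factors present and no regular vegetable intake), which matches the upper end of the 0 to 8 range reported in the Discussion.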

Model Performance

Model performance was evaluated in terms of discrimination (AUC), sensitivity, and specificity, together with calibration plots comparing predicted risks with the observed primary and secondary outcomes. Receiver operating characteristic curves were constructed for the case–control study model and the pooled estimate model using the weighted dataset. Sensitivity was plotted against 1 − specificity in both the original and bootstrapped samples, yielding an optimal cut-off score of 2.
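The text does not state which criterion defined the optimal cut-off. A common choice is Youden's index, J = sensitivity + specificity − 1, and the sketch below assumes it. The function names and toy scores are hypothetical, and a score at or above the candidate cut-off is assumed to count as a positive result.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[int], labels: Sequence[int], cutoff: int) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    sensitivity = sum(s >= cutoff for s in cases) / len(cases)
    specificity = sum(s < cutoff for s in controls) / len(controls)
    return sensitivity, specificity


def youden_cutoff(scores: Sequence[int], labels: Sequence[int]) -> Tuple[int, float]:
    """Return the candidate cut-off with the largest Youden index J (first one wins ties)."""
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        sensitivity, specificity = sens_spec(scores, labels, cutoff)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical total risk scores for four cases (label 1) and four controls (label 0).
scores = [2, 3, 4, 2, 0, 1, 1, 2]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(youden_cutoff(scores, labels))
```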

Predictive values depend on disease prevalence. Because the model relies on a case–control dataset, in which the proportion of cases is fixed by design, the sample does not reflect the true disease prevalence. We therefore adjusted for the control sampling fraction and for the prevalence of oral cancer in India.[19] [20]
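A standard way to carry sensitivity and specificity from a case–control sample over to a target population is Bayes' theorem, which re-weights them by an assumed prevalence instead of the sample's case fraction (214 of 634 participants, about 34%). The sketch below is a generic illustration rather than the authors' exact adjustment: the 1% prevalence is a hypothetical placeholder, not the figure used in the study, while the sensitivity and specificity are the values reported for the multivariable model.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """PPV and NPV for a given prevalence, via Bayes' theorem."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity 0.76 and specificity 0.72 as reported; 1% prevalence is a placeholder.
for prevalence in (214 / 634, 0.01):
    ppv, npv = predictive_values(0.76, 0.72, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At the lower prevalence the PPV falls sharply while the NPV rises, which is why predictive values read directly from a case–control table cannot be applied to a screening population.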

AUC values, sensitivity, and specificity were compared to identify the model with superior discriminatory power. Calibration plots and the Hosmer–Lemeshow goodness-of-fit test were used to assess how well the predicted probabilities aligned with the primary and secondary outcomes. The Brier score assessed overall accuracy using the mean squared error between predictions and observed results. Decision curve analysis evaluated the model's clinical utility and net benefit in risk prediction.
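The Hosmer–Lemeshow statistic groups participants by predicted risk and compares observed with expected case counts in each group. The self-contained sketch below assumes equal-sized groups formed after sorting by predicted probability (one of several grouping conventions), ten groups by default, and uses the closed-form chi-square tail probability that holds for even degrees of freedom (groups − 2), so no statistics library is needed. It is not the authors' implementation, and the function names and toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form is exact only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float], groups: int = 10) -> Tuple[float, float]:
    """Return (HL statistic, p-value) using equal-sized groups sorted by predicted risk."""
    if groups % 2:
        raise ValueError("use an even number of groups so that df = groups - 2 is even")
    n = len(p)
    if n < groups:
        raise ValueError("need at least as many observations as groups")
    order = sorted(range(n), key=lambda i: p[i])
    statistic = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        size = len(idx)
        mean_p = expected / size
        variance = size * mean_p * (1.0 - mean_p)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic, chi2_sf_even_df(statistic, groups - 2)


# Toy example: observed case counts equal the expected counts in every group.
p = [0.2] * 5 + [0.4] * 5 + [0.6] * 5 + [0.8] * 5
y = [1, 0, 0, 0, 0] + [1, 1, 0, 0, 0] + [1, 1, 1, 0, 0] + [1, 1, 1, 1, 0]
stat, p_value = hosmer_lemeshow(y, p, groups=4)
print(f"HL = {stat:.4f}, p = {p_value:.3f}")
```

A large p-value, as here, means the test finds no evidence that predicted and observed risks disagree.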

Model Validation

Internal validation of the model's predictive performance for risk scores was conducted by bootstrapping 1,000 samples to estimate confidence intervals and evaluate model stability. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for the final model at the optimum cut-off after bootstrapping. External validation used pooled ORs from a systematic review and meta-analysis.[18]
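The bootstrap step can be sketched as below for the AUC of an integer risk score. This is a generic percentile bootstrap written for illustration, not the authors' code. It resamples cases and controls separately (an assumption that keeps the case–control ratio fixed in each replicate), computes the AUC as the Mann–Whitney probability that a case outscores a control with ties counted as one half, and takes the 2.5th and 97.5th percentiles of 1,000 replicates. The seed, function names, and scores are hypothetical; sensitivity and specificity at the cut-off could be collected in the same loop.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def auc_from_scores(case_scores: Sequence[int], control_scores: Sequence[int]) -> float:
    """Mann-Whitney AUC: P(case score > control score) + 0.5 * P(tie)."""
    case_counts = Counter(case_scores)
    control_counts = Counter(control_scores)
    controls_below = 0  # controls with a score strictly below the current value
    concordant = 0.0
    for value in sorted(set(case_counts) | set(control_counts)):
        n_cases = case_counts[value]
        n_controls = control_counts[value]
        concordant += n_cases * (controls_below + 0.5 * n_controls)
        controls_below += n_controls
    return concordant / (len(case_scores) * len(control_scores))


def bootstrap_auc_ci(
    case_scores: Sequence[int],
    control_scores: Sequence[int],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 2023,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC, resampling cases and controls separately."""
    rng = random.Random(seed)
    replicates: List[float] = []
    for _ in range(n_boot):
        cases = rng.choices(case_scores, k=len(case_scores))
        controls = rng.choices(control_scores, k=len(control_scores))
        replicates.append(auc_from_scores(cases, controls))
    replicates.sort()
    lower = replicates[int(alpha / 2 * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical scores for illustration only.
case_scores = [2, 3, 4, 2, 6, 4, 2, 1]
control_scores = [0, 1, 1, 2, 0, -1, 2, 1]
print(round(auc_from_scores(case_scores, control_scores), 3))
print(bootstrap_auc_ci(case_scores, control_scores))
```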

Subgroup analysis by socioeconomic status (upper vs. lower) investigated differences in risk factors for oral cavity cancer, examining whether specific predictors like tobacco use, diet, and chronic oral trauma varied by socioeconomic group.

Permission to collect data from patients was obtained from the respective hospital authorities at the start of the study, as was permission to access patients' medical records to ascertain the diagnosis and treatment plan. Identified cases were first contacted by the receptionist or ward nurses to confirm their willingness to participate. The researcher then explained to each potential participant the purpose of the study, the procedures to be performed, and their freedom to withdraw from data collection at any time. Participants were assured of the confidentiality of the data collected and were informed, in a language they could understand, about the oral examination to be performed and the time required for the interview. After their questions were answered, informed consent was obtained, and the interview was scheduled at their convenience.

The STROBE guidelines for case–control studies were used to report the methods and results of this study, along with the TRIPOD checklist for the risk assessment model.[21] [22]

Ethics

Ethical clearance for the study was obtained from the Human Research Ethics Committee of MNJ Cancer Hospital, Hyderabad (ECR/227/Inst/AP/2013/RR-19, dated 04 August 2022). All procedures involving human participants complied with the ethical standards of the 1964 Declaration of Helsinki and the ICMR National Guidelines for Biomedical and Health Research.

Results

Of 250 eligible cases and 500 controls, 238 (95.2%) cases and 450 (90%) controls consented to participate. Our sample size estimate was 175 cases and 350 controls, as described elsewhere;[17] the analytic sample of 214 cases and 420 controls was therefore considered adequate. An initial imbalance in covariates was noted, but after extreme weights were trimmed, balance was achieved for age, gender, socioeconomic status, and residence, with a standardized mean difference below 0.1 considered acceptable.

[Table 1] presents the final predictors and their risk scores based on β coefficients. The Hosmer–Lemeshow test indicated good fit (p = 0.456). Risk scores were derived from the natural logarithms of the regression model's ORs. The risk prediction model, expressed as the log-odds of oral cavity cancer, is: −1.49 + 2.25 (smoking > 10 years) + 2.23 (smokeless tobacco > 10 years) + 1.69 (daily alcohol use) − 0.94 (vegetable consumption > 3 times/week) + 1.67 (chronic trauma history). The multivariable model from the case–control data demonstrated good discrimination (AUC = 0.84, 95% CI: 0.78–0.86), with a sensitivity of 76% and specificity of 72%. The models for the upper and lower socioeconomic classes demonstrated strong discriminatory ability (upper: AUC = 0.81, 95% CI: 0.72–0.84; lower: AUC = 0.81, 95% CI: 0.75–0.86) with optimal sensitivity and specificity. The pooled data model had fair discrimination (AUC = 0.79, 95% CI: 0.75–0.82) and low specificity. The bootstrapped models from the multivariable analysis exhibited good discrimination compared with the pooled model (AUC = 0.78), all with optimal sensitivity and specificity.
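Assuming the equation above is the linear predictor of the logistic regression described in the Methods, a predicted probability follows from the logistic function p = 1 / (1 + e^(−linear predictor)). The sketch below applies the published coefficients; the predictor labels and function name are hypothetical. Because the intercept of a case–control model reflects the study's sampling ratio rather than the population's baseline risk, these probabilities describe the study sample and should not be read as absolute population risks without recalibration.

```python
import math
from typing import Iterable

INTERCEPT = -1.49
# Published coefficients; the dictionary keys are hypothetical labels.
COEFFICIENTS = {
    "smoking_over_10y": 2.25,
    "smokeless_over_10y": 2.23,
    "daily_alcohol": 1.69,
    "vegetables_over_3_per_week": -0.94,
    "chronic_trauma": 1.67,
}


def predicted_probability(present: Iterable[str]) -> float:
    """Logistic transform of the linear predictor for the predictors present."""
    linear_predictor = INTERCEPT + sum(COEFFICIENTS[name] for name in present)
    return 1.0 / (1.0 + math.exp(-linear_predictor))


print(round(predicted_probability([]), 3))
print(round(predicted_probability(["smoking_over_10y"]), 3))
print(round(predicted_probability(["smoking_over_10y", "daily_alcohol"]), 3))
```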

Table 1
Risk scores based on the case–control study and pooled estimates
Characteristic	Category	Case–control β coefficient	Case–control adjusted OR (95% CI)	Case–control risk score	Pooled β coefficient	Pooled adjusted OR (95% CI)	Pooled risk score
Years of smoking	Never	Ref
	<10 y	0.4291816	1.536 (0.525–4.135)	0	0	1.00 (0.73–1.38)	0
	>10 y	2.277574	9.753 (4.864–20.265)	2	1.050	2.86 (1.82–4.48)	1
Years of smokeless tobacco	Never	Ref
	<10 y	1.99320	7.339 (3.625–15.217)	2	0.620	1.86 (1.47–2.34)	1
	>10 y	2.19098	8.944 (4.587–18.0245)	2	1.532	4.19 (3.56–5.09)	2
Alcohol frequency	Never	Ref
	Occasional	0.18564	1.204 (0.670–2.216)	0	0.086	1.09 (0.81–1.47)	0
	Daily	1.77393	5.894 (3.236–11.019)	2	0.760	2.14 (1.67–2.75)	1
Vegetables	<3 times/week	Ref
	>3 times/week	–1.00239	0.367 (1.414–6.208)	–1	–1.108	0.33 (0.22–0.47)	–1
History of chronic trauma	Absent	Ref
	Present	1.2345	4.437 (1.281–9.103)	2	0.672	1.96 (1.05–3.62)	1

Abbreviations: CI, confidence interval; OR, odds ratio.

A Brier score close to 0, and ideally below 0.25, indicates good overall accuracy. The multivariable model had a Brier score of 0.291, indicating suboptimal performance despite optimal sensitivity and specificity. The upper-class model scored 0.1322 and the lower-class model 0.1673, indicating optimal performance, whereas the pooled model scored 0.345, reflecting suboptimal performance ([Table 2]).
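The Brier score is the mean squared difference between predicted probabilities and observed outcomes (1 for a case, 0 for a control). A useful yardstick, added here for context rather than taken from the study, is the score of an uninformative model that predicts the sample case fraction for everyone, which equals prevalence × (1 − prevalence); for 214 cases and 420 controls this is about 0.224. The functions below are a hypothetical sketch.

```python
from typing import Sequence


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    if len(y) != len(p) or not y:
        raise ValueError("y and p must be non-empty and of equal length")
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def reference_brier(y: Sequence[int]) -> float:
    """Brier score of a model that predicts the observed case fraction for everyone."""
    prevalence = sum(y) / len(y)
    return prevalence * (1.0 - prevalence)


outcomes = [1] * 214 + [0] * 420
print(round(reference_brier(outcomes), 3))
print(brier_score([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]))
```

Scores above this reference indicate predictions that are, on average, less accurate than assigning every participant the sample case fraction.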

Table 2
Performance metrics of risk scores derived from the multivariable, upper-class, lower-class, and pooled models and their bootstrapped counterparts
Type of model	AUC (95% CI)	Sensitivity (95% CI)	Specificity (95% CI)	Brier score
Multivariable model	0.821 (0.794–0.858)	0.718 (0.656–0.774)	0.767 (0.725–0.804)	0.291
Bootstrapped multivariable model	0.841 (0.781–0.868)	0.769 (0.670–0.900)	0.723 (0.568–0.810)	–
Upper class model	0.764 (0.714–0.813)	0.823 (0.763–0.832)	0.654 (0.632–0.781)	0.1322
Bootstrapped upper class model	0.816 (0.724–0.849)	0.721 (0.721–0.725)	0.787 (0.787–0.887)	–
Lower class model	0.816 (0.759–0.872)	0.770 (0.717–0.824)	0.744 (0.663–0.799)	0.1673
Bootstrapped lower class model	0.815 (0.754–0.869)	0.768 (0.636–0.882)	0.752 (0.616–0.871)	–
Pooled model	0.780 (0.745–0.815)	0.815 (0.725–0.846)	0.638 (0.641–0.729)	0.345
Bootstrapped pooled model	0.794 (0.750–0.822)	0.769 (0.714–0.822)	0.633 (0.589–0.674)	–

Abbreviations: AUC, area under the curve; CI, confidence interval.

The calibration plots showed that the upper- and lower-class models were well calibrated, with predicted probabilities matching the observed outcomes in the population. In the lower class, the model performed well at low risk thresholds; in the intermediate range, however, it underestimated or overestimated risk before calibrating well at higher thresholds.

Decision curve analysis showed a net benefit of using the multivariable model to predict the risk of oral cavity cancer at risk thresholds between 0.2 and 0.4 ([Fig. 1]), with the largest net benefit over the treat-all strategy at a threshold of 0.2. The pooled model had a higher net benefit at mid-range thresholds between 0.2 and 0.4. Beyond a threshold of 0.6, the net benefit diminished and the model offered no additional benefit ([Fig. 1]).

Fig. 1 Decision curve analysis for case–control study and pooled analysis.
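Decision curve analysis compares strategies by their net benefit at a threshold probability p_t: net benefit = TP/n − (FP/n) × p_t / (1 − p_t), where a participant counts as test-positive when the predicted risk is at least p_t. The sketch below computes it for a model and for the treat-all strategy (treat-none is always 0). The function names and toy data are hypothetical, and this is not the software the authors used.

```python
from typing import Sequence


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Net benefit of treating participants whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y: Sequence[int], threshold: float) -> float:
    """Net benefit of treating everyone, regardless of predicted risk."""
    prevalence = sum(y) / len(y)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes and predicted risks for four participants.
y = [1, 1, 0, 0]
p = [0.9, 0.8, 0.1, 0.2]
for pt in (0.2, 0.3, 0.4, 0.6):
    print(pt, round(net_benefit(y, p, pt), 4), round(net_benefit_treat_all(y, pt), 4))
```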

Discussion

Identifying high-risk individuals for oral cavity cancer in resource-constrained settings is crucial. While some models show good predictive value, logistical constraints affect real-world applications.[12] There is a trade-off between simplicity and accuracy in mass screening risk assessments for serious diseases like oral cavity cancer. Simpler models may be more feasible for widespread use but can compromise sensitivity and specificity compared with more complex models.[12]

This model contains no transformations, interactions, or continuous variables, and it does not include an oral examination component. For this reason, although there is evidence of an association between tooth loss (a multifactorial process involving dental caries, periodontal disease, and various socioeconomic factors) and oral cancer,[23] dental caries assessment was not considered for the model. Similarly, primary data from the case–control study on mouth rinsing were considered but not included in the risk scoring. The risk score model was designed to be straightforward and user-friendly, requiring minimal logistics. Where trained health professionals are available, this information can be gathered directly with simple questions and risk scores computed in future studies; the model would be particularly beneficial in rural areas, where trained health professionals are scarce.

The screening risk model developed in the present research had good discriminatory ability (AUC = 0.84 for the multivariable model and 0.79 for the pooled model), in line with earlier studies that reported AUCs ranging from 0.7 to 0.9.[12] The screening models developed by Rao et al and Gupta et al from hospital-based case–control studies in other regions of India reported similar findings. Chewing quid with tobacco emerged as the most important predictor in all these models.[12] [13] [14]

The risk assessment model developed in the present study was well calibrated at lower risk thresholds but showed slight overestimation and underestimation in the middle range, before showing better calibration at higher thresholds. When risk models were developed separately based on the socioeconomic status of the target population, the upper- and lower-class models demonstrated good discrimination and were well-calibrated. Low SES feeds a vicious cycle that results in poor lifestyles, health behaviors, and educational outcomes.[24] In a meta-analysis by Conway et al, evidence suggested that socioeconomic conditions play a role in the risk of oral cavity cancer.[25] Thus, we hypothesize that the macroenvironment linked to low SES, including the impact of inadequate education on health, lack of access to healthcare, poor nutrition, poor hygiene, an unfavorable work environment, and substandard living conditions, may act in concert with other known risk behaviors frequently found in low SES groups to cause oral cancer through complex social interactions.[26]

A screening program's sustainability and effectiveness depend on its PPV and NPV, which are influenced by the population's disease prevalence.[27] Almost 86% of the adults in the study sample with a risk score above the cutoff of 2 were free of oral cancer. At the national prevalence, the NPV of the oral cancer risk score was almost 70% higher than its PPV. A high NPV is likely to reassure a person with a low score (negative test) that they are unlikely to have oral cancer in a population where the disease is more common.[13] However, in low-prevalence populations the score may not provide useful information; it is more informative when applied to related populations with comparable behaviors that are highly susceptible to oral cavity cancer.

One of the common drawbacks of any risk assessment model is a high false positivity rate. Anxiety among the patients and their families, along with the trauma of undergoing further testing unnecessarily, may result when the false positivity rates are high. However, for a disease like oral cancer, in which early detection significantly improves survival outcomes, it is preferable to have a higher false positivity rate rather than a higher false negative rate.[28] With a cut-off score of 2, the false positive rate of the multivariable risk model was 23.3%.

Internal validation uses the data on which the model was developed to measure its statistical performance and estimate optimism, because a risk prediction model's performance is likely to be overly optimistic in the sample from which it was created. The recommended methods for internal validation are k-fold cross-validation and bootstrapping.[29] Bootstrap resampling with 1,000 resamples showed similar discrimination and calibration for the developed model.

Applying the risk model requires predicting risk once the oral cavity cancer risk factors have been identified. In this study, risk scores calculated from the beta coefficients of these factors ranged from 0 to 8, and a cutoff score of 2 gave the best combination of sensitivity (76%) and specificity (72%). The pooled model, which included smoking history, ranged from 0 to 9 and also performed best at a cutoff of 2. Its sensitivity (76%) was similar to, and its specificity (63%) lower than, that of the multivariable model; this was expected, as it was based on pooled estimates from various studies and assessed using a new dataset.

Ideally, risk prediction models should be based on cohort studies. For diseases like oral cancer, however, cohort studies are costly and time-consuming because of long induction periods, and comprehensive databases with detailed information on participants' risk factors are currently lacking in India. A case–control study is therefore a viable alternative for developing a risk prediction model, although the key issue with such studies is the selection of controls.

A strength of this study was that it aimed to ensure that controls accurately represented the disease's prevalence in the source population by selecting non-cancer participants from the same hospitals as the cases. Caregivers were involved to simplify the process and encourage patient participation while maintaining data confidentiality, which limited public involvement in later analyses. Inverse probability weighting based on propensity scores helps reduce the dimensionality of multiple confounders and address interactions and non-linearity, a strength over previous studies.[30] Missing data were minimal because a single examiner conducted direct interviews, which reduced bias and fostered rapport with participants. The findings indicated that an individual's socioeconomic status can mediate the effects of tobacco and alcohol consumption on oral cavity cancer causation, and the models were well calibrated and more accurate when stratified by socioeconomic class.

One limitation of this case–control study is the use of closed-ended questionnaires, which limits the collection of supplementary data on oral cavity cancer variables, although focusing on a few specific factors minimizes recall bias.[31] The small sample size in each socioeconomic class, together with samples drawn from a single urban city in South India, restricts the model's generalizability, so validation on larger samples from multiple centers in the country is required. Although we had data on the type of smoked tobacco (cigarette or bidi/chutta) and smokeless tobacco (Gutkha/Khaini or quid without tobacco), we did not consider these individual forms in the risk scores because of limited sample sizes across the categories. The most reliable test of a risk assessment model is external validation, which evaluates its statistical performance in comparable patient cohorts across different timeframes, locations, or populations.

Public health initiatives to prevent smoking, tobacco chewing, and alcohol consumption need greater coverage, as these behaviors increase oral cancer risk in the country. Effective tobacco control legislation and media campaigns have reduced tobacco use.[31] However, the rising popularity of carcinogenic products such as areca nut and the evasion of bans contribute to increased oral cancer rates, and future research should investigate their potential carcinogenic effects. Tobacco use in India is declining because of heightened awareness and local laws, but the risk factors remain consistent. The increase in oral cancer among non-tobacco users requires further investigation.[32] [33] Chronic trauma from poorly fitting dentures is another significant risk; clinicians should ensure proper denture fit, educate patients, and monitor for precancerous changes. Public health policies must improve access to dental care, raise awareness, and implement preventive measures for high-risk groups, particularly the elderly and the socioeconomically disadvantaged.[34] [35]

Conclusion

A predictive screening model utilizing risk scores was developed and validated in a population from a country with a high rate of oral cancer. Additional research is needed to confirm the model's effectiveness in different populations before it can be recommended to identify subgroups for more comprehensive screening strategies.

Conflict of Interest

None declared.

Authors' Contributions

M.M., P.K., and D.J. contributed to the conception and design of the study. D.J. conceived the pooled analysis component used for external validation. M.M. led data acquisition for the case–control study and for the pooled analysis component, and led the analysis and interpretation, under the supervision of D.J. M.M. drafted the first manuscript, and D.J. reviewed and modified the draft and prepared the final document. All authors critically reviewed the manuscript and gave final approval. M.M., P.K., and D.J. agree to be accountable for all aspects of the work, ensuring integrity and accuracy.

Ethical Approval

The study has received ethics approval from Institutional Ethics Committee of MNJ Institute of Oncology and Regional Cancer Center, on August 04, 2022 (ECR/227/Inst/AP/2013/RR-19).

Data Availability Statement

Dataset can be obtained through written request to the corresponding author.

Patient Consent

Patient consent is not applicable for this study.

Supplementary Material

Supplementary Material (PDF)

References
1 Borse V, Konwar AN, Buragohain P. Oral cancer diagnosis and perspectives in India. Sens Int 2020; 1: 100046

2 Dhillon PK, Mathur P, Nandakumar A. et al; India State-Level Disease Burden Initiative Cancer Collaborators. The burden of cancers and their variations across the states of India: the Global Burden of Disease Study 1990-2016. Lancet Oncol 2018; 19 (10) 1289-1306

3 Niaz K, Maqbool F, Khan F, Bahadar H, Ismail Hassan F, Abdollahi M. Smokeless tobacco (paan and gutkha) consumption, prevalence, and contribution to oral cancer. Epidemiol Health 2017; 39: e2017009

4 Swaminathan D, George NA, Thomas S, Iype EM. Factors associated with delay in diagnosis of oral cancers. Cancer Treat Res Commun 2024; 40: 100831

5 Joshi P, Nair S, Chaturvedi P, Nair D, Agarwal JP, D'Cruz AK. Delay in seeking specialized care for oral cancers: experience from a tertiary cancer center. Indian J Cancer 2014; 51 (02) 95-97

6 Coelho KR. Challenges of the oral cancer burden in India. J Cancer Epidemiol 2012; 2012: 701932

7 Ray CS, Gupta PC. Oral cancer in India. Oral Dis 2024; 00: 1-16

8 Harnagea H, Lamothe L, Couturier Y, Emami E. How primary health care teams perceive the integration of oral health care into their practice: a qualitative study. PLoS One 2018; 13 (10) e0205465

9 Yadav A, Sabbarwal B, Tandon S, Chand S. Knowledge, attitude, and practices toward oral health among frontline healthcare workers in India. J Global Oral Health 2024; 0: 1-4

10 Ford PJ, Farah CS. Early detection and diagnosis of oral cancer: strategies for improvement. J Cancer Policy 2013; 1 (02) e2-e7

11 Su YF, Chen YJ, Tsai FT. et al. Current insights into oral cancer diagnostics. Diagnostics (Basel) 2021; 11 (07) 1287

12 Mocherla M, Krishnappa P. Early identification of people at high risk of oral cancer-A review of existing risk prediction models. J Family Med Prim Care 2024; 13 (08) 2851-2856

13 Gupta B, Bray F, Kumar N, Johnson NW. Associations between oral hygiene habits, diet, tobacco and alcohol and risk of oral cancer: A case-control study from India. Cancer Epidemiol 2017; 51: 7-14

14 Rao SK, Mejia GC, Logan RM. et al. A screening model for oral cancer using risk scores: development and validation. Community Dent Oral Epidemiol 2016; 44 (01) 76-84

15 Freedman AN, Seminara D, Gail MH. et al. Cancer risk prediction models: a workshop on development, evaluation, and application. J Natl Cancer Inst 2005; 97 (10) 715-723

16 Fritz A, Percy C, Jack A. et al. International Classification of Diseases for Oncology. 3rd ed., 1st revision. Geneva: World Health Organization; 2013. ISBN 978 92 4 069212 1

17 Mocherla M, Krishnappa P, John D. Risk factors associated with oral cancer: a hospital-based case-control study in Telangana state, India. Contemp Clin Dent 2025; 16 (01) 19-27

18 R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing; 2020. Accessed February 12, 2026. https://www.R-project.org/

19 Mocherla M, Pushpanjali K, John D. Risk factors of oral cancer in India: a systematic review and meta-analysis. Indian J Med Paediatr Oncol 2025 (published online)

20 Ferlay J, Colombet M, Soerjomataram I. et al. Cancer statistics for the year 2020: An overview. Int J Cancer 2021; 149 (04) 778-789

21 von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Bull World Health Organ 2007; 85 (11) 867-872

22 Collins GS, Reitsma JB, Altman DG, Moons KG. TRIPOD Group. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Circulation 2015; 131 (02) 211-219

23 Gonde N, Rathod S, Kolte A, Lathiya V, Ughade S. Association between tooth loss and risk of occurrence of oral cancer – a systematic review and meta-analysis. Dent Res J (Isfahan) 2023; 20: 4

24 Janakiram C, Varghese N, Joseph J. Review of the correlation between social economic status and oral diseases in India. Amrita J Med 2020; 16 (04) 146

25 Conway DI, Purkayastha M, Chestnutt IG. The changing epidemiology of oral cancer: definitions, trends, and risk factors. Br Dent J 2018; 225 (09) 867-873

26 Warnakulasuriya S. Significant oral cancer risk associated with low socioeconomic status. Evid Based Dent 2009; 10 (01) 4-5

27 Maxim LD, Niebo R, Utell MJ. Screening tests: a review with examples. Inhal Toxicol 2014; 26 (13) 811-828

28 Lophatananon A, Usher-Smith J, Campbell J. et al. Development of a cancer risk prediction tool for use in the UK primary care and community settings. Cancer Prev Res (Phila) 2017; 10 (07) 421-430

29 Grant SW, Collins GS, Nashef SAM. Statistical primer: developing and validating a risk prediction model. Eur J Cardiothorac Surg 2018; 54 (02) 203-208

30 Chesnaye NC, Stel VS, Tripepi G. et al. An introduction to inverse probability of treatment weighting in observational research. Clin Kidney J 2021; 15 (01) 14-20

31 Deng Q, Yan L, Lin J. et al. A composite oral hygiene score and the risk of oral cancer and its subtypes: a large-scale propensity score-based study. Clin Oral Investig 2022; 26 (03) 2429-2437

32 Lahoti S, Dixit P. Declining trend of smoking and smokeless tobacco in India: a decomposition analysis. PLoS One 2021; 16 (02) e0247226

33 Krishnamurthy A, Ramshankar V. Early stage oral tongue cancer among non-tobacco users – an increasing trend observed in a South Indian patient population presenting at a single centre. Asian Pac J Cancer Prev 2013; 14 (09) 5061-5065

34 Chiesa-Estomba CM, Mayo-Yanez M, Vaira LA. et al. Oral cavity cancer secondary to dental trauma: a scoping review. Biomedicines 2024; 12 (09) 2024

35 Narang P, Dhoble A, Mathur M, Rana S, Mason S, Ali A. India's oral health outlook: challenges, economic impact and need for preventative strategies. Front Dent Med 2025; 6: 1544899


Address for correspondence

Denny John, BPT, MPH, MHA, PhD

Faculty of Life and Allied Health Sciences, MS Ramaiah University of Applied Sciences

New BEL Road, Bengaluru 560054, Karnataka

India

Email: dennyjohn.ah.ls@msruas.ac.in

Publication History

Article published online:
28 February 2026

© 2026. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Thieme Medical and Scientific Publishers Pvt. Ltd.
A-12, 2nd Floor, Sector 2, Noida-201301 UP, India
