Homeopathy
DOI: 10.1055/a-2668-1135
Brief Communication

Quantifying Symptom Similarity in Homeopathy Using Cosine Similarity Metrics

Authors

  • Kurian Poruthukaren

    1   Department of Repertory, Yenepoya Homoeopathic Medical College, Yenepoya (Deemed to be University), AYUSH Campus, Mangalore, Karnataka, India
  • Jeenu Joseph

    2   Department of Forensic Medicine and Toxicology, Yenepoya Homoeopathic Medical College, Yenepoya (Deemed to be University), AYUSH Campus, Mangalore, Karnataka, India
  • Arun K. John

    1   Department of Repertory, Yenepoya Homoeopathic Medical College, Yenepoya (Deemed to be University), AYUSH Campus, Mangalore, Karnataka, India
  • Shivaprasad Kotian

    3   Yenepoya Homoeopathic Medical College and Hospital, Yenepoya (Deemed to be University), AYUSH Campus, Mangalore, Karnataka, India
  • Fathimath Henna Thilleri Puthyaveettil

    3   Yenepoya Homoeopathic Medical College and Hospital, Yenepoya (Deemed to be University), AYUSH Campus, Mangalore, Karnataka, India

Homeopaths follow the principle of symptom similarity when prescribing remedies. They compare the patient-reported symptoms with the known remedy symptoms recorded in the homeopathic materia medica and select the remedy demonstrating the highest degree of similarity. To make this comparison, homeopaths use repertories, which serve as remedy diagnostic tools. After repertorization, a repertory yields a ranked list of remedies relevant to the patient's symptoms. However, repertories have inherent limitations, including structural inconsistencies and susceptibility to subjective errors.[1] These limitations reduce their diagnostic accuracy.[2] Moreover, homeopaths need a solid working knowledge of the structure and content of repertories to use them effectively. To overcome these limitations, we introduce a novel approach that integrates cosine similarity with Sentence-BERT (SBERT)[3] to quantify semantic similarity between patient and remedy symptoms.

Cosine similarity is a mathematical tool used in text analysis to measure the similarity between two documents.[4] Information retrieval systems therefore use it extensively.[5] [6] It calculates the cosine of the angle between two vectors in a multi-dimensional vector space. In data retrieval and natural language processing, the vectorized representations of textual information help determine how similar two pieces of text are.

In the context of homeopathy, we propose using cosine similarity to estimate the degree of similarity between two symptom vectors: one representing the patient's symptoms and the other representing the symptoms of a remedy. Simply put, we can convert a symptom description into a list of numbers, known as a vector. This process, called vectorization, converts a symptom into a numerical format that a computer can understand. Specifically, it transforms each piece of text into a vector, where each number captures a feature or pattern in the text. Cosine similarity measures the angle between such vectors. If the angle is small (close to 0 degrees), the cosine value is close to 1, indicating that the vectors point in the same direction and the texts are similar. If the vectors point in very different directions (angle close to 90 degrees), the cosine value approaches 0, meaning the texts are dissimilar.

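The formula for calculating cosine similarity between two vectors A and B is:

\[ \text{cosine similarity}(A, B) = \cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}} \]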

Let us consider a simplified case to illustrate the cosine similarity estimation. See [Table 1] for the symptoms and their vectors.

Table 1

Symptom vectors illustrating the partial overlap between a remedy symptom 'Headache aggravated by sun' and a patient symptom 'Headache'

Symptoms | Remedy vector (A) | Patient vector (B)
Aggravation from sun | 1* | 0
Headache | 1 | 1

* We assign binary values to the presence or absence of symptoms.


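The symptom vectors are: A = [1, 1]; B = [0, 1]. The dot product is A∙B = (1 × 0) + (1 × 1) = 1. The magnitudes of the symptom vectors are calculated as follows:

\[ \lVert A \rVert = \sqrt{1^2 + 1^2} = \sqrt{2} \approx 1.41 \]

\[ \lVert B \rVert = \sqrt{0^2 + 1^2} = 1 \]

Therefore, cosine similarity is 1 / (1.41 × 1) ≈ 0.71, indicating moderate–high similarity.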
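The Table 1 calculation can be reproduced in a few lines of Python. The following is a minimal sketch rather than the patented script; the function name `cosine_similarity` and the choice to return 0 for an all-zero vector are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a vector with no symptoms present counts as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)

# Table 1, with entries ordered as [aggravation from sun, headache].
remedy_vector = [1, 1]   # A
patient_vector = [0, 1]  # B

score = cosine_similarity(remedy_vector, patient_vector)
print(round(score, 2))  # 0.71
```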

However, in a real-world scenario, the symptoms have rich semantic subtleties. Therefore, we integrated cosine similarity with a machine-learning framework for natural language processing called Sentence-BERT (SBERT). In our novel approach, SBERT performs the vectorization of symptoms. Unlike traditional methods that look only at word frequency, SBERT is trained to understand the context, meaning and relationships between words in a sentence. For example, it recognizes that 'Burning pain in the stomach at night' and 'Gastric burning sensation at night' are related in meaning, even if the words differ. The result is that similar symptom descriptions are converted into similar vectors in a high-dimensional space, allowing computers to compare them mathematically. When two vectors point in the same direction, the two symptom descriptions express nearly the same idea. Thus, cosine similarity integrated with SBERT gives us a numerical measure of how semantically close two symptoms are, even if phrased differently. SBERT generates dense vectors for each symptom. Let us consider the above-mentioned symptom examples to illustrate the vectorization by SBERT. See [Table 2] for the symptoms and vectors.

Table 2

Dense vector representations of symptom descriptions using Sentence-BERT

Symptom descriptions | Vector representation (truncated)*
Burning pain in the stomach at night | (0.10, 0.01, −0.04, 0.11, …)
Gastric burning sensation at night | (0.09, −0.01, −0.02, 0.08, …)
Cosine similarity of symptom descriptions | 0.84**

* Sentence-BERT created a 384-dimensional numerical representation for each symptom description. Only the first 4 dimensions are shown in the table.

** The estimation of cosine similarity using Sentence-BERT reflects the semantic similarity between the symptom descriptions.
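A comparison of this kind could be sketched with the open-source sentence-transformers library. This is an illustration under stated assumptions, not the authors' script: the text does not name the SBERT model, and `all-MiniLM-L6-v2` is assumed here only because it produces 384-dimensional vectors. The helper names `load_sbert` and `symptom_similarity` are hypothetical.

```python
import math

def load_sbert(model_name="all-MiniLM-L6-v2"):
    """Load an SBERT model. Requires `pip install sentence-transformers`.

    The default model name is an assumption, not taken from the article.
    """
    from sentence_transformers import SentenceTransformer  # imported lazily
    return SentenceTransformer(model_name)

def symptom_similarity(symptom_a, symptom_b, model):
    """Encode two symptom descriptions and return their cosine similarity.

    `model` is any object whose encode(list_of_str) returns one vector per text.
    """
    vec_a, vec_b = model.encode([symptom_a, symptom_b])
    dot = sum(float(x) * float(y) for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in vec_a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in vec_b))
    return dot / (norm_a * norm_b)

# Usage (downloads the model on first run):
# model = load_sbert()
# print(symptom_similarity("Burning pain in the stomach at night",
#                          "Gastric burning sensation at night", model))
```

The exact score depends on the model and library version, so it may differ from the value reported in Table 2.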


We placed the cosine similarity values into the following categories: the value 0 indicates no similarity; values 0.01 to 0.49 indicate low similarity; values 0.5 to 0.69 indicate moderate similarity; values 0.7 to 0.99 indicate moderate–high similarity; value 1 indicates perfect similarity. Because cosine similarity values here range from 0 to 1, we suggest expressing them as percentages, by multiplying them by 100, for ease of interpretation.
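These categories translate directly into a small lookup. The sketch below is illustrative; the function names are hypothetical, and two choices are assumptions: scores falling between the listed bands (for example 0.495) are assigned by treating each band as a half-open interval, and scores outside 0 to 1 are rejected.

```python
def similarity_category(score):
    """Map a cosine similarity score in [0, 1] to the categories listed above."""
    if not 0.0 <= score <= 1.0:
        # Assumption: only scores in the 0-to-1 range are categorized.
        raise ValueError("score must lie between 0 and 1")
    if score == 0.0:
        return "no similarity"
    if score < 0.5:
        return "low"
    if score < 0.7:
        return "moderate"
    if score < 1.0:
        return "moderate-high"
    return "perfect"

def as_percentage(score):
    """Express a similarity score as a percentage, rounded to one decimal place."""
    return round(score * 100, 1)

print(similarity_category(0.84), as_percentage(0.84))  # moderate-high 84.0
```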

We developed a Python script to automate SBERT vectorization and cosine similarity estimation. We can extend this approach to any number of homeopathic symptoms in a patient. We define a symptom as an expression of illness comprising three core elements: location, sensation and modalities. To improve the accuracy of this approach, we suggest structuring the patient's symptoms and the remedy's symptoms using the same core elements.
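One way to keep patient and remedy symptoms in the same structure is a small record type that renders the three core elements in a fixed order before vectorization. The class name `Symptom`, its field layout and the text template are hypothetical and are not taken from the authors' script.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Symptom:
    """A symptom expressed through its three core elements."""
    location: str
    sensation: str
    modalities: Tuple[str, ...] = ()

    def to_text(self):
        """Render the elements in a fixed order so all symptoms share one format."""
        parts = [f"Location: {self.location}", f"Sensation: {self.sensation}"]
        if self.modalities:
            parts.append("Modalities: " + ", ".join(self.modalities))
        return "; ".join(parts)

patient_symptom = Symptom("stomach", "burning pain", ("worse at night",))
print(patient_symptom.to_text())
# Location: stomach; Sensation: burning pain; Modalities: worse at night
```

The resulting string is what would then be passed to the sentence encoder, for both the patient's symptoms and the remedy's symptoms.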

The method has practical implications for homeopathic practice. We offer a software tool that allows physicians to compare a patient's symptoms directly with the symptom profiles in the materia medica, without relying on repertorization or rubrics. Instead of selecting rubrics manually, the physician can input a detailed list of patient symptoms, and the tool will return the 10 most similar remedies based on cosine similarity scores. This approach enables a more objective and scalable process, allowing for rapid comparison of symptoms with hundreds of remedy profiles. Ultimately, it may simplify the remedy selection process, reduce dependence on repertorial structure and support more accurate individualized prescribing.
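The ranking step can be sketched as follows, assuming the patient's symptoms and each remedy profile have already been converted to vectors. The function `rank_remedies`, the placeholder remedy names and the toy three-dimensional vectors are hypothetical; they stand in for real SBERT embeddings of materia medica profiles.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_remedies(patient_vector, remedy_vectors, top_k=10):
    """Return the top_k (remedy name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(patient_vector, vector))
        for name, vector in remedy_vectors.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy data for illustration only; real profiles would be SBERT vectors.
toy_profiles = {
    "Remedy A": [1, 1, 0],
    "Remedy B": [0, 1, 1],
    "Remedy C": [1, 0, 0],
}
ranking = rank_remedies([1, 1, 0], toy_profiles, top_k=2)
for name, score in ranking:
    print(name, round(score * 100), "%")
```

With hundreds of remedy profiles, the same loop applies unchanged; only the size of `remedy_vectors` grows.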

Our novel approach belongs to the domain of artificial intelligence and incorporates mathematically intensive processes. However, the software handles these computations entirely, making the method user-friendly. In contrast to more 'straightforward' artificial intelligence approaches, which may rely on rigid rules or non-transparent prediction models, our method is transparent, reproducible, and grounded in interpretable mathematics. The transparency enables practitioners to understand precisely how symptom similarity is estimated, which is essential for trust and reliability.

We successfully filed a patent application for the Python script in Chennai, India (file number 202541054428). We invite discussion and feedback from the scientific and homeopathic communities to further refine and apply this approach in clinical practice.



Publication History

Received: 10 June 2025

Accepted: 28 July 2025

Article published online:
13 November 2025

© 2025. Faculty of Homeopathy. This article is published by Thieme.

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany