Open Access
CC BY 4.0 · Pharmaceutical Fronts
DOI: 10.1055/a-2665-1298
Original Article

Quantitative Analysis of Pravastatin Sodium Polymorphs: a Comparative Study of Chemometric Techniques Combined with Powder X-ray Diffraction, Mid-Infrared, and Raman Spectroscopy

Yanyan Huang# (1), Chang Liu# (1), Hongjuan Pan (1), Dong Wang (2), Jingjing Wei (3), Jialiang Zhong (1)

1   National Key Laboratory of Lead Druggability Research, Shanghai Institute of Pharmaceutical Industry, China State Institute of Pharmaceutical Industry, Shanghai, People's Republic of China
2   BengBu Food and Drug Inspection Center, Antibiotic Room, Anhui, People's Republic of China
3   National Institutes for Food and Drug Control, Chemical Drug Inspection Institute, Beijing, People's Republic of China

Funding None.
 


Abstract

Pravastatin sodium (PS) is a hydrophilic statin lipid-lowering drug that reduces low-density lipoprotein levels in the blood by inhibiting 3-hydroxy-3-methylglutaryl coenzyme A reductase. It is known to exist in 17 crystalline forms, some of which overlap in their powder diffraction patterns, making it difficult to control crystalline-form purity. In this study, we aimed to determine the crystalline-form purity of PS using powder X-ray diffraction (PXRD), mid-infrared (MIR) spectroscopy, and Raman spectroscopy. Partial least squares (PLS) models were constructed, and their predictive ability was assessed, using the SPXY, K_S, and Random dataset division methods at different partitioning ratios. PLS calibration curves were established from the relationship between the PXRD, MIR, and Raman data and the content of one solid form of PS (PS-A) in different ranges (full and partial spectra) using preprocessing algorithms such as multiplicative scattering correction, standard normal variate, Savitzky–Golay filtering, and derivative spectroscopy, or combinations of them. The calibration model established using PXRD (y = 0.999x + 0.008, R² = 0.999) performed best, with a low detection limit (1.52%) and quantification limit (4.60%). In addition, analysis of blind samples showed that the confidence intervals of the values predicted by MIR and Raman were wider, indicating greater uncertainty in their parameter estimates. Therefore, the PXRD calibration model is the better choice for determining the crystalline-form purity of PS in actual production, and it can provide more reliable methodological support for the quality control of pharmaceutical products.


Introduction

Polymorphism is a common phenomenon in solid drugs, wherein the same drug component presents different solid forms due to variations in molecular arrangement or conformation. The physicochemical properties of these different crystalline forms can influence the drug's processing characteristics and bioavailability and ultimately affect its overall efficacy.[1] [2] [3] [4] The thermodynamic stability of drug crystals can vary with their chemical composition, which poses a significant challenge to maintaining the quality and efficacy of pharmaceuticals.[5] The U.S. Food and Drug Administration, in its draft guidance, recommends monitoring and controlling polymorphs in drug substances and drug products to ensure the reproducibility of drug production and quality.[6] [7]

Pravastatin sodium (PS) ([Fig. 1]) is a statin drug used for the treatment of lipid disorders, which acts mainly by inhibiting 3-hydroxy-3-methylglutaryl coenzyme A reductase.[8] PS tablets were first marketed in Japan in 1989 and then gradually in the United States, Europe, and other countries and regions. PS is reported to have 17 crystal forms with similar physicochemical properties,[9] and several companies are developing or have developed active pharmaceutical ingredients (APIs) of PS with preparation processes optimized for different crystal forms. PS-A and PS-D are two crystal forms of PS found during the characterization of APIs. The former is stable and has low hygroscopicity, which favors the preparation, storage, and shelf life of the drug, and it is used as the medicinal crystal form in the pharmaceutical industry. The latter is more hygroscopic, which affects the stability of PS. It is difficult to control the purity of the crystalline forms because the powder diffraction patterns of the polymorphs partially overlap. Therefore, it is necessary to establish a quantitative model to control the purity of the pharmaceutical crystalline form. Given the complexity of the PS crystalline forms, this paper focuses on quantitative modeling of the pharmaceutical crystalline form A. Although the polymorphism of PS has been investigated in several previous studies, quantitative analysis of these forms remains underexplored.

Fig. 1 Chemical structure of pravastatin sodium.

There are several methods for the quantitative analysis of crystalline forms, including powder X-ray diffraction (PXRD),[10] [11] [12] [13] [14] differential scanning calorimetry (DSC),[14] [15] [16] [17] and spectroscopic methods such as near-infrared spectroscopy (NIR),[18] [19] [20] Raman spectroscopy (Raman),[19] [21] [22] [23] [24] mid-infrared (MIR) spectroscopy,[10] [19] [25] [26] solid-state nuclear magnetic resonance spectroscopy,[27] [28] and terahertz spectroscopy.[29] PXRD and DSC are commonly used but are affected by factors such as preferred orientation and sample packing. The combination of spectroscopic techniques, including NIR, MIR, and Raman spectroscopy, with chemometrics has become a hot research topic.[30] [31] [32] [33] The quantitative analysis of spectra relies on differences in crystal packing between crystal forms and the resulting changes in molecular vibrations, but overlapping characteristic peaks, subtle spectral differences, and nonlinear relationships increase the difficulty of analysis. Multivariate analysis models such as partial least squares regression (PLS) and principal component regression can address these problems and achieve quantitative analysis by extracting the relevant information.[34] [35] [36] In addition, the raw spectra contain noise, so the raw spectral data need to be divided into datasets and preprocessed to reduce the impact of noise on accuracy. Dataset division methods include sample set partitioning based on joint X–Y distances (SPXY), Kennard–Stone (K_S), and Random. Preprocessing methods include multiplicative scattering correction (MSC), standard normal variate (SNV), Savitzky–Golay filtering, and derivative spectra.[37] [38] [39]

This work aimed to establish a method for the quantitative analysis of the PS-A crystalline form. A review of the literature and patents shows that quantification of this crystalline form has not been reported. In this work, PS-A in PS binary mixtures was quantified using PXRD, MIR, and Raman methods. The predictive ability of the PLS model was investigated with the SPXY, K_S, and Random methods at different division ratios. The PLS calibration curves were established from the relationship between the PXRD, MIR, and Raman data and the PS-A content in different ranges (full and partial spectra) using preprocessing algorithms including MSC, SNV, Savitzky–Golay filtering, and derivative spectroscopy, or combinations of them. A method suitable for quantifying the PS-A content in PS binary mixtures was identified by comparing the performance of the calibration models established by the different methods.


Material and Methods

Material and Sample Preparation

PS API (101240701∼101240706) was purchased from Shanghai Tianwei Biopharmaceutical Co., Ltd. (Shanghai, China) with a purity > 99%, whereas PS-A and PS-D were prepared in the laboratory. They were characterized by PXRD, MIR, and Raman. PS-A and PS-D were sieved through a 100-mesh sieve and then mixed. To minimize sampling differences caused by sample inhomogeneity, binary mixtures were prepared separately for each technique.

For the PXRD analysis, the samples were mixed using ultrasonication. Binary mixtures (40 mg in total) containing 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100% of PS-A were prepared, with the remaining mass provided by crystalline form D. An appropriate amount of ether was added, and the mixture was sonicated for 1 minute and then dried in an oven at 40°C for 90 minutes. Pure PS crystalline forms A and D were tested before and after this pretreatment to exclude the possibility of any phase change during sonication and drying.

For MIR analysis, 1.3 mg of binary mixture containing 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100% of PS-A was prepared, with the remaining mass provided by PS-D. The mixture was ground with 200 mg of KBr for 3 minutes, pressed into a pellet, and set aside.

For Raman analysis, a 100 mg binary mixture containing 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100% of crystalline form A was prepared, with the remaining mass provided by crystalline form D. The samples were kept in polyethylene tubes, sealed with a sealing film, vortex-mixed for 5 minutes, and then shaken in a thermostatic shaker at constant temperature for 6 hours before use.

The chemicals used in the tests were of analytical grade. All samples were weighed gravimetrically. The PXRD and Raman samples were weighed using a Mettler Toledo 1/100,000 balance with a minimum weight of 10.00 mg, and the MIR samples were weighed using a Mettler Toledo 1/1,000,000 balance with a minimum weight of 1.136 mg.


Data Acquisition and Analysis

Data Acquisition

PXRD data were collected using a BRUKER D8 ADVANCE A25 powder X-ray diffractometer (Bruker, United States) at room temperature. Cu Kα radiation (λ = 1.5418 Å) was used, with a scanning range of 3 to 35° (2θ), a step size of 0.02°, and a counting time of 1 second per step. Data were processed using Jade 6.5 software.

The MIR data were collected on an IRTracer-100 spectrometer (Shimadzu, Japan) using the potassium bromide pellet method. The samples were scanned in the range of 400 to 4,000 cm−1 with 32 scans and a resolution of 2 cm−1.

Raman data were acquired with a DRX3xi Raman imaging microscope (Thermo Scientific, United States). The Raman microscope component of the DRX3xi used a 532 nm laser with an output power of 8 mW, an exposure time of 0.01 seconds, a total of 50 exposures, and a scanning range of 50 to 3,400 cm−1.


Data Set Division and Division Ratio

Three algorithms, namely SPXY, K_S, and Random, were used to divide the dataset, and a PLS model for the quantitative analysis of the PS-A crystalline form was then established for each division to compare the methods. The ratio of the prediction set to the calibration set, hereinafter referred to as H, was varied from 1:4 to 1:1 to investigate the effect of the division ratio. Each division was evaluated comprehensively in terms of the root mean square error of the calibration set (RMSEC), the root mean square error of the prediction set (RMSEP), the root mean square error of cross-validation (RMSECV), the correlation coefficient of the calibration set (RC), and the correlation coefficient of the prediction set (RP), and the optimal division method and ratio were selected.
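For illustration, the distance-based division can be sketched in Python with NumPy. This is not the authors' code (the study used MATLAB); the function name `kennard_stone`, its arguments, and the synthetic data are assumptions made for this example. With `y=None` it performs a Kennard-Stone style selection on the spectra alone, and with `y` supplied it adds a normalized distance in the content values, which is the idea behind SPXY.

```python
import numpy as np


def _pairwise(a):
    """Euclidean distance matrix between the rows of a 2-D array."""
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def kennard_stone(X, n_cal, y=None):
    """Return (calibration_idx, prediction_idx) as lists of row indices.

    y=None: plain Kennard-Stone selection on X.
    y given: X- and y-distances are each scaled by their maximum and
    summed before selection (SPXY-style joint distance).
    """
    X = np.asarray(X, dtype=float)
    d = _pairwise(X)
    if y is not None:
        dy = _pairwise(np.asarray(y, dtype=float).reshape(len(X), -1))
        d = d / d.max() + dy / dy.max()
    # Start from the two most distant samples.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_cal:
        # Distance of each remaining sample to its nearest selected sample;
        # pick the sample for which that distance is largest.
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining


# Synthetic stand-in for 20 patterns; 16 go to calibration, so H = 4/16 = 0.25.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 50))
y_demo = np.linspace(0.0, 1.0, 20)
cal_idx, val_idx = kennard_stone(X_demo, 16, y=y_demo)
H = len(val_idx) / len(cal_idx)
```

The samples left over after selection form the prediction set, so changing `n_cal` changes H directly.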


Data Analysis

Many preprocessing algorithms are available, each with its own advantages and disadvantages. In this work, several preprocessing algorithms, including MSC, SNV, Savitzky–Golay filtering, and derivative spectroscopy, were used to remove uninformative baseline (offset) interference, correct scattering effects, and enhance the spectral signals of interest.
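As a minimal sketch of two of these corrections, the following Python/NumPy functions implement SNV and MSC in their textbook form. The function names and the synthetic spectra are assumptions for illustration; the study itself performed preprocessing in MATLAB and does not report implementation details.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    The reference defaults to the mean spectrum of the set.
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        ref = spectra.mean(axis=0)
    else:
        ref = np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Fit row ~ offset + slope * ref, then remove offset and slope.
        slope, offset = np.polyfit(ref, row, 1)
        corrected[i] = (row - offset) / slope
    return corrected


# Synthetic example: one band shape with different gains and offsets.
band = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
demo = np.vstack([band, 1.5 * band + 0.3, 0.7 * band - 0.2])
demo_msc = msc(demo)   # all rows collapse onto the mean spectrum
demo_snv = snv(demo)   # each row has zero mean and unit standard deviation
```

Savitzky–Golay smoothing and derivatives are available in SciPy as `scipy.signal.savgol_filter` (with its `deriv` argument); the window length and polynomial order used in the study are not stated, so they are left out here.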

MATLAB 2020 was used for preprocessing, combined with PLS regression, to establish calibration models for PXRD, MIR, and Raman. The PLS algorithm extracts components from the independent variable (X) and the dependent variable (Y) simultaneously, maximizing the covariance explained between X and Y while retaining the key information of the original variables, and thus efficiently establishes a quantitative relationship between the spectral data and the crystal-form content. The optimal number of PLS factors was selected using full cross-validation (leave-one-out validation). The quality of the model was evaluated based on the coefficient of determination (R²), RMSEC, RMSEP, and RMSECV. The formula, Equation ([1]), is as follows:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{n}} \tag{1}$$

where y_i is the theoretical value, ŷ_i is the calculated value, and n is the number of samples. To verify the accuracy of the calibration model, several samples with known PS-A concentrations were each measured three times by PXRD, MIR, and Raman. The limit of detection (LOD) and limit of quantification (LOQ) were estimated as 3.3 and 10 times the standard deviation of the blank divided by the slope of the calibration curve, respectively. The standard deviation of the blank was replaced by the standard deviation of three measurements at the lowest concentration. LOD and LOQ were calculated using Equations [(2)] and [(3)].

$$\mathrm{LOD} = \frac{3.3\,\sigma}{S} \tag{2}$$

$$\mathrm{LOQ} = \frac{10\,\sigma}{S} \tag{3}$$

Where σ is the standard deviation of the predicted content values, and S is the slope of the calibration curve.
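The definitions above translate directly into a few lines of Python. This is an illustrative sketch: the function names are assumptions, and the example reuses the three PXRD verification values reported later in Table 3 for the MSC (3–5°, 6–14°) model only to show the arithmetic.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between theoretical and calculated values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def lod_loq(sigma, slope):
    """LOD = 3.3*sigma/S and LOQ = 10*sigma/S for a calibration slope S."""
    return 3.3 * sigma / slope, 10.0 * sigma / slope


# Actual and predicted PS-A contents (%) of the three PXRD verification samples.
actual = [41.55, 76.23, 94.70]
predicted = [41.41, 75.78, 92.40]
rmse_verification = rmse(actual, predicted)
```

Because both limits share the same σ and S, LOQ/LOD is always 10/3.3 ≈ 3.03, which agrees with the reported pair of 1.52% and 4.60% to within rounding.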




Results and Discussion

Characterization of Pravastatin Sodium Crystal Types A and D

The solid forms of PS (PS-A and PS-D) were characterized using PXRD, MIR, and Raman spectroscopy, and the results are described below.

Powder X-ray Diffraction

As shown in [Fig. 2A], the characteristic peaks of PS-A (2θ = 3.9°, 4.5°, 6.2°, 7.3°, 8.6°, 9.2°, 10.0°, 11.7°, 12.0°, 17.0°, and 19.9°) and PS-D (2θ = 3.5°, 6.3°, 9.7°, and 16.9°) appear in the experimental PXRD diffractograms, consistent with the patent.[9] For PS-A quantification, the average PXRD patterns of PS binary mixtures with different PS-A contents were collected. As shown in [Fig. 2B], the intensity of the characteristic peaks of PS-A in the binary mixtures increases with increasing PS-A content.

Fig. 2 PXRD overlay of (A) two solid forms of PS (PS-A and PS-D) and (B) binary mixture samples containing different content of PS-A. PS, pravastatin sodium; PXRD, powder X-ray diffraction.

Mid-Infrared

The MIR spectra of PS-A and PS-D solid powder samples, obtained in KBr, are shown in [Fig. 3A]. Although the polymorphs show similar spectra, small differences can be detected. Martín-Islan et al assigned the IR bands as follows.[40] The strong and broad band at 3,700–3,100 cm−1 is attributed to the ν (OH) vibration of the hydroxyl groups in the pravastatin molecule and of water molecules in the lattice. The band at 3,040–2,800 cm−1 is attributed to the ν (C–H) vibration. The strong narrow band at 1,727 cm−1 is attributed to the ν (C=O) vibration of the ester group. Both forms show this band at the same frequency, with form D showing a small shoulder at 1,711 cm−1. The maximum absorption peak is located at 1,569 cm−1 and is attributed to the ν (C=O) vibration of the carboxylate. Form A shows a 7 cm−1 shift of the maximum toward higher frequency (1,577 cm−1), and form D shows an additional small shoulder at higher frequency (1,606 cm−1). Several bands that can be attributed to δ (C–H) vibrations appear in the range 1,480–1,230 cm−1, where small differences are observed. The band at 1,220–1,140 cm−1 can be attributed to the ν (C–O) vibration of the ester group, while the band at 1,120–1,000 cm−1 corresponds to the ν (C–O) vibration of the hydroxyl group. In the range of 890–800 cm−1, form D has a peak at 854 cm−1, while form A does not. Similarly, as can be seen in [Fig. 3B], the intensities of the characteristic MIR peaks, marked with orange boxes, vary with increasing PS-A content.

Fig. 3 MIR overlay of (A) two solid forms of PS (PS-A and PS-D) and (B) binary mixture samples containing different content of PS-A. MIR, mid-infrared; PS, pravastatin sodium.

Raman

Raman spectra of PS-A and PS-D solid powder samples, obtained at a laser wavelength of 532 nm, are shown in [Fig. 4A]. Although the polymorphs show similar spectra, minor differences can be detected. The Raman spectra were assigned as follows. The C–H stretching vibrations in the range of 3,000–2,800 cm−1 belong to methyl and methylene groups. The strong narrow band at 1,725 cm−1 is attributed to the C=O stretching vibration of the ester group. The strongest band, at 1,647 cm−1, is attributed to the C=O stretching vibration of the carboxyl group, and both forms show this band at the same frequency. The bands in the range of 1,300–1,000 cm−1 are attributed to the C–O stretching vibration of the ester group. In the range of 200 to 50 cm−1, the two forms differ in the positions of their peaks: form A has a single peak at 134 cm−1, whereas form D has a double-shouldered peak. Similarly, [Fig. 4B] shows how the intensities of the characteristic Raman peaks, marked with orange boxes, increase or decrease with PS-A content.

Fig. 4 Raman overlay of (A) two solid forms of PS (PS-A and PS-D) and (B) binary mixture samples containing different content of PS-A. PS, pravastatin sodium.


Analysis of the Results of Different Dataset Division Methods and Division Ratio of the Sample

We explored the effect of dataset division on the PLS model at different division ratios. The K_S and SPXY methods were each used to divide the samples at different ratios, and a PLS quantitative model was then established for each division. For H in the interval [0.25, 1], the number of PXRD calibration samples was decreased from 16 to 10 in steps of 1 to change the division ratio. Likewise, the number of MIR calibration samples was decreased from 38 to 24, and the number of Raman calibration samples from 36 to 24, both in steps of 1. Each division was evaluated using the resulting PLS model. The number of principal factors of the PLS model was determined by leave-one-out cross-validation. Because the Random method does not allow the sizes of the calibration and prediction sets to be chosen arbitrarily, its optimal ratio was not explored; its performance was only compared with that of the other two methods.
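Choosing the number of PLS factors by leave-one-out cross-validation can be sketched as follows. This is an illustrative Python/NumPy implementation of single-response PLS (NIPALS), not the MATLAB routine used in the study; all function names and the synthetic two-component mixture are assumptions.

```python
import numpy as np


def pls1_fit(X, y, n_factors):
    """Fit single-response PLS by NIPALS; return (coef, x_mean, y_mean)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f
        norm = np.linalg.norm(w)
        if norm < 1e-12:           # nothing left in X that covaries with y
            break
        w = w / norm
        t = E @ w                  # scores
        tt = t @ t
        p = E.T @ t / tt           # X loadings
        c = (f @ t) / tt           # y loading
        E = E - np.outer(t, p)     # deflate X
        f = f - c * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:                      # y is constant or X carries no information
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    """Predict y for the rows of X from a fitted (coef, x_mean, y_mean) model."""
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean


def loo_rmsecv(X, y, n_factors):
    """Leave-one-out RMSECV: refit without sample i, then predict sample i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pls1_fit(X[keep], y[keep], n_factors)
        errors.append(pls1_predict(model, X[i:i + 1])[0] - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))


def best_n_factors(X, y, max_factors):
    """Return the factor count with the lowest RMSECV, plus all scores."""
    scores = [loo_rmsecv(X, y, k) for k in range(1, max_factors + 1)]
    return int(np.argmin(scores)) + 1, scores


# Synthetic noise-free binary mixture: each row is frac*a + (1 - frac)*b.
rng = np.random.default_rng(0)
a, b = rng.random(40), rng.random(40)
frac = np.linspace(0.0, 1.0, 11)
X_mix = np.outer(frac, a) + np.outer(1.0 - frac, b)
model_1 = pls1_fit(X_mix, frac, 1)
```

For ideal noise-free binary mixtures a single factor already reproduces the content exactly; real patterns contain noise and scattering effects, which is why the numbers of factors reported in the tables are larger.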

The detailed analysis is given in the [Supplementary Material] (available in online version). The optimal division ratio for each dataset division method is shown in [Table 1], where Cal denotes the number of samples in the calibration set and Val the number in the prediction set. The methods were compared by their RMSEP values at their respective optimal ratios. The SPXY method gave the smallest RMSEP and was therefore selected as the dataset division method for the PLS models of PXRD, MIR, and Raman.

Table 1

Comparison of partial least squares modeling effects with different optimal ratios of dataset division

| Technique | Data set partitioning method | Rc | Rp | RMSEC | RMSEP | RMSECV | Val/Cal | H | Level of model optimization (%) |
|---|---|---|---|---|---|---|---|---|---|
| PXRD | K_S | 1.000 | 0.999 | 0.012 | 0.010 | 0.017 | 5/15 | 0.33 | 23 |
| PXRD | SPXY | 1.000 | 0.999 | 0 | 0.007 | 0.016 | 4/16 | 0.25 | 42 |
| PXRD | Random | 1.000 | 0.998 | 0 | 0.017 | 0.016 | 5/15 | 0.33 | |
| MIR | K_S | 0.990 | 0.951 | 0.029 | 0.031 | 0.030 | 22/26 | 0.85 | 11 |
| MIR | SPXY | 0.989 | 0.948 | 0.029 | 0.024 | 0.032 | 13/35 | 0.37 | 51 |
| MIR | Random | 0.967 | 0.936 | 0.046 | 0.052 | 0.035 | 12/36 | 0.33 | |
| Raman | K_S | 0.976 | 0.932 | 0.046 | 0.078 | 0.067 | 20/27 | 0.74 | 15 |
| Raman | SPXY | 0.953 | 0.976 | 0.066 | 0.042 | 0.078 | 11/36 | 0.31 | 35 |
| Raman | Random | 0.958 | 0.923 | 0.061 | 0.083 | 0.072 | 12/35 | 0.34 | |

Abbreviations: PLS, partial least squares; PXRD, powder X-ray diffraction; Raman, Raman spectroscopy; MIR, mid-infrared; RMSEC, root mean squared error of calibration; RMSEP, root mean squared error of prediction; RMSECV, root mean squared error of cross-validation; Cal, the number of samples in the calibration set; Val, the number of samples in the prediction set.



Quantification of PS-A in Binary Mixtures

Raw spectral data generated by all of the techniques discussed below are presented in the [Supplementary Material] (available in online version).

Quantitative Modeling of Powder X-ray Diffraction

PXRD patterns of 16 different PS binary mixtures were collected, and calibration curves were plotted. The raw PXRD data were preprocessed using MSC, SNV, and Savitzky–Golay filtering combined with first or second derivatives. The preprocessed patterns are shown in [Fig. 5]. After preprocessing, four ranges of the 16 sets of PXRD data (3–35°; 3–5°; 3–5° plus 6–14°; and 3–5° plus 15–20°) were selected to establish calibration models for the quantitative determination of PS-A content using the PLS method. The results are shown in [Table 2].
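Restricting a model to partial ranges amounts to masking columns of the data matrix. The sketch below, in Python with NumPy, assumes a 2θ axis from 3 to 35° in 0.02° steps as in the acquisition settings; the function name `select_regions` and the zero-filled placeholder matrix are illustrative assumptions, not the authors' code or data.

```python
import numpy as np


def select_regions(axis, data, regions, tol=1e-9):
    """Keep the columns of `data` whose axis value lies in any (low, high) window."""
    axis = np.asarray(axis, dtype=float)
    mask = np.zeros(axis.shape, dtype=bool)
    for low, high in regions:
        # The small tolerance guards against floating-point error at window edges.
        mask |= (axis >= low - tol) & (axis <= high + tol)
    return axis[mask], np.asarray(data)[:, mask]


two_theta = np.linspace(3.0, 35.0, 1601)     # 3-35 degrees in 0.02 degree steps
patterns = np.zeros((16, two_theta.size))    # placeholder intensities, not real data
sub_axis, sub_patterns = select_regions(two_theta, patterns, [(3, 5), (6, 14)])
```

With these settings the combined 3–5° and 6–14° windows keep 101 + 401 = 502 of the 1,601 points, so the PLS model sees a much smaller X matrix than in the full-pattern case.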

Table 2

Data processing for different diffraction ranges for pravastatin sodium crystal form A quantification in binary mixtures

| Region (°2θ) | Preprocessing | PLS factors | R² (SEC) | RMSEC (%) | R² (SEP) | RMSEP (%) | RMSECV (%) |
|---|---|---|---|---|---|---|---|
| 3–35 | MSC | 6 | 1.000 | 0.018 | 0.999 | 0.661 | 1.656 |
| 3–35 | SNV | 3 | 0.995 | 2.014 | 0.998 | 1.449 | 2.393 |
| 3–35 | S-G + first derivative | 3 | 0.997 | 1.611 | 0.991 | 1.708 | 4.736 |
| 3–35 | S-G + second derivative | 3 | 0.997 | 1.664 | 0.979 | 3.258 | 4.919 |
| 3–5 | MSC | 4 | 0.999 | 0.869 | 0.994 | 1.382 | 2.007 |
| 3–5 | SNV | 3 | 0.993 | 2.450 | 0.990 | 3.313 | 3.473 |
| 3–5 | S-G + first derivative | 2 | 0.987 | 3.499 | 0.997 | 1.018 | 5.804 |
| 3–5 | S-G + second derivative | 2 | 0.980 | 4.275 | 0.992 | 1.562 | 6.147 |
| 3–5, 6–14 | MSC | 6 | 0.999 | 0.028 | 0.998 | 1.391 | 1.768 |
| 3–5, 6–14 | SNV | 2 | 0.993 | 2.404 | 0.995 | 2.479 | 2.312 |
| 3–5, 6–14 | S-G + first derivative | 2 | 0.992 | 2.648 | 0.994 | 1.373 | 5.234 |
| 3–5, 6–14 | S-G + second derivative | 4 | 0.998 | 1.241 | 0.992 | 1.577 | 4.901 |
| 3–5, 15–20 | MSC | 3 | 0.999 | 1.103 | 0.987 | 1.998 | 3.248 |
| 3–5, 15–20 | SNV | 3 | 0.994 | 2.373 | 0.972 | 3.327 | 3.009 |
| 3–5, 15–20 | S-G + first derivative | 5 | 0.999 | 0.734 | 0.993 | 1.457 | 5.933 |
| 3–5, 15–20 | S-G + second derivative | 2 | 0.993 | 2.503 | 0.947 | 3.922 | 5.037 |

Abbreviations: PLS, partial least squares; RMSEC, root mean squared error of calibration; RMSECV, root mean squared error of cross-validation; RMSEP, root mean squared error of prediction.


Fig. 5 PXRD patterns of PS binary mixture samples with different pretreatments (MSC, SNV, S-G + first derivative, and S-G + second derivative). PS, pravastatin sodium; PXRD, powder X-ray diffraction.

The R² values in [Table 2] indicate that satisfactory PLS regression models can be established for all four diffraction ranges after preprocessing with MSC, SNV, or Savitzky–Golay filtering combined with derivatives. Taking the RMSEC, RMSEP, and RMSECV values and the number of PLS factors together with the LOD and LOQ values in [Table 3], MSC (3–35°), S-G + first-order derivative (3–5°), SNV (3–5°, 6–14°), and SNV (3–5°, 15–20°) were identified as the better models. The actual (41.548, 76.231, and 94.701%) and predicted PS-A weight percentages for the quantitative models are shown in [Table 3]. The best-performing calibration model was built using PLS after MSC at 3 to 5° and 6 to 14° ([Fig. 6A]). Subsequently, confidence intervals and prediction intervals were computed for the calibration model. Confidence intervals were used to assess the stability of the model estimates: narrower intervals indicate greater stability, and wider intervals imply greater uncertainty. Prediction intervals give the range expected for a new observation, quantify prediction uncertainty, and can also be used to identify potential outliers. Four samples from the prediction set were used to validate the accuracy of the constructed model. Predicted values were obtained with MATLAB 2020 and fitted against the true values ([Fig. 7A]). The 95% confidence and prediction intervals were narrow, suggesting that the model is reliable, and all four prediction samples fell within the prediction intervals.
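The text does not state how the intervals were computed in MATLAB. As a hedged sketch, assuming an ordinary least-squares line fitted to the predicted content p against the actual content c for n samples (the symbols c, p, b_0, b_1, s, and S_cc are introduced here for illustration only), the standard 95% intervals at a content value c_0 would be:

```latex
\[
\hat{p}_0 = b_0 + b_1 c_0, \qquad
s = \sqrt{\frac{\sum_{i=1}^{n} \left(p_i - \hat{p}_i\right)^2}{n - 2}}, \qquad
S_{cc} = \sum_{i=1}^{n} \left(c_i - \bar{c}\right)^2
\]
\[
\text{Confidence interval:}\quad
\hat{p}_0 \pm t_{0.975,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{\left(c_0 - \bar{c}\right)^2}{S_{cc}}}
\]
\[
\text{Prediction interval:}\quad
\hat{p}_0 \pm t_{0.975,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{\left(c_0 - \bar{c}\right)^2}{S_{cc}}}
\]
```

Under this assumption the prediction interval is always wider than the confidence interval because of the extra 1 under the square root, and both widen with larger residual scatter s and with smaller n.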

Fig. 6 Quantitative calibration model of PS-A for PS binary mixtures. (A) PXRD. (B) MIR. (C) Raman. MIR, mid-infrared; PS, pravastatin sodium; PS-A, pravastatin sodium crystal form A; PXRD, powder X-ray diffraction; Raman, Raman spectroscopy.

Fig. 7 Quantitative prediction model of PS-A for PS binary mixtures. (A) PXRD. (B) MIR. (C) Raman. MIR, mid-infrared; PS, pravastatin sodium; PS-A, pravastatin sodium crystal form A; PXRD, powder X-ray diffraction; Raman, Raman spectroscopy.

Table 3

Actual and predicted PS-A weight percentage values based on quantitative modeling

| Analytical technique | Preprocessing | LOD | LOQ | Verification sample 1 | Verification sample 2 | Verification sample 3 |
|---|---|---|---|---|---|---|
| PXRD | (actual value) | | | 41.55 | 76.23 | 94.70 |
| PXRD | MSC (3–35°) | 2.35 | 7.12 | 40.41 | 77.34 | 94.38 |
| PXRD | MSC (3–5°) | 4.85 | 14.70 | 33.82 | 73.61 | 93.51 |
| PXRD | MSC (3–5°, 6–14°) | 1.52 | 4.60 | 41.41 | 75.78 | 92.40 |
| PXRD | S-G + first derivative (3–5°, 15–20°) | 4.13 | 12.52 | 42.55 | 75.94 | 94.26 |
| MIR | (actual value) | | | 41.58 | 76.11 | 90.73 |
| MIR | MSC (4,000–400 cm−1) | 3.52 | 10.66 | 44.98 | 70.69 | 90.78 |
| MIR | SNV (898–666 cm−1) | 7.47 | 22.63 | 42.15 | 69.17 | 94.31 |
| MIR | MSC (1,790–1,000 cm−1) | 5.32 | 16.12 | 36.90 | 65.53 | 86.37 |
| MIR | S-G + second derivative (898–666 cm−1, 1,790–1,000 cm−1) | 7.78 | 23.57 | 37.37 | 75.86 | 92.40 |
| Raman | (actual value) | | | 45.115 | 75.199 | 95.017 |
| Raman | MSC (3,400–50 cm−1) | 1.80 | 5.54 | 46.745 | 73.218 | 94.523 |
| Raman | S-G + first derivative (400–50 cm−1) | 3.97 | 12.02 | 43.083 | 73.250 | 92.172 |
| Raman | S-G + second derivative (1,700–900 cm−1) | 2.97 | 9.01 | 46.678 | 71.033 | 92.615 |
| Raman | S-G + second derivative (400–50 cm−1, 1,700–900 cm−1) | 2.83 | 8.57 | 47.591 | 74.320 | 93.599 |

Abbreviations: PS-A, pravastatin sodium crystal form A; PXRD, powder X-ray diffraction; Raman, Raman spectroscopy; MIR, mid-infrared.



Quantitative Modeling of Mid-Infrared

Thirty-five sets of MIR spectra of different PS binary mixtures were collected for calibration modeling. Before modeling, the raw MIR data were preprocessed using MSC, SNV, and Savitzky–Golay filtering combined with derivative spectroscopy, and the preprocessed results are shown in [Fig. 8]. After preprocessing of the 35 sets of MIR data in four ranges (4,000–400 cm−1; 898–666 cm−1; 1,790–1,000 cm−1; and 898–666 plus 1,790–1,000 cm−1), calibration models were built using PLS to quantify the PS-A content, and the results are shown in [Table 4].

Table 4

Data processing for different mid-infrared spectral ranges for pravastatin sodium crystal form A quantification in binary mixtures

| Region (cm−1) | Preprocessing | PLS factors | R² (SEC) | RMSEC (%) | R² (SEP) | RMSEP (%) | RMSECV (%) |
|---|---|---|---|---|---|---|---|
| 4,000–400 | MSC | 6 | 0.989 | 2.866 | 0.948 | 2.414 | 3.227 |
| 4,000–400 | SNV | 5 | 0.986 | 3.237 | 0.936 | 2.683 | 3.163 |
| 4,000–400 | S-G + first derivative | 4 | 0.988 | 2.897 | 0.388 | 3.013 | 2.372 |
| 4,000–400 | S-G + second derivative | 6 | 0.997 | 1.560 | 0.731 | 2.539 | 2.827 |
| 898–666 | MSC | 3 | 0.982 | 3.651 | 0.866 | 2.752 | 2.793 |
| 898–666 | SNV | 3 | 0.982 | 3.678 | 0.866 | 2.744 | 2.458 |
| 898–666 | S-G + first derivative | 3 | 0.975 | 4.290 | 0.781 | 3.860 | 3.269 |
| 898–666 | S-G + second derivative | 5 | 0.986 | 3.137 | 0.853 | 3.290 | 3.152 |
| 1,790–1,000 | MSC | 4 | 0.990 | 2.654 | 0.965 | 1.528 | 3.425 |
| 1,790–1,000 | SNV | 5 | 0.993 | 2.283 | 0.957 | 1.680 | 3.381 |
| 1,790–1,000 | S-G + first derivative | 4 | 0.987 | 3.023 | 0.560 | 3.224 | 2.046 |
| 1,790–1,000 | S-G + second derivative | 3 | 0.968 | 4.806 | 0.289 | 5.060 | 2.511 |
| 898–666, 1,790–1,000 | MSC | 4 | 0.989 | 2.803 | 0.945 | 2.019 | 2.695 |
| 898–666, 1,790–1,000 | SNV | 4 | 0.988 | 2.913 | 0.932 | 2.214 | 2.712 |
| 898–666, 1,790–1,000 | S-G + first derivative | 4 | 0.990 | 2.669 | 0.737 | 2.427 | 2.196 |
| 898–666, 1,790–1,000 | S-G + second derivative | 7 | 0.997 | 1.345 | 0.892 | 1.633 | 2.174 |

Abbreviations: PLS, partial least squares; PS-A, pravastatin sodium crystal form A; RMSEC, root mean squared error of calibration; RMSECV, root mean squared error of cross-validation; RMSEP, root mean squared error of prediction.


Fig. 8 MIR profiles of PS binary mixture samples with different pretreatments (MSC, SNV, S-G+ first derivative, and S-G+ second derivative). MIR, mid-infrared; PS, pravastatin sodium.

According to [Table 4], the calibration models established for the four spectral ranges after MSC, SNV, or Savitzky–Golay filtering combined with derivative spectroscopy are similar, with the R² values, numbers of PLS factors, RMSEC, RMSEP, and RMSECV indicating good calibration ability. The model treated by MSC (4,000–400 cm−1) had the smallest LOD and LOQ values and was therefore theoretically the most suitable regression model. In terms of actual quantitative performance, however, the model processed by S-G + second-order derivative (898–666, 1,790–1,000 cm−1) gave the best calibration model ([Fig. 6B]). Although this calibration model shows a good linear relationship, with an R² value of 0.997 indicating a high goodness of fit, its predictions for the 13 samples of the prediction set were not satisfactory, which is consistent with the wider confidence intervals ([Fig. 7B]). The wider confidence intervals indicate that the uncertainty of the prediction results is large and that the prediction accuracy of the model in practical applications needs to be improved. Possible reasons are that the model overfits the data, that the data themselves are more variable, or that the model fails to adequately capture the key features in the data.


Quantitative Modeling of Raman

Thirty-six sets of Raman spectra of PS binary mixtures were collected for calibration modeling. Before modeling, the raw data were preprocessed using MSC, SNV, and Savitzky–Golay filtering combined with derivative spectra. The results are shown in [Fig. 9]. Four spectral regions (3,400–50 cm−1; 400–50 cm−1; 1,700–900 cm−1; and 400–50 plus 1,700–900 cm−1) were used for the PLS regression analysis. As shown in [Table 5], all four spectral regions are suitable for quantification. In terms of actual quantitative performance, however, the optimal PLS model for quantifying PS-A in binary mixtures was obtained with MSC (3,400–50 cm−1) applied to the Raman data. MSC proved useful because it corrects for light scattering from the powder; the resulting model gave RMSEC and RMSEP values of 6.634 and 4.228%, respectively. As shown in [Fig. 6C], the plot of predicted versus measured PS-A content gave an R² value of 0.953. However, the predictions for the 11 samples of the prediction set were not very good, in line with the wider confidence intervals in [Fig. 7C], which also suggests limited model accuracy. One contributing factor could be the insufficient sample size, as larger sample sets provide more information and reduce the estimation uncertainty.

Table 5 Data processing for different Raman spectral ranges for pravastatin sodium crystal form A quantification in binary mixtures

| Region (cm−1) | Preprocessing | PLS factors | R² (SEC) | RMSEC (%) | R² (SEP) | RMSEP (%) | RMSECV (%) |
|---|---|---|---|---|---|---|---|
| 3,400–50 | MSC | 4 | 0.953 | 6.634 | 0.976 | 4.228 | 7.753 |
| 3,400–50 | SNV | 4 | 0.952 | 6.666 | 0.977 | 4.202 | 7.755 |
| 3,400–50 | S-G + first derivative | 10 | 0.999 | 0.225 | 0.956 | 4.639 | 7.203 |
| 3,400–50 | S-G + second derivative | 11 | 1.000 | 0.090 | 0.928 | 5.662 | 7.174 |
| 400–50 | MSC | 3 | 0.962 | 6.287 | 0.917 | 5.666 | 6.746 |
| 400–50 | SNV | 3 | 0.960 | 6.446 | 0.916 | 5.720 | 6.775 |
| 400–50 | S-G + first derivative | 4 | 0.968 | 5.509 | 0.921 | 5.665 | 7.854 |
| 400–50 | S-G + second derivative | 5 | 0.978 | 4.450 | 0.843 | 6.607 | 7.663 |
| 1,700–900 | MSC | 3 | 0.966 | 5.499 | 0.929 | 6.227 | 6.943 |
| 1,700–900 | SNV | 3 | 0.966 | 5.494 | 0.931 | 6.148 | 7.021 |
| 1,700–900 | S-G + first derivative | 3 | 0.964 | 6.006 | 0.941 | 5.758 | 7.158 |
| 1,700–900 | S-G + second derivative | 3 | 0.969 | 5.594 | 0.948 | 5.538 | 7.124 |
| 400–50, 1,700–900 | MSC | 3 | 0.953 | 6.628 | 0.965 | 5.133 | 7.350 |
| 400–50, 1,700–900 | SNV | 4 | 0.966 | 5.676 | 0.960 | 5.519 | 7.344 |
| 400–50, 1,700–900 | S-G + first derivative | 2 | 0.946 | 7.395 | 0.942 | 5.696 | 7.470 |
| 400–50, 1,700–900 | S-G + second derivative | 5 | 0.992 | 2.888 | 0.965 | 4.444 | 7.683 |

Abbreviations: PLS, partial least squares; PS-A, pravastatin sodium crystal form A; RMSEC, root mean squared error of calibration; RMSECV, root mean squared error of cross-validation; RMSEP, root mean squared error of prediction.
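One way to read Table 5 is to compare the cross-validation error with the calibration error for each model: a calibration error far below the cross-validation error suggests that the extra PLS factors are fitting noise. The sketch below does this for the 3,400–50 cm−1 rows. The ratio cut-off of 3 is a hypothetical choice for illustration and is not a criterion used in the study.

```python
# (preprocessing, PLS factors, RMSEC %, RMSEP %, RMSECV %) for the
# 3,400-50 cm-1 region, copied from Table 5.
full_range = [
    ("MSC", 4, 6.634, 4.228, 7.753),
    ("SNV", 4, 6.666, 4.202, 7.755),
    ("S-G + first derivative", 10, 0.225, 4.639, 7.203),
    ("S-G + second derivative", 11, 0.090, 5.662, 7.174),
]

RATIO_LIMIT = 3.0  # hypothetical cut-off, not taken from the study

def cv_to_calibration_ratio(rmsec, rmsecv):
    """How many times larger the cross-validation error is than the calibration error."""
    return rmsecv / rmsec

flagged = []
for name, factors, rmsec, rmsep, rmsecv in full_range:
    ratio = cv_to_calibration_ratio(rmsec, rmsecv)
    if ratio > RATIO_LIMIT:
        flagged.append(name)
    print(f"{name:24s} factors={factors:2d}  RMSECV/RMSEC={ratio:6.2f}")

print("Possible overfitting:", flagged)
```

Under this assumed cut-off the two derivative models, which use 10 and 11 PLS factors, are flagged, while the MSC and SNV models with 4 factors are not. This is consistent with the MSC full-range model being preferred despite its larger RMSEC.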


Fig. 9 Raman profiles of PS binary mixture samples with different pretreatments (MSC, SNV, S-G+ first derivative, and S-G+ second derivative). PS, pravastatin sodium.


Comparison of the Three Technologies

The aim of this study was to find the most suitable method for the quantitative determination of PS-A in binary mixtures. PXRD is a nondestructive test method. MIR with the sample-compression method is one of the most widely used approaches for measuring solid samples. Raman requires the least sample preparation, and its measurements are noncontact and nondestructive. Each technique has its advantages and disadvantages, and a combination of two or three techniques is sometimes required for effective quantitative analysis. The models developed with PXRD, MIR, and Raman were all able to predict PS-A in binary mixtures, but PXRD was superior to the two spectroscopic techniques. This is evident from the 95% confidence intervals and prediction intervals of the fitted curves, which are wider for the MIR and Raman models and narrower for the PXRD model, indicating higher measurement accuracy for PXRD. In addition, the PXRD model has lower LOD (1.52%) and LOQ (4.60%) values, and its RMSEC, RMSEP, and RMSECV values are markedly smaller than those of the MIR and Raman models. To further compare the accuracy of the three techniques, a set of validation samples containing different amounts of PS-A was analyzed ([Table 3]). For example, PXRD gave the most accurate result for validation sample 1, and the differences among the techniques decreased as the PS-A content increased. Overall, PXRD gave more accurate predictions.

PXRD, MIR, and Raman can all predict the polymorphic transformation of PS with varying degrees of accuracy and specificity. However, the accuracy and specificity of PXRD are usually higher because it provides detailed crystal structure information, whereas the MIR and Raman models were developed with lower accuracy and are therefore less predictive than PXRD.
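The detection and quantification limits quoted above can be related to the calibration line with a short sketch. It assumes the common convention LOD = 3.3σ/S and LOQ = 10σ/S, where S is the calibration slope and σ is the standard deviation of the response; the ratio of the reported limits (4.60/1.52 ≈ 3.03) is consistent with that convention, but the formula is an assumption here. The slope is taken from the PXRD calibration line reported in the abstract (y = 0.999x + 0.008), and the value of σ is hypothetical, chosen only so that the illustration lands near the reported limits.

```python
def detection_limit(sigma, slope):
    """LOD under the assumed 3.3 * sigma / slope convention."""
    return 3.3 * sigma / slope

def quantification_limit(sigma, slope):
    """LOQ under the assumed 10 * sigma / slope convention."""
    return 10.0 * sigma / slope

slope = 0.999  # slope of the PXRD calibration line y = 0.999x + 0.008
sigma = 0.46   # hypothetical residual standard deviation (%), illustration only

lod = detection_limit(sigma, slope)
loq = quantification_limit(sigma, slope)
print(f"LOD = {lod:.2f} %, LOQ = {loq:.2f} %")
```

Under these assumptions both limits scale linearly with σ, so a model with larger calibration scatter, as seen for MIR and Raman, has proportionally higher limits.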


Application of the Methodology

To test the applicability of the established models, six batches of blind samples were analyzed. The measured data were imported into the established PXRD, MIR, and Raman models to obtain predicted values, and 95% confidence intervals were calculated for the predictions. As shown in [Table 6], the PXRD predictions of PS-A content for the unknown samples ranged from 96 to 100%, exceeding the 95% threshold, with a 95% confidence interval of [97.877, 99.295]. Some of the MIR and Raman predictions were lower, and their 95% confidence intervals were [93.377, 98.848] and [96.586, 99.777], respectively. Compared with PXRD, the MIR and Raman confidence intervals were wider, indicating higher uncertainty in the parameter estimation. The PXRD method showed high accuracy and reliability in predicting the crystalline purity of PS owing to its high specificity and the detailed crystal structure information it provides. In contrast, the MIR and Raman methods, although they have some predictive ability, gave more uncertain predictions because of their sensitivity to the nature of the sample and to spectral properties.

Table 6 Comparison of detection results of unknown samples determined by different methods (PS-A content, %)

| Batch | PXRD | MIR | Raman |
|---|---|---|---|
| 101240701 | 96.925 | 96.445 | 96.931 |
| 101240702 | 99.798 | 91.640 | 95.315 |
| 101240703 | 97.389 | 95.105 | 98.042 |
| 101240704 | 99.417 | 100.527 | 99.720 |
| 101240705 | 98.150 | 93.516 | 100.956 |
| 101240706 | 98.686 | 99.442 | 98.125 |

Abbreviations: MIR, mid-infrared; PS-A, pravastatin sodium crystal form A; PXRD, powder X-ray diffraction; Raman, Raman spectroscopy.
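The confidence intervals discussed for the blind samples can be recomputed from the Table 6 values with a minimal sketch. It assumes a normal-approximation interval, mean ± 1.96·s/√n, with s the sample standard deviation over the six batches; the text does not state which formula was used. Under this assumption the sketch reproduces the MIR and Raman intervals quoted above to three decimals. For PXRD it reproduces the upper bound, while the lower bound evaluates to about 97.49.

```python
import math
import statistics

# Predicted PS-A contents (%) for the six blind batches, copied from Table 6.
predicted = {
    "PXRD":  [96.925, 99.798, 97.389, 99.417, 98.150, 98.686],
    "MIR":   [96.445, 91.640, 95.105, 100.527, 93.516, 99.442],
    "Raman": [96.931, 95.315, 98.042, 99.720, 100.956, 98.125],
}

Z_95 = 1.96  # assumption: normal approximation for the 95% level

def confidence_interval(values, z=Z_95):
    """Return (lower, upper) bounds of mean +/- z * s / sqrt(n)."""
    mean = statistics.fmean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean - half_width, mean + half_width

intervals = {method: confidence_interval(values) for method, values in predicted.items()}
for method, (low, high) in intervals.items():
    print(f"{method:5s}  [{low:.3f}, {high:.3f}]  width = {high - low:.3f}")
```

The interval widths order as PXRD < Raman < MIR, which matches the comparison drawn in the text. With only six batches, a Student t interval would be wider than this normal approximation for all three methods.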


Given the advantages of the PXRD method in determining the crystal form purity of PS, it is recommended that the quantitative model established by PXRD be utilized to determine crystal purity in subsequent studies.



Discussion

This study successfully developed quantitative models for the content of crystal form A in binary mixtures of PS using PXRD, MIR, and Raman techniques combined with PLS and various preprocessing algorithms. The PXRD model demonstrated clear advantages, with an LOD of 1.52% and an LOQ of 4.60%, substantially better than those of the spectroscopic techniques. This result indicates that quantification by PXRD is less affected by signal overlap because the crystal structure gives distinct diffraction peaks, making it more accurate and reliable for the quantitative analysis of the PS-A crystal form. In contrast, although MIR and Raman techniques are uniquely valuable in chemical analysis, their performance in quantifying the PS-A crystal form is limited by the overlap of spectral signals, resulting in lower predictive power than PXRD.

However, the application of quantitative crystal form models to APIs in formulations has certain limitations. Although PXRD shows high accuracy and reliability in the quantitative analysis of API crystal forms, excipients in formulations may introduce complex background signals that interfere with PXRD detection and thus affect the accuracy of the quantitative analysis. PXRD can be coupled with other analytical techniques, such as MIR and Raman, to reduce this interference. These spectroscopic techniques provide information on the crystalline form of the drug and the molecular structure of the excipients, reduce excipient interference, detect the signals of low levels of APIs, and, with the help of microscopic imaging, reveal the spatial distribution of the drug and excipients, further improving the accuracy of quantitative analysis.

When building quantitative models of complex polymorphic systems, it is crucial to thoroughly assess the generalization ability of the model to ensure its reliability and accuracy in different application scenarios, such as screening and optimizing polymorphs of APIs, monitoring crystallization during manufacturing, and quantifying crystalline forms in complex formulations. The complexity of the crystal structure, the diversity of formulation processes, the breadth of the data distribution, and the ability of the model to predict new crystalline forms all affect generalization. The physical and chemical properties of different crystal forms of an API vary widely, and the forms are susceptible to transformation during processing. Variables in the formulation process can also lead to instability and diversity of crystal forms, further affecting generalization. To improve the generalization ability of the model, more types of samples need to be included, covering different sources, preparation methods, and storage conditions. This not only improves the prediction accuracy of the model but also enhances its reliability, adaptability, and robustness in real-world applications.


Conclusion

In this study, quantitative models for the content of crystal form A in binary mixtures of PS were successfully developed using PXRD, MIR, and Raman techniques in combination with PLS and various preprocessing algorithms. The PXRD model showed higher accuracy and specificity, and its limits of detection and quantification, 1.52% and 4.60%, respectively, were markedly better than those of the MIR and Raman models. The poorer predictions of the MIR and Raman models may stem from the lower accuracy of their model development. In addition, when the developed models were tested with blind samples, the confidence intervals of the MIR and Raman predictions were wider and the uncertainties of the parameter estimation were larger than those of PXRD. The calibration model developed by the PXRD method was therefore chosen for determining the crystalline purity of PS in actual production, providing reliable methodological support for the quality control of pharmaceutical products.



Conflict of Interest

None declared.

Acknowledgments

In completing the Pravastatin Sodium API research project, we received support and help from many parties. We especially thank Shanghai Tianwei Biopharmaceutical Co., Ltd., for the gift of the Pravastatin Sodium API, which provided the basis for the smooth progress of the project. This work was supported by the National Key Laboratory of Lead Druggability Research, which is an important component of the construction of the “Shanghai Municipal Professional Service Platform for Drug Solid State and Quality Control Technology” (23DZ2292600). The successful completion of the Pravastatin Sodium API project has laid a solid foundation for subsequent drug development and research.

Supporting Information

This section includes (1) the preparation process of pravastatin sodium crystal form A, crystal form D, and binary mixture samples; (2) sample collection and preprocessing; (3) analysis of the results of different dataset division methods and division ratios; (4) quantitative model construction and evaluation; (5) plots of the PXRD, MIR, and Raman raw data and trend plots of the PLS modeling results at different division ratios for the KS and SPXY methods ([Supplementary Figs. S1]–[S5] [available in online version]); and (6) the samples used to build and validate the quantitative PXRD, MIR, and Raman models ([Supplementary Tables S1]–[S3] [available in online version]).


# These authors contributed equally to this work.




Address for correspondence

Jingjing Wei, Master's Degree
National Institutes for Food and Drug Control
31 Huatuo Road, Beijing 102629
People's Republic of China   

Jialiang Zhong, PhD
National Key Laboratory of Lead Druggability Research, Shanghai Institute of Pharmaceutical Industry, China State Institute of Pharmaceutical Industry
285 Gebaini Road, Shanghai 201203
People's Republic of China   

Publication History

Received: 06 March 2025

Accepted: 24 July 2025

Article published online:
22 August 2025

© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany


Fig. 1 Chemical structure of pravastatin sodium.
Fig. 2 PXRD overlay of (A) two solid forms of PS (PS-A and PS-D) and (B) binary mixture samples containing different content of PS-A. PS, pravastatin sodium; PXRD, powder X-ray diffraction.
Fig. 3 MIR overlay of (A) two solid forms of PS (PS-A and PS-D) and (B) binary mixture samples containing different content of PS-A. MIR, mid-infrared; PS, pravastatin sodium.
Fig. 4 Raman overlay of (A) two solid forms of PS (PS-A and PS-D) and (B) binary mixture samples containing different content of PS-A. PS, pravastatin sodium.
Fig. 5 PXRD patterns of PS binary mixture samples with different pretreatments (MSC, SNV, S-G+ first derivative, and S-G+ second derivative). PS, pravastatin sodium; PXRD, powder X-ray diffraction.
Fig. 6 Quantitative calibration model of PS-A for PS binary mixtures. (A) PXRD. (B) MIR. (C) Raman. MIR, mid-infrared; PS, pravastatin sodium; PS-A, pravastatin sodium crystal form A; PXRD, powder X-ray diffraction; Raman, Raman spectroscopy.
Fig. 7 Quantitative prediction model of PS-A for PS binary mixtures. (A) PXRD. (B) MIR. (C) Raman. MIR, mid-infrared; PS, pravastatin sodium; PS-A, pravastatin sodium crystal form A; PXRD, powder X-ray diffraction; Raman, Raman spectroscopy.
Fig. 8 MIR profiles of PS binary mixture samples with different pretreatments (MSC, SNV, S-G+ first derivative, and S-G+ second derivative). MIR, mid-infrared; PS, pravastatin sodium.