CC BY-NC-ND 4.0 · Yearb Med Inform 2018; 27(01): 098-109
DOI: 10.1055/s-0038-1667083
Section 4: Sensor, Signal and Imaging Informatics
Georg Thieme Verlag KG Stuttgart

Deep Learning on 1-D Biosignals: a Taxonomy-based Survey

Nagarajan Ganapathy
1  Peter L. Reichertz Institute for Medical Informatics, University of Braunschweig — Institute of Technology and Hannover Medical School, Braunschweig, Germany
2  Indian Institute of Technology Madras, Chennai, India
Ramakrishnan Swaminathan
2  Indian Institute of Technology Madras, Chennai, India
Thomas M. Deserno
1  Peter L. Reichertz Institute for Medical Informatics, University of Braunschweig — Institute of Technology and Hannover Medical School, Braunschweig, Germany

Correspondence to

Thomas M. Deserno

Publication History

Publication Date:
29 August 2018 (online)



Objectives: Deep learning models such as convolutional neural networks (CNNs) have been applied successfully to medical imaging, but biomedical signal analysis has yet to fully benefit from this novel approach. Our survey aims at (i) reviewing deep learning techniques for biosignal analysis in computer-aided diagnosis; and (ii) deriving a taxonomy for organizing the growing number of applications in the field.

Methods: A comprehensive literature research was performed using PubMed, Scopus, and ACM. Deep learning models were classified with respect to the (i) origin, (ii) dimension, and (iii) type of the biosignal as input to the deep learning model; (iv) the goal of the application; (v) the size and (vi) type of ground truth data; (vii) the type and (viii) schedule of learning the network; and (ix) the topology of the model.

Results: Between January 2010 and December 2017, a total of 71 papers were published on the topic. The majority (n = 36) of papers are on electrocardiography (ECG) signals. Most applications (n = 25) aim at detection of patterns, while only a few (n = 6) aim at prediction of events. Out of 36 ECG-based works, many (n = 17) relate to multi-lead ECG. Other biosignals that have been identified in the survey are electromyography, phonocardiography, photoplethysmography, electrooculography, continuous glucose monitoring, acoustic respiratory signal, blood pressure, and electrodermal activity signal, while ballistocardiography or seismocardiography have yet to be analyzed using deep learning techniques. In supervised and unsupervised applications, CNNs and restricted Boltzmann machines are the most and least frequently used, (n = 34) and (n = 15), respectively.

Conclusion: Our key-code classification of relevant papers was used to cluster the approaches that have been published to date and demonstrated a large variability of research with respect to data, application, and network topology. Future research is expected to focus on the standardization of deep learning architectures and on the optimization of the network parameters to increase performance and robustness. Furthermore, application-driven approaches and updated training data from mobile recordings are needed.


1 Introduction

Biosignals are electrical, mechanical, thermal, or other signals measured over time from the human body or from other organic tissue. They became applicable for medical diagnoses in 1895 when Willem Einthoven invented electrocardiography (ECG) as a clinically usable, non-invasive device. An ECG device measures the electrical activity of the heart muscle and depicts the complete cardiac cycle of an individual heartbeat using electrical polarization-depolarization patterns of the heart[1]. Since then, a huge variety of signals have been discovered that can be derived from the surface (skin) or from inside the human body. Prominent examples include electroencephalography (EEG), which depicts the activity of the brain by recording voltage fluctuations from the scalp that result from ionic current within the neurons of the brain[2]; electromyography (EMG), which records the electric potential generated by muscle cells when these cells are electrically or neurologically activated[3]; photoplethysmography (PPG), which depicts the volumetric changes of an organ (e.g., the microvascular bed under the skin) over time by recording changes in light absorption[4]; and ballistocardiography (BCG), which monitors the heart activity by recording ballistic forces (acceleration) on the chest[5].

Initially, analysis of biosignals was done purely manually. In the early 1980s, low-level signal processing was applied for noise reduction and filtering. Then, feature extraction and classification were implemented. However, these early systems were time-consuming and suffered from unreliable accuracy[6].

From the 1990s onward, time-series models and supervised expert systems were used for feature extraction, and statistical classifiers were applied to support diagnosis. Over the last few decades, automated analysis of biosignals has turned into a core component of computer-aided diagnosis (CAD) and clinical decision-making. However, existing approaches are not effective for high-dimensional, more complex, and noisy real-world data that is continuously monitored using portable devices[7]. Therefore, the major goal of current research is to increase the accuracy and speed of diagnostic systems towards event prediction from real-time signal analysis[7] [8].

Artificial intelligence and machine learning help in automated and effective analysis of medical data[9]. Neural networks are one of the well-known techniques used to develop high-level expert systems for solving a wide range of medical tasks such as clustering, detection, and recognition of diseases[10]. Traditionally, most expert systems rely on hand-crafted features. As in many papers[10] [11] [12], we refer to “hand-crafted features” when the raw data is transformed before it is entered to the input layer of the neural network, and this transformation is performed or decided by a human. However, biosignals are generally non-linear, non-stationary, dynamic, and complex in nature[13]. Handcrafted or manually selected features are time-consuming, not optimal, domain-specific, and they require specific expert knowledge[6].

Neurons are the basic processing units in a neural network and they perform a non-linear transformation of the data input from neurons connected in the previous layer. Such a structure is incapable of processing raw biosignals[14]. Therefore, automated extraction and selection of task-specific as well as robust features are necessary to solve the complex real-world problems[15].

Deep learning is a machine learning approach that is based on a deep network architecture composed of multiple hidden layers. We have considered machine learning, regardless of whether it is supervised or unsupervised, as “traditional” if it is composed of five or fewer hidden layers. In contrast, with deep learning, feature extraction and selection are performed within the network, which is fed with raw (or low level-processed) data rather than with handcrafted features. Each hidden layer transforms the data into representations that are learned automatically using a general learning procedure[16]. Outstanding performance has been obtained on a variety of benchmark datasets. In particular, convolutional neural networks (CNNs) have been designed for solving complex image analysis tasks[15] [17]. Such networks may be composed of several millions of neurons, which are interconnected in a two-dimensional (2-D) matrix-like structure and hence can perform spatial convolutions within their internal structure. However, in supervised learning, a huge amount of training data is required for the millions of parameters, and such data is usually not available in the medical domain. Medical applications solve that problem using pre-trained networks from other domains, and they have demonstrated outstanding results[18]. Inherent to this concept, the filter coefficients of the convolution operation, which were previously used for the handcrafting of features, are determined intrinsically by the network.

However, most biosignals do not provide any 2-D structure, and as a result, deep learning models have not been used much in biosignal analytics. Some preliminary research has achieved positive outcomes for the analysis of biomedical signals using deep learning approaches. Recently, Kiranyaz et al.,[19] have proposed a 1-D CNN for ECG signal analysis. Similarly, recurrent neural networks (RNNs) are used to describe time-dependency in time-series data, for example in phonocardiography (PCG) signals.

This survey offers a comprehensive overview of deep learning models applied to 1-D biosignals from both a methodology-driven and an application-focused perspective. In many papers, EEG is considered as a 2-D signal. The same holds for biosignals such as functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG) signals. To focus the review on 1-D biosignals, we have excluded such matrix-based spatial measures.


2 Characteristics of Biosignals

All biosignals are recorded as sampled data points over a period of time. Due to the intrinsic properties of biological systems, physiological biosignals are highly irregular, high-dimensional, composed of multiple components, non-stationary, and heterogeneous[13] [20]. In this section, we discuss the technical and clinical properties of biosignals and focus on their complexity to explain the necessity of deep learning algorithms.

2.1 Technical Properties

The human body is a complex electro-mechanical system composed of affective, perceptual, and cognitive physiological processes. Dynamic changes can be recorded as biosignals. These signals vary continuously over time and reflect the clinical state of the human body[21]. Most biosignals include electrical activities and conductance as well as measurements of flow, volume, temperature, pressure, sound, and acceleration[22] [23] [24]. The signal's cut-off frequency, sampling rate, number of channels (in ECG also referred to as leads), and other technical characteristics span large ranges ([Table 1]). For instance, the signal frequencies range from 0.05 Hz up to 5,000 Hz (order of 10^6) and the recording duration spans (milli-)seconds up to the human lifetime (100 years = 3.1536 × 10^9 s).

Table 1

Characteristics and technical parameters of biosignals.


| | Number of channels | Signal frequency [Hz] | Recording frequency [Hz] | Amplitude level [mV] | Quantization [bits] | Recording duration |
| | 1 – 12 | 0.05 – 150 | 250 – 1,000 | 0.1 – 5 | | 10 s – 24 h |
| | 1 – 32 | 25 – 5,000 | 512 – 10,000 | 0.1 – 100 | | 30 s – 24 h |
| | | 10 – 400 | 1 – 2,000 | -2 – 2 | | 0.05 s – 24 h |
| | | 0.25 – 40 | 5 – 500 | -10 – 10 | | 120 s – 24 h |
| | | 1 – 20 | 1 – 20 | -0.05 – 0.05 | | 2 s – 24 h |
| Skin temperature | | 1 – 200 | 2 – 50,000 | -50 – 50 | | 60 s – 24 h |
| Skin conductance | | 0.1 – lé | 16 – 128 | 0 – 100 µS | | 120 s – 24 h |


2.2 Clinical Properties

The physiological and clinical characteristics of biosignals are more diverse:

  • ECG is recorded non-invasively and is quasi-periodic and often multi-component in nature. Any change in the rhythm, heart rate, and the pattern is used for the diagnosis of heart-related diseases[25] [26]. Today, smart clothes supporting continuous ECG recording are available and require real-time analysis and event prediction[7].

  • EMG is recorded invasively (needle EMG) or non-invasively (surface EMG) and is non-linear, non-stationary, and multi-component in nature. EMG supports the diagnosis of neuromuscular disorders such as muscle fatigue, myelitis, and McArdle disease[27].

  • PCG is recorded non-invasively and is periodic and multi-component in nature. It is a method to record the sounds and murmurs produced by the heart. PCG is considered as one of the simplest biomarkers for the detection of various heart diseases[26].

  • PPG is recorded non-invasively using low-cost oximeters. The PPG waveform is complex due to its dependence on the thickness of the skin, the portion of muscle fibers in the tissue, and the amount of fat. PPG is sensitive to any heart irregularity[28] and is considered a reliable biomarker for the detection of cardiac arrhythmias.

  • BCG is recorded non-invasively, yielding noisy, quasi-periodic data. It is a method to record the micro-vibrations of the body due to the pumping of blood from the heart during systole and the movement of blood through the veins[5] [29].

  • Other biosignals include electrooculography (EOG), continuous glucose monitoring (CGM), acoustic respiratory signals (ARS), blood pressure (BP), electrodermal activity (EDA), and skin temperature. They are used for detection, clustering, and diagnostics of various diseases. For example, EDA indicates emotional states[10].


2.3 Complexity

The complexities of physiological biosignals are remarkable and several obstacles are hindering their automatic analysis.

  • Big data: most physiological signals carry individual, continuous, time-dependent information that may change dramatically in pathological situations (variety). Today, ECG is recorded continuously (velocity) over 24/7 intervals, delivering up to 4 GB of uncompressed data (volume)[7]. In the application phase, signals are typically decomposed into epochs, and hence volume and velocity do not matter; in the training phase, however, the sampling rate has a high impact on the volume of the data.

  • Device specificity: most approaches described in the literature have abruptly failed when applied to a different dataset within the same class of physiological signals[30]. Generalized frameworks for similar physiological signals are required that are irrespective of the device, its sampling rate, the acquisition protocol, subjects, and regional varieties.

  • Domain specificity: the analysis of data in one specific domain is often not enough to describe the clinical significance[31]. The multi-modal signal analysis is another challenge that has not yet been addressed accordingly. CAD systems often are domain-specific, i.e., they do not generalize to other application domains.

  • Noise: there are several sources of noise, artifacts, and dropout periods in biosignals. This issue is recognized as one of the serious challenges for automated clinical diagnosis[32]. The wavelet transforms or adaptive filters reduce noise but also lower the dimensionality and originality of the data. For instance, most R-wave detection algorithms fail when applied to noisy recordings[33], although R-wave forms are the most recognizable pattern in ECG.

  • Real-time requirements: physiological signals have high levels of morphological and temporal variation. Previous approaches in the literature are not efficient enough to detect the variation instantaneously. Also, automated event prediction from long-term recordings is very critical[7]. Furthermore, mobile data recording requires processing on mobile devices, which still are less performant than workstations or laptop computers.

  • Missing ground truth: real-time monitoring of physiological signals can detect and alert early occurrence of diseases like seizure and arrhythmias. However, learning of real-time unlabeled data is difficult and computationally expensive[8]. Robust approaches are required to extract reliable features and patterns from a large amount of unlabeled data[33].
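The volume remark in the big-data item above can be made concrete with a back-of-the-envelope calculation. The following is a minimal sketch rather than anything taken from the surveyed papers: the function name `uncompressed_bytes` is hypothetical, and the example settings (12 leads, 1,000 Hz, 16-bit samples, 24 hours) are assumptions. The channel count and sampling rate lie within the ranges of [Table 1], whereas the 16-bit quantization is purely illustrative.

```python
def uncompressed_bytes(channels: int, sampling_rate_hz: float,
                       bits_per_sample: int, duration_s: float) -> float:
    """Raw storage needed for a multi-channel recording, in bytes."""
    samples = channels * sampling_rate_hz * duration_s
    return samples * bits_per_sample / 8


SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical 24-hour, 12-lead ECG at 1,000 Hz with 16-bit samples.
day_12_lead = uncompressed_bytes(12, 1000, 16, SECONDS_PER_DAY)
print(f"{day_12_lead / 1e9:.2f} GB")  # 2.07 GB

# Halving the sampling rate halves the volume.
print(uncompressed_bytes(12, 500, 16, SECONDS_PER_DAY) / day_12_lead)  # 0.5
```

Under these assumptions, a single day of recording already amounts to gigabytes of data, and the volume scales linearly with the sampling rate.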


2.4 Necessity of Deep Learning

In summary, biosignals are highly complex and form multi-dimensional data, usually without a reliable ground truth. Linear and non-linear methods have failed to robustly perform their clinical analysis[34]. Similarly, machine learning with shallow architectures is incapable of handling the complexity in an ad-hoc manner. Many of the existing approaches are not effective at discovering the unique properties and patterns of physiological signals for clinical diagnosis[11] or event prediction[7]. This is mainly due to the dynamic and multivariate characteristics of biosignals. The same holds for extreme learning machines (ELMs), which are feedforward networks composed of a small number of hidden layers with a large number of nodes, and in which the hidden layer parameters are not tuned.

Deep learning attempts to automatically detect the unobservable patterns needed for the analysis from raw data. Multiple (up to several hundred) layers are interconnected to transform the raw level into a higher level of abstract data representation[34] [35]. Deep learning has improved performance in various fields such as computer vision, image understanding, natural language processing, and acoustic speech analysis. Some of the benefits of deep learning approaches include automated feature learning from raw data, noise robustness, multi-task learning, and better optimization with a minimized recognition error[11] [34]. Deep learning is adaptive and capable of handling multi-modal and complex data. Thus, deep learning models may provide tools and interfaces to complex biosignals for better information understanding and clinical decision support.


3 Deep Learning Technology

The technical classification of deep learning architectures follows generative versus discriminative models ([Fig. 1]). Generative models predict and synthesize new partial input data at time t+1 based on the previous data at time t by learning a general representation of the data. They are mainly used for the enhancement and prediction of physiological signals. A discriminative model is capable of representing data even if the input is noisy. Discriminative models are effective for the classification, detection, and recognition of physiological signals[6]. The four major network architectures can be characterized as follows:

  • The restricted Boltzmann machine (RBM) is a neural network that consists of binary-valued neurons in the visible layer and Boolean hidden units as other layers. A greedy layer-by-layer feature learning is implemented to compute constant and connection weights in different levels of abstraction (layers)[37]. An unsupervised layer-based pre-training is used for the initialization of all parameters. A simple backpropagation is used to fine-tune and slightly adjust the parameters throughout the network[36].

  • Auto-encoders are unsupervised learning models with an equal number of input and output nodes. They are generative models with a non-linear approach to feature extraction[11]. Each auto-encoder is composed of individual encoders, which transfer the input into a lower-dimensional space, and corresponding decoders, which reconstruct the input using the most discriminative features. To obtain more robust results in representation learning, variants of auto-encoders are applied, such as sparse auto-encoders (SAEs), de-noising auto-encoders (DAEs), contractive auto-encoders (CAEs), and zero-bias auto-encoders (ZAEs)[38].

  • The CNN is one of the most popular deep learning architectures. Inspired by the neurobiological architecture of the visual cortex, a CNN is a hierarchical model that consists of convolutional and subsampling layers. In the convolutional layer, the weights of the neurons are coupled, and hence the network computes a spatial convolution, determining the mask coefficients itself during training. For this reason, CNNs have become very popular in image analysis. For instance, AlexNet is a special deep CNN model used to classify 1.2 million images into 1,000 classes[40].

  • The recurrent neural network (RNN) is a model that has been adapted from simple feed-forward modeling towards processing of sequential data[6]. The RNN is a deep architecture with a high-dimensional hidden state, which receives the input, updates its hidden state information, and makes a prediction at each time step. RNNs are mainly used for the analysis of streamed data[41] [42]. Here, long short-term memory (LSTM) and gated recurrent units (GRU) are frequently used to overcome the vanishing gradient in gradient-based learning methods and backpropagation[43].
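Two of the mechanisms described in the list above, the coupled weights of a convolutional layer followed by subsampling and the hidden-state update of an RNN, can be sketched in a few lines. This is a minimal illustration under simplifying assumptions and not the implementation of any surveyed paper: the function names are hypothetical, the kernel is hand-picked rather than learned, the recurrent layer has a single unit with arbitrary scalar weights, and the plain `tanh` update shown here is the basic recurrent cell, not an LSTM or GRU.

```python
import math
from typing import List, Sequence


def conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' 1-D convolution layer without bias or activation.

    The same kernel weights are reused at every position (weight sharing).
    As in most deep learning libraries, the kernel is not flipped, so this
    is strictly a cross-correlation.
    """
    k = len(kernel)
    return [
        sum(w * signal[i + j] for j, w in enumerate(kernel))
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(features: Sequence[float], size: int) -> List[float]:
    """Subsampling layer: keep the maximum of each non-overlapping window."""
    return [
        max(features[i:i + size])
        for i in range(0, len(features) - size + 1, size)
    ]


def rnn_step(x_t: float, h_prev: float,
             w_in: float, w_rec: float, bias: float) -> float:
    """One time step of a single-unit recurrent layer (hidden-state update)."""
    return math.tanh(w_in * x_t + w_rec * h_prev + bias)


# Toy signal with one sharp peak (illustrative values, not real ECG data).
signal = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 0.0]
edge_kernel = [-1.0, 2.0, -1.0]  # hand-picked here; a CNN would learn it

features = conv1d(signal, edge_kernel)
print(features)                 # [-1.0, -3.0, 8.0, -3.0, -1.0, 0.0]
print(max_pool1d(features, 2))  # [-1.0, 8.0, 0.0]

# The recurrent unit sees one sample per time step and carries its state along.
h = 0.0
for x in signal:
    h = rnn_step(x, h, w_in=0.5, w_rec=0.9, bias=0.0)
```

The convolution responds most strongly where the kernel shape matches the signal (the peak), and pooling keeps that response while shortening the sequence; the recurrent unit instead summarizes everything seen so far in its state `h`.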

Fig. 1 Deep learning methods (RBM = restricted Boltzmann machine, CNN = convolutional neural network, RNN = recurrent neural network).


4 Deep Learning on Biosignals

In this section, we describe the method implemented to select relevant papers, the categories used to classify the papers, biosignals and their applications, and the clustering of papers according to the dimension and types of biosignals. Three clinical applications are particularly highlighted.

4.1 Selection of Papers

In this survey, 437 research papers were reviewed ([Fig. 2]). Existing databases (PubMed, Scopus, and ACM) were queried with search terms for title, keywords, and abstract (see [Appendix 1]). Only papers published from January 2010 to December 2017 were considered. After duplicates were removed, a total of 382 records were obtained. Based on the title and the abstract of each paper, contributions that did not relate to deep learning (defined as having more than five hidden layers) or 1-D biosignals were excluded. Based on a full-text assessment, work related to EEG, fMRI, or MEG as source signal and review papers were excluded. After careful inspection of the architecture of the deep learning models, 35 papers were excluded because the number of hidden layers was less than five. This process yielded a final collection of 71 research papers.

Fig. 2 Paper selection process.


4.2 Categories to Classify Papers

Our analysis of the literature identified several criteria to categorize papers and approaches. The most important is the biosignals to which deep learning is applied. Besides ECG and EMG, some papers use a combination of multiple signals as input for the neural network. Moreover, biosignals can be 1-D (single lead) or composed of multi leads. In case of EEG, for instance, the multiple leads are arranged in a spatial matrix, which makes CNNs directly applicable. A 2-D spatial structure can also be generated using 2-D frequency transforms. Therefore, the origin, dimension, and type of biosignals are coded as B for “biosignal”, and denoted B(origin, dimension, type). We use simple numbers to indicate the instances in each of the criteria ([Fig. 3]).

Fig. 3 Classification of the parameters used for the selection of deep learning models. The dependencies are color coded. Note that A(..x) = N(x..) for all x in {1,2}.

The second category used to distinguish the various approaches is the application domain. When deep learning is applied to a biosignal, it can be used for simple signal enhancement, detection of uncertain patterns (computer-aided detection, CADe), clustering of the signal or parts of the signal, recognition of given patterns (computer-aided diagnostics, CADx), or prediction of future signal alterations or events. We call this the goal of the application. To train the network, data is needed. Such datasets are sometimes quite small (less than 100 records or less than 5 hours of total recording time), medium (up to 1,000 records or 50 hours of recording time), but sometimes relatively large (up to 10,000 records or 500 hours of recording time). They may have a label to indicate the ground truth (GT) or not. Therefore, the application is coded A for “application”, and denoted A(goal, GT-size, GT-type), and again simple numbers are used within the criteria to code the instances ([Fig. 3]).
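The size ranges given above for the ground truth data can be written down as a simple binning rule. The sketch below is only one possible reading of the text: the function names are hypothetical, the handling of the exact boundary values (for example, whether exactly 100 records counts as small or medium) is an assumption, and the functions return descriptive labels because the numeric instances used in [Fig. 3] are not spelled out in the text. Datasets beyond the largest stated range are reported as `None`.

```python
from typing import Optional


def gt_size_from_records(n_records: int) -> Optional[str]:
    """Bin a dataset by its number of records; None if beyond the stated ranges."""
    if n_records < 100:
        return "small"
    if n_records <= 1_000:
        return "medium"
    if n_records <= 10_000:
        return "large"
    return None


def gt_size_from_hours(hours: float) -> Optional[str]:
    """Bin a dataset by its total recording time in hours."""
    if hours < 5:
        return "small"
    if hours <= 50:
        return "medium"
    if hours <= 500:
        return "large"
    return None


print(gt_size_from_records(48))  # small
print(gt_size_from_hours(24.0))  # medium
```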

Finally, the networks that are used for biosignal analysis differ. For instance, the learning type of the network may be supervised or unsupervised. Note that this criterion is strictly correlated with the type of GT data, which can be labeled or unlabeled, respectively. The training of the network can be scheduled offline, online, or in real time. Of course, the topology of the network is another important criterion, and the instances we have chosen here correspond to Section 3. Consequently, network categories are denoted N for “network”, and coded N(L-type, L-schedule, topology) ([Fig. 3]).

In summary, three categories of deep learning on biosignals have been identified, each comprised of three criteria. Since the type of GT data is directly linked to the type of network learning, only eight effective criteria remain. In total, 37 different instances are suggested. These instances have been used to code the papers retrieved from the literature review. For example, the paper of Rahhal et al.,[44] on active classification of ECG signals is coded as B(121)A(322)N(212).
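The key code introduced above lends itself to mechanical handling. The following is a minimal sketch, not a tool provided by the authors, that parses a full code such as B(121)A(322)N(212) into its nine criteria and checks it against one-category patterns in which "." stands for any instance, as in B(12.) or A(2..). The class name `KeyCode`, its field names, and its methods are hypothetical, and only single-digit instances are assumed. The consistency check encodes the note in [Fig. 3] that A(..x) = N(x..).

```python
import re
from dataclasses import dataclass

_CODE_RE = re.compile(r"B\((\d)(\d)(\d)\)A\((\d)(\d)(\d)\)N\((\d)(\d)(\d)\)")


@dataclass(frozen=True)
class KeyCode:
    origin: int
    dimension: int
    signal_type: int
    goal: int
    gt_size: int
    gt_type: int
    learning_type: int
    schedule: int
    topology: int

    @classmethod
    def parse(cls, code: str) -> "KeyCode":
        """Parse a full code such as 'B(121)A(322)N(212)'."""
        match = _CODE_RE.fullmatch(code.replace(" ", ""))
        if match is None:
            raise ValueError(f"not a valid key code: {code!r}")
        return cls(*(int(d) for d in match.groups()))

    def category(self, letter: str) -> str:
        """Return the three digits of category 'B', 'A' or 'N' as a string."""
        digits = {
            "B": (self.origin, self.dimension, self.signal_type),
            "A": (self.goal, self.gt_size, self.gt_type),
            "N": (self.learning_type, self.schedule, self.topology),
        }[letter]
        return "".join(str(d) for d in digits)

    def matches(self, pattern: str) -> bool:
        """Check a one-category pattern such as 'B(12.)'; '.' matches any digit."""
        letter, wanted = pattern[0], pattern[2:-1]
        return all(w in (".", d) for w, d in zip(wanted, self.category(letter)))

    def is_consistent(self) -> bool:
        """Fig. 3 note: A(..x) = N(x..), i.e. GT type equals learning type."""
        return self.gt_type == self.learning_type


code = KeyCode.parse("B(121)A(322)N(212)")
print(code.matches("B(12.)"))  # True
print(code.matches("N(1..)"))  # False
print(code.is_consistent())    # True
```

Patterns that use a placeholder letter, such as B(x11), are not handled by this sketch; they would have to be expanded into one pattern per value of x.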


4.3 Biosignals and their Application

Supervised learning is used in most applications: N(2..). Supervised learning is the ability of deep learning models to learn from annotated data. However, annotation (labeling) of physiological signals requires expert knowledge and is often expensive and time-consuming. Unsupervised learning, coded as N(1..), is sometimes ineffective for multivariate inputs and ambulatory monitoring due to long-term time dependencies[6]. The choice between a generative and a discriminative network topology depends on the analysis to be performed on the physiological signals. Discriminative models are coded as A(2..) - A(4..). They are effective for the detection, clustering, and diagnostics of physiological signals. A discriminative model is capable of modeling noisy data for training. Generative models are mainly used for the enhancement and prediction of physiological signals. Models coded as N(..4) can predict and synthesize new partial input data at time t+1 based on the previous data at time t by learning the data. Generative models are also more robust when analyzing noisy data. The characteristics of physiological signals play a vital role in the selection of deep learning models. If the physiological signal has a spatiotemporal structure, the model selected must incorporate both the spatial and the temporal coherence of the signal using regularization. A CNN is considered a good choice to handle both temporal and spatial data. However, selection criteria for the deep learning model should be more application-oriented and robust with respect to input data types.

[Table 2] shows the codes for all 71 papers that have been considered in this survey. There are only a few duplicate codes, which shows the diversity of research and demonstrates that our code is suitable for different approaches. Counting the papers per topology yields 15, 12, 34, and 3 for RBMs, auto-encoders, CNNs, and RNNs, respectively. In addition, there are 7 “Other” types of network topology, where the authors have combined several deep learning networks to improve performance.
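As a quick sanity check, the topology counts reported above can be tallied against the number of papers in the survey. The snippet is a minimal sketch; the dictionary keys are merely labels chosen here for illustration.

```python
topology_counts = {
    "RBM": 15,
    "auto-encoder": 12,
    "CNN": 34,
    "RNN": 3,
    "other": 7,
}

total = sum(topology_counts.values())
print(total)  # 71

# Share of each topology among all surveyed papers.
for name, count in topology_counts.items():
    print(f"{name}: {count / total:.1%}")
```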

Table 2

Coding schemes for the 71 papers selected








[19] [45] [46] [47] [48] [49]
[50] [51] [52] [53]
[58] [59] [60]
[98] [99] [100]
[96] [108] [109]
[84] [85]
[104] [105]


4.4 Clustering by Application and Biosignals

[Table 3] visualizes six clusters of current research with respect to the goal of application and the biosignal considered in the paper. A more comprehensive table with respect to the network architecture and the optimizers and regularizers used is given in [Appendix 2].

Table 3

Deep learning on biosignals with respect to the goal of the application and the origin of the biosignal (colors indicate the six clusters).








Multiple Sources

Enhancement A(1..): [50] [51] [52] [53]
Detection A(2..): [61] [12] [56]; [45] [46] [19] [47] [48] [49] [62]; [72] [74] [66] [68] [77] [94]; [57] [107]; [76]^1 [79]^2
Clustering A(3..): [44] [64]; [70] [69] [71]; [58] [60] [59]; [84] [85]; [67] [65]; [98] [99] [100]; [102] [55]
Diagnostics A(4..): [30] [83]; [95] [97]; [86] [92] [89]; [23]^3; [104] [105]
Prediction A(5..): [87] [90]; [96]^4 [108]^4 [109]^4

^1 B(512); ^2 B(612); ^3 B(711); ^4 B(812)

4.4.1 Multi-lead ECG: B(12.)

Generally, multi-lead ECG signals are highly complex and large in volume. In addition, the multi-lead structure may allow CNN and RNN architectures to be applied directly.

Deep Learning Applied to Multi-lead ECG Raw Data: B(121)

We have identified nine papers that apply deep learning methods to enhancement[50] [51] [52] [53], detection[12] [56] [61], and clustering[44] [64]. This is due to the capability of deep learning methods to extract strong features and better data representation. All approaches used raw data as input, but different deep learning methods for processing. For instance, Pourbbabee et al.,[61] and Zhou et al.,[12] suggested applying CNN and LSTM on biosignals, while others used different types of auto-encoders for the analysis. Pourbbabee et al., divided 30 min signals into six segments of 5 min, which were used as input to the CNN. In the case of Zhou et al., individual heartbeats were extracted from the ECG and fed to the LSTM model.
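The fixed-length segmentation described above, in which a 30 min signal is cut into six 5 min segments, can be sketched as follows. This is a minimal illustration and not the preprocessing code of the cited work: the function name `split_into_segments` is hypothetical, the sampling rate of 128 Hz is an assumed value, and the all-zero list merely stands in for a real recording.

```python
from typing import List, Sequence


def split_into_segments(signal: Sequence[float], sampling_rate_hz: int,
                        segment_s: int) -> List[Sequence[float]]:
    """Cut a signal into consecutive, non-overlapping segments of equal length.

    Trailing samples that do not fill a whole segment are dropped.
    """
    step = sampling_rate_hz * segment_s
    return [signal[i:i + step] for i in range(0, len(signal) - step + 1, step)]


FS = 128  # Hz; hypothetical sampling rate, not taken from the cited paper
recording = [0.0] * (30 * 60 * FS)  # placeholder for a 30-minute signal

segments = split_into_segments(recording, FS, segment_s=5 * 60)
print(len(segments), len(segments[0]))  # 6 38400
```

Beat-wise input, as used for the LSTM model, would replace the fixed step by boundaries derived from detected heartbeats, which is beyond this sketch.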

All papers, except one[61], used publicly available standard ECG databases to evaluate the performance of their proposed methods. For example, Zhou et al., applied lead CNN and LSTM on the Massachusetts Institute of Technology, Beth Israel Hospital (MIT-BIH) database[110] and they validated the trained model on the Chinese cardiovascular disease database (CCDD)[111]. Similarly, Rahhal et al.,[44] validated their method on two other databases, namely the St. Petersburg Institute of Cardiological Technics (INCART) database[112] and the supraventricular arrhythmia database (SVDB)[113]. Liu et al.,[56] used a multi-lead CNN to detect myocardial infarction on ECG signals obtained from the German Physikalisch Technische Bundesanstalt (PTB) database[114]. [Appendix 2] provides a more comprehensive list of the databases used in the papers for the evaluation of their methods as well as the results and performances obtained, as reported by the authors.

It is observed that CNN and LSTM are used for the clustering task while auto-encoders are used for the application of signal enhancement and reconstruction. Besides, Gogna et al.,[50] used the Split-Bregman optimization technique to overcome the issues of error backpropagation for unambiguous auto-encoders learning, while the other authors have applied the normal auto-encoder model. Jin and Dong[64] developed and implemented a multi-lead CNN model for the classification of normal/abnormal signals. Further, the rule inference approach is applied to improve the performance of the training model. In general, the results obtained by the nine papers were above 90% accuracy. Zhou et al., obtained an average accuracy of 99.41% and 98.03% on MIT-BIH and CCDD, respectively, by combining CNN and LSTM as base classifier in the proposed network.

Deep Learning Applied to Features from Multi-lead ECG: B(122)

A cluster of seven papers applied features derived from multi-lead ECG to deep learning networks. Such features may be RR interval[70], the QRS complex[69], or morphological and temporal features[83]. In contrast to raw data-based approaches, enhancement is not a target application but it improved detection, clustering, and diagnostics. Features extracted from biosignals are used as a key element for the identification of boundary conditions to separate classes in the multi-dimensional feature space. Five applications were evaluated on publicly available databases[30] [69] [70] [71] [83], while the others recorded private data[87] [90]. All papers applied to clustering used deep belief networks for processing[66] [75] [76]. The results reported by the papers were all above 85%. In particular, Majumdar and Ward[69] showed an average accuracy of 97.0% for classification. Zhang et al.,[30] considered eight diverse ECG databases to evaluate the performance of their system and obtained 93.5%, 96.5%, and 90.5% for overall human identification task, normal subject identification, and abnormal subject identification, respectively.


4.4.2 Single-lead Biosignals: B(x1.)

This type of signal is formed from a single sampled measurement, usually derived as an electrical potential, and usually of pseudo-periodic nature, disregarding whether the biosignal is cardiac, respiration, or blood pressure. Therefore, our analysis is pooled with x = {1, 2, 3, 4, 5, 6, 7, 8, 9} where x represents different physiological signals as shown in [Table 3].

Deep Learning Applied to Single-lead Bio-signal Raw Data: B(x11)

In this group of 16 papers[19] [23] [45] [46] [47] [48] [49] [58] [59] [60] [62] [65] [67] [73] [78] [81], single-lead raw data is fed as input to the deep learning methods for applications that focus on detection; for instance, the paper of Acharya et al.,[46] aimed at automated detection of abnormalities in ECG signals. The approach of Kiranyaz et al.,[19] focused on a personalized monitoring system for arrhythmias. The high number of research papers may come from the fact that single-channel biosignals are simple, easy to acquire, and effective in decision-making systems. Out of the 16 papers, 12 used 1-D CNN models[19] [45] [46] [47] [48] [49] [58] [59] [60] [62] [65] [81]; two were based on auto-encoders[23] [73]; one used an RNN model[67]; and one used an RBM model[78].
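To make the notion of a 1-D CNN on raw samples concrete, the sketch below runs one convolution, ReLU, and max-pooling stage in plain Python. It is a toy forward pass under stated assumptions: the kernel values are hand-picked rather than learned, and the helper names (`conv1d`, `relu`, `max_pool1d`) are hypothetical, not those of any surveyed model.

```python
def conv1d(signal, kernel, stride=1):
    """Valid 1-D cross-correlation, the operation CNN layers call convolution."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hypothetical raw samples containing one sharp peak, and a peak-sensitive kernel.
raw = [0, 0, 1, 0, 0]
peak_kernel = [-1, 2, -1]

feature_map = conv1d(raw, peak_kernel)
activated = relu(feature_map)
pooled = max_pool1d(activated, size=3)
print(feature_map, activated, pooled)
```

In a trained network, many such kernels are learned from data and several stages are stacked before a classification layer.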

Out of 10 papers using single-lead ECG inputs, seven[19] [47] [49] [58] [59] [60] [62], two[45] [46], and one[114] were evaluated on the MIT-BIH, the German Physikalisch Technische Bundesanstalt (PTB), and the Fantasia database, respectively (see [Appendix 2]).

In general, the overall results were above 94% accuracy. In particular, Kiranyaz et al.,[60] obtained an average accuracy of 99.00% on the MIT-BIH database and Lei et al.,[45] reported 99.33% accuracy on the PTB database[114]. In an interesting study, Schozel and Dominik[73] used an RNN model to study PCG signals and generated artificial ECG signals from PCG. Papers describing work on biosignals other than ECG obtained an overall accuracy above 79%. For instance, Zhang et al.,[23] reported an average accuracy of 80.22% for emotion recognition using respiration signals. Similarly, Ryu et al.,[65] obtained an overall accuracy of 79.55% for the classification of heart sounds using PCG signals, which may be due to the unbalanced datasets used to validate the proposed model. In another study, Atzori et al.,[81] proposed to classify the movements of a prosthetic hand using a CNN model. EMG signals from amputee subjects were recorded and validated using the proposed models. They obtained the lowest accuracy of 38.09% for the amputee datasets.

Deep Learning Applied to Features from Single-lead Biosignals: B(x12)

The number of scientific papers with features as input to the deep learning method is high. In this cluster, features were extracted from the single-lead biosignals: 21 papers have been retrieved on this topic covering applications such as enhancement[63] [106], detection[54] [66] [68] [72] [74] [76] [77] [79] [82] [94], clustering[80] [84] [85], diagnostics[86] [89] [92], and prediction[96] [108] [109]. Examples of features were RR interval[72] [74], mean absolute value of signal[85], or frequency spectral coefficients[92]. Nine out of 21 papers used unsupervised learning and deep belief networks to perform detection[54] [66] [68] [72] [82] [94], clustering[84] [85], diagnostics[86], or prediction[96] [108] [109], while the others used supervised learning[63] [74] [76] [79] [80] [89] [92].

In 10 papers, the authors conducted experiments on public databases[66] [68] [76] [77] [80] [84] [85] [86] [94] [96]. In two papers, signals were simulated[63] [79]. In four papers, signals were acquired from the 2016 Physionet Challenge database[116]. Only four papers presented an evaluation using the authors' own data[54] [74] [89] [92].

The results were diverse. In particular, Mohebi et al.,[79] generated continuous glucose monitoring (CGM) signals to identify type 2 diabetes patients using a CNN model. Lee and Chang[96] combined a deep belief network with bootstrapping to estimate blood pressure; bootstrapping enhanced the performance by about 10%. In general, the results were above 83% in accuracy for the various applications. Specifically, Jindal et al.,[54] obtained an accuracy of 96.1% for biometric identification using PPG signals. Some papers with inputs other than ECG signals obtained an accuracy of 71.39%, e.g., for congestive heart failure detection[72].

In an interesting study, Lee and Chang[108] [109] developed an ensemble of deep belief network-deep neural network (DBN-DNN) approach for the estimation of oscillometric blood pressure with low data samples. Further, artificial features were synthesized to overcome the issue of low data samples for training deep learning models. To improve the prediction rate of cardiovascular events, Kim et al.,[106] used a DBN model on blood pressure signals to reduce artifacts. Similarly, Zhenjie et al.,[77] proposed a multiscale CNN model for the detection of atrial fibrillation from ECG signals and achieved 98.18% accuracy on the MIT-BIH database[110].

Deep Learning Applied to 2D-transformed Biosignals: B(x13)

Ten papers describe utilizing a 2-D spectrum as input to the deep learning system[57] [88] [91] [93] [95] [97] [98] [99] [100] [107]. It is reported that 2-D spectra of transformed biosignals describe the spatial and temporal information of the signals. Also, it allows the direct use of CNNs. All the papers used supervised learning to train a CNN and the applications are detection[57] [88] [93] [107], classification[91] [98] [99] [100], and diagnostics[95] [97].
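One common way to obtain such a 2-D representation is a short-time Fourier transform. The sketch below is a minimal, standard-library illustration and not the transform used by any specific paper: the window length, hop size, Hann window, and the naive DFT (chosen for readability over speed) are all assumptions, as is the synthetic test tone.

```python
import cmath
import math


def spectrogram(signal, window=64, hop=32):
    """Magnitude short-time Fourier transform.

    Returns image[f][t]: rows are frequency bins 0..window//2,
    columns are successive time frames.
    """
    if window < 2 or len(signal) < window:
        raise ValueError("signal must be at least one window long")
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window - 1)) for n in range(window)]
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        seg = [signal[start + n] * hann[n] for n in range(window)]
        # Naive DFT, keeping only the non-negative frequency bins.
        mags = [
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / window) for n in range(window)))
            for k in range(window // 2 + 1)
        ]
        frames.append(mags)
    # Transpose so that the first index is frequency and the second is time.
    return [list(row) for row in zip(*frames)]


# Hypothetical test tone: 16 Hz sine sampled at 128 Hz for two seconds.
fs = 128
tone = [math.sin(2 * math.pi * 16 * i / fs) for i in range(2 * fs)]
image = spectrogram(tone)

# With a 64-sample window the bin spacing is fs / 64 = 2 Hz.
first_frame = [row[0] for row in image]
peak_bin = first_frame.index(max(first_frame))
print(len(image), len(image[0]), peak_bin * fs / 64)
```

The resulting matrix can be treated like a single-channel image and passed to a 2-D CNN; in practice a fast Fourier transform from a numerical library would replace the naive DFT.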

Out of the 10 papers, five used publicly available databases to evaluate the performance of their systems[91] [97] [98] [99] [100] and the other five used experimental protocols for signal acquisition[57] [88] [93] [95] [107] (see [Appendix 2]). In general, the results obtained were above 90% accuracy. In particular, Dominguez-Morales et al.,[98] obtained 97% accuracy using a modified version of the AlexNet model[40]. Interestingly, the authors converted the PCG signal obtained from a neuromorphic auditory sensor into address-event representations, which were later plotted as a 2-D sonogram for deep learning. In order to improve the prediction ability of their deep learning model, Xia et al.,[88] used a hybrid model combining CNN with RNN.

Most of the papers in this cluster utilized supervised learning. Out of eight papers, six papers used publicly available databases to validate the model proposed. Cote-Allard et al.,[95] developed and tested a transfer-learning-based hand gesture learning system. A trained model, obtained from other datasets, was used to detect and improve different hand gestures. Xia et al.,[93] used 2-D-transformed ECG signals as input for the detection of pathological conditions with an average accuracy of 98.63%.


4.4.3 Multiple Source Biosignals

Seven papers proposed approaches to manage multiple source biosignals. Except for two papers[104] [105], all used the raw data to feed the deep learning system for signal enhancement[101], clustering[55] [102], diagnostics[103] [104] [105], and prediction[10]. For instance, Bengio et al.,[10] presented a major breakthrough applying a CNN for affect classification from raw physiological signals. The accuracy obtained by all papers was above 85%. The performance of the systems was dependent on the combination of the biosignals used for the analysis. For example, Zhang et al.,[102] obtained 98.49% accuracy for the automated classification of sleep stages using sparse DBNs. The authors used EMG, EEG, and ECG collectively to investigate the impact of different biosignals on the algorithm performance. In[10], the authors used a CNN model on specific signals from the database for emotion analysis using physiological signals (DEAP)[13], namely skin conductance and blood volume pulse. Out of the seven papers, three used a CNN model[10] [55] [103], two were based on an ensemble approach[104] [105], one used a DBN model[102], and another used a DNN model[101]. Belo et al.,[101] proposed an interesting approach for generalized biosignals learning and synthesis using DNN models. Chow[103] proposed an online biometric recognition system using ECG and EDA signals. In this approach, physiological signals are acquired and trained in an online mode using pre-trained networks. Three papers[10] [104] [105] have used the publicly available DEAP database for the validation (see [Appendix 2]).


4.5 Clinical Application

Although most of the papers are focused on the methodology of biosignal analysis using deep learning, there are some examples that present a real-world application of such methods. We have selected three of them as examples.

HeartID[30] used a multiresolution CNN for ECG-based biometric identification of humans in smart health applications. First, the ECG stream was blindly split into segments of two seconds disregarding the R-wave positions. Then, the segments were transformed to the Wavelet domain to reveal more detailed time and frequency characteristics in multiple resolutions. An auto-correlation was performed to each wavelet to remove the blind-segmentation-based phase shift. Despite using wavelet-transformed output (image) as input to the CNN approach, a 1-D-CNN was applied to each individual wavelet component to clearly learn the local patterns. Furthermore, each wavelet component was considered as a 1-D-image (feature vector) and was fed as input into the 1-D-CNN to learn the intrinsic pattern of the individuals. To evaluate the system, several publicly available databases were re-sampled to 360 Hz yielding 100 normal and 120 abnormal datasets (arrhythmia, malignant ventricular ectopy, ST depression). The correct identification rate yielded 96.5% and 93.5%, respectively. The system could be generalized to other quasi-periodic biosignals such as PPG, BCG, or a multi-modal combination of them.
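The blind segmentation and the phase-removing role of the auto-correlation can be sketched as follows. This is a simplified illustration, not the HeartID implementation: the wavelet decomposition is omitted and the auto-correlation is applied directly to the raw segment, the signal is a synthetic sine rather than an ECG, and the function names are hypothetical.

```python
import math


def blind_segments(signal, fs, seconds=2.0):
    """Cut the stream into fixed-length windows, ignoring R-wave positions."""
    n = int(round(fs * seconds))
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def autocorrelation(x, max_lag):
    """Normalized (biased) auto-correlation for lags 0..max_lag."""
    if max_lag >= len(x):
        raise ValueError("max_lag must be smaller than the segment length")
    m = sum(x) / len(x)
    c = [v - m for v in x]
    energy = sum(v * v for v in c)
    if energy == 0:
        raise ValueError("constant segment has no defined auto-correlation")
    return [
        sum(c[i] * c[i + lag] for i in range(len(c) - lag)) / energy
        for lag in range(max_lag + 1)
    ]


# Synthetic quasi-periodic stand-in for an ECG: 2 Hz sine at 100 Hz, 6 seconds.
fs = 100
stream = [math.sin(2 * math.pi * 2 * i / fs) for i in range(6 * fs)]

segments = blind_segments(stream, fs)
r_aligned = autocorrelation(segments[0], max_lag=50)

# The same window taken 13 samples later starts at a different phase,
# yet its auto-correlation at the period lag (50 samples) is unchanged.
r_shifted = autocorrelation(stream[13:213], max_lag=50)
print(len(segments), r_aligned[50], r_shifted[50])
```

Because the auto-correlation depends on lag rather than absolute position, segments cut at arbitrary offsets yield comparable inputs for the subsequent 1-D CNN.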

Another remarkable system focused on personalized event prediction to detect the occurrences of arrhythmias or abnormal beats in the ECG signal[19]. Facing the challenge that an ECG of a healthy person without any history of cardiac arrhythmias should exhibit no abnormal beats, the authors developed a set of 464 filters performing regularized least-squares optimization to synthesize individual pathologic patterns out of healthy ECG cycles. For the early detection of cardiac arrhythmias, a personalized training dataset was created synthetically over the subject's average normal beat. A one-dimensional (1-D) CNN dedicated to that specific subject was trained and used to monitor streamed real-time recordings. The system's evaluation was based on the MIT-BIH database[110], where the first 5 minutes of recordings were used to form the average normal beats. Overall, 34 patient records with a total of 63,341 beats were selected for the evaluation. The probability of detecting at least one among the first three abnormal beats was 99.4%, supporting a meaningful clinical application of the system.
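The step of forming a subject's average normal beat can be illustrated with a short sketch. It assumes that R-peak positions of normal beats are already known and that a fixed window around each peak is used; the window sizes, the function name `average_beat`, and the toy signal are hypothetical, and the filter bank that synthesizes pathologic patterns is not reproduced here.

```python
def average_beat(signal, r_peaks, pre, post):
    """Average the beats cut as signal[r - pre : r + post] around each R peak.

    Peaks whose window would extend past the signal boundaries are skipped.
    """
    beats = [
        signal[r - pre:r + post]
        for r in r_peaks
        if r - pre >= 0 and r + post <= len(signal)
    ]
    if not beats:
        raise ValueError("no complete beat window available")
    # Sample-wise mean over all aligned beats.
    return [sum(column) / len(beats) for column in zip(*beats)]


# Toy signal with three "beats" of different amplitude at samples 1, 4, and 7.
toy_signal = [0, 1, 0, 0, 3, 0, 0, 5, 0]
toy_peaks = [1, 4, 7]
template = average_beat(toy_signal, toy_peaks, pre=1, post=2)
print(template)
```

In the surveyed system, such a template would be computed from the first minutes of a recording and then serve as the basis for the synthetic, subject-specific training set.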

Generally, deep learning is considered to be computationally expensive. Pourbabaee et al.,[61] developed a computationally efficient screening system for patients with paroxysmal atrial fibrillation (PAF). The authors used a large volume of ECG time-series data and a deep CNN to identify unique feature patterns for screening patients. The 30-minute raw ECG signals were first divided into six equal segments. Later, these short ECG segments were used as input to the CNN to obtain robust deep features which were then passed through standard classifiers, namely end-to-end CNN, k-nearest neighbors (kNN), support vector machine (SVM), and multilayer perceptron. To evaluate the performance of the system, signals were taken from the PAF prediction challenge database[117]. The authors showed that the CNN-learned features can be classified conventionally, which is computationally inexpensive and efficient. The CNN network did not require any prior domain knowledge for learning the features and, therefore, the approach could potentially be utilized for other biosignals. The network yielded an accuracy of 93.60% and 92.96% using an end-to-end CNN and a Gaussian kernel SVM, respectively.
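The two inexpensive steps of this pipeline, splitting a recording into six equal segments (five minutes each for a 30-minute signal) and classifying feature vectors conventionally, can be sketched as below. The CNN feature extractor itself is not reproduced: the two-dimensional feature vectors, the labels, and the function names are made-up placeholders for illustration only.

```python
import math
from collections import Counter


def split_equal(signal, parts=6):
    """Split a recording into `parts` equal segments; leftover samples are dropped."""
    n = len(signal) // parts
    if n == 0:
        raise ValueError("signal is shorter than the number of segments")
    return [signal[i * n:(i + 1) * n] for i in range(parts)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to `query`."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy recording of 12 samples, cut into six segments of two samples each.
toy_segments = split_equal(list(range(12)), parts=6)

# Placeholder "deep features" (in the surveyed work these come from the CNN).
train_features = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_labels = ["non-PAF", "non-PAF", "PAF", "PAF"]
predicted = knn_predict(train_features, train_labels, query=(5.0, 6.0), k=3)
print(toy_segments, predicted)
```

Once the features are fixed, a classifier of this kind needs no further gradient-based training, which is one reason the conventional route is computationally cheap.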


5 Discussion and Conclusion

A comprehensive literature research was performed to identify works combining biosignals of any nature (except EEG due to its 2-D spatial structure) with deep learning networks. The examined bibliographic databases (PubMed, Scopus, and ACM) provided relevant papers including works published in IEEE or SPIE conference proceedings. We have developed a scheme to differentiate existing research by biosignals, applications, and networks that yields 30 instances from seven independent categories. Such a graph may yield more than 25,000 different codes, but due to internal dependencies (network topology and application goal), that figure reduces to 6,480. As shown in [Table 2], research is diverse, but there are two codes occurring multiple (more than three) times, B(111)A(212)N(213) and B(121)A(112)N(212). The code B(111)A(212)N(213) is found to be used six times on single-lead ECG signals to perform pattern detection. Similarly, B(121)A(112)N(212) is used four times for multi-lead ECG targeting signal enhancement for better clinical decision support.
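For readers who want to process such codes programmatically, the following sketch parses a code string into its three digit triples and checks the dependency stated in the caption of Fig. 3, A(..x) = N(x..). The class and function names are hypothetical, the parser deliberately tolerates stray spaces and letter case, and no meaning beyond their position is assigned to the individual digits.

```python
import re
from typing import NamedTuple, Tuple

# Three groups of three digits: B(...), A(...), N(...), with optional spaces.
CODE_RE = re.compile(
    r"B\((\d)(\d)(\d)\)\s*A\((\d)(\d)(\d)\)\s*N\((\d)(\d)(\d)\)",
    re.IGNORECASE,
)


class TaxonomyCode(NamedTuple):
    biosignal: Tuple[int, int, int]
    application: Tuple[int, int, int]
    network: Tuple[int, int, int]


def parse_code(text):
    """Parse a string such as 'B(111)A(212)N(213)' into a TaxonomyCode."""
    match = CODE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a taxonomy code: {text!r}")
    digits = tuple(int(g) for g in match.groups())
    return TaxonomyCode(digits[0:3], digits[3:6], digits[6:9])


def is_consistent(code):
    """Dependency from Fig. 3: last digit of A equals first digit of N."""
    return code.application[2] == code.network[0]


first = parse_code("B(111)A(212)N(213)")
second = parse_code("B(121)A(112)N(212)")
print(first, is_consistent(first))
print(second, is_consistent(second))
```

Counting parsed codes, for example with `collections.Counter`, is then a direct way to find frequently occurring combinations.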

The overall accuracy obtained by methods using raw ECG signals is higher than those with features as input. This may be due to the ability of deep learning to capture features which optimally represent the biosignal specific to the problem. The overall performance of the papers that utilized the authors’ own signal acquisition protocol was lower than those experimenting with publicly available databases. This may be due to the presence of noise and artifacts in the recordings. To date, most work has been performed on relatively small datasets. Five papers used 3,126 recordings from the 2016 Physionet Challenge database for heart sound analysis, and one paper mentioned utilizing 1,075 records[92]. This finding is in line with Deserno and Marx[7], who reported the need for more realistic reference databases.

Most of the papers (n = 30) use features which are pre-extracted from the biosignals as input to the deep learning models. This may result from the assumption that the performance of machine learning is improved when using pre-computed features as compared to noisy raw data since a feature vector is much smaller than the signal itself and hence this reduces the required volume of training data. However, the basic idea of deep learning is to automatically identify patterns from large volumes of biosignals without human involvement. On the other hand, only 11 papers (less than 15%) applied a 2-D transform to take advantage of the CNN matrix-like architecture. This figure was much lower than anticipated, but may be the result of the fact that 1-D CNN performs well.

In conclusion, after having shown their impact on image and video analysis, deep learning approaches have been successfully applied to the analysis of 1-D biosignals. The number of published works is increasing continuously, and the results are promising. However, there is a large variety of signals, applications, and reference data, which makes an objective comparison of the published approaches difficult.

In future, standardization of network topology and parameters is expected. Reference data recorded with novel devices is required, which represents not only pathology but also normal recordings from healthy subjects, as well as outliers, drop out sequences, and noise.


Fig. 1 Deep learning methods (RBM = restricted Boltzmann machine, CNN = convolutional neural network, RNN = recurrent neural network).
Fig. 2 Paper selection process.
Fig. 3 Classification of the parameters used for the selection of deep learning models. The dependencies are color coded. Note that A(..x) = N(x..) for all x in {1,2}.