What future works have the authors mentioned in the paper "A hybrid model combining neural networks and decision tree for comprehension detection"?

The authors propose that future work should first explore several options to simplify the decision trees and their representation. Second, it should investigate the potential to reduce the number of input channels: through empirical experiment, by identifying the lowest-contributing channels through calculating their information content, and by grouping channels.

How do the authors reduce the rule sets extracted from the more efficient trees?

by using fuzzy rule extraction or random forest techniques to reduce the rule sets extracted from the more efficient trees to a more tractable size.

What is the way to clean up the data?

Pre-processing the data to cleanse it, particularly by removing outliers, noise and conflicting records, all of which might be better handled by the BPANN than by DTs.

(Open Access) A hybrid model combining neural networks and decision tree for comprehension detection. (2018) | James D. O’Shea

Q: What was the initial range used for the pruning experiments?

The initial ranges used for the experiments were, for CI: 0.25, 0.2, 0.15, 0.1, 0.05, and for MNO: 2, 5, 10, 15, 20.

Crockett, KA (ORCID: https://orcid.org/0000-0003-1941-6201), O’Shea, James (ORCID: https://orcid.org/0000-0001-5645-2370), Khan, Wasiq (ORCID: https://orcid.org/0000-0002-7511-3873) and Bandar, Zuhair (2018) A hybrid model combining neural networks and decision tree for comprehension detection. In: 2018 International Joint Conference on Neural Networks (IJCNN), 08 July 2018 - 13 July 2018, Rio de Janeiro, Brazil.

Downloaded from:

https://e-space.mmu.ac.uk/624526/

Publisher: IEEE

DOI: https://doi.org/10.1109/IJCNN.2018.8489621

Please cite the published version

https://e-space.mmu.ac.uk

A hybrid model combining neural networks and

decision tree for comprehension detection.

James O’Shea, Keeley Crockett, Wasiq Khan, Zuhair Bandar

School of Computing, Mathematics and Digital Technology

Manchester Metropolitan University

Chester Street, Manchester, M1 5GD, UK

Silent Talker Ltd, Manchester, UK

J.D.OShea@mmu.ac.uk

Abstract— The Artificial Neural Network is generally

considered to be an effective classifier, but also a “Black Box”

component whose internal behavior cannot be understood by

human users. This lack of transparency forms a barrier to

acceptance in high-stakes applications by the general public. This

paper investigates the use of a hybrid model comprising multiple

artificial neural networks with a final C4.5 decision tree classifier

to investigate the potential of explaining the classification

decision through production rules. Two large datasets collected

from comprehension studies are used to investigate the value of

the C4.5 decision tree as the overall comprehension classifier in

terms of accuracy and decision transparency. Empirical trials

show that higher accuracies are achieved through using a

decision tree classifier, but the significant tree size questions the

rule transparency to a human.

Keywords—knowledge rule extraction, artificial neural

networks, decision trees, backpropagation, comprehension,

FATHOM, Silent Talker, non-verbal behavior.

I. INTRODUCTION

Non-Verbal Behaviour (NVB) was first studied systematically

by Charles Darwin and it has become a well-established part of

sciences such as biology and psychology. NVB consists of all

of the signs and signals - visual, audio, tactile and chemical -

used by human beings to express themselves apart from speech

and manual sign language. It has been postulated that NVB

features are indicators of internal mental states, in particular

that they can be used to detect deception during interviews [1].

The first system to classify deceptive behaviour automatically,

Silent Talker (ST), used Artificial Neural Networks [2].

The Silent Talker architecture is highly flexible, and has been

adapted to monitor human comprehension in clinical trials

using non-verbal behaviour, employing ANN classifiers [3].

This version is known as FATHOM and currently work is

underway to incorporate FATHOM in an intelligent tutoring

system to provide round-the-clock support in the form of

learner-adaptive online teaching and learning tutorials. For

both FATHOM and ST there has been great interest in how the

system works i.e. which non-verbal indicators are actually

conveying the information to perform the classification. This is

particularly true for Lie Detection where interrogators are

looking for techniques they can apply during questioning and

suspects are looking for countermeasures they can use to avoid

being detected, for example, the well-known myth that looking

up and to the right indicates lying. Unfortunately, although

ANNs are powerful and versatile components in the AI

toolbox, they are also black boxes with no ready explanations

of how they achieve their ends and this has been a concern for

decades [4].

There are many other fields than education in which ANNs

may make high-stakes decisions and some progress has been

made in extracting rules from ANNs, although the degree to

which solutions to reasonably complex problems could be

understood by a non-AI specialist remains debatable. These

include classifying incipient faults in a power transformer [5],

hydrological modelling [6], Credit-Risk Evaluation [7] and

software cost estimation [8]. Some progress has been made in

extracting rules from recurrent neural networks by

transforming them to finite state machines [9], and [10] has

attempted to unify various neuro-fuzzy rule approaches for

rule generation from recurrent and feedforward neural

networks in a single soft computing framework. Nevertheless,

in analysing a study of using neural networks to predict

academic performance of college students one year in advance,

Schneider et al. [11] observed that the basic problem of

communicating how they reach their conclusions in

meaningful terms has yet to be solved. They highlighted the

problem of explaining how a combination of currently high

subject performances could lead to an anticipated decrease in

the student’s achievement.

Decision trees [12] are highly effective for classification

tasks. They are also considered inherently transparent in

explaining how they reach their conclusions and may be

expressed in the form of production rules, which are generated

by learning and reasoning from feature-based examples. Many

studies have been conducted to compare decision trees with

neural networks – a more recent study of multiple classifiers

can be found in Delgado et al. [13]. In general, ANNs take

longer to train than decision trees due to the large number of

iterations required to ensure training reaches its full potential

[14]. Classification accuracy is largely dependent on the

dataset, but the transparent nature of decision trees gives

insight into the relationships between features [14]. In such a

domain as the analysis of NVB for comprehension detection,

decision trees would provide an insight into key behaviours

and their interactions.

In the FATHOM architecture to date, classification of

comprehension / non-comprehension has been performed by a

single, final back propagation artificial neural network

(BPANN), preceded by layers of BPANNs that process

individual features. It is this final stage in which an

intervention should be possible to explain how these features

indicate comprehension / non-comprehension. Therefore, the

research questions addressed in the work presented in this

paper are:

1. Can the final ANN classifier be replaced by a decision tree

without loss of performance?

2. Can the decision tree be converted into comprehensible

production rules?

For a comprehensible rule set to be possible, there must be a

limited number of rules for the human user to interpret and

these are proportional to the number of nodes in the tree.

Consequently, the primary interest in answering question 2 is

whether or not the tree has a manageable number of nodes.
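
The link between tree size and rule count can be illustrated with a minimal sketch that walks a small tree and emits one production rule per leaf. The tree, the channel names and the thresholds below are hypothetical and are not taken from the FATHOM data; the sketch only shows why the number of rules grows with the number of nodes.

```python
# Hypothetical toy tree: each internal node tests one channel against a threshold.
# Channel names and thresholds are illustrative only, not values from the paper.
tree = {
    "channel": "left_eye_half_open",
    "threshold": 0.2,
    "le": {"leaf": "non-comprehension"},
    "gt": {
        "channel": "head_movement",
        "threshold": -0.1,
        "le": {"leaf": "comprehension"},
        "gt": {"leaf": "non-comprehension"},
    },
}


def extract_rules(node, conditions=()):
    """Return one IF-THEN production rule per leaf of the tree."""
    if "leaf" in node:
        antecedent = " AND ".join(conditions) or "TRUE"
        return [f"IF {antecedent} THEN {node['leaf']}"]
    test_le = f"{node['channel']} <= {node['threshold']}"
    test_gt = f"{node['channel']} > {node['threshold']}"
    return (extract_rules(node["le"], conditions + (test_le,))
            + extract_rules(node["gt"], conditions + (test_gt,)))


rules = extract_rules(tree)
for rule in rules:
    print(rule)
```

A tree with three leaves yields three rules; a tree with thousands of leaves yields thousands of rules, which is the transparency concern raised by question 2.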

To answer these research questions, two datasets collected

from FATHOM studies have been used. The experimental

study known as “Termites”, reported in [15] was used to

identify whether multiple channels of non-verbal behaviour associated with high and low human comprehension are present in a video-recorded British (UK-based, English-speaking) sample of participants in a classroom environment. The

Termites exploratory study builds upon lessons learned in prior

work [3] where evidence was found that comprehension / non-

comprehension could be detected in an African female

population sample using a BPANN. This second study is

known as HIV Informed Consent.

This paper continues as follows: Section II reviews related

work in non-verbal behaviour and comprehension, and then

describes the FATHOM comprehension monitoring system

that uses BPANNs. Section III describes the comprehension

scenarios from which the two datasets used in this study were

obtained. Sections IV and V describe the experimental

methodology and results. Conclusions and recommendations

for future work are presented in section VI.

II. RELATED WORK

A. Non-verbal Behaviour and Comprehension

Non-verbal behaviour comprises all of the signals or cues,

which human beings use to communicate, including visual,

audio, tactile and chemical components [16, 17]. During a

spoken dialogue, humans will often transmit non-verbal cues

before the verbal component [17], which can be used to detect

the sender’s state. It has been recognised that the face is a

source of rich information in terms of exhibiting meaningful

non-verbal behaviour. Little work has been done in the

automatic detection or classification of non-verbal behaviour.

Traditional methods employed human judges to code each

channel [18, 19]. However, each judge needs to be trained and

will provide a subjective opinion on the behaviour being

delivered by a particular channel. The process is time-consuming, and it is impossible for a human to monitor more than a limited number of channels accurately.

Two recent research strategies for acquiring non-verbal

behavioural cues have attracted attention in the literature; these

are Facial Microexpressions [1] and using the Microsoft Kinect

computer vision algorithm [20]. Micro-expressions are said to

be a small “universal” set of expressions of extreme emotion:

disgust, anger, fear, sadness, happiness, surprise, and contempt,

and a formalised method of encoding them was defined by

Ekman. The weaknesses of this technique are: its results are

largely based on highly artificial “posed” images using actors

or students provided with highly specific instructions [21,22]

or even training in how to produce facial actions [23,24], low

numbers of detectable Ekman micro-expressions in

spontaneous interviews [25] and a low Classification

Accuracy (CA) for those micro-expressions actually found

[26].

The Microsoft Kinect is primarily aimed at observing

whole body gestures in commercial video game applications.

However, there has been some interest in adapting it for NVB

research. For example, facial expressions have been

investigated as indicators of happiness, anger, sadness and

surprise that are integrated with the head pose changing

information to conceive the human interaction with 3D sensing

technology [27]. It should be noted, however, that the experimental results show emotional and head position change rather than discrete classification accuracy for the four aforementioned emotions. Likewise, it was tested on a limited number of participants (20) and an insufficient number of facial channels (12). Typically, psychological

experiments do not provide any methodology for applying

these population sample differences to classify particular

individuals.

FATHOM (described in Section II, C) is distinguished

from these two techniques in three respects. Firstly, it uses

large numbers of features at a much finer level of granularity

than body gestures or facial expressions. Secondly, the domain

it operates in, human comprehension, has not been a previous subject of AI research. Thirdly, it does classify an individual person’s state of comprehension / non-comprehension based on the non-verbal behaviour. FATHOM does not rely on high frame-rate cameras or constrained recording environments that

facilitate the setup of the technology, nor does it depend on

specialised hardware whose future availability may be

dependent on market forces (such as the game-oriented Kinect)

– making it suited for everyday classroom use.

B. Non-Comprehension

Non-comprehension is regarded as “a state of knowledge that

ranges from uncertainty to complete lack of understanding of

the materials under discussion” [28], i.e. an absence of

comprehension. The vast majority of research on

comprehension concerns reading and the understanding of

written text, initially by identifying the main ideas in the text

[28, 29]. A further elaboration is the view that successful

comprehension depends on the construction of a coherent

representation of text in memory [30]. Despite the traditional

bias towards reading texts, there has been interest for some

time in comprehending audio and video materials in language

teaching [31] and the informed consent process [32]. At a

more abstract level, the comprehension of metaphors, requires

thinking beyond the literal meaning in order to understand the

figurative meaning of the sentence [33] – yet metaphors and

similes are frequently used by good teachers to convey

complex ideas. In the completely independent field of

advertising, a controlled degree of cognitive complexity is

considered desirable, where confronting an audience with a

cognitive challenge generates an appreciative payoff if they

can solve the challenge [34]. So the non-comprehension state

may be characterized as an inability to extract and characterize

the salient elements of information received, an inability to

model such information in a more abstract form or an inability

to generalize from a specific meaning to more abstract

thoughts about such a communication.

C. FATHOM

FATHOM utilises a bank of BPANNs to capture, monitor

and detect multiple channels of human non-verbal behaviour

continuously. FATHOM has been successfully shown to detect

non-verbal behaviour associated with comprehension in two

studies [3] [15].

Input to FATHOM is currently offline through recorded

videos, which are streamed into FATHOM where a series of

BPANN facial object locators identify the location in a video frame of key visual features such as the eyes. For each non-verbal behavioural feature identified from a specific visual feature, the BPANN facial object pattern detectors identify its state, e.g. the left eye is half-open. The NVBs identified are then coded into individual channels and group channels, e.g. all channels associated with eye behaviour.

States are typically collated over a time interval, e.g. 3 seconds grouped into one vector for Silent Talker, but this can be varied depending on the problem domain; FATHOM uses a 1-second interval. Classification features (patterns) are

extracted from aggregated video-streamed frames over the time

interval and compiled to form a vector. Each vector is passed

to the final BPANN Comprehension classifier which outputs a

value between +1 and -1, indicating whether the person

exhibits high comprehension (+1) or low comprehension (-1)

during that period of time. If there were insufficient

information in the vector during a specific time slot, FATHOM

would recognise this and categorise the timeslot as

unclassifiable. At the end of a session, e.g. a tutorial, the overall

comprehension/non-comprehension classification level is

displayed.
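
The per-timeslot decision described above can be sketched as follows. The 0.0 decision threshold matches the accept rule given with the training parameters in Section IV; the use of `None` for an unclassifiable timeslot and the session-level summary (share of classifiable timeslots labelled as comprehension) are assumptions made for illustration, since the text does not state how the overall level is computed.

```python
def classify_timeslot(output):
    """Map one classifier output in [-1, +1] to a label.

    None is a hypothetical encoding of a timeslot whose vector held
    insufficient information.
    """
    if output is None:
        return "unclassifiable"
    return "comprehension" if output >= 0.0 else "non-comprehension"


def summarise_session(outputs):
    """Assumed aggregation: share of classifiable slots labelled comprehension."""
    labels = [classify_timeslot(o) for o in outputs]
    usable = [label for label in labels if label != "unclassifiable"]
    if not usable:
        return None  # nothing in the session could be classified
    return usable.count("comprehension") / len(usable)


# Made-up outputs for five one-second timeslots.
session = [0.8, -0.3, None, 0.1, -0.9]
share = summarise_session(session)
print(share)
```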

FATHOM simultaneously monitors 40 non-verbal

behavioural channels that include 20 channels capturing facial

features such as blushing and 16 channels capturing eye

movement such as right eye looking left. An overview of the

FATHOM architecture can be seen in Figure 1.

The work presented in this paper investigates the

consequences of replacing the BPANN comprehension

classifier in the FATHOM system by a C4.5 decision tree [12],

to answer questions about their relative performance and

transparency.

Fig. 1. FATHOM Architecture

III. COMPREHENSION SCENARIOS

This section outlines the two comprehension scenarios

used to collect the data.

A. Study 1: HIV Informed Consent

The first comprehension study was undertaken in Tanzania in

Africa by FHI-360 [35] in collaboration with the National

Institute for Medical Research (NIMR) [36]. NIMR enlisted

sexually active women aged 18-35, who were native Kiswahili-

speakers. 292 participants took part in the study. Two different

experimental conditions (tasks) were used for data collection:

condition A was designed to be familiar and easy-to-

comprehend (condom use) and condition B was designed to be

unfamiliar and intentionally hard-to-comprehend (the effects of

HIV viral mutation on antiretroviral treatment). Each

participant listened to a short learning task script and then

received the associated ten closed and open-ended questions

with randomisation applied. Task order was also randomised so that half of the participants completed task A followed by task B, and the other half completed task B followed by task A.

B. Study 2: Termites

Prior to the study a short learning topic was selected, which

was a factual digital video on Termites with a total duration of

8 minutes 40 seconds. The Termite video was targeted at the

general public with no age restriction and covered: functional

architectural aspects of the termite mounds, roles within the

social structure of a termite colony and locations where termite

colonies thrive. Two experts (Academic Professors in the field)

on the subject area were recruited to develop ten difficult

(hard) questions and ten easy questions related to the video

content. The experts agreed both the question difficulty levels

and the contents of the answer that the participants should

provide. The experts were required to devise five open questions and five closed questions within each set of hard and easy

questions. At the same time, the experts noted down the correct

answer(s) for each question, which were later incorporated into

a scoring scheme.

Forty participants were selected to participate in the study,

from academic and technical staff at the Manchester

Metropolitan University (MMU) in the UK. The sample was

composed of 20 males and 20 females. The males had a mean

age of 41 years old (SD = 14 years) and the females had a

mean age of 39 years old (SD = 14 years). Each participant was

invited to engage individually in a short learning task, which

was comprised of watching a short video on Termites and then

answering a small set of associated assessment questions whilst

being video recorded.

IV. EXPERIMENTAL METHODOLOGY

The experimental methodology was to take the pair of

datasets outlined in Section III and use them to train and

evaluate C4.5 decision trees to replace the final stage BPANN

classifier.

A. Using a Back Propagation ANN as the final classifier

For each study, FATHOM’s object locators and pattern

detectors were used to extract and collate the non-verbal

vector-based dataset for the purpose of training the final

BPANN classifier. For both studies, HIV Informed Consent

and Termites, each vector in the final dataset covered a 1-

second time period and represented the state changes for the

compiled non-verbal channels over the period. Each channel

was normalised in the range +1 to -1. The last attribute in each

vector was the desired classification, with discrete values of +1

for comprehension and -1 for non-comprehension. The

following training parameters (determined from previous

exploratory cross-validation sessions) were used to train the

single hidden layer neural network in the FATHOM training

application:

• Topology: 40:20:1
• Accept value: 1.0 (output >= 0.0 equals comprehension AND output < 0.0 equals non-comprehension)
• Maximum epochs: 10,000
• Checking epochs: 250, i.e. at every 250th epoch the total classification accuracy (CA) was checked and, if there was no improvement, training was terminated.
• Learning rate (η): 0.005
• Weight initialisation: automatic range (0 ± 1/sqrt(fan-in)), where fan-in represents the number of inputs entering the neuron.
• Cross-validation: 10 folds
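
The topology, weight initialisation range and decision rule listed above can be sketched in a few lines. This is a minimal illustration and not the FATHOM training application: the tanh activation, the bias handling and the random input vector are assumptions, chosen only because the classifier output is described as lying between +1 and -1.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def init_layer(fan_in, n_neurons):
    """Draw weights (plus one bias per neuron) uniformly from 0 +/- 1/sqrt(fan_in)."""
    limit = 1.0 / math.sqrt(fan_in)
    return [[random.uniform(-limit, limit) for _ in range(fan_in + 1)]
            for _ in range(n_neurons)]


def forward(layer, inputs):
    """tanh units (an assumption); the last weight of each neuron is its bias."""
    return [math.tanh(sum(w * x for w, x in zip(neuron, inputs)) + neuron[-1])
            for neuron in layer]


hidden = init_layer(40, 20)  # 40 input channels -> 20 hidden units
output = init_layer(20, 1)   # 20 hidden units -> 1 output unit

vector = [random.uniform(-1.0, 1.0) for _ in range(40)]  # made-up input vector
y = forward(output, forward(hidden, vector))[0]

# Decision rule from the parameter list: output >= 0.0 means comprehension.
label = "comprehension" if y >= 0.0 else "non-comprehension"
print(round(y, 4), label)
```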

For study 1, eighty randomly selected participant videos (from the 292 obtained in the study) comprised the HIV

Informed Consent dataset containing 71,787 vectors with

63.5% comprehension and 36.5% non-comprehension. For

study 2, the forty participant videos yielded 16,951

comprehension vectors and 23,857 non-comprehension

vectors. The study 2 Termites dataset was composed of 40,808

vectors with 41.5% in the comprehension class.
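
As a quick check of the Termites figures, the dataset size and class share follow directly from the two vector counts quoted above (the variable names are illustrative):

```python
comprehension = 16_951      # comprehension vectors, study 2
non_comprehension = 23_857  # non-comprehension vectors, study 2

total = comprehension + non_comprehension
share = 100 * comprehension / total

print(total, f"{share:.1f}%")
```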

B. Using a C4.5 Decision Tree as the final classifier

In order to use a decision tree as a comprehension classifier, the final BPANN comprehension classifier shown in Figure 1 was replaced by the C4.5 decision tree algorithm.

The experiment consisted of a series of trials using different degrees of pruning to find the optimal C4.5 decision trees and determine the extent to which they could be pruned. The Weka implementation of C4.5, known as J48, was used. The procedure was as follows:

1) A baseline decision tree was established for each scenario by setting the pruning parameters to Confidence Interval (CI) = 0.25 and minimum number of objects = 2 cases per leaf (i.e. the default settings).

2) The minimum number of objects (MNO) was then fixed at 2 and a series of trials was conducted over a range of confidence interval values to determine which provides the greatest improvement in CA over the baseline tree performance.

3) Then the complementary process was performed, fixing the CI at 0.25 and conducting a series of trials over a range of values of MNO.

4) Finally, further experiments were performed varying confidence interval and MNO independently, to find the most severely pruned tree for each dataset which was not significantly worse than the baseline in terms of CA.

The initial ranges used for the experiments were, for CI: 0.25,

0.2, 0.15, 0.1, 0.05, and for MNO: 2, 5, 10, 15, 20.
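
The baseline and the two one-parameter sweeps can be enumerated as in the sketch below. It assumes that the paper's CI corresponds to J48's `-C` option (the pruning confidence) and MNO to J48's `-M` option (minimum number of instances per leaf); the further trials that vary both parameters together are not enumerated because their values are not given here.

```python
CI_VALUES = [0.25, 0.2, 0.15, 0.1, 0.05]
MNO_VALUES = [2, 5, 10, 15, 20]
BASELINE = (0.25, 2)  # (CI, MNO): the default J48 settings


def j48_options(ci, mno):
    """Build J48 command-line flags: -C pruning confidence, -M minimum objects per leaf."""
    return f"-C {ci} -M {mno}"


vary_ci = [(ci, BASELINE[1]) for ci in CI_VALUES]      # MNO fixed at 2
vary_mno = [(BASELINE[0], mno) for mno in MNO_VALUES]  # CI fixed at 0.25

# Baseline first, then every setting not already covered by it.
trials = [BASELINE] + [t for t in vary_ci + vary_mno if t != BASELINE]

for ci, mno in trials:
    print("weka.classifiers.trees.J48", j48_options(ci, mno))
```

Each printed line is an option string that could be passed to a Weka run on the chosen dataset; the dataset file and the evaluation settings are left out because they are not specified in this section.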

V. RESULTS

A. BPANN Comprehension Classifier

Table I shows the overall best BPANN Classifiers for both

studies. Comprehension (C%) and Non-comprehension (NC%)

are the percentages of comprehension and non-comprehension

vectors, respectively, which were classified correctly. Overall CA is the total normalised percentage of comprehension and

non-comprehension vectors classified correctly.

TABLE I: BPANN RESULTS

Study                            C%      NC%     Overall CA
Study 1: HIV Informed Consent    88.08   86.79   87.44
Study 2: Termites                72.77   84.09   78.43
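
Reading the first two values in each row of Table I as C% and NC%, the reported overall figures coincide with the unweighted mean of the two per-class accuracies. The sketch below assumes that this is what "normalised" means; the text does not define the calculation explicitly.

```python
# (C%, NC%) pairs as read from Table I.
results = {
    "Study 1: HIV Informed Consent": (88.08, 86.79),
    "Study 2: Termites": (72.77, 84.09),
}

# Assumed definition: overall CA is the unweighted mean of C% and NC%.
overall = {study: (c + nc) / 2 for study, (c, nc) in results.items()}

for study, ca in overall.items():
    print(f"{study}: {ca:.3f}")
```

Averaging the two class accuracies, rather than pooling all vectors, prevents the majority class from dominating the overall figure when the classes are unbalanced.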

B. C4.5 Comprehension Classifier

Table II shows the results of varying the Confidence Interval

used for pruning in decision tree construction (Pruning CI).
