
Analysis of a large fMRI cohort: Statistical and methodological issues for group analyses.

TL;DR: The study shows that inter-subject variability plays a prominent role in the relatively low sensitivity and reliability of group studies and focuses on the notion of reproducibility by bootstrapping.
About: This article is published in NeuroImage. The article was published on 2007-03-01 and is currently open access. It has received 541 citations to date. The article focuses on the topics: Sample size determination & Random effects model.

Summary (10 min read)


1.1 Motivation

  • Traditional software effort estimation techniques rely on analytic equations, statistical data fitting, expert judgment or some combination of the three.
  • They are still notoriously inaccurate.
  • There are two bases that make the approach taken in this dissertation feasible and practical.
  • Secondly, a key characteristic of the object-oriented paradigm is the continual realization and refinement of the same system artifacts/objects at each phase of development or within each development iteration (depending on the chosen project life cycle).
  • Data can therefore be gathered unobtrusively from the CASE tools in the stages of development preceding implementation and used to predict the effort required to further realize, refine, and develop these and other system artifacts, regardless of the level of realization or refinement of the existing artifacts.

1.2 SysML

  • This dissertation proposes an effort prediction model – the SysML Point Model - for object-oriented development systems that is based on a common, structured and comprehensive modeling language (OMG SysML), which can be built using the CASE tools from which data can be unobtrusively gathered and applied to prediction equations.
  • OMG SysML [98] is a specification that defines a general-purpose modeling language for systems engineering applications.
  • The Block Definition Diagram in SysML defines features of a block and relationships between blocks such as associations, generalizations, and dependencies.
  • The state machine represents behavior as the state history of an object in terms of its transitions and states.
  • Use case diagrams include the use case and actors and the associated communications between them.

1.3 Research Objectives

  • The focus of this dissertation is to define and validate the Pattern Points (PP) method of the SysML Point approach.
  • Object-oriented analysis (OOA) is concerned with the transformation of software engineering requirements and specifications into a system's object model, which is composed of a population of interacting objects (rather than the functional views or traditional data of systems) [108].
  • The Pattern Points (PP) model is an empirical parametric estimation method that uses object interactions and the class structure of object-oriented design patterns to predict development effort in the late analysis phase of an object-oriented project.
  • In software engineering, a design pattern is a common reusable solution to a frequently occurring problem in software design.
  • The Use Case Point model, which is based on use case counts called use case points, is defined in Carol et al [9].

1.4 Organization of the Dissertation

  • Following this introductory section, this dissertation is presented in six additional sections.
  • Section 2 reviews the literature on the subject of software project management and effort estimation.
  • Design patterns and the Pattern Point model are described in Section 3.
  • Section 5 explains the project experiment results used to empirically test the research model.

2.1 Software Project Management

  • Software project management is a major endeavor that helps to realize a successful software project.
  • Planning is central to software project management; it involves the identification of the activities, milestones, and deliverables produced by a project [39].
  • Estimates for the software project’s effort and cost are derived according to a documented procedure.
  • If a project is behind schedule, the manager can increase resources or decrease features.
  • Process is significant because it lets people efficiently build products by imposing a structure on the progression of the project.

2.2 Effort Estimation

  • Even though the difficulties of software cost estimation were discussed 30 years ago in “The Mythical Man Month” [42], it is as much a relevant area of research now as it was then.
  • Effort estimation is critical for the following reasons [107]: exploring the practicality of developing or purchasing a new system; determining a price or schedule for a new system.
  • Planning how to staff a software development project.
  • Understanding the impact of changing the functions of an existing system.
  • In spite of its importance, software cost estimates are more often than not imprecise, and there is no indication that the software engineering community is making significant gains in making better predictions.

2.2.1 Empirical parametric models

  • The most prevalent estimation models are empirical parametric models.
  • An alternative empirical parametric methodology is to calibrate a model by estimating values for the parameters (a and b in the case of (2.1)).
  • COCOMO was first published in 1981 by Barry J. Boehm [43] as a model for estimating effort, cost, and schedule for software projects.
  • The amount of effort required to produce a software product, the defects remaining in a software product, and the time required to create a software product are all estimated using Volume attributes.
  • Even though function points are a popular measure, they too have drawbacks.
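
Equation (2.1) itself is not reproduced in this summary; a common form for such a parametric model is Effort = a * Size^b (the form used, for example, in basic COCOMO). The sketch below, with hypothetical project data, shows how the parameters a and b mentioned above could be calibrated by a least-squares fit in log space; it is an illustration, not the dissertation's calibration procedure.

```python
import numpy as np

# Hypothetical historical data: size in KLOC, effort in person-months.
size = np.array([10.0, 23.0, 46.0, 70.0, 120.0])
effort = np.array([24.0, 62.0, 130.0, 210.0, 390.0])

# Effort = a * Size**b  =>  log(Effort) = log(a) + b * log(Size),
# so a and b can be recovered with a linear least-squares fit in log space.
b, log_a = np.polyfit(np.log(size), np.log(effort), 1)
a = np.exp(log_a)

predicted = a * size ** b
print(f"a = {a:.2f}, b = {b:.2f}")
print("relative errors:", np.round(np.abs(predicted - effort) / effort, 3))
```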

2.2.2 Empirical nonparametric models

  • Nonparametric models typically involve the use of artificial intelligence techniques in producing an effort estimate.
  • In the comparison, the OSR methodology produced a lower mean absolute relative error than both parametric models, with the COCOMO model performing least favorably.
  • Even though they may provide better effort estimates, empirical nonparametric methods such as a neural network are hard to set up and they typically require more work than preparing a statistical regression model [91].
  • There have been several attempts to use regression and decision trees to estimate aspects of software engineering.
  • A single organization can provide a large enough data set but it is hard to believe that all the projects would come from the same environment.
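
As a toy illustration of the nonparametric, tree-based estimators mentioned above (not an implementation from the dissertation), a regression tree can be fitted to a handful of project attributes; the feature names and values below are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical project data: [size in KLOC, team size, reuse fraction].
X = np.array([[10, 3, 0.2], [25, 5, 0.1], [40, 6, 0.5],
              [60, 8, 0.3], [90, 10, 0.4], [120, 12, 0.2]])
effort = np.array([30, 75, 90, 160, 230, 340])  # person-months

# A shallow tree keeps the model interpretable on such a small data set.
model = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, effort)
print(model.predict([[50, 7, 0.3]]))  # effort estimate for a new project
```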

2.2.3 Analogical models

  • Effort estimation by analogy (EBA) is an established method for software effort prediction.
  • In EBA, the estimated effort of the project under consideration (target project) is a function of the known effort values from analogous historical projects.
  • The data set used to develop ESTOR is a subset of 10 projects from the Kemerer [74] data set.
  • It avoids the problems associated with knowledge elicitation as well as with extracting and codifying knowledge.
  • Analogy-based systems only need deal with those problems that actually occur in practice, while generative (i.e., algorithmic) systems must handle all possible problems.
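
A minimal sketch of the analogy idea (not the ESTOR system itself): normalize the project features, find the k historical projects closest to the target project, and derive the estimate from their known efforts. The feature set and data below are hypothetical.

```python
import numpy as np

def estimate_by_analogy(target, historical_features, historical_effort, k=3):
    """Return the mean effort of the k historical projects nearest to `target`."""
    X = np.asarray(historical_features, dtype=float)
    # Min-max normalization so that no single feature dominates the distance.
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    Xn = (X - lo) / span
    tn = (np.asarray(target, dtype=float) - lo) / span
    dist = np.linalg.norm(Xn - tn, axis=1)        # distance to each analogue
    nearest = np.argsort(dist)[:k]
    return np.asarray(historical_effort, dtype=float)[nearest].mean()

# Hypothetical features: [function points, team experience (1-5), # interfaces].
features = [[120, 3, 4], [300, 4, 10], [180, 2, 6], [450, 5, 14], [90, 3, 2]]
effort = [14, 40, 26, 55, 9]                      # person-months
print(estimate_by_analogy([200, 3, 7], features, effort, k=2))
```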

2.2.4 Theoretical models

  • In comparison to the algorithmic (parametric and non-parametric) and analogical models, there is less research on the development of theoretical models for software effort estimation.
  • Wang and Yuan [113] have developed a ‘coherent’ theory on the nature of collaborative work and their mathematical models in software engineering.
  • The FEMSEC model provides a theoretical foundation for software engineering decision optimizations on the optimal labor allocation, the shortest duration determination, and the lowest workload/effort and costs estimation.
  • The model works from an initial estimate for overall effort and then explores how the actual effort is influenced by the model’s assumptions about the interactions and feedback between project and decisions.
  • Simulations of project management scenarios can be run to investigate the effects of management policies and decisions.

2.2.5 Heuristic models

  • Heuristics are rules of thumb, developed through experience, that capture knowledge about relationships between attributes of the empirical model.
  • Since initial software cost estimates are made based on preliminary data, re-estimating is desirable when additional information becomes available.
  • The process of re-estimation is made more complicated by such issues, but in order to successfully estimate the total effort or time to complete, effort estimation models need to incorporate these measures.
  • Models that are more difficult to develop and apply are typically based on a large number of variables, such as that of Abdel-Hamid and Madnick [89].
  • In the following sections the authors explore Function Point analysis and the variations to the Function Point method that were designed to suit the object-oriented development model.

2.3 Function Point Analysis

  • The Function Point method was introduced in 1979 by Albrecht [30] to measure the size of a data-processing system from the end-user’s point of view.
  • The first step is the identification of all functions - each function is classified as belonging to one of the following function types: external input (EI), external output (EO), external inquiry (EQ), internal logical file (ILF), and external interface file (EIF).
  • Each function is then weighted based on its type and on the level of its complexity, in agreement with standard values as specified in the Counting Practices Manual.
  • As an example, for transactions (EI, EO, and EQ), the rating is based on the number of Data Element Types (DETs) and File Types Referenced (FTRs).
  • The FP measure has been used by application developers to estimate productivity, in terms of Function Points per person-month, and quality, in terms of the number of defects per Function Point with respect to requirements, design, coding, and user documentation phases.
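
For concreteness, the unadjusted FP count is a weighted sum of the identified functions by type and complexity, adjusted by a factor derived from the 14 general system characteristics. The sketch below uses the commonly published IFPUG weights and the standard 0.65 + 0.01*TDI adjustment; the counts and ratings are hypothetical, and the constants should be checked against the Counting Practices Manual rather than taken from this sketch.

```python
# Unadjusted Function Points as a weighted count of the five function types.
# Weights are the commonly published IFPUG values (low/average/high); the
# counts below are hypothetical and purely illustrative.
WEIGHTS = {
    "EI":  {"low": 3, "avg": 4, "high": 6},
    "EO":  {"low": 4, "avg": 5, "high": 7},
    "EQ":  {"low": 3, "avg": 4, "high": 6},
    "ILF": {"low": 7, "avg": 10, "high": 15},
    "EIF": {"low": 5, "avg": 7, "high": 10},
}

counts = {  # (function type, complexity level) -> number of functions identified
    ("EI", "avg"): 6, ("EO", "high"): 3, ("EQ", "low"): 4,
    ("ILF", "avg"): 5, ("EIF", "low"): 2,
}

ufp = sum(WEIGHTS[ftype][level] * n for (ftype, level), n in counts.items())

# Value Adjustment Factor from the 14 general system characteristics,
# each rated 0-5 (the ratings below are made up).
ratings = [3, 2, 4, 3, 1, 0, 2, 3, 4, 2, 1, 3, 2, 1]
vaf = 0.65 + 0.01 * sum(ratings)

print(f"UFP = {ufp}, adjusted FP = {ufp * vaf:.1f}")
```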

2.4 Use Case Points (UCP) Model

  • The Use Case Points (UCP) model [9] is a software sizing estimation method based on use case counts called use case points.
  • Use cases describe the interaction between a primary actor—the initiator of the interaction—and the system itself, represented as a sequence of simple steps.
  • An actor is someone or something that exists outside the system under study and takes part in a sequence of activities in a dialogue with the system to achieve some goal: actors may be end users, other systems, or hardware devices.
  • Use case modeling is part of the UML 2.0 and is therefore applicable in the early estimation of an object oriented software development project.
  • Weighing the Environmental Factor is an exercise to calculate a Use Case Point modifier that adjusts the UUCP by the weight of the environmental factors.
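
A sketch of the usual UCP arithmetic as commonly described for Karner's method (the dissertation's own presentation is not reproduced here): actors and use cases are weighted by complexity to give the Unadjusted Use Case Points (UUCP), which are then multiplied by the Technical Complexity Factor and the Environmental Factor. The counts, ratings, and the 0.6 + 0.01*TF and 1.4 - 0.03*EF constants below are illustrative assumptions.

```python
# Unadjusted Use Case Points from actor and use case complexity counts.
actor_weights = {"simple": 1, "average": 2, "complex": 3}
usecase_weights = {"simple": 5, "average": 10, "complex": 15}

actors = {"simple": 2, "average": 3, "complex": 1}      # hypothetical counts
use_cases = {"simple": 4, "average": 6, "complex": 2}

uucp = (sum(actor_weights[k] * n for k, n in actors.items())
        + sum(usecase_weights[k] * n for k, n in use_cases.items()))

# Technical and environmental adjustment, using the commonly quoted constants.
tf = 30   # hypothetical weighted sum of the 13 technical factor ratings
ef = 16   # hypothetical weighted sum of the 8 environmental factor ratings
tcf = 0.6 + 0.01 * tf
ecf = 1.4 - 0.03 * ef

ucp = uucp * tcf * ecf
print(f"UUCP = {uucp}, UCP = {ucp:.1f}")
```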

2.5 Object-Oriented Function Point (OOFP) Model

  • Another related work in the sizing of OOP is the Object-Oriented Function Point (OOFP) model.
  • The function point size metric uses functional, logical entities such as inputs, outputs, and inquiries that tend to relate more closely to the functions performed by the software as compared to other measures, such as lines of code.
  • Inputs, Outputs and Inquiries are all treated in the same way: they are generically called “service requests” and correspond to class methods.
  • Classes within the application boundary correspond to ILFs, while classes outside the application boundary (including libraries) correspond to EIFs.
  • The OOFP is an adaptation of the original FP and, although it attempts to use object-oriented metrics, the framework itself is not very well suited to the object-oriented paradigm.

2.6 Class Point (CP) Model

  • The Class Point model, as defined by Costagliola et al., 2005 [29], is similar to the OOFP approach in that it attempts to give an estimate of the size metric based on design/structural artifacts.
  • There are two forms of the Class Point metric, named CP1 and CP2 respectively.
  • The latter is used later in the design stage, when more information is available, whereas CP1 is meant to be used a bit earlier, at the beginning of the design process, to carry out a preliminary size estimate.
  • Classes are grouped into four types: the problem domain type (PDT)/entity classes, the human interaction type (HIT)/boundary classes, the data management type (DMT)/data classes, and the task management type (TMT)/control classes.
  • C) Estimating the Total Unadjusted Class Point: this consists of computing a weighted total of the classes with their complexity levels determined.

2.7 SysML Point Overview

  • In object-oriented development projects, it is desirable to have an estimation model that imitates the continuous realization and refinement of the same system artifacts through the pre-implementation activities of the project development.
  • Use case models are realized into object interaction diagrams and analysis classes, and these are further refined into the class structures that will be coded.
  • The Pattern Point model is a constituent of the proposed SysML point approach (Figure 3).
  • The remainder of this dissertation defines and validates the Pattern Point estimation model.

3.1 Design Patterns

  • In software engineering, a design pattern is a common reusable solution to a frequently occurring problem in software design.
  • A design pattern is not a finished design that can be transformed directly into code.
  • It is a description or template for how to solve a problem that can be used in many different situations.
  • Typically, object-oriented design patterns display relationships and interactions between classes or objects without specifying the final application classes or objects that are involved.
  • Algorithms are not considered design patterns because they solve computational problems and not design problems.

3.1.1 History

  • The concept of a design pattern was not formalized for several years.
  • Patterns, in general, emerged as an architectural concept introduced by Christopher Alexander in 1977.
  • In 1987, Kent Beck and Ward Cunningham began experimenting with the concept of applying patterns to computer programming and presented their results at the OOPSLA conference that year [20], [21].
  • In the following years, Beck, Cunningham and others followed up on this work.
  • In 1994, the first Pattern Languages of Programming conference was held, and the following year the Portland Pattern Repository was created for the documentation of design patterns.

3.1.2 Uses

  • Design patterns provide tested, proven development paradigms and can thus speed up the development process.
  • Effective software design demands the consideration of issues that may not come to light until later in the implementation stage.
  • These techniques are difficult to apply to a broader range of problems.
  • Design patterns provide general solutions, documented in a format that doesn't require specifics tied to a particular problem.
  • Moreover, patterns enable developers to communicate using established names for software interactions.

3.1.3 Classification

  • Object-oriented design patterns are classified into the categories: Creational Patterns, Structural Patterns, and Behavioral Patterns, and described using the concepts of aggregation, delegation, and consultation [21].
  • Creational Patterns are design patterns that are concerned with object creation mechanisms; trying to create objects in a manner suitable to the situation.
  • Lastly, Behavioral Patterns are design patterns that identify common communication patterns between objects and realize these patterns.
  • By doing so, these patterns increase flexibility in carrying out this communication.
  • Table 1 lists design patterns classified into the three categories.

3.2 The Pattern Point Model

  • The Pattern Points (PP) model is an empirical parametric estimation method that utilizes UML sequence diagrams (object interactions) to predict development effort in the analysis phase of an object-oriented development project.
  • Each pattern is sized based on a pattern ranking and an implementation ranking.
  • As the interaction model is refined and designers have identified which patterns to use in the construction of each object interaction, a single unadjusted component size estimate can be attained.
  • Size estimates are then adjusted to account for technical and environmental factors such as lead programmer experience and requirements volatility.
  • At the late analysis stage where the object interactions have been further refined to reflect some initial design elements, the PP metric is computed a little differently.

3.2.1 The pattern point method

  • The Pattern Point size estimation process is composed of three main phases, corresponding to analogous phases in the FP approach [30].
  • PP1 is applicable at the beginning of the analysis phase, where a majority of the design constructs have not yet been formalized, whereas PP2 takes into account the structural constructs that have been identified in the late analysis phase.
  • Following are the three main steps in estimating the Pattern Point size.

3.3 Identification and Classification of User Objects

  • The user objects that form the design patterns are classified into four groups.
  • Table 1 shows a default grouping of the objects that comprise the 23 design patterns defined by Gamma et al. [3].
  • With regard to the previous example, the objects EmergencyReportForm and ReportEmergencyButton belong to HIT.
  • In the example [3], a DMT component is the IncidentManagement subsystem containing classes responsible for issuing SQL queries in order to store and retrieve records representing Incidents in the database.
  • D. Task management type (TMT) - TMT objects are responsible for the definition and control of tasks.

3.4 Evaluation of a Pattern Complexity Level

  • The second step is to evaluate the complexity level of the design patterns that are found in the object interaction analysis of the system.
  • The structural complexity is a function of the number of classes and the number of associations that are identified in the structure of the design pattern.
  • These are the Interface pattern and the Filter pattern as defined in [4].
  • The PP1 metric is a function of the Degree of Difficulty (DD) and Structural Complexity (SC) of the design pattern, and PP2 takes the number of implemented concrete classes in the pattern also into consideration.

3.5 Estimating the Total Unadjusted Pattern Point

  • After estimating the complexity of each of the design patterns found in the object interaction analysis of the system according to Table 2, the authors can now compute the Total Unadjusted Pattern Point (TUPP).
  • To achieve this, Table 3 below, as defined in the Class Point estimation [29] is completed for Pattern Point estimation.
  • Typology and complexity level are given by the corresponding row and column, respectively.
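
Since Tables 2 and 3 are not reproduced in this summary, the sketch below only illustrates the shape of the computation: the TUPP is a weighted total of the pattern counts over each typology/complexity-level cell. Both the weights and the counts are hypothetical placeholders.

```python
# Total Unadjusted Pattern Point as a weighted total over the table of
# pattern counts by typology (row) and complexity level (column).
# Both the weights and the counts below are hypothetical placeholders.
weights = {          # typology -> weight per complexity level
    "PDT": {"low": 3, "average": 6, "high": 10},
    "HIT": {"low": 4, "average": 7, "high": 12},
    "DMT": {"low": 5, "average": 8, "high": 13},
    "TMT": {"low": 4, "average": 6, "high": 9},
}
counts = {
    "PDT": {"low": 2, "average": 3, "high": 1},
    "HIT": {"low": 1, "average": 2, "high": 0},
    "DMT": {"low": 0, "average": 1, "high": 1},
    "TMT": {"low": 1, "average": 0, "high": 0},
}

tupp = sum(weights[t][lvl] * counts[t][lvl]
           for t in weights for lvl in weights[t])
print("TUPP =", tupp)
```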

3.6 Technical Complexity and Environmental Factor Estimation

  • The Technical Complexity Factor (TCF) [9] is determined by assigning the degree of influence (ranging from 0 to 5) that 13 general system characteristics have on the application, from the designer’s point of view.
  • The estimates given for the degrees of influence are recorded in the Technical factors table illustrated in Table 4.
  • The final value of the Adjusted Pattern Point (PP) is obtained by multiplying the Total Unadjusted Pattern Point value by the TCF and the EAF: PP = TUPP × TCF × EAF.
  • It is worth mentioning that the Technical Complexity Factor and Environmental Adjustment Factor are determined by taking into account the characteristics that are considered in the FP.
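
A sketch of the final adjustment step described above. The degrees of influence and, in particular, the 0.6 + 0.01 * sum(...) form of the two adjustment factors are assumptions made for illustration; the exact constants used in the Pattern Point method are not given in this summary.

```python
# Adjusted Pattern Point: PP = TUPP * TCF * EAF.
# Degrees of influence (0-5) for the 13 technical and the environmental
# characteristics are hypothetical, and so is the 0.6 + 0.01 * sum(...) form
# of the adjustment (the exact constants are not reproduced in this summary).
technical_ratings = [3, 2, 4, 1, 0, 2, 3, 5, 2, 1, 3, 2, 4]   # 13 factors
environmental_ratings = [4, 3, 2, 5, 1, 2, 3, 2]

tcf = 0.6 + 0.01 * sum(technical_ratings)
eaf = 0.6 + 0.01 * sum(environmental_ratings)

tupp = 220          # hypothetical Total Unadjusted Pattern Point
pp = tupp * tcf * eaf
print(f"TCF = {tcf:.2f}, EAF = {eaf:.2f}, PP = {pp:.1f}")
```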

4. THEORETICAL VALIDATION

  • The PP metric, as well as its constituent metrics DD, SC, and PC, has been defined so far, but a software measure can be acceptable and effectively usable only if its usefulness has been proven by means of a validation process.
  • The goal of such a process is to show that a measure really measures the attribute it is supposed to and that it is practically useful [29].
  • Theoretical validation is a fundamental step in the validation process and should allow one to demonstrate that a measure satisfies the properties characterizing the concept (e.g., size, complexity, coupling) it is intended to capture [5].
  • The framework contributes to the definition of a stronger theoretical ground of software measurement by providing convenient and intuitive properties for several measurement concepts, such as complexity, cohesion, length, coupling and size.
  • Within the framework, a system is characterized as a set of elements and a set of relationships between those elements, as formalized in the following definition.

4.1 Representation of Systems and Modules

  • A system S will be represented as a pair <E, R>, where E represents the set of elements of S and R is a binary relation on E (R ⊆ E × E) representing the relationships between S’s elements.
  • The basic properties of size measures are very intuitive; they ensure that the size cannot be negative, it is null when the system has no element, and it can be obtained as the sum of the size of its modules when they are disjoint.
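
For reference, the three basic size properties referred to above can be stated formally for a system S = <E, R> partitioned into modules; this is a restatement of the property-based framework cited in [5], not new material from the dissertation.

```latex
% Basic size properties for a system S = <E, R> with modules m_1, m_2
\begin{align*}
\text{Nonnegativity:}     &\quad \mathrm{Size}(S) \geq 0 \\
\text{Null value:}        &\quad E = \emptyset \;\Rightarrow\; \mathrm{Size}(S) = 0 \\
\text{Module additivity:} &\quad S = m_1 \cup m_2,\ E_{m_1} \cap E_{m_2} = \emptyset
                           \;\Rightarrow\; \mathrm{Size}(S) = \mathrm{Size}(m_1) + \mathrm{Size}(m_2)
\end{align*}
```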

4.3 Proof

  • Since the PP value is obtained as a weighted sum of nonnegative numbers, the Nonnegativity property holds.
  • If no design pattern (i.e., no classes/objects or associations/calls) is present in the system analysis, the PP value is trivially null and the Null Value property is also verified.
  • This means that for each pattern, the values for DD and SC will be unchanged after the partitioning.

5. EMPIRICAL VALIDATION

  • In the literature, it is largely accepted that system size is strongly correlated with development effort [17]-[20].
  • The theoretical validation conducted in the previous section illustrates that the Pattern Point measures satisfy properties that are considered requisite for size measures.
  • A theoretical validation alone does not guarantee the usefulness of the measures as predictors of effort and cost.
  • Thus, the author has performed an empirical study aimed at determining whether the Pattern Point measures can be used to predict the development effort of OO systems in terms of person-days (8 hours per day).
  • The subject of the study was the initial release of the IBM Lotus Quickr software product.

5.1 IBM Lotus Quickr 8.0

  • Lotus Quickr is IBM’s team collaboration and content sharing software that helps users access and interact with the people, information and project materials that they need to get their work done.
  • The software was released in June 2007, and the Pattern Point method was applied retroactively on the recorded data.

5.2 Applying the Pattern Point Method to Lotus Quickr

  • As in many software development projects, the documentation was incomplete, particularly with respect to the artifacts from the analysis phase of the software product; i.e., there was little or no documentation of the object interaction analyses, including sequence diagrams.
  • There was ample data on implemented use case scenarios, and the package structure of the code was designed for easy identification of the design patterns in play, which helped in the reverse engineering of the object interaction diagrams described in the following section.
  • The reverse engineering tool MaintainJ was employed to reverse engineer the object interaction diagrams involved in a particular use case.

5.3 Reverse Engineering Using MaintainJ

  • MaintainJ is an Eclipse plug-in that generates runtime UML sequence and class diagrams for a use case.
  • In Step 2, the user can now log in to the Quickr application and perform use case scenarios with the MaintainJ application running.
  • The author has written a separate tool that takes as input the trace file and it outputs class and method names involved in the object interactions to a text file.
  • The same was also verified in the Class Point approach, i.e., whether or not the 14 Function Point factors are useful in that context, and whether the four additional Class Point factors enhance the prediction accuracy [29].
  • In the sections that follow, the cross validation process applied to PP1 and PP2 is described.

5.4 The Cross Validation Process

  • To carry out the cross validation process on the 78 selected use cases from Lotus Quickr, the following steps were performed: 1. The whole data set was partitioned into eight randomly selected test sets: seven of equal size (10 use cases each) and a last test set with two fewer elements (8).
  • 2. For each test set, the remaining use cases were analyzed to identify the corresponding training set, obtained by removing influential outliers.
  • 3. An Ordinary Least-Squares (OLS) regression analysis was performed on each training set to derive the effort prediction model.
  • 4. Accuracy was separately calculated for each test set and the resulting values were aggregated across all 8 test sets.
  • In what follows, the authors describe each of the above steps.
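
The following sketch, with hypothetical data and a simple univariate OLS fit standing in for the models actually derived in the dissertation, illustrates the mechanics of the 8-fold procedure listed above (outlier removal is omitted).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data set: one size measure (e.g. PP1) and actual effort
# in person-days for 78 use cases.
pp = rng.uniform(5, 60, size=78)
effort = 2.0 + 1.5 * pp + rng.normal(0, 5, size=78)

# 1. Partition into eight randomly selected test sets (7 x 10 and 1 x 8).
order = rng.permutation(78)
test_sets = [order[i * 10:(i + 1) * 10] for i in range(7)] + [order[70:]]

mmre_per_fold = []
for test_idx in test_sets:
    train_idx = np.setdiff1d(np.arange(78), test_idx)
    # 2./3. OLS regression on the training set: effort = b0 + b1 * PP.
    b1, b0 = np.polyfit(pp[train_idx], effort[train_idx], 1)
    # 4. Accuracy on the corresponding test set.
    pred = b0 + b1 * pp[test_idx]
    mre = np.abs(effort[test_idx] - pred) / effort[test_idx]
    mmre_per_fold.append(mre.mean())

print("aggregate MMRE:", np.round(np.mean(mmre_per_fold), 3))
```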

5.5 Partitioning the Data Set

  • Table 6 reports the data of the 78 use cases, following the order resulting from the random partition performed.
  • Thus, the first ten use cases form the first test set, the subsequent ten use cases form the second one, and so on.

5.6 OLS Regression Analysis to Derive Effort Prediction Models

  • An Ordinary Least-Squares regression analysis was applied in order to perform an empirical validation of the PP1 and PP2 measures.
  • When applying the OLS regression, a number of indicators were taken into account to establish the quality of the prediction.
  • Furthermore, to evaluate statistical significance, a t-test was performed and the p-value and t-value of the coefficient and the intercept were determined for each model.
  • When the p-value is less than 0.05, the authors can reject the hypothesis that the coefficient is zero; the reliability of the predictor is then given by the t-value of the coefficient.
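
For illustration only (the dissertation does not specify the statistical package used), the p-value and t-value of a coefficient can be read directly from an OLS fit; the data below are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
pp1 = rng.uniform(5, 60, size=40)                  # hypothetical PP1 values
effort = 3.0 + 1.4 * pp1 + rng.normal(0, 6, 40)    # hypothetical effort (person-days)

model = sm.OLS(effort, sm.add_constant(pp1)).fit()
# A p-value below 0.05 lets us reject the hypothesis that the coefficient is
# zero; the t-value then indicates the reliability of the predictor.
print("coefficient p-value:", model.pvalues[1])
print("coefficient t-value:", model.tvalues[1])
print("R-squared:", model.rsquared)
```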

5.7 Accuracy Evaluation of the Prediction Models

  • In order to assess the acceptability of the effort prediction models, the criteria suggested by Conte et al. [31] were adopted.
  • For each test set, the prediction accuracy has been evaluated by taking into account a summary measure, given by the Mean of MRE (MMRE), to measure the aggregation of MRE over the 10 observations.
  • The values of such measures are reported in Tables 18 to 25.
  • An MMRE of at most 0.25 represents an acceptable threshold for an effort prediction model, as suggested by Conte et al. [31]; this is confirmed by the aggregate (mean) and median MMRE values for PP1 and PP2 in Table 17, which are both ≤ 0.25.
  • This suggests the use of the PP1 measure at the beginning of the development process, in order to obtain a preliminary effort estimation, which can be refined by employing PP2 when the number of Pattern Concrete classes is known.
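
The accuracy measures used above are the standard criteria of Conte et al.; written out, with e_i the actual and ê_i the predicted effort of observation i:

```latex
\mathrm{MRE}_i = \frac{\lvert e_i - \hat{e}_i \rvert}{e_i}, \qquad
\mathrm{MMRE} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{MRE}_i, \qquad
\mathrm{PRED}(0.25) = \frac{\lvert \{\, i : \mathrm{MRE}_i \le 0.25 \,\} \rvert}{n}
```

An MMRE of at most 0.25 is the threshold referred to in the text; PRED(0.25) ≥ 0.75 is the companion criterion commonly associated with the same source.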

6.1 Single Measures and Their Sums

  • Courtney et al. [71] report that researchers who set out to learn empirical relationships by experimenting with different combinations of measures and functional forms before choosing the one with the highest correlation tend to obtain apparently good models on small data sets, even when no real relationship exists.
  • Then, the performance of the derived models for all considered measures was evaluated using the data coming from the corresponding testing sets.
  • Table 27 shows a summary descriptive statistics of the measures considered.
  • In fact, all the measures with SC fare slightly better than the PP1 metric.
  • First, the PP2 metric is better correlated to effort than any single measure composing it.

6.2 Multivariate OLS Regression

  • In order to complete the analysis, a multivariate OLS regression was carried out using the basic measures of the Pattern Point approach as independent variables.
  • Again, the 8-fold cross validation technique was applied by carrying out a multivariate OLS regression on the eight training sets, and then evaluating the performance of the derived models, using the data coming from the corresponding testing sets.
  • Table 29 reports the aggregate MMRE and PRED (0.25) resulting from this analysis.
  • Compared with the values reported in Table 26, it can be deduced that the PP2 measure exhibits a more accurate predictive capability.
  • In any case, this study has confirmed once again that the use of the PP2 measure may yield better predictive accuracy in models based on a multivariate regression as well.

7.1 Conclusions

  • There are several models in existence that are used to estimate the size of software systems.
  • The Pattern Point model provides a system-level size measure using the design patterns from object interaction analyses in the late OOA phase of development.
  • The empirical study presented in the dissertation has suggested that the PP1 measure may have a predictive capability equal to or lower than that of its constituent SC metric.
  • A multi-project study is desired to assess the possible effects of the Technical Complexity Factors and Environmental Factors in the Pattern Point method.


Citations
Journal Article
TL;DR: In this paper, the organization of networks in the human cerebrum was explored using resting-state functional connectivity MRI data from 1,000 subjects and a clustering approach was employed to identify and replicate networks of functionally coupled regions across the cerebral cortex.
Abstract: Information processing in the cerebral cortex involves interactions among distributed areas. Anatomical connectivity suggests that certain areas form local hierarchical relations such as within the visual system. Other connectivity patterns, particularly among association areas, suggest the presence of large-scale circuits without clear hierarchical relations. In this study the organization of networks in the human cerebrum was explored using resting-state functional connectivity MRI. Data from 1,000 subjects were registered using surface-based alignment. A clustering approach was employed to identify and replicate networks of functionally coupled regions across the cerebral cortex. The results revealed local networks confined to sensory and motor cortices as well as distributed networks of association regions. Within the sensory and motor cortices, functional connectivity followed topographic representations across adjacent areas. In association cortex, the connectivity patterns often showed abrupt transitions between network boundaries. Focused analyses were performed to better understand properties of network connectivity. A canonical sensory-motor pathway involving primary visual area, putative middle temporal area complex (MT+), lateral intraparietal area, and frontal eye field was analyzed to explore how interactions might arise within and between networks. Results showed that adjacent regions of the MT+ complex demonstrate differential connectivity consistent with a hierarchical pathway that spans networks. The functional connectivity of parietal and prefrontal association cortices was next explored. Distinct connectivity profiles of neighboring regions suggest they participate in distributed networks that, while showing evidence for interactions, are embedded within largely parallel, interdigitated circuits. We conclude by discussing the organization of these large-scale cerebral networks in relation to monkey anatomy and their potential evolutionary expansion in humans to support cognition.

6,284 citations

Journal Article
TL;DR: Three atlases at the 100-, 200- and 300-parcellation levels derived from 79 healthy normal volunteers are made freely available online along with tools to interface this atlas with SPM, BioImage Suite and other analysis packages.

822 citations


Cites background from "Analysis of a large fMRI cohort: St..."

  • ...Analysis of a wide range of anatomic registration methods has also shown significant spatial mismatch (of the order of 1 cm) between subjects in many brain regions (Hellier et al., 2003; Thirion et al., 2007)....

Journal Article
TL;DR: Although neuroimaging is unlikely to be cheaper than other tools in the near future, there is growing evidence that it may provide hidden information about the consumer experience.
Abstract: The application of neuroimaging methods to product marketing - neuromarketing - has recently gained considerable popularity. We propose that there are two main reasons for this trend. First, the possibility that neuroimaging will become cheaper and faster than other marketing methods; and second, the hope that neuroimaging will provide marketers with information that is not obtainable through conventional marketing methods. Although neuroimaging is unlikely to be cheaper than other tools in the near future, there is growing evidence that it may provide hidden information about the consumer experience. The most promising application of neuroimaging methods to marketing may come before a product is even released - when it is just an idea being developed.

744 citations

Journal Article
TL;DR: A model is proposed that extends the original idea of the MNS to include forward and inverse internal models and motor and sensory simulation, distinguishing the MNS from a more general concept of sVx.
Abstract: Many neuroimaging studies of the mirror neuron system (MNS) examine if certain voxels in the brain are shared between action observation and execution (shared voxels, sVx). Unfortunately, finding sVx in standard group analyses is not a guarantee that sVx exist in individual subjects. Using unsmoothed, single-subject analyses we show sVx can be reliably found in all 16 investigated participants. Beside the ventral premotor (BA6/44) and inferior parietal cortex (area PF) where mirror neurons (MNs) have been found in monkeys, sVx were reliably observed in dorsal premotor, supplementary motor, middle cingulate, somatosensory (BA3, BA2, and OP1), superior parietal, middle temporal cortex and cerebellum. For the premotor, somatosensory and parietal areas, sVx were more numerous in the left hemisphere. The hand representation of the primary motor cortex showed a reduced BOLD during hand action observation, possibly preventing undesired overt imitation. This study provides a more detailed description of the location and reliability of sVx and proposes a model that extends the original idea of the MNS to include forward and inverse internal models and motor and sensory simulation, distinguishing the MNS from a more general concept of sVx.

647 citations


Cites background from "Analysis of a large fMRI cohort: St..."

  • ...This helped preserve statistical power, a critical issue in neuroimaging (Thirion et al. 2007)....

Journal Article
21 Apr 2009-PLOS ONE
TL;DR: The results demonstrate the highly organized modular architecture and associated topological properties in the temporal and spatial brain functional networks of the human brain that underlie spontaneous neuronal dynamics, which provides important implications for understanding of how intrinsically coherent spontaneous brain activity has evolved into an optimal neuronal architecture to support global computation and information integration in the absence of specific stimuli or behaviors.
Abstract: The characterization of topological architecture of complex brain networks is one of the most challenging issues in neuroscience. Slow (<0.1 Hz), spontaneous fluctuations of the blood oxygen level dependent (BOLD) signal in functional magnetic resonance imaging are thought to be potentially important for the reflection of spontaneous neuronal activity. Many studies have shown that these fluctuations are highly coherent within anatomically or functionally linked areas of the brain. However, the underlying topological mechanisms responsible for these coherent intrinsic or spontaneous fluctuations are still poorly understood. Here, we apply modern network analysis techniques to investigate how spontaneous neuronal activities in the human brain derived from the resting-state BOLD signals are topologically organized at both the temporal and spatial scales. We first show that the spontaneous brain functional networks have an intrinsically cohesive modular structure in which the connections between regions are much denser within modules than between them. These identified modules are found to be closely associated with several well known functionally interconnected subsystems such as the somatosensory/motor, auditory, attention, visual, subcortical, and the “default” system. Specifically, we demonstrate that the module-specific topological features can not be captured by means of computing the corresponding global network parameters, suggesting a unique organization within each module. Finally, we identify several pivotal network connectors and paths (predominantly associated with the association and limbic/paralimbic cortex regions) that are vital for the global coordination of information flow over the whole network, and we find that their lesions (deletions) critically affect the stability and robustness of the brain functional system. Together, our results demonstrate the highly organized modular architecture and associated topological properties in the temporal and spatial brain functional networks of the human brain that underlie spontaneous neuronal dynamics, which provides important implications for our understanding of how intrinsically coherent spontaneous brain activity has evolved into an optimal neuronal architecture to support global computation and information integration in the absence of specific stimuli or behaviors.

597 citations


Cites background from "Analysis of a large fMRI cohort: St..."

  • ...One of the key characteristics of fMRI data, is their large intersubject variability, which may dramatically influence on the robustness of group analysis [60]....

References
Journal Article
TL;DR: The standard nonparametric randomization and permutation testing ideas are developed at an accessible level, using practical examples from functional neuroimaging, and the extensions for multiple comparisons described.
Abstract: Requiring only minimal assumptions for validity, nonparametric permutation testing provides a flexible and intuitive methodology for the statistical analysis of data from functional neuroimaging experiments, at some computational expense. Introduced into the functional neuroimaging literature by Holmes et al. ([1996]: J Cereb Blood Flow Metab 16:7-22), the permutation approach readily accounts for the multiple comparisons problem implicit in the standard voxel-by-voxel hypothesis testing framework. When the appropriate assumptions hold, the nonparametric permutation approach gives results similar to those obtained from a comparable Statistical Parametric Mapping approach using a general linear model with multiple comparisons corrections derived from random field theory. For analyses with low degrees of freedom, such as single subject PET/SPECT experiments or multi-subject PET/SPECT or fMRI designs assessed for population effects, the nonparametric approach employing a locally pooled (smoothed) variance estimate can outperform the comparable Statistical Parametric Mapping approach. Thus, these nonparametric techniques can be used to verify the validity of less computationally expensive parametric approaches. Although the theory and relative advantages of permutation approaches have been discussed by various authors, there has been no accessible explication of the method, and no freely distributed software implementing it. Consequently, there have been few practical applications of the technique. This article, and the accompanying MATLAB software, attempts to address these issues. The standard nonparametric randomization and permutation testing ideas are developed at an accessible level, using practical examples from functional neuroimaging, and the extensions for multiple comparisons described. Three worked examples from PET and fMRI are presented, with discussion, and comparisons with standard parametric approaches made where appropriate. Practical considerations are given throughout, and relevant statistical concepts are expounded in appendices.

5,777 citations


"Analysis of a large fMRI cohort: St..." refers background in this paper

  • ...Non-parametric tests may avoid these issues (Holmes et al., 1996; Brammer et al., 1997; Bullmore et al., 1999; Nichols and Holmes, 2002; Hayasaka and Nichols, 2003; Mériaux et al., 2006a), but at a higher computational cost....

Journal Article
TL;DR: This paper introduces to the neuroscience literature statistical procedures for controlling the false discovery rate (FDR) and demonstrates this approach using both simulations and functional magnetic resonance imaging data from two simple experiments.

4,838 citations

Journal Article
TL;DR: This work has developed a means for generating an average folding pattern across a large number of individual subjects as a function on the unit sphere and of nonrigidly aligning each individual with the average, establishing a spherical surface‐based coordinate system that is adapted to the folding pattern of each individual subject, allowing for much higher localization accuracy of structural and functional features of the human brain.
Abstract: The neurons of the human cerebral cortex are arranged in a highly folded sheet, with the majority of the cortical surface area buried in folds. Cortical maps are typically arranged with a topography oriented parallel to the cortical surface. Despite this unambiguous sheetlike geometry, the most commonly used coordinate systems for localizing cortical features are based on 3-D stereotaxic coordinates rather than on position relative to the 2-D cortical sheet. In order to address the need for a more natural surface-based coordinate system for the cortex, we have developed a means for generating an average folding pattern across a large number of individual subjects as a function on the unit sphere and of nonrigidly aligning each individual with the average. This establishes a spherical surface-based coordinate system that is adapted to the folding pattern of each individual subject, allowing for much higher localization accuracy of structural and functional features of the human brain.

3,024 citations

Book
01 Dec 1971
TL;DR: Theoretical Bases for Calculating the ARE Examples of the Calculations of Efficacy and ARE Analysis of Count Data.
Abstract: Introduction and Fundamentals Introduction Fundamental Statistical Concepts Order Statistics, Quantiles, and Coverages Introduction Quantile Function Empirical Distribution Function Statistical Properties of Order Statistics Probability-Integral Transformation Joint Distribution of Order Statistics Distributions of the Median and Range Exact Moments of Order Statistics Large-Sample Approximations to the Moments of Order Statistics Asymptotic Distribution of Order Statistics Tolerance Limits for Distributions and Coverages Tests of Randomness Introduction Tests Based on the Total Number of Runs Tests Based on the Length of the Longest Run Runs Up and Down A Test Based on Ranks Tests of Goodness of Fit Introduction The Chi-Square Goodness-of-Fit Test The Kolmogorov-Smirnov One-Sample Statistic Applications of the Kolmogorov-Smirnov One-Sample Statistics Lilliefors's Test for Normality Lilliefors's Test for the Exponential Distribution Anderson-Darling Test Visual Analysis of Goodness of Fit One-Sample and Paired-Sample Procedures Introduction Confidence Interval for a Population Quantile Hypothesis Testing for a Population Quantile The Sign Test and Confidence Interval for the Median Rank-Order Statistics Treatment of Ties in Rank Tests The Wilcoxon Signed-Rank Test and Confidence Interval The General Two-Sample Problem Introduction The Wald-Wolfowitz Runs Test The Kolmogorov-Smirnov Two-Sample Test The Median Test The Control Median Test The Mann-Whitney U Test and Confidence Interval Linear Rank Statistics and the General Two-Sample Problem Introduction Definition of Linear Rank Statistics Distribution Properties of Linear Rank Statistics Usefulness in Inference Linear Rank Tests for the Location Problem Introduction The Wilcoxon Rank-Sum Test and Confidence Interval Other Location Tests Linear Rank Tests for the Scale Problem Introduction The Mood Test The Freund-Ansari-Bradley-David-Barton Tests The Siegel-Tukey Test The Klotz Normal-Scores Test The Percentile Modified Rank Tests for Scale The Sukhatme Test Confidence-Interval Procedures Other Tests for the Scale Problem Applications Tests of the Equality of k Independent Samples Introduction Extension of the Median Test Extension of the Control Median Test The Kruskal-Wallis One-Way ANOVA Test and Multiple Comparisons Other Rank-Test Statistics Tests against Ordered Alternatives Comparisons with a Control Measures of Association for Bivariate Samples Introduction: Definition of Measures of Association in a Bivariate Population Kendall's Tau Coefficient Spearman's Coefficient of Rank Correlation The Relations between R and T E(R), tau, and rho Another Measure of Association Applications Measures of Association in Multiple Classifications Introduction Friedman's Two-Way Analysis of Variance by Ranks in a k x n Table and Multiple Comparisons Page's Test for Ordered Alternatives The Coefficient of Concordance for k Sets of Rankings of n Objects The Coefficient of Concordance for k Sets of Incomplete Rankings Kendall's Tau Coefficient for Partial Correlation Asymptotic Relative Efficiency Introduction Theoretical Bases for Calculating the ARE Examples of the Calculations of Efficacy and ARE Analysis of Count Data Introduction Contingency Tables Some Special Results for k x 2 Contingency Tables Fisher's Exact Test McNemar's Test Analysis of Multinomial Data Summary Appendix of Tables Answers to Problems References Index A Summary and Problems appear at the end of each chapter.

2,988 citations

Book
23 Sep 1997
TL;DR: Principles and methods: Linking brain and behaviour, C. Frith Analyzing brain images - principles and overview, K. Friston Characterizing brain images with the general linear model, A. Holmes et al Making statistical inferences, J. Poline et al Characterizing distributed functional systems, K Friston characterizing functional integration.
Abstract: Principles and methods: Linking brain and behaviour, C. Frith Analyzing brain images - principles and overview, K. Friston Registering brain images to anatomy, J. Ashbumer, K. Friston Characterizing brain images with the general linear model, A. Holmes et al Making statistical inferences, J. Poline et al Characterizing distributed functional systems, K. Friston Characterizing functional integration, C. Buechel, K. Friston A taxonomy of study design, K. Friston et al. Functional anatomy: Dynamism of a PET image - studies of visual function, S. Zeki Mapping somatosensory systems, E. Paulesu, R. Frackowiak Functional organization of the motor system, R. Passingham The cerebral basis of functional recovery, R. Frackowiak Functional anatomy of reading, C. Price Higher cognitive processes, C. Frith, R. Dolan Human memory systems, R. Dolan et al Measuring neuromodulation with functional imaging, R. Dolan et al Brain maps - linking the present with the future, J. Mazziotta et al Functional imaging with magnetic resonance, R. Turner et al.

1,816 citations

Frequently Asked Questions (13)
Q1. What have the authors contributed in "Analysis of a large fMRI cohort: Statistical and methodological issues for group analyses"?

While many efforts have been made to control the rate of false detections, the statistical characteristics of the data have rarely been studied, and the reliability of the results (supra-threshold areas that are considered activated regions) has rarely been assessed. In this work, the authors take advantage of the large cohort of subjects who underwent the Localizer experiment to study the statistical nature of group data, propose some measures of the reliability of group studies, and address simple methodological questions such as: is there, from the point of view of reliability, an optimal statistical threshold for activity maps? Their results suggest that (i) optimal thresholds can indeed be found, and are rather lower than the usual thresholds corrected for multiple comparisons; (ii) 20 subjects or more should be included in functional neuroimaging studies in order to have sufficient reliability; (iii) non-parametric significance assessment should be preferred to parametric methods; (iv) cluster-level thresholding is more reliable than voxel-based thresholding; and (v) mixed-effects tests are much more reliable than random-effects tests.

Several directions may be addressed in the future: • First, trying to relate inter-subject variability to behavioral differences and individual or psychological characteristics of the subjects. Once again, such an investigation may be undertaken only on large databases of subjects, and the database used in this experiment might, and probably will, be used in such a framework. • Second, efforts will further be made to relate spatial functional variability to anatomical variability. While some cortex-based analysis reports have indicated a greater sensitivity than standard volume-based mappings [Fischl et al., 1999], statistical evidence is still lacking, and it is not clear at all how much can be gained by taking into account macroanatomical features, e.g. sulco-gyral anatomy.

Appropriate penalty terms are used to handle the case I(r) = 0. The authors have performed some experiments using η = 10 voxels or η = 30 voxels, and use δ = 6mm. 

Many scientists spend a great deal of effort to obtain statistically significant results in neuroimaging studies in order to validate a prior hypothesis on brain function, and it is certainly true that one of the greatest difficulties they have to face is the high variability present in their datasets across subjects.

Voxel-based random effects analysis is the standard way to analyse data from group studies (although the extraction of discrete local maxima [Worsley, 2005] presents an attractive alternative). 

While parametric tests are particularly efficient and computationally cheap, they are based on possibly unrealistic hypotheses that may reduce their sensitivity. 

In order to estimate the reliability of a statistical model, the authors need a method to compare statistical maps obtained with the same technique but sampled from different groups of subjects.

It is worthwhile to note that the implementation of the tests in C keeps the computation time very reasonable (cluster-level P-values can, e.g., be computed in less than one minute on a dataset of ten subjects).

The reliability measure is computed for 100 different splits of the population of subjects into R = 5 groups of S = 16 subjects, in the case of the left click-right click contrast. 

Although the statistic function does not take into account the group variance - as argued earlier, this is probably the reason for its higher performance - its distribution under the null hypothesis is tabulated by random swaps of the signs of the effects, so that it is indeed a valid group inference technique.

Using a spatial independence assumption, the log-likelihood of the data writes

$$\log P(G \mid \lambda, \pi^0_A, \pi^0_I) = \mathrm{cst} + \sum_{v=1}^{V} \log\!\Big( \lambda\,(\pi^0_A)^{R-G(v)}(\pi^1_A)^{G(v)} + (1-\lambda)\,(\pi^0_I)^{R-G(v)}(\pi^1_I)^{G(v)} \Big) \qquad (7)$$

Assuming R ≥ 3, the three free parameters $\pi^0_A$, $\pi^0_I$, and $\lambda$ can be estimated using EM or Newton's methods.
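
The three parameters can be estimated with a few lines of EM. The sketch below is a minimal illustration under the stated independence assumption, writing π¹ = 1 − π⁰ for the per-map detection probability of each class and taking G(v) as the number of the R group maps in which voxel v is above threshold; it is not the authors' implementation.

```python
import numpy as np

def em_active_inactive(G, R, n_iter=200, tol=1e-8):
    """Fit the two-class (active/inactive) mixture of Eq. (7) by EM.

    G is a 1-D array where G[v] is the number of the R group maps in which
    voxel v is above threshold (hypothetical input; names follow the text).
    Returns (lam, pi1_A, pi1_I): mixing weight and per-map detection
    probabilities of the "active" and "inactive" classes.
    """
    G = np.asarray(G, dtype=float)
    lam, pi1_A, pi1_I = 0.5, 0.8, 0.1            # rough starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each voxel is truly active.
        f_A = pi1_A ** G * (1.0 - pi1_A) ** (R - G)
        f_I = pi1_I ** G * (1.0 - pi1_I) ** (R - G)
        resp = lam * f_A / (lam * f_A + (1.0 - lam) * f_I + 1e-300)
        # M-step: re-estimate the mixing weight and detection probabilities.
        lam_new = resp.mean()
        pi1_A_new = (resp * G).sum() / (R * resp.sum())
        pi1_I_new = ((1.0 - resp) * G).sum() / (R * (1.0 - resp).sum())
        converged = max(abs(lam_new - lam), abs(pi1_A_new - pi1_A),
                        abs(pi1_I_new - pi1_I)) < tol
        lam, pi1_A, pi1_I = lam_new, pi1_A_new, pi1_I_new
        if converged:
            break
    return lam, pi1_A, pi1_I
```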

The order of magnitude of such local shifts is probably as large as 1 cm in many instances (this can be observed for functional regions like the motor cortex or the visual areas [Thirion et al., in press, Stiers et al., 2006] or for the position of anatomical landmarks [Collins et al., 1998, Hellier et al., 2003]).

Since the parcel centres are defined at the group level in Talairach space, the voxels in the group result map are assigned to the parcel with the closest centre in Talairach space.