Advances in Methodology and Statistics
Comparing Two Partitions of Non-Equal Sets of Units
2018 Marjan Cugmas and Anuška Ferligoj; 15(1):1-21
Rand (1971) proposed what has since become a well-known index for comparing two partitions obtained on the same set of units. The index takes a value on the interval between 0 and 1, where a higher value indicates more similar partitions. The Rand index is symmetric in the sense that both the splitting and merging of clusters lower the value of the index. Sometimes, however, e.g. when the units are observed in two time periods, the splitting and merging of clusters should be treated differently, depending on how the stability of clusters is operationalized. In such a non-symmetric case, one of the Wallace indices (Wallace, 1983) can be used. Further, there are several cases when one wants to compare two partitions obtained on different sets of units whose intersection is non-empty. In this instance, the new units and the units which leave the clusters from the first partition can be considered as a factor lowering the value of the index. Therefore, a modified Rand index is presented. Because the splitting and merging of clusters have to be considered differently in some situations, an asymmetric modified Wallace index is also proposed. For all presented indices, the correction for chance is described, which allows different values of a selected index to be compared.
Download the paper
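To make the contrast between the symmetric Rand index and the asymmetric Wallace indices in the abstract above concrete, the following is a minimal sketch of the classic pair-counting versions, computed on the same set of units. It does not implement the modified indices or the correction for chance proposed in the paper. The function names and the example labels are illustrative assumptions.

```python
from itertools import combinations


def pair_counts(u, v):
    """Classify every pair of units by co-membership in partitions u and v.

    u and v are equal-length sequences of cluster labels for the same units.
    Returns (both, only_u, only_v, neither).
    """
    if len(u) != len(v):
        raise ValueError("both partitions must label the same units")
    if len(u) < 2:
        raise ValueError("at least two units are needed to form a pair")
    both = only_u = only_v = neither = 0
    for i, j in combinations(range(len(u)), 2):
        same_u = u[i] == u[j]
        same_v = v[i] == v[j]
        if same_u and same_v:
            both += 1
        elif same_u:
            only_u += 1
        elif same_v:
            only_v += 1
        else:
            neither += 1
    return both, only_u, only_v, neither


def rand_index(u, v):
    """Share of pairs on which the two partitions agree (Rand, 1971)."""
    both, only_u, only_v, neither = pair_counts(u, v)
    return (both + neither) / (both + only_u + only_v + neither)


def wallace_indices(u, v):
    """Pair-counting Wallace indices (Wallace, 1983).

    The first value is the share of pairs joined in u that are also joined
    in v; the second is the share of pairs joined in v that are also joined
    in u. A value is NaN when the conditioning partition joins no pairs.
    """
    both, only_u, only_v, _ = pair_counts(u, v)
    w_u_to_v = both / (both + only_u) if both + only_u else float("nan")
    w_v_to_u = both / (both + only_v) if both + only_v else float("nan")
    return w_u_to_v, w_v_to_u


# Hypothetical example: the first cluster of `first` splits in two in `second`.
first = [0, 0, 0, 0, 1, 1]
second = [0, 0, 1, 1, 2, 2]
print(rand_index(first, second))
print(wallace_indices(first, second))
```

In this example the split lowers the Rand index and the first Wallace index, while the second Wallace index stays at 1, because every pair joined in the second partition was already joined in the first.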
Web Survey Paradata on Response Time Outliers: A Systematic Literature Review
2018 Miha Matjašič, Vasja Vehovar and Katja Lozar Manfreda; 15(1):23-41
In the last two decades, survey researchers have
intensively used computerised methods for the collection of different types of
paradata, such as keystrokes, mouse clicks and response times, to evaluate and
improve survey instruments as well as to understand the survey response
process. With the growing popularity of web surveys, the importance of paradata
has further increased. Within this context, response time measurement is the
prevailing paradata approach. Papers typically analyse the time (measured in
milliseconds or seconds) a respondent needs to answer a certain item, question,
page or questionnaire. One of the key challenges when analysing the response
time is to identify and separate units that are answering too quickly or too
slowly. These units can have a poor response quality and are typically labelled
as response time outliers. This paper focuses on approaches for identifying and
processing response time outliers. It presents a systematic overview of
scientific papers on response time outliers in web surveys. The key observed
characteristics of the papers are the approaches used, the level of time
measurement, the processing of response time outliers and the relationship
between response time and response quality. The results show that knowledge on
response time outliers is scattered, inconsistent and lacking systematic
comparisons of approaches. Consequently, there is a need to improve and upgrade
the knowledge on this issue and to develop new approaches that will overcome
existing deficiencies and inconsistencies in identifying and dealing with
response time outliers.
Download the paper
Download the supplementary information (Appendix)
Behind the Curve and Beyond: Calculating Representative Predicted Probability Changes and Treatment Effects for Non-Linear Models
2018 Bastian Becker; p. 15(1):43-58
Parameter coefficients from non-linear models are
inherently difficult to interpret, and scholars frequently opt for computing
and comparing predicted probabilities for variables of interest. In an
influential article, Hanmer and Ozan Kalkan (2013) discuss the two most common
approaches, the average case approach and the observed values
approach, and make a strong case for the latter. In this paper, I propose a
further refinement of the observed values approach for the purpose of computing
predicted probability changes. This refinement concerns the use of
counterfactual values for the independent variable of interest. I demonstrate
that accounting for non-linearities with regard to the variable of interest is
important to avoid estimation biases. I also discuss the implications of this
insight for estimating average treatment effects from observational data.
Download the paper
Download the supplementary information (Computer code)
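The two approaches named in the abstract above can be sketched for a logistic model as follows. This is only an illustration of the average case and observed values approaches as they are commonly described, not the refinement proposed in the paper and not the author's supplementary code. The coefficients, the data, and the function names are hypothetical.

```python
import numpy as np


def logistic(z):
    """Inverse logit link."""
    return 1.0 / (1.0 + np.exp(-z))


def average_case_change(beta, X, col, low, high):
    """Predicted probability change for one synthetic 'average' unit.

    All covariates are set to their sample means, then the variable of
    interest (column `col`) is moved from `low` to `high`.
    """
    x_low = X.mean(axis=0)
    x_high = x_low.copy()
    x_low[col] = low
    x_high[col] = high
    return logistic(x_high @ beta) - logistic(x_low @ beta)


def observed_values_change(beta, X, col, low, high):
    """Predicted probability change averaged over the observed units.

    Every unit keeps its observed covariates; only the variable of
    interest is set to `low` and then to `high`.
    """
    X_low = X.copy()
    X_high = X.copy()
    X_low[:, col] = low
    X_high[:, col] = high
    return float(np.mean(logistic(X_high @ beta) - logistic(X_low @ beta)))


# Hypothetical coefficients: intercept, binary variable of interest, covariate.
beta = np.array([0.0, 1.0, 1.0])
# Hypothetical data: two units with very different covariate values.
X = np.array([[1.0, 0.0, -2.0],
              [1.0, 1.0, 2.0]])

print(average_case_change(beta, X, col=1, low=0.0, high=1.0))
print(observed_values_change(beta, X, col=1, low=0.0, high=1.0))
```

With these made-up numbers the average case change is larger than the observed values change, because the synthetic average unit sits on the steep part of the logistic curve while both observed units sit on flatter parts.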
Gumbel GARCH Model with Stock Application
2018 Mehrnaz Mohammadpour and Fatemeh Ziaeenejad; p. 15(1):59-72
The paper proposes a new GARCH model with a Gumbel
conditional distribution. Several statistical properties of the model are
established, such as the autocorrelation function and stationarity. We consider two
methods for estimating the unknown parameters of the model and investigate the
properties of the estimators. The performance of the estimators is assessed in
a simulation study. We investigate the application of the process using real
stock data.
Download the paper
Internal Evaluation Criteria for Categorical Data in Hierarchical Clustering: Optimal Number of Clusters Determination
2018 Zdeněk Šulc, Jana Cibulková, Jiří Procházka and Hana Řezanková; p. 15(2):1-20
The paper compares 11 internal evaluation criteria for
hierarchical clustering of categorical data with respect to determining the
correct number of clusters. The criteria are divided into three groups based on
how they treat cluster quality. The variability-based criteria use the
within-cluster variability, the likelihood-based criteria maximize the
likelihood function, and the distance-based criteria use distances within and
between clusters. The aim is to determine which evaluation criteria perform
well and under what conditions. Different analysis settings, such as the
hierarchical clustering method used, and various dataset properties, such as the
number of variables or the minimal between-cluster distances, are examined. The
experiment is conducted on 810 generated datasets, where the evaluation
criteria are assessed on their determination of the optimal number of clusters
and on their mean absolute errors. The results indicate that the likelihood-based BIC1
and variability-based BK criteria perform relatively well in determining the
optimal number of clusters and that some criteria, usually the distance-based
ones, should be avoided.
Download the paper
Download the supplementary information (Zip archive)
Mode Effects on Socially Desirable Responding in Web Surveys Compared to Face-to-Face and Telephone Surveys
2018 Nejc Berzelak and Vasja Vehovar; p. 15(2):21-43
This paper elaborates upon differences in socially
desirable responding as being the result of mode effects between web,
telephone, and face-to-face survey modes. Social desirability is one of the
main threats to comparability of data between different modes. The paper
conceptualises socially desirable responding as a specific type of mode effect,
which is not only a result of inherent characteristics of a survey mode, but is
also mediated and moderated by complex interdependencies of specific survey
implementations, contextual factors, and characteristics and behaviours of
respondents. While web surveys are generally less prone to socially desirable
responding, it is essential to be wary of circumstances that may reduce the
perceived privacy of the survey situation and lead to biased reporting. The
presented empirical study analyses the answers to a large number of items used
in a pilot implementation of the Generations and Gender Survey across the three
modes to gain insights into the incidence of socially desirable responding and
its role in the observed differences in estimates. The comparison of means,
distributions, and proportions of extreme responses to scale questions is
performed across 89 survey items. The results are in line with previous
findings on the lower susceptibility of web surveys to social desirability bias.
More importantly, the findings suggest that the problem of socially desirable
responding is likely to be a major contributor to the differences in mean
estimates, response distributions, and the level of extreme responding between
the studied modes.
Download the paper
Download the supplementary information (PDF file)
Estimation of Power Function Distribution Based on Selective Order Statistic
2018 Mohd T. Alodat, Mohammad Y. Al-Rawwash and Sameer A. Al-Subh; p. 15(2):45-56
In this article, we present the selective order
statistic sampling scheme as a promising approach to estimate the parameter of
the univariate power function distribution. We derive the maximum likelihood
estimator and the method of moments estimator of the power function
distribution parameter as well as the reliability parameter and the ratio of
two means. Moreover, we derive the asymptotic properties of the proposed
estimators. Finally, we conduct simulation studies to investigate the
performance of the selective order statistic scheme and conclude that it suits
the power function distribution. We also find that the maximum likelihood
estimator performs better than the method of moments estimator under the
selective order statistic sampling scheme.
Download the paper
Download the supplementary information (PDF file)