
ACM Journal of

Data and Information Quality (JDIQ)

Latest Articles

On the Effects of Low-Quality Training Data on Information Extraction from Clinical Reports

In the last five years there has been a flurry of work on information extraction from clinical documents, that is, on algorithms capable of...

Cluster-Based Quality-Aware Adaptive Data Compression for Streaming Data

Wireless sensor networks (WSNs) are widely applied in data collection applications. Energy efficiency is one of the most important design goals of...

Challenges of Open Data Quality

Data Quality Challenges in Social Spam Research

Information Quality Challenges in Shared Healthcare Decision Making

Challenge Paper

NEWS

March 2017 -- Call for Papers:
Special Issue on Reproducibility in Information Retrieval
Extended Submission deadline: October 6, 2017

Feb. 2017 -- Call for Papers: 
Special Issue on Improving the Veracity and Value of Big Data 
Extended Submission deadline: April 1st, 2017

Jan. 2016 -- New Book Announcement
Carlo Batini and Monica Scannapieco have a new book:

Data and Information Quality: Dimensions, Principles and Techniques 

Springer Series: Data-Centric Systems and Applications, soon available from the Springer shop



Experience and Challenge papers: JDIQ now accepts two new types of papers. Experience papers describe real-world applications, datasets, and other experiences in handling poor-quality data. Challenge papers briefly describe a novel problem or challenge for the IQ community. See Author Guidelines for details.

Forthcoming Articles

Foreword from the New JDIQ Editor-in-Chief

Comparative analysis of sequence clustering methods for de-duplication of biological databases

The massive volumes of data in biological sequence databases provide a remarkable resource for large-scale biological studies. However, the underlying data quality of these resources is a critical concern. A particular problem is duplication, in which multiple records have similar sequences, creating a high level of redundancy that impacts database storage, curation, and search. Biological database de-duplication has two direct applications: database curation, where detected duplicates are removed to improve curation efficiency; and database search, where detected duplicate sequences may be flagged but remain available to support analysis. Clustering methods have been widely applied to biological sequences for database de-duplication. Given high volumes of data, exhaustive all-by-all pairwise comparison of sequences cannot scale, and thus heuristics have been used, in particular simple similarity thresholds. This heuristic introduces a trade-off between efficiency and accuracy that we explore in this paper: if the similarity threshold is very high, the methods are accurate but slow; if the similarity threshold is too low, the methods are fast but inaccurate. We study the two best-known clustering tools for sequence database de-duplication, CD-HIT and UCLUST. Our contributions include: a detailed assessment of the redundancy remaining after de-duplication; application of standard clustering evaluation metrics to quantify the cohesion and separation of the clusters generated by each method; and a biological case study that assesses intra-cluster function annotation consistency, to demonstrate the impact of these factors in practical application of the sequence clustering methods. The results show that the trade-off between efficiency and accuracy becomes acute when low threshold values are used and when cluster sizes are large. The evaluation leads to practical recommendations for more effective use of the sequence clustering tools for de-duplication.

Requirements for Data Quality Metrics

Data quality and especially the assessment of data quality have been intensively discussed in research and practice alike. To adequately support an economically oriented management of data quality and decision making under uncertainty, it is essential to assess the data quality level by means of well-founded metrics. However, if not adequately defined, these metrics can lead to wrong decisions and economic losses. Therefore, based on a decision-oriented framework, we present a set of five requirements for data quality metrics. If these requirements are met, the respective metric and its values are capable of supporting an economically oriented management of data quality and decision making under uncertainty. We further demonstrate the applicability and efficacy of these requirements by evaluating two well-known data quality metrics.

Editor in Chief (January 2014 - May 2017) Farewell Report

Ontological Multidimensional Data Models and Contextual Data Quality

Data quality assessment and data cleaning are context-dependent activities. Motivated by this observation, we propose the Ontological Multidimensional Data Model (OMD model), which can be used to model and represent contexts as logic-based ontologies. The data under assessment is mapped into the context, for additional analysis, processing, and quality data extraction. The resulting contexts allow for the representation of dimensions, and multidimensional data quality assessment becomes possible. At the core of a multidimensional context we include a generalized multidimensional data model and a Datalog+/- ontology with provably good properties in terms of query answering. These main components are used to represent dimension hierarchies, dimensional constraints, dimensional rules, and define predicates for quality data specification. Query answering relies upon and triggers navigation through dimension hierarchies, and becomes the basic tool for the extraction of quality data. The OMD model is interesting per se, beyond applications to data quality. It allows for a logic-based, and computationally tractable representation of multidimensional data, extending previous multidimensional data models with additional expressive power and functionalities.

Validating data quality actions in scoring processes

Data quality has been gaining momentum among organizations since they realized that poor data quality might cause failures and/or inefficiencies, thus compromising business processes and application results. However, enterprises often adopt data quality assessment and improvement methods based on practical and empirical approaches, without conducting a rigorous analysis of the data quality issues and the outcome of the enacted data quality improvement practices. In particular, data quality management, and especially the identification of the data quality dimensions to be monitored and improved, is left to knowledge workers on the basis of their skills and experience. Control methods are therefore designed on the basis of expected and evident quality problems, and thus they may not be effective in dealing with unknown and/or unexpected problems. This paper aims to provide a methodology, based on fault injection, for validating the data quality actions used by organizations. We show how it is possible to check whether the adopted techniques properly monitor the real issues that may damage business processes. At this stage we focus on scoring processes, i.e., processes in which the output represents the evaluation or ranking of a specific object. We show the effectiveness of our proposal by means of a case study in the financial risk management area.

Experience: Learner Analytics Data Quality for an eTextbook System

We present lessons learned related to data collection and analysis from over four years of experience with the eTextbook system OpenDSA. The use of such cyberlearning systems is expanding rapidly in both formal and informal educational settings. While the precise issues related to any such project are idiosyncratic based on the data collection technology and goals of the project, certain types of data collection problems will be common. We first describe several problems that we encountered with syntactic-level data collection. We then discuss fundamental issues with relating events to users, and tracking users over time, which are both prerequisites to converting syntactic-level interaction streams to the semantic-level behavior needed for higher-order analysis of the data. We then present examples of such behavior-level analysis, which in turn led to changes in the OpenDSA system needed to replace undesirable learning behavior with more productive behavior.

Bibliometrics

Publication Years 2009-2017
Publication Count 129
Citation Count 231
Available for Download 129
Downloads (6 weeks) 1429
Downloads (12 Months) 12348
Downloads (cumulative) 81550
Average downloads per article 632
Average citations per article 2
First Name Last Name Award
Peter Aiken ACM Senior Member (2011)
Mikhail Atallah ACM Fellows (2006)
Ahmed Elmagarmid ACM Fellows (2012), ACM Distinguished Member (2009)
Wenfei Fan ACM Fellows (2012)
Matthias Jarke ACM Fellows (2013)
Daniel S Katz ACM Senior Member (2011)
Beth A. Plale ACM Senior Member (2006)

First Name Last Name Paper Counts
Yang Lee 4
John Talburt 3
Stuart Madnick 3
Peter Edwards 3
Nan Tang 3
G Shankaranarayanan 3
Peter Christen 3
Daisy Zhe Wang 2
Kewei Sha 2
Carolyn Matheus 2
Xiaobai Li 2
Ali Sunyaev 2
Vassilios Verykios 2
Wenfei Fan 2
Felix Naumann 2
Roman Lukyanenko 2
Wolfgang Lehner 2
Dinusha Vatsalan 2
Roger Blake 2
Christan Grant 2
Arnon Rosenthal 2
Sherali Zeadally 2
Eitel Lauría 2
Ross Gayler 2
Ali Khenchaf 1
Aleksandra Mojsilović 1
Trent Rosenbloom 1
Shawn Hardenbrook 1
D Elizabeth 1
Subhash Bhalla 1
Kaushik Dutta 1
Jeffrey Parsons 1
Valerie Sessions 1
Kresimir Duretec 1
Leena Al-Hussaini 1
Pim Dietz 1
Eric Nelson 1
Manoranjan Dash 1
M Kaiser 1
Floris Geerts 1
Thomas Redman 1
David Becker 1
Xiaoming Fan 1
Giannis Haralabopoulos 1
Kyle Niemeyer 1
Arfon Smith 1
Archana Nottamkandath 1
Darryl Ahner 1
Hongwei Zhu 1
Claudio Hartmann 1
Cihan Varol 1
Coşkun Bayrak 1
David Robb 1
Rosella Gennari 1
Mark Braunstein 1
Marta Zárraga-Rodríguez 1
Peter Elkin 1
C Raj 1
Matteo Magnani 1
Hema Meda 1
Amitava Bagchi 1
Craig Fisher 1
Sufyan Ababneh 1
Jiannan Wang 1
Jianing Wang 1
Ezra Kahn 1
Adam Kriesberg 1
Sebastian Neumaier 1
Norbert Ritter 1
R Greenwood 1
Ayush Singhania 1
George Moustakides 1
Bernd Heinrich 1
Mathias Klier 1
Bing Lv 1
Paul Mangiameli 1
Marcos Gonçalves 1
Dirk Ahlers 1
Alberto Bartoli 1
Hongwei Zhu 1
James McNaull 1
Kelly Janssens 1
Judith Gelernter 1
Mouhamadoulamine Ba 1
Ciro D'Urso 1
Hua Zheng 1
Ahmed Elmagarmid 1
Michael Mannino 1
Fiona Rohde 1
Theodore Speroff 1
Elliot Fielstein 1
Yang Lee 1
Josh Attenberg 1
Judee Burgoon 1
Marco Valtorta 1
Sean Goldberg 1
Andreas Rauber 1
Sabrina Abdellaoui 1
Catherine Burns 1
David Corsar 1
Subbarao Kambhampati 1
Jeff Heflin 1
Alun Preece 1
Boris Otto 1
Alan March 1
Marilyn Tremaine 1
Christian Skalka 1
Anja Klein 1
Marco Cristo 1
Andrea Lorenzo 1
Maurizio Murgia 1
Richard Wang 1
Mario Mezzanzanica 1
Roberto Boselli 1
Luvai Motiwalla 1
Sandra Geisler 1
Daniel Katz 1
Douglas Hodson 1
Dov Biran 1
Edward Anderson 1
Pierpaolo Vittorini 1
Karthikeyan Ramamurthy 1
Ralf Tönjes 1
Laurent Lecornu 1
Shelly Sachdeva 1
Stuart Madnick 1
Foster Provost 1
Monica Tremblay 1
Debra VanderMeer 1
Nicola Ferro 1
Christian Becker 1
Chintan Amrit 1
Aseel Basheer 1
Sören Auer 1
Christoph Lange 1
Sharad Mehrotra 1
Sandra Sampaio 1
Dustin Lange 1
Therese Williams 1
Jianyong Wang 1
Chris Baillie 1
Beth Plale 1
Banda Ramadan 1
John Krogstie 1
John O’Donoghue 1
Wenjun Li 1
Davide Ceolin 1
Khoi Tran 1
Lan Cao 1
Payam Barnaghi 1
Jean Caillec 1
Arputharaj Kannan 1
Anupkumar Sen 1
Rashid Ansari 1
Fahima Nader 1
Philip Woodall 1
Shuai Ma 1
Nigel Martin 1
Diego Marcheggiani 1
Axel Polleres 1
Venkata Meduri 1
Suzanne Embury 1
Hubert Österle 1
Lizhu Zhou 1
Jeffrey Vaughan 1
Melanie Herschel 1
Huizhi Liang 1
Erhard Rahm 1
Paolo Coletti 1
Mirko Cesarini 1
Hongjiang Xu 1
Vincenzo Maltese 1
Xiaoping Liu 1
Fred Morstatter 1
Paul Groth 1
Valentina Maccatrozzo 1
A Borthick 1
Mohamed Yakout 1
Sara Tonelli 1
Kush Varshney 1
Rahul Basole 1
Jimeng Sun 1
Dmitry Chornyi 1
Danilo Montesi 1
Omar Alonso 1
Ashfaq Khokhar 1
Alan Labouseur 1
Alexandra Poulovassilis 1
Fabrizio Sebastiani 1
Peter Arbuckle 1
Yuheng Hu 1
Yi Chen 1
Robert Meusel 1
Maurice Van Keulen 1
Irit Askira Gelman 1
Stephen Chong 1
Edoardo Pignotti 1
Eric Medvet 1
Fabiano Tarlao 1
John Herbert 1
Juan Augusto 1
Maurice Mulvenna 1
Paul McCullagh 1
Fabio Mercorio 1
Fei Chiang 1
J Jha 1
Siddharth Sitaramachandran 1
Laure Berti-Équille 1
Sven Weber 1
Richard Briotta 1
Johann Freytag 1
María Bermúdez-Edo 1
Maria Alvarez 1
Panagiotis Ipeirotis 1
Milan Markovic 1
Wenyuan Yu 1
Justin St-Maurice 1
Jürgen Umbrich 1
Fabian Panse 1
Fumiko Kobayashi 1
Paolo Missier 1
Kristin Weber 1
Paul Glowalla 1
Wenyuan Yu 1
Xu Pu 1
Benjamin Ngugi 1
Beverly Kahn 1
Fausto Giunchiglia 1
Christoph Quix 1
Matthias Jarke 1
Wan Fokkink 1
Jeffrey Fisher 1
Jeremy Millar 1
Adriane Chapman 1
Hilko Donker 1
Heiko Müller 1
Steven Brown 1
Terry Clark 1
H Nehemiah 1
Matthew Jensen 1
Adir Even 1
Jay Nunamaker 1
Rachid Chalal 1
Fons Wijnhoven 1
Jeremy Debattista 1
Sushovan De 1
Dominique Ritze 1
Heiko Paulheim 1
Dezhao Song 1
Rabia Nuray-Turan 1
Dmitri Kalashnikov 1
Yinle Zhou 1
Youwei Cheah 1
Daniel Dalip 1
Pável Calado 1
Tobias Vogel 1
Arvid Heise 1
Uwe Draisbach 1
Olivier Curé 1
Claire Collins 1
Ioannis Anagnostopoulos 1
Patricia Franklin 1
Huan Liu 1
Willem Van Hage 1
Gilbert Peterson 1
Hongwei Zhu 1
Peter Aiken 1
Len Seligman 1
Robert Ulbricht 1
Martin Hahmann 1
Michael Zack 1
Nitin Joglekar 1
Mikhail Atallah 1
Yanjuan Yang 1
Paul Bowen 1
Ulf Leser 1
Irit Gelman 1
Dennis Wei 1
Ion Todoran 1

Affiliation Paper Counts
University of Padua 1
University of Illinois at Urbana-Champaign 1
Federal University of Amazonas 1
Florida State University 1
Virginia Commonwealth University 1
University of Amsterdam 1
Vanderbilt University 1
Instituto Superior Tecnico 1
University of Houston 1
Google Inc. 1
University of Leipzig 1
Hospital Universitario Austral 1
Harvard University 1
University of Colorado at Denver 1
Oklahoma City University 1
University of Rhode Island 1
State University of New York at Albany 1
Georgia State University 1
University of Antwerp 1
University of Texas at Austin 1
Oregon State University 1
Beihang University 1
University of Massachusetts System 1
Indian Institute of Science 1
Elsevier 1
University of Augsburg 1
Vienna University of Technology 1
University of South Carolina 1
Simon Fraser University 1
Memorial University of Newfoundland 1
Boston University 1
Technical University of Munich 1
Butler University 1
University of Maryland 1
Italian National Research Council 1
New Jersey Institute of Technology 1
National Institute of Standards and Technology 1
Cardiff University 1
Sam Houston State University 1
University College Cork 1
Microsoft Corporation 1
Ben-Gurion University of the Negev 1
Charleston Southern University 1
Commonwealth Scientific and Industrial Research Organization 1
Rutgers, The State University of New Jersey 1
University of Cambridge 1
University of Patras 1
Hellenic Open University 1
University of Baghdad 1
Universite Paris-Est 1
Lehigh University 2
USDA ARS Beltsville Agricultural Research Center 2
Humboldt University of Berlin 2
Fraunhofer Institute for Applied Information Technology 2
Nanyang Technological University 2
Old Dominion University 2
Suffolk University 2
Free University of Bozen-Bolzano 2
University of Innsbruck 2
University of Arizona 2
Norwegian University of Science and Technology 2
University of Waterloo 2
University of Kentucky 2
University of Trento 2
RWTH Aachen University 2
University of Toronto 2
University of Surrey 2
Indiana University 2
New York University 2
Massachusetts Institute of Technology 2
University of Massachusetts Boston 2
University of Bologna 2
University of Hamburg 2
Federal University of Minas Gerais 2
University of Oklahoma 2
University of Queensland 2
University of Aizu 2
McMaster University 2
Universidad de Navarra 2
Indian Institute of Management Calcutta 2
Vienna University of Economics and Business Administration 3
University of Massachusetts Medical School 3
University of Mannheim 3
University of California, Irvine 3
University of Bonn 3
Birkbeck University of London 3
Purdue University 3
Telecom Bretagne 3
Georgia Institute of Technology 3
University of Cologne 3
Babson College 3
University of Thessaly 3
University of St. Gallen 3
Northeastern University 3
Ecole nationale superieure d'Informatique 3
University of Manchester 4
Vrije Universiteit Amsterdam 4
University of Milan - Bicocca 4
University of Florida 4
IBM Thomas J. Watson Research Center 4
Technical University of Dresden 4
University of Trieste 4
United States Air Force Institute of Technology 4
University of Twente 4
University of Ulster 4
Anna University 4
United States Department of Veterans Affairs 4
University of Edinburgh 4
University of Illinois at Chicago 4
Qatar Computing Research Institute 4
Florida International University 5
University of Massachusetts Lowell 5
Marist College 5
Tsinghua University 5
Arizona State University 5
MITRE Corporation 5
Hasso-Plattner-Institut für Softwaresystemtechnik GmbH 6
University of Aberdeen 7
University of Arkansas at Little Rock 8
Australian National University 9