ACM Journal of

Data and Information Quality (JDIQ)

Latest Articles

The Challenge of Quality in Social Computation


Hunspell is a morphological spell checker and automatic corrector for Macintosh 10.6 and later versions. Aspell is a general spell checker and automatic corrector for the GNU operating system. In this experience article, we present a benchmarking study of the performance of Hunspell and Aspell. Ginger is a general grammatical spell checker that is...


In the big data era, data integration is becoming increasingly important. It is usually handled by data flow processes that extract, transform, and clean data from several sources, and populate the data integration system (DIS). Designing data flows poses several challenges. In this article, we deal with data quality issues such as (1)...

An Exploratory Case Study to Understand Primary Care Users and Their Data Quality Tradeoffs

Primary care data is an important part of the evolving healthcare ecosystem. Generally, users in primary care are expected to provide excellent...

Dependable Data Repairing with Fixing Rules

One of the main challenges that data-cleaning systems face is to automatically identify and repair data errors in a dependable manner. Though data dependencies (also known as integrity constraints) have been widely studied to capture errors in data, automated and dependable data repairing on these errors has remained a notoriously difficult...


March, 2017 -- Call for Papers: Special issue on Reproducibility in Information Retrieval. Submission deadline: September 8, 2017 

Feb. 2017 -- Call for Papers: 
Special Issue on Improving the Veracity and Value of Big Data 
Extended submission deadline: April 1, 2017

Jan. 2016 -- New Book Announcement
Carlo Batini and Monica Scannapieco have a new book:

Data and Information Quality: Dimensions, Principles and Techniques 

Springer Series: Data-Centric Systems and Applications, soon available from the Springer shop


Experience and Challenge papers: JDIQ now accepts two new types of papers. Experience papers describe real-world applications, datasets, and other experiences in handling poor-quality data. Challenge papers briefly describe a novel problem or challenge for the IQ community. See Author Guidelines for details.

Forthcoming Articles
Cluster-based Quality-Aware Adaptive Data Compression for Streaming Data

Wireless sensor networks are widely applied in data collection applications. Energy efficiency is one of the most important design goals. In this paper, we propose QAAC, Quality-Assured Adaptive data Compression, to reduce the amount of data communication and thereby save energy. QAAC first builds clusters from the dataset using an adaptive clustering algorithm; then a code for each cluster is generated and stored in a Huffman encoding tree, which is used to encode the original dataset in an encoding algorithm with an improvement approach. After the encoded data, the Huffman encoding tree, and the parameters used in the improvement algorithm have been received at the sink, a decompression algorithm is used to retrieve an approximation of the original dataset. The performance evaluation shows that QAAC is efficient and achieves a much higher compression ratio than the compared lossy and lossless compression algorithms, and much less information loss than the compared lossy compression algorithms.
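
The cluster-then-encode idea in this abstract can be illustrated with a minimal sketch. This is not the authors' QAAC implementation: the greedy clustering below is an assumed stand-in for the paper's adaptive clustering algorithm, the "improvement approach" is omitted, and all function names and the eps tolerance are hypothetical. Each reading is replaced by its cluster label, the labels are Huffman-coded so that frequent clusters get short codewords, and the receiver rebuilds an approximation from the cluster centres.

```python
import heapq
from collections import Counter


def cluster(values, eps):
    """Greedy clustering (assumed stand-in): a value joins the first
    centre within eps of it, otherwise it starts a new cluster."""
    centres, labels = [], []
    for v in values:
        for i, c in enumerate(centres):
            if abs(v - c) <= eps:
                labels.append(i)
                break
        else:
            centres.append(v)
            labels.append(len(centres) - 1)
    return centres, labels


def huffman_codes(labels):
    """Build a prefix code; frequent cluster labels get short codewords."""
    freq = Counter(labels)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {label: partial codeword}).
    heap = [(w, i, {lab: ""}) for i, (lab, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {lab: "0" + code for lab, code in a.items()}
        merged.update({lab: "1" + code for lab, code in b.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def compress(values, eps):
    """Return the bit string plus the code table and centres the sink needs."""
    centres, labels = cluster(values, eps)
    codes = huffman_codes(labels)
    bits = "".join(codes[lab] for lab in labels)
    return bits, codes, centres


def decompress(bits, codes, centres):
    """Decode the prefix code and map each label back to its cluster centre."""
    inverse = {code: lab for lab, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(centres[inverse[current]])
            current = ""
    return out


# Hypothetical sensor readings; eps bounds the per-value reconstruction error.
readings = [20.0, 20.1, 20.2, 25.0, 25.1, 20.05, 30.0, 20.15]
eps = 0.5
bits, codes, centres = compress(readings, eps)
restored = decompress(bits, codes, centres)
```

In this sketch the reconstruction is lossy but bounded: every restored value is the centre of the cluster its original joined, so it differs from the original by at most eps. A larger eps yields fewer clusters and shorter codewords at the cost of more information loss.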

On the Effects of Low-Quality Training Data on Information Extraction from Clinical Reports

In the last five years there has been a flurry of work on information extraction from clinical documents, i.e., on algorithms capable of extracting, from the informal and unstructured texts that are generated during everyday clinical practice, mentions of concepts relevant to such practice. Most of this literature is about methods based on supervised learning, i.e., methods for training an information extraction system from manually annotated examples. While a lot of work has been devoted to devising learning methods that generate more and more accurate information extractors, no work has been devoted to investigating the effect of the quality of training data on the learning process. Low quality in training data often derives from the fact that the person who has annotated the data is different from the one against whose judgment the automatically annotated data must be evaluated. In this paper we test the impact of such data quality issues on the accuracy of information extraction systems as applied to the clinical domain. We do this by comparing the accuracy deriving from training data annotated by the authoritative coder (i.e., the one who has also annotated the test data, and by whose judgment we must abide), with the accuracy deriving from training data annotated by a different coder. The results indicate that, although the disagreement between the two coders (as measured on the training set) is substantial, the difference is (surprisingly enough) not always statistically significant.

Challenge Paper: Challenges to Sharing Data and Models for Life Cycle Assessment

Life Cycle Assessment (LCA) is a modeling approach to address the environmental aspects and potential environmental impacts (e.g., use of resources and the environmental consequences of releases) throughout a product's life cycle, from raw material acquisition through production, use, end-of-life treatment, recycling, and final disposal (i.e., cradle-to-grave). The LCA community is faced with a major challenge in its capacity to produce sufficient documentation and metadata to determine the representation of LCA models and to reuse them correctly. This challenge in capacity is driven by two factors: the nascent state of standardization in LCA modeling and the strong focus on research and publishing results for funded LCA work. The USDA's National Agricultural Library (NAL) is dedicated to data management, access, and preservation. Its mission enables it to focus on informatics-related challenges that others may not have the expertise, capacity, or funding to address. The NAL is contributing solutions to LCA's documentation challenge by implementing a synthesis of the most complete LCA formats into a balanced metadata structure. The NAL also publishes a repository of LCA research data. Building capacity to develop high-quality data, supported by comprehensive metadata and documentation, requires a community of LCA researchers and practitioners who are dedicated to following best practices, and an appreciation of the value realized through well-described datasets. As a government organization with a mission dedicated to providing access to quality data, the NAL will continue to develop and support this community of practice.

Challenges of Open Data Quality: More Than Just License, Format, and Customer Support

As the number of open data initiatives continues to increase, there is a growing recognition within the open data community of a need to shift from focusing on data publication to also consider issues such as data coverage, openness, and quality. Here we outline challenges related to the quality of open data, including: assisting data publishers with understanding and utilising quality dimensions and assessment methods, as well as how to use the results of quality assessment; and exploring the sharing and reuse of quality metrics across datasets, tools, and publishers.


Publication Years 2009-2017
Publication Count 123
Citation Count 209
Available for Download 123
Downloads (6 weeks) 1114
Downloads (12 Months) 12408
Downloads (cumulative) 79039
Average downloads per article 643
Average citations per article 2
First Name Last Name Award
Peter Aiken ACM Senior Member (2011)
Mikhail Atallah ACM Fellows (2006)
Ahmed Elmagarmid ACM Fellows (2012), ACM Distinguished Member (2009)
Wenfei Fan ACM Fellows (2012)
Matthias Jarke ACM Fellows (2013)
Daniel S Katz ACM Senior Member (2011)
Beth A. Plale ACM Senior Member (2006)

First Name Last Name Paper Counts
Yang Lee 4
Peter Christen 3
John Talburt 3
G Shankaranarayanan 3
Stuart Madnick 3
Nan Tang 3
Ross Gayler 2
Dinusha Vatsalan 2
Wolfgang Lehner 2
Daisy Zhe Wang 2
Ali Sunyaev 2
Vassilios Verykios 2
Wenfei Fan 2
Peter Edwards 2
Felix Naumann 2
Roman Lukyanenko 2
Roger Blake 2
Arnon Rosenthal 2
Sherali Zeadally 2
Eitel Lauría 2
Carolyn Matheus 2
Xiaobai Li 2
Christan Grant 2
John Krogstie 1
Banda Ramadan 1
John O’Donoghue 1
Wenjun Li 1
Davide Ceolin 1
Khoi Tran 1
Lan Cao 1
Jean Caillec 1
Payam Barnaghi 1
Arputharaj Kannan 1
Anupkumar Sen 1
Rashid Ansari 1
Fahima Nader 1
Philip Woodall 1
Shuai Ma 1
Axel Polleres 1
Nigel Martin 1
Venkata Meduri 1
Suzanne Embury 1
Hubert Österle 1
Erhard Rahm 1
Lizhu Zhou 1
Jeffrey Vaughan 1
Melanie Herschel 1
Huizhi Liang 1
Paolo Coletti 1
Mirko Cesarini 1
Hongjiang Xu 1
Vincenzo Maltese 1
Xiaoping Liu 1
Paul Groth 1
Valentina Maccatrozzo 1
Fred Morstatter 1
A Borthick 1
Mohamed Yakout 1
Kush Varshney 1
Rahul Basole 1
Jimeng Sun 1
Sara Tonelli 1
Dmitry Chornyi 1
Danilo Montesi 1
Omar Alonso 1
Ashfaq Khokhar 1
Alan Labouseur 1
Alexandra Poulovassilis 1
Yuheng Hu 1
Yi Chen 1
Robert Meusel 1
Maurice Van Keulen 1
Irit Askira Gelman 1
Stephen Chong 1
Edoardo Pignotti 1
Fabiano Tarlao 1
Eric Medvet 1
John Herbert 1
Paul McCullagh 1
Juan Augusto 1
Maurice Mulvenna 1
Fabio Mercorio 1
Fei Chiang 1
Siddharth Sitaramachandran 1
Laure Berti-Équille 1
J Jha 1
Sven Weber 1
Richard Briotta 1
Johann Freytag 1
María Bermúdez-Edo 1
Maria Alvarez 1
Panagiotis Ipeirotis 1
Milan Markovic 1
Wenyuan Yu 1
Jürgen Umbrich 1
Fabian Panse 1
Fumiko Kobayashi 1
Paolo Missier 1
Kristin Weber 1
Paul Glowalla 1
Wenyuan Yu 1
Xu Pu 1
Benjamin Ngugi 1
Beverly Kahn 1
Fausto Giunchiglia 1
Wan Fokkink 1
Jeffrey Fisher 1
Christoph Quix 1
Matthias Jarke 1
Jeremy Millar 1
Hilko Donker 1
Adriane Chapman 1
Heiko Müller 1
Terry Clark 1
H Nehemiah 1
Steven Brown 1
Matthew Jensen 1
Jay Nunamaker 1
Rachid Chalal 1
Fons Wijnhoven 1
Jeremy Debattista 1
Sushovan De 1
Dominique Ritze 1
Heiko Paulheim 1
Rabia Nuray-Turan 1
Dezhao Song 1
Dmitri Kalashnikov 1
Yinle Zhou 1
Daniel Dalip 1
Pável Calado 1
Tobias Vogel 1
Arvid Heise 1
Uwe Draisbach 1
Youwei Cheah 1
Olivier Curé 1
Claire Collins 1
Ioannis Anagnostopoulos 1
Patricia Franklin 1
Willem Van Hage 1
Huan Liu 1
Gilbert Peterson 1
Robert Ulbricht 1
Martin Hahmann 1
Hongwei Zhu 1
Peter Aiken 1
Len Seligman 1
Michael Zack 1
Nitin Joglekar 1
Yanjuan Yang 1
Mikhail Atallah 1
Paul Bowen 1
Ulf Leser 1
Dennis Wei 1
Aleksandra Mojsilović 1
Irit Gelman 1
Ion Todoran 1
Ali Khenchaf 1
D Elizabeth 1
Trent Rosenbloom 1
Shawn Hardenbrook 1
Subhash Bhalla 1
Adir Even 1
Valerie Sessions 1
Kresimir Duretec 1
Kaushik Dutta 1
Jeffrey Parsons 1
Leena Al-Hussaini 1
Pim Dietz 1
Eric Nelson 1
Manoranjan Dash 1
M Kaiser 1
Floris Geerts 1
Thomas Redman 1
David Becker 1
Xiaoming Fan 1
Giannis Haralabopoulos 1
Archana Nottamkandath 1
Kyle Niemeyer 1
Arfon Smith 1
Darryl Ahner 1
Claudio Hartmann 1
Hongwei Zhu 1
Cihan Varol 1
Coşkun Bayrak 1
David Robb 1
Mark Braunstein 1
Rosella Gennari 1
Marta Zárraga-Rodríguez 1
Peter Elkin 1
C Raj 1
Amitava Bagchi 1
Hema Meda 1
Matteo Magnani 1
Craig Fisher 1
Sufyan Ababneh 1
Jiannan Wang 1
Sebastian Neumaier 1
Jianing Wang 1
Norbert Ritter 1
R Greenwood 1
Ayush Singhania 1
George Moustakides 1
Bernd Heinrich 1
Mathias Klier 1
Marcos Gonçalves 1
Hongwei Zhu 1
Bing Lv 1
Paul Mangiameli 1
Dirk Ahlers 1
Alberto Bartoli 1
James McNaull 1
Kelly Janssens 1
Mouhamadoulamine Ba 1
Judith Gelernter 1
Ciro D'Urso 1
Hua Zheng 1
Michael Mannino 1
Ahmed Elmagarmid 1
Fiona Rohde 1
Kewei Sha 1
Elliot Fielstein 1
Theodore Speroff 1
Yang Lee 1
Josh Attenberg 1
Marco Valtorta 1
Andreas Rauber 1
Judee Burgoon 1
Sean Goldberg 1
Sabrina Abdellaoui 1
Subbarao Kambhampati 1
Jeff Heflin 1
Alun Preece 1
Anja Klein 1
Boris Otto 1
Alan March 1
Marco Cristo 1
Richard Wang 1
Marilyn Tremaine 1
Christian Skalka 1
Andrea Lorenzo 1
Maurizio Murgia 1
Mario Mezzanzanica 1
Roberto Boselli 1
Luvai Motiwalla 1
Daniel Katz 1
Sandra Geisler 1
Douglas Hodson 1
Dov Biran 1
Edward Anderson 1
Karthikeyan Ramamurthy 1
Pierpaolo Vittorini 1
Ralf Tönjes 1
Laurent Lecornu 1
Shelly Sachdeva 1
Stuart Madnick 1
Foster Provost 1
Nicola Ferro 1
Christian Becker 1
Monica Tremblay 1
Debra Vandermeer 1
Chintan Amrit 1
Sören Auer 1
Christoph Lange 1
Sharad Mehrotra 1
Sandra Sampaio 1
Dustin Lange 1
Therese Williams 1
Jianyong Wang 1
Chris Baillie 1
Beth Plale 1

Affiliation Paper Counts
University of Padua 1
University of Illinois at Urbana-Champaign 1
Federal University of Amazonas 1
Florida State University 1
Virginia Commonwealth University 1
Vanderbilt University 1
Instituto Superior Tecnico 1
Google Inc. 1
University of Leipzig 1
Hospital Universitario Austral 1
Harvard University 1
University of Colorado at Denver 1
Oklahoma City University 1
University of Rhode Island 1
State University of New York at Albany 1
Georgia State University 1
University of Antwerp 1
University of Texas at Austin 1
Oregon State University 1
Beihang University 1
University of Massachusetts System 1
Indian Institute of Science 1
Elsevier 1
University of Augsburg 1
Vienna University of Technology 1
University of South Carolina 1
Simon Fraser University 1
Memorial University of Newfoundland 1
Boston University 1
Technical University of Munich 1
Butler University 1
New Jersey Institute of Technology 1
National Institute of Standards and Technology 1
Cardiff University 1
Sam Houston State University 1
University College Cork 1
Microsoft Corporation 1
Ben-Gurion University of the Negev 1
Charleston Southern University 1
Commonwealth Scientific and Industrial Research Organization 1
Rutgers, The State University of New Jersey 1
University of Cambridge 1
University of Patras 1
Hellenic Open University 1
Universite Paris-Est 1
Lehigh University 2
Humboldt University of Berlin 2
Fraunhofer Institute for Applied Information Technology 2
Nanyang Technological University 2
Old Dominion University 2
Suffolk University 2
Free University of Bozen-Bolzano 2
University of Innsbruck 2
University of Arizona 2
Norwegian University of Science and Technology 2
University of Kentucky 2
University of Trento 2
RWTH Aachen University 2
University of Toronto 2
University of Surrey 2
Indiana University 2
New York University 2
Massachusetts Institute of Technology 2
University of Massachusetts Boston 2
University of Bologna 2
University of Hamburg 2
Federal University of Minas Gerais 2
University of Oklahoma 2
University of Queensland 2
University of Aizu 2
McMaster University 2
Universidad de Navarra 2
Indian Institute of Management Calcutta 2
Vienna University of Economics and Business Administration 3
University of Massachusetts Medical School 3
University of Mannheim 3
University of California, Irvine 3
University of Bonn 3
Birkbeck University of London 3
Purdue University 3
Telecom Bretagne 3
Georgia Institute of Technology 3
University of Cologne 3
Babson College 3
University of Thessaly 3
University of St. Gallen 3
Northeastern University 3
Ecole nationale superieure d'Informatique 3
University of Manchester 4
Vrije Universiteit Amsterdam 4
University of Milan - Bicocca 4
University of Florida 4
IBM Thomas J. Watson Research Center 4
Technical University of Dresden 4
University of Trieste 4
United States Air Force Institute of Technology 4
University of Twente 4
University of Ulster 4
Anna University 4
United States Department of Veterans Affairs 4
University of Edinburgh 4
University of Illinois at Chicago 4
Qatar Computing Research Institute 4
Florida International University 5
University of Massachusetts Lowell 5
Marist College 5
Arizona State University 5
MITRE Corporation 5
University of Aberdeen 5
Tsinghua University 5
Hasso-Plattner-Institut für Softwaresystemtechnik GmbH 6
University of Arkansas at Little Rock 8
Australian National University 9

Journal of Data and Information Quality (JDIQ) - Challenge Papers, Experience Paper and Research Papers

Volume 8 Issue 3-4, July 2017 Challenge Papers, Experience Paper and Research Papers
Volume 8 Issue 2, February 2017 Challenge Papers and Research Papers

Volume 8 Issue 1, November 2016 Special Issue on Web Data Quality
Volume 7 Issue 4, October 2016 Challenge Papers and Regular Papers
Volume 7 Issue 3, September 2016 Research Paper, Challenge Papers and Experience Paper
Volume 7 Issue 1-2, June 2016 Challenge Papers, Regular Papers and Experience Paper

Volume 6 Issue 4, October 2015 Challenge Papers and Regular Papers
Volume 6 Issue 2-3, July 2015
Volume 6 Issue 1, March 2015
Volume 5 Issue 4, February 2015
Volume 5 Issue 3, February 2015 Special Issue on Provenance, Data and Information Quality

Volume 5 Issue 1-2, August 2014
Volume 4 Issue 4, May 2014

Volume 4 Issue 3, May 2013
Volume 4 Issue 2, March 2013 Special Issue on Entity Resolution

Volume 4 Issue 1, October 2012
Volume 3 Issue 4, September 2012
Volume 3 Issue 3, August 2012
Volume 3 Issue 2, May 2012
Volume 3 Issue 1, April 2012
Volume 2 Issue 4, February 2012

Volume 2 Issue 3, December 2011
Volume 2 Issue 2, February 2011

Volume 2 Issue 1, July 2010

Volume 1 Issue 3, December 2009
Volume 1 Issue 2, September 2009
Volume 1 Issue 1, June 2009