ACM Journal of Data and Information Quality (JDIQ)

Latest Articles

An Introduction to Dynamic Data Quality Challenges

The Challenge of Test Data Quality in Data Processing

From Content to Context

Research in data and information quality has made significant strides over the last 20 years. It has become a unified body of knowledge incorporating techniques, methods, and applications from a variety of disciplines including information systems, computer science, operations management, organizational behavior, psychology, and statistics. With...

A Probabilistically Integrated System for Crowd-Assisted Text Labeling and Extraction

The amount of text data has been growing exponentially in recent years, giving rise to automatic information extraction methods that store text...

NEWS

March 2017 -- Call for Papers: Special Issue on Reproducibility in Information Retrieval. Submission deadline: September 8, 2017

Feb. 2017 -- Call for Papers: Special Issue on Improving the Veracity and Value of Big Data. Extended submission deadline: April 1, 2017

Jan. 2016 -- New Book Announcement: Carlo Batini and Monica Scannapieco have a new book, Data and Information Quality: Dimensions, Principles and Techniques, in the Springer series Data-Centric Systems and Applications. It will soon be available from the Springer shop.

The Springer flyer is available here


Experience and Challenge papers: JDIQ now accepts two new types of papers. Experience papers describe real-world applications, datasets, and other experiences in handling poor-quality data. Challenge papers briefly describe a novel problem or challenge for the IQ community. See Author Guidelines for details.

Forthcoming Articles
On the Effects of Low-Quality Training Data on Information Extraction from Clinical Reports

In the last five years there has been a flurry of work on information extraction from clinical documents, i.e., on algorithms capable of extracting, from the informal and unstructured texts that are generated during everyday clinical practice, mentions of concepts relevant to such practice. Most of this literature is about methods based on supervised learning, i.e., methods for training an information extraction system from manually annotated examples. While a lot of work has been devoted to devising learning methods that generate more and more accurate information extractors, no work has been devoted to investigating the effect of the quality of training data on the learning process. Low quality in training data often derives from the fact that the person who has annotated the data is different from the one against whose judgment the automatically annotated data must be evaluated. In this paper we test the impact of such data quality issues on the accuracy of information extraction systems as applied to the clinical domain. We do this by comparing the accuracy deriving from training data annotated by the authoritative coder (i.e., the one who has also annotated the test data, and by whose judgment we must abide), with the accuracy deriving from training data annotated by a different coder. The results indicate that, although the disagreement between the two coders (as measured on the training set) is substantial, the difference is (surprisingly enough) not always statistically significant.

Challenge Paper: Challenges to Sharing Data and Models for Life Cycle Assessment

Life Cycle Assessment (LCA) is a modeling approach to address the environmental aspects and potential environmental impacts (e.g., use of resources and the environmental consequences of releases) throughout a product's life cycle, from raw material acquisition through production, use, end-of-life treatment, recycling, and final disposal (i.e., cradle-to-grave). The LCA community is faced with a major challenge in its capacity to produce sufficient documentation and metadata to determine representation of LCA models and to reuse them correctly. This challenge in capacity is driven by two factors: the nascent state of standardization in LCA modeling and the strong focus on research and publishing results for funded LCA work. The USDA's National Agricultural Library (NAL) is dedicated to data management, access, and preservation. Its mission enables it to focus on informatics-related challenges that others may not have the expertise, capacity, or funding to address. The NAL is contributing solutions to LCA's documentation challenge by implementing a synthesis of the most complete LCA formats into a balanced metadata structure. The NAL also publishes a repository of LCA research data at www.lcacommons.gov. Building capacity to develop high-quality data, supported by comprehensive metadata and documentation, requires a community of LCA researchers and practitioners who are dedicated to following best practices, and an appreciation of the value realized through well-described datasets. As a government organization with a mission dedicated to providing access to quality data, the NAL will continue to develop and support this community of practice.

Challenges of Open Data Quality: More Than Just License, Format, and Customer Support

As the number of open data initiatives continues to increase, there is a growing recognition within the open data community of a need to shift from focusing on data publication to also consider issues such as data coverage, openness, and quality. Here we outline challenges related to the quality of open data, including: assisting data publishers with understanding and utilising quality dimensions and assessment methods, as well as how to use the results of quality assessment; and exploring the sharing and reuse of quality metrics across datasets, tools, and publishers.

Bibliometrics

Publication Years: 2009-2017
Publication Count: 117
Citation Count: 199
Available for Download: 117
Downloads (6 weeks): 1151
Downloads (12 Months): 12687
Downloads (cumulative): 78297
Average downloads per article: 669
Average citations per article: 2
First Name Last Name Award
Peter Aiken ACM Senior Member (2011)
Ahmed Elmagarmid ACM Distinguished Member (2009)
Daniel S Katz ACM Senior Member (2011)
Beth A. Plale ACM Senior Member (2006)

First Name Last Name Paper Counts
Yang Lee 4
John Talburt 3
Peter Christen 3
G Shankaranarayanan 3
Stuart Madnick 3
Roger Blake 2
Christan Grant 2
Ross Gayler 2
Dinusha Vatsalan 2
Wolfgang Lehner 2
Daisy Zhe Wang 2
Vassilios Verykios 2
Nan Tang 2
Wenfei Fan 2
Felix Naumann 2
Roman Lukyanenko 2
Sherali Zeadally 2
Eitel Lauría 2
Carolyn Matheus 2
Xiaobai Li 2
Ali Sunyaev 2
Arnon Rosenthal 2
Daniel Katz 1
Douglas Hodson 1
Edward Anderson 1
Dov Biran 1
Karthikeyan Ramamurthy 1
Ralf Tönjes 1
Laurent Lecornu 1
Pierpaolo Vittorini 1
Shelly Sachdeva 1
Stuart Madnick 1
Monica Tremblay 1
Debra Vandermeer 1
Foster Provost 1
Nicola Ferro 1
Christian Becker 1
Chintan Amrit 1
Sören Auer 1
Christoph Lange 1
Sharad Mehrotra 1
Sandra Sampaio 1
Dustin Lange 1
Therese Williams 1
Jianyong Wang 1
Chris Baillie 1
Peter Edwards 1
Beth Plale 1
John Krogstie 1
Banda Ramadan 1
John O’Donoghue 1
Wenjun Li 1
Davide Ceolin 1
Khoi Tran 1
Lan Cao 1
Payam Barnaghi 1
Jean Caillec 1
Arputharaj Kannan 1
Rashid Ansari 1
Shuai Ma 1
Nigel Martin 1
Axel Polleres 1
Venkata Meduri 1
Suzanne Embury 1
Hubert Österle 1
Erhard Rahm 1
Lizhu Zhou 1
Jeffrey Vaughan 1
Melanie Herschel 1
Huizhi Liang 1
Paolo Coletti 1
Anupkumar Sen 1
Mirko Cesarini 1
Hongjiang Xu 1
Vincenzo Maltese 1
Xiaoping Liu 1
Fred Morstatter 1
Paul Groth 1
Valentina Maccatrozzo 1
Mohamed Yakout 1
A Borthick 1
Kush Varshney 1
Rahul Basole 1
Jimeng Sun 1
Sara Tonelli 1
Dmitry Chornyi 1
Danilo Montesi 1
Ashfaq Khokhar 1
Alan Labouseur 1
Alexandra Poulovassilis 1
Yuheng Hu 1
Yi Chen 1
Robert Meusel 1
Maurice Van Keulen 1
Irit Askira Gelman 1
Stephen Chong 1
Edoardo Pignotti 1
Eric Medvet 1
Fabiano Tarlao 1
Omar Alonso 1
John Herbert 1
Juan Augusto 1
Maurice Mulvenna 1
Paul McCullagh 1
Fabio Mercorio 1
Fei Chiang 1
Siddharth Sitaramachandran 1
J Jha 1
Laure Berti-Équille 1
Sven Weber 1
Johann Freytag 1
Richard Briotta 1
María Bermúdez-Edo 1
Maria Alvarez 1
Panagiotis Ipeirotis 1
Wenyuan Yu 1
Jürgen Umbrich 1
Fabian Panse 1
Fumiko Kobayashi 1
Paolo Missier 1
Kristin Weber 1
Paul Glowalla 1
Wenyuan Yu 1
Xu Pu 1
Benjamin Ngugi 1
Beverly Kahn 1
Fausto Giunchiglia 1
Christoph Quix 1
Matthias Jarke 1
Wan Fokkink 1
Jeffrey Fisher 1
Adriane Chapman 1
Jeremy Millar 1
Hilko Donker 1
Heiko Müller 1
Steven Brown 1
Terry Clark 1
H Nehemiah 1
Matthew Jensen 1
Jay Nunamaker 1
Adir Even 1
Fons Wijnhoven 1
Jeremy Debattista 1
Sushovan De 1
Dominique Ritze 1
Heiko Paulheim 1
Dezhao Song 1
Rabia Nuray-Turan 1
Dmitri Kalashnikov 1
Yinle Zhou 1
Daniel Dalip 1
Pável Calado 1
Tobias Vogel 1
Arvid Heise 1
Uwe Draisbach 1
Youwei Cheah 1
Olivier Curé 1
Claire Collins 1
Ioannis Anagnostopoulos 1
Patricia Franklin 1
Huan Liu 1
Willem Van Hage 1
Len Seligman 1
Gilbert Peterson 1
Robert Ulbricht 1
Martin Hahmann 1
Hongwei Zhu 1
Nitin Joglekar 1
Ulf Leser 1
Irit Gelman 1
Mikhail Atallah 1
Yanjuan Yang 1
Peter Aiken 1
Michael Zack 1
Paul Bowen 1
Dennis Wei 1
Aleksandra Mojsilović 1
Ion Todoran 1
Ali Khenchaf 1
Shawn Hardenbrook 1
Trent Rosenbloom 1
Subhash Bhalla 1
D Elizabeth 1
Kaushik Dutta 1
Valerie Sessions 1
Kresimir Duretec 1
Pim Dietz 1
Eric Nelson 1
Manoranjan Dash 1
M Kaiser 1
Floris Geerts 1
Thomas Redman 1
David Becker 1
Xiaoming Fan 1
Jeffrey Parsons 1
Giannis Haralabopoulos 1
Kyle Niemeyer 1
Arfon Smith 1
Archana Nottamkandath 1
Darryl Ahner 1
Claudio Hartmann 1
Hongwei Zhu 1
Cihan Varol 1
Coşkun Bayrak 1
David Robb 1
Mark Braunstein 1
Rosella Gennari 1
Marta Zárraga-Rodríguez 1
Peter Elkin 1
C Raj 1
Matteo Magnani 1
Craig Fisher 1
Sufyan Ababneh 1
Jianing Wang 1
Sebastian Neumaier 1
Norbert Ritter 1
R Greenwood 1
Ayush Singhania 1
George Moustakides 1
Hongwei Zhu 1
Bernd Heinrich 1
Mathias Klier 1
Marcos Gonçalves 1
Bing Lv 1
Paul Mangiameli 1
Dirk Ahlers 1
Alberto Bartoli 1
Hema Meda 1
Amitava Bagchi 1
James McNaull 1
Kelly Janssens 1
Judith Gelernter 1
Mouhamadou Lamine Ba 1
Ciro D'Urso 1
Hua Zheng 1
Ahmed Elmagarmid 1
Michael Mannino 1
Fiona Rohde 1
Kewei Sha 1
Elliot Fielstein 1
Theodore Speroff 1
Yang Lee 1
Judee Burgoon 1
Josh Attenberg 1
Marco Valtorta 1
Sean Goldberg 1
Andreas Rauber 1
Subbarao Kambhampati 1
Jeff Heflin 1
Alun Preece 1
Anja Klein 1
Boris Otto 1
Richard Wang 1
Alan March 1
Marco Cristo 1
Marilyn Tremaine 1
Christian Skalka 1
Andrea Lorenzo 1
Maurizio Murgia 1
Mario Mezzanzanica 1
Roberto Boselli 1
Luvai Motiwalla 1
Sandra Geisler 1

Affiliation Paper Counts
University of Padua 1
Université Paris-Est 1
Federal University of Amazonas 1
Florida State University 1
Virginia Commonwealth University 1
Vanderbilt University 1
Instituto Superior Técnico 1
Google Inc. 1
University of Leipzig 1
Hospital Universitario Austral 1
Harvard University 1
University of Colorado at Denver 1
Oklahoma City University 1
University of Rhode Island 1
State University of New York at Albany 1
Georgia State University 1
University of Antwerp 1
University of Texas at Austin 1
Oregon State University 1
Beihang University 1
University of Massachusetts System 1
Indian Institute of Science 1
Elsevier 1
University of Augsburg 1
Vienna University of Technology 1
University of South Carolina 1
Memorial University of Newfoundland 1
Boston University 1
Technical University of Munich 1
Butler University 1
New Jersey Institute of Technology 1
National Institute of Standards and Technology 1
Cardiff University 1
Sam Houston State University 1
University College Cork 1
Microsoft Corporation 1
Ben-Gurion University of the Negev 1
Charleston Southern University 1
Commonwealth Scientific and Industrial Research Organization 1
Rutgers, The State University of New Jersey 1
University of Patras 1
Hellenic Open University 1
University of Illinois at Urbana-Champaign 1
Lehigh University 2
Humboldt University of Berlin 2
Fraunhofer Institute for Applied Information Technology 2
Nanyang Technological University 2
Old Dominion University 2
Suffolk University 2
Free University of Bozen-Bolzano 2
University of Innsbruck 2
University of Arizona 2
Norwegian University of Science and Technology 2
University of Kentucky 2
University of Trento 2
RWTH Aachen University 2
University of Toronto 2
University of Surrey 2
Indiana University 2
New York University 2
Massachusetts Institute of Technology 2
University of Massachusetts Boston 2
University of Bologna 2
University of Hamburg 2
Federal University of Minas Gerais 2
University of Oklahoma 2
University of Queensland 2
University of Aizu 2
McMaster University 2
Universidad de Navarra 2
Indian Institute of Management Calcutta 2
Vienna University of Economics and Business Administration 3
University of Massachusetts Medical School 3
Qatar Computing Research Institute 3
Northeastern University 3
University of St. Gallen 3
University of Edinburgh 3
University of Thessaly 3
Babson College 3
University of Cologne 3
Georgia Institute of Technology 3
University of Aberdeen 3
Telecom Bretagne 3
Purdue University 3
Birkbeck University of London 3
University of Bonn 3
University of California, Irvine 3
University of Mannheim 3
IBM Thomas J. Watson Research Center 4
University of Manchester 4
University of Twente 4
University of Ulster 4
Anna University 4
Vrije Universiteit Amsterdam 4
United States Department of Veterans Affairs 4
University of Illinois at Chicago 4
University of Milan - Bicocca 4
Technical University of Dresden 4
University of Trieste 4
United States Air Force Institute of Technology 4
University of Florida 4
University of Massachusetts Lowell 5
Marist College 5
MITRE Corporation 5
Tsinghua University 5
Florida International University 5
Arizona State University 5
Hasso-Plattner-Institut für Softwaresystemtechnik GmbH 6
University of Arkansas at Little Rock 8
Australian National University 9
 