Computational Research Integrity Conference
CRI-CONF 2021

March 23-25, 2021

Center for Strategic and International Studies, Washington, DC

Online

About the Computational Research Integrity Conference (CRI-CONF)

This conference will bring together researchers from the biomedical sciences and ethics, as well as computer scientists, AI researchers, and statisticians, to discuss how computational methods can make research integrity investigations faster, more accurate, and more systematic.

The conference will cover the following topics:

  • Computational or best-practices methods for detecting fabrication, falsification, or plagiarism of text, images, statistics, or other research outcomes
  • The role of research integrity offices at the institutional (e.g., ARIO members) and funder (e.g., ORI) levels
  • The role of publishers and whistleblowers' websites (e.g., PubPeer)
  • Ethical dimensions of automating research integrity
  • Case studies
  • Any other research broadly related to non-computational or computational research integrity

Where

Online

When

Tuesday to Thursday
23-25 March, 2021

Invited Speakers

(in alphabetical order)

Daniel Acuna (organizer)

Syracuse University

Boris Barbour

The PubPeer Foundation

Thorsten Beck

HEADT Centre - Humboldt University of Berlin

Elisabeth Bik

Harbers-Bik LLC

Jennifer Byrne

The University of Sydney

Edward J. Delp

Purdue University

Ivan Oransky

Retraction Watch

Lauran Qualkenbush

Northwestern University

Corinna Raimondo

Northwestern University

Walter Scheirer

University of Notre Dame

Debora Weber-Wulff

HTW Berlin - University of Applied Sciences

Panelists

(in alphabetical order)

IJsbrand Jan Aalbersberg (Scopus), Wenda Bauchspies (NSF), Erica Boxheimer (EMBO Press), Paul Brookes (University of Rochester), Jana Christopher (FEBS Press; Image-Integrity), James Heathers (Cipher Skin), Renee Hoch (PLOS), Wanda Jones (ORI), Stephanie Lee (Buzzfeed News), Benyamin Margolis (ORI), Bernd Pulverer (EMBO), Maria Kowalczuk (Springer Nature), Amit K. Roy-Chowdhury (UC Riverside), William C. Trenkle (USDA), Richard Van Noorden (Nature), Wouter Vandevelde (KU Leuven), Mary Walsh (Harvard University)

Event Schedule

The current event schedule is subject to change.

All times are in New York (US Eastern) time.

Welcome & remarks

Speaker Ranjini Ambalavanar, ORI

Examining Questioned Data: Technology as a Detection Tool
Under the federal regulation (42 C.F.R. Part 93), research misconduct means fabrication, falsification, or plagiarism (FFP) in proposing, performing, or reviewing research, or in reporting research results (§ 93.103). This session will discuss the types of questioned data and the tools currently used to identify and confirm FFP, with special emphasis on the need for additional tools and automation. Examples of different types of falsified/fabricated (FF) data from closed misconduct cases at the Office of Research Integrity (ORI), and the forensic tools used by ORI to detect and confirm intentional FF, will be presented.

Speaker Jennifer Byrne, The University of Sydney

Computational research integrity and cancer research: building tools and narratives to improve the health of the research literature
Computational research integrity and cancer research are young and old research fields, respectively, and yet they have much in common. Cancer research involves the discovery of biological features that reliably distinguish cancer cells from normal cells. These features are targeted by drug developers to create cancer therapies that are tested by researchers and then applied by clinicians to patients. Similarly, computational research integrity involves the identification of publication features that reliably depart from established norms or standards. These features inform the creation of automated tools that are then tested and applied by researchers and publishers to manuscripts and papers. Based upon our experience of applying the semi-automated tool Seek & Blastn to the molecular cancer research literature, we will describe how the skill to employ automated literature screening tools needs to be matched by the will to apply these tools and then act upon their results. Beyond developing the skills to apply automated literature screening tools within different user groups, we propose that achieving the necessary willingness to tackle pervasive research integrity problems will require the development of positive narratives that speak to shared aspirations and values.

Break

Speakers Lauran Qualkenbush & Corinna Raimondo, Northwestern University

An inside look at real challenges in research misconduct investigations
Institutions face many challenges in managing complex research misconduct cases. From the increasingly public nature of allegations to the rapidly evolving technical landscape, research integrity officers have to develop expertise across many areas and identify tools to ensure the integrity of these complicated investigations. We will discuss common issues in institutional research misconduct proceedings and the growing need for technical solutions in areas such as record sequestration and forensic image and data analysis, along with other variables that complicate institutional reviews. We will shine a light, from the inside out, on how institutional investigations involve more than spotting problem images, and on the critical work done to protect the integrity of the research record.

Social activity: breakout rooms

Contributed presentations

Ghazal Mazaheri, Kevin Urrutia Avila and Amit K. Roy-Chowdhury,

"Learning to Identify Image Duplications in Scientific Publications"

Kyle Siler, Philippe Vincent-Lamarre, Cassidy R. Sugimoto and Vincent Larivière,

"The Lacuna Database: Empirical Data to Identify Obscure, Unconventional, Questionable and/or Predatory Journals"

Speaker Edward J. Delp, Purdue University

A System for Forensic Analysis of Scientific Images
In this talk I will describe a system that we are developing for the forensic analysis of images and other media extracted from a scientific publication. This system uses many modern media forensic methods to examine images and determine whether an image has likely been altered or modified. The available tools include duplication detection, copy/move detection, provenance analysis, and other media forensics tools. The current system has methods for extracting images, figures, and captions and for maintaining the relative relationships of the figures in a paper. The system has a simple and intuitive web-based user interface and a sophisticated database, and is easily extensible using Docker containers.

Panel Institutional investigators

  • Wanda Jones, ORI (panel chair)
  • William C. Trenkle, USDA
  • Wouter Vandevelde, KU Leuven
  • Mary Walsh, Harvard University

Break

Panel Publishers

  • Bernd Pulverer, EMBO (panel chair)
  • Renee Hoch, PLOS
  • IJsbrand Jan Aalbersberg, Scopus
  • Maria Kowalczuk, Springer Nature

Social event: breakout rooms

Remarks

Speaker Elisabeth Bik, Harbers-Bik LLC

Image duplication detection tools — insights from a human spotter
Despite peer review and editorial screening, scientific papers can still contain images or other data of concern. A visual scan of 20,000 papers published in 40 biomedical journals showed that 4% contained inappropriately duplicated images. Papers containing incorrect or even falsified data could lead to wasted time and money spent by other researchers trying to reproduce those results. Thorough image screening before publication would benefit editors, publishers, and readers, and would act as a deterrent for fraudulent submissions. There is a great need for high-throughput computational tools to find image duplications and manipulations in scientific manuscripts, and to help detect the growing number of fabricated manuscripts produced by paper mills. Elisabeth Bik will present some case examples, insights, and challenges that she has encountered as a human visual duplication detector.

Speaker Debora Weber-Wulff, HTW Berlin - University of Applied Sciences

Responsible Use of Support Tools for Plagiarism Detection
Many academic institutions are of the opinion that they can simply solve the problem of plagiarism by purchasing the use of so-called plagiarism detection software. But as a recent test of such support tools shows, the systems don't find all plagiarism and will report text overlap that is not plagiarism as if it were. Institutions that rely only on some similarity measure for determining sanctions need to be aware of how meaningless the numbers these systems report are.
In this talk the results of the recent test of support tools for detecting plagiarism conducted by the European Network of Academic Integrity will be presented, followed by a discussion of what constitutes the responsible use of such tools.

Break

Speaker Michael Lauer, NIH

Roles and Responsibilities for Promoting Research Integrity
Dr. Lauer, the NIH Deputy Director for Extramural Research, will discuss one funding agency’s perspectives on the roles and responsibilities of different stakeholders in addressing varying types of research and professional misconduct.

Speaker Matt Turek, DARPA

Challenges and Approaches to Media Integrity
Advances in machine learning technologies have led to the rapid proliferation of automated and semi-automated media (image, video, text, audio) manipulation and synthesis technologies. These technologies make it easier for unskilled individuals to create compelling media manipulations and may reduce our trust in many forms of information. DARPA has made significant investments in the development of media forensics tools that can help defend against such falsified media. This talk will discuss trends in media falsification technologies and the work that has been developed by DARPA programs to defend against falsified media.

Contributed presentations

Zubair Afzal, Marleen Sta, Marialaura Martinico, Daniel Gregory, Lekhraj Sharma, Ramsundhar Baskaravelu and George Tsatsaronis,

"Improving reproducibility by automating key resource tables"

Colby Vorland, David Allison and Andrew Brown,

"Semi-automated Screening for Improbable Randomization in PDFs"

Panel Funders

  • Benyamin Margolis, ORI (panel chair)
  • Wenda Bauchspies, NSF
  • Michael Lauer, NIH
  • Matt Turek, DARPA

Competition launch

Break

Panel Tool developers

  • Daniel Acuna, Syracuse University
  • Jennifer Byrne, The University of Sydney
  • James Heathers, CSO of Cipher Skin, Denver, CO
  • Amit K. Roy-Chowdhury, UC Riverside

Break

Closing remarks

Remarks

Speaker Boris Barbour, The PubPeer Foundation

PubPeer, past, present and future
The PubPeer website, dedicated to facilitating public discussion of the scientific literature, launched in late 2012. Using open metadata, it creates a dynamic page for every publication with a Digital Object Identifier (DOI) or arXiv ID. Users can post and read comments about each publication. This short-circuits the many potentially conflicted actors stifling the correction of science: authors, journals and institutions. By offering strong anonymity, regulated by strict content guidelines and moderation, PubPeer has created a protected space where even serious criticism could be aired without risk of professional or legal reprisals. The site has helped reveal an unsuspected volume of research misconduct. Users can be alerted to discussion about articles as they work by installing the PubPeer browser and Zotero plugins (which you should all do). Taking stock after 8 years of operation, we stand by our sometimes-controversial decision to allow regulated anonymous commenting, which we believe has reinforced what we see as the primary function of PubPeer - to serve as an early-warning system for readers and users of articles. Official procedures for correcting science remain too slow, unreliable and inefficient; this was especially apparent during the pandemic. We aim to continue integrating sources of high-quality discussion, with recent innovations including comments from preprint servers, scientific Twitter and a sister site for overlay journals called Peeriodicals. Journals and institutions can receive tailored alerts via "PubPeer Dashboards". Our guiding philosophy for the future is to accelerate scientific progress by facilitating rapid, effective and public exchange of discussion about publications, while eschewing all metrics and focusing on the substance of scientific publications.

Speaker Walter Scheirer, University of Notre Dame

Understanding the Provenance of Visual Disinformation Targeting Science
The COVID-19 pandemic has attracted significant attention to scientific matters related to the cause, treatment, and prevention of the disease that has upended our lives. Alarmingly, not all of the information available on the Internet is what it appears to be. Deceptive memes, bogus ads, and fabricated infographics are proliferating, with all threatening to undermine the public's trust in science. Given the vast scale of the problem, an automated capability that can identify new instances of visual disinformation, trace its origin, and ultimately flag it as being problematic is needed. But compared to text, visual content presents unique challenges for media forensics. This talk presents an end-to-end processing pipeline for image provenance analysis, which works at real-world scale. It employs a cutting-edge image filtering solution that is able to find related images, as well as novel techniques for obtaining a provenance graph that expresses how the images, as nodes, are ancestrally connected. Building from provenance analysis, the talk goes on to introduce a scalable automated visual recognition pipeline for discovering meme genres of diverse appearance. This pipeline can ingest meme images from a social network, apply computer vision-based techniques to extract features and index new images into a database, and then organize the memes into related genres. Recent examples of visual disinformation targeting science will be highlighted, including repurposed imagery, parasitic advertising, and pandemic-related memes. Finally, the talk will conclude with thoughts on continued research in this direction.

Break

Speaker Mario Biagioli, UCLA

Ignorance or mimicry? Lessons from the merchants of doubt
Agnotology, a new field dedicated to the study of ignorance, has provided crucial insights into the corporate staging and management of scientific controversies on topics of great public importance ranging from the dangers of tobacco smoking to global warming, and others in between. (The book and eponymous film Merchants of Doubt and the work of Robert Proctor are examples.) A shared goal of these corporate interventions, the agnotologists have argued, is the questioning of scientific evidence and claims so as to create the appearance that these issues are not matters of remarkably solid consensus but, to the contrary, the focus of substantial controversies among scientists themselves. The effect of such misinformation campaigns has been to splinter or confuse the public’s response, dangerously delaying urgent regulatory interventions. But while the agnotologists’ goal has been to study the production of non-knowledge or the destruction of knowledge (from the feeding of false beliefs to the public to actively creating gaps in the availability of knowledge by making it secret), their most original and interesting contribution has been the thick description of the techniques, strategies, and rhetoric of the production of doubt about science and medicine, something that is exceptionally effective precisely because it is neither truth nor falsehood, neither knowledge nor ignorance. Its power is its elusiveness. Adding new examples to those exposed by the agnotologists, I argue that the production of doubt is not the production of ignorance but an essentially different regime that, while coming to life in the traditional form of scientific controversies, does not aim at producing or manipulating the content of knowledge but at strategically subverting the norms of science as a profession.
The production of doubt is neither the negative mirror image of the production of positive knowledge, nor simply a scientific Potemkin village where politics is camouflaged as science. Given the extraordinarily high stakes involved, it may be comforting to believe that, while highly damaging to the credibility of science, the systematic production of doubt is also completely external to it. It is politics, not science. I share the agnotologists’ sentiments and commitments, and believe that science and the corporate production of doubt are driven by radically different goals. Sadly, however, they are not easily separable in a formal, conceptual sense. The strategic production of doubt is a very specific form of parasitism that cannot exist outside of its host. Its effectiveness derives only from mobilizing the authority of science against itself.

Speaker Thorsten Beck, HEADT Centre - Humboldt University of Berlin

Image Manipulation Detection — From Visual Inspection to Technology Driven Procedures?
This presentation asks whether and how technical tools can help facilitate the inspection of images and what steps need to be taken to support tool development. We will discuss some of the factors that speak for and against the use of technology and look at the Image Integrity Database (IIDB), which we have built in Berlin to provide training data for algorithm development.

Contributed presentations

Yury Kashnitsky, Vaishnavi Kandala, Egbert van Wezenbeek, IJsbrand Jan Aalbersberg, Catriona Fennell and Georgios Tsatsaronis,

"How near-duplicate detection improves editors' and authors' publishing experience"

Speaker Ivan Oransky, Retraction Watch

From Cancer to COVID-19, Does Science Self-Correct?
Rapid publication of results — particularly on preprint servers — has grown dramatically during the COVID-19 pandemic, and has forced researchers, health care professionals, journalists, and others to grapple with the concept of reliable and actionable information. The pandemic has given rise to more than 80 retractions at the time of this writing. Is that cause for concern? My lens for this talk will be ten years of experience reporting on retractions for Retraction Watch, including creating the world’s most comprehensive database of retractions, with close to 24,000 and counting.

Panel Journalists

  • Stephanie Lee, Buzzfeed News
  • Ivan Oransky, Retraction Watch
  • Richard Van Noorden, Nature
  • Daniel Acuna, Syracuse University (moderator only)

Break

Panel Investigators/whistleblowers

  • Paul Brookes, University of Rochester (panel chair)
  • Boris Barbour, The PubPeer Foundation
  • Elisabeth Bik, Harbers-Bik LLC
  • Erica Boxheimer, EMBO Press
  • Jana Christopher, FEBS Press; Image-Integrity

Break

Open discussion and next steps

Sponsors

Main sponsor

This conference is funded by the Office of Research Integrity, Department of Health and Human Services, under grant ORIIR190047.

Participate

Registration

The registration fee is $40 and includes access to all sessions.

Eligibility: Registration is open to researchers, funders, research integrity investigators, and senior leadership who have broad interests in research integrity.

The organizers reserve the right to review registrations for eligibility and decline and refund the registration of those who do not meet the eligibility criteria.

Financial support and registration waivers: Registration waivers are available and will be awarded based on need. To request one, please email computationalresearchintegrity@gmail.com

Media Partners

Retraction Watch
Center for Open Science
Journal of Empirical Research on Human Research Ethics
Science and Engineering Ethics Journal
Center for Computational and Data Science
Syracuse University School of Information Studies