Good practices in using open source tools for reproducible science
2020-11-21, 16:00–16:30, Main Lecture Hall (Κύρια Αίθουσα Ομιλιών)
Language: English

The replication crisis has been characterized as a systemic and methodological problem in academia that stigmatizes research, affects funding, and poses a significant hurdle to progress and innovation. In short, the replication crisis is the phenomenon in which a large proportion of scientific publications, even those published in high-profile journals with groundbreaking conclusions, cannot be reproduced by independent researchers. The main causes of this crisis are deeply rooted in the mindset of academics who treat research as a personal asset that should be communicated to peers but not shared. Most of the time, academics are afraid of having their intellectual property stolen, of losing citations, or of being criticized for inadequacies in their implementations. This phenomenon has been brought forward by many science advocacy groups and has resulted in policy changes at funding bodies, which now demand the release of data and source code as a requirement for funding research proposals. In this talk we show that, although this is a good first step, it does not solve the problem. We demonstrate how researchers can "hide" their methods even while releasing the source code, and how they can obscure the replication process. We also present good practices for releasing code and data without compromising intellectual property rights and without risking the loss of citations. Finally, we present the modern ecosystem of open source tools for sharing code, data, images, workflows, results and hypotheses. As a test case we also present the newly introduced platform for reproducible science, OpenBio.eu.

This talk focuses on the following issues:

* The replication crisis: the problem, the financial effect and the social consequences
* The academic mindset: the rationale and progress so far
* Current policies for battling the crisis: how they are enforced and how they are circumvented
* Good practices for releasing research code: ensuring reproducibility while retaining copyright and citations
* Existing open source ecosystem for publishing research
* Research data repositories: Figshare, Zenodo, osf.io
* Assigning DOIs in open source software repositories
* Automating analysis and maximizing reproducibility
* Workflow management systems: Nextflow, Snakemake, Airflow, Argo
* Modelling analysis: CWL (Common Workflow Language)
* Virtualization software (Singularity vs. Docker)
* Deployment software (Toil, Arvados)
* Integration and keeping the complete research process open and reproducible: openbio.eu

See also: powerpoint presentation

Alexandros Kanterakis was born on 30 July 1978 in Athens, Greece. In 1997 he enrolled in the Computer Science Department (CSD) of the University of Crete (UoC), Greece. During his studies he developed an interest in Machine Learning, Data Mining and Pattern Recognition. He completed his MSc in the bioinformatics postgraduate curriculum of CSD, in collaboration with the Institute of Computer Science (ICS) of the Foundation for Research and Technology, Hellas (FORTH). During his master's studies he developed algorithms for classification and clustering of gene expression microarray data. He also participated in studies involving text mining of biomedical corpora and semantic annotation of biomedical information. In 2010 he started his PhD at the genetics department of the University Medical Center Groningen (UMCG) in the Netherlands. During his PhD studies he worked mainly on the imputation part of the analysis of the Genome of the Netherlands. He also conceived the idea of a crowdsourcing environment for programming, which was developed under the name PyPedia. Since 2014 he has been working as a Collaborating Researcher in the Computational BioMedicine Laboratory (CBML) of ICS/FORTH. He is mainly involved in pharmacogenomics studies and is developing Workflow Management Systems for open and reproducible science.