29905 Introduction to Python for Data Analysis and Automation in Biology

2021/2022

Course information
Introduction to Python for Data Analysis and Automation in Biology
English
2.5 ECTS
PhD, service course (professional skills)
The course is offered as a single-subject course
August
Three days of interactive teaching, August 24-26, 9:00-16:00, followed by a project work period (max. 6 weeks)
Campus Lyngby
18 hours of interactive computer exercises, 12 hours of reading before and during the course, and 40 hours for an individual project (applying the learned skills to a data set or automation task chosen by the student and teacher together) and preparation of the report.
[The course does not follow DTU's normal schedule structure]
To be arranged with the teacher, 6 weeks after the practical course has concluded.
Evaluation of assignment(s)/report(s)
Students will apply the learned skills to a project of their own choosing. To pass, they must hand in a report no later than 6 weeks after the practical course. The report takes the form of a git repository containing code and data (Jupyter notebooks, scripts, etc.) and must demonstrate the application of relevant tools and methods covered in the course. Students will be assisted in their project work.
pass/fail, internal assessment
27827
Minimum 5, maximum 20
Kai Kristof Blin, Lyngby Campus, Building 220, kblin@biosustain.dtu.dk
29 DTU Biosustain
27 Department of Biotechnology and Biomedicine
In the study planner
General course objectives
Get students to adopt Python in their research.
Learning objectives
A student who has fully met the course objectives will be able to:
  • Use the Unix shell for working with files and directories, pipes and filters, loops, shell scripts, and searching.
  • Use Python for data analysis and task automation, including the import of libraries, reading and plotting of data, selection and filtering of data, writing of conditional statements and functions, and debugging.
  • Utilize basic version control of data and programming code with Git.
  • Adopt a modern development and reporting environment for Python in the form of Jupyter notebooks.
  • Clean, filter, transform and summarize tabular data with Pandas.
  • Visualize data using the Python plotting libraries matplotlib and altair.
  • Apply scikit-learn for basic machine learning such as classification, regression, clustering, PCA, etc.
  • Apply biopython for basic DNA sequence handling.
  • Simulate and plan experiments involving the creation of recombinant DNA using pydna.
  • Perform basic image processing using scikit-image.
Course content
With data generation and genetic engineering becoming ever easier in biology, life scientists and bioengineers increasingly face challenges in processing and analyzing data and in automating experimental workflows. For example, simple tasks (such as designing primers) can become a huge drain on scientists’ time as they repetitively copy and paste information into web interfaces instead of running batch operations.
Furthermore, qualifications demanded of biotechnologists in the industry are shifting away from pipetting towards the analysis of data and automation of workflows. Therefore, it is essential that life science and biotechnology PhD students are trained in the computational tools needed for data analysis and task/lab automation.
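
As a rough illustration of what "running batch operations" can mean in practice, the sketch below processes several DNA sequences in one pass using only the Python standard library. It is not taken from the course materials: the function names, the dictionary layout and the example sequences are all assumptions made for illustration, and real primer design involves considerably more than these two calculations.

```python
# Illustrative sketch only: the names and sequences below are hypothetical
# and are not taken from the course materials.

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def summarize(sequences):
    """Process every named sequence in one pass instead of one at a time."""
    return [
        {
            "name": name,
            "length": len(seq),
            "gc": round(gc_content(seq), 2),
            "reverse_complement": reverse_complement(seq),
        }
        for name, seq in sequences.items()
    ]


if __name__ == "__main__":
    # Hypothetical candidate sequences, for demonstration only.
    candidates = {"primer_fwd": "ATGCGTACCTGA", "primer_rev": "TTAGGCATCGAA"}
    for row in summarize(candidates):
        print(row)
```

The same loop works unchanged for two sequences or two thousand, which is the point of replacing repeated copy-and-paste steps with a script.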

This PhD course aims to get programming novices (little to no experience) off the ground with adopting Python (instead of Excel and Word) in their daily work. In contrast to many existing Python courses targeting computer scientists and software engineers, this course is specifically tailored towards Biotechnology. It focuses primarily on Python as a tool for data analysis and automation, deemphasizing parts that are relevant to software development only. Furthermore, participants are provided with knowledge about data analytics and relevant machine learning methods, including best practice approaches, troubleshooting and avoiding common pitfalls.

This course is based on the Software and Data Carpentry curricula (https://carpentries.org) and style of teaching (live coding, hands-on exercises, etc.). Since 1998, Software Carpentry has been teaching basic lab skills for research computing to scientists and engineers, and its course materials have continuously been adapted and tailored to their problems and needs. We have extensively tailored the materials for this course to life science and biotech problems that can be solved with Python, specifically targeting life science and biotech PhD students.

The course is 100% interactive and relies on the proven approach of teachers conveying the knowledge through live coding while the participants follow along (supported by teaching assistants). Furthermore, live coding is frequently interrupted by hands-on exercises in which the participants develop programming solutions to appropriate tasks on their own (with the help of the teachers and teaching assistants).

In this course, you will gain theoretical and practical knowledge that allows you to:

* Obtain a working knowledge of Python basics and fundamentals relevant to data analysis and automation.
* Adopt a modern development and reporting environment for Python in the form of Jupyter notebooks.
* Obtain a good overview of key Python libraries covering Bioinformatics/Sequence analysis (Biopython, pydna), data analysis and statistics (Pandas), machine learning (scikit-learn), and image processing (scikit-image).
Literature references
https://software-carpentry.org/lessons/
https://datacarpentry.org/lessons/
Last updated
August 10, 2022