
Introduction to Python for Data Science

Academic year: 2020/2021
Language of instruction: Russian
ECTS credits: 4
Course type: Elective course
When: 3rd year, 2nd module

Instructor

Course Syllabus

Abstract

The course gives students a broad overview of Python, a general-purpose programming language that is becoming ever more popular for data science. The focus is on applying Python specifically to data science: ways to import, store, and manipulate data, and helpful tools for conducting data analyses. The course is intended for students with little programming background. Learning is supported by the DataCamp platform.
Learning Objectives

  • The main objective of the course is to introduce students to the basic concepts of Python, its syntax, functions, and packages, so that they can write scripts for data manipulation and analysis. The course develops the skills of writing and running Python code. It covers the various variable types and their features, basic operators and statements, and loops, as well as the main packages for data science: NumPy, pandas, and Matplotlib. At the end of the course, students should be able to write short scripts to import, prepare, and analyze data.
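
As an illustration of the kind of short script this objective refers to, here is a minimal sketch of the import, prepare, and analyze steps using pandas. The column names, the data values, and the function name `load_and_summarize` are hypothetical and chosen only for the example; they are not part of the course materials.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real data file (made-up values).
RAW_CSV = """group,year,value
A,2019,5.8
A,2020,7.1
B,2019,2.9
B,2020,
B,2019,2.9
"""


def load_and_summarize(csv_text: str) -> pd.Series:
    """Import CSV text, drop duplicate and incomplete rows, average by group."""
    df = pd.read_csv(io.StringIO(csv_text))    # import
    df = df.drop_duplicates()                  # prepare: remove repeated rows
    df = df.dropna(subset=["value"])           # prepare: remove rows with a missing value
    return df.groupby("group")["value"].mean()  # analyze: mean value per group


summary = load_and_summarize(RAW_CSV)
print(summary)
```

With a real file, the `io.StringIO` wrapper would be replaced by a file path passed to `pd.read_csv`.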
Expected Learning Outcomes

  • Know basic data types in Python.
  • Know operators and how to clean and merge datasets.
  • Know the pandas library and the main DataFrame methods.
  • Know how to import data into Python.
  • Know how to work in Jupyter Notebook.
Course Contents

  • Introduction to Python for Data Science
    1. Introduction to Python for Data Science. An introduction to the basic concepts of Python. Learn how to use Python interactively and by running a script. Create variables and become acquainted with Python's basic data types. Learn to store, access, and manipulate data in lists. Learn how to use functions, methods, and packages. NumPy is a fundamental Python package for practicing data science efficiently. Learn to work with the NumPy array and its tools, and get started with data exploration. https://www.datacamp.com/courses/intro-to-python-for-data-science
  • Intermediate Python for Data Science
    2. Data types and operators. Build various types of plots and customize them to be visually appealing and interpretable. Learn about the dictionary, an alternative to the Python list, and the pandas DataFrame. Create and manipulate datasets, and access the information in these data structures. Boolean logic is the foundation of decision-making in Python programs. Learn about the comparison operators, how to combine them with Boolean operators, and how to use the Boolean outcomes in control structures. Filter data in pandas DataFrames using Boolean logic. Use while and for loops. https://www.datacamp.com/courses/intermediate-python-for-data-science
    3. Cleaning and merging data. Diagnosing issues such as outliers, missing values, and duplicate rows. https://www.datacamp.com/courses/cleaning-data-in-python
  • Python DataFrames
    4. Analysis, selection, and visualization techniques with pandas DataFrames. https://www.datacamp.com/courses/pandas-foundations
    5. Extracting and transforming DataFrames, advanced indexing, rearranging and reshaping data. https://www.datacamp.com/courses/manipulating-dataframes-with-pandas
    6. Merging DataFrames with pandas. https://www.datacamp.com/courses/merging-dataframes-with-pandas
  • Importing Data in Python
    7. Importing Data in Python. The ways to import data into Python: from flat files such as .txt and .csv; from files native to other software such as Excel spreadsheets, Stata, SAS, and MATLAB files; and from relational databases such as SQLite and PostgreSQL. https://www.datacamp.com/courses/importing-data-in-python-part-1
  • Environment for scientific programming in Python
    8. Jupyter Notebook as an environment for scientific programming in Python, its structure and features.
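
Several of the topics listed above (lists and NumPy arrays, dictionaries and DataFrames, Boolean filtering, loops, and merging) can be sketched in a few lines. This is only an illustrative example, not taken from the DataCamp courses; all names and values in it are made up.

```python
import numpy as np
import pandas as pd

# Lists and NumPy arrays: element-wise arithmetic works on arrays, not on lists.
heights_m = [1.60, 1.75, 1.82]     # made-up values
weights_kg = [55.0, 70.0, 95.0]    # made-up values
bmi = np.array(weights_kg) / np.array(heights_m) ** 2

# Dictionary -> pandas DataFrame, then Boolean filtering with a comparison operator.
people = pd.DataFrame({"name": ["Ann", "Bob", "Eve"], "bmi": bmi})
above_25 = people[people["bmi"] > 25]

# A for loop combined with a conditional expression.
labels = []
for value in people["bmi"]:
    labels.append("high" if value > 25 else "normal")

# Merging DataFrames: a left join keeps every row of `people`,
# filling `age` with NaN where `ages` has no matching name.
ages = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 44]})
merged = people.merge(ages, on="name", how="left")

print(above_25)
print(labels)
print(merged)
```

The same code can be run cell by cell in a Jupyter Notebook, which makes it easy to inspect each intermediate object.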
Assessment Elements

  • non-blocking Self-study work
  • non-blocking Exam
    The final assessment is a project performed in a team of no more than 2 people. Each team uses a provided dataset or collects its own data, defines a research question, and applies one or a combination of the learnt methods of data analysis with Spreadsheets. As a result of the project, each team writes a report and prepares a working file. The exam grade comprises the grade for the report, the grade for the working file, and the grade for answering questions.
Interim Assessment

  • Interim assessment (2nd module)
    0.5 * Exam + 0.5 * Self-study work
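
The weighting above can be written as a one-line function. This is a minimal sketch: the function name `final_grade` and the example scores are hypothetical, and no rounding is applied because the syllabus does not state a rounding rule.

```python
def final_grade(exam: float, self_study: float) -> float:
    """Combine the two assessment elements with equal weights of 0.5."""
    return 0.5 * exam + 0.5 * self_study


# Made-up scores: an exam grade of 8 and a self-study grade of 6.
print(final_grade(8, 6))  # 7.0
```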
Bibliography

Recommended Core Bibliography

  • Vanderplas, J. T. (2016). Python Data Science Handbook: Essential Tools for Working with Data (1st ed.). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1425081

Recommended Additional Bibliography

  • Seemon Thomas. (2014). Basic Statistics. [N.p.]: Alpha Science International Limited. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1663598