Data Science Pathways
There are multiple pathways to earning your master’s degree in Data Science. Among the numerous options, two of the most popular pathways are data scientist and data engineer. Note: The Pathways only serve as a guide to support your career planning. Students can also take courses across the University in areas such as Geospatial Analysis (GIS), Applied Economics, Statistics, Biostatistics, Computer Science and more.
Data Scientist Pathway
 DATS 6001

Algorithm Design for Data Science
This course covers algorithm design. Unlike the courses offered in most CS departments, it is tailored for non-CS majors. Specifically, we will focus only on the theory and implementation of the most important problems in algorithm design. The main goal of this course is to teach students to write code that is bug-free and has the lowest possible time complexity (i.e., uses the minimum time) and space complexity (i.e., uses the minimum space). We will cover data structures (arrays, stacks, queues, and trees) and algorithms (searching, sorting, and dynamic programming).
Prerequisites: This course will use Python exclusively. Students are assumed to have used Python previously, so we will not discuss the syntax of the language in class.
 DATS 6101

Introduction to Data Science (Required)
With data scientist called the “Sexiest Job Title of the 21st Century” by the Harvard Business Review, data science and analytics are a booming industry. But what is a data scientist, what do they do, and how do you become one? These questions and more will be discussed and answered in this introductory course. This class covers the basic ideas and techniques of data science, including its definition and its context in data-driven computation and practical applications.
Prerequisites: None.
 DATS 6102

Data Warehousing (Required)
The emergence of big data storage needs has driven the adoption and development of a new class of non-relational databases commonly referred to as NoSQL databases. This course will explore the origins of NoSQL databases and the characteristics that distinguish them from traditional relational database management systems (RDBMS). We will take a closer look at one database from each of the following NoSQL data models: key-value, document, and graph. We will also cover Hadoop infrastructure and the HDFS storage system.
Prerequisites: None.
 DATS 6103

Introduction to Data Mining (Required)
This is an introductory course on data mining. It introduces the basic concepts, principles, methods, implementation techniques, and applications of data mining, with a focus on Python and data mining algorithms for the Data Science Program. The objective of the course is to give students an overview of data mining techniques and the skills to explore, analyze, and leverage data. Due to the diversity of subjects that comprise this emerging field, the class will necessarily have more breadth than depth. At the beginning of the course we will cover Python for preprocessing and data wrangling; then, in the second half of the course, we will cover “core” data mining topics, such as regression and classification techniques. Students will use Python to complete the homework, assignments, and projects throughout the course.
Prerequisites: None.
 DATS 6202

Machine Learning I: Algorithm Analysis (Required)
In this course we will discuss the ideas, practice, and math of popular machine learning methods. While we will dive deep into the math behind some shallow and deep models, the real focus of this course is to teach students how to use popular machine learning tools to solve real-world problems. We will use Jupyter Notebook for coding and Google Colaboratory for running the code.
Prerequisites: DATS 6101 and DATS 6103.
 DATS 6203

Machine Learning II
The main focus of this course will be the implementation of deep learning techniques on GPUs. Three key deep learning architectures will be covered: multilayer perceptrons, convolutional networks, and long short-term memory (LSTM) networks. Some time will be spent on the background of each network, but the primary focus will be on implementation. In addition to discussing the three network architectures, the course will concentrate on three of the most popular deep learning frameworks: Keras, TensorFlow, and PyTorch. The strategy will be to present a deep network architecture, and then describe how that network can be trained and analyzed within a particular framework. Each network will be trained in a different framework.
Prerequisites: DATS 6202.
 DATS 6312

Natural Language Processing
This course is an introduction to natural language processing (NLP) and its basic techniques and methods. The objective of the course is to give students an overview of natural language processing techniques that can be used to explore, analyze, and leverage natural language data stored in text. This course covers commonly used text analysis techniques and tools. Students will use Python and various packages to complete the projects throughout the course.
Prerequisites: DATS 6202.
 DATS 6401

Visualization of Complex Data (Required)
Today, vast amounts of raw and refined information can be supplied and accessed to support analysis and decisions. Indeed, information access and retrieval are considered less a problem and at times a burden. The most pressing need now is to be able to present the information in a manner that is usable. This requires that cogent information be provided in context. To the maximum extent possible, the information must be displayed in an intuitive manner that supports not only analytical but also cognitive processes (Source: Taylor Connor Associates LLC, all rights reserved). To support this burgeoning technology requirement, designers and developers of information systems need to stay current not only with the technology but also with the unique aspects of information visualization design. This course is intended to develop an awareness of the design concepts and an understanding of the underlying technology, introduce students to some of the currently available technologies, guide students in design protocols, and examine typical applications of those technologies.
Prerequisites: DATS 6101, DATS 6102, and DATS 6103.
 DATS 6501

Data Science Capstone (Required)
The goal of the Capstone Project is for students to apply the knowledge acquired during the Data Science program to a project involving actual real-world problems and data in a realistic setting. During the project, students engage in the entire process of solving a real-world data science problem, from establishing a problem statement and project plan, to collecting and processing actual data, to applying suitable analytic methods learned in the program. The Course Instructor must approve the problem statement and project plan before students proceed to the data collection phase. Problem statements and definitions play a major role in the Capstone, and the datasets can be collected from industry, government, nongovernmental organizations (NGOs), or academic research. Students will work individually on a problem statement, typically specified by a faculty member or the sponsor. The sponsor will usually be responsible for supplying the relevant dataset. Research groups at GWU may propose projects. A list of possible projects will be posted on Blackboard so students can familiarize themselves with the problems and find their interests. With the approval of the Course Instructor, students are free to find their own problem statement and use their own dataset. The final problem statements and datasets will need approval by the Course Director.
Prerequisites: Students may enroll in the Capstone course upon completion of all GWU Curriculum courses or during the same semester in which they are completing the last of these requirements.
Data Engineer Pathway
 DATS 6101

Introduction to Data Science (Required)
With data scientist called the “Sexiest Job Title of the 21st Century” by the Harvard Business Review, data science and analytics are a booming industry. But what is a data scientist, what do they do, and how do you become one? These questions and more will be discussed and answered in this introductory course. This class covers the basic ideas and techniques of data science, including its definition and its context in data-driven computation and practical applications.
Prerequisites: None.
 DATS 6102

Data Warehousing (Required)
The emergence of big data storage needs has driven the adoption and development of a new class of non-relational databases commonly referred to as NoSQL databases. This course will explore the origins of NoSQL databases and the characteristics that distinguish them from traditional relational database management systems (RDBMS). We will take a closer look at one database from each of the following NoSQL data models: key-value, document, and graph. We will also cover Hadoop infrastructure and the HDFS storage system.
Prerequisites: None.
 DATS 6103

Introduction to Data Mining (Required)
This is an introductory course on data mining. It introduces the basic concepts, principles, methods, implementation techniques, and applications of data mining, with a focus on Python and data mining algorithms for the Data Science Program. The objective of the course is to give students an overview of data mining techniques and the skills to explore, analyze, and leverage data. Due to the diversity of subjects that comprise this emerging field, the class will necessarily have more breadth than depth. At the beginning of the course we will cover Python for preprocessing and data wrangling; then, in the second half of the course, we will cover “core” data mining topics, such as regression and classification techniques. Students will use Python to complete the homework, assignments, and projects throughout the course.
Prerequisites: None.
 DATS 6202

Machine Learning I: Algorithm Analysis (Required)
In this course we will discuss the ideas, practice, and math of popular machine learning methods. While we will dive deep into the math behind some shallow and deep models, the real focus of this course is to teach students how to use popular machine learning tools to solve real-world problems. We will use Jupyter Notebook for coding and Google Colaboratory for running the code.
Prerequisites: DATS 6101 and DATS 6103.
 DATS 6311

Bayesian Methods
This course is an introduction to Bayesian data analysis. Topics include Markov chain Monte Carlo, hierarchical models, generalized linear models, and JAGS. Lectures will include some theory, but the emphasis of the course will be on implementing these models using R and JAGS and applying them to solve real-world problems.
Prerequisites: DATS 6101 or permission of the instructor.
 DATS 6313

Time Series Analysis & Modeling
The main focus of this course is to understand, analyze, model, and predict time series datasets. Fundamental concepts of stochastic systems, estimation theory, time series analysis, and model validation will be discussed. Key topics include random variables, random processes and density functions, conditional density, biased and unbiased estimators, time series analysis, model validation, the autocorrelation function, the partial autocorrelation and generalized partial autocorrelation functions, and the implementation of nonlinear optimization algorithms. Python will be the main language used throughout the course.
Prerequisites: DATS 6101 or permission of the instructor.
 DATS 6401

Visualization of Complex Data (Required)
Today, vast amounts of raw and refined information can be supplied and accessed to support analysis and decisions. Indeed, information access and retrieval are considered less a problem and at times a burden. The most pressing need now is to be able to present the information in a manner that is usable. This requires that cogent information be provided in context. To the maximum extent possible, the information must be displayed in an intuitive manner that supports not only analytical but also cognitive processes (Source: Taylor Connor Associates LLC, all rights reserved). To support this burgeoning technology requirement, designers and developers of information systems need to stay current not only with the technology but also with the unique aspects of information visualization design. This course is intended to develop an awareness of the design concepts and an understanding of the underlying technology, introduce students to some of the currently available technologies, guide students in design protocols, and examine typical applications of those technologies.
Prerequisites: DATS 6101, DATS 6102, and DATS 6103.
 DATS 6450

Cloud Computing
Analyze the different cloud computing service and deployment models and the capabilities provided by major cloud providers. You will learn how to evaluate key cloud computing services, compare cloud vendors, and apply cloud computing services in data science projects.
Prerequisites: DATS 6101 or permission of the instructor.
 DATS 6450

Network Data Science
Critically analyze the use of networks in data science, synthesize new concepts in network science, apply network concepts to real-world datasets, and evaluate the value added by network science thinking in data science.
Prerequisites: DATS 6101 or permission of the instructor.
 DATS 6501

Data Science Capstone (Required)
The goal of the Capstone Project is for students to apply the knowledge acquired during the Data Science program to a project involving actual real-world problems and data in a realistic setting. During the project, students engage in the entire process of solving a real-world data science problem, from establishing a problem statement and project plan, to collecting and processing actual data, to applying suitable analytic methods learned in the program. The Course Instructor must approve the problem statement and project plan before students proceed to the data collection phase. Problem statements and definitions play a major role in the Capstone, and the datasets can be collected from industry, government, nongovernmental organizations (NGOs), or academic research. Students will work individually on a problem statement, typically specified by a faculty member or the sponsor. The sponsor will usually be responsible for supplying the relevant dataset. Research groups at GWU may propose projects. A list of possible projects will be posted on Blackboard so students can familiarize themselves with the problems and find their interests. With the approval of the Course Instructor, students are free to find their own problem statement and use their own dataset. The final problem statements and datasets will need approval by the Course Director.
Prerequisites: Students may enroll in the Capstone course upon completion of all GWU Curriculum courses or during the same semester in which they are completing the last of these requirements.