Student Projects

Intersecting colorful lines in a chart representing different data science programs in the United States. Chart includes the categories Computer Science, Databases, Data Mining, Statistics, Topic and Visualization.

Section of a chart from the Overview of Data Science Programs project

Data Science Program students and faculty frequently collaborate on in-depth analyses of data releases. One project, for example, compared data science programs among the U.S. News and World Report's Top 100 national universities. Our study found that the GW Data Science Program holds more than 10 percent of the nation’s total data science student enrollment — the second-highest in the country. Results also placed GW’s curriculum among the most diverse, with courses spanning business, math, physics and statistics.

Learn more about recent projects from Data Science Program students.

We began our overview by compiling a list of universities with Data Science master's programs. To get a representative sample, we checked each university in U.S. News and World Report's Top 100 National Universities and found that 27 offered such a program. Note that we include a university only if its program is explicitly labeled "data science," so Data Analytics, Business Analytics, and Business Intelligence programs are not considered in our overview. We then went to each university's website and gathered program enrollment statistics. These statistics allowed us to estimate each Top 100 university's share of total Data Science enrollment. In the graphic below, the size of each circle corresponds to that university's proportion of the total estimated Data Science enrollment in the United States.
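The share calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function name `enrollment_shares`, the university names, and the enrollment counts are all placeholders invented for the example.

```python
def enrollment_shares(enrollment):
    """Return each university's fraction of the total estimated enrollment."""
    total = sum(enrollment.values())
    if total <= 0:
        raise ValueError("total enrollment must be positive")
    return {name: count / total for name, count in enrollment.items()}


# Placeholder figures for illustration only; these are NOT values from the study.
example_enrollment = {"University A": 120, "University B": 60, "University C": 20}
shares = enrollment_shares(example_enrollment)

# Print universities from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {share:.1%}")
```

In a bubble map like the one described, each share would then drive the size of the corresponding circle.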
In addition to looking at the geographic distribution and enrollment statistics of Data Science programs in the U.S., we also researched the curriculum design of these programs. We gathered each school's curriculum information from its website and categorized all courses into topic areas based on the course descriptions. When we originally split up the courses, we created a hierarchy with three main areas (Fundamental Theory/Intro, Intermediate, and Special Topics) and numerous subcategories per area. Since displaying this level of granularity would be distracting, we simplified the course classifications to nine categories (math, special topics, statistics, computer science, database engineering/management, data mining, machine learning, visualization, and other). The dashboard below shows the proportion of all data science course offerings that falls into each selected category. Within each grouping, the elements are sorted from high to low, and a filter on the right lets the user select a subset of the categories. The pie chart shows the summed proportion for each broad category, whereas the size of a word in the word cloud shows the percentage of all course offerings that a specific class represents.
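As a rough illustration of the categorization step, the sketch below assigns courses to categories and computes each category's proportion of the total. The keyword rules in `CATEGORY_KEYWORDS` are hypothetical and only cover some of the nine categories. The course titles are made up as well. The project itself grouped courses by reading their descriptions, so this is one possible way to approximate that process, not the method actually used.

```python
from collections import Counter

# Hypothetical keyword rules (an assumption for this sketch, not the study's rules).
# Order matters: the first matching category wins.
CATEGORY_KEYWORDS = [
    ("machine learning", ["machine learning", "neural network"]),
    ("data mining", ["data mining"]),
    ("visualization", ["visualization"]),
    ("database engineering/management", ["database", "sql"]),
    ("statistics", ["statistic", "regression", "probability"]),
    ("computer science", ["algorithm", "programming"]),
    ("math", ["linear algebra", "calculus", "optimization"]),
]


def categorize(description):
    """Return the first category whose keyword appears in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def category_proportions(descriptions):
    """Return each category's fraction of all courses, sorted high to low."""
    counts = Counter(categorize(d) for d in descriptions)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.most_common()}


# Made-up course titles for illustration only.
example_courses = [
    "Introduction to Machine Learning",
    "Applied Regression and Probability",
    "Database Systems with SQL",
    "Data Visualization",
]
proportions = category_proportions(example_courses)
print(proportions)
```

Keyword matching is easy to audit but misclassifies courses whose titles are ambiguous, which is one reason manual review of descriptions remains useful.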


 

In the next chart, you can see the relationship between classes and college curriculum offerings. On the left side, we list all of the course topic areas, which are mapped to their higher-level class types in the middle. These high-level class categories are then mapped to the corresponding schools on the right side. The width of each bar is proportional to the percentage of a specific topic area's total that a school's curriculum accounts for. This allows us to see how diverse a school's course offerings are and whether the school offers several courses from a given category. To better understand which departments offer data science courses and the overall composition of those courses, we created another dashboard. The heat map below shows the percentage of data science courses that a given department at a given university offers. The pie chart gives a more concise summary, whereas the tree chart details each specific course offered. Together, these views show the contribution of each department to the Data Science program at each university.

 

 


We began our research on the data science industry by gathering some general information on the current labor market. We estimated the working-age population in each state by compiling the most recent census statistics on the population over 18 years old. [1. "Table 4c - Reported Voting and Registration by Age, for States: November 2016." United States Census Bureau. May 2017. Accessed on August 4, 2017. https://www.census.gov/data/tables/time-series/demo/voting-and-registrat... .] Using these population estimates in conjunction with Indeed estimates of the number of full-time data scientist jobs, we calculated the number of data science openings per working-age individual. [2. The job data was extracted from Indeed on September 26, 2017.] The dashboard below breaks these numbers down by state, job type, and salary.
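The per-capita figure described above is a simple ratio of openings to adult population. The sketch below shows one way to compute it by state. The function name, the state labels, and all numbers are placeholders, and scaling the result to openings per 100,000 adults is an assumption made here for readability, not something the project specifies.

```python
def openings_per_capita(openings, population_18_plus, per=100_000):
    """Job openings per `per` residents aged 18 or older, for each state."""
    rates = {}
    for state, jobs in openings.items():
        population = population_18_plus.get(state)
        if not population:
            continue  # skip states with a missing or zero population estimate
        rates[state] = jobs / population * per
    return rates


# Placeholder numbers for illustration only; NOT census or Indeed figures.
example_openings = {"State A": 500, "State B": 150}
example_population = {"State A": 5_000_000, "State B": 1_000_000}

rates = openings_per_capita(example_openings, example_population)
for state, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{state}: {rate:.1f} openings per 100,000 adults")
```

Normalizing by population matters because a large state can have more openings in absolute terms while offering fewer per resident, as State A does in this made-up example.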

After this holistic overview of the data science job market, we scraped all available job data from Indeed to get a better understanding of the most in-demand skills. We focused only on the top 10 cities for data science jobs and on jobs where the exact position title was "Data Scientist." Then, based on the basic and preferred qualifications for each job, we listed all the class types you would need to take in order to learn those skills. In the chart below, you can see that about 38 percent of all data science jobs in Washington, D.C. require knowledge of machine learning.

On the other hand, about 56 percent of all data science jobs in the New York area require knowledge of machine learning. Moreover, backgrounds in optimization theory and statistics each appear to be required about 8 percent more often in New York than in Washington, D.C. While a first glance seems to indicate that data science jobs in New York require a stronger mathematical background than those in Washington, D.C., the remainder of the skill requirements look similar across the board.
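City-level percentages like those quoted above come from counting, for each city, how many postings mention a given skill and dividing by the number of postings in that city. The sketch below illustrates that calculation. The function `skill_share_by_city`, the search terms, and the sample postings are all hypothetical. The project mapped job qualifications to class types, and simple term matching is used here only as a stand-in for that step.

```python
def skill_share_by_city(postings, skill_terms):
    """Percent of postings in each city whose text mentions any term for each skill."""
    totals = {}  # city -> number of postings
    hits = {}    # (city, skill) -> number of postings mentioning the skill
    for city, text in postings:
        totals[city] = totals.get(city, 0) + 1
        lowered = text.lower()
        for skill, terms in skill_terms.items():
            if any(term in lowered for term in terms):
                key = (city, skill)
                hits[key] = hits.get(key, 0) + 1
    return {
        city: {skill: 100 * hits.get((city, skill), 0) / n for skill in skill_terms}
        for city, n in totals.items()
    }


# Hypothetical search terms and made-up postings, for illustration only.
example_terms = {
    "machine learning": ["machine learning"],
    "statistics": ["statistic"],
    "optimization": ["optimization"],
}
example_postings = [
    ("City A", "Experience with machine learning and statistics"),
    ("City A", "SQL and dashboards"),
    ("City B", "Machine learning, optimization"),
]

share = skill_share_by_city(example_postings, example_terms)
for city, skills in share.items():
    print(city, skills)
```

Comparing two cities then amounts to subtracting their percentages for the same skill, which gives a difference in percentage points.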