Predictive Analytics ID Youth Risk Factors, Improve Outcomes

Together with Fairfax County, Va., Health and Human Services, the Mason DataLab at George Mason University is building an analytics model to increase the likelihood of physically, mentally and socially healthier youth.

MetroLab Network has partnered with Government Technology to bring its readers a segment called the MetroLab Innovation of the Month Series, which highlights impactful tech, data and innovation projects underway between cities and universities. If you’d like to learn more or contact the project leads, please contact MetroLab for more information.

In this installment of the Innovation of the Month series, we explore how the Mason DataLab at George Mason University, in partnership with Fairfax County, Va., is leveraging data analytics to identify risk and improve health in youth communities.

MetroLab’s Executive Director Ben Levine discussed the project with Dr. Nektaria Tryfona, DataLab director and executive director of Digital Innovation and Strategy at Mason; Dr. Dieter Pfoser, professor and chair of the Geography and Geoinformation Science Department; Shiyang Ruan, PhD student at Mason; Michelle Gregory from Fairfax County’s Department of Neighborhood and Community Services; and Sophia Dutton from Fairfax County’s Office of Strategy Management for Health and Human Services.

Ben Levine: Can you please describe the Data Analytics for Youth Risk and Protective Factors project? Who is involved in this effort? 

Nektaria Tryfona: DataLab from George Mason University and the Fairfax County Health and Social Services Department partnered to use data analytics to address critical social issues. 

Mason DataLab is focusing on applied data science solutions by solving complex societal, scientific and economic problems using advanced data analytics. We work with stakeholders in addressing data-driven issues, and we use the opportunity to train our students on these real-world problems and to connect them to local government and industry. 

The purpose of the Data Analytics for Youth Risk and Protective Factors project is to use data analytics to identify factors that affect healthy behaviors. This information would help enhance the findings of the annual Fairfax County Youth Survey and provide more precise data to inform investment decisions, policies and strategies to increase the likelihood of healthier youth in the local community. This project will enable Fairfax County to pilot a predictive analytics model, providing insights about current community challenges and helping assess the value of routinely incorporating predictive analytics into the county’s data analytics approach.  

Levine: Can you describe what motivated the county and university to address this particular priority?

Tryfona: Since Mason’s main campus is in Fairfax County, it was a strategic decision to partner with our local community as it is in the best interest for the area surrounding the university to prosper and to succeed. In addition to adding to the academic and private-sector fabric of the community, we seek to engage on the county's social services priorities.

Of course, proximity helped us to meet very often with Fairfax County executives and analysts, making sure we deliver solutions to meet the county’s needs. 

Michelle Gregory: Fairfax County Health and Human Services (HHS) faces the growing challenge of delivering more services with limited funds. The ability to provide detailed data for planning and decision-making to guide the allocation of limited resources is critical in a highly diverse and dynamic jurisdiction like Fairfax County. Data must be used across a range of programs and thus be in a readily usable standard format with sufficient granularity to analyze a range of social needs. Historically, time-consuming, ad hoc data research and analysis were required to provide relevant data, hindering the timely inclusion of strategic information to facilitate successful outcomes. This project would transform data into actionable predictive analytics to improve long-term outcomes.  

One of the six objectives for HHS is to have successful children and youth. The Fairfax County Youth Survey is a data source that is heavily used to assess youth behaviors and identify ways to continuously improve results. As part of the analysis, we identify youth’s assets and protective factors, and data shows that when three or more of these are present, there is a decrease in the prevalence of risky behaviors. Determining which factors have the greatest correlation with positive behaviors, and/or a decrease in risky behaviors, would enhance the ability to target the most effective strategies and investments. 
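As a rough illustration of the "three or more assets" comparison described above, the following minimal sketch splits respondents by asset count and compares how often a risky behavior is reported in each group. The records, the field names `assets` and `risky`, and the function name are hypothetical and are not taken from the Fairfax County Youth Survey or the county's analysis.

```python
def prevalence_by_asset_threshold(students, threshold=3):
    """Share of students reporting a risky behavior, split by asset count."""
    below = [s["risky"] for s in students if s["assets"] < threshold]
    at_or_above = [s["risky"] for s in students if s["assets"] >= threshold]

    def rate(flags):
        # Fraction of True flags; NaN when the group is empty.
        return sum(flags) / len(flags) if flags else float("nan")

    return {"below": rate(below), "at_or_above": rate(at_or_above)}


# Made-up records for illustration only (not survey data).
students = [
    {"assets": 0, "risky": True},
    {"assets": 1, "risky": True},
    {"assets": 2, "risky": False},
    {"assets": 2, "risky": True},
    {"assets": 3, "risky": False},
    {"assets": 4, "risky": True},
    {"assets": 5, "risky": False},
    {"assets": 6, "risky": False},
]

rates = prevalence_by_asset_threshold(students)
print(rates)
```

A comparison like this is purely descriptive; it shows a difference in prevalence between groups, not which factors drive it.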

The goal is to move beyond descriptive analytics toward predictive analytics, measure the relationships between variables, create a model that can be leveraged to address other issues, and, ultimately, increase the likelihood of physically, mentally and socially healthier youth. 

Levine: What have been some of your initial findings, and is this changing how you view this project? 

Dieter Pfoser: We developed a methodology to assess and display correlations between several variables. This effort provided a proof of concept that lays the foundation for a set of replicable methodologies to identify factors that affect healthy behavior related to areas like nutrition, activities and physical health, family and friends, and behavioral health. Figure 1 summarizes the results and shows a set of indicators and how they are correlated.

The factors mentioned in the figure capture the responses to questions in the Fairfax County Youth Survey and, in this example, include assets or protective factors and factors related to risky behavior.

For example, the "Extracurricular_Regularly" factor captures the responses to the question, “How many times have you participated in school or non-school extracurricular activities (i.e, sports, student government, student newspaper, scouting, etc.)?” And "Teacher_Recognition" relates to the statement, “My teacher notices when I am doing a good job and lets me know about it.”

Both are considered assets.

The factor "Alcohol_30" records responses to the question, “On how many occasions (if any) have you had beer, wine, or hard liquor during the past 30 days?”

The correlation matrix below shows that a factor such as regular extracurricular activities is strongly correlated with other factors. A student with regular extracurriculars tends to:

  • Have adults (other than parents) with whom to talk
  • Spend more than three hours on homework
  • Be less likely to be depressed

While some insights were expected, like having non-parental adults present, some others were more surprising, like spending more time on homework and experiencing mental health benefits.


 Figure 1: Correlation heatmap of factors captured in youth survey; the dark red color shows strong positive correlation, while the light red color shows strong negative correlation.  
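For readers who want to see how a correlation matrix of this kind can be computed, here is a minimal sketch using the Pearson coefficient. It assumes numerically coded survey responses. The values are made up, `Risky_Behavior` is a hypothetical stand-in column, and the project's actual method and data may differ.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up coded responses, one value per student (illustration only).
responses = {
    "Extracurricular_Regularly": [0, 1, 2, 3, 4, 5],
    "Teacher_Recognition": [1, 1, 2, 3, 3, 4],
    "Risky_Behavior": [5, 4, 4, 2, 1, 0],  # hypothetical stand-in factor
}

matrix = correlation_matrix(responses)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in matrix])
```

Each cell of a heatmap like Figure 1 corresponds to one entry of such a matrix: values near 1 indicate a strong positive relationship, and values near -1 a strong negative one.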

Levine: How are some of the initial findings being used and implemented by the county or community? 

Sophia Dutton: Discussion about how the findings can inform future analysis and planning processes is currently underway and the correlations are being reviewed to refine the methodology.

Fairfax County government has various programs that support youth outside of the school environment. We need to make the best investment decisions as they relate to keeping them healthy. The survey data has been valuable thus far and this work can enhance knowledge and effective actions toward identifying and addressing the most impactful social variables. Long-term, this methodology could be applied to other issues and become a common approach for tackling complex, multi-faceted community challenges.

Tryfona: These initial findings introduce a couple of key priorities. First, we have been working with highly aggregated data and we need to consider geographic and demographic aspects to provide actionable insight and inform county policymaking and service delivery. Additionally, we need to invest in ways to engage the county beyond data analysis and to ensure that findings have maximum impact. That includes working side by side with county officials, urban planners, social scientists, and strategists to identify additional insights and next steps.  

Levine: What was the most surprising thing you learned during this process?

Pfoser: We live in a highly dynamic world, and with Fairfax being part of the greater Washington, D.C., metro area, we are highly dependent on what is happening around us. Schools are a microcosm of the changes to this fabric of society, and it was very interesting to see the rapid change underway over the years. For example, students have been spending a lot of time in front of screens; however, the nature of screens has changed from TV to computers and now to phones. Overall, it will be essential for the county to have a sense of its residents’ needs in support of decision-making, and that is where data can help. 

Gregory: This is not exactly a surprise, but the study underscores the opportunity for deeper insights when the data is aligned with specific communities. This will support targeted investment strategies.  

Fairfax County is committed to identifying and addressing inequities as part of our One Fairfax policy. To support this policy, data must be used to inform decision-makers about variations in communities or subpopulations and where subsequent investment strategies should exist. While more work on the model is needed before it can be fully implemented for decision-making, it is important to remember that investment involves resources as well as dollars, which may impact programming, placement of services, targeted information campaigns, etc. The benefit is that we will have the ability to customize the response or strategy based on readily available analytics. As we consider national issues related to vaping, for example, it would be advantageous to understand if the issue is more prevalent among certain groups or communities and the factors that strongly correlate with those findings. 

While we are committed to finding solutions, there are security issues that require dedicated attention. Analytics cannot compromise anonymity. The validity of the results rests on ensuring this agreement with the students is upheld and that even the summarized results are supported by information that tells the story.

Levine: Where will this project go from here?

Tryfona: We expect to use data-driven results to help our community raise awareness about serious issues. This effort is just an example of our activities. This first project helped us to define a methodology and workflow for the collaboration with local governments. It allowed us to understand areas that can be improved by data science. Also, coming from the academic background and more specifically the engineering side, Dr. Pfoser and I had to find a common language when communicating with our Fairfax County counterparts. Yes, data is a universal language, but perspectives differ and we had to find a way to align our expectations in terms of usage, methods and outcomes across the multidisciplinary team of domain experts and data scientists. It was the best lesson we can now share with our students.  

We are now ready to fine-tune our methodology and algorithms for better results and explore opportunities in relation to other project areas and to re-apply our knowledge.

Pfoser: The county has been collecting large amounts of data and providing them to the public. We are working to come up with a set of indicators that might help us to better understand the county and its residents. While sociodemographic factors might be an obvious choice, we are looking at aspects of built infrastructure (parks and recreation facilities, road density, and even unorthodox metrics such as the fractal dimension of the road network) to characterize, for example, areas in relation to school pyramids. The fractal dimension of a road network can be used as a measure of road complexity. A higher fractal dimension indicates a more uniform distribution of roads, provides better accessibility to suburban areas and thereby incentivizes low-density developments.
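One common way to estimate a fractal dimension is box counting: overlay grids of decreasing cell size, count the cells that contain part of the network, and take the slope of log(count) against log(1/cell size). The sketch below assumes the road network has been sampled as points and uses two synthetic point sets. It is an illustration of the general technique, not necessarily the method the Mason team uses.

```python
from math import log


def box_count(points, box_size):
    """Number of grid boxes of side box_size containing at least one point."""
    return len({(int(x // box_size), int(y // box_size)) for x, y in points})


def fractal_dimension(points, box_sizes):
    """Box-counting estimate: least-squares slope of log N(s) vs. log(1/s)."""
    xs = [log(1.0 / s) for s in box_sizes]
    ys = [log(box_count(points, s)) for s in box_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Synthetic point sets (illustration only, not county road data).
single_road = [(i, 0) for i in range(256)]                    # one straight line
dense_grid = [(i, j) for i in range(64) for j in range(64)]   # area-filling coverage
sizes = [1, 2, 4, 8, 16]

road_dim = fractal_dimension(single_road, sizes)
grid_dim = fractal_dimension(dense_grid, sizes)
print(round(road_dim, 3), round(grid_dim, 3))
```

A single straight road yields a dimension close to 1, while coverage that fills the area uniformly approaches 2, which matches the idea that a higher value reflects a more uniform distribution of roads.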

We are still in the process of analyzing various infrastructure-related measures. We are looking at how the age of a development, the area’s road density, and the fractal dimensionality of roads in that area are related. We have already seen that the age of developments certainly affects road density, and therefore population density. 

However, we are still exploring how infrastructure affects socio-demographic factors like the characteristics of the population living in a certain area and their reasons for living there. We’re also looking into how these factors change over time. We are conducting longitudinal studies to explore how the distribution of population has changed over the last decade and how this affects elements of youth health.

Gregory: We will share the findings with additional stakeholders and develop a sustainable approach for analyzing the correlations between protective factors for youth and positive outcomes to prevent issues that adversely affect overall health and well-being for future generations. 

It is valuable to consider how data related to the built environment will further enhance the learning. This type of information will be useful for this study and other work as well. Fairfax County is roughly 400 square miles. Knowing where our resources exist provides another layer of information to add to the analysis to enhance our understanding of the community, as well as the potential options for continuous improvement.

Lauren Harrison is the managing editor for Government Technology magazine. She has a degree in English from the University of California, Berkeley, and more than 10 years’ experience in book and magazine publishing.