Key roles and responsibilities of a Data Scientist at BCG X
Background
BCG X is transforming businesses using data science to help companies generate competitive advantage. To do this, we typically follow a 5-step methodology:
- Business understanding & problem framing: what is the context of this problem and why is the client trying to solve it?
- Exploratory data analysis & data cleaning: what data are we working with, what does it look like and how can we make it better?
- Feature engineering: can we enrich this dataset using our own expertise or third-party information?
- Modeling and evaluation: can we use this dataset to accurately make predictions? If so, are they reliable?
- Insights & recommendations: how can we communicate the value of these predictions by explaining them in a way that matters to the business?
The tasks in this program will be focused on using different parts of this methodology at different times, so you’ll get a taste of the overall process.
It’s a really exciting time to be working with BCG X, as more clients need data to drive key decisions. So, let’s check out what case you’ll be working on!
The brief from PowerCo
The Associate Director (AD) of the Data Science team held a team meeting to discuss the client brief. You’ll be working closely with Estelle Altazin, a senior data scientist on your team.
Here are the key takeaways from the meeting:
- Your client is PowerCo, a major gas and electricity utility that supplies small and medium-sized enterprises
- The energy market has had a lot of change in recent years and there are more options than ever for customers to choose from
- PowerCo are concerned about their customers leaving for better offers from other energy providers
- When a customer leaves to use another service provider, this is called churn
- This is becoming a big issue for PowerCo and they have engaged BCG to help diagnose the reason why their customers are churning
During the meeting, your AD discussed some potential reasons for this churn, one being how “sensitive” customers are to price.
- In other words, how much is price a factor in a customer’s choice to stay with or leave PowerCo?
- So, now it’s time for you to investigate this hypothesis
Your task - we need to understand PowerCo’s problem in detail
First things first, you and Estelle need to understand the problem that PowerCo is facing at a deeper level and plan how you’ll tackle it. If you recall the 5 steps in the Data Science methodology, this is called “business understanding & problem framing”.
Your AD wants you and Estelle to email him by COB today outlining:
- the data that we’ll need from the client, and
- the techniques we’ll use to investigate the issue.
Use the text field below to write your email. Here’s what you’ll need to include:
You must formulate PowerCo’s issue as a problem using the 5-step data science process and lay out the major steps needed to test the hypothesis.
- What do you think are the key reasons for a customer deciding to stay with or switch energy providers? For example: price, whether the energy is clean, customer service, location etc.
- What data do you think would be useful in order to investigate these key reasons? E.g. customer purchasing trends over past 5 years, location of business etc.
- If you were to get this data, how could you analyse or visualize it to test whether these reasons may have an impact on churn?
Email
Hi [AD],
In order to test the hypothesis of whether churn is driven by the customers’ price sensitivity, we would need to model churn probabilities of customers, and derive the effect of prices on churn rates.
We would need the following data to be able to build the models.
- Customer data - which should include characteristics of each customer, for example, industry, historical electricity consumption, date joined as customer etc.
- Churn data - which should indicate whether each customer has churned
- Historical price data - which should indicate the prices PowerCo charges each customer for both electricity and gas at granular time intervals
Once we have the data, the work plan would be:
- We need to define what price sensitivity is and calculate it
- We need to prepare the data and engineer features
- Then, we can test our hypothesis using a binary classification model (e.g. Logistic Regression, Random Forest, Gradient Boosted Machines)
- We would choose among the tested algorithms based on model complexity, explainability, and accuracy.
- With the trained model, we would be able to estimate the extent to which price sensitivity influences churn
Regards, [Your name]
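
To make the work plan in the email more concrete, here is a minimal sketch of its first three steps in plain Python. Everything in it is an assumption made for illustration: the toy customer records are invented, `relative_price_change` is just one possible way to define price sensitivity, and the hand-written logistic regression stands in for whichever library model you would actually use on PowerCo's data (for example scikit-learn's `LogisticRegression`).

```python
import math

# Hypothetical toy data: each record holds a customer's price history
# (oldest first, price per kWh) and a churn flag (1 = churned).
# These values are invented purely for illustration.
CUSTOMERS = [
    {"id": "A", "prices": [0.10, 0.10, 0.13], "churned": 1},
    {"id": "B", "prices": [0.10, 0.11, 0.12], "churned": 1},
    {"id": "C", "prices": [0.10, 0.10, 0.10], "churned": 0},
    {"id": "D", "prices": [0.12, 0.12, 0.11], "churned": 0},
    {"id": "E", "prices": [0.10, 0.11, 0.125], "churned": 0},
    {"id": "F", "prices": [0.10, 0.10, 0.105], "churned": 1},
]


def relative_price_change(prices):
    """Assumed sensitivity proxy: relative change from first to last price."""
    return (prices[-1] - prices[0]) / prices[0]


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit P(churn) = sigmoid(w * x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Average gradient of the log-loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Step 1 and 2: compute the assumed price feature for every customer.
features = [relative_price_change(c["prices"]) for c in CUSTOMERS]
labels = [c["churned"] for c in CUSTOMERS]

# Step 3: fit a binary classifier of churn on that single feature.
w, b = fit_logistic(features, labels)

# A positive w means larger price increases go with higher predicted churn.
print(f"weight on price change: {w:.3f}, intercept: {b:.3f}")
for change in (0.0, 0.3):
    prob = sigmoid(w * change + b)
    print(f"price change {change:+.0%} -> predicted churn probability {prob:.2f}")
```

With only six invented customers, the fitted numbers mean nothing in themselves. The point is the shape of the analysis: turn raw prices into a feature, fit a classifier, then read the sign and size of the weight as a first indication of how much price matters for churn.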