Welcome back to our continuing series on Volantsys Analytics’s recent study on data science methodologies and improvements, in partnership with a team from George Brown College. In Part 4 of this series, we discussed some of the standard methods of communication and business collaboration within a data science team. In Part 5, we will present some of our findings from interviews with subject-matter experts within the field of data science.
Interview Methodology
Four people were interviewed for the project. The selected subjects represented a wide range of professionals working in data science and analytics teams and included employees of multinational companies and startups as well as self-employed consultants. This selection allowed us to find similarities between the data science methodologies followed by different companies, and to compare the work of professionals and companies at different stages of establishing data science departments within their organizations.
The conversation generally covered the areas outlined below:
- What are the main stages of a data science project you follow? Are these stages an industry standard or something you and your company developed?
- When you work on data science projects, how is your team generally structured? How does the team communicate and track the progress?
- What are the pain points that you currently see in the data science / data analytics teams while addressing the business problems? What are some of the technical issues you also run into? Are there industry standard approaches that you used to solve them?
- Could you provide examples when industry-standard methodologies have failed to address bottlenecks and you had to change the approach or remodel the process or activities?
- In your opinion, how can teams that have started a data science / data analytics project identify the clues that something will go wrong soon and mitigate the risks?
- What has been your experience communicating across cross-functional teams on data science projects? If there are challenges, how can they be resolved?
Interview Insights – Challenges
A common theme across all interviews was the communication challenges and misconceptions that exist between data science teams and the business leaders they collaborate with. For example, people tend to view data science projects, such as machine learning projects, as similar to web or software development. However, the final goals of software projects are often well established before work begins, whereas the goals of data projects are dynamic; even if you have an idea of what your solution might look like at the end, it is not always realistic to know the full range of possibilities beforehand.
While businesses like to simplify discussions around data science by asking straightforward questions like “Is it possible?”, data scientists are not equipped to answer them immediately. When awareness about data science methods does not exist within organizations, tension can increase within cross-functional teams and further worsen communication.
Our experts also pointed to issues that arise when the business does not give data science teams sufficient time to develop a project. They stressed the need for data science teams to spend a lot of time in the exploratory stage in order to develop a solid Proof of Concept (POC). Machine learning and data science projects often involve a large number of dependencies and multiple algorithms, which results in many changes of direction throughout the project. If a team starts to build the final infrastructure at the beginning of the project, before thorough testing, it will likely have to undo most of it. Moreover, the worse the relationships within the organization are, and the less fairly the data scientists are compensated, the more likely they are to be dishonest about the state of the project.
Example Project Approaches
While there seemed to be no standard methodology for running data science projects, some of the experts the team interviewed outlined specific steps to follow in a project in order to standardize the approach and achieve a higher success rate.
Mathias Edman, co-founder of KAMIN AI, outlined the following plan, which has allowed his company to move 50% of its projects from the discovery stage into production and completion:
- Discovery stage
- Proof-of-concept (POC) – build the simplest version of the feature or solution
- Minimum viable product (MVP) – build a more robust product, based on POC
- Production stage – fully-featured version of the solution
Benedict Augustine, Director of Machine Learning at Urbint, stated that the success rate of data science projects within his organization is 90% while employing an agile approach that approximately mirrors these five stages of development:
- Ideation. Check use cases to find out how to add value to the customer
- Quick feasibility test. Build a simplified model and check if it is feasible
- The transition from a business problem to a data science problem
- Model building, training, and model validation
- Final deliverable – add final features and UI elements
Interview Insights – Best Practices
Finally, we want to cover some of the best practices and tips for success that our experts shared. The interviewed professionals noted that because data science projects are often complex and demanding, completed products that generate value should be expanded and deployed elsewhere wherever possible. Doing data science is costly, and any company trying to implement it needs not only a long-term plan to avoid wasting resources, but also a plan to scale the benefits of data science.
Data science is also a diverse and complex subject that requires people of various backgrounds and expertise to work together. Having one data scientist within an organization will not bring any value. Depending on the size and nature of the product, the team would consist of a data scientist, a data engineer, and sometimes a social engineer or a DevOps engineer. It is also advisable to have a technical project manager or a business analyst, a team member who would be responsible for communicating the project and its progress to the business leaders.
If you want to have successful data science deployments, education should not be limited to data science teams. Abhishek Singh, Lead Machine Learning Engineer at Cardinal Health, explained how his organization launched a digital university a few years ago, offering courses for all employees to complete. Topics included business analytics, digital marketing, software engineering, data science, AI, RPA, etc. People who studied the use cases given in these courses were then encouraged to discuss what they had learned within their teams, disseminating knowledge across the organization.
That completes our blog on expert interview insights! Join us in Part 6, the final installment of this series, as we explore some proposals for improving data science methodology!