by Petr Paščenko, Head of Data Science at Profinit
In the last few years, I have seen several attempts to build a good data science/machine learning team, both at Profinit and with our clients. Over time, I have observed a number of notorious anti-patterns that tend to be very common in the field. Here are the three most noteworthy pitfalls:
1. The ‘Let’s clone myself’ mindset
Being the first Data Scientist in your company means that you set standards for almost everything. What is data science about? How should you work? What is the required skill set to do the job?
Are you strong in math and stats? Surely all your followers must be too; otherwise, they are not worthy. Are you a tech geek? How can you possibly collaborate with someone who does not know what SSH is? Is your background in business? After all the geeky talk about neural whatever networks, surely Excel is where most of the actual work gets done.
If you are pioneering data science in a company and trying to build a team from scratch, you will often encounter one big bias in everything you do: yourself.
Sanity check no. 1: Where do most of your colleagues come from? If it is your own alma mater, you have a problem.
2. Lone wolf cave
Data science is now where software development was 30 years ago. Most experts are loners living on the edge between genius and autism, happy to work late into the night on their laptops with little or no interaction with the rest of the universe. This attitude is highly efficient for small PoCs but completely useless for any serious production development. The spirit of collaboration, sharing, and adoption of the best practices known from the software development world must be systematically cultivated and actively enforced by you. No exceptions.
Sanity check no. 2: What is the bus factor of your team? If it is 1, you have a problem.
3. It’s still science, you know
No matter how smoothly your team crunches all the tasks you throw at it, the painful truth is that data science is still a science. In practical terms, this means that a large proportion of the job is done by good old trial and error. As a result, you and your team depend more on good ideas than on experience and perfect craftsmanship. Good ideas are rarely spoon-fed to you in books or online courses. They are the product of a special kind of mind – the creative one.
Every data science team needs at least one such mind, ideally more than one. They tend to be a rare species. If you have found one, acknowledge them and treat them well, however peculiar or eccentric they might be.
Sanity check no. 3: What is the source of all of your team’s best ideas? If the answer is you, you have a problem.