
If, as science fiction writer Arthur C. Clarke posited, "any sufficiently advanced technology is indistinguishable from magic," then it sometimes feels like data science is viewed as pulling rabbits out of hats. This narrative is, of course, misguided. It's the perception of what I call "data magic," where people believe data can be pumped into one end of the "data-science machine" and the perfect widget (the solution to everyone's problems) will emerge on the other side.
To a considerable degree, this is because the vast majority of people don't understand the workings of data science, and when you get into the more advanced areas, such as deep learning, even many data scientists will acknowledge they don't understand the many levels of complexity. But if you're a data scientist, you (mostly) know enough to know when you're out of your depth. Outside the data science community, however, that's not always the case. And that's understandable.
In part, it's our own fault. It's a tough club, historically difficult for "outsiders" to penetrate, much less understand. The languages of data science have been held close to the chest. Just as the Roman Catholic Church selected Ecclesiastical Latin as the core communication language to control messaging, particularly through the Middle Ages and into the early-Modern period, we data scientists could be accused of similar actions, though obviously not on the same scale nor with such a direct impact on entire populations. But, just as the Reformations of the 16th century led to unshackling language, making the tenets of the various Christian churches more available to the masses, data science must now further extend its vernacular.
A shared language would allow us to move beyond believing that data scientists have mystical capabilities to solve any problem, by running data through an AI environment to produce the desired results, as if by magic. It would help people understand that data science isn't a magical panacea.
In fact, if you truly want advanced data science, one of the worst things to do is assign a data scientist to solve isolated or ad hoc problems, as this will silo communication by keeping data science in the back room.
Rather, the best way to proliferate data science is to expose enterprise-level problems, understanding that, if done right, data science is a team sport. Having multi-disciplinary teams dedicated to products or customers yields superior business results and develops cross-functional understanding. A cohort including a commercial associate, a product manager, an engineer, a data scientist and representatives from other key functional organizations should be locked in a room, uninterrupted, for a meeting of the minds focused on the biggest needs and opportunities. This is where the true magic happens.
Still, a traveler on this journey should be cognizant of the warning signs. If, in working with clients or other third parties in the spirit of collaboration, you begin searching for solutions explainable to absolutely everyone, take pause. Just as a magician does not limit their performance to elementary tricks the audience understands, data scientists should not default to easily explainable solutions. Deliberately watering down the process may have the collateral effect of delivering a less-than-optimal solution for the problem. It's a balance.
That balance relies on companies trusting in the capabilities of their data scientists: trusting that we will share our language as much as we can, that we will not dilute solutions when things turn too technical, and that we'll always remain true to our discipline. This is the kind of trust and balance that enables technologically advanced companies to get at their respective truth sets.
This article was originally published on Medium.



