Two-thirds of data practitioners publicly share their data analysis or machine learning applications, according to The New Stack’s analysis of Kaggle’s latest annual survey of machine learning and data science.
Of those collaborating publicly, 76% said they do so using GitHub. Despite its critics, the platform continues to be one of the most critical parts of the tech stack for developers and non-developers building data and artificial intelligence-enabled applications.
In 2021, more than 25,000 people took the survey. Since many of the participants were using the Google-owned Kaggle platform to learn how to become data scientists, The New Stack’s analysis only looked at the 17,182 respondents who reported being employed.
Of the 840 machine learning engineers in the study, 61% said they use GitHub for sharing, the highest percentage of any profession in the report to do so. While only 40 developer relations/advocates took part in the study, it is noteworthy that only 45% said they use GitHub to share their applications or analysis.
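For readers who want to run a similar breakdown themselves, the per-profession percentages above come down to two steps: drop respondents who are not employed, then compute the share of each role that lists GitHub. The sketch below illustrates that arithmetic only. The records are made-up toy data, and the field names (`role`, `employed`, `shares_on`) are assumptions for illustration, not the survey's actual column names.

```python
from collections import defaultdict

# Hypothetical toy records, NOT real survey data.
# The field names are assumptions made for this illustration.
responses = [
    {"role": "Machine Learning Engineer", "employed": True, "shares_on": ["GitHub", "Kaggle"]},
    {"role": "Machine Learning Engineer", "employed": True, "shares_on": []},
    {"role": "Data Analyst", "employed": True, "shares_on": ["Colab"]},
    {"role": "Student", "employed": False, "shares_on": ["GitHub"]},
]


def github_share_by_role(rows):
    """Return {role: fraction of employed respondents in that role who share on GitHub}."""
    totals = defaultdict(int)
    on_github = defaultdict(int)
    for row in rows:
        # Mirror the article's filter: only employed respondents are counted.
        if not row["employed"]:
            continue
        totals[row["role"]] += 1
        if "GitHub" in row["shares_on"]:
            on_github[row["role"]] += 1
    return {role: on_github[role] / totals[role] for role in totals}


shares = github_share_by_role(responses)
for role, share in shares.items():
    print(f"{role}: {share:.0%}")
```

Note that the denominator here is everyone employed in a role, including people who do not share publicly at all. Restricting the denominator to public sharers would answer a different question and give a higher figure.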
Figure: Where do you publicly share your data analysis or machine learning applications?
Data scientists, software developers and data analysts represented the largest portion of the study’s participants. Here are a few more takeaways from the study:
Collaboration is also taking place in and between notebooks, which have taken on a life of their own as integrated development environments (IDEs). Just like most developers, the average data practitioner uses more than one IDE, but some flavor of Jupyter or JupyterLab is most common, with Visual Studio Code placing second. Yet, many types of hosted notebooks are struggling to catch on in a crowded field:
We are still in the early days of data-enabled applications. Most data analysts are not interested in software licensing or which code repository they use. They want to go where the data is and where people are most likely to be sharing their models. According to Meltano, a company spun off by GitLab itself, that’s GitHub.
I could provide a huge list of low-code platforms, DataOps pipeline integrations, collaboration tools, and next-generation Airtables, many with strong followings. But few, if any, of them are truly close to mass adoption. Some have reached viability as niche products in niche industries, but only variations of Jupyter notebooks and GitHub seem to be familiar enough to non-technical audiences, data pros and developers to become breakthrough hits.
What do you think? How can the modern data stack break out of the pattern without stifling collaboration? You can reach out here.