What REALLY is Data Science? Told by a Data Scientist
TLDR
The video argues that the essence of data science is not sophisticated models or advanced AI but using data to drive business impact. Data science encompasses collecting, storing, and analyzing data to generate the insights, metrics, and experiments that guide product decisions. The most in-demand data science skills depend on company size and priorities. While the media focuses on AI, companies often need more basic analytics and testing that yield results efficiently. The video calls for a pragmatic view of data science rooted in solving real business problems, not chasing technical complexity.
Takeaways
- Data science is about using data to create business impact, not just models or visualizations
- Data science emerged to draw insights from big, unstructured data using sophisticated computing
- William Cleveland proposed 'data science' as a field combining statistics and computer science
- Web 2.0 generated huge amounts of user data, enabling new data science applications
- Machine learning and AI became practically feasible with abundant data after 2010
- The media focuses on advanced ML, while companies want business analytics and experimentation
- Data science activities align with a hierarchy of needs for companies
- Data scientists take on different roles based on company size and resources
- For startups, data scientists are end-to-end, doing everything related to data
- Requested topics: AI/ML, A/B testing, experimentation
Q & A
What is the main purpose of data science according to the speaker?
-The main purpose of data science is to use data to create as much impact or value as possible for a company, such as through insights, data products, or product recommendations.
How did the abundance of data in 2010 help spark the rise of data science?
-The rise of big data around 2010 meant there was a massive amount of unstructured data that opened new possibilities for gaining insights. However, handling that volume also required sophisticated data infrastructure, which made a field like data science necessary.
Why is there a misalignment between what data science is portrayed as in media versus what companies hire data scientists to do?
-The media focuses heavily on advanced machine learning and AI, but most companies first have simpler business problems to solve, where analytics and experimentation can drive impact before complex models are needed.
What are some examples of low-hanging fruit, or simple ways data scientists create value at big tech companies?
-Conducting exploratory analysis to find insights, building metrics to measure product success, and running A/B tests to determine which product versions perform best.
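The video does not show how an A/B test is evaluated. As a minimal sketch, assuming the success metric is a conversion rate and both groups are large enough for a normal approximation, a two-proportion z-test compares the variants. The function name `two_proportion_z_test` and the counts below are hypothetical, chosen only for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert at the same rate.

    conv_a / conv_b: number of conversions in each variant (hypothetical inputs)
    n_a / n_b: number of users exposed to each variant
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate, assuming the null hypothesis is true
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: variant A converts 4%, variant B converts 5%
z, p = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts, z comes out at roughly 2.4 and the p-value falls below 0.05. In practice, teams also fix the sample size and significance level before the test starts rather than checking results repeatedly.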
How do data science responsibilities differ between startups, medium-sized companies, and large companies?
-Startups have one data scientist doing everything, medium companies separate data engineers and scientists, and large companies have specialized roles like analytics, engineering, and research.
What developments facilitated the rise of Web 2.0, and what was its impact?
-Emerging platforms like MySpace, Facebook and YouTube allowed millions of users to interact, contribute, and share, generating huge amounts of data and shaping the modern internet ecosystem.
How did William Cleveland expand the possibilities of data mining?
-He proposed combining statistics with computer science to take advantage of computing power and called the resulting field data science, which enabled more advanced analytics.
What are some examples of data science applications according to the early definition?
-Collecting, analyzing, and modeling data, and, most importantly, applying this analysis to solve real-world problems in areas like business, science, and society.
What technologies enabled the handling of big data?
-Parallel computing technologies like MapReduce, Hadoop, and Spark allowed for distributed processing of huge datasets across clusters of commodity servers.
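To make the MapReduce idea concrete, here is a minimal single-process sketch of the classic word-count example. It is not the Hadoop or Spark API: the functions `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical names that only imitate the three phases a framework would distribute across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input split
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    # Shuffle: group all emitted values by key, as the framework does
    # between the map and reduce stages
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: combine the values for each key (here, sum the counts)
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    # On a real cluster, each map_phase call would run on a different machine
    # and each key's reduction could run on yet another one
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


print(word_count(["big data big insights", "data science"]))
```

Because each map call only sees its own split and each reduce call only sees one key's values, the work can be spread over many commodity servers, which is the property the answer above describes.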
What advice does the speaker offer for learning more about data science?
-He invites viewers to leave comments indicating what aspects of data science they want to learn about so he can create more videos or find experts on those topics to share insights.
Outlines
What is data science and the misconceptions about it
The paragraph discusses what data science actually is: using data to create impact and solve company problems, not just building models or visualizations. It highlights the misconceptions about data science perpetuated on YouTube and explains the misalignment between what's popular to discuss and what the industry needs. It traces the origins of the term 'data science' to an effort to bring statistics and computer science together to expand data mining capabilities. The rise of Web 2.0 and big data in the 2000s made data science critical for drawing insights from massive datasets.
Evolution of data science with the growth of AI and machine learning
This paragraph discusses the evolution of data science with the rise of AI and machine learning around 2010, when it became feasible to train machines with a data-driven approach. Deep learning moved from an academic concept to a critical practical application. However, media coverage has overshadowed traditional data science skills like exploratory analysis and business intelligence. Industry demand remains higher for analysts than for advanced machine learning specialists.
Real-world data science roles and activities based on company size
The paragraph examines data science roles across startups, medium-sized companies, and large companies. Startups have constrained resources, so data scientists handle everything from infrastructure to analysis. Medium-sized companies separate data engineering and data science, with the latter focusing on models and recommendations. Large companies have specialized roles: engineers handle pipelines, analysts focus on insights, and research scientists do deep learning.
Keywords
data science
analytics
experimentation
metrics
machine learning
predictions
data infrastructure
statistical knowledge
low-hanging fruit
hierarchical
Highlights
Data science is about using data to create impact for your company
Impact can be in the form of insights, data products or product recommendations
As a data scientist, your job is to solve real company problems using data
There's a huge misalignment between what's popular to talk about and what's needed in industry
Rise of big data sparked the rise of data science to support business needs
Being a good data scientist is about how much impact you can have
Companies give data scientists the most ambiguous problems to solve
Data engineering is important but less covered in media than AI
For startups, one data scientist may have to do everything
Medium companies can separate data engineers and data scientists
Large companies have specialized roles like analytics and research science
Definition of data science varies depending on the company
Let me know what you want to learn more about regarding data science