What REALLY is Data Science? Told by a Data Scientist

Joma Tech
22 Jun 2018 · 11:09
Educational, Learning

TLDR: The video argues that the essence of data science is not sophisticated models or advanced AI, but using data to drive business impact. Data science encompasses collecting, storing, and analyzing data to produce the insights, metrics, and experiments that guide product decisions. The most in-demand data science skills depend on company size and priorities. While the media focuses on AI, companies often need more basic analytics and testing that yield results efficiently. The video calls for a pragmatic view of data science rooted in solving real business problems, not chasing technical complexity.

Takeaways
  • Data science is about using data to create business impact, not just models or visualizations
  • Data science emerged to draw insights from big, unstructured data using sophisticated computing
  • William Cleveland proposed 'data science' as a field combining statistics and computer science
  • Web 2.0 generated huge amounts of user data, enabling new data science applications
  • Machine learning and AI became practically feasible with abundant data after 2010
  • The media focuses on advanced ML, while companies want business analytics and experimentation
  • Data science activities align with a hierarchy of needs for companies
  • Data scientists take on different roles based on company size and resources
  • For startups, data scientists are end-to-end, doing everything related to data
  • Requested topics: AI/ML, A/B testing, experimentation
Q & A
  • What is the main purpose of data science according to the speaker?

    -The main purpose of data science is to use data to create as much impact or value as possible for a company, such as through insights, data products, or product recommendations.

  • How did the abundance of data in 2010 help spark the rise of data science?

    -The rise of big data in 2010 meant there was a massive amount of unstructured data that opened new possibilities for gaining insights. However, it also required sophisticated data infrastructure to handle the volume, necessitating fields like data science.

  • Why is there a misalignment between what data science is portrayed as in media versus what companies hire data scientists to do?

    -The media focuses heavily on advanced machine learning and AI, but most companies have simpler business problems to solve first where analytics and experimentation can drive impact before needing complex models.

  • What are some examples of low-hanging fruit, or simple ways data scientists create value at big tech companies?

    -Conducting exploratory analysis to find insights, building metrics to measure product success, and running A/B tests to determine which product versions perform the best.

  • How do data science responsibilities differ between startups, medium-sized companies, and large companies?

    -Startups have one data scientist doing everything, medium companies separate data engineers and scientists, and large companies have specialized roles like analytics, engineering, and research.

  • What developments facilitated the rise of Web 2.0, and what was the impact?

    -Emerging platforms like MySpace, Facebook and YouTube allowed millions of users to interact, contribute, and share, generating huge amounts of data and shaping the modern internet ecosystem.

  • How did William Cleveland expand the possibilities of data mining?

    -By combining computer science and statistics to take advantage of computing power, calling this field data science, which enabled more advanced analytics.

  • What are some examples of data science applications according to the early definition?

    -Collecting, analyzing, modeling data, and most importantly applications of this analysis to solve real-world problems in areas like business, science, and society.

  • What technologies enabled the handling of big data?

    -Parallel computing technologies like MapReduce, Hadoop, and Spark allowed for distributed processing of huge datasets across clusters of commodity servers.

  • What advice does the speaker offer for learning more about data science?

    -He invites viewers to leave comments indicating what aspects of data science they want to learn about so he can create more videos or find experts on those topics to share insights.
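
The Q & A above names MapReduce, Hadoop, and Spark as the technologies that made big data tractable. As a rough illustration of the MapReduce idea only, here is a minimal single-process sketch in Python. It is not how Hadoop or Spark are implemented, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, chosen for this example. A real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence (assumption: one process)."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big insights"]))
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread across machines, which is the property the answer above refers to.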

Outlines
00:00
What is data science and the misconceptions about it

This section explains what data science actually is: using data to create impact and solve company problems, not just building models or visualizations. It highlights misconceptions about data science perpetuated on YouTube and the misalignment between what is popular to discuss and what the industry needs. The term 'data science' is traced back to the effort to bring statistics and computer science together to expand data mining capabilities. The rise of Web 2.0 and big data in the 2000s made data science critical for drawing insights from massive datasets.

05:05
Evolution of data science with the growth of AI and machine learning

This section covers the evolution of data science with the rise of AI and machine learning around 2010, when training machines with a data-driven approach became feasible. Deep learning moved from an academic concept to a practical application. However, media coverage of these techniques has overshadowed traditional data science skills such as exploratory analysis and business intelligence. Industry demand remains high for analysts compared with advanced machine learning specialists.

10:08
Real-world data science roles and activities based on company size

This section examines data science roles at startups, medium-sized companies, and large companies. Startups have constrained resources, so data scientists handle everything from infrastructure to analysis. Medium-sized companies separate data engineering from data science, with the latter focusing on models and recommendations. Large companies have specialized roles: engineers handle pipelines, analysts focus on insights, and research scientists do deep learning.

Keywords
data science
Data science refers to the overall process of extracting insights and value from data. As the video explains, the main purpose of data science is not models or visualizations, but rather using data to solve problems and create impact for a business. Examples from the script show data science being applied to improve products and guide business strategy.
analytics
Analytics refers to exploratory analysis of data to uncover insights. The video contrasts this with more advanced techniques like AI, positioning analytics as a critical data science skill for practical business needs. Script examples show analytics being used to understand user behavior.
experimentation
Experimentation means designing and running controlled tests to evaluate ideas. The video describes A/B testing as an experimentation technique in data science to determine which product versions perform best. This connects to the overall focus on using data science for tangible business impact.
metrics
Metrics are quantitative measures of business or product performance. The video emphasizes the importance of metrics in data science to evaluate success and track results over time. Examples highlight using metrics to monitor progress.
machine learning
Machine learning is the subset of AI focused on algorithms that can learn from data. While highlighted prominently in media, the video argues machine learning is not the top priority for most companies compared to analytics and experimentation.
predictions
Though not directly stated, generating predictions is an implicit application of data science throughout the video. For example, recommendations rely on predictive modeling to forecast which items a user may want.
data infrastructure
Data infrastructure includes the systems for collecting, storing, and processing data. Though less glamorous, the video underscores data engineering as crucial groundwork for advanced data science applications.
statistical knowledge
Statistical knowledge refers to techniques for mathematically analyzing data. The video traces the evolution toward more advanced statistical methods enabled by advances in computing power.
low-hanging fruit
This metaphor signifies easy changes with a big payoff. The video uses it to describe how, for many companies, simple analytics insights can drive major gains without fancy modeling.
hierarchical
Hierarchical here means ranked priorities. The video introduces a hierarchy of data science needs, with basic business analytics being more fundamental than advanced AI modeling for practical impact.
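
The experimentation and metrics keywords above describe A/B testing only in words. As a rough sketch of how such a comparison might be evaluated, the Python below applies a two-proportion z-test to conversion counts from two product versions. The video does not specify a statistical method, so the choice of test, the function name `two_proportion_z_test`, and the example numbers are all assumptions made for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of versions A and B.

    conv_a, conv_b: number of converting users in each group (hypothetical data)
    n_a, n_b: total users in each group
    Returns the z statistic and the two-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up example: 100 of 1000 users convert on A, 150 of 1000 on B.
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in the metric is unlikely to be due to chance alone. In practice, teams also consider sample size planning and how long the experiment runs before acting on a result.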
Highlights

Data science is about using data to create impact for your company

Impact can be in the form of insights, data products or product recommendations

As a data scientist, your job is to solve real company problems using data

There's a huge misalignment between what's popular to talk about and what's needed in industry

Rise of big data sparked the rise of data science to support business needs

Being a good data scientist is about how much impact you can have

Companies give data scientists the most ambiguous problems to solve

Data engineering is important but less covered in media than AI

For startups, one data scientist may have to do everything

Medium companies can separate data engineers and data scientists

Large companies have specialized roles like analytics and research science

Definition of data science varies depending on the company

Let me know what you want to learn more about regarding data science

