From the course: DJ Patil: Ask Me Anything
What should be in a data scientist's toolbox?
(soft techno music)

- [Interviewer] All right, DJ, tell us about the data scientist's toolbox. And I don't mean just hardware and software, but also the soft skills that every data scientist should have to do a good job.

- Well, the first thing that you need in your toolbox is curiosity: deep, profound curiosity, an exploration of how to think about the data. The second thing in the toolbox is a team. As a data scientist, if you're working alone and isolated, it's incredibly tough. Not only is it lonely, but you can't get a different perspective on the data. And so, if you don't have those, all the other tools that you might have aren't going to do anything for you. So you've got to start with that.

Then, once you have that, it's a question of how you actually get the data, move it around, access it, clean it, and then process it to start looking at something. So what do you need there? Well, it depends on the type of problem. Some data comes in at a high frequency, and so then you need a technology like Kafka or something else to look at it, maybe Spark or one of these other types of streaming processors. But other data sets may come in at large, periodic intervals, like on an annual basis or maybe a decadal basis, like the census. So it depends on the problem type.

Then you need to be able to clean it, and that's one of the areas that still needs massive investment, because we're still in the early days. There are technologies from companies like Trifacta, or the Data Wrangler project, and these other types of things where we're seeing really great innovation, but it's still not sufficient.

Collaboration: you've got to be able to collaborate with that data, and there are different platforms for that, but it's also still tough. In code, you use GitHub or some other similar technology, and that allows an unbelievable ability to actually collaborate. We don't have that yet in data science. It's getting better. There are Jupyter Notebooks and other things of that type, but it's still early.

And then there's a question of the presentation layer. And the presentation layer is, do you showcase this in a visualization suite, like one of the classic technologies that people are using these days? It could be MATLAB, it could be some open source technology, it could be Tableau; there are all these things out there. But it also depends on your environment. In some environments you can use open source, in some you can't; in some you can use cloud, in some you can't. I think those things are going to become easier. Some of the stuff that I'm most bullish on is the open source toolkits, 'cause they're just so good right now, and many of the companies that support them offer customized versions to allow them to be even more effective. But the number one thing that I would tell everyone about that is: don't think that you have to stay with one tool. You can have many tools to approach a problem, and there may be different attempts to try something one way or another.

(soft techno music)
Contents
- What were you like as a kid? (3m 17s)
- How did your parents influence you? (1m 55s)
- How did you navigate college? (4m 7s)
- What are some fond memories from grad school? (2m 45s)
- How can we foster learning for everyone? (5m 9s)
- What's the importance of learning liberal arts? (2m 30s)
- What advice do you have for job seekers? (3m 23s)
- How did data science come about? (4m 8s)
- What does it take to be a data scientist? (4m 1s)
- Why is apprenticeship important? (3m 41s)
- How can a data scientist influence policy? (2m 20s)
- How can I prepare for data science in college? (4m 56s)
- How can hackathons benefit me? (1m 30s)
- How did you use data in grad school? (2m 15s)
- How is data used in the US? (3m 55s)
- How is data used worldwide? (1m 38s)
- How do you expose holes in cybersecurity? (3m 32s)
- How can we educate people about hacking? (2m 30s)
- What are the real threats to personal data? (4m 6s)
- Should we focus on media headlines? (1m 39s)
- How can we educate people about data use? (3m 34s)
- How can people fight for data privacy? (2m 46s)
- What's the role of the data scientist in 15 years? (4m 30s)
- What are you working on currently? (3m 31s)
- How can we make data secure? (3m 26s)
- How to serve the people with data science? (1m 47s)
- What's the difference between wisdom and experience? (1m 54s)
- How do you advocate for science? (2m 3s)
- What is the role of AI in today's world? (2m 54s)
- What's an example of ethical hacking? (2m 9s)
- How do you bring data science into the workplace? (2m 29s)
- What is the role of AI in human resources and recruiting? (3m 3s)
- What are tools every data scientist should own? (2m 44s)
- Is there a data science code of ethics? (4m 6s)
- What are AI threats in the cybersecurity world? (4m 38s)
- How can data scientists better inform the general public? (1m 30s)
- How can people participate in data science? (2m 31s)
- Why do people fear a machine revolution? (2m 18s)
- How can data inform healthcare? (1m 31s)
- Why should we democratize data? (2m 14s)
- How are you advocating for science? (3m 9s)
- Why is the march for science important? (3m 42s)
- What is AI? (1m 37s)
- What is an example of robust machine learning? (4m 31s)
- What is AI's place in healthcare? (3m 29s)
- How can AI impact clinical trials? (3m 22s)
- How can a data scientist be best leveraged for business? (1m 28s)
- What does a data science team need to thrive? (2m 56s)
- What are the pros and cons with AI in HR roles? (3m 27s)
- What should be in a data scientist's toolbox? (3m 23s)
- What makes up a good data science team? (2m 3s)
- What new projects are you working on? (2m 51s)
- What data science projects are you working on? (1m 40s)
- How can AI and machine learning (ML) help cybersecurity? (3m 54s)
- How can governments fight back against AI attacks? (3m 5s)
- What can the public do to protect against AI attacks? (1m 14s)
- What are neural networks (NN)? (2m 8s)
- What's the difference between ML and NN? (1m 42s)
- Do you have a favorite machine learning technique? (1m 7s)
- How does the Internet of Things work? (1m 38s)
- What is a connected city? (3m 3s)
- What is the fear associated with data? (2m 30s)
- How can we address the fear of machines taking jobs? (3m 20s)
- What about job loss due to AI? (1m 43s)
- What's the reality of bringing back jobs? (1m 50s)
- What is a scientific process for data science? (2m 46s)
- What is your tip for not getting overwhelmed by big data? (1m 40s)
- How do you accept that you're not going to know stuff? (2m 38s)
- What is a dynamic range? (2m 1s)
- When does data leave holes? (3m 8s)
- How important is diversity on a data science team? (2m 13s)
- How does data influence people's emotions? (4m 15s)
- How do you train yourself to be intellectually curious? (2m 20s)
- How do we empower people to foster dialogue? (3m 33s)
- What is your philosophy on leadership? (2m 56s)
- How can a company retain employees? (3m 42s)
- How do you cultivate employee development? (3m 18s)
- How do you identify algorithmic biases? (2m 46s)
- Can you describe the process of ethical testing? (2m 46s)
- How do you feel about machine learning for business decisions? (1m 52s)
- Can you talk about your book? (4m 10s)
- What are possible solutions for displacement? (2m 23s)
- What impact does technology have on the US economy? (3m 2s)
- Can you discuss the future of intelligent things? (3m 26s)
- What are the current issues with data collection? (2m 10s)
- How is technology changing human expectations? (1m 7s)
- Wrapping up (1m 5s)