DataTau

Ask DT: Data Scientists Not Utilizing Data Warehouse
4 points by shicken 170 days ago | web | 3 comments

Hi, looking for some advice. I have been a DS for 3 years now, but I am struggling to work with my colleagues. They quite regularly write tens of thousands of lines of R code (mostly copied and pasted) that take hours to run. When I tell them they could just run a sum on the production data warehouse tables instead of dumping the data lake, which would avoid the need to clean, validate, and aggregate the data themselves, they look at me like I am a fool, even though it would save them hours and give them a result within a minute. Has anyone else experienced this?

All the time. Bring proof and examples: show them how something that takes hours can really take minutes or seconds. Data warehouses are typically OLAP systems, built for analytical processing.
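
One way to put that proof together is a small side-by-side timing. The sketch below is only an illustration: it uses an in-memory SQLite database as a stand-in for the warehouse, and the `sales` table, its columns, and the helper names are all made up for the demo. The real numbers will depend on your warehouse and data volume, so run the same comparison against your own tables.

```python
import sqlite3
import time


def build_demo_db(n_rows=200_000):
    """Create a hypothetical 'sales' table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    rows = ((("north", "south")[i % 2], i % 100) for i in range(n_rows))
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def total_in_database(conn):
    """Let the database compute the sum; only one value comes back."""
    return conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]


def total_after_dump(conn):
    """Pull every row to the client and add the amounts up in Python."""
    total = 0
    for _region, amount in conn.execute("SELECT region, amount FROM sales"):
        total += amount
    return total


def timed(fn, conn):
    """Return the function's result together with its wall-clock time."""
    start = time.perf_counter()
    result = fn(conn)
    return result, time.perf_counter() - start


conn = build_demo_db()
db_total, db_seconds = timed(total_in_database, conn)
dump_total, dump_seconds = timed(total_after_dump, conn)

print(f"in-database SUM: {db_total} in {db_seconds:.4f}s")
print(f"dump then sum:   {dump_total} in {dump_seconds:.4f}s")
```

Checking that both approaches return the same total matters as much as the timing: it shows colleagues that the shorter query is not cutting corners.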

>*they can just do a sum on the production data warehouse tables rather than dumping the data lake*

It's more efficient to do whatever processing you can on the database, for sure. It's faster and it takes much less code to clean the data. I find I can be a bit more clever with my Python code if there's some serious munging to do, but I get as far as I can with SQL first.
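
A rough sketch of that "SQL first, Python after" split, assuming a hypothetical `orders` table in an in-memory SQLite database (the table, columns, and sample rows are invented for illustration). The query does the filtering, normalising, and aggregating; Python only touches the small result.

```python
import sqlite3

# Stand-in for the warehouse: a made-up 'orders' table with messy names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (" Acme ", 10.0),
        ("acme", 5.0),
        ("Globex", 7.5),
        (None, 3.0),
        ("GLOBEX ", 2.5),
    ],
)

# Step 1: clean and aggregate in SQL (drop NULLs, trim, lower-case, sum).
query = """
    SELECT LOWER(TRIM(customer)) AS customer, SUM(amount) AS total
    FROM orders
    WHERE customer IS NOT NULL
    GROUP BY LOWER(TRIM(customer))
    ORDER BY customer
"""
totals = conn.execute(query).fetchall()

# Step 2: any remaining munging happens in Python on a handful of rows.
labels = {name: f"{name.title()}: {total:.2f}" for name, total in totals}

for label in labels.values():
    print(label)
```

The five input rows collapse to two before Python sees them, which is the point: the cleaning logic lives in one query instead of being spread across a script.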