Good question. The Dataprep documentation says: “Prepare datasets of any size, megabytes to terabytes, with equal ease.”

It depends on how you define “efficiently”. Cleaning and transforming data on the sample dataset happens in real time. The job on the entire dataset runs only after you submit it, and its running time can vary with the instance type, the recipe, the amount of data, and so on.

Data Scientist: Keep it simple.
