External reviews
External reviews are not included in the AWS star rating for the product.
Databricks is a very reliable way to run Spark
What do you like best about the product?
Databricks is the most reliable and flexible way to run Spark applications for data engineering workloads.
What do you dislike about the product?
Databricks is at the top end of the market on pricing.
What problems is the product solving and how is that benefiting you?
We use Databricks to run our Spark applications, which process hundreds of terabytes of data, need to be cost-effective, and run in a timely manner.
Works well in those grey areas of data management.
What do you like best about the product?
Easy to develop and maintain. Flexibility with transactional integrity.
What do you dislike about the product?
It could be better integrated with DW systems like Snowflake.
What problems is the product solving and how is that benefiting you?
Data summary tables - I work with vast amounts of raw data. Being on the backend team, I do not need to ingest all of the data, only specific parts. Building summary tables by partly processing the data within the lakehouse framework is the best solution I could find.
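For context, a minimal sketch of the summary-table pattern this reviewer describes, assuming a hypothetical raw Delta table `lake.raw_events` with `event_date`, `event_type`, and `user_id` columns:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read only the slice of the raw data the backend team actually needs.
raw = spark.read.table("lake.raw_events").where(F.col("event_date") >= "2024-01-01")

# Partly process the data into a compact summary table.
summary = (
    raw.groupBy("event_date", "event_type")
       .agg(F.count("*").alias("event_count"),
            F.countDistinct("user_id").alias("unique_users"))
)

# Write as a Delta table so downstream reads keep transactional integrity.
summary.write.format("delta").mode("overwrite").saveAsTable("lake.daily_event_summary")
```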
Hands down the most versatile and powerful data platform on the market
What do you like best about the product?
If you need to leverage python, spark, and SQL to build ELT pipelines, Databricks offers the most robust and easy-to-use solution for this. It doesn't require a lot of effort to configure and deploy, and allows developers to focus on building pipelines, instead of getting the infrastructure to work.
What do you dislike about the product?
I do wish there was more visibility into individual job cost, and overall cost as well, but this is a relatively minor complaint. Overall, the platform is great!
What problems is the product solving and how is that benefiting you?
I leverage Databricks for a variety of projects, both for clients and personally. For anything involving large amounts of data or streaming solutions, Databricks is my go-to.
Lakehouse is the best
What do you like best about the product?
The lakehouse combines the storage power of a data lake with the reliability of a warehouse; decoupled storage and compute is the best thing.
What do you dislike about the product?
Not enough learning resources earlier, but now we have all the required material in Databricks Academy.
What problems is the product solving and how is that benefiting you?
We are currently using Redshift, and it is very hard to scale when we need extra compute. Now that compute is decoupled, we can spin up any endpoint according to our requirements.
I use Databricks in my daily routine and have had a wonderful experience
What do you like best about the product?
I like Delta Live Tables the most because of how it works and the visibility it gives the customer, such as data constraints and data quality checks; that is the best part.
What do you dislike about the product?
I dislike the Python syntax and code for creating Delta Live Tables; it's confusing and forces you to restructure the logic. The SQL syntax is best.
What problems is the product solving and how is that benefiting you?
Since Delta Live Tables came into the picture, we don't have to focus on data quality; our only focus is reading the files, and all the rest is done by the Delta Live Tables pipelines.
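For context, a minimal sketch of the Delta Live Tables Python syntax this reviewer is referring to, with expectations enforcing the data quality checks; the source path and column names are hypothetical:

```python
import dlt
from pyspark.sql import functions as F

# In a DLT pipeline, `spark` is provided by the runtime.

@dlt.table(comment="Raw orders read from landing files")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")        # Auto Loader
             .option("cloudFiles.format", "json")
             .load("/mnt/landing/orders")            # hypothetical path
    )

@dlt.table(comment="Orders that passed quality checks")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop bad rows
@dlt.expect("positive_amount", "amount > 0")                   # log violations, keep rows
def orders_clean():
    return dlt.read_stream("orders_raw").withColumn("ingested_at", F.current_timestamp())
```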
Swiss-Army Knife of Data Analytics
What do you like best about the product?
Databricks' versatility is its best feature. The range of languages and functionality afforded by Databricks is impressive. Thus far, I've written code in R, Python, SQL and Scala in Databricks. And I'm just getting started. I've also composed SQL code in both R and Python and executed it in Databricks. And then we come to interoperability. Data written to SQL can be accessed by either R or Python. Parameters can be passed across SQL, R and Python via widgets or environment variables. If you have an intractable data or analytics problem, Databricks would be my 'go to' to maximise the options as to how you could potentially code your way under, around or over the obstacles standing between your project and successful execution.
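A minimal sketch of the widget-based parameter passing described above, assuming a Databricks notebook (where `dbutils` and `spark` are provided) and a hypothetical `lake.orders` table:

```python
# Create a text widget; the same widget is visible to SQL, R, and Python cells.
dbutils.widgets.text("run_date", "2024-01-01", "Run date")

# Read the parameter back in Python.
run_date = dbutils.widgets.get("run_date")

# Use it to drive SQL; named parameters (on recent runtimes) keep it injection-safe.
df = spark.sql(
    "SELECT * FROM lake.orders WHERE order_date = :run_date",
    args={"run_date": run_date},
)
df.show()
```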
What do you dislike about the product?
The options for deploying Databricks code from dev >> qa >> uat >> prod aren't as intuitive as I might like. This might have more to do with our current use of Azure Data Factory for orchestration. Setting up workflows natively in Databricks was quite straightforward; it seems to be in accessing Databricks notebooks from Azure Data Factory across dev >> qa >> uat >> prod that we are perhaps creating problems for ourselves. Perhaps not a shortcoming in Databricks at all. Curious as to how Databricks would operate with AWS rather than Azure. Perhaps a better experience?
What problems is the product solving and how is that benefiting you?
Data migration, data modelling & reporting.
Best Data Engineering, ML, Data Science & analytics lakehouse platform
What do you like best about the product?
Autoloader
Change Data Feed
DLT pipelines
Schema evolution
Jobs Multitask
Integration with leading Git Providers, Data Governance and security tools
MLflow AutoML
Serverless SQL endpoints for analysts
Photon accelerated engine
What do you dislike about the product?
No GUI-based drag & drop
Complete data lineage visualization at the metadata level is still not there
No serverless clusters for data engineering pipelines if you use existing interactive clusters; serverless is only available via job clusters through DLT
Every feature has some limitations involved
More work is needed on orchestration workflows
What problems is the product solving and how is that benefiting you?
Unified Batch & streaming pipeline
Delta Lake
Versioning & History
ACID transaction through delta log
Data curation through Validation & quarantine
Data ingestion through Auto Loader (see the sketch after this list)
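A minimal sketch of the Auto Loader ingestion pattern listed above, with hypothetical paths and table name; the `cloudFiles.schemaLocation` option is what enables schema tracking and evolution:

```python
# Runs in a Databricks notebook or job, where `spark` is provided.
(
    spark.readStream
         .format("cloudFiles")                                   # Auto Loader
         .option("cloudFiles.format", "json")
         .option("cloudFiles.schemaLocation", "/mnt/chk/schemas/events")
         .load("/mnt/landing/events")
         .writeStream
         .option("checkpointLocation", "/mnt/chk/events")
         .trigger(availableNow=True)                             # incremental batch-style run
         .toTable("bronze.events")
)
```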
Databricks : Best Unified Platform for Data Engineering
What do you like best about the product?
Delta tables are the best: Spark in a very curated format.
What do you dislike about the product?
Nothing as of now. It's very good overall.
What problems is the product solving and how is that benefiting you?
We wanted to have a unified platform. Partner Connect is a very good feature of Databricks.
Databricks - Best Unified Delta Lakehouse Platform in Data & AI Analytics space
What do you like best about the product?
Unified Batch & Streaming for source systems data
Autoloader capability, along with Schema Evolution
Delta Live Table & orchestrating with Pipelines
CDC event streams for SCD1 & SCD2 using Delta APPLY CHANGES (see the sketch after this list)
Databricks Workflows - Multi-task jobs
Serverless SQL Photon clusters along with Redash-integrated visualization
Unity Catalog
Delta Sharing & Data Marketplace
Data Quality expectations
Integration with Collibra, Privacera & other security & governance tools
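A minimal sketch of the APPLY CHANGES pattern mentioned in the list above, using the DLT Python API; the source/target names and columns are hypothetical:

```python
import dlt
from pyspark.sql import functions as F

# Target streaming table that will hold the SCD2 history.
dlt.create_streaming_table("customers_scd2")

# Apply CDC events from a hypothetical source table of change records.
dlt.apply_changes(
    target="customers_scd2",
    source="customers_cdc",          # stream of insert/update/delete events
    keys=["customer_id"],            # business key
    sequence_by=F.col("event_ts"),   # ordering column for out-of-order events
    stored_as_scd_type=2,            # keep full history; use 1 for overwrite-in-place
)
```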
What do you dislike about the product?
Issues running multiple streaming jobs in the same cluster
Job clusters can't be reused, even for a retry of the same job in production, since shutting down immediately after the job runs or fails is the default; need to check for options to change this
Multi-task jobs need support for passing one task's output as the next task's input, and should support triggering on failure and OR-dependent predecessors; currently only AND dependencies are supported
No serverless option for data engineering jobs outside DLT
DLT needs to mature to handle a wider variety of sources and targets; currently it only supports Delta tables in Databricks. Expecting it to support any tool/service/product that supports Delta-format filesystems
What problems is the product solving and how is that benefiting you?
Easier integration from various source systems, from IoT and streaming connectors to batch connectors
Helpful for easily designing the lakehouse medallion architecture (RAW, REFINED and GOLD) to contextualize the enterprise common data model & warehouse systems
Data quality expectations in DLT are very helpful for speeding up the quality-check process, and display in the monitoring dashboard's lineage view
Auto-tuning and compaction are helpful, along with VACUUM
Able to integrate the metastore well with Collibra for data governance
Scalable, fast and easy to use
What do you like best about the product?
Databricks Delta Lake is the default storage for Databricks, which makes it very useful. Time travel, transactions, and partitioning make it very efficient.
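A minimal sketch of the Delta time-travel and history features this reviewer mentions; the table name is hypothetical:

```python
# Runs in a Databricks notebook, where `spark` is provided.

# Query an earlier version of the table via SQL time travel.
v0 = spark.sql("SELECT * FROM lake.orders VERSION AS OF 0")

# Or query the table as of a timestamp.
yesterday = spark.sql("SELECT * FROM lake.orders TIMESTAMP AS OF '2024-06-01'")

# Inspect the transaction history that makes time travel possible.
spark.sql("DESCRIBE HISTORY lake.orders").show(truncate=False)
```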
What do you dislike about the product?
Until now I have not faced any limitations for my use case.
What problems is the product solving and how is that benefiting you?
We are using the Databricks lakehouse to manage batch and real-time data pipelines that fetch data from Elasticsearch & Azure Data Lake.