Self-service business intelligence
Self-service business intelligence (SSBI) is an approach to data analytics that enables business users to access and work with corporate data without needing a background in statistical analysis, business intelligence (BI) or data mining. This helps business users make reports and decisions based on their own queries and analyses.
The current trend in the industry is to add a data discovery framework on top of big data analytics platforms, enabling Decision Scientists to experiment with data, and sometimes directly with insights, to arrive at better and deeper intelligence while significantly reducing the time taken to reach those decisions. Data Scientists and Analysts would still be required to create algorithms, analytics models and event parsers; however, these would be packaged for data discovery, with configurable input data sources, event parsing logic, KPIs and model parameters. Above all, data and IT governance should also be simplified.
1. Big data platforms have to evolve into self-service tools: A big data platform should offer agility, fast time to decision, and increased productivity for Decision Scientists. To achieve that, it has to be self-service. Apart from being robust, the platform should abstract all technology layers from the business user: data models, the technology stack, and analytics algorithms should be exposed as services or configurable modules. For instance, all data integration workflows and analytics algorithms should be packaged into customizable models. Decision Scientists should be able to customize the inputs to these models and improve performance and results by fine-tuning model parameters. They should also be able to configure new data sources with little external help, view a catalog of integrated data rather than the raw data model, experiment with different packaged analytics algorithms and visualizations for data discovery, and then create dashboards and visual reports.
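The idea of an analytics algorithm packaged as a configurable module could be sketched as follows; all names here (`PackagedModel`, the `churn_score` model, its parameters) are hypothetical illustrations, not part of any particular platform:

```python
# Hypothetical sketch: a packaged analytics model whose input source and
# parameters a Decision Scientist can configure without touching code.
from dataclasses import dataclass, field

@dataclass
class PackagedModel:
    """An analytics algorithm exposed as a configurable module."""
    name: str
    data_source: str                             # configurable input source
    params: dict = field(default_factory=dict)   # tunable model parameters

    def with_params(self, **overrides):
        """Fine-tune model parameters without changing the algorithm."""
        merged = {**self.params, **overrides}
        return PackagedModel(self.name, self.data_source, merged)

# A Decision Scientist picks a packaged model and tunes it themselves:
churn = PackagedModel("churn_score", data_source="crm_events",
                      params={"window_days": 30, "threshold": 0.5})
tuned = churn.with_params(window_days=60)   # experiment with a wider window
print(tuned.params["window_days"])          # 60
```

The point of the sketch is that the algorithm itself stays opaque; only its data source and parameters are surfaced for configuration.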
2. Data governance needs more focus than day-to-day operations: Managing data manually is difficult and not scalable, especially for big data sets. Decisions such as what data to retain and for how long, what data to shelve, what level of access to allow users along the chain from data to decisions, and how to account for data usage can continue to be based on manually configured rules, but the bulk of the processes should be automated. Data quality control and correction is another important aspect that should be automated. Full automation might not always be possible: a Data Steward will occasionally have to certify data quality or take a decision or action when an alarm is raised, but the process of collecting information at checkpoints in process flows, reporting it, and detecting and alerting on a deviation in either data quality SLAs or process guidelines should be automated.
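The automated-check-with-manual-escalation pattern described above could look like this minimal sketch; the null-ratio rule and the 5% SLA threshold are assumptions chosen for illustration:

```python
# Hypothetical sketch: an automated data-quality check that raises an
# alert for a Data Steward only when an SLA threshold is breached.
def check_quality(rows, max_null_ratio=0.05):
    """Return (passed, null_ratio) for a batch of records."""
    nulls = sum(1 for r in rows if r.get("value") is None)
    ratio = nulls / len(rows) if rows else 0.0
    return ratio <= max_null_ratio, ratio

rows = [{"value": 1}, {"value": None}, {"value": 3}, {"value": 4}]
passed, ratio = check_quality(rows)
if not passed:
    # Automated detection and reporting; the steward only acts on alerts.
    print(f"ALERT: null ratio {ratio:.0%} exceeds SLA; steward review needed")
```

The automation handles collection, measurement and alerting at every checkpoint; the human is involved only when the alarm fires.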
3. All other IT processes should be automated: All day-to-day operational processes that follow a predefined standard operating guideline are prime candidates for automation. Any necessary follow-up action should either be fully automated for self-healing or, where manual intervention is inevitable, be assisted by automated sub-processes or backed with enough information to aid quick root-cause analysis (RCA) and correction. This calls for an operations management framework that collects data from checkpoints and analyses it to detect any deviation in service SLA or quality, then takes or facilitates corrective action.
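The checkpoint-and-deviation idea can be sketched in a few lines; the pipeline step names and SLA values below are illustrative assumptions:

```python
# Hypothetical sketch: collect elapsed times at pipeline checkpoints and
# flag any step that deviates from its service SLA for follow-up action.
SLA_SECONDS = {"ingest": 60, "transform": 300, "load": 120}

def detect_deviations(checkpoints):
    """checkpoints: {step: elapsed_seconds}; return steps breaching SLA."""
    return [step for step, elapsed in checkpoints.items()
            if elapsed > SLA_SECONDS.get(step, float("inf"))]

breaches = detect_deviations({"ingest": 45, "transform": 420, "load": 100})
print(breaches)  # ['transform']
```

In a real operations framework the breach list would feed a self-healing action (e.g. retry or scale-out) or an alert carrying enough context for a quick RCA.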
4. Simplified workflows for configuration, reports and analytics: An intuitive and powerful front end is an essential component of a self-service platform. A user should be able to perform all configuration, analytics, reporting and follow-up actions from an app or a GUI on a handheld device. The front end should also allow collaboration between Decision Scientists, where ideas are exchanged and analytics results are shared among peers for review and discussion. Different business users require different views and reports, so the front end should allow customizable dashboards and support a host of visualizations to choose from. It should also feature workbenches that aid Decision Scientists in designing data ingestion, data preparation, analytics and experimentation, visualization and further actions, all in a configurable workflow.
The key to this is having the right infrastructure. Software that enables Hadoop as a service (big data as a service):
Apache Ambari
Big data infrastructure as a service
Amazon’s Elastic MapReduce
Amazon EMR is a web service that makes it easy to quickly and cost-effectively process vast amounts of data.
Amazon EMR simplifies big data processing, providing a managed Hadoop framework that makes it easy, fast, and cost-effective for you to distribute and process vast amounts of your data across dynamically scalable Amazon EC2 instances. You can also run other popular distributed frameworks such as Apache Spark and Presto in Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.
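As one illustration of how a managed cluster like this is provisioned programmatically, the request below builds the parameters for the EMR `run_job_flow` API (boto3). The cluster name, release label, instance types and S3 log bucket are placeholder values for illustration, not recommendations; the actual API call is left commented out since it requires AWS credentials:

```python
# Illustrative sketch: parameters for launching an EMR cluster with
# Spark and Presto via boto3's run_job_flow API. All concrete values
# (names, sizes, buckets) are placeholders.
request = {
    "Name": "analytics-cluster",            # placeholder cluster name
    "ReleaseLabel": "emr-6.10.0",           # an EMR release bundling Spark
    "Applications": [{"Name": "Spark"}, {"Name": "Presto"}],
    "Instances": {
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
             "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
             "InstanceCount": 2},           # dynamically scalable EC2 nodes
        ],
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    "LogUri": "s3://my-bucket/emr-logs/",   # placeholder S3 bucket
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "ServiceRole": "EMR_DefaultRole",
}

# With credentials configured, the cluster would be launched like this:
# import boto3
# emr = boto3.client("emr", region_name="us-east-1")
# response = emr.run_job_flow(**request)
```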
Amazon EMR securely and reliably handles your big data use cases, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics.