Strata Data Conference New York 2019: How to Unlock Your Data’s Full Potential
Strata Conference is one of the largest big data conferences in the world, and it’s happening in New York City this fall (September 23–26).
What major trends will people be talking about at this conference?
DataArt interviewed five of the announced conference speakers to get their insights on how the latest trends in big data, ML, and cloud-based technologies can disrupt the existing business ecosystem.
Strata NYC — Trend #1: Correct Usage of Data Transforms Multiple Domains
Big data, AI, and ML are not just for the science sectors. In fact, they are also used in a variety of industries to streamline processes and improve the accuracy of calculations.
Let’s dive into the details.
Strata Data Conference 2019 Will Highlight the Benefits of ML
ML can heavily influence the development of any industry, from FinTech to travel. There are no boundaries or limits.
Among other topics, the Strata New York speakers are expected to discuss the following:
How will this look in reality?
By clustering your potential consumers, ML will help you target the most valuable segments of the general population.
In the above example, the company should target the customers in clusters #4 and #9 in their emails or AdWords, because those customers greatly outperform the general population.
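To make the idea of "outperforming the general population" concrete, here is a minimal sketch that compares each cluster's conversion rate with the overall rate (the lift). It assumes the cluster labels were already produced by an earlier clustering step. The function names, the `min_lift` threshold, and the toy data are hypothetical and are not taken from any DataArt project.

```python
from collections import defaultdict


def cluster_lift(customers):
    """Return {cluster_id: lift}, where lift = cluster conversion rate / overall rate.

    `customers` is an iterable of (cluster_id, converted) pairs. The cluster
    labels are assumed to come from an earlier clustering step.
    """
    totals = defaultdict(int)
    conversions = defaultdict(int)
    for cluster_id, converted in customers:
        totals[cluster_id] += 1
        conversions[cluster_id] += int(converted)

    if not totals:
        raise ValueError("no customers supplied")
    overall_rate = sum(conversions.values()) / sum(totals.values())
    if overall_rate == 0:
        raise ValueError("no conversions, so lift is undefined")
    return {c: (conversions[c] / totals[c]) / overall_rate for c in totals}


def clusters_to_target(customers, min_lift=1.5):
    """Return the sorted cluster ids whose lift is at least `min_lift`."""
    lifts = cluster_lift(customers)
    return sorted(c for c, lift in lifts.items() if lift >= min_lift)


# Hypothetical toy data: (cluster_id, converted)
sample = (
    [(1, False)] * 9 + [(1, True)] * 1    # cluster 1: 10% convert
    + [(2, False)] * 10                   # cluster 2: 0% convert
    + [(4, False)] * 5 + [(4, True)] * 5  # cluster 4: 50% convert
    + [(9, False)] * 6 + [(9, True)] * 4  # cluster 9: 40% convert
)

print(clusters_to_target(sample))  # [4, 9]
```

With an overall conversion rate of 25%, clusters 4 and 9 have lifts of 2.0 and 1.6, so they are the ones worth targeting in this toy data set.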
Squeezing every drop of information from your data will help reduce the time and money spent on customer acquisition.
When the cost of user time and attention becomes too high to ignore, implementing an ML-driven recommendation system will prove invaluable to your organization.
Such systems are widely used in (but not limited to) retail, music streaming, and on-demand video.
Let’s consider a real example.
DataArt, a software development company, has created a recommendation engine for a leading US-based tour and activities distribution network.
DataArt’s solution provides recommendations regarding tours, activities, and events that are potentially relevant to a variety of target audiences. Since offers differ depending on the target audience (groupings of individual travelers or corporate event attendees), this creates new financial opportunities.
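For readers who want a feel for how a recommendation engine ranks offers, here is a minimal sketch of item-based collaborative filtering with cosine similarity over booking histories. This is a common textbook technique, not a description of DataArt's engine. The user names, activity names, and function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(bookings):
    """Return {(item_a, item_b): cosine similarity} for items booked together.

    `bookings` maps each user to the set of items that user has booked.
    """
    item_users = defaultdict(set)
    for user, items in bookings.items():
        for item in items:
            item_users[item].add(user)

    sims = {}
    items = sorted(item_users)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            shared = len(item_users[a] & item_users[b])
            if shared:
                score = shared / sqrt(len(item_users[a]) * len(item_users[b]))
                sims[(a, b)] = score
                sims[(b, a)] = score
    return sims


def recommend(user, bookings, top_n=3):
    """Rank items the user has not booked by similarity to items they have."""
    sims = item_similarities(bookings)
    seen = bookings[user]
    scores = defaultdict(float)
    for (a, b), score in sims.items():
        if a in seen and b not in seen:
            scores[b] += score
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical booking histories
bookings = {
    "ann": {"city tour", "boat trip"},
    "bob": {"city tour", "boat trip", "wine tasting"},
    "cat": {"city tour", "museum pass"},
    "dan": {"boat trip", "wine tasting"},
}

print(recommend("ann", bookings))  # ['wine tasting', 'museum pass']
```

A production system would also need to account for the target audience (individual travelers versus corporate groups), which this sketch leaves out.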
Strata Conference NYC 2019 Will Unveil the Influence of ML on Data Quality Management
When it comes to data quality, don’t underestimate the impact of ML.
The conference speakers DataArt interviewed generally agree that ML allows for consistent data analysis, effective data distribution and processing, and automated detection of outliers.
Let’s look at how this works in practice.
For one of its enterprise-grade clients, DataArt developed a state-of-the-art outlier detection platform based on ML technology. This system automatically ranks potential anomalies in real time.
This technology can easily be adapted for other uses such as exception management and trend or fraud detection.
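To illustrate what "ranking potential anomalies" can mean, here is a minimal sketch that swaps in a simple statistical baseline, the modified z-score based on the median and the median absolute deviation (MAD), in place of an ML model. It is not the platform described above. The 3.5 threshold is a commonly used rule of thumb, and the readings are made up.

```python
from statistics import median


def rank_anomalies(values, threshold=3.5):
    """Return (index, value, score) tuples for outliers, most anomalous first.

    The score is the modified z-score: 0.6745 * |x - median| / MAD.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values are identical; this baseline cannot score them.
        return []
    scored = [(i, v, 0.6745 * abs(v - med) / mad) for i, v in enumerate(values)]
    flagged = [item for item in scored if item[2] > threshold]
    return sorted(flagged, key=lambda item: item[2], reverse=True)


# Hypothetical sensor readings with two obvious outliers
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 25.0, 10.2, 9.7, 3.0, 10.0]

for index, value, score in rank_anomalies(readings):
    print(f"index={index} value={value} score={score:.1f}")
```

Here the reading of 25.0 ranks first and 3.0 second; everything else stays below the threshold. The same ranking idea carries over to exception management or fraud detection once the score comes from a trained model.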
Strata New York 2019 Will Feature Successful AI Practices
While attendees at the Strata Data Conference New York will discuss the opportunities that AI can help unleash, some participating companies will be demonstrating their own results in this area.
For example, DataArt has already developed an AI-based pharmaceutical computational platform that can accelerate drug discovery and development.
Screenshot of the AI-led pharmaceutical platform developed by DataArt
Built using Scala, Akka, Mesos, and Kafka, this platform provides fast and reliable computations for networks with tens of thousands of nodes and millions of observations.
Strata NYC — Trend #2: Streamlining Processes with Serverless Technologies
Another topic at the Strata Data Conference New York will be serverless apps. Speakers will cover such cloud-related topics as:
- Developing a big data app on top of AWS
- Serverless machine learning based on TensorFlow and BigQuery
- Cloud-based multidisciplinary workloads
- Streaming enterprise-grade architecture and algorithms from the cloud
- The advantages of cloud databases over relational databases
- The difference between public clouds and on-premises private clouds
Strata NYC — Trend #3: Maintaining Privacy and Security with ML, AI, and Big Data
Data protection is increasingly important to people and companies. As a result, data privacy is quickly becoming the number one priority for owners of apps and platforms.
This problem became more urgent when the California Consumer Privacy Act (CCPA) was signed into law in 2018.
Under the CCPA, users can opt out of the sale of their personal information, and they have control over the personal information that businesses collect about them. More importantly, businesses are responsible for securing their users’ private information.
How can businesses do this successfully? And how can data safeguarding be automated?
This is where ML, AI, and big data come into play.
Among other related topics, speakers at O’Reilly’s Strata New York conference will talk about:
- How to secure data lakes to prepare yourself for CCPA regulation
- How to guarantee open-source cybersecurity with the help of Apache Metron
- How to use AI and big data to ensure food security
Another big concern in the privacy and security sphere is fraud prevention.
How can ML help with this?
Here is an example. For one of its clients, DataArt built a platform that processes historical transaction data and looks for suspicious activity using predefined rules. Cassandra and Spark guarantee horizontal scalability, and six-node clusters are used for benchmark testing.
This fraud detection platform can handle 3 TB of data and run predefined rules on 500 million records per hour.
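As a rough illustration of "running predefined rules" over historical transactions, here is a minimal single-machine sketch. It does not use Cassandra or Spark and is not DataArt's platform. The `Transaction` fields, the two rules, and the thresholds are all hypothetical. The last lines convert the quoted throughput into records per second.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Transaction:
    tx_id: str
    account: str
    amount: float
    country: str
    home_country: str


# Hypothetical predefined rules: name -> predicate that returns True when suspicious
RULES: Dict[str, Callable[[Transaction], bool]] = {
    "large_amount": lambda tx: tx.amount >= 10_000,
    "foreign_country": lambda tx: tx.country != tx.home_country,
}


def flag_suspicious(
    transactions: Iterable[Transaction],
    rules: Dict[str, Callable[[Transaction], bool]] = RULES,
) -> Dict[str, List[str]]:
    """Return {tx_id: [names of rules that fired]} for transactions matching any rule."""
    flagged = {}
    for tx in transactions:
        hits = [name for name, rule in rules.items() if rule(tx)]
        if hits:
            flagged[tx.tx_id] = hits
    return flagged


# Hypothetical historical data
history = [
    Transaction("t1", "acc-1", 120.0, "US", "US"),
    Transaction("t2", "acc-1", 15_000.0, "US", "US"),
    Transaction("t3", "acc-2", 12_500.0, "BR", "US"),
]

print(flag_suspicious(history))

# 500 million records per hour, expressed per second
records_per_second = 500_000_000 / 3600
print(round(records_per_second))  # 138889
```

Because each rule only looks at one transaction at a time, the work splits cleanly across partitions, which is the property that makes this kind of workload a good fit for horizontally scalable engines.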
Here is what some of the speakers at the upcoming Strata Data Conference New York have to say about this trend:
Carolyn Duby, Solutions Engineer and Cyber Security SME Lead at Cloudera
Question (Q): What are the use cases to adopt Apache Metron?
Carolyn Duby (CD): Apache Metron is well suited for ingesting, preparing, and triaging log data in real time. It is used by security operations and risk management.
The major use cases are:
1. Security information and event management (SIEM) augmentation or replacement to remove security “blind spots.” Ingest high-volume logs that are helpful for security operations but exceed legacy SIEM event-per-second rates or scalability. For example, Netflow, pcap, DNS, or Windows endpoint logs.
2. Scalable log retention. Retain network data for a longer time to improve efficiency and completeness of investigations, threat hunting, and compliance.
3. Threat hunting. Enhanced analytics, longer retention, and a single repository for log data make threat hunting more productive and help organizations enhance their security posture with proactive detection.
4. Insider threat detection. Advanced data science and profiling embedded capabilities help identify and prioritize anomalous users, traffic, and other entities.
5. Augment security operations center (SOC) resources to prioritize alerts. Use SOC analysts’ time more efficiently by correlating and prioritizing alerts from point solutions to identify the most important alerts.
6. Automated responses to reduce the impact of incidents. Integrate real-time triage with security orchestration, automation and response to act on incidents quickly and reduce their impact.
7. Security analytics and data science. Metron normalizes and organizes data, so it is ready for analytics and data science.
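The alert prioritization use case above can be pictured with a small, generic sketch. It does not use Apache Metron or any of its APIs. The alert fields, the severity weights, and the scoring formula (summed severity multiplied by the number of distinct sources reporting on a host) are assumptions chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical weights for alert severities
SEVERITY_WEIGHT = {"low": 1, "medium": 3, "high": 5}


def prioritize_alerts(alerts):
    """Correlate alerts by host and return (host, score) pairs, highest score first.

    A host scores higher when its alerts are more severe and when several
    independent point solutions report on it.
    """
    by_host = defaultdict(list)
    for alert in alerts:
        by_host[alert["host"]].append(alert)

    ranked = []
    for host, host_alerts in by_host.items():
        severity = sum(SEVERITY_WEIGHT[a["severity"]] for a in host_alerts)
        sources = {a["source"] for a in host_alerts}
        ranked.append((host, severity * len(sources)))
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical alerts from three point solutions
alerts = [
    {"host": "db-01", "source": "ids", "severity": "high"},
    {"host": "db-01", "source": "endpoint", "severity": "medium"},
    {"host": "web-02", "source": "ids", "severity": "low"},
    {"host": "web-02", "source": "ids", "severity": "low"},
    {"host": "hr-05", "source": "proxy", "severity": "high"},
]

print(prioritize_alerts(alerts))  # [('db-01', 16), ('hr-05', 5), ('web-02', 2)]
```

The point of the sketch is the shape of the workflow: correlate first, then score, so that an analyst starts with the host that several tools agree on.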
Q: What industries can benefit from Metron?
CD: Cybersecurity is a concern that cuts across every industry. However, Metron is most helpful to organizations with a large network footprint and companies entrusted with private customer data or financial assets attracting adversaries. Examples: telecoms, financial services companies and banks, insurers, and hospitals.
Q: What are the alternatives to Apache Metron?
CD: Alternatives include legacy SIEMs and other big data security platforms such as ELK or Splunk.
Q: What are the benefits of Apache Metron as compared to its alternatives?
CD: Metron’s benefits include:
1. Scalable cost-effective platform to ingest and store years of data including logs, pcap, and Netflow.
2. Complete visibility and control of log data storage formats. Metron is built on an open-source platform. The organization controls the retention time and format. There are no “black boxes” or proprietary formats. Store data as long as you want and optimize it for accessibility and cost control.
3. Flexible, configurable solution. Completely control triaging and enrichments with code-free configurations. Coding extensions are available but are typically not needed.
4. Integration with your favorite visualization and data science platforms. Open data formats support your favorite tools, allowing you to access log data as you are most productive.
5. Security and governance. Complete security and governance with encryption at rest and over the wire. Access controls capable of meeting rigorous privacy requirements such as General Data Protection Regulation (GDPR).
6. Integrated with an end-to-end open-source solution to move, triage, analyze, visualize, and build models with log data. Pair with Apache NiFi, Zeppelin notebooks, Spark, Hive, Solr, and other open-source big data projects for an end-to-end solution.
Mark Donsky, Senior Director of Product Management at Okera
Q: What are the potential privacy issues of data lakes?
Mark Donsky (MD): The potential issues all revolve around unintended access to sensitive data:
1. An unanticipated combination of data can create a view into a data subject’s personal life that is in breach of the intended usage, either to the letter or in spirit. For example, combining someone’s taxicab destinations with their credit card expenditures and their spouse’s travel plans to infer potential infidelity.
2. It’s possible to extrapolate personal medical information based on doctor copays, physician types, and spending habits at a pharmacy.
3. Intruders may know when someone’s home will be empty.
Misuse of personal data can lead to stiff penalties from many emerging privacy regulations, including GDPR and CCPA.
Q: What are the major industries that can benefit from using data lakes? Why?
MD: Major industries using data lakes are all industries that have the ability to collect data about their customers, clients, and members. This includes insurance, pharmaceutical, finance, education, utilities, and retail.
Strata NYC — Trend #4: Deep Learning to Improve Predictions
Deep learning is another big ML topic that people will be discussing at Strata Data Conference 2019.
The conference speakers DataArt interviewed see a bright future for this technology.
They expect that deep learning will be used for natural language processing (NLP), and they plan to use this approach for time series forecasting.
They also expect that deep learning will run on mobile and desktop devices.
Let’s look at how this actually works.
This image was generated with OpenCV and Python using a pre-trained Mask R-CNN model.
Recently, DataArt created a solution for monitoring the health of power lines. The solution was built with a convolutional neural network (CNN) and fully connected (FC) layers on Google TensorFlow and Keras.
The solution filters the image stream from a drone-mounted camera so that the operator analyzes only images featuring power poles, which allows the operator to detect problems much more quickly.
DataArt has also developed a customer relationship management (CRM) digital assistant that makes it easy for salespeople (at Salesforce, for example) to track contacts and accounts. The assistant supports voice and text communication via Skype or Google Assistant. It also incorporates a rich database of queries and two-factor authentication (2FA).
What’s Next? Strata San Jose and Strata London Are Coming Soon
If you can’t make it to Strata Data Conference New York 2019, plan for Strata Data Conference San Jose (March 15–18, 2020) or Strata Data Conference London (April 20–23, 2020).
What will the trends be in 2020? Data vision? Image and voice recognition? Or ML-driven overhauls of outdated enterprise-grade software?
Whatever the latest trends, DataArt can help you realize the potential of AI, ML, and big data.
Originally published at https://blog.dataart.com on September 10, 2019.