Think about what technology has done for auto racing over the last decade: Anti-lock braking, launch control and traction control systems mean drivers can mash on the accelerator and carry more speed into the corner without spinning out or turning tires into liquid rubber, with a human still playing a key role in the loop.
We’re at a similar juncture in machine learning. The big data revolution, advances in hardware and open source projects have come together to fuel a tremendous escalation in capability for machine learning:
- Modern GPUs are well suited to training large-scale machine learning models.
- Hadoop and other frameworks are powerful paradigms for distributed computing.
- Efficiencies in cloud computing are re-writing the economics of machine learning as they enable on-demand use of infrastructure.
- Open source communities for software and academic publishing are democratizing the development of machine learning.
- There has been a steady series of advances in algorithms.
Moving forward aggressively in a well-controlled manner can mean strategic advantage. Amazon, for example, employs machine learning with brutal efficiency across its business to make sure it offers the best recommendations at the best prices and margins.
For many companies, the initial push is on developing and mounting a strategy. In financial services, we are executing in many areas and see potential in even more. Robotic process automation (RPA) can automate processes traditionally done by humans for transactions that happen on a large scale. Applying machine learning to that work can reduce human error and free up time for human teams to focus on initiatives that drive the business forward.
In customer service, machine learning can analyze call center conversations, using data mining to look for themes that can lead to improved customer care. Risk management and fraud detection are other sweet spots.
Across industries, we also are approaching a time when machines are making life and death decisions — whether it’s self-driving cars or automated diagnoses in health care. And just as in racing, it’s the guardrails that get built in that can make the difference. For any company looking to advance on the machine learning path, the following are some best practices to consider:
Get the right fit. Common modeling errors include overfitting, where a statistical model is so complex that it fits the noise in its training data and predicts poorly on new data, and underfitting, where a model is too simple to capture the underlying pattern and performs poorly for that reason. Achieving the proper fit is fundamental, but getting it right is tricky. There’s much interest right now in developing automated model pipelines that retrain, re-fit and re-deploy models very aggressively. Depending on the use case, such an approach can yield tremendous results. But proceed with caution, because an aggressive cycle can have big impacts on your business and your customers.
And pace yourself. Sometimes you can develop a model that operates perfectly in the lab, only to see it get messy once it is released into the real world.
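To make the fit question concrete, here is a minimal sketch that compares error on training data with error on held-out data. Everything in it is an assumption made for illustration: the synthetic noisy sine data, the choice of a k-nearest-neighbors regressor, and the three values of k. It is not drawn from any particular production system.

```python
import math
import random

def make_data(n, rng, noise=0.3):
    """Noisy samples of sin(x) on [0, 6]; a synthetic stand-in for real data."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise) for x in xs]
    return list(zip(xs, ys))

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, data, k):
    """Mean squared error of a k-NN model built on `train`, scored on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_data(200, rng)
holdout = make_data(200, rng)   # data the model never saw during fitting

# k=1 memorizes the training set, k=200 averages everything, k=10 sits between.
results = {k: (mse(train, train, k), mse(train, holdout, k)) for k in (1, 10, 200)}
for k, (train_err, holdout_err) in results.items():
    print(f"k={k:>3}  train MSE={train_err:.3f}  holdout MSE={holdout_err:.3f}")
```

With k=1 the model reproduces its training data exactly, so its training error is zero, yet it should do worse on the holdout set than the moderate setting: that gap is the signature of overfitting. With k=200 every prediction is the overall average, and both errors are high, which is underfitting. The same train-versus-holdout comparison is a reasonable gate to put in front of any automated retraining pipeline.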
Watch out for data drift. Over time, inputs to your model are going to change — whether it’s seasonal variations or macro trends such as changing economic conditions or consumer sentiments. Remember that for a variety of reasons the model that’s yielding unbelievable accuracy today likely will not continue to achieve those same great returns over time. Stay vigilant and keep your inputs current.
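One way to stay vigilant is to compare the distribution of each input today against the distribution the model was trained on. The sketch below uses the population stability index (PSI), a drift measure often used in credit risk; the choice of PSI, the equal-width binning, the synthetic data and the 0.1 and 0.25 cut-offs (a common rule of thumb, not a standard) are all assumptions for illustration.

```python
import math
import random

def psi(baseline, current, bins=10, eps=1e-6):
    """Population stability index between two samples of one numeric feature.

    Bins are equal-width over the baseline's range; values outside that
    range are counted in the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0   # guard against a constant baseline

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    expected, actual = shares(baseline), shares(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

rng = random.Random(42)
training_inputs = [rng.gauss(50, 10) for _ in range(5000)]   # what the model saw
stable_inputs = [rng.gauss(50, 10) for _ in range(5000)]     # same conditions
shifted_inputs = [rng.gauss(60, 10) for _ in range(5000)]    # e.g. a macro shift

psi_stable = psi(training_inputs, stable_inputs)
psi_shifted = psi(training_inputs, shifted_inputs)
print(f"stable:  PSI={psi_stable:.3f}")
print(f"shifted: PSI={psi_shifted:.3f}")
```

Under the rule of thumb assumed here, a PSI below 0.1 is read as no meaningful change and a PSI above 0.25 as a shift worth investigating before trusting the model's output. Running a check like this on a schedule turns "stay vigilant" into an alert rather than a memory exercise.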
Document your data. It’s important to know which data is hard to interpret, and why. Sometimes a data set is poorly documented or has no metadata, especially if it comes from a quirky legacy system. For many mature, established businesses, the core business systems that are the data source for machine learning have been around for a long time. Document, and get good at metadata.
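Getting good at metadata can start small. The sketch below keeps a data dictionary next to the data and flags any column that lacks a usable description. The `ColumnDoc` record, the `undocumented` helper and every column name are hypothetical, invented here to show the idea rather than taken from a real system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnDoc:
    """One data dictionary entry (hypothetical structure)."""
    name: str
    description: str
    unit: str
    source_system: str

def undocumented(columns, data_dictionary):
    """Return columns with no entry, or a blank description, in the dictionary."""
    return [
        c for c in columns
        if c not in data_dictionary or not data_dictionary[c].description.strip()
    ]

# Hypothetical extract from a legacy system; all names are illustrative.
data_dictionary = {
    "acct_open_dt": ColumnDoc("acct_open_dt", "Date the account was opened",
                              "ISO 8601 date", "core_banking"),
    "bal_amt": ColumnDoc("bal_amt", "End-of-day ledger balance",
                         "USD", "core_banking"),
    "flg_7": ColumnDoc("flg_7", "", "unknown", "core_banking"),
}
columns = ["acct_open_dt", "bal_amt", "flg_7", "x_cd"]

missing = undocumented(columns, data_dictionary)
print(missing)
```

A check like this can run before training, so that a field nobody can explain is questioned before it becomes a model input.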
Build a well-rounded team. Data scientists and data engineers are not interchangeable, even when both work on machine learning. Data scientists are statistical and quantitative experts, often with particular domain expertise, but not all of them have machine learning expertise. Data engineers in machine learning tend to be programmers who are good at working with data, but they often come into a project in a specific line of business cold. So it’s important to approach projects as a partnership. Increasingly, there are also machine learning engineers who bridge the gap: They can write code and also think like a data scientist.
In racing, traction control systems let us take turns as fast as possible, with engineering mitigating risk. Similarly, we can accelerate machine learning efforts with a solid framework in place to keep us from going over the wall or into the runoff area if we get a little bit too aggressive.
Keeping humans “in the loop” will remain hugely important as machine learning develops and accelerates. I view it as a challenge to us all for now and in the future: How do we invent new ways to responsibly deploy machine learning so that we can move faster, safer and smarter?