Democratizing Machine Learning with Machinebox (and Rust!)

Kevin Hoffman
Feb 7, 2018


I am a huge fan of not doing work. If there is ever anything I can do to remove difficult or high-friction things from my professional or personal life, then I will do it. I also have an insatiable scientific curiosity and I love exploring things like machine learning. Learning these new things doesn’t feel like work to me, it feels like breathing fresh air. I love the beauty and elegance of the math behind how some of these models work and can be trained.

However, there comes a time when scientific curiosity needs to take a back seat to the practical concerns of producing shipping software and making money. For most developers, this means we don’t want to spend our time writing the code for the underbelly of ML, we simply want to consume ML as a service. We want to treat it like a commodity, the same way we should treat our platform when we’re running on something like Kubernetes.

If we need file storage, we provision and consume it.

If we need a database, we provision one and consume it.

In my ideal world, if we need a machine learning model, we provision one and consume it.

I had been struggling with how we can accomplish this kind of thing until I saw machinebox posting about its services on Twitter. They’ve taken the hard work of building the core components of various ML models, wrapped them up in RESTful services, and stuffed them into neatly packaged docker images for you! Spinning up one of their boxes takes less time and effort than buying a movie ticket online.
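Just to give a sense of how low that friction is, getting textbox (the box I use below) running locally is roughly a one-liner. The image name and the MB_KEY environment variable here are from memory of the machinebox setup, so check their docs for the exact incantation:

docker run -p 8080:8080 -e "MB_KEY=$MB_KEY" machinebox/textbox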

While there are a bunch of great boxes to experiment with, I started with textbox, a natural language processing box that can tag and categorize written text (I believe it’s English-only at the moment). There are a ton of uses for this, but I thought I’d see if I could make it process someone’s intent to pay an electronic bill through a bot or a text message.

For example, let’s say I texted my “moneybot” the following — “Pay William $200 USD tomorrow”. In the current world where machine learning is a high-friction, bespoke process, we might kick off a long and arduous development cycle involving complex collaboration with data scientists and developers, ultimately producing a one-off solution. I call this “re-inventing the wheel as a square.”

I believe in standing on the shoulders of giants and have no grand delusions about my ability to create a better NLP engine than the experts. So, I fired up textbox and fed it my hypothetical text (it even comes with a web-based dashboard you can use to play with it before writing any code), and it identified the following:

  • William was tagged as an entity of type person
  • 200 was tagged as an entity of type money
  • tomorrow was tagged as an entity of type date
  • It found two keywords: “pay william” and “tomorrow”. I don’t know if “pay” is supposed to be in there, but it’s a start. I think a trainable model might get better at discerning transactional verbs and keywords.

I’ve noticed a few quirks in the model. For example, if I change $200 USD to $200USD, it fails to detect William as an entity. I’m assuming these kinds of quirks will be ironed out, and the great part is that I don’t have to be the one to fix them.

Now I can build a transaction bot that takes this text, grabs the entities out of it, and does post-processing based on the analysis. I could do a lookup against a hypothetical payees service to find potential matches for William, then process the word tomorrow and convert it into a timestamp. After all that post-processing, I could present the end user with something like:

“Pay $200.00 USD to William Jones on February 8th from your default account Checking?”
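To make that concrete, here is a rough Rust sketch of what that post-processing could look like. Everything in it is hypothetical scaffolding for the imaginary moneybot: lookup_payee and resolve_relative_date are stand-ins for a real payees service and date resolver, and the tagged values are hard-coded from the textbox results above.

// Hypothetical post-processing for the moneybot: the tagged values would
// come from textbox; everything else is sketch code.

struct Payee {
    full_name: String,
}

// Stand-in for a lookup against a hypothetical payees service.
fn lookup_payee(name: &str) -> Option<Payee> {
    // A real bot would call a payees API; this is hard-coded for the example.
    if name.eq_ignore_ascii_case("william") {
        Some(Payee {
            full_name: "William Jones".to_string(),
        })
    } else {
        None
    }
}

// Stand-in for converting a phrase like "tomorrow" into a concrete date.
fn resolve_relative_date(phrase: &str) -> String {
    match phrase {
        "tomorrow" => "February 8th".to_string(),
        other => other.to_string(),
    }
}

fn confirmation(person: &str, amount: &str, when: &str) -> String {
    let payee = lookup_payee(person)
        .map(|p| p.full_name)
        .unwrap_or_else(|| person.to_string());
    format!(
        "Pay ${}.00 USD to {} on {} from your default account Checking?",
        amount,
        payee,
        resolve_relative_date(when)
    )
}

fn main() {
    // These values are what textbox tagged in the example above.
    println!("{}", confirmation("William", "200", "tomorrow"));
}

Running this prints the confirmation string above; a real bot would swap the hard-coded lookups for calls to actual services.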

The crucial point here is that ML is being used as a tool, a commodity that can be stacked like a lego brick in the larger construction of an application.

Now for the Rust! As I mentioned, the boxes all have a RESTful interface. In my original tinkering I was just writing one-off throwaway code, but I decided to turn my efforts into a reusable crate, since I hope to take advantage of more machinebox features in the future and want to encourage other Rust developers to do the same.

The following code shows an example of using a textbox from the Rust crate (there’s a textbox docker image running locally at localhost:8080):

extern crate machinebox;

use machinebox::textbox::Textbox;
use machinebox::BoxClient;

fn main() {
    let tb = Textbox::new("http://localhost:8080");
    let analysis = tb.check("Pay William $200 tomorrow");

    // extract the potential dollar amount...
    match analysis {
        Ok(results) => {
            let money = results.sentences[0]
                .entities
                .iter()
                .find(|e| e.entity_type == "money");
            match money {
                Some(amount) => println!("Pay {}?", amount.text),
                None => println!("Couldn't find money in your text."),
            }
        }
        Err(e) => {
            println!("Couldn't get results: {}", e);
        }
    }
}

In this case, the output I get is “Pay 200?”
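The person and date entities come back in the same shape, so the bot can pull all three out in a single pass. Here’s a quick sketch along the same lines (keeping in mind, as I note below, that the client code is likely to change):

extern crate machinebox;

use machinebox::textbox::Textbox;
use machinebox::BoxClient;

fn main() {
    let tb = Textbox::new("http://localhost:8080");

    // Same call as before, but this time pull out all three entity types.
    if let Ok(results) = tb.check("Pay William $200 tomorrow") {
        let entities = &results.sentences[0].entities;
        let person = entities.iter().find(|e| e.entity_type == "person");
        let money = entities.iter().find(|e| e.entity_type == "money");
        let date = entities.iter().find(|e| e.entity_type == "date");

        if let (Some(p), Some(m), Some(d)) = (person, money, date) {
            println!("Pay {} to {} {}?", m.text, p.text, d.text);
        }
    }
}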

I have a lot of refactoring on the crate to do and so the client code is likely to change (for the better, I promise). I plan on building client implementations for all of the boxes, but that’s not really what I want to be the takeaway from this post.

What I’d like you to take away from this post is to start thinking about how ML can be used as a tool like any other in your toolbox. Ask yourself: how can my application be made better through the use of ML, and not just for the sake of using the new shiny thing? How can I take advantage of this kind of intelligence and prediction in my application without having to embed data science experts on every one of my development teams?
