Fast and efficient slugify, written for Rust and Node

Slugify is a URL slug generator. A slug is the part of a URL path between two slashes:

https://www.example.com/this-is-a-slug/this_is_another_slug

Usually, slugs are made of alphanumeric characters and dashes. I rarely see an underscore in a URL, not because it’s forbidden, but because most of us have somehow agreed that [a-z0-9\-]+ is the easiest to read and type.

To transform an arbitrary string into a slug, it must go through several transformations:

Transliterate

Some of you probably have special characters in your language’s alphabet, and they have to be mapped to another alphabet, I’d say Latin. Most of us have Latin letters on our keyboards, right?

Lowercase
Sanitize

Sanitizing means removing unsupported characters and trimming extra spaces at the ends. Everything that’s not alphanumeric or a space should be removed.

Replace spaces with dashes
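
The four steps above can be sketched naively, one pass per step. This is only an illustration: the tiny transliteration table and the names `transliterate` and `naive_slugify` are assumptions made for this sketch, not code from any crate, and a real slugifier would use a full Unicode-to-ASCII mapping.

```rust
/// Hypothetical, deliberately tiny transliteration table (an assumption for
/// this sketch); a real slugifier would rely on a full Unicode-to-ASCII mapping.
fn transliterate(c: char) -> String {
    match c {
        'č' | 'ć' => "c".to_string(),
        'Č' | 'Ć' => "C".to_string(),
        'š' => "s".to_string(),
        'Š' => "S".to_string(),
        'ž' => "z".to_string(),
        'Ž' => "Z".to_string(),
        other => other.to_string(),
    }
}

/// Naive slugify that applies the four steps one after another.
fn naive_slugify(input: &str) -> String {
    // 1. Transliterate every character
    let latin: String = input.chars().map(transliterate).collect();
    // 2. Lowercase
    let lower = latin.to_lowercase();
    // 3. Sanitize: keep only ASCII alphanumerics and spaces
    let kept: String = lower
        .chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == ' ')
        .collect();
    // 4. Replace each run of spaces with a single dash;
    //    split_whitespace also drops leading and trailing spaces
    kept.split_whitespace().collect::<Vec<_>>().join("-")
}
```

Calling `naive_slugify("mačka Mački Grize rep!")` walks the string several times, which is exactly the overhead the later drafts try to avoid.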

Native modules

My main motivation for making this was that I needed a faster slugifier in Node!

Then I thought: I know Rust, Rust is ultra fast and efficient, and I can make Rust work in Node!

More importantly, I wanted to try writing my first native module for npm.

But before explaining how to write a Node module in Rust, I want to go over my thought process while writing this.

Draft 1: Regexes

My first attempt at solving this used regexes, but the way I wrote them made slugifying too slow, although it was correct for my test cases.

Pseudocode below would be awesome if it worked in Rust #i-need-function-compositions-like-scala

(transliterate and_then lowercase and_then trim_ends and_then sanitize)(value)
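
Rust has no built-in operator for this, but something close can be approximated with a small helper. A minimal sketch, assuming every step takes and returns an owned `String`; the helper `and_then` and the stand-in steps `lowercase` and `trim_ends` are hypothetical names used only for illustration:

```rust
/// Composes two functions left to right, so `and_then(f, g)(x)` equals `g(f(x))`.
fn and_then<A, B, C>(f: impl Fn(A) -> B, g: impl Fn(B) -> C) -> impl Fn(A) -> C {
    move |x| g(f(x))
}

// Hypothetical stand-ins for the real transformation steps.
fn lowercase(s: String) -> String {
    s.to_lowercase()
}

fn trim_ends(s: String) -> String {
    s.trim().to_string()
}

/// Builds the pipeline once; calling the result runs both steps in order.
fn pipeline() -> impl Fn(String) -> String {
    and_then(lowercase, trim_ends)
}
```

More steps can be chained by nesting calls, for example `and_then(and_then(a, b), c)`, which is noticeably clunkier than the Scala version.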

I had 3 regexes in my first draft. Knowing regexes is very handy, but how you write them can slow down replacing characters in a string.

use regex::Regex;

// Regex::new returns a Result, so the compiled patterns have to be unwrapped
let re_beginning = Regex::new("^[^a-zA-Z0-9]+").unwrap(); // matches a non-alphanumeric run at the beginning
let re_end = Regex::new("[^a-zA-Z0-9]+$").unwrap(); // matches a non-alphanumeric run at the end
let remove_unwanted = Regex::new("([^a-z0-9]|[^a-z0-9])+").unwrap(); // matches unwanted runs in the middle, collapsing duplicates

Converting even simple slugs took up to 2 ms, when it should be practically instant!
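
For anyone who wants to check timings like this on their own machine, here is a rough sketch of how a per-call average could be measured using only the standard library. The helper name `average_duration` is made up for this example, and a proper benchmark harness would give more trustworthy numbers:

```rust
use std::time::{Duration, Instant};

/// Runs `f` on `input` `iterations` times and returns the average time per call.
fn average_duration(f: impl Fn(&str) -> String, input: &str, iterations: u32) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the unused result
        std::hint::black_box(f(input));
    }
    start.elapsed() / iterations
}
```

Passing any `&str -> String` function works, for example `average_duration(|s| s.to_lowercase(), "Some Title", 10_000)`.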

Draft 2: Iterations

I took a look at what’s popular on crates.io and found this crate: https://crates.io/crates/slugify.

Then I took a look at the underlying code.

By the way, that utility is fast, but I thought I could make it faster.

It does what every other slugify utility does, no matter the language:

first transliterate, then lowercase, and then sanitize what’s left.

I thought, why not do it all in a single iteration?

In my case, all transformations are done on a character level.

If transliterating a character returns more than one character, I recursively apply the same transformations to the result.

Most of the time, the complexity will be O(n).

And then I thought: why should I run transliteration on the entire range of characters?

Maybe I’m wrong, and please tell me if I am, but isn’t transliteration only needed for the alphabetic range of characters?

If you have a very large slug, transliterating the entire set of characters might be expensive, right?

The code:

use deunicode::deunicode_char;

/// Returns true for ASCII letters and digits, which need no transliteration
fn is_contained_in_limited_set(value: char) -> bool {
    matches!(value, '0'..='9' | 'a'..='z' | 'A'..='Z')
}

/// Keeps ASCII alphanumerics, transliterates other alphabetic characters, and replaces
/// each run of anything else with a single replacement character
fn sanitize(value: &str, replacement: char) -> String {
    let mut out = String::new();
    for elem in value.chars() {
        if is_contained_in_limited_set(elem) {
            out.push(elem)
        } else if elem.is_alphabetic() {
            // characters that need to be decoded should already be in the alphabetic range, everything else is for replacement
            let decoded_elem = deunicode_char(elem).map(|d| sanitize(d, replacement));
            if let Some(decoded) = decoded_elem {
                out.push_str(&decoded);
            }
        } else if !out.ends_with(replacement) {
            out.push(replacement)
        }
    }

    out
}
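
As shown, `sanitize` neither lowercases its output nor removes a replacement character that lands at the very start or end of the result (an input like "!abc" would come out as "-abc"). A minimal sketch of a finishing step that could cover both, assuming they are handled outside `sanitize`; the function name `finish` is hypothetical and not taken from the crate:

```rust
/// Hypothetical finishing step (not taken from the crate): strips the
/// replacement character from both ends and lowercases what remains.
fn finish(sanitized: &str, replacement: char) -> String {
    sanitized.trim_matches(replacement).to_lowercase()
}
```

Both operations are linear in the length of the string, so they don't change the overall O(n) behavior.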

By introducing this change, I managed to improve speed by about 20%. You might not notice or care about the speed, because in the real world you’ll have very short slugs, and the operations should be nearly instant, taking microseconds.

But I was curious if I could improve the speed, and I did.

You can install the crate el-slugify.

Port to node module

I wanted to embed my crate in a Node module. After some research of what’s out there, I decided to go with Neon. With Neon, you can write high-performance Rust code that can be called directly from your Node.js application, allowing you to take advantage of Rust’s low-level control and speed while still using the familiar JavaScript ecosystem.

Getting started with Neon

It’s as simple as running this in the console:

npm init neon node-el-slugify

You can look at my code at https://github.com/eisberg-labs/el-slugify/tree/main/node-el-slugify.

The main part is writing an adapter that converts between Rust types and JS types.

Internally, it will be built as a C-compatible dynamic library (a cdylib) that Node loads as a native module.

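use neon::prelude::*;

// Adapter: reads the first JS argument as a string, slugifies it, and returns a JS string
fn slugify_api(mut cx: FunctionContext) -> JsResult<JsString> {
    let value = cx.argument::<JsString>(0)?;

    let value_string = value.value(&mut cx);
    let slug = slugify(value_string.as_str());
    Ok(cx.string(slug))
}

// Registers the adapter as `slugify` on the module's exports
#[neon::main]
fn main(mut cx: ModuleContext) -> NeonResult<()> {
    cx.export_function("slugify", slugify_api)?;
    Ok(())
}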

You can then test your Node module directly, like this:

"use strict";

const assert = require("assert");
const slugifier = require(".."); // path to the module

describe("Slugify", () => {
  it('slugifies with default replacement', () => {
    assert.strictEqual(slugifier.slugify('mačka Mački Grize rep!'), 'macka-macki-grize-rep')
  })

  it('slugifies with custom replacement', () => {
    assert.strictEqual(slugifier.slugify_with_replacement('mačka Mački Grize rep!', '_'), 'macka_macki_grize_rep')
  })
});

Troubleshooting

I had some issues with cargo-cp-artifact; I was constantly getting build errors:

Did not copy “cdylib:el-slugify”

cargo-cp-artifact is a small command-line utility used by Neon projects; it reads Cargo’s JSON build output and copies the compiled artifact to the destination you specify.

The build errors occurred because the package name in Cargo.toml and the name in package.json have to be the same.

Also, I couldn’t use an organization prefix in the package.json name.

Publishing the package

After running the build command:

cargo-cp-artifact -nc index.node -- cargo build --message-format=json-render-diagnostics --release

You can just run npm publish.