Arduino Library Index Health Check

Or, how healthy are the 6,000+ libraries in the Arduino Library Manager? 🤔

Arduino Library Chart

Let's do some "Big Data" things: ask ChatGPT, make a couple of thousand API queries, and draw important-looking charts! 😀

(And here is also everything you need to reproduce this yourself)

First, what?

The Arduino world relies heavily on libraries to simplify projects. Somebody once figured out how to read from and write to an SD card, and now you just install the SD lib and don't have to worry about the details behind it any more. Nice!

The Arduino IDE includes the "Arduino Library Manager", a tool that lets you search for libraries, install them, and update them later on.

Arduino Library Manager

This tool is backed by a registry, which is hosted on GitHub so everybody can contribute to it: https://github.com/arduino/library-registry/

Second, and?

Arduino has defined a specification for libraries (https://arduino.github.io/arduino-cli/0.35/library-specification/).

It defines things like naming conventions, folder structure, and metadata, so that a library can be used.
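To make the metadata part a bit more concrete: the specification asks for a library.properties file made of key=value lines. Here is a minimal sketch of reading such a file and spotting missing fields. The sample content is made up, and the set of keys checked here is just a subset I picked for illustration; the specification itself is the authority on what is required.

```python
# Sketch: parse library.properties text and report missing keys.
# CHECKED_KEYS is an illustrative subset chosen here, not the official list.
CHECKED_KEYS = ("name", "version", "author", "maintainer", "sentence", "url")


def parse_properties(text):
    """Turn key=value lines into a dict, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_keys(props):
    """Return the keys from CHECKED_KEYS that are absent or empty."""
    return [key for key in CHECKED_KEYS if not props.get(key)]


# Made-up example content, not taken from a real library
SAMPLE = """\
name=ExampleSensor
version=1.0.2
author=Jane Doe
sentence=Reads the example sensor.
url=https://example.com/ExampleSensor
"""

sample_props = parse_properties(SAMPLE)
print(missing_keys(sample_props))  # prints ['maintainer']
```

This only looks at whether a key is present. The real checks (name length, allowed characters, folder layout) are what the Lint tool is for.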

And to make it easy to comply, there is also a tool, Lint, that automatically checks whether a library conforms to it.

For every library there is a Lint report that shows whether all is well or not. For the SD library from above, this is the report: Lint Log

There is even a GitHub Action that runs the check automatically on every change.

So you might think that all Arduino libraries follow the specification? Hint: no, not even close... 🙁

Arduino Libraries Lint check

Let's take a step back

How can we say the Library Index is healthy? Presumably if it contains only healthy libraries. But what is a "healthy" library?

  • It is used by many people
  • It is actively maintained
  • It follows the specification

Here is my take on answering these 3 points:

  • It is used by many people -> Stars. More stars on GitHub/GitLab/... suggest that more people use it.
  • It is actively maintained -> Last edit timestamp and open issue count. A recent edit and no open issues? All good.
  • It follows the specification -> The fewer Lint errors, the better.
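The three indicators above could be turned into simple yes/no flags like this. This is only a sketch: the field names and the thresholds (10 stars, one year) are arbitrary values I made up to have something to run, not numbers derived from the data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class LibraryStats:
    stars: int            # popularity proxy
    last_edit: datetime   # timestamp of the most recent change
    open_issues: int      # open issue count on the host
    lint_errors: int      # number of Lint findings


def health_flags(stats, now, max_age_days=365, min_stars=10):
    """Return one boolean per indicator; the default thresholds are arbitrary."""
    recent = (now - stats.last_edit) <= timedelta(days=max_age_days)
    return {
        "used": stats.stars >= min_stars,
        "maintained": recent and stats.open_issues == 0,
        "conforms": stats.lint_errors == 0,
    }


# Made-up example values
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
example = LibraryStats(
    stars=42,
    last_edit=datetime(2023, 11, 5, tzinfo=timezone.utc),
    open_issues=0,
    lint_errors=3,
)
print(health_flags(example, now))
# prints {'used': True, 'maintained': True, 'conforms': False}
```

Keeping the three flags separate, instead of folding them into one score, avoids having to decide how much a star is worth compared to a Lint error.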

Further, if I ask, let's say, ChatGPT what a "good Arduino library" is, it additionally brings up:

  • Documentation
  • License
  • Examples
  • (Some more, but I skip those)

Taking a look

To warm up with the data, let's look at some basics. Like how many libraries are there exactly, and where are they hosted?

Hoster
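One way to get such a host breakdown is to count the host names of the repository URLs. A minimal sketch, assuming the input is a list of URLs, one per line as in repositories.txt; the third sample URL is invented:

```python
from collections import Counter
from urllib.parse import urlparse


def count_hosts(urls):
    """Count repository URLs per host name (e.g. github.com)."""
    hosts = Counter()
    for url in urls:
        host = urlparse(url.strip()).netloc.lower()
        if host:
            hosts[host] += 1
    return hosts


# Illustrative input; in practice this would be the lines of repositories.txt
sample_urls = [
    "https://github.com/Syncano/syncano-arduino\n",
    "https://github.com/thinger-io/ClimaStick\n",
    "https://gitlab.com/example-owner/example-lib\n",
]
host_counts = count_hosts(sample_urls)
print(host_counts.most_common())
# prints [('github.com', 2), ('gitlab.com', 1)]
```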

And who are the top 10 Library writers?

Top 10 Library writers
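A ranking like this can be sketched by taking the first path segment of each repository URL as the "writer". That is an assumption: it treats the account or organisation that hosts the repository as the author, which is not always the same person. The sample URLs are invented.

```python
from collections import Counter
from urllib.parse import urlparse


def top_owners(urls, n=10):
    """Return the n most frequent (host, owner) pairs with their counts."""
    owners = Counter()
    for url in urls:
        parsed = urlparse(url.strip())
        parts = [part for part in parsed.path.split("/") if part]
        if parsed.netloc and parts:
            # Lowercased so that "Alice" and "alice" count as one owner
            owners[(parsed.netloc.lower(), parts[0].lower())] += 1
    return owners.most_common(n)


# Invented sample data
owner_sample = [
    "https://github.com/alice/lib-one\n",
    "https://github.com/Alice/lib-two\n",
    "https://github.com/bob/lib-three\n",
]
ranking = top_owners(owner_sample)
print(ranking)
# prints [(('github.com', 'alice'), 2), (('github.com', 'bob'), 1)]
```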

Going further, are there duplicate entries?

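from collections import Counter

# Read the registry list, one repository URL per line
with open("repositories.txt", "r") as file1:
    lines = file1.readlines()
# Keep every line that occurs more than once
dupes = [item for item, count in Counter(lines).items() if count > 1]
print(f"Dupes: {dupes}")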
Dupes: ['https://github.com/Syncano/syncano-arduino\n', 'https://github.com/thinger-io/ClimaStick\n']

Just 2, so let's fix that right away: GitHub Pull Request
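Comparing raw lines only finds exact duplicates. A slightly stricter variant normalizes the URLs first, so that differences in case, a trailing slash, or a ".git" suffix do not hide a duplicate. This is a sketch with invented sample URLs; lowercasing the whole URL assumes the host treats paths case-insensitively, which holds for GitHub but may not hold everywhere.

```python
from collections import Counter


def normalize(url):
    """Reduce a repository URL to a comparable form."""
    url = url.strip().lower().rstrip("/")
    if url.endswith(".git"):
        url = url[:-4]
    return url


def find_duplicates(lines):
    """Return every normalized URL that occurs more than once."""
    counts = Counter(normalize(line) for line in lines if line.strip())
    return [url for url, count in counts.items() if count > 1]


# Invented sample data
dupe_sample = [
    "https://github.com/Example/Lib\n",
    "https://github.com/example/lib.git\n",
    "https://github.com/example/other\n",
]
print(find_duplicates(dupe_sample))
# prints ['https://github.com/example/lib']
```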

And what about broken ones, dead links, 404s? There are some, but they are not quick to pin down, because 301 redirects and other status codes get in the way.
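To see why this gets fiddly, here is a sketch of a link check that does not follow redirects, so a 301 stays visible instead of silently turning into whatever the target answers. The bucket names are my own, and treating only 404 and 410 as "gone" is an assumption. It uses just the Python standard library.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # None means "do not follow"; urllib then raises HTTPError


def classify(status):
    """Map an HTTP status code to a coarse bucket."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if status in (404, 410):
        return "gone"
    return "other"


def check_url(url, timeout=10):
    """Send a HEAD request and return (status, bucket). Needs network access."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # 3xx, 4xx and 5xx all arrive here
    except urllib.error.URLError:
        return None, "unreachable"  # DNS failure, refused connection, ...
    return status, classify(status)


print(classify(301), classify(404))  # prints: redirect gone
```

Calling `check_url("https://github.com/arduino/library-registry/")` would return a tuple like `(status, bucket)`. A "redirect" result is not an error by itself: a renamed repository still works, but the registry entry points at the old name.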

A closer look

How many Lint errors are there?

Lint Error Overview

So LP010 (name too long) is the most frequent one, followed by LS008 (name-header mismatch) and LP015 (name contains spaces).
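A count like this can be sketched by scanning the report texts for rule IDs. The assumption here is that each report is available as plain text and mentions the IDs of the rules that fired in the form seen above (L, one more letter, three digits). The sample strings are invented and do not imitate the real report format.

```python
import re
from collections import Counter

# Matches IDs such as LP010 or LS008
RULE_ID = re.compile(r"\bL[A-Z]\d{3}\b")


def count_rules(reports):
    """Count in how many reports each rule ID occurs at least once."""
    counts = Counter()
    for text in reports:
        # set() so a rule mentioned twice in one report counts once
        counts.update(set(RULE_ID.findall(text)))
    return counts


# Invented report snippets, one string per library
report_sample = [
    "LP010 LS008",
    "LP010",
    "LP015 LP010 LP010",
]
rule_counts = count_rules(report_sample)
print(rule_counts.most_common(1))  # prints [('LP010', 3)]
```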

Data collection

With over 6,000 libraries to look up, this has to be done with some caution. For example, it requires API keys for GitHub and GitLab, otherwise you will run into rate limits.
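For GitHub, an authenticated query could look like the sketch below. It builds a request for the repository endpoint of the GitHub REST API and picks out the fields that match the indicators from earlier. The environment variable name GITHUB_TOKEN is an assumption, and there is no retry or rate-limit handling in here.

```python
import json
import os
import urllib.request


def github_repo_request(owner, repo, token=None):
    """Build (not send) a request for GitHub's repository endpoint."""
    request = urllib.request.Request(f"https://api.github.com/repos/{owner}/{repo}")
    request.add_header("Accept", "application/vnd.github+json")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


def extract_stats(payload):
    """Pick the fields used for the health indicators from the JSON reply."""
    return {
        "stars": payload["stargazers_count"],
        "open_issues": payload["open_issues_count"],
        "last_push": payload["pushed_at"],
    }


def fetch_stats(owner, repo):
    """Send the request and return the stats. Needs network access."""
    # GITHUB_TOKEN is an assumed variable name; without it the call is anonymous
    request = github_repo_request(owner, repo, os.environ.get("GITHUB_TOKEN"))
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_stats(json.load(response))


demo_request = github_repo_request("arduino", "library-registry", token="dummy")
print(demo_request.full_url)
# prints https://api.github.com/repos/arduino/library-registry
```

One thing to keep in mind when reading the numbers: GitHub's `open_issues_count` also includes open pull requests, so "no open issues" is a bit stricter than it sounds.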

The Process:

  1. Download the Library index file
  2. Query the Lint Status for each
  3. Query the Status on the individual hosts
  4. Store all the information locally
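The last step is what makes the first three bearable: with a local store, an interrupted run can continue where it stopped instead of repeating thousands of queries. Here is a minimal sketch of that idea. The function name, the JSON file layout, and the `fetch` callback are hypothetical and stand in for whatever does the real lookups.

```python
import json
import tempfile
from pathlib import Path


def collect(urls, fetch, store_path):
    """Fetch one record per URL and keep all records in a local JSON file.

    URLs already present in the file are skipped, so a second run only
    queries what is still missing.
    """
    store_path = Path(store_path)
    records = json.loads(store_path.read_text()) if store_path.exists() else {}
    for url in urls:
        if url in records:
            continue
        records[url] = fetch(url)
        # Write after every fetch so nothing is lost if the run is aborted
        store_path.write_text(json.dumps(records, indent=2))
    return records


# Demo with a fake fetch function that only records that it was called
calls = []


def fake_fetch(url):
    calls.append(url)
    return {"stars": 0}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "libraries.json"
    collect(["https://example.com/a"], fake_fetch, path)
    result = collect(["https://example.com/a", "https://example.com/b"], fake_fetch, path)

print(len(calls))  # prints 2, the second run fetched only the new URL
```

Rewriting the whole file after every fetch is wasteful for 6,000 entries, but it keeps the sketch short. Writing every N records or using SQLite would be the obvious next step.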

Now the locally stored info can be used to run queries and draw charts.

All the code and the collected data are uploaded here: GitHub Data Host. So you don't need to run the queries yourself and can go right to plotting.
