Tutorial

This tutorial guides you through the core features of the library. By the end, you will have a solid understanding of how to use it effectively. Let's get started!

Requirements

To run this tutorial, you need the following packages installed alongside DearDiary itself: MLJ, MLJDecisionTreeInterface, JLSO, and DataFrames.

You can install these packages using Julia's package manager. Open the Julia REPL and run:

using Pkg
Pkg.add("MLJ")
Pkg.add("MLJDecisionTreeInterface")
Pkg.add("JLSO")
Pkg.add("DataFrames")

Loading the Data

First, we need to load the dataset that we will be using for this tutorial.

using MLJ
using JLSO
using DataFrames
using DearDiary

iris = DataFrames.DataFrame(load_iris())
train, test = partition(iris, 0.8, shuffle=true)

train_y, train_X = unpack(train, ==(:target))
test_y, test_X = unpack(test, ==(:target))
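
If you want to verify the split before moving on, a few standard MLJ and DataFrames calls (not DearDiary-specific) can help:

first(train_X, 3)   # peek at the first feature rows
schema(train_X)     # scientific types of the features
unique(train_y)     # the three iris species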

Initializing the database

Before we start tracking our experiments, we need to initialize the database where the experiment data will be stored.

julia> DearDiary.initialize_database()
[ Info: Database initialized successfully.

This will create a local SQLite database file named deardiary.db in the current directory.
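
As a quick sanity check (plain Julia, not part of DearDiary's API), you can confirm the file was created:

julia> isfile("deardiary.db")
true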

Creating a new project and experiment

Projects help you organize your experiments. Let's create a new project for our iris classification experiment.

julia> project_id, _ = create_project("Tutorial project")
(1, DearDiary.Created())

Once we have a project, we can create an experiment within that project.

julia> experiment_id, _ = create_experiment(project_id, DearDiary.IN_PROGRESS, "Iris classification experiment")
(1, DearDiary.Created())
Note

If something goes wrong during project or experiment creation, these functions return nothing together with a marker type indicating the kind of error. The marker types are listed in the Miscellaneous section of the documentation.
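
A minimal defensive pattern following that convention might look like this (the error handling itself is illustrative, not required):

project_id, status = create_project("Tutorial project")
if isnothing(project_id)
    error("Project creation failed: $status")
end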

Training the model and tracking the experiment

Now we are ready to train a machine learning model and track the experiment using the library. We will use a decision tree classifier for this example.

DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
dtc = DecisionTreeClassifier()
max_depth_range = range(dtc, :max_depth, lower=2, upper=10, scale=:linear)

model = TunedModel(
    model=dtc,
    resampling=CV(),
    tuning=Grid(),
    range=max_depth_range,
    measure=[accuracy, log_loss, misclassification_rate, brier_score],
)
ProbabilisticTunedModel(
  model = DecisionTreeClassifier(
        max_depth = -1, 
        min_samples_leaf = 1, 
        min_samples_split = 2, 
        min_purity_increase = 0.0, 
        n_subfeatures = 0, 
        post_prune = false, 
        merge_purity_threshold = 1.0, 
        display_depth = 5, 
        feature_importance = :impurity, 
        rng = Random.TaskLocalRNG()), 
  tuning = Grid(
        goal = nothing, 
        resolution = 10, 
        shuffle = true, 
        rng = Random.TaskLocalRNG()), 
  resampling = CV(
        nfolds = 6, 
        shuffle = false, 
        rng = Random.TaskLocalRNG()), 
  measure = StatisticalMeasuresBase.FussyMeasure[Accuracy(), LogLoss(tol = 2.22045e-16), MisclassificationRate(), BrierScore()], 
  weights = nothing, 
  class_weights = nothing, 
  operation = nothing, 
  range = NumericRange(2 ≤ max_depth ≤ 10; origin=6.0, unit=4.0), 
  selection_heuristic = MLJTuning.NaiveSelection(nothing), 
  train_best = true, 
  repeats = 1, 
  n = nothing, 
  acceleration = ComputationalResources.CPU1{Nothing}(nothing), 
  acceleration_resampling = ComputationalResources.CPU1{Nothing}(nothing), 
  check_measure = true, 
  cache = true, 
  compact_history = true, 
  logger = nothing)
julia> mach = machine(model, train_X, train_y)
untrained Machine; does not cache data
  model: ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args:
    1:	Source @046 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
    2:	Source @337 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}
julia> fit!(mach)
[ Info: Training machine(ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …), …).
[ Info: Attempting to evaluate 9 models.

Evaluating over 9 metamodels: 100%[=========================] Time: 0:00:12
trained Machine; does not cache data
  model: ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args:
    1:	Source @046 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
    2:	Source @337 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}
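
Before logging anything, it can be worth sanity-checking the tuned machine on the held-out test set. These are standard MLJ calls, not part of DearDiary:

ŷ = predict_mode(mach, test_X)   # point predictions from the best model found
accuracy(ŷ, test_y)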

After training the model, we can log the results of the experiment to the database.

# Each tuning history entry carries the measures, their values, and the max_depth tried.
model_values = report(mach).history .|> (x -> (x.measure, x.measurement, x.model.max_depth))

for (measure, measurements, max_depth) in model_values
    # One DearDiary iteration per evaluated model, with its hyperparameter...
    iteration_id, _ = create_iteration(experiment_id)
    create_parameter(iteration_id, "max_depth", max_depth)

    # ...and one metric per measure, e.g. "LogLoss(tol = 2.22045e-16)" becomes "LogLoss".
    measures_names = [split(x |> string, "(") |> first for x in measure]
    for (name, value) in zip(measures_names, measurements)
        create_metric(iteration_id, name, value)
    end
end

Viewing the logged data

You can retrieve and check the logged data from the database to ensure everything was logged correctly.

julia> iteration = last(get_iterations(experiment_id)) # Checking only the last iteration
DearDiary.Iteration
 ├ id = 9
 ├ experiment_id = 1
 ├ notes = ""
 ├ created_date = 2025-11-05T00:42:29.804
 └ end_date = nothing
julia> get_parameters(iteration.id)
1-element Vector{DearDiary.Parameter}:
DearDiary.Parameter
 ├ id = 9
 ├ iteration_id = 9
 ├ key = "max_depth"
 └ value = "9"
julia> get_metrics(iteration.id)
4-element Vector{DearDiary.Metric}:
DearDiary.Metric
 ├ id = 33
 ├ iteration_id = 9
 ├ key = "Accuracy"
 └ value = 0.9500000000000001
DearDiary.Metric
 ├ id = 34
 ├ iteration_id = 9
 ├ key = "LogLoss"
 └ value = 1.8021826694558574
DearDiary.Metric
 ├ id = 35
 ├ iteration_id = 9
 ├ key = "MisclassificationRate"
 └ value = 0.05000000000000001
DearDiary.Metric
 ├ id = 36
 ├ iteration_id = 9
 ├ key = "BrierScore"
 └ value = -0.10000000000000002
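
To compare runs at a glance, you can flatten everything logged so far into a table. This is plain DataFrames.jl code built on the accessors shown above:

rows = [
    (; iteration = it.id, metric = m.key, value = m.value)
    for it in get_iterations(experiment_id) for m in get_metrics(it.id)
]
metrics_df = DataFrame(rows)   # one row per (iteration, metric) pair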

Save and load the trained model

You can save serialized objects, files, or any other resources related to your experiments.

smach = serializable(mach)
io = IOBuffer()
JLSO.save(io, :machine => smach)

bytes = take!(io)
julia> resource_id, _ = create_resource(experiment_id, "Iris DTC MLJ Machine", bytes)
(1, DearDiary.Created())

Then you can load the model back when needed.

julia> resource = get_resource(resource_id)
DearDiary.Resource
 ├ id = 1
 ├ experiment_id = 1
 ├ name = "Iris DTC MLJ Machine"
 ├ description = ""
 ├ data = UInt8[0xdb, 0x4e, 0x00, …, 0x72, 0x00, 0x00]
 ├ created_date = 2025-11-05T00:42:36.409
 └ updated_date = nothing
julia> io = IOBuffer(resource.data);

julia> loaded_mach = JLSO.load(io)[:machine]
serializable Machine
  model: ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args: 
julia> restore!(loaded_mach)
trained Machine; does not cache data
  model: ProbabilisticTunedModel(model = DecisionTreeClassifier(max_depth = -1, …), …)
  args:
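
The restored machine behaves just like the original; for example (a standard MLJ call, not DearDiary-specific):

predict_mode(loaded_mach, test_X)   # predictions from the restored machine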

Built-in REST API

The library also provides a built-in REST API so that the outside world can interact with your projects. You can start the API server with:

DearDiary.run(;)

This starts the API server on http://localhost:9000. You can customize the settings by providing a .env file with the configuration options. For more details, refer to the REST API section of the documentation.
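
Once the server is running, you can query it from any HTTP client, for instance with HTTP.jl (installed separately). The /projects route below is hypothetical; see the REST API section for the actual endpoints:

using HTTP

# Hypothetical endpoint; consult the REST API docs for the real routes.
resp = HTTP.get("http://localhost:9000/projects")
println(String(resp.body))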

Conclusion

And that's it! You have successfully completed the tutorial and learned how to use the core features of this library. You can now track your machine learning experiments effectively. For more advanced features and options, refer to the rest of the documentation.