Commit graph

44 commits

Author SHA1 Message Date
Konstantin Shalygin
7489f4f7aa
Add support for device types and predictable device paths (rebased) (#257)
* Add better error logging on smartctl exec failure

We will now log a warning if the smartctl path passed via the command line is invalid.

Signed-off-by: Piotr Dobrowolski <admin@tastycode.pl>
(cherry picked from commit 1c9c6943e8)

* Add support for autoscan device types and predictable device paths

This adds a new command line option allowing customization of
autodetected device types and enables use of a special "by-id" device type
that forces use of predictable device paths (/dev/disk/by-id/...).

The corresponding change to the device name parsing regular expression is
now included, so predictable device paths can also be used when specified
directly.

Signed-off-by: Piotr Dobrowolski <admin@tastycode.pl>
(cherry picked from commit 4c5f721e11)

Conflicts:
  - file: 'readjson.go'
    comment: 'manually resolve new logger issues'
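A minimal sketch of a device-name pattern that accepts both plain and predictable (`/dev/disk/by-id/...`) paths. The regular expression and the `deviceLabel` helper are hypothetical, not the exporter's actual expression.

```go
package main

import (
	"fmt"
	"regexp"
)

// deviceNameRe is a hypothetical pattern: it strips "/dev/" or
// "/dev/disk/by-id/" and captures the remaining device name.
var deviceNameRe = regexp.MustCompile(`^/dev/(?:disk/by-id/)?([^/]+)$`)

// deviceLabel returns the device name without its directory prefix,
// or false if the path does not look like a device path.
func deviceLabel(path string) (string, bool) {
	m := deviceNameRe.FindStringSubmatch(path)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	for _, p := range []string{"/dev/sda", "/dev/nvme0", "/dev/disk/by-id/ata-EXAMPLE_SERIAL"} {
		name, ok := deviceLabel(p)
		fmt.Println(p, "->", name, ok)
	}
}
```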

* Rework device label, fix SATA discovery, per-device type specification

Signed-off-by: Piotr Dobrowolski <admin@tastycode.pl>
(cherry picked from commit 319184ce66)

Conflicts:
  - file: 'main.go'
    comment: 'manually resolve new logger issues'
  - file: 'readjson.go'
    comment: 'manually resolve new logger issues'

---------

Co-authored-by: Piotr Dobrowolski <admin@tastycode.pl>
2024-12-19 11:02:09 +01:00
TJ Hoplock
2c043b7fcb
chore!: adopt slog, drop go-kit/log (#246)
The bulk of this change set was automated by the following script which
is being used to aid in converting the various exporters/projects to use
slog:

https://gist.github.com/tjhop/49f96fb7ebbe55b12deee0b0312d8434

Other changes include:
- bumping prometheus/{common,client_golang,exporter-toolkit}
- bump minimum go version to go1.22
- remove old go-kit/log linter configs, add sloglint

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2024-10-18 09:18:48 +02:00
David Randall
2cc2249821
Merge pull request #205 from zxzharmlesszxz/master
Add device type detection and use it when scraping data
2024-05-06 20:32:21 -04:00
Robin H. Johnson
ea8a38384b
fix: correct smartctl_device_bytes_written & smartctl_device_bytes_read for NVMe (#211)
The NVMe specification says that the controller is responsible for
reporting "Data Units Read" & "Data Units Written" converted as needed
for logical block sizes other than 512 bytes. smartmontools already has
the correct behavior.

What is correct in this case? For now, track what smartmontools does:
take the counter, multiply by 512*1000, report the value.

We should be clear that it means the drive has read/written at most
that many bytes.

This has a few impacts:
- NVMe devices will now show these metrics, if they did not before.
- NVMe devices with block sizes other than 512 bytes may have previously
  reported inflated metrics, but are now corrected (is this worthy of
  larger notice in changelogs?)

Reference: 11415ee0b9/smartmontools/nvmeprint.cpp (L394-L397)
Closes: https://github.com/prometheus-community/smartctl_exporter/issues/122

Signed-off-by: Robin H. Johnson <rjohnson@coreweave.com>
2024-04-09 09:40:44 +02:00
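The conversion described in this commit (counter × 512 × 1000) can be sketched as follows; the function and constant names are illustrative, not taken from the exporter.

```go
package main

import "fmt"

// nvmeDataUnitBytes is the size of one NVMe "data unit": 1000 units of 512 bytes.
const nvmeDataUnitBytes = 512 * 1000

// dataUnitsToBytes converts a "Data Units Read/Written" counter to bytes.
// Because the counter is rounded up, the result is an upper bound
// ("at most that many bytes").
func dataUnitsToBytes(units uint64) uint64 {
	return units * nvmeDataUnitBytes
}

func main() {
	fmt.Println(dataUnitsToBytes(3)) // 1536000
}
```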
Denys
7f7c652b0f Add bus name & bus number to disk name, example: bus_0_megaraid_disk_01
Signed-off-by: Denys <zxzharmlesszxz@gmail.com>
2024-03-16 00:37:09 +01:00
mort
3a012b5bb1 Implemented new feature: extracting the RAID member disk name.
Modified the smartctl.device parameter: it can now be set to sda, megaraid_disk_01, etc.

Signed-off-by: Denys <zxzharmlesszxz@gmail.com>
2024-03-08 15:40:11 +01:00
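A hypothetical helper showing how a label like `megaraid_disk_01` (or `bus_0_megaraid_disk_01`, with the bus prefix from the preceding commit) could be derived from a smartctl `megaraid,N` device type. The function name and its bus handling are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// raidMemberLabel (hypothetical) turns a device type such as "megaraid,1"
// into "megaraid_disk_01". A non-negative bus adds a "bus_<n>_" prefix.
func raidMemberLabel(devType string, bus int) (string, error) {
	kind, idx, ok := strings.Cut(devType, ",")
	if !ok {
		return "", fmt.Errorf("device type %q has no member index", devType)
	}
	n, err := strconv.Atoi(idx)
	if err != nil {
		return "", fmt.Errorf("bad member index in %q: %w", devType, err)
	}
	label := fmt.Sprintf("%s_disk_%02d", kind, n) // zero-pad to two digits
	if bus >= 0 {
		label = fmt.Sprintf("bus_%d_%s", bus, label)
	}
	return label, nil
}

func main() {
	l, _ := raidMemberLabel("megaraid,1", 0)
	fmt.Println(l) // bus_0_megaraid_disk_01
}
```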
Robin H. Johnson
9113c6cf0f feat: Better SCSI/SAS support
Fix the following metrics that were exported as zero because the
exporter did not know how to read them for SCSI devices:
- smartctl_device_bytes_read
- smartctl_device_bytes_written
- smartctl_device_power_cycle_count

New metrics:
- smartctl_read_errors_corrected_by_eccdelayed
- smartctl_read_errors_corrected_by_eccfast
- smartctl_write_errors_corrected_by_eccdelayed
- smartctl_write_errors_corrected_by_eccfast

Fix labels:
- smartctl_device{model_name} is now populated for SCSI/SAS, using
  scsi_model_name.

New labels:
- smartctl_device{} gains:
  scsi_product,scsi_revision,scsi_vendor,scsi_version

Signed-off-by: Robin H. Johnson <rjohnson@coreweave.com>
2023-10-16 10:15:57 -07:00
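A sketch of reading the new ECC counters from smartctl JSON and mapping them to the metric names above. The `scsi_error_counter_log` layout used here is an assumption modeled on the metric names, not a documented schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eccCounters holds the two ECC correction counters for one direction
// (assumed JSON field names).
type eccCounters struct {
	ECCFast    uint64 `json:"errors_corrected_by_eccfast"`
	ECCDelayed uint64 `json:"errors_corrected_by_eccdelayed"`
}

type scsiErrorCounterLog struct {
	Read  eccCounters `json:"read"`
	Write eccCounters `json:"write"`
}

type smartctlJSON struct {
	SCSIErrorCounterLog scsiErrorCounterLog `json:"scsi_error_counter_log"`
}

// eccMetrics decodes the counters and keys them by metric name.
func eccMetrics(raw []byte) (map[string]uint64, error) {
	var doc smartctlJSON
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	l := doc.SCSIErrorCounterLog
	return map[string]uint64{
		"smartctl_read_errors_corrected_by_eccfast":     l.Read.ECCFast,
		"smartctl_read_errors_corrected_by_eccdelayed":  l.Read.ECCDelayed,
		"smartctl_write_errors_corrected_by_eccfast":    l.Write.ECCFast,
		"smartctl_write_errors_corrected_by_eccdelayed": l.Write.ECCDelayed,
	}, nil
}

func main() {
	sample := []byte(`{"scsi_error_counter_log":{"read":{"errors_corrected_by_eccfast":5,"errors_corrected_by_eccdelayed":1},"write":{"errors_corrected_by_eccfast":0,"errors_corrected_by_eccdelayed":2}}}`)
	m, err := eccMetrics(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(m["smartctl_read_errors_corrected_by_eccfast"]) // 5
}
```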
Robin H. Johnson
d90594ac23 fix: Remove confused metrics
The exporter presently has metrics that are meaningless for some types of
drive and remain at zero because of their default values.

Change the behavior to NOT emit a metric if the underlying JSON field is
not present.

Future related work may include parsing the corresponding metrics for
SATA/SAS SSDs (e.g. `smartctl_device_percentage_used` could be derived from
`SSD_Life_Left` on some drives).

Metrics no longer exported for the wrong type of drive:
- `smartctl_device_nvme_capacity_bytes` (NVME-specific)
- `smartctl_device_available_spare` (NVME-specific, ATA possible)
- `smartctl_device_available_spare_threshold` (NVME-specific, ATA
  possible)
- `smartctl_device_critical_warning` (NVME-specific, ATA possible)
- `smartctl_device_interface_speed` (ATA-specific)
- `smartctl_device_media_errors` (NVME-specific, ATA possible)
- `smartctl_device_num_err_log_entries` (NVME-specific, SCSI uses
  distinct metrics, ATA possible)
- `smartctl_device_percentage_used` (NVME-specific, ATA possible)

Signed-off-by: Robin H. Johnson <rjohnson@coreweave.com>
2023-10-16 10:12:40 -07:00
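One way to implement "do not emit a metric if the JSON field is not present" in Go is to decode into pointer fields, so that a missing key (nil) is distinguishable from a genuine zero. The struct, its field names, and `presentMetrics` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// healthLog is a hypothetical subset of an NVMe health section. Pointer
// fields stay nil when the key is missing, whereas plain floats would
// silently default to zero.
type healthLog struct {
	AvailableSpare *float64 `json:"available_spare"`
	MediaErrors    *float64 `json:"media_errors"`
}

// presentMetrics returns only the metrics whose source field is present.
func presentMetrics(raw []byte) (map[string]float64, error) {
	var hl healthLog
	if err := json.Unmarshal(raw, &hl); err != nil {
		return nil, err
	}
	out := make(map[string]float64)
	if hl.AvailableSpare != nil {
		out["smartctl_device_available_spare"] = *hl.AvailableSpare
	}
	if hl.MediaErrors != nil {
		out["smartctl_device_media_errors"] = *hl.MediaErrors
	}
	return out, nil
}

func main() {
	// media_errors is present (and zero); available_spare is absent.
	m, err := presentMetrics([]byte(`{"media_errors":0}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(m)
}
```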
Konstantin Shalygin
1ab518e696
* split block-size mining out of mineCapacity() into mineBlockSize()
* remove redundant meta labels from SCSI metrics
* added `smartctl_device_nvme_capacity_bytes` metric
* for some devices, such as 2.5" Intel and Micron NVMe drives, the `family` field may be empty

The `.user_capacity` field exists only when the NVMe device has a single
namespace. For NVMe devices with multiple namespaces, when the device name
is used without a namespace number (as the exporter does), `.user_capacity`
is absent.

```
smartctl --info --health --attributes \
--tolerance=verypermissive --nocheck=standby --format=brief --log=error \
/dev/nvme11 --json | jq '.user_capacity'

null

smartctl --info --health --attributes \
--tolerance=verypermissive --nocheck=standby --format=brief --log=error \
/dev/nvme11 --json | jq '.nvme_total_capacity'

3840755982336
```

Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
2023-08-26 21:28:31 +03:00
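A sketch of choosing between the two capacity fields shown in the output above. The fallback order and the `bytes` sub-key under `user_capacity` are assumptions; the commit itself exposes `nvme_total_capacity` as a separate metric rather than as a fallback.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// capacityFields decodes both capacity fields. The nested "bytes" key is an
// assumed layout; nvme_total_capacity is a plain number, as in the jq output.
type capacityFields struct {
	UserCapacity *struct {
		Bytes uint64 `json:"bytes"`
	} `json:"user_capacity"`
	NVMeTotalCapacity *uint64 `json:"nvme_total_capacity"`
}

// capacityBytes prefers .user_capacity and falls back to .nvme_total_capacity,
// which is still reported for multi-namespace NVMe devices. It returns the
// value and the name of the field it came from.
func capacityBytes(raw []byte) (uint64, string, error) {
	var f capacityFields
	if err := json.Unmarshal(raw, &f); err != nil {
		return 0, "", err
	}
	switch {
	case f.UserCapacity != nil:
		return f.UserCapacity.Bytes, "user_capacity", nil
	case f.NVMeTotalCapacity != nil:
		return *f.NVMeTotalCapacity, "nvme_total_capacity", nil
	default:
		return 0, "", fmt.Errorf("no capacity field present")
	}
}

func main() {
	v, src, err := capacityBytes([]byte(`{"user_capacity":null,"nvme_total_capacity":3840755982336}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(v, src) // 3840755982336 nvme_total_capacity
}
```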
David Randall
8331d7f6a9
Merge pull request #137 from tekert/remove_duplicate2
Remove duplicate smartctl_device_status metric
2023-08-10 13:25:23 -04:00
David Randall
e5bf7aa1b2
Merge pull request #138 from tekert/fix_data_written_ssd
Fix reported Data bytes Read/Written on SSDs
2023-08-10 11:58:18 -04:00
tekert
9698c581b3 Remove duplicate, fixes #136
Signed-off-by: tekert <tekert@gmail.com>
2023-07-29 13:13:33 -03:00
tekert
c2e9d33118 Fix reported Data bytes Read/Written on SSDs
This value is reported in thousands (i.e., a value of 1 corresponds to 1000 units of 512 bytes written) and is rounded up. When the LBA size is a value other than 512 bytes, the controller shall convert the amount of data written to 512 byte units.

The current code uses 1024 instead of 1000.

Signed-off-by: tekert <tekert@gmail.com>
2023-07-25 09:30:29 -03:00
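To quantify the bug fixed here, the sketch below compares the 1000 multiplier from the quoted specification text with the 1024 the earlier code used; the helper name is hypothetical.

```go
package main

import "fmt"

// bytesFromDataUnits converts a data-unit counter to bytes with the given
// multiplier (1000 per the specification, 1024 in the old code).
func bytesFromDataUnits(units, multiplier uint64) uint64 {
	return units * multiplier * 512
}

func main() {
	const units = 1_000_000
	correct := bytesFromDataUnits(units, 1000)
	old := bytesFromDataUnits(units, 1024)
	fmt.Println(correct, old) // 512000000000 524288000000
	// The old multiplier overstates every value by the same ratio.
	fmt.Printf("over-report: %.1f%%\n", float64(old-correct)/float64(correct)*100) // over-report: 2.4%
}
```

Because the error is a constant ratio, previously collected values can be corrected by multiplying them by 1000/1024.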
John Thiltges
179f37f05d Set SCSI disk metrics to GaugeValue
Signed-off-by: John Thiltges <jthiltges2@unl.edu>
2023-06-29 15:34:57 -05:00
Denys Lemeshko
637ad4223b Critical metrics for SCSI disks added
Signed-off-by: Denys Lemeshko <denys.lemeshko@pm.bet>
2023-06-29 14:50:45 -05:00
Ben Kochie
b832e55e3e
Merge pull request #28 from lahwaacz/master
Skip vendor-specific statistics that lead to duplicate metric labels
2022-10-20 11:42:19 +01:00
Ben Kochie
3a8968a04a
Merge pull request #88 from k0ste/help3
Pruned /dev/ prefix from device label
2022-10-20 11:31:44 +01:00
Ben Kochie
e76e458118
Update smartctl.go
Signed-off-by: Ben Kochie <superq@gmail.com>
2022-10-19 08:03:40 +02:00
Konstantin Shalygin
e385b4a351 Pruned /dev/ prefix from device label
The label now matches the `node_exporter` labels.

Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
2022-10-17 14:44:18 +07:00
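The prefix pruning amounts to something like the following; `deviceLabelFromPath` is a hypothetical name, and the exporter may perform this step elsewhere.

```go
package main

import (
	"fmt"
	"strings"
)

// deviceLabelFromPath drops a leading "/dev/" so "/dev/sda" becomes "sda".
// Names without the prefix are returned unchanged.
func deviceLabelFromPath(path string) string {
	return strings.TrimPrefix(path, "/dev/")
}

func main() {
	fmt.Println(deviceLabelFromPath("/dev/sda")) // sda
}
```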
Konstantin Shalygin
397a7a55f0 Added disk form_factor meta label
Field engineers need to know the form factor of the device, e.g. 3.5" or 2.5".

* updated EXAMPLE.md
* fixed copy-paste issue `Starting systemd_exporter`

Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
2022-10-17 14:28:05 +07:00
Jakub Klinkovský
e10ded530f skip vendor-specific statistics that lead to duplicate metric labels
fixes #3

Signed-off-by: Jakub Klinkovský <j.l.k@gmx.com>
2022-10-14 16:53:52 +02:00
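The commit avoids collisions by skipping vendor-specific statistics. As a different, generic technique (not the commit's filter), duplicate label sets can also be dropped by keeping only the first entry per name; the `stat` type and `dedupeByName` are hypothetical.

```go
package main

import "fmt"

// stat is a simplified, hypothetical device-statistics entry.
type stat struct {
	Name  string
	Value float64
}

// dedupeByName keeps the first entry for each name, so two entries that
// would produce the same label set are never both emitted.
func dedupeByName(stats []stat) []stat {
	seen := make(map[string]bool, len(stats))
	out := make([]stat, 0, len(stats))
	for _, s := range stats {
		if seen[s.Name] {
			continue // would collide with an already-emitted label set
		}
		seen[s.Name] = true
		out = append(out, s)
	}
	return out
}

func main() {
	fmt.Println(dedupeByName([]stat{{"a", 1}, {"b", 2}, {"a", 3}}))
}
```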
Konstantin Shalygin
82266c0397 Reduced number of meta labels
On a test stand with 5 disks, the exporter's response size dropped from 148 KB to 82 KB.

Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
2022-10-14 14:22:41 +07:00
Konstantin Shalygin
56cd874440 Removed doubled NVMe metrics
Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
2022-10-13 19:45:25 +07:00
Ben Kochie
c8d3e48f3d
Refactor exporter config (#68)
Switch exporter over to standard Prometheus exporter flags and logging.
This eliminates the need for a configuration file.

Signed-off-by: SuperQ <superq@gmail.com>
2022-10-03 11:16:00 +02:00
Ben Kochie
920c3429b1
Release 0.7.0 (#50)
First prometheus-community release.

* [FEATURE] Add various new metrics #14
* [BUGFIX] Fix exit code bit parsing #37

Signed-off-by: SuperQ <superq@gmail.com>
2022-08-05 03:37:13 +02:00
Горлов Максим
e27581d56a apply merges 2020-11-14 18:36:34 +03:00
Горлов Максим
7f4c259c12 chripede-master merge 2020-11-14 18:30:32 +03:00
Горлов Максим
c962031b18 merging tavyc 2020-11-14 17:57:43 +03:00
Горлов Максим
cbc437fea9 Parsing smartctl error code; parsing resulting JSON for smartctl errors; Docker moved to subfolder 2020-10-30 00:35:49 +03:00
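smartctl's exit status is a bitmask, as documented in smartctl(8); the sketch below decodes it. The short descriptions paraphrase the man page, and `decodeExitStatus` is a hypothetical helper, not the exporter's code.

```go
package main

import "fmt"

// exitBitMeanings paraphrases the smartctl(8) exit-status bits 0..7.
var exitBitMeanings = [8]string{
	"command line did not parse",
	"device open failed",
	"SMART or other command failed, or checksum error",
	"SMART status: DISK FAILING",
	"prefail attributes at or below threshold",
	"attributes were at or below threshold in the past",
	"device error log contains errors",
	"self-test log contains errors",
}

// decodeExitStatus returns a description for every bit set in code.
func decodeExitStatus(code int) []string {
	var set []string
	for bit := 0; bit < len(exitBitMeanings); bit++ {
		if code&(1<<bit) != 0 {
			set = append(set, exitBitMeanings[bit])
		}
	}
	return set
}

func main() {
	// 64 sets only bit 6: the device is readable but its error log has entries.
	fmt.Println(decodeExitStatus(64))
}
```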
Christian Pedersen
315d1538aa Calculate bytes read/written 2020-10-02 15:14:40 +02:00
Christian Pedersen
1b15cbbec2 Add NVMe metrics 2020-10-02 13:30:45 +02:00
Octavian Cerna
6e30737bc3 mineDeviceStatistics: Flag the SATA PHY stats as valid. 2020-07-27 00:40:18 +03:00
Octavian Cerna
9e58dd6fd2 Add a new metric smartctl_device_erc_seconds for reporting the device Error Recovery Control (TLER) setting. 2020-07-27 00:37:43 +03:00
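SCT ERC timers are expressed in units of 100 ms, so reporting them in seconds needs a divide by ten. The JSON layout used here (`read`/`write` objects with a `deciseconds` key) is an assumption about smartctl's output, and `ercSeconds` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ercTimer mirrors an assumed JSON fragment for one SCT ERC direction.
type ercTimer struct {
	Deciseconds float64 `json:"deciseconds"`
}

type sctERC struct {
	Read  ercTimer `json:"read"`
	Write ercTimer `json:"write"`
}

// ercSeconds converts both ERC timers from deciseconds to seconds.
func ercSeconds(raw []byte) (read, write float64, err error) {
	var erc sctERC
	if err = json.Unmarshal(raw, &erc); err != nil {
		return 0, 0, err
	}
	return erc.Read.Deciseconds / 10, erc.Write.Deciseconds / 10, nil
}

func main() {
	r, w, err := ercSeconds([]byte(`{"read":{"deciseconds":70},"write":{"deciseconds":70}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(r, w) // 7 7
}
```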
Octavian Cerna
53399a5e73 Add new metrics smartctl_device_self_test_log_count and smartctl_device_self_test_log_error_count for the device SMART Self-Test Logs. 2020-07-27 00:16:08 +03:00
Octavian Cerna
3e17706839 Add a new metric smartctl_device_error_log_count for the SMART error log counts. 2020-07-26 23:49:09 +03:00
Octavian Cerna
e8d95208d7 Report SATA PHY stats to smartctl_device_statistics. 2020-07-26 23:35:22 +03:00
Octavian Cerna
3feff84fbb Add a new metric smartctl_device_state for the device state from ATA SCT. 2020-07-26 22:54:54 +03:00
Octavian Cerna
9e10465744 Add a new metric smartctl_device_status for the SMART health status. 2020-07-26 22:31:04 +03:00
Zoltan Langi
9666ec9296 fixed format 2019-12-19 12:17:10 +01:00
Zoltan Langi
965204547a I've added support for NVMe drives and also created a Dockerfile so a container can be built.
I've added the following metrics for the NVMe drives:
smart_status, critical_warning, available_spare, media_errors
2019-12-19 11:17:35 +01:00
Горлов Максим
a3cc59ddda short and long flags, device statistics 2019-08-17 13:18:48 +03:00
Горлов Максим
beb765eb1a option to set the minimum time period between smartctl runs; smartctl info metric 2019-08-16 00:01:16 +03:00
Горлов Максим
de13d91241 systemd service 2019-08-15 00:04:32 +03:00
Горлов Максим
9e6e240e85 First commit 2019-08-14 23:34:49 +03:00