
Secrets hunter

How to hunt secrets in GitLab (or in any Git repository)

Introduction

Searching for secrets such as passwords, authentication tokens, API keys, AWS keys, etc. in Git repositories is of crucial importance for several reasons linked to IT security and the protection of sensitive data:

  • Protecting sensitive information: Secrets, such as passwords and API keys, are used to access sensitive resources or specific services. If they are exposed publicly in a Git repository, this can lead to potential security breaches and compromise data confidentiality and integrity.

  • Attack prevention: Hackers actively seek out secrets exposed in Git repositories to gain illegal access to sensitive systems and data. Proactively searching for these secrets helps prevent potential attacks before they happen.

  • Security compliance: Numerous security regulations and standards, such as the GDPR (General Data Protection Regulation), require organizations to take steps to protect sensitive information. Detecting and properly managing secrets exposed in Git repositories is essential to comply with these requirements.

  • Protecting cloud service access keys: Cloud service access keys, such as AWS keys, are used to access an organization’s cloud resources and services. Exposing these keys in Git repositories can result in high costs due to unauthorized use of cloud services.

  • Securing open source projects: Many open source projects are hosted on version control platforms such as GitHub. Searching for secrets in these projects is essential to ensure their security and prevent malicious actors from exploiting vulnerabilities.

  • Developer security awareness: By actively searching for secrets in Git repositories, developers are made aware of the risks associated with the accidental inclusion of sensitive information in source code. This encourages them to adopt good secret management practices.

Several tools exist to hunt this kind of secret. I will focus on three of them:

NoseyParker

NoseyParker is an open source command-line tool designed to find secrets and sensitive information in textual data and Git history. It is primarily a code scanning tool that detects private information inadvertently exposed in source code. It works on local files, directories and cloned Git repositories, so it is not limited to public GitHub repositories: the scan shown later in this post runs against a local dump of GitLab repositories.

Gitleaks

Gitleaks is another popular open source tool used to search for secrets and sensitive information in Git repositories. It supports both public and private repositories. Gitleaks works by performing a static analysis of Git repositories to detect potentially exposed passwords, API keys, authentication tokens and other sensitive information. Detection rules can also be customized to suit the specific needs of each project.
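To make the idea of rule-based static analysis concrete, here is a minimal sketch in Python. It is not how Gitleaks (or any of the three tools) is implemented: the rule names, the regular expressions and the `scan_text` helper are illustrative assumptions, and real tools ship much larger, carefully tuned rule sets.

```python
import re

# Illustrative rules only (assumptions for this sketch, not Gitleaks' rule set).
RULES = {
    "AWS access key ID": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "PEM private key header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "Generic password assignment": re.compile(
        r"(?i)\bpassword\s*[:=]\s*['\"]([^'\"]{4,})['\"]"
    ),
}


def scan_text(text):
    """Return (line_number, rule_name, matched_text) for every rule hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            for match in pattern.finditer(line):
                findings.append((lineno, name, match.group(0)))
    return findings


# Hypothetical file content used only to exercise the rules.
sample = '''aws_key = "AKIA111111111EXAMPLE"
password = 'hunter22'
greeting = "hello"
'''

for finding in scan_text(sample):
    print(finding)
```

Customizing detection then amounts to adding, removing or tightening entries in the rule table, which is the same idea as a per-project rule configuration.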

TruffleHog

TruffleHog is an open source security tool designed specifically to detect sensitive secrets that might be exposed in Git repositories. It performs an in-depth search of the entire repository history, enabling it to find sensitive information even if it has been removed from the current code. TruffleHog is capable of detecting a wide range of sensitive information, including encryption keys, passwords, API keys and other types of secrets. It is particularly useful in collaborative development environments where several people may contribute to the code and accidentally introduce sensitive information.
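The reason history matters can be sketched in a few lines of Python. This is not TruffleHog's implementation: the `git_history_patch` and `scan_patch` helpers, the single AWS-style pattern and the sample patch are assumptions made for illustration. The sketch walks the output of `git log -p` and reports keys found on added or removed lines, so a key deleted in a later commit is still found.

```python
import re
import subprocess

# Single illustrative pattern (assumption): AWS-style access key IDs.
AWS_KEY = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def git_history_patch(repo_path):
    """Return the full patch history of a local clone (requires git on PATH)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all"],
        capture_output=True, text=True, errors="replace", check=True,
    )
    return result.stdout


def scan_patch(patch_text):
    """Yield (commit, change, secret) for keys on added or removed lines."""
    commit = None
    for line in patch_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.startswith(("+++", "---")):
            continue  # file headers, not file content
        elif line.startswith(("+", "-")):
            change = "added" if line[0] == "+" else "removed"
            for match in AWS_KEY.finditer(line):
                yield commit, change, match.group(0)


# Hypothetical, shortened `git log -p` output: the key is gone from the
# current code but still present in the history.
sample_patch = """commit bbb222
    remove hard-coded key

--- a/config.py
+++ b/config.py
-KEY = "AKIA111111111EXAMPLE"
+KEY = os.environ["AWS_KEY"]
commit aaa111
    initial commit

--- /dev/null
+++ b/config.py
+KEY = "AKIA111111111EXAMPLE"
"""

for hit in scan_patch(sample_patch):
    print(hit)
```

On a real clone, `scan_patch(git_history_patch("path/to/repo"))` would run the same check over every commit reachable from any ref.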

Installation

All tools are installed using asdf (TBD). More installation options are detailed for each application.

NoseyParker

NoseyParker can be installed from pre-built binaries, Docker images or source.

Results are stored in a datastore, a SQLite database.

Scan

noseyparker scan --datastore ~/tmp/gitlab-dump.db .

Example output of that command:

 Rule                                       Distinct Matches   Total Matches 
─────────────────────────────────────────────────────────────────────────────
 Generic Password (single quoted)                        142           2,650 
 Generic Secret                                           95           1,409 
 Generic API Key                                          92             826 
 JSON Web Token (base64url-encoded)                       70             407 
 PEM-Encoded Private Key                                  63             141 
 Generic Password (double quoted)                         59             515 
 Generic Username and Password (unquoted)                 31             116 
 AWS API Key                                              30             279 
 bcrypt Hash                                              24              63 
 Generic Username and Password (quoted)                   22              92 
 Sauce Token                                              16              40 
 netrc Credentials                                        13              93 
 AWS Secret Access Key                                    11              71 
 AWS Account ID                                           10             260 
 Credentials in ODBC Connection String                     6              30 
 OpenAI API Key                                            4               5 
 GitLab Personal Access Token                              4              55 
 Slack Webhook                                             3              31 
 Google OAuth Client Secret                                3              73 
 Google Client ID                                          3               4 
 Shopify Domain                                            2               3 
 Google API Key                                            2               5 
 Slack                                                     1               1  
 CodeClimate                                               1               7 

Reporting

Multiple output formats are available: human, json, jsonl and sarif (Static Analysis Results Interchange Format).

noseyparker report --datastore ~/tmp/gitlab-dump.db -f jsonl
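The jsonl format is convenient for post-processing, since each line is one JSON object. The sketch below totals matches per rule in Python. The field names `rule_name` and `num_matches` are assumptions about the report schema and the sample records are made up, so check them against the output of your NoseyParker version before relying on this.

```python
import json
from collections import Counter


def summarize_findings(jsonl_lines, rule_field="rule_name", count_field="num_matches"):
    """Total the number of matches per rule from JSONL report lines.

    The default field names are assumptions about the report schema.
    """
    totals = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        totals[record.get(rule_field, "unknown")] += record.get(count_field, 1)
    return totals


# Hypothetical records, shaped like the assumed schema.
sample_lines = [
    '{"rule_name": "AWS API Key", "num_matches": 2}',
    '{"rule_name": "Generic Secret", "num_matches": 5}',
    '{"rule_name": "AWS API Key", "num_matches": 1}',
    "",
]

for rule, total in summarize_findings(sample_lines).most_common():
    print(f"{rule}: {total}")
```

With the report redirected to a file, the same function can be fed an open file handle, for example `summarize_findings(open("report.jsonl"))`, where `report.jsonl` is a hypothetical file name.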

The output of the report, in human format:

Match: AKIA111111111EXAMPLE

    Occurrence 1/2
    Git repo: gitlab-dump/repo01/.git
    Blob: af5626b4a114abcb82d63db7c8082c3c4756e51b
    Lines: 870:31-870:50

        "UserName": "Alice"
                },
                "output": {
                  "AccessKeyMetadata": [
                    {
                      "AccessKeyId": "AKIA111111111EXAMPLE",
                      "CreateDate": "2018-12-01T22:19:58Z",
                      "Status": "Active",
                      "UserName": "Alice"
             

    Occurrence 2/2
    Git repo: gitlab-dump/repo01/.git
    Blob: af5626b4a114abcb82d63db7c8082c3c4756e51b
    Lines: 699:31-699:50

        "UserName": "Alice"
                },
                "output": {
                  "AccessKeyMetadata": [
                    {
                      "AccessKeyId": "AKIA111111111EXAMPLE",
                      "CreateDate": "2018-12-01T22:19:58Z",
                      "Status": "Active",
                      "UserName": "Alice"
             

Pros / Cons

  • Pros:
    • Scan speed. Here is an example:
Found 42.37 GiB from 293,195 plain files and 514,137 blobs from 632 Git repos [00:00:04]
Scanning content  ████████████████████ 100%  42.37 GiB/42.37 GiB  [00:01:25]
Scanned 34.92 GiB from 694,395 blobs in 85 seconds (419.42 MiB/s); 7,176/7,176 new matches
  • Cons: TBC

Issues I met

TBD

Gitleaks

Pros / Cons

TBD

Issues I met

TBD

TruffleHog

Pros / Cons

TBD

Issues I met

TBD
