Snake_Byte #22: Keeping Secrets Secret

The number of leaked Electronic Medical Records (EMRs) is rising. In 2014-2015, a partial EMR could fetch around $50. Compared to the going price for credit card data (approx. $1), it's understandable why hackers would choose to target healthcare providers. Attackers can use the stolen data in a variety of ways, including filing fraudulent insurance claims, obtaining prescription medication, or aiding in identity theft [1]. Setting aside for a moment what the records sell for, the 2016 Ponemon report "Cost of Data Breach Study - United States" [2] found that breaches cost the affected business an average of $221 per record. When a breach involves thousands or millions of records, those numbers add up quickly.

Electronic Medical Records aren't the only valuable pieces of information. Any information is potentially valuable, especially when dealing with PII/PHI. One repeated theme at PokitDok is "We want to help our customers become (and continue to be) successful," and we're trying to do exactly that. For example, we recently deprecated support for insecure cryptographic protocols (only TLS 1.2 is currently supported). Additionally, we're helping to protect your data by proactively scanning public code repositories for sensitive data. Mistakes happen: sometimes authentication keys get checked in, or someone forks a private repo into a public one. Whatever the cause, early detection and remediation are the best solution.

So, how do we do that? Let's take a look at the Github search API (

We'll be using the /search/code endpoint. Our basic process looks like this:

  • Gather a list of our search parameters. In this case, client_ids.
  • Hit the /search/code endpoint with each client_id.
  • If a result is returned, note its html_url.

Start off by creating a Personal Access Token in Github:

Once the personal access token is generated, we'll use Python and the Requests library to query the API.
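The original code for this step is not reproduced here, so the following is only a minimal sketch of what such methods might look like. The function names (`build_headers`, `extract_html_urls`, `search_code`) are hypothetical, and the sketch assumes the token is sent in an `Authorization: token ...` header and that each hit in the response's `items` list carries an `html_url` field.

```python
"""Sketch: query Github's /search/code endpoint for a single client_id."""

GITHUB_API = "https://api.github.com"


def build_headers(token):
    """Return headers that authenticate with a personal access token."""
    return {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github+json",
    }


def extract_html_urls(payload):
    """Pull the html_url of every hit out of a /search/code JSON response."""
    return [item["html_url"] for item in payload.get("items", [])]


def search_code(client_id, token, session=None):
    """Search public code for client_id and return the html_url of each hit.

    `session` is anything with a requests-style .get(); when omitted, a
    requests.Session is created (assumes the Requests library is installed).
    """
    if session is None:
        import requests  # third-party; only needed when no session is passed in
        session = requests.Session()
    response = session.get(
        f"{GITHUB_API}/search/code",
        headers=build_headers(token),
        params={"q": client_id},  # the client_id is used as the search term
        timeout=10,
    )
    response.raise_for_status()  # surface HTTP errors instead of hiding them
    return extract_html_urls(response.json())
```

Accepting an optional `session` keeps the function easy to exercise with a stand-in object, without touching the network or a real token.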

With the methods defined, let's iterate through the search parameters:
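As a rough illustration of that loop (not the original code), the sketch below walks a list of client_ids and keeps only those that produced hits. `find_leaks` is a hypothetical name, and `search` is assumed to be any callable that takes a client_id and returns a list of html_url strings.

```python
def find_leaks(client_ids, search):
    """Map each client_id that produced hits to the html_urls where it appeared."""
    leaks = {}
    for client_id in client_ids:
        urls = search(client_id)
        if urls:  # an empty result means this client_id was not found
            leaks[client_id] = urls
    return leaks
```

Passing the search function in as an argument keeps the loop independent of how the API is actually called.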

At the end, we have output that looks like this:

Important note! That's a fake client_id, which is kind of the point of this blog post. Afterward (code not published here), we do some internal aggregation and pass the results to the customer success team. They make sure the owner of any detected API credentials is aware of the problem and help them fix it, which basically involves passing along this link and answering any questions.

There's one other section of code not listed here (for brevity's sake) that ensures we don't exceed Github's API rate limit.
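That rate-limit code is not shown, so what follows is only one possible approach, not the one actually used. It assumes the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers (the latter in epoch seconds) and pauses until the reset time once the quota is used up. The function names are hypothetical.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, based on response headers."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0  # quota left, no need to wait
    now = time.time() if now is None else now
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


def wait_for_rate_limit(headers, sleep=time.sleep):
    """Sleep until the rate-limit window resets, if the quota is exhausted."""
    delay = seconds_until_reset(headers)
    if delay > 0:
        sleep(delay)
    return delay
```

Calling `wait_for_rate_limit(response.headers)` after each search would be one way to fit this into the loop; Requests exposes headers through a case-insensitive mapping, so the header casing used here is not a concern there.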

If you're interested in seeing just how many healthcare-related data breaches there are, the HITECH Act requires that health information breaches involving 500 people or more be publicly reported.