@timroster
Last active December 3, 2019 03:23
Run the IBM Code Pattern "Classify ICD-10 data with Watson" using IBM Cloud and docker containers.

Classify medical diagnosis with ICD-10 code

DISCLAIMER: This application is used for demonstrative and illustrative purposes only and does not constitute an offering that has gone through regulatory review. It is not intended to serve as a medical application. There is no representation as to the accuracy of the output of this application and it is presented without warranty.

This application was built to demonstrate IBM's Watson Natural Language Classifier (NLC). The data set we will be using, ICD-10-GT-AA.csv, contains a subset of ICD-10 entries. ICD-10 is the 10th revision of the International Statistical Classification of Diseases and Related Health Problems. In short, it is a medical classification list by the World Health Organization (WHO) that contains codes for diseases, signs and symptoms, abnormal findings, complaints, social circumstances, and external causes of injury or disease. Hospitals and insurance companies alike could save time and money by using Watson to tag records with the most accurate ICD-10 codes.

This application is a Python web application built on the Flask microframework and on earlier work by Ryan Anderson. It uses the Watson Python SDK to create the classifier, list classifiers, and classify the input text. It also uses the freely available ICD-10 API which, given an ICD-10 code, returns a name and description.
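The classify step can also be expressed directly against the REST endpoint that the curl commands later in this lab talk to. The following is a minimal sketch using only the Python standard library, not the Watson Python SDK that the application itself relies on. The function names `build_classify_request` and `classify` are hypothetical, and the sketch assumes the `/v1/classifiers/{classifier_id}/classify` endpoint with basic authentication, using `apikey` as the user name.

```python
import base64
import json
import urllib.request

# Gateway URL as used by the curl commands in this lab.
NLC_URL = "https://gateway.watsonplatform.net/natural-language-classifier/api"


def build_classify_request(classifier_id, text, apikey, base_url=NLC_URL):
    """Build (but do not send) a POST request that classifies `text`."""
    # Basic auth: the literal user name "apikey" plus the service API key.
    token = base64.b64encode(f"apikey:{apikey}".encode("utf-8")).decode("ascii")
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/classifiers/{classifier_id}/classify",
        data=body,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def classify(classifier_id, text, apikey):
    """Send the request and return the decoded JSON response."""
    request = build_classify_request(classifier_id, text, apikey)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the request construction separate from the network call makes it possible to inspect the URL, headers, and body without a trained classifier.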

This lab uses Docker containers to provide the tools it needs, including git, a Python runtime, and the IBM Cloud command line interface. To follow along, you will need Docker installed on your workstation.

When the reader has completed this pattern, they will understand how to:

  • Create a Natural Language Classifier (NLC) service and use it in a Python application.
  • Train an NLC model using CSV data.
  • Deploy a web app with Flask to allow the NLC model to be queried.
  • Quickly get a classification of a disease or health issue using the Natural Language Classifier trained model.

Flow

  1. CSV files are sent to the Natural Language Classifier service to train the model.
  2. The user interacts with the web app UI running either locally or in the cloud.
  3. The application sends the user's input to the Natural Language Classifier model to be classified.
  4. The information containing the classification is returned to the web app.

application flow diagram
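Step 1 of the flow depends on the shape of the training CSV. As a rough illustration, the sketch below assumes the common NLC layout in which each row holds the text in the first column and one or more class labels in the remaining columns, with no header row. That layout is an assumption about ICD-10-GT-AA.csv and has not been verified here. The helper `summarize_training_rows` is hypothetical and simply counts examples per class; the sample rows are made up.

```python
import csv
import io
from collections import Counter


def summarize_training_rows(lines):
    """Count training examples per class label.

    `lines` is any iterable of CSV lines (an open file works).
    Assumes column 0 is the text and columns 1..n are class labels.
    """
    counts = Counter()
    for row in csv.reader(lines):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank rows and rows without a label
        for label in row[1:]:
            if label.strip():
                counts[label.strip()] += 1
    return counts


# Made-up rows for illustration only; not taken from the real data set.
sample = io.StringIO(
    "Spontaneous tension pneumothorax,J93\n"
    "Other spontaneous pneumothorax,J93\n"
    "\"Conductive hearing loss, bilateral\",H90\n"
)
print(summarize_training_rows(sample))
```

A quick count like this can show whether some classes have very few examples before a multi-hour training run starts.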

Included Components

Featured Technologies

  • Artificial Intelligence: Artificial intelligence can be applied to disparate solution spaces to deliver disruptive technologies.
  • Cloud: Accessing computer and information technology resources through the Internet.
  • Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.

Initial setup

Here we create the classifier with our ICD-10 dataset.

  1. Create a directory, open a terminal (bash on Mac or Linux, PowerShell on Windows), and change into this directory.

  2. Clone the project to this directory

docker run -ti --rm -v "$(pwd):/git" alpine/git clone https://github.com/IBM/nlc-icd10-classifier.git

  1. Change into the cloned repository directory: cd nlc-icd10-classifier

  2. We'll be using the ICD-10-GT-AA.csv dataset in the data folder.

    Note that this is a subset of the entire ICD-10 classification set, which shortens training time.

  3. Go to the IBM Cloud dashboard and create a Natural Language Classifier service instance: select Catalog, then type "Natural Language Classifier" in the search panel. Select the tile and create the service using the Standard plan. Make a note of the service name you give the instance; we'll need it later.

    Note: The NLC service only offers a Standard plan, which allows:

    1 Natural Language Classifier free per month.
    1000 API calls free per month
    4 Training Events free per month

    After that, there are charges for the use of the service when using a paid account.

  4. When the instance is created you will see a screen where you can copy the service credentials. Copy the API key for later use.

    Service Credentials panel

  5. Start a developer container, export the username and password as environment variables, and then load the data using the commands below. When authenticating with an API key, use apikey as the username and the API key as the password. Training takes around 4.5 hours, so for testing your application you can use a pre-trained copy of the service instance instead.

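    # Start the developer container with the repository mounted at /repo
    docker run -it --rm -v "$(pwd):/repo" timrodocker/mydev bash
    # Inside the container: basic auth uses the literal user name "apikey"
    export USERNAME=apikey
    export PASSWORD=<apikey_from_credentials>
    # Absolute path, so the upload works from any working directory
    export FILE=/repo/data/ICD-10-GT-AA.csv
    
    curl -i --user "$USERNAME":"$PASSWORD" -F "training_data=@$FILE" -F training_metadata="{\"language\":\"en\",\"name\":\"ICD-10Classifier\"}" "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers"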
  6. After running the command to create the classifier, note the classifier_id in the JSON that is returned:

    {
      "classifier_id" : "ab2aa6x341-nlc-1176",
      "name" : "ICD-10Classifier",
      "language" : "en",
      "created" : "2018-04-18T14:09:28.403Z",
      "url" : "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/ab2aa6x341-nlc-1176",
      "status" : "Training",
      "status_description" : "The classifier instance is in its training phase, not yet ready to accept classify requests"
    }

    and export that as an environment variable:

    export CLASSIFIER_ID=<my_classifier_id>

    Now you can check the status for training your classifier:

    curl --user "$USERNAME":"$PASSWORD" "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/$CLASSIFIER_ID"

    You can exit the developer container by typing exit in the bash prompt, or keep the terminal open to check on the training status. The model will need to finish training before performing the Run on IBM Cloud steps below.
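Because training runs for hours, it can help to automate the status check. The sketch below is one possible way to interpret the status document shown in step 6 and poll until training ends. The names `training_state` and `wait_until_trained` are hypothetical. The "Training" status appears in the response above; treating "Available" as the ready state is an assumption based on the NLC API, and every other value is treated as a failure.

```python
import time

READY = "Available"       # assumed status once training has finished
IN_PROGRESS = "Training"  # status shown in the sample response


def training_state(status_response):
    """Map a classifier status document to 'ready', 'waiting' or 'failed'."""
    status = status_response.get("status")
    if status == READY:
        return "ready"
    if status == IN_PROGRESS:
        return "waiting"
    return "failed"  # any other status, or a missing field


def wait_until_trained(fetch_status, interval_seconds=600, max_checks=36):
    """Poll `fetch_status()` until training ends or the checks run out.

    `fetch_status` is any callable that returns the parsed JSON of the
    status request (for example, a wrapper around the curl call above).
    The defaults poll every 10 minutes for up to 6 hours.
    """
    for _ in range(max_checks):
        state = training_state(fetch_status())
        if state != "waiting":
            return state
        time.sleep(interval_seconds)
    return "waiting"
```

Passing the fetch function in as a parameter keeps the polling logic independent of how the HTTP request is made.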

Steps to run the application

This application can be run locally or hosted on IBM Cloud. For today's workshop, we will use the local option.

Run locally

  1. Using a text editor, open env.example and enter the NLC credentials of the pre-trained service provided by the instructor. Comment out the credential environment variables you do not use.

    # Replace the credentials here with your own using either USERNAME/PASSWORD or IAM_APIKEY
    # Comment out the unset environment variables
    # Rename this file to .env before running welcome.py.
    
    #NATURAL_LANGUAGE_CLASSIFIER_USERNAME=<add_NLC_username>
    #NATURAL_LANGUAGE_CLASSIFIER_PASSWORD=<add_NLC_password>
    
    NATURAL_LANGUAGE_CLASSIFIER_IAM_APIKEY=<add_NLC_iam-apikey>
  2. Save the file as .env in the same directory.

  3. From a terminal in the nlc-icd10-classifier directory, start a developer container with the bash prompt and then change the directory in the container into the repository.

    docker run -it --rm -v "$(pwd):/repo" -p 5000:5000 timrodocker/mydev bash
    cd repo
  4. Run pip install -r requirements.txt to install the app's dependencies

  5. Run python welcome.py

  6. From your workstation access the running app in a browser at http://localhost:5000
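The .env file from step 1 holds either an IAM API key or a username and password pair. How welcome.py actually loads this file is not shown here, so the following is only a sketch of the idea, with hypothetical helper names `parse_env_lines` and `choose_auth`: read the KEY=VALUE lines, skip commented-out entries, and prefer the IAM API key when it is set.

```python
def parse_env_lines(lines):
    """Parse KEY=VALUE lines, ignoring blanks and '#' comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def choose_auth(env):
    """Return ('iam', apikey) or ('basic', (username, password))."""
    apikey = env.get("NATURAL_LANGUAGE_CLASSIFIER_IAM_APIKEY")
    if apikey:
        return ("iam", apikey)
    username = env.get("NATURAL_LANGUAGE_CLASSIFIER_USERNAME")
    password = env.get("NATURAL_LANGUAGE_CLASSIFIER_PASSWORD")
    if username and password:
        return ("basic", (username, password))
    raise ValueError("No NLC credentials found in the environment file")
```

With the file from step 1, `choose_auth(parse_env_lines(open(".env")))` would select the IAM API key, because the username and password lines are commented out.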

Run on IBM Cloud

Before using these steps, you will need to have a trained instance of NLC deployed in your IBM Cloud account. Use the service name from the Initial setup section for the service alias creation command below.

  1. Using a text editor, update manifest.yml with the NLC service name (your_nlc_service_name_alias), a unique application name (your_app_name) and unique host value (your_app_host)

    applications:
      - path: .
        memory: 256M
        instances: 1
        domain: mybluemix.net
        name: your_app_name
        host: your_app_host
        disk_quota: 1024M
        services:
          - your_nlc_service_name_alias
        buildpack: python_buildpack
  2. Start the developer container and change into the repository folder:

    docker run -it --rm -v "$(pwd):/repo" -p 5000:5000 timrodocker/mydev bash
    cd repo
  3. Log in with the IBM Cloud CLI, target the Cloud Foundry organization and space where you will deploy the application, and create an alias for the service instance in that space.

    This example uses the IBM Cloud US-South API endpoint. Adjust it as needed for the target IBM Cloud instance, or omit the -a option to be prompted for the region.

    ibmcloud login -a https://api.ng.bluemix.net
    ibmcloud target --cf
    ibmcloud resource service-alias-create your_nlc_service_name_alias --instance-name your_nlc_service_name
    ibmcloud service list
    
  4. Deploy the application using ibmcloud app push from the repo directory

  5. Access the running app by going to: https://<host-value>.mybluemix.net/

    The domain 'mybluemix.net' may need to be adjusted if you are using a different IBM Cloud endpoint from the public US South instance.
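When the application runs on Cloud Foundry, the service alias listed under services in manifest.yml is bound to the app, and Cloud Foundry exposes bound service credentials through the VCAP_SERVICES environment variable. The sketch below shows one way an app could read them. The helper name `nlc_credentials_from_vcap` is hypothetical, and the service label `natural_language_classifier` is an assumption; inspect the deployed app's environment to confirm the actual key and credential fields.

```python
import json
import os


def nlc_credentials_from_vcap(environ=os.environ):
    """Return the credentials dict of the first bound NLC service, or None."""
    raw = environ.get("VCAP_SERVICES")
    if not raw:
        return None  # not running on Cloud Foundry, or nothing is bound
    services = json.loads(raw)
    # Assumed service label; the real key may differ in your environment.
    for instance in services.get("natural_language_classifier", []):
        return instance.get("credentials")
    return None
```

Returning None when the variable is absent lets the same code fall back to the .env credentials when the app runs locally.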

Sample Output

The user enters a sentence into the Text to classify: box, and the Watson NLC classifier returns ICD-10 classifications with confidence scores. Here is the output for the input Patient experienced a spontaneous pneumothorax:

{
 "url": "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers/122656x456-nlc-2030",
 "text": "Patient experienced a spontaneous pneumothorax",
 "classes": [
     {
         "class_name": "J93",
         "confidence": 0.9675855932655525
     },
     {
         "class_name": "E50",
         "confidence": 0.0023213856437448227
     },
     {
         "class_name": "M66",
         "confidence": 0.0021802446968000374
     },
     {
         "class_name": "J95",
         "confidence": 0.0018309330492616877
     },
     {
         "class_name": "B15",
         "confidence": 0.0017647055823487574
     },
     {
         "class_name": "H91",
         "confidence": 0.0016124259542083194
     },
     {
         "class_name": "M26",
         "confidence": 0.0014885345465333328
     },
     {
         "class_name": "E67",
         "confidence": 0.0011335815247201268
     },
     {
         "class_name": "J09",
         "confidence": 0.0010108590850576899
     },
     {
         "class_name": "H90",
         "confidence": 0.0005063800880195048
     }
 ],
 "classifier_id": "122656x456-nlc-2030",
 "top_class": "J93"
}
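In the response above, only the top class has a meaningful confidence; the remaining nine are all below 0.01. A caller will usually want to keep just the classes above some cutoff. The sketch below shows one way to do that. The function name `confident_classes` and the default threshold of 0.5 are illustrative assumptions, not values taken from the application.

```python
def confident_classes(response, threshold=0.5):
    """Return (class_name, confidence) pairs at or above `threshold`, best first."""
    matches = [
        (item["class_name"], item["confidence"])
        for item in response.get("classes", [])
        if item["confidence"] >= threshold
    ]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Abbreviated copy of the sample response above.
sample = {
    "top_class": "J93",
    "classes": [
        {"class_name": "J93", "confidence": 0.9675855932655525},
        {"class_name": "E50", "confidence": 0.0023213856437448227},
    ],
}
print(confident_classes(sample))
```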

Links

Learn more

  • Artificial Intelligence Code Patterns: Enjoyed this Code Pattern? Check out our other AI Code Patterns.
  • AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos.
  • With Watson: Want to take your Watson app to the next level? Looking to utilize Watson Brand assets? Join the With Watson program to leverage exclusive brand, marketing, and tech resources to amplify and accelerate your Watson embedded commercial solution.

License

Apache 2.0
