clemens/pipedrive-api-notes.md Secret

## pipedrive-api-notes.md

      
    Raw
  

              pipedrive-api-notes.md
            
          
    Here are a few things that I found odd when working extensively with the Pipedrive API.
Brief description of use case

For context, here's a brief description of our use case:

Our BI team runs various reports across the company. For some reports, we need certain data from
Pipedrive.
Therefore we've built a tool that on a regular basis fetches data from the Pipedrive API and puts
it into a Postgres database (which Tableau can easily interface with).
Part of the data fetching involves dynamically building the schema for the tables, so that we
properly handle custom fields added to Pipedrive.

Here, we've run into problems because the Pipedrive API seems to not be entirely consistent when it
comes to field types and also have other issues.
Field type-related issues

Inconsistent handling of enum fields

enum fields seem to generally have integer values and string labels. Notable exceptions in the
activities resource:

done has boolean IDs (true and false)
busy_flag – same thing
type has string values (e.g. "call", "meeting")

Consequently, this needs manual remapping when scripting around field types.
Inconsistent handling of date/time fields

date and time fields are handled inconsistently. The most obvious examples are in the
activities resource:

due_date says it's a date and it actually is (YYYY-MM-DD)
add_time says it's a time, but it really is a datetime (YYYY-MM-DD HH:MM:SS)
due_time says it's a time and it is, if you consider "time" to be a "time of day" (HH:MM)
rather than a more database-like definition of time (= datetime)
duration says it's a time but apparently it goes beyond values of 23:59 (e.g. we have an
activity with a value of 48:00 here, which presumably means 2 days)

I think this should be handled consistenly: If there's a type date, then it shouldn't sometimes
be used as a date and sometimes as a datetime; if there's a type time, it's fine that it's
treated as a "time of day", but then it shouldn't be used to represent a duration that can go
beyond valid ranges for a time of day (= 00:00 to 23:59).
Inconsistent handling of blank values

Usually, blank values seem to be set as null which is what I'd expect. However, some fields in activities seem to contain an empty string instead (e.g. "duration": "", "lead_title": "", "marked_as_done_time": "" etc.).
Missing boolean type

I'm missing a distinct boolean field type – both for custom fields as well as Pipedrive's own fields (e.g. done and busy_flag in the activities resource). I think true/false concepts are sufficiently common to deserve their own data type.
Field Option IDs

IDs of field options are usually integers – except for status of deals where they seem to be
strings.
Other field-related oddities

Some other minor oddities that I came across:

Several fields that are called *_id counterintuitively return full objects (including the ID)
instead of only the ID (as the name would suggest)
lack of naming conventions, e.g.

user_id given as a parameter references entries of the users resource, but for
organizations it's org_id
a reference to the users resource has field type user (e.g. creator_user_id in
activities), but one to persons has type people (e.g. person_id in activities)


Pipedrive's own numerical fields seem to be int, but our custom fields are generally seem to
be double
All fields have IDs except for monetary fields' currency

Other issues

Lack of granularity when filtering dates/times

It's a bummer that the "time ago" filters stop at 1 week ago and I can't thus filter for values > 6 hours ago, < 30 minutes or similar.
Pagination on "active" resources

There seems to be no "stable" way to handle pagination on large, "active" resources. Example:

I want to fetch multiple pages of activities.
I fetch page 1 from the API.
While fetching page 1, a new activity gets added.
When I fetch page 2, I get a duplicate activity that I've already fetched at page 1 and the newly
added activity never gets fetched.
The problem only becomes worse as the result set and/or number of active users increases.

Of course, it's possible to work around this using a filter, i.e. before running the paginated
requests, create a filter with update_time = the-start-time-of-the-process and then delete the
filter once it's done. But this is quite cumbersome => there should be a better way (e.g. something
that functions like a cursor in databases).
Cumbersome to get Deal Flow

To get data regarding deal flow, we need to fetch the flow for every single deal individually which
is cumbersome, inefficient and time-consuming. It would be beneficial to have at least one of the
following options:

optionally include flow when fetching deals, i.e. something like /deals?include_flow=true
(potentially even with additional settings like /deals?include_flow=true&flow_items=change)
fetch flows for multiple deals in one request, i.e. something like /deals/flow?deal_ids=1,2,3
or alternatively for all deals matching a certain filter
batch requests (similar to what e.g. Salesforce has with its Batch API, see
https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_composite_batch.htm)

Filters API

Creating filters via the API is a bit odd. Why is the complexity of the mandatory and / or root filters not hidden behind some abstraction?
Exports via API

It's a bummer that I can't programatically create and download exports.