Handling Incoming Webhooks in PHP
Receive and Respond
The majority of problems I’ve seen (or created) when working with incoming webhooks come from trying to do too much synchronously – doing all the processing as the hook arrives. This leads to issues for two reasons:
- the incoming web connection stays open while the processing takes place. There are a limited number of web connections, so once we run out, the next connection has to wait, making the system slower … you get the idea. This sort of thing produces the “hockey stick” graph shapes we see on the web, where things get slower, which makes everything else slower, and it all snowballs
- if something goes wrong in the middle, you have no way of retrying that piece of data
So my advice is to immediately store and then acknowledge incoming data, then process it asynchronously. The best solution here is to use a queue, but if it’s not straightforward to add new dependencies to your application then you can absolutely start off with a simple database. Store a record for each incoming webhook, with some sort of unique identifier, a timestamp of when it arrived, probably some status field to say whether it’s been processed, and the whole webhook data payload as you received it. It’s probably also helpful to put some of the key fields from the incoming payload into their own columns, such as account number or event type, depending on what sort of data you’re handling.
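As a sketch of what that storage might look like if you go the database route – assuming MySQL via PDO, with table and column names of my own choosing – the setup could be as simple as:

```php
<?php
// Sketch only: the table and column names here are illustrative, not prescriptive.
$db = new PDO('mysql:host=localhost;dbname=hooks', 'user', 'pass');

$db->exec("
    CREATE TABLE incoming_webhooks (
        id          INT AUTO_INCREMENT PRIMARY KEY,  -- unique identifier
        received_at DATETIME NOT NULL,               -- when it arrived
        status      VARCHAR(20) NOT NULL DEFAULT 'new',
        event_type  VARCHAR(100),                    -- promoted from the payload
        account_id  VARCHAR(100),                    -- promoted from the payload
        payload     TEXT NOT NULL                    -- the raw body, verbatim
    )
");
```

The promoted columns (event type, account) make it easy to query and filter later without parsing every payload.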
Quick Code Example
Here’s a quick piece of code I use in one of my talks on this topic, using PHP to receive an incoming webhook and store it in CouchDB (adapt as required if you’re not using CouchDB – this would work perfectly well with MySQL too; this example just comes from a project that uses CouchDB).
<?php

if($json = json_decode(file_get_contents("php://input"), true)) {
    print_r($json);
    $data = $json;
} else {
    print_r($_POST);
    $data = $_POST;
}

echo "Saving data ...\n";
$url = "http://localhost:5984/incoming";

$meta = ["received" => time(),
    "status" => "new",
    "agent" => $_SERVER['HTTP_USER_AGENT']];

$options = ["http" => [
    "method" => "POST",
    "header" => ["Content-Type: application/json"],
    "content" => json_encode(["data" => $data, "meta" => $meta])]
];

$context = stream_context_create($options);
$response = file_get_contents($url, false, $context);
This script starts by trying to guess whether we have incoming JSON data or an ordinary form post – either way it creates a $data array holding the incoming payload of the webhook. It also outputs this for debugging purposes, which helps to see what arrived. If there is any uncertainty about the reliability of the data format, or if you are integrating with a third-party system, you might also want to store the actual contents of file_get_contents("php://input") verbatim in case they are needed for debugging, or for a debate about who broke what!
With the data in hand, the script sets up a $meta variable as well, holding the additional fields to store (in this case, a received timestamp, a status, and the user agent). The database itself will give our record a unique identifier. Finally, the POST request set up with the stream context is how we insert the data into our database.
It isn’t called out explicitly here, but when a PHP script completes successfully it returns a 200 OK response. Note that there are no additional steps here: no validation or checking of fields, no fetching of extra data. Just accept, and once the payload is successfully stored, return a “Thanks!” (or rather, a 200 OK status).
Planning for Processing
With this data in place, you can process the webhooks asynchronously. If you used a queue rather than the database, you’ll set up a few workers to process the incoming data. With a solution like the one above, I’d recommend a cron job to pick up unprocessed jobs and actually process the data. You can always send a webhook back when they are finished, if you need to offer notifications of whether the data was successfully received and processed. One more word of advice here: put a limit on how many unprocessed jobs are picked up at once, and mark them as “being processed”. If the system is under a lot of load then more than one of these processes will be useful, so having each one able to claim a few waiting jobs really helps!
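A rough sketch of what such a cron-driven worker might look like, assuming the MySQL-style table from earlier (the table and column names, and the batch size of 10, are my own assumptions):

```php
<?php
// Sketch of a cron worker; table/column names and the batch limit are assumptions.
$db = new PDO('mysql:host=localhost;dbname=hooks', 'user', 'pass');

// Claim a limited batch inside a transaction, so several workers running
// side by side won't pick up the same jobs.
$db->beginTransaction();
$rows = $db->query(
    "SELECT id, payload FROM incoming_webhooks
     WHERE status = 'new'
     ORDER BY received_at
     LIMIT 10
     FOR UPDATE"
)->fetchAll(PDO::FETCH_ASSOC);

$claim = $db->prepare("UPDATE incoming_webhooks SET status = 'processing' WHERE id = ?");
foreach ($rows as $row) {
    $claim->execute([$row['id']]);
}
$db->commit();

// Now do the real work outside the transaction, marking each job done.
$done = $db->prepare("UPDATE incoming_webhooks SET status = 'done' WHERE id = ?");
foreach ($rows as $row) {
    $payload = json_decode($row['payload'], true);
    // ... actual processing of $payload goes here ...
    $done->execute([$row['id']]);
}
```

Marking rows as “processing” before doing the work is what lets you safely run several copies of this script when the backlog grows.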
Hopefully the example here helps to illustrate the point I tried to make about incoming webhooks. For a scalable system, each part wants to be as independent as possible, and the tactics outlined here have worked well for me in the past – hopefully they’re useful to you too.
This is a great approach that most people don’t consider.
One piece to keep in mind is what you return to the webhook sender initially. While most people would think a “200 OK” makes sense, a “202 Accepted” is probably a better fit: since it explicitly means “we’ve accepted this but haven’t processed it yet”, it maps exactly.
This is brilliant advice! Definitely worth a mention so thanks for adding it here :)
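For anyone wanting to try that suggestion, a minimal sketch of sending the 202 from the receiving script (after the payload has been stored) might be:

```php
<?php
// Assumed to run after the incoming payload has been stored successfully.
// 202 Accepted: "received and queued", without claiming it was processed.
http_response_code(202);
echo "Thanks!";
```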
Hey Lorna, good read!
Here’s a trick I’ve found to achieve something similar without using workers. In one of my apps I also need to call some webhooks, and given the nature of these webhooks they usually take forever to complete, so doing them synchronously is definitely not an option. Maintaining a set of concurrent workers for this app wasn’t an option either, as it would require supervisord (or similar), which we didn’t want to maintain. What I found works surprisingly well is the following setup:
Your PHP code sends an SNS notification to a topic. A Lambda function subscribed to this topic makes the actual HTTP calls and in turn publishes the results to another SNS topic that our app is subscribed to (for processing the output of the HTTP calls).
The key thing here is that the initial SNS call takes two-digit milliseconds to complete, and you can hammer the SNS endpoint massively. In our load tests we ran thousands of concurrent calls to SNS (the AWS PHP SDK handles concurrency for you) and there NEVER were any delays. In this setup it doesn’t matter if your code has 1 or 1000 webhooks to process; it’s always going to be very fast.
A sweet side benefit is that if the webhook fails (some transient HTTP burp, for example), SNS will retry the call for you automatically, with configurable retry attempts and retry-wait (like linear/exponential backoff). Also, Amazon will limit the concurrency depending on how your Lambda function and VPC are configured, so that’s also something you don’t have to worry about.
This way you get speed when “queuing” the webhooks, you get the output of the HTTP calls, you get automatic retries, and you never had to set up and maintain any queues or daemons.
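The publish side of a setup like this could be sketched as follows, using the official aws/aws-sdk-php package – the region, topic ARN, and payload shape below are placeholders, not values from the commenter’s app:

```php
<?php
// Sketch of publishing a "please call this webhook" job to SNS.
// Requires: composer require aws/aws-sdk-php
// The region, TopicArn, and message shape here are illustrative assumptions.
require 'vendor/autoload.php';

use Aws\Sns\SnsClient;

$sns = new SnsClient([
    'region'  => 'eu-west-1',
    'version' => '2010-03-31',   // the SNS API version the SDK expects
]);

$job = [
    'url'  => 'https://example.com/hook',    // where the Lambda should POST
    'body' => ['event' => 'order.created'],  // payload for the outgoing call
];

$sns->publish([
    'TopicArn' => 'arn:aws:sns:eu-west-1:123456789012:outgoing-webhooks',
    'Message'  => json_encode($job),
]);
```

The subscribed Lambda then does the slow HTTP call, leaving the PHP request path with only the fast SNS publish.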
What about the ones with authentication?
I’m not sure what you’re asking here. Add the level of security needed by your application as it makes sense. I still really like to log everything and check signatures or whatever when I’m processing the message later on – but I’ve done those steps on receiving in some applications where it made sense, such as where there was a lot of malicious traffic. Hope that helps!
Here is the webhook data sent to my website (say, http://www.mydomain.com/incoming-hook.php). My question is: how do I read all this data in PHP?
This is JSON data – try the json_decode() function in PHP.
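For example, with a made-up payload standing in for whatever your webhook actually sends:

```php
<?php
// Illustrative payload only – substitute the body your webhook receives
// (i.e. file_get_contents("php://input")).
$raw = '{"event":"order.created","order_id":123}';

// Pass true as the second argument to get an associative array back.
$data = json_decode($raw, true);

echo $data['event'];     // order.created
echo $data['order_id'];  // 123
```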
Can’t you just receive the JSON and copy it to a folder locally, so a cron can work on it later? If the answer is yes, can you provide a quick example of code? :)
That’s what I did here:
[code]
echo "Saving data ...\n";
$response = file_get_contents("php://input");
$fp = fopen('C:\somefolder\filename_'.date('m-d-Y_his').'.json', 'w');
fwrite($fp, json_encode($response));
fclose($fp);
[/code]
If your webhook is receiving data more often than once per second, the code you provided is ONLY going to store the LAST data received during any given second, since the filename it’s being saved to is built from the time the data was saved.
If you use ‘a’ (append mode) instead of ‘w’ as your second argument when calling fopen to create the file handle, additional data will be appended to the file instead of overwriting the contents of the file with the same name. You may want to consider using filenames without a chance of collision to save your data, to ensure you’re not losing data if there is even a remote chance that your webhook could receive more than one request per second.
Also, json_encode expects a PHP value such as the array in the $_POST superglobal – passing it the raw string from the php://input stream just wraps already-encoded JSON in another layer of quoting. The data being sent to your webhook through the php://input stream is more than likely already JSON, so you can save the stream contents straight to a file like this:
// name subsequent files as webhook_DATE_TIME-1.json, webhook_DATE_TIME-2.json, etc.
$fileCount = 1;
while (file_exists(__DIR__ . '/webhook_' . date('mdY_his') . '-' . $fileCount . '.json')) {
    $fileCount++;
}
// one-liner to save the received JSON to a new file
file_put_contents(__DIR__ . '/webhook_' . date('mdY_his') . '-' . $fileCount . '.json', file_get_contents('php://input'));
Simplified:
[code]$data = ($json = json_decode(file_get_contents("php://input"))) ? $json : $_POST;[/code]
Your post helped me a lot – it turned out the client I was testing for was coming in via $_POST, not JSON, which led me to simplify the logic.