Webthumb Wrapper for Python
Wednesday, June 13th, 2007
The weblog of Joshua Eichorn, AJAX, PHP and Open Source
Rossp.org has an API wrapper for Webthumb written in Python. If you’re a Python user, take a look.
If you’re a WebThumb user and would like to help beta test the new server, please let me know.
The new server offers higher capacity and shouldn’t require any changes to your apps, but download URLs have changed, so I’d like to do some testing to verify that before I do the switchover.
Send me an email if you’d like to get involved.
The WebThumb contest is now closed; the winners will be announced in the next couple of days.
The WebThumb contest ends on the 15th. I know there are a couple of projects out there that haven’t officially entered. If you want a chance to win an upgraded WebThumb account, make sure to get your entries in; the contest website has all the details.
Also, don’t forget about the EasyThumb API. It’s very simple to use and would be great for WordPress search plus thumbnails, or thumbnails for comment URLs.
Updated
WebThumb added a new feature this week called Easythumb. Like the rest of WebThumb it’s still in beta, but it makes adding thumbnails to a site much easier.
Easythumb gives you a simple interface to WebThumb, letting people with minimal PHP skills integrate WebThumb into their site. It does have a couple of drawbacks: thumbnails are always cached (if someone else requested the same URL within 24 hours, you get that thumb instead of a new one being generated), the output sizes are smaller, and some other options won’t be available.
Easythumb lets you generate a thumbnail using a normal HTTP GET call, so you can put the request right in an img src. Easythumb thumbs are always cached; the default cache is 1 day, and you can also use a 7-day or 30-day cache.
If a thumbnail is served from the cache it only uses 1/5 of a credit
If the thumbnail is generated it uses 1 credit like normal
Note that currently Easythumb only generates thumbnails of sizes small, medium, and medium2
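A quick way to budget for those rules: the sketch below (the helper name is hypothetical) estimates credits from the number of cache hits and freshly generated thumbnails, assuming only the two prices listed above. It counts in fifths of a credit so the arithmetic stays exact.

```php
<?php
// Hypothetical helper: estimate Easythumb credit usage.
// Per the rules above, a cache hit costs 1/5 of a credit and a freshly
// generated thumbnail costs 1 credit. Counting in fifths keeps the math exact.
function easythumbCredits(int $cacheHits, int $generated): float
{
    if ($cacheHits < 0 || $generated < 0) {
        throw new InvalidArgumentException('counts must be non-negative');
    }
    $fifths = $cacheHits * 1 + $generated * 5;
    return $fifths / 5;
}

// 100 requests where 80 were served from the cache:
echo easythumbCredits(80, 20), "\n"; // 36 (16 for cache hits + 20 for new thumbs)
```

Heavily cached traffic is where Easythumb gets cheap: the same 100 requests with no cache hits would cost 100 credits.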
I have a small example script that shows Easythumb’s usage; it uses a little JavaScript trick to add a loading message. To use it on your site you just need to register a WebThumb account, copy this sample where you want it, and update the parameters.
Official contest rules for the WebThumb contest are now up.
The contest runs until November 30th, and the top 5 submitters will all win upgraded WebThumb accounts. The top submitter will also win a copy of my book, Understanding AJAX.
If you already commented on my last post about the contest, make sure to get an official submission in before the 30th.
Cal Evans over at the Zend Developer Zone has written a very nice PHP5 wrapper and article about WebThumb. Read it for details and to get the code.
Oh, and the WebThumb contest will be running until November 30th. All you have to do to win a free WebThumb account is make something cool with WebThumb and release the code under an Open Source license. I’ll have more details and an official page up about it later in the week.
In case you were wondering, an API that requires polling isn’t a very good thing for scalability. On my current setup I can pretty easily handle about 20 status requests per second on top of my normal traffic; the problem is that it’s not hard for a bad polling implementation run by one user to make that many requests.
To solve this problem I’m adding an addition to the Webthumb API that will allow you to skip polling altogether. The basic idea is that you make an API request, and when your thumbnail is complete I’ll make a GET request back to your server telling you that the request is complete.
So on the request side it’s just a matter of including a notify tag in your request block. An example is shown below:
<webthumb>
<apikey>apikeyhere</apikey>
<request>
<url>webthumb.bluga.net</url>
<notify>http://webthumb.bluga.net/sample/notify.php?secret=blahblahblah</notify>
</request>
</webthumb>
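For apps that build this block in code, here is a minimal PHP sketch using the DOM extension. `buildNotifyRequest()` is a hypothetical helper, not part of any WebThumb wrapper; it only reproduces the element layout of the example above. Using text nodes means an `&` in the notify URL’s query string is escaped correctly in the XML.

```php
<?php
// Sketch: build a WebThumb request block with a <notify> callback URL,
// mirroring the XML example above. The function name is hypothetical.
function buildNotifyRequest(string $apiKey, string $url, string $notifyUrl): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $root = $doc->appendChild($doc->createElement('webthumb'));
    $root->appendChild($doc->createElement('apikey'))
         ->appendChild($doc->createTextNode($apiKey));

    $request = $root->appendChild($doc->createElement('request'));
    // text nodes are escaped on output, so "a=1&b=2" becomes "a=1&amp;b=2"
    $request->appendChild($doc->createElement('url'))
            ->appendChild($doc->createTextNode($url));
    $request->appendChild($doc->createElement('notify'))
            ->appendChild($doc->createTextNode($notifyUrl));

    // serialize just the <webthumb> element (no XML declaration), like the example
    return $doc->saveXML($root);
}

echo buildNotifyRequest(
    'apikeyhere',
    'webthumb.bluga.net',
    'http://webthumb.bluga.net/sample/notify.php?secret=blahblahblah'
), "\n";
```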
I’m including a secret variable in the URL so that if someone found the URL of my notify script, they couldn’t DoS me by making me download hundreds of different thumbnails from the server.
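A secret only helps if it can’t be guessed. A minimal sketch, assuming PHP 7 or later for `random_bytes()`; `makeNotifyUrl()` and the example.com base URL are hypothetical:

```php
<?php
// Sketch: generate an unguessable secret and append it to a notify URL.
// random_bytes() needs PHP 7+; the base URL is only an example.
function makeNotifyUrl(string $baseUrl): array
{
    $secret = bin2hex(random_bytes(16)); // 16 random bytes -> 32 hex characters
    $separator = (strpos($baseUrl, '?') === false) ? '?' : '&';
    $url = $baseUrl . $separator . http_build_query(['secret' => $secret]);
    return [$url, $secret];
}

[$notifyUrl, $secret] = makeNotifyUrl('http://example.com/notify.php');
// Send $notifyUrl in <notify>; keep $secret where the notify script can compare it.
echo $notifyUrl, "\n";
```

Generate the secret once and store it in the notify script’s `$mysecret` setting, so its comparison against `$_GET['secret']` succeeds.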
I’ve written a basic notify script to get you started. Feel free to use this script as a basis for whatever you need.
Update: Paths to thumbnails have changed, so the download code listed here won’t work. The new directory hash is:
<?php
substr($id,-2).'/'.substr($id,-4,-2).'/'.substr($id,-6,-4)
?>
Which means that if your job id is wt4761c8f914559, the directory is: http://webthumb.bluga.net/data/59/45/91/
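The same hash as a small helper, checked against the example job id above. `webthumbJobDir()` and `webthumbDownloadUrl()` are hypothetical names, and the sketch assumes the file names (`$jobId.zip`, `$jobId-thumb_$size.jpg`) are the same as in the notify script’s download code.

```php
<?php
// Sketch: download directory for a job id under the new layout:
// last two characters / the two before those / the two before those.
function webthumbJobDir(string $jobId): string
{
    return substr($jobId, -2) . '/' . substr($jobId, -4, -2) . '/' . substr($jobId, -6, -4);
}

// Full download URL; assumes file names are unchanged from the notify script.
function webthumbDownloadUrl(string $jobId, string $file, string $base = 'http://webthumb.bluga.net/data/'): string
{
    return $base . webthumbJobDir($jobId) . '/' . $file;
}

echo webthumbJobDir('wt4761c8f914559'), "\n"; // 59/45/91
```

In the notify script, `webthumbDownloadUrl($jobId, $file)` would stand in for `$url.$jobDir."/$file"`.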
<?php
// this is a really simple notify script
// it downloads the specified thumbs for a job and puts them in the storage dir
// if you want to store files based on urls etc you'll need to store the id at request time
// then do a mapping in this script
// download options
// zip - all sizes in a zip
// zipAuto - zip download and auto uncompress
// large - 640x480
// medium2 - 320x240
// medium - 160x120
// small - 80x60
$downloadType = 'zipAuto';
// secret I'm using to make sure no one can have me download every thumb etc
$mysecret = 'changeme';
// directory to write files to
$storageDir = 'tmp';
// webthumb base url
$url = 'http://webthumb.bluga.net/data/';
// unzip command (-o overwrites existing files instead of prompting)
$unzipCommand = 'unzip -o';
// END CONFIG
if (!isset($_GET['id']) || !isset($_GET['secret'])) {
    exit;
}
if ($mysecret == 'changeme') {
    echo "Configure notify script";
    exit;
}
$jobId = $_GET['id'];
$secret = $_GET['secret'];
if ($secret !== $mysecret) {
    echo "bad secret";
    exit;
}
// assumes job ids are alphanumeric (e.g. wt4761c8f914559); rejecting anything
// else keeps the id from being used for path traversal or shell injection below
if (!is_string($jobId) || !preg_match('/^[A-Za-z0-9]+$/', $jobId)) {
    echo "bad id";
    exit;
}
// original directory layout; see the update above for the current hash
$jobDir = substr($jobId,-4);
switch($downloadType) {
    case 'zip':
    case 'zipAuto':
        $file = "$jobId.zip";
        break;
    default:
        $file = "$jobId-thumb_$downloadType.jpg";
        break;
}
// this is the simplest possible download code; curl or PEAR HTTP_Request might be better
// will only work if allow_url_fopen is on
$contents = file_get_contents($url.$jobDir."/$file");
if ($contents === false) {
    echo "download failed";
    exit;
}
file_put_contents($storageDir."/$file",$contents);
if ($downloadType == 'zipAuto') {
    // quote the directory and file name rather than interpolating them raw into the shell
    exec("cd ".escapeshellarg($storageDir)." && $unzipCommand ".escapeshellarg($file));
    unlink($storageDir."/$file");
}
?>
Let me know if you find any major bugs in the code. There are always going to be cases where the polling API makes more sense (command-line utilities etc.), but I think this notify API should work great for any application integration.
OK, so one thing to remember when using an API: don’t use it stupidly.
For some reason I’ve got a couple of users who are making 20 requests to WebThumb a second.
This hurts performance for everyone and doesn’t get you results any faster.
No matter how often you poll WebThumb, your images won’t get completed any faster.
Also, if you’re making a large number of requests, please combine them, so you make one request for the status of 20 jobs at once instead of making 20 requests.
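A sketch of polite client behavior in PHP: split job ids into batches of 20 and wait longer between each round of status checks. The exponential backoff and the 5 to 60 second bounds are assumptions, not limits WebThumb publishes, and the XML for a batched status request isn’t shown here; see the API docs for that.

```php
<?php
// Sketch of polite polling: batch job ids and back off between polls.
// The batch size of 20 matches the advice above; the delays are assumptions.
function batchJobIds(array $jobIds, int $batchSize = 20): array
{
    return array_chunk(array_values($jobIds), $batchSize);
}

// Seconds to wait before poll number $attempt (0-based): doubles each time,
// capped at $max so a slow job is still checked every so often.
function pollDelay(int $attempt, int $initial = 5, int $max = 60): int
{
    return min($max, $initial * (2 ** min($attempt, 10)));
}

$jobs = array_map(fn($i) => "job$i", range(1, 45));
foreach (batchJobIds($jobs) as $batch) {
    // one status request per batch of up to 20 jobs, instead of one per job
    echo count($batch), " jobs in this status request\n";
}
```

With 45 pending jobs that is 3 status requests per round instead of 45, and the growing delay keeps a forgotten loop from hammering the server.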
I’ll be fixing these problems at the firewall level; if you end up blocked, send me an email and we’ll work out a solution that doesn’t involve making my server cry.
The WebThumb API seems to be running pretty stably these days, but it doesn’t have a lot of users yet, so now’s your chance to change that.
I’m running a short contest giving out WebThumb accounts with higher API limits as prizes. You can win one of these prizes by adding the best WebThumb integration to any open source software. One option might be a WordPress plugin for making thumbnail links.
The top 5 projects will receive lifetime accounts with a 5000-per-month limit
I will also be giving away 500-per-month accounts to anyone who finds a bug in WebThumb and lets me know about it.
To submit your project just add a comment to this post. Also, if you need it, I can provide SVN and wiki hosting for any WebThumb-related projects.
Note:
Base WebThumb accounts can generate 250 thumbnails per month.
The only currently known bug is that some URLs aren’t validated correctly; you won’t get an upgraded account for submitting this again. This is a bug in PEAR::Validate, so if anyone knows of a replacement or wants to figure out exactly what is going on, let me know.
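For anyone looking at the validation problem, one possible replacement for PEAR::Validate is PHP’s built-in `filter_var()` with `FILTER_VALIDATE_URL`, plus a scheme check. This is a hedged sketch, not what WebThumb runs; `isThumbableUrl()` is a hypothetical name, and it assumes `http://` when no scheme is given because the request example passes a bare host like `webthumb.bluga.net`.

```php
<?php
// Sketch: validate a URL to thumbnail using PHP's filter extension.
// Hypothetical helper; one possible alternative to PEAR::Validate.
function isThumbableUrl(string $url): bool
{
    // assumption: a bare host such as "webthumb.bluga.net" means http://
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // only web pages can be snapshotted, so reject ftp:// and friends
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}
```

`FILTER_VALIDATE_URL` is fairly strict about hostnames for http and https URLs, so it is worth testing against the URLs that currently fail before switching.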
Over the weekend Hansin wrote a PHP WebThumb API wrapper and released it under an open source license. WebThumb seems to be handling the load pretty well; it has already generated 2600+ thumbnails today.
So if you want an easy way to generate webpage thumbnails right from your PHP applications now you have one.
I’ll be looking at various rate limits and queue adjustments over the next week to keep one user from clogging up the queue too much. Until then, be nice, and enjoy.
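One common way to keep one user from clogging a shared queue is to serve users round-robin instead of strictly first-in, first-out. This PHP sketch illustrates that idea; it is not WebThumb’s actual queue code, and the job array shape (`user`, `url`) is an assumption.

```php
<?php
// Sketch: interleave pending jobs round-robin by user so one heavy user
// can't starve everyone else. Illustration only; the job shape is assumed.
function fairOrder(array $jobs): array
{
    // group jobs by user, keeping each user's own submission order
    $byUser = [];
    foreach ($jobs as $job) {
        $byUser[$job['user']][] = $job;
    }
    $ordered = [];
    while ($byUser) {
        // each pass takes the next job from every user that still has work
        foreach (array_keys($byUser) as $user) {
            $ordered[] = array_shift($byUser[$user]);
            if (!$byUser[$user]) {
                unset($byUser[$user]);
            }
        }
    }
    return $ordered;
}

$jobs = [
    ['user' => 'alice', 'url' => 'a1'],
    ['user' => 'alice', 'url' => 'a2'],
    ['user' => 'alice', 'url' => 'a3'],
    ['user' => 'bob', 'url' => 'b1'],
];
echo implode(',', array_column(fairOrder($jobs), 'url')), "\n"; // a1,b1,a2,a3
```

Bob’s single job now runs second instead of waiting behind all of Alice’s; per-user rate limits can sit on top of this.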
WebThumb now has an API. It’s a simple REST API based on posting some simple XML.
It gives you the ability to request thumbnails, check on their status, and even download files.
To use the API, register and then go to your user page; your API key is shown there.
Basic docs on the Webthumb API are available.
If you have any comments or suggestions leave them as comments.
The API is currently in beta, so I’m sure there will be some changes and additions, but it’s in a very usable state. If someone wraps the API for easy PHP access I’d be interested in hosting the code, so if you write a wrapper please get in touch with me.
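As a starting point for such a wrapper, a minimal sketch of POSTing request XML with PHP streams. The endpoint URL is an assumption (check the API docs for the real one), the function names are hypothetical, and like any `file_get_contents()` download it requires `allow_url_fopen`.

```php
<?php
// Sketch of a tiny wrapper for POSTing request XML to the API.
// The endpoint below is an assumption; take the real one from the API docs.
function webthumbHttpOptions(string $xml): array
{
    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: text/xml\r\n",
            'content' => $xml,
            'timeout' => 30, // seconds
        ],
    ];
}

function webthumbPost(string $xml, string $endpoint = 'http://webthumb.bluga.net/api.php'): string
{
    $context = stream_context_create(webthumbHttpOptions($xml));
    $response = file_get_contents($endpoint, false, $context);
    if ($response === false) {
        throw new RuntimeException("request to $endpoint failed");
    }
    return $response; // the API's XML reply, left for the caller to parse
}
```

Keeping the stream options in their own function lets a wrapper test what it would send without touching the network.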
I’ve upgraded WebThumb, adding login support and giving logged-in users the ability to quickly submit multiple WebThumb requests. Logged-in users also have a recent-thumbs bar on their user page, and will be getting more historical features over time.
I’m also getting close on the API front; in reality it’s already there, since you can submit requests using AJAX. Mainly I need to figure out a general authorization plan and get the code exposed in a formal way.
To check out the new WebThumb features, register, then log in and visit the user page.
I’m also still trying to figure out what the terms of service will be on the API. I’m currently thinking about 200 thumbnails free and then $0.05 per thumbnail after that. I will be rolling things out for testing purposes without any commercial options, but your feedback would be appreciated if you’re interested in generating a large number of thumbnails.
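To make the proposal concrete, here is the arithmetic as a tiny PHP function (hypothetical name). It works in cents so the $0.05 steps don’t pick up floating-point rounding.

```php
<?php
// Sketch of the proposed pricing: first 200 thumbnails free, then $0.05 each.
// Amounts are kept in integer cents to avoid floating-point rounding.
function proposedCostCents(int $thumbnails, int $freeQuota = 200, int $centsEach = 5): int
{
    return max(0, $thumbnails - $freeQuota) * $centsEach;
}

printf('$%.2f' . "\n", proposedCostCents(1000) / 100); // $40.00
```

So 1,000 thumbnails would cost 800 × $0.05 = $40.00 under this plan.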
Update
This came up in the comments, and I responded there, but I think it makes sense to update the post as well.
AWS sells thumbnails at a much cheaper rate. My reasons for the higher rate are:
What sort of premium do people think that’s worth, or are these mainly the type of features people want for low-use activities?
WebThumb uses a custom browser that embeds the Mozilla rendering engine to take its website screenshots.
We call that custom browser pageprint, and its SVN repo is now available. The browser was originally written to generate PDFs for Clearhealth, so most of my recent development has centered around that. If anyone has tips on packaging, or on making the code work with newer embedding sources instead of Mozilla 1.7.x, I’d be most appreciative.
I know for certain the code won’t work with SeaMonkey 1.0.3.
It’s also pretty raw, but the binary in SVN should work with CentOS and Debian without any problems.
WebThumb got off the ground yesterday without too many problems. It did run into trouble right out of the gate, since a large number of sites don’t seem to want to finish loading correctly. I was able to fix this by adding an extra timeout handler to the browser code: if we stop making progress loading a site for 5 seconds, we just give up and snapshot what is currently loaded.
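The stall rule is simple enough to sketch outside the browser. The real handler lives in the C++ browser code; this PHP class (hypothetical) only illustrates the logic: record when loading last made progress, and take the snapshot once 5 seconds pass without any. Timestamps can be passed in so the behavior is easy to test.

```php
<?php
// Sketch of the stall-timeout idea: snapshot whatever is loaded once no
// loading progress has been seen for $stallSeconds. Illustration only.
final class StallWatchdog
{
    private $stallSeconds;
    private $lastProgress;

    public function __construct(float $stallSeconds = 5.0, ?float $now = null)
    {
        $this->stallSeconds = $stallSeconds;
        $this->lastProgress = $now ?? microtime(true);
    }

    // call whenever the page reports loading progress
    public function progress(?float $now = null): void
    {
        $this->lastProgress = $now ?? microtime(true);
    }

    // true once loading has been stuck for at least $stallSeconds
    public function shouldSnapshot(?float $now = null): bool
    {
        return (($now ?? microtime(true)) - $this->lastProgress) >= $this->stallSeconds;
    }
}
```

The key design choice is measuring time since the last progress event rather than total load time, so slow but steadily loading pages still finish.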
I also updated the server config: WebThumb now supports Flash (version 7) and has a lot more fonts installed, so sites that need non-Latin characters should be working correctly.
My current feature goals are:
Better URLs, something like /thumbs/url/time
5 minute cache on thumb generation (if anyone else has requested the same site in the last 5 minutes just return those images)
A way to get a thumb of a site without going through the queue, if it’s already been generated
A way to search for current thumbs
User accounts with:
Snap at time
Snap recurring
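The 5-minute cache goal could look something like this minimal sketch: an in-memory map from URL to the last snapshot time and its images. The class name and storage are assumptions; a real service would persist this, for example in its database.

```php
<?php
// Sketch of a 5-minute thumbnail cache: reuse images for a URL that was
// snapshotted less than $ttlSeconds ago instead of queueing a new job.
final class ThumbCache
{
    private $ttlSeconds;
    private $entries = []; // url => ['time' => int, 'images' => array]

    public function __construct(int $ttlSeconds = 300)
    {
        $this->ttlSeconds = $ttlSeconds;
    }

    // returns cached images, or null if missing or expired
    public function get(string $url, int $now): ?array
    {
        $entry = $this->entries[$url] ?? null;
        if ($entry === null || $now - $entry['time'] >= $this->ttlSeconds) {
            return null; // caller should queue a fresh snapshot
        }
        return $entry['images'];
    }

    public function put(string $url, array $images, int $now): void
    {
        $this->entries[$url] = ['time' => $now, 'images' => $images];
    }
}
```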
I’m also looking at prototyping an API for access, though it may require paying a fee to use.
If you have any feature suggestions or input on an API leave a comment.
So about 11 months ago (wow, that’s a long time; it feels like last month) I used some code I had lying around to make thumbnails of all the Planet PHP blogs. Work priorities have finally allowed me to work on that code base again, and I was able to get things up to a usable state.
The result of this is a new project: a website that lets you grab snapshots of any website at any time. The result is 3 different images: one at 800×600, one at 160×120, and one at 80×60.
I took a snapshot of my website as an example.
Webthumb is still on the experimental side and the UI could use some improvements, but it’s an interesting experiment, so I thought I’d share it in its current state.
Now I’m guessing you’ll have a couple of questions, so I’ll attempt to guess at a couple and answer them now.
How does it work:
I have a custom Mozilla browser written in C++ that takes a URL as a command-line input and outputs a snapshot. This is wrapped in a bit of PHP code to generate 1 thumbnail at a time from a queue of requests.
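A rough PHP sketch of that wrapper: a loop that pulls one job at a time and shells out to the browser. The binary path and the `URL output-file` argument order are hypothetical; only the overall shape comes from the description above. `escapeshellarg()` matters here because the URLs come from users.

```php
<?php
// Sketch of a one-at-a-time queue worker around a command-line browser.
// Binary path and argument order are hypothetical.
function buildSnapshotCommand(string $binary, string $url, string $outFile): string
{
    // quoting every argument keeps a hostile URL from injecting shell commands
    return escapeshellarg($binary) . ' ' . escapeshellarg($url) . ' ' . escapeshellarg($outFile);
}

// $runner executes one command and returns true on success; injecting it
// keeps the loop testable. Jobs run strictly one after another.
function processQueue(array $queue, string $binary, callable $runner): array
{
    $results = [];
    while ($queue) {
        $job = array_shift($queue);
        $cmd = buildSnapshotCommand($binary, $job['url'], $job['out']);
        $results[$job['url']] = $runner($cmd);
    }
    return $results;
}

// A real runner using exec(); a zero exit code counts as success.
$execRunner = function (string $cmd): bool {
    exec($cmd, $output, $exitCode);
    return $exitCode === 0;
};
```

Running one job at a time matches the post’s setup, where each snapshot needs its own browser process.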
Are you releasing the code:
The custom Mozilla browser will be released under the GPL by Uversa at op-en.org, but it’s not a priority right now. I’m not planning on releasing my hack of a PHP website that it runs under at any time :-).
Is there a webservice API:
I’d really like to make one, but I don’t have deep pockets and I don’t have a solution to making a system like that pay for itself.
I’ve been working on reviewing copyedits and finishing up my last writing tasks this week.
I’m just finishing up my 4th chapter of copyedit reviews tonight, and I got one of the appendices and most of the preface done last night. That just leaves 2 more appendices (Standalone JavaScript AJAX Library, and Server Tied AJAX Libraries), some preface cleanup, and a couple hundred pages of copyedit reviews before the book is done.
So close but yet so far away.
Updating my blog today got me thinking about what the other Planet PHP blogs out there looked like. I was in a great spot to do a quick overview, since I happened to have a script that makes thumbnails using an embedded Mozilla (this script is a long story, but suffice it to say this one was a nice simple one in Python; I also have a little C++ app that uses Mozilla to make PDFs).
After that it was just a matter of getting a list of all the Planet PHP websites. Oddly enough, Planet PHP doesn’t export this list in any XML format that I could find, but since the DOM extension in PHP 5 can load HTML, it wasn’t a big deal. I just copied and pasted the source for the list into a file and I was ready to go.
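Loading HTML with the DOM extension looks roughly like this. A minimal sketch; `extractLinks()` and the `planet-list.html` file name are hypothetical, and `libxml_use_internal_errors()` keeps the parser warnings that messy real-world HTML triggers from being printed.

```php
<?php
// Sketch: pull unique link targets out of pasted HTML with PHP's DOM extension.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // collect libxml warnings about invalid HTML instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href !== '' && !in_array($href, $links, true)) {
            $links[] = $href; // keep first occurrence, skip duplicates
        }
    }
    return $links;
}

// Usage (hypothetical file): $urls = extractLinks(file_get_contents('planet-list.html'));
```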
After that it was just waiting and fixing problems. The main ones were George Schlossnagle’s blog, which always failed even though I could load it in my browser, and Harry Fuecks’ phpPatterns site, since it seems to be dead. Other than that it was just cleaning up escaping.
And now for the fun part: take a look at thumbnails of everyone’s sites.
At some point I’d like to move this Python code over to my C++ version and then set up a page to give anyone the ability to make thumbnails, but since each running script requires its own Gecko engine, I’m afraid I’ll kill myself performance-wise.
This little project was the beginning of a long journey: I now have a webpage thumbnail service with an API, so if you want to do something similar, check it out.