http://www.troywolf.com/articles
I've written what I think is a high-quality PHP class for screen-scraping external (or internal) web content. The class can cache scraped content for any number of seconds. For example, if you want to show stock market data on your site that you scrape from a third party, you can easily set it up so your site hits the source site for fresh content no more than once every 5 minutes. The caching is seamless to your script; you don't have to worry about it.
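The class's own code isn't shown in this post, but the general idea of time-based caching can be sketched as follows. This is only a minimal illustration under assumptions: the function name `cached_fetch`, its parameters, and the file-based storage are made up for the example and are not the class's actual API.

```php
<?php
// Hypothetical helper, NOT part of the class described in the post.
// Returns the cached content if the cache file is younger than $ttl seconds;
// otherwise calls $fetch, stores its result in the cache file, and returns it.
function cached_fetch(string $cacheFile, int $ttl, callable $fetch): string
{
    // PHP caches stat() results, so clear them before checking the file age.
    clearstatcache(true, $cacheFile);

    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        $cached = file_get_contents($cacheFile);
        if ($cached !== false) {
            return $cached; // still fresh: the source site is not contacted
        }
    }

    // Cache is missing, unreadable, or stale: fetch fresh content and store it.
    $fresh = (string) $fetch();
    file_put_contents($cacheFile, $fresh, LOCK_EX);
    return $fresh;
}
```

With a `$ttl` of 300 and a `$fetch` callback that downloads the remote page, the source site would be contacted at most once every 5 minutes, no matter how often your own page is requested.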
The class includes a companion script named image_cache.php that can be used as the src attribute in img elements to cache images from external sites. This is useful if you want to incorporate an image from an external site that is dynamically updated on a regular basis, such as the stock charts that many websites regenerate every 60 seconds. You can use their image on your site, but with the caching feature, you don't have to hit their site every time somebody hits yours.
The class also has two static methods that make it simple to extract data from HTML tables. One extracts a table into an array and the other into XML.
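To give a feel for what extracting a table into an array involves, here is a minimal sketch using PHP's built-in DOM extension. It is not the class's method: the function name `table_to_array` is hypothetical, and it assumes you only want the text of the cells in the first table on the page.

```php
<?php
// Hypothetical helper, NOT the class's own method.
// Returns the rows of the first <table> in $html as an array of arrays
// containing the trimmed text of each <td>/<th> cell.
function table_to_array(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = $doc->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return [];
    }

    $rows = [];
    // Note: this also picks up rows of any table nested inside the first one.
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $node) {
            if ($node instanceof DOMElement
                && in_array(strtolower($node->tagName), ['td', 'th'], true)) {
                $cells[] = trim($node->textContent);
            }
        }
        $rows[] = $cells;
    }
    return $rows;
}
```

Each inner array is one table row, so `$rows[1][0]` would be the first cell of the second row.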
The class can perform basic authentication, allowing you to scrape protected content. It also presents itself with the User-Agent of the user requesting your script. This allows you to access content that may normally be blocked to non-browser agents.
The article explains in detail how to use the class, and is itself a good tutorial for many techniques in PHP.
If you have any comments or questions about this class or the article, let's discuss them here. I'm always looking to learn more. I hope you find the class and the article useful.
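As a rough illustration of those two techniques, the sketch below builds the request headers for HTTP Basic authentication and forwards a browser's User-Agent string using PHP's stream contexts. The function name `scrape_context_options` and the 10-second timeout are assumptions for the example; the class itself may do this differently.

```php
<?php
// Hypothetical helper, NOT the class's own code.
// Builds stream-context options for an HTTP GET that sends Basic credentials
// and the given User-Agent string.
function scrape_context_options(string $user, string $password, string $userAgent): array
{
    $headers = [
        // Basic authentication is base64("user:password") in an Authorization header.
        'Authorization: Basic ' . base64_encode($user . ':' . $password),
        // Strip CR/LF so a crafted User-Agent cannot inject extra headers.
        'User-Agent: ' . str_replace(["\r", "\n"], '', $userAgent),
    ];

    return [
        'http' => [
            'method'  => 'GET',
            'header'  => implode("\r\n", $headers),
            'timeout' => 10, // seconds; an arbitrary choice for this sketch
        ],
    ];
}

// Possible usage (not executed here; the URL and credentials are placeholders):
// $agent   = $_SERVER['HTTP_USER_AGENT'] ?? 'Mozilla/5.0';
// $context = stream_context_create(scrape_context_options('user', 'secret', $agent));
// $html    = file_get_contents('https://example.com/protected', false, $context);
```

Forwarding the visitor's own User-Agent, rather than a fixed string, means the remote site sees the same browser identification it would see if the visitor had requested the page directly.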
For added convenience, you can use the Web Options window to tell Word to save the supporting files in the same location as the .htm file instead of in a separate folder. The .htm file contains everything you need to know about the document, from text and properties to formatting information and information about the embedded images.
The image file has likely been resized to fit within the Word document. In that case, the folder will contain both versions, the original and the resized one, saved next to each other with the original first. Word keeps the file format of the images but renames the files in the order in which they appear in the document.
If you only care about the resized images, however, you can save the Word document as a “Web Page, Filtered” instead of a “Web Page.” Both are available as options in the “Save As” drop-down.
When a Word document is saved in a web page format, the images may, depending on your Word settings, also be saved in a different file format, such as .gif, in addition to the original format. You can select the file you want and move it to a different location for use.
Whichever method you use, saving the images embedded in a Microsoft Word document as separate files is a quick and easy task.