WebSwoon Documentation
Disclaimer
Program requirements
Installation
Known limitations
Web sites list
Program's usage
Configuration window
Disclaimer
WebSwoon comes with ABSOLUTELY NO WARRANTY.
This program is free software; you can redistribute it and/or modify it under the terms of the
GNU General Public License v2 (GPL) as published
by the Free Software Foundation.
Program requirements
WebSwoon is written in Python and uses wxWidgets. Sorry about the package size,
but several runtime libraries must be installed with the program (they are installed only in the program's installation folder).
The current version of WebSwoon relies on Internet Explorer and thus requires Windows and IE to be installed. If you want to use a proxy, deactivate Javascript, or modify any other browser option, do it from the usual IE configuration panel; WebSwoon will use these settings automatically. Everything except popup windows, which are blocked, will run in the WebSwoon browser as if you were using IE alone.
WebSwoon has been tested successfully with Internet Explorer 6 and 7 and should also run with older versions. It is currently not possible to use the Firefox (Gecko) engine, or any other browser, for rendering.
Installation
Install the program package in the folder of your choice, then run "WebSwoon" from the
Windows Start menu. A console version of the program is also available in the installation folder;
run webswoon_console.exe --help for more information.
Advanced users : all configuration information is in the "webswoon.cfg" file and can be tweaked, provided you respect the file format and letter case (A is not a).
WebSwoon does not read or write anything in the Windows registry. Only some Windows system DLLs used indirectly by the program or by Python might do so, but this shouldn't happen. WebSwoon will survive a Windows reinstallation if you install it in a folder that the reinstallation does not erase.
Known limitations
Some WebSwoon limitations are known; they are not "bugs" :
- Some web sites take longer to capture than others, even when the page appears to be complete. With the IE ActiveX control it is sometimes hard to know when the page is really fully loaded and displayed. In some rare cases, the program has to wait for a delay to expire while no data is received, to make sure it gets the full page. You can configure this delay as the "Timeout delay" in the configuration, but too small a value may result in captures with missing graphics on some web pages.
- Popup windows from the browser window are blocked. However, some popups occasionally slip through and can appear when the program switches to another web site.
- Sometimes the browser may show an alert dialog requesting a user choice (from the browser itself or from Javascript alerts). These dialogs block the captures until they are closed manually. Options are available in the configuration window to disable these alerts, but this may result in incomplete captures.
Web sites list
The web sites list containing all URLs used to build captures is stored in the file "websites_list.txt".
You can edit it by hand, generate it from a database using an appropriate program, or fill it
directly from WebSwoon.
The format is simple : one URL per line, each line ending with \r\n.
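Any program that generates this file only has to follow that one rule. The Python sketch below shows one way to write and read such a list. The function names are hypothetical, and UTF-8 is an assumed encoding, since the encoding of websites_list.txt is not documented.

```python
def write_websites_list(path, urls):
    """Write one URL per line, each line terminated by "\r\n" (sketch)."""
    # newline="" stops Python from translating line endings, so the
    # explicit "\r\n" is written byte for byte on every platform.
    # Assumption: UTF-8 encoding (URLs are normally plain ASCII anyway).
    with open(path, "w", encoding="utf-8", newline="") as f:
        for url in urls:
            f.write(url.strip() + "\r\n")


def read_websites_list(path):
    """Return the non-empty URLs of a websites_list.txt-style file."""
    with open(path, encoding="utf-8", newline="") as f:
        # splitlines() accepts "\r\n" as well as a bare "\n".
        return [line.strip() for line in f.read().splitlines() if line.strip()]
```

Reading tolerantly (blank lines skipped, bare "\n" accepted) keeps a hand-edited list usable even if an editor changed the line endings.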
Program's usage
Because the web site captures are made with the Internet Explorer ActiveX control, the browser integrated in
the program is compatible with everything IE can handle on your system (Javascript, Java, Flash...).
It also automatically uses the cookies stored by IE to access pages that require them.
You can choose to show or hide the browser window. If you want to perform some action on a web site before the capture, open the browser window and, for example, click a link to skip the front page or close an ad. The capture will be delayed until you stop interacting with the page.
With the browser window open, you can also check that all captures are done correctly. In some cases it can also help you find and remove broken web sites from your list.
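The rule "the capture is delayed until you stop interacting" can be expressed as a small calculation. This is a hypothetical sketch, not WebSwoon's code: the documentation does not say how long the program waits after your last action, so idle_threshold is an assumed parameter, and all times are plain numbers of seconds.

```python
def capture_start_time(page_ready_at, activity_times, idle_threshold):
    """Earliest moment a capture may start (hypothetical helper).

    page_ready_at:  time at which the page finished loading.
    activity_times: times of user actions (clicks, key presses).
    idle_threshold: assumed number of quiet seconds required after the
                    last action; not a documented WebSwoon setting.
    """
    # No user action at all: capture as soon as the page is ready.
    if not activity_times:
        return page_ready_at
    # Otherwise wait until the user has been idle long enough,
    # but never capture before the page is ready.
    return max(page_ready_at, max(activity_times) + idle_threshold)
```

For example, with the page ready at 10 s, clicks at 9 s and 12 s, and a 3 s threshold, the capture would start at 15 s.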
Configuration window
Program panel :
- Delete all existing captures before starting : allows you to delete all captured images before the captures start. Warning! Enabling this option will delete all files in the captures folder.
- Update existing captures older than x days x hours x mins : allows you to specify when the program must refresh an existing capture. If you want fresh updates, you can set 1 minute and run the program in loop mode.
- Automatically restart captures when finished after xxx minutes : allows you to run the program in loop mode. When all captures are finished, the program waits for this delay and then restarts all captures from the beginning. You can abort captures using the "stop capture" option in the menu, or the "abort captures" option in the browser window if it is displayed.
- Interface language : allows you to select the language used for the program interface. Changing this option requires restarting the program.
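The age test behind "Update existing captures older than x days x hours x mins" can be sketched as follows. The sketch assumes the age of a capture is taken from its file modification time; needs_update is a hypothetical helper, not part of WebSwoon.

```python
import os
from datetime import datetime, timedelta


def needs_update(capture_path, days=0, hours=0, mins=0, now=None):
    """Return True if the capture is missing or older than the given age.

    Assumption: a capture's age is its file modification time.
    """
    if not os.path.exists(capture_path):
        return True  # no capture yet, so one must be made
    max_age = timedelta(days=days, hours=hours, minutes=mins)
    modified = datetime.fromtimestamp(os.path.getmtime(capture_path))
    if now is None:
        now = datetime.now()
    return now - modified > max_age
```

With mins=1 and loop mode enabled, every pass would refresh any capture made more than a minute earlier, which matches the "fresh updates" advice above.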
Browser panel :
- Open browser window during captures : display the browser window and allow some actions by user during captures.
- Start in auto-capture mode : allows you to choose whether automatic capture is enabled by default in the browser; clear this option to disable it.
- Browser width/height : specifies the size of the browser window. Remember that an 800x600 screen is still the standard for building web sites.
- Canvas width/height : specifies the size of the browser canvas, in full page mode only. The web page is loaded in this large blank area and the limits of its content are detected automatically.
- Ignored URLs [this option has been suspended until further notice] : specifies which addresses must not be loaded and displayed in the browser window; it is typically used to hide adverts. Multiple URLs must be separated with the ";" character.
- Wait delay after complete page : how long the program waits, once the web site is fully loaded, before capturing and saving the image. If you want to wait for a Flash animation to start, you can set 2 or 3 seconds.
- Timeout delay : how long the program waits for a web site to load. After this delay the capture is made even if the page data is not fully loaded.
- Display browser error window (Javascript warnings) : allows you to enable the warnings displayed by the browser when there are errors or questions.
- Disable Javascript in browser (avoid blocking Javascript alerts) : allows you to disable Javascript in the browser in order to avoid alerts, such as yes/no prompts, that block captures.
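The interplay of "Timeout delay" and "Wait delay after complete page" can be sketched as a polling loop. This is an illustration, not WebSwoon's code: is_complete is a hypothetical stand-in for whatever load notification the IE control provides, and the poll interval and the injectable clock/sleep functions are assumptions added so the logic can be tested without real waiting.

```python
import time


def wait_for_page(is_complete, timeout, settle_delay, poll=0.1,
                  clock=time.monotonic, sleep=time.sleep):
    """Wait for a page as the two delay options describe (sketch).

    Returns True if the page completed, False if the timeout expired first.
    """
    deadline = clock() + timeout
    while not is_complete():
        if clock() >= deadline:
            # "Timeout delay" expired: capture whatever has loaded so far.
            return False
        sleep(poll)
    # "Wait delay after complete page", e.g. 2-3 s for a Flash animation.
    sleep(settle_delay)
    return True
```

A short timeout makes the loop give up sooner, which is exactly why too small a value can produce captures with missing graphics.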
Captures panel :
- Capture method : Standard view / Full page : allows you to choose the capture method. Standard view saves only what is displayed in the browser. Full page saves the full height of the page, including the parts that must be scrolled to be viewed; the content limits are detected automatically. Full page mode is slower, as it requires two passes for each web page to find the content limits correctly.
- Resize capture image : allows you to resize the capture image to a specified size, often used to generate thumbnails. In full page mode only the width is used; the height is calculated automatically.
- Capture image width/height : specifies the size in pixels of the resized capture image.
- Remove window border and scrollbar in captures : allows you to crop the visible content of the browser window to hide the window border (a few pixels around the content) and the vertical scrollbar. However, the program cannot detect whether the scrollbar is actually visible and will crop the image anyway.
- Keep margins around content of xxx pixels : allows you to keep a blank space around the page content, in full page capture mode only.
- Capture image format BMP/JPEG/GIF/PNG : specifies the image format of saved captures.
- Capture file name format : allows you to personalize the file name of captures. You can use the following placeholders, which will be replaced with data in the final file names :
%p : protocol of URL (cleaned of incompatible symbols**)
%u : URL (cleaned of incompatible symbols**)
%e : extension of file (jpg, gif or png)
%y : year date in format YYYY
%m : month date in format MM
%d : day date in format DD
%i : number of the current web site (first web site is 1, second is 2,...)
%z : MD5 hash of URL (32 hexadecimal characters)
** The characters \/:?~&=<> and the sequence %2C are replaced with the underscore character "_". To avoid problems on some web servers, .pl and .php in the URL are replaced with _pl and _php.
For example, with the file name format "%p%u_%y-%m-%d.%e" (without quotes), if the URL is "http://www.yahoo.com", the capture file name will be "http___www.yahoo.com_2004-06-15.jpg"
Note : the "://" characters after "http" have been replaced with 3 underscores "_" to be compatible with the file system.
- Save captures in folder : specify where to save captured pictures.
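The file name rules above can be sketched in Python. This is an illustration under stated assumptions, not WebSwoon's implementation: it assumes %p expands to the protocol including "://" (which reproduces the yahoo.com example), that the .pl/.php substitution is a plain substring replacement, and that %z hashes the raw URL encoded as UTF-8. The helper names are hypothetical.

```python
import hashlib
import re
from datetime import date

# Characters the documentation lists as replaced with "_".
_UNSAFE = re.compile(r'[\\/:?~&=<>]')


def clean(text):
    """Apply the ** substitutions to a piece of URL text (sketch)."""
    text = text.replace("%2C", "_")
    # Assumption: plain substring replacement, ".php" first. Note that a
    # host such as "www.planet.com" would also be affected by ".pl".
    text = text.replace(".php", "_php").replace(".pl", "_pl")
    return _UNSAFE.sub("_", text)


def capture_file_name(fmt, url, index, ext, day):
    """Expand the %-placeholders of a file name format (hypothetical helper)."""
    scheme, sep, rest = url.partition("://")
    if sep:
        scheme += sep  # assumption: %p keeps "://", giving "http___"
    else:
        scheme, rest = "", url  # URL without a protocol
    values = {
        "p": clean(scheme),
        "u": clean(rest),
        "e": ext,
        "y": f"{day.year:04d}",
        "m": f"{day.month:02d}",
        "d": f"{day.day:02d}",
        "i": str(index),
        "z": hashlib.md5(url.encode("utf-8")).hexdigest(),
    }
    # Single pass over fmt, so inserted text is never expanded again.
    return re.sub(r"%([pueymdiz])", lambda m: values[m.group(1)], fmt)


print(capture_file_name("%p%u_%y-%m-%d.%e", "http://www.yahoo.com",
                        1, "jpg", date(2004, 6, 15)))
# → http___www.yahoo.com_2004-06-15.jpg
```

Expanding all placeholders in one regular-expression pass avoids a subtle bug of chained str.replace calls, where a "%" produced by one substitution could be mistaken for another placeholder.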
You can use the Default button to restore default settings in all options. Using the Cancel button will restore your previous configuration.
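In full page mode the program finds the content limits on a large blank canvas, keeps the configured margins around them and, when resizing, uses only the configured width. One plausible way to do this geometry is sketched below on a plain 2D list of pixel values. The documentation does not describe WebSwoon's actual detection method (which takes two passes), so this bounding-box approach and both helper names are assumptions.

```python
def content_bounds(pixels, background, margin=0):
    """Return (left, top, right, bottom) of the non-background area.

    pixels is a non-empty list of equal-length rows; right and bottom are
    exclusive. The box is grown by `margin` pixels on each side and clamped
    to the canvas. Returns None if the canvas is entirely blank.
    """
    height, width = len(pixels), len(pixels[0])
    rows = [y for y in range(height)
            if any(p != background for p in pixels[y])]
    if not rows:
        return None
    cols = [x for x in range(width)
            if any(row[x] != background for row in pixels)]
    return (max(cols[0] - margin, 0), max(rows[0] - margin, 0),
            min(cols[-1] + 1 + margin, width),
            min(rows[-1] + 1 + margin, height))


def resized_size(capture_w, capture_h, target_w):
    """Full page mode: keep the configured width, derive the height."""
    return target_w, max(1, round(capture_h * target_w / capture_w))
```

For example, an 800x3000 full page capture resized to a 200-pixel-wide thumbnail would come out 200x750, preserving the page's aspect ratio.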