

table2csv.pl - A Perl script that converts table tags into CSV

DATE:2006/03/11
UPDATE:2006/03/20
WRITTEN BY chihiro <at> dream <dot> com

Introduction

This is a Perl script that converts a table built with table tags in an HTML document into CSV.
It was written to turn tables drawn with HTML table tags into CSV data.

I wanted a script of this sort and searched for one, but could not find one easily, so I reluctantly wrote it myself.
It was meant as a throwaway script and was written fairly casually, but it turned out well enough that I think I can use it for a while, and since it seemed a waste to throw it away, I am making it public.

Usage

Feed an HTML document to the script on standard input as follows.
The CSV data is written to standard output.

# cat <html file name> | nkf -e | perl ./table2csv.pl
# w3m -dump_source <URL>| nkf -e | perl ./table2csv.pl

Command options

There are no options.
As of 2006/03/20, the script only supports HTML tags read from standard input.

Installation

No special installation work is required.
Unpack the archive and make the script executable as follows.

  1. Unpack the archive.
  2. Adjust the path to perl on the first line (#!/usr/bin/perl).
  3. Grant execute permission ($ chmod 755 table2csv.pl)
  4. Check that it runs ($ ./table2csv.pl)

About the CSV output format

FAQ

Q) When an HTML document contains two or more separate tables, what does the CSV output look like?
A) All of them are output together as one continuous text.
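
The following is a minimal sketch of why separate tables can end up in one continuous output. It is not the code of table2csv.pl. The function name rows_to_csv and the sample HTML are hypothetical, and the sketch assumes the simple <table>-<tr>-<td> structure with no nesting. Because it only scans for rows and cells and ignores where one table ends and the next begins, the rows of every table come out one after another.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch, not the code of table2csv.pl: turn every <tr> row
# found in an HTML string into one CSV line, ignoring table boundaries.
sub rows_to_csv {
    my ($html) = @_;
    my @lines;
    # Each <tr>...</tr> becomes one CSV line, whichever table it is in.
    while ($html =~ m{<tr\b[^>]*>(.*?)</tr>}gis) {
        my $row = $1;
        my @cells;
        # Collect the text of each <td> cell in the row.
        while ($row =~ m{<td\b[^>]*>(.*?)</td>}gis) {
            my $cell = $1;
            $cell =~ s/<[^>]+>//g;      # drop any tags inside the cell
            $cell =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
            $cell =~ s/"/""/g;          # double embedded double quotes
            push @cells, qq{"$cell"};
        }
        push @lines, join(',', @cells);
    }
    return join('', map { "$_\n" } @lines);
}

# Assumed sample input: two separate tables with text in between.
my $sample = <<'HTML';
<table><tr><td>a</td><td>b</td></tr></table>
<p>text between the tables</p>
<table><tr><td>c</td><td>d</td></tr></table>
HTML

print rows_to_csv($sample);
# prints:
# "a","b"
# "c","d"
```

A scan like this also has no notion of nesting, so a table inside a cell would confuse it.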

Q) How are nested table tags output?
A) This has not been checked, but since the script was not written with nesting in mind, it will almost certainly not be processed properly.

Q) A double quotation mark in the data has turned into two double quotation marks.
A) That is the correct way to store it in CSV output.

In the CSV, each data field is enclosed in double quotation marks, like "-", "-", "-".
To distinguish a quotation mark inside the data from one that delimits a field, doubling it seems to be the common convention, so the CSV generation does the same.
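
The quoting rule described above can be sketched in a few lines of Perl. This is only an illustration of the convention, not code taken from table2csv.pl. The helper names csv_field and csv_line are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not taken from table2csv.pl): quote one field.
sub csv_field {
    my ($value) = @_;
    $value =~ s/"/""/g;     # double every embedded double quotation mark
    return qq{"$value"};    # enclose the whole field in double quotation marks
}

# Hypothetical helper: join the quoted fields into one CSV line.
sub csv_line {
    return join(',', map { csv_field($_) } @_);
}

print csv_line('name', 'say "hello"', '3,5'), "\n";
# prints: "name","say ""hello""","3,5"
```

Because every field is enclosed in quotation marks, a comma inside the data (the third field here) needs no special treatment.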

Q) In which environment has operation been checked?
A) The author has checked it in the following environment.

Distribution: Gentoo Linux
Kernel: 2.4.31-gentoo-r1
Perl: v5.8.7

Q) It just will not output CSV properly...
A) Unfortunately, there is nothing I can do about that.

If this script does not work well, the likely cause is that the <tr>-<td>-</td>-</tr> structure is not written properly inside <table>-</table>. As long as the markup is written properly the table should be converted to CSV, so please check the tags. A browser builds the table even when the markup is written carelessly, so it is easy to assume the markup is fine, but mistakes such as the following may be present.

Mistyped tag (ex. <td> -> <th>)
Missing closing tag

Looking at various pages, it seems that markup like this, or typing mistakes, do occur.
If a mistake is made consistently, you may be able to work around it by setting up a substitution as a preprocessing step.
The script itself is not difficult either, so please take a look at it, then copy and paste and add code in the appropriate place.
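
A preprocessing substitution of the kind mentioned above might look like the following sketch. It assumes, purely as an example, that the page consistently closes <td> cells with </th> by mistake. The function name fix_tags and the file name fixtags.pl are hypothetical, and the pattern would need to be adapted to the mistake actually present in the page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical preprocessing filter (fixtags.pl is an assumed name).
# Assumption: the page consistently closes <td> cells with </th>.
sub fix_tags {
    my ($html) = @_;
    # Rewrite "<td ...>text</th>" to "<td ...>text</td>".
    # The inner group refuses to run past another cell tag, so correct
    # <td>...</td> and <th>...</th> pairs are left untouched.
    $html =~ s{(<td\b[^>]*>(?:(?!<t[dh]\b|</t[dh]>).)*)</th>}{$1</td>}gis;
    return $html;
}

# Act as a filter: read all of standard input, fix it, write the result.
my $html = do { local $/; <STDIN> };
print fix_tags(defined $html ? $html : '');
```

Saved as fixtags.pl, it could be placed in the pipeline just before table2csv.pl, for example: cat <html file name> | nkf -e | perl ./fixtags.pl | perl ./table2csv.pl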

In addition, since the script is simple, you are welcome to adjust it yourself so that it works well for you.

No warranty

The "Software" and this license document are provided "as is", without warranty of any kind. This includes warranties of design, merchantability, and fitness for a particular purpose.

Download

Revision history
