Creating reports with Tanagra
Reporting is a real point of difference between professional data mining software and tools that come out of research. For a practitioner (e.g. a research officer), it is important to be able to carry the results of their work over easily into a word processor or a slideshow. Things become particularly convenient when the output is also available in spreadsheet format, since results are most often presented as tables and, possibly, charts. The ultimate solution is to define report templates in advance that are filled in only once the calculations are done and that can be printed directly. For the researcher who develops tools, all this is well and good, but it earns no academic credit whatsoever. I can hardly see myself submitting an article to a journal to show that I am able to embed 3D pie charts automatically in a PDF file. As a result, the tools developed by researchers often produce only text output, comprehensive to be sure, but not presentable as is in reports meant for wide distribution. The outputs of R or Weka are good examples.
Tanagra, created by a teacher-researcher, follows the same approach. Nothing was initially planned for reporting. And yet, paradoxically, one of its menus (DIAGRAM / CREATE REPORT) offers a tool for creating reports. This is a happy consequence of a technology choice made when the software's specifications were written.
Let us go back in time to understand how this came about. When I wrote SIPINA (version 3.x), I realized that building the windows that display the results took me a great deal of time, more than writing the calculation algorithms themselves. In my view this was not a good thing, because it took me away from my main concern: understanding the methods, implementing them, evaluating them, discussing them. When I drew up the specifications of Tanagra, I decided it was essential to define a standardized display mode, necessarily based on text output, but nevertheless reasonably attractive. And that is when I rediscovered HTML. It is a slightly amusing thing to say, especially in 2003. HTML keeps the effort of describing the outputs to a minimum: a single method in the calculation class is enough (somewhat like Weka, for those who have looked at its source code), while still giving a pleasant presentation. Moreover, it makes it possible to highlight the important information that should be read first. For instance, merely being able to assign color codes to p-value ranges is immensely valuable.
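To make the idea concrete, here is a minimal sketch in Python of what "a single method that describes its output in HTML" might look like. It is not Tanagra's code: the function names, the thresholds and the colors are all assumptions chosen for illustration.

```python
from html import escape

# Hypothetical color bands: (upper bound on the p-value, background color).
# Both the thresholds and the colors are illustrative assumptions.
P_VALUE_BANDS = [
    (0.01, "#ff9999"),  # highly significant
    (0.05, "#ffcc99"),  # significant
    (0.10, "#ffffcc"),  # borderline
]


def p_value_colour(p):
    """Return the background color for a p-value, or None if no band applies."""
    for upper, colour in P_VALUE_BANDS:
        if p < upper:
            return colour
    return None


def render_results(rows):
    """Render (name, statistic, p_value) tuples as one HTML table."""
    lines = [
        '<table border="1">',
        "<tr><th>Attribute</th><th>Statistic</th><th>p-value</th></tr>",
    ]
    for name, stat, p in rows:
        colour = p_value_colour(p)
        # Only the p-value cell is highlighted, and only when a band applies.
        style = f' style="background-color:{colour}"' if colour else ""
        lines.append(
            f"<tr><td>{escape(name)}</td><td>{stat:.3f}</td>"
            f"<td{style}>{p:.4f}</td></tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)
```

Calling `render_results([("age", 2.5, 0.003), ("sex", 0.1, 0.9)])` returns a table in which only the first p-value cell is highlighted. The calculation code never deals with windows or widgets; it only produces a string.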
I later realized that choosing HTML would prove doubly wise, because it is a widespread standard. With no further programming effort, we can on the one hand retrieve the output in an Excel spreadsheet and, on the other hand, export the display windows to an external file and view the results in a web browser, independently of Tanagra. Distributing the results thus becomes much easier.
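The export side can be sketched just as briefly. The Python example below wraps an HTML fragment in a complete page and writes it to a file that any browser can open. The function name `export_report` and its parameters are hypothetical, and this only illustrates the principle, not how Tanagra's CREATE REPORT tool is implemented.

```python
from html import escape
from pathlib import Path


def export_report(fragment, path, title="Report"):
    """Wrap an HTML fragment in a standalone page and write it to `path`.

    Hypothetical helper: the name, parameters and page layout are
    assumptions made for this sketch.
    """
    page = "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>',
        "<body>",
        fragment,
        "</body>",
        "</html>",
        "",
    ])
    target = Path(path)
    target.write_text(page, encoding="utf-8")
    return target
```

The resulting file no longer depends on the software that produced it. It can be opened from Python with the standard `webbrowser` module, for example `webbrowser.open(target.resolve().as_uri())`, or simply sent to a colleague.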
It is these reporting features of Tanagra that we present in this tutorial.
Keywords: report, reporting, decision tree, C4.5, logistic regression, disjunctive coding, ROC curve, learning sample, test sample, variable selection
Components: GROUP CHARACTERIZATION, SAMPLING, C4.5, TEST, O_1_BINARIZE, FORWARD-LOGIT, BINARY LOGISTIC REGRESSION, SCORING, ROC CURVE
Tutorial: fr_Tanagra_Reporting.pdf
Data: heart disease