Command line interface

Scrapple is primarily run through a command line interface (CLI), which executes the commands that Scrapple supports. To see how the CLI is used, pass the help option:

$ scrapple --help

This prints the usage description and an explanation of the optional arguments accepted by the commands.


scrapple (-h | --help | --version)

scrapple genconfig <projectname> <url> [--type=<type>] [--selector=<selector>]

scrapple run <projectname> <output_filename> [--output_type=<output_type>]

scrapple generate <projectname> <output_filename> [--output_type=<output_type>]

scrapple web

-h, --help
 Show this help message and exit
--version
 Display the version of Scrapple
--type=<type>, -t <type>
 Specifies if the script generated is a page scraper or a crawler [default: scraper]
--selector=<selector>, -s <selector>
 Specifies if XPath expressions or CSS selectors are used [default: xpath]
--output_type=<output_type>, -o <output_type>
 Specifies if the generated output is stored as CSV or JSON [default: json]
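
To make the usage patterns and defaults listed above concrete, here is a minimal sketch that mirrors them with Python's standard argparse module. This is only a stand-in for illustration and is not claimed to be how Scrapple parses its arguments; the build_parser() helper, the "command" attribute, and the sample project name and URL are all hypothetical.

```python
import argparse


def build_parser():
    """Build an argparse parser mirroring the usage text (illustrative only)."""
    parser = argparse.ArgumentParser(prog="scrapple")
    # "command" is a hypothetical attribute name that records the chosen subcommand
    sub = parser.add_subparsers(dest="command", required=True)

    genconfig = sub.add_parser("genconfig")
    genconfig.add_argument("projectname")
    genconfig.add_argument("url")
    genconfig.add_argument("-t", "--type", default="scraper")
    genconfig.add_argument("-s", "--selector", default="xpath")

    # run and generate accept the same positional arguments and option
    for name in ("run", "generate"):
        cmd = sub.add_parser(name)
        cmd.add_argument("projectname")
        cmd.add_argument("output_filename")
        cmd.add_argument("-o", "--output_type", default="json")

    sub.add_parser("web")
    return parser


# Omitted options fall back to the defaults listed in the option descriptions
args = build_parser().parse_args(["genconfig", "myproject", "http://example.com"])
print(args.command, args.type, args.selector)
```

Running the sketch shows that leaving out --type and --selector yields the documented defaults, scraper and xpath.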

The scrapple command is added to the system path when Scrapple is installed. When the command is run, the input given on the command line is parsed by the runCLI() function.


runCLI() is the starting point for the execution of the Scrapple command line tool.

It uses the docstring as the usage description for the scrapple command. The class for the required command is selected by dynamic dispatch, and the command is executed through the execute_command() method of that class.

This functionality is implemented by the following code block:

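from operator import itemgetter

command_list = ['genconfig', 'run', 'generate', 'web']
# args holds a boolean flag for each command; the one that was invoked is True
select = itemgetter('genconfig', 'run', 'generate', 'web')
selectedCommand = command_list[select(args).index(True)]
# Look up the class that implements the selected command, then run it
cmdClass = get_command_class(selectedCommand)
obj = cmdClass(args)
obj.execute_command()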
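
The dispatch described above can be tried in isolation with a small self-contained sketch. Everything here other than itemgetter and the execute_command() method name is an assumption made for illustration: the two command classes, the COMMAND_CLASSES registry, this version of get_command_class(), the dispatch() wrapper, and the shape of the args dictionary (one boolean per command plus a "<projectname>" key) are hypothetical and may differ from Scrapple's own code.

```python
from operator import itemgetter


class GenconfigCommand:
    """Hypothetical stand-in for the class behind the genconfig command."""

    def __init__(self, args):
        self.args = args

    def execute_command(self):
        return "genconfig for " + self.args["<projectname>"]


class RunCommand:
    """Hypothetical stand-in for the class behind the run command."""

    def __init__(self, args):
        self.args = args

    def execute_command(self):
        return "run for " + self.args["<projectname>"]


# Hypothetical registry mapping command names to their classes
COMMAND_CLASSES = {"genconfig": GenconfigCommand, "run": RunCommand}


def get_command_class(name):
    """Return the class registered for the given command name."""
    return COMMAND_CLASSES[name]


def dispatch(args):
    """Pick the command whose flag is True in args and execute it."""
    command_list = ["genconfig", "run"]
    # itemgetter returns the flags as a tuple in the same order as command_list
    select = itemgetter(*command_list)
    selected = command_list[select(args).index(True)]
    return get_command_class(selected)(args).execute_command()


# Assumed shape of the parsed arguments: one boolean flag per command
args = {"genconfig": False, "run": True, "<projectname>": "myproject"}
print(dispatch(args))
```

Because the flags come back in the same order as command_list, the index of the first True value identifies the command. If no flag is True, index() raises ValueError, so a real entry point would need to handle that case.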