PINE64 - ROCKPRO64 for SCRAPING

Hi all,

I am new to the forum, and I apologize in advance for my poor English.

My project is to build a small device that scrapes websites such as Amazon. The device will run 24 hours a day and re-scrape the full list of web pages every day.
To scrape the pages, I use Python 3.6 with Selenium and BeautifulSoup.

I have tried a Raspberry Pi 3+ and an Odroid XU4 to scrape more than 10000 web pages.
Pi 3+: it scrapes 100 web pages/h
XU4: 360 web pages/h

The goal is to reach 600 web pages/h.
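
To put these rates in context, here is a small sketch of the arithmetic behind a daily pass. It assumes exactly 10000 pages (the post says "more than 10000"), and the helper name `hours_for_full_pass` is only illustrative.

```python
# Throughput needed to revisit every page once per day.
# TOTAL_PAGES is an assumption: the "more than 10000" figure, rounded down.
TOTAL_PAGES = 10_000
HOURS_PER_DAY = 24


def hours_for_full_pass(pages_per_hour, total_pages=TOTAL_PAGES):
    """Hours needed to scrape every page once at the given rate."""
    return total_pages / pages_per_hour


# Slowest rate that still finishes one full pass within 24 hours.
min_rate = TOTAL_PAGES / HOURS_PER_DAY
print(f"minimum rate for a daily pass: {min_rate:.0f} pages/h")

# Rates quoted in the post for each board, plus the stated goal.
for name, rate in [("Pi 3+", 100), ("XU4", 360), ("goal", 600)]:
    print(f"{name}: {hours_for_full_pass(rate):.1f} h per full pass")
```

Under that assumption, a daily pass needs a little over 400 pages/h, so the XU4 at 360 pages/h falls just short, while 600 pages/h would leave some margin.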

A few days ago I found the RockPro64 with its multi-core processor, and I think this device could be the solution to reach my goal.

------------
My cart:
ROCKPro64 2GB/4GB
7″ LCD TOUCH SCREEN PANEL
ROCKPro64 2×2 MIMO Dual Band WIFI 802.11AC/BLUETOOTH 4.2 MODULE
ROCKPro64 PLAYBOX ENCLOSURE
ROCKPro64 12V 5A POWER SUPPLY or ROCKPro64 12V 3A POWER SUPPLY
16GB eMMC Module
------------

Before I validate my cart, I would like to know how long a RockPro64 takes to open a given number of web pages, that is, how many web pages/h it can handle.

Could someone help me by running this Python test on a RockPro64 and sharing the elapsed time?

If it works, I will add 10 RockPro64 boards to my cart.

Thank you very much.

---------------PYTHON CODE------------
import time
from bs4 import BeautifulSoup
from selenium import webdriver

#---CHROME OPTIONS AND OPEN DRIVER---
chrome_options = webdriver.ChromeOptions()
# Content setting 2 = block, so images are not loaded
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)
#driver = webdriver.Chrome('/usr/lib/chromium-browser/chromedriver',chrome_options=chrome_options)
driver = webdriver.Chrome(chrome_options=chrome_options)
# Note: newer Selenium releases deprecate chrome_options= in favour of options=
driver.set_window_size(2000, 1000)

#---LIST OF WEBPAGES---
urls = [
    'https://www.amazon.fr/dp/B07SJFD9N4',
    'https://www.amazon.fr/dp/B07S6FB64X',
    'https://www.amazon.fr/dp/B07Q2J5NCZ',
    'https://www.amazon.fr/dp/B07QR9MDXS',
    'https://www.amazon.fr/dp/B07Q2LSS1P',
    'https://www.amazon.fr/dp/B07RBTKJ7H',
    'https://www.amazon.fr/dp/B07PZFNG1F',
    'https://www.amazon.fr/dp/B07HG3XG7Q',
    'https://www.amazon.fr/dp/B07DDFC9B9',
    'https://www.amazon.fr/dp/B07Q4216T1',
]

#---START TIME---
start = time.time()

#---SHOW WEBPAGES---
for url in urls:
    driver.get(url)
    driver.execute_script('window.scrollBy(0,10000);')  # scroll down the page
    print('-----------> ' + url)
    time.sleep(1)  # fixed 1 s pause per page, included in the elapsed time
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    print('-----> BS4 OK')

#---END TIME---
end = time.time()

#---ELAPSED---
elapsed = end - start

print(elapsed)

# Close the browser once the timing is done
driver.quit()

(07-30-2019, 06:47 AM) Ulthor_31 Wrote: I have tried a Raspberry Pi 3+ and an Odroid XU4 to scrape more than 10000 web pages.
Pi 3+: it scrapes 100 web pages/h
XU4: 360 web pages/h

The goal is to reach 600 web pages/h.

I ran your code on a pristine Buster install of Armbian and it output the following:

Code:
# python3 test.py
test.py:11: DeprecationWarning: use options instead of chrome_options
  driver = webdriver.Chrome(chrome_options=chrome_options)
-----------> https://www.amazon.fr/dp/B07SJFD9N4
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07S6FB64X
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07Q2J5NCZ
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07QR9MDXS
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07Q2LSS1P
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07RBTKJ7H
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07PZFNG1F
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07HG3XG7Q
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07DDFC9B9
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07Q4216T1
-----> BS4 OK
102.66483640670776

There are a few things to note. First, I was SSH'd into the system, so the GUI was forwarded over X. Second, I'm in Australia, so .fr lookups may be faster if you are in France.

Thank you very much, EvilBunny.

Can you try again with this Python code? I only changed the URLs to amazon.com.au.

---------------PYTHON CODE------------
import time
from bs4 import BeautifulSoup
from selenium import webdriver

#---CHROME OPTIONS AND OPEN DRIVER---
chrome_options = webdriver.ChromeOptions()
# Content setting 2 = block, so images are not loaded
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)
#driver = webdriver.Chrome('/usr/lib/chromium-browser/chromedriver',chrome_options=chrome_options)
driver = webdriver.Chrome(chrome_options=chrome_options)
# Note: newer Selenium releases deprecate chrome_options= in favour of options=
driver.set_window_size(2000, 1000)

#---LIST OF WEBPAGES---
urls = [
    'https://www.amazon.com.au/dp/B07HPKWGJH',
    'https://www.amazon.com.au/dp/B07FTN21JL',
    'https://www.amazon.com.au/dp/B07HPCDHQS',
    'https://www.amazon.com.au/dp/B07FTN21JL',
    'https://www.amazon.com.au/dp/B079DQ7JK6',
    'https://www.amazon.com.au/dp/B07CBPS16T',
    'https://www.amazon.com.au/dp/B07CBP38HS',
    'https://www.amazon.com.au/dp/B073R3MJ87',
    'https://www.amazon.com.au/dp/B01LZG4KPC',
    'https://www.amazon.com.au/dp/B077DX1NFH',
]

#---START TIME---
start = time.time()

#---SHOW WEBPAGES---
for url in urls:
    driver.get(url)
    driver.execute_script('window.scrollBy(0,10000);')  # scroll down the page
    print('-----------> ' + url)
    time.sleep(1)  # fixed 1 s pause per page, included in the elapsed time
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    print('-----> BS4 OK')

#---END TIME---
end = time.time()

#---ELAPSED---
elapsed = end - start

print(elapsed)

# Close the browser once the timing is done
driver.quit()

-----
Thank you again.

(08-01-2019, 03:51 AM) Ulthor_31 Wrote: Thank you very much, EvilBunny.

Can you try again with this Python code? I only changed the URLs to amazon.com.au.

Was slightly faster: 87.21881079673767
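
For comparison with the pages/h figures quoted earlier in the thread, the two elapsed times can be converted as in the sketch below. It assumes each run covered the 10 URLs in the test script, and the function name `pages_per_hour` is only illustrative. The second figure subtracts the fixed 1 s `time.sleep` per page to estimate the rate without that pause.

```python
def pages_per_hour(pages, elapsed_s, sleep_per_page_s=0.0):
    """Convert a timed run into pages/hour, optionally removing a fixed sleep."""
    active_s = elapsed_s - pages * sleep_per_page_s
    return pages * 3600 / active_s


# Elapsed times reported in the thread, each for a 10-page run.
runs = {
    "amazon.fr": 102.66483640670776,
    "amazon.com.au": 87.21881079673767,
}

for site, elapsed in runs.items():
    measured = pages_per_hour(10, elapsed)
    without_sleep = pages_per_hour(10, elapsed, sleep_per_page_s=1.0)
    print(f"{site}: {measured:.0f} pages/h measured, "
          f"{without_sleep:.0f} pages/h without the 1 s sleep")
```

These are single 10-page runs made over SSH with X forwarding, so the results are rough estimates rather than a benchmark. Taken at face value, they work out to roughly 350 and 410 pages/h as measured.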

(08-01-2019, 04:01 AM) evilbunny Wrote: Was slightly faster: 87.21881079673767

Thank you very much.

I am going on holiday, so I will validate my cart in September.