PINE64

Full Version: ROCKPRO64 for SCRAPING
Hi all,
 
I am new to the forum, and I apologize in advance for my poor English.

My project is to build a small device that scrapes websites such as Amazon. The device will run 24 hours a day, and each day it will re-scrape the whole list of web pages.
To scrape the pages, I use Python 3.6 with Selenium and BeautifulSoup.

I have tried a Raspberry Pi 3+ and an Odroid XU4 to scrape more than 10,000 web pages.
Pi 3+: 100 web pages/hour
XU4: 360 web pages/hour

The goal is to reach 600 web pages/hour.

A few days ago I found the RockPro64 with its multi-core processor, and I think this board could be the solution to reach my goal.
 
------------
My cart:
ROCKPro64 2GB/4GB
7″ LCD TOUCH SCREEN PANEL
ROCKPro64 2×2 MIMO Dual Band WIFI 802.11AC/BLUETOOTH 4.2 MODULE
ROCKPro64 PLAYBOX ENCLOSURE
ROCKPro64 12V 5A POWER SUPPLY or ROCKPro64 12V 3A POWER SUPPLY
16GB eMMC Module
------------
 
Before I confirm my order, I would like to know how many web pages per hour a RockPro64 can open.

Could someone help me by running this Python test on a RockPro64 and sharing the elapsed time?

If it works, I will add 10 RockPro64 boards to my cart.

Thank you very much.

---------------PYTHON CODE------------
from selenium import webdriver
import time
from urllib.request import urlopen
from bs4 import BeautifulSoup
 
#---CHROME OPTION AND OPEN DRIVER---
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)
#driver = webdriver.Chrome('/usr/lib/chromium-browser/chromedriver',chrome_options=chrome_options)
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.set_window_size(2000, 1000)
 
#---LIST of WEBPAGE---
urls = [
    'https://www.amazon.fr/dp/B07SJFD9N4',
    'https://www.amazon.fr/dp/B07S6FB64X',
    'https://www.amazon.fr/dp/B07Q2J5NCZ',
    'https://www.amazon.fr/dp/B07QR9MDXS',
    'https://www.amazon.fr/dp/B07Q2LSS1P',
    'https://www.amazon.fr/dp/B07RBTKJ7H',
    'https://www.amazon.fr/dp/B07PZFNG1F',
    'https://www.amazon.fr/dp/B07HG3XG7Q',
    'https://www.amazon.fr/dp/B07DDFC9B9',
    'https://www.amazon.fr/dp/B07Q4216T1',
]

#---START TIME---
start = time.time()

#---Show WEBPAGE---
for url in urls:
    driver.get(url)
    # Scroll down the page before reading its source
    driver.execute_script('window.scrollBy(0,10000);')
    print('-----------> ' + url)
    time.sleep(1)
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    print('-----> BS4 OK')

#---END TIME---
end = time.time()

#---ELAPSED---
elapsed = end - start

print(elapsed)

# Close the browser once the measurement is done
driver.quit()
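The script above reports only the total time, and that total includes the fixed one-second `time.sleep(1)` for every page. A minimal sketch of per-page timing follows. It assumes a hypothetical `fetch` callable standing in for the `driver.get(...)` and parsing steps; `time_pages`, `fake_fetch` and the example.com URLs are names invented here for illustration and are not part of the original script.

```python
import time
from typing import Callable, Iterable, List, Tuple


def time_pages(urls: Iterable[str], fetch: Callable[[str], None]) -> List[Tuple[str, float]]:
    """Return (url, seconds spent in fetch) for each URL, in order."""
    timings = []
    for url in urls:
        started = time.perf_counter()
        fetch(url)
        timings.append((url, time.perf_counter() - started))
    return timings


def fake_fetch(url: str) -> None:
    """Hypothetical stand-in for driver.get(url) plus parsing."""
    time.sleep(0.01)


if __name__ == "__main__":
    results = time_pages(["https://example.com/a", "https://example.com/b"], fake_fetch)
    for url, seconds in results:
        print(f"{url}: {seconds:.3f} s")
    print(f"total: {sum(seconds for _, seconds in results):.3f} s")
```

Per-page figures like these would show whether a few slow pages dominate the total or whether every page costs about the same.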
(07-30-2019, 06:47 AM)Ulthor_31 Wrote: I have tried a Raspberry Pi 3+ and an Odroid XU4 to scrape more than 10,000 web pages.
Pi 3+: 100 web pages/hour
XU4: 360 web pages/hour

The goal is to reach 600 web pages/hour.

I ran your code on a pristine Buster install of Armbian, and it output the following:

Code:
# python3 test.py
test.py:11: DeprecationWarning: use options instead of chrome_options
 driver = webdriver.Chrome(chrome_options=chrome_options)
-----------> https://www.amazon.fr/dp/B07SJFD9N4
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07S6FB64X
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07Q2J5NCZ
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07QR9MDXS
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07Q2LSS1P
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07RBTKJ7H
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07PZFNG1F
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07HG3XG7Q
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07DDFC9B9
-----> BS4 OK
-----------> https://www.amazon.fr/dp/B07Q4216T1
-----> BS4 OK
102.66483640670776


There are a few things to note. Firstly, I was SSH'd into the system, so the GUI was forwarded over X. Secondly, I'm in Australia, so the .fr lookups may be faster if you are in France.
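To compare this run with the figures in the first post, the elapsed time can be converted to pages per hour. The sketch below does that arithmetic for the 10 pages and the elapsed time printed above; `pages_per_hour` is a helper name made up for this example.

```python
def pages_per_hour(pages: int, elapsed_seconds: float) -> float:
    """Convert a page count and elapsed wall-clock time to pages per hour."""
    return pages * 3600.0 / elapsed_seconds


# Figures taken from the run above: 10 pages in 102.66483640670776 s.
rate = pages_per_hour(10, 102.66483640670776)
print(f"{rate:.0f} pages/hour")  # about 351

# Reference points quoted from the first post.
for name, reference in [("Pi3+", 100), ("XU4", 360), ("goal", 600)]:
    print(f"{name}: {reference} pages/hour, this run is {rate / reference:.2f}x that")
```

On these numbers the run is roughly on par with the XU4 figure and below the 600 pages/hour goal. Keep in mind that 10 of the measured seconds are the script's own one-second sleep per page, and that the test ran over SSH with X forwarding.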
Thanks very much, EvilBunny.

Could you try again with this Python script? I only changed the URLs to amazon.com.au.

---------------PYTHON CODE------------
from selenium import webdriver
import time
from urllib.request import urlopen
from bs4 import BeautifulSoup
 
#---CHROME OPTION AND OPEN DRIVER---
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)
#driver = webdriver.Chrome('/usr/lib/chromium-browser/chromedriver',chrome_options=chrome_options)
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.set_window_size(2000, 1000)
 
#---LIST of WEBPAGE---
urls = [
    'https://www.amazon.com.au/dp/B07HPKWGJH',
    'https://www.amazon.com.au/dp/B07FTN21JL',
    'https://www.amazon.com.au/dp/B07HPCDHQS',
    'https://www.amazon.com.au/dp/B07FTN21JL',
    'https://www.amazon.com.au/dp/B079DQ7JK6',
    'https://www.amazon.com.au/dp/B07CBPS16T',
    'https://www.amazon.com.au/dp/B07CBP38HS',
    'https://www.amazon.com.au/dp/B073R3MJ87',
    'https://www.amazon.com.au/dp/B01LZG4KPC',
    'https://www.amazon.com.au/dp/B077DX1NFH',
]

#---START TIME---
start = time.time()

#---Show WEBPAGE---
for url in urls:
    driver.get(url)
    # Scroll down the page before reading its source
    driver.execute_script('window.scrollBy(0,10000);')
    print('-----------> ' + url)
    time.sleep(1)
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    print('-----> BS4 OK')

#---END TIME---
end = time.time()

#---ELAPSED---
elapsed = end - start

print(elapsed)

# Close the browser once the measurement is done
driver.quit()


-----
Thank you again.
(08-01-2019, 03:51 AM)Ulthor_31 Wrote: Thanks very much, EvilBunny.

Could you try again with this Python script? I only changed the URLs to amazon.com.au.


It was slightly faster: 87.21881079673767
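For the daily re-scrape of the list described in the first post, this timing can be turned into a rough estimate of how long 10,000 pages would take on one board. This is a back-of-the-envelope sketch that assumes the 10-page rate holds for the whole list, which a short benchmark cannot guarantee; `hours_for` and `boards_needed` are names invented for the example.

```python
import math


def hours_for(total_pages: int, sample_pages: int, sample_seconds: float) -> float:
    """Extrapolate a short timing sample to a larger page count, in hours."""
    seconds_per_page = sample_seconds / sample_pages
    return total_pages * seconds_per_page / 3600.0


def boards_needed(total_pages: int, sample_pages: int, sample_seconds: float,
                  window_hours: float = 24.0) -> int:
    """Number of boards needed to finish within window_hours (work split evenly)."""
    return math.ceil(hours_for(total_pages, sample_pages, sample_seconds) / window_hours)


# Figures from the run above: 10 pages in 87.21881079673767 s.
hours = hours_for(10_000, 10, 87.21881079673767)
print(f"{hours:.1f} hours on one board")  # about 24.2
print(boards_needed(10_000, 10, 87.21881079673767), "board(s) for a 24-hour cycle")
```

At this measured rate, a single board would land just over the 24-hour mark for 10,000 pages, so the estimate is sensitive to small changes in per-page time.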
(08-01-2019, 04:01 AM)evilbunny Wrote: It was slightly faster: 87.21881079673767

Thank you very much.

I am going on holiday, so I will confirm my order in September.
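If several boards end up sharing the work, as the first post suggests, the URL list has to be divided between them. One simple option, sketched here under the assumption that every page costs about the same to scrape, is a round-robin split. `split_round_robin` and the example.com URLs are hypothetical and do not come from the thread.

```python
from typing import List, Sequence


def split_round_robin(urls: Sequence[str], boards: int) -> List[List[str]]:
    """Deal URLs out to `boards` lists, one at a time, like cards."""
    if boards < 1:
        raise ValueError("boards must be at least 1")
    return [list(urls[i::boards]) for i in range(boards)]


if __name__ == "__main__":
    sample = [f"https://example.com/page/{n}" for n in range(7)]
    for index, chunk in enumerate(split_round_robin(sample, 3)):
        print(f"board {index}: {len(chunk)} URLs")
```

Each board would then run the benchmark loop over its own sub-list. A round-robin split keeps the list sizes within one URL of each other, but it does not account for pages that load much more slowly than others.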