Scraping Google Web Search in PHP with ScraperAPI

Scraping data from the web by hand is a tedious process. If you happen to do it in PHP, for example, you need to go through the following steps.

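Fetch the page content with the file_get_contents() function
Create a DOMDocument object to parse the content
Load the HTML into it with the loadHTML() method
Finally, walk the DOM tree with the getElementsByTagName() method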

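This works, but it is tedious and not very efficient. It is also error-prone when the DOM tree is complex: if the HTML is not parsed correctly, you may not get the data you need.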
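The manual steps above can be sketched roughly as follows. This is a minimal illustration, not a production scraper: the helper name extractHeadings() and the inline sample HTML are assumptions made for this example so that it runs offline. In a real script the HTML would come from file_get_contents($url), and PHP's DOM extension must be enabled.

```php
<?php
// Sketch of the manual approach: parse HTML with DOMDocument and
// collect the text of every element with a given tag name.
// extractHeadings() is a hypothetical helper, not part of any library.
function extractHeadings(string $html, string $tag = 'h3'): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // buffer parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();            // discard the buffered warnings

    $headings = [];
    foreach ($dom->getElementsByTagName($tag) as $node) {
        $headings[] = trim($node->textContent);
    }
    return $headings;
}

// Inline sample standing in for a page fetched with file_get_contents($url).
$sampleHtml = '<html><body><h3>First result</h3><p>Snippet</p><h3>Second result</h3></body></html>';
$headings = extractHeadings($sampleHtml);
print_r($headings);
```

Even in this small form, the result depends entirely on the page's markup: if the site changes which tag carries the titles, the function silently returns something else.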

This is where a solution like ScraperAPI comes in.

What Is ScraperAPI?

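In essence, ScraperAPI offers a neat way to scrape data from the web. We will look at two ways to parse Google search results:

First, the regular way of using the API: load the content through ScraperAPI, then parse the HTML with DOMDocument.
Second, ScraperAPI's proprietary "Structured Data" solution, which is easier to use and more efficient.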

Walking Through Google Search Results with ScraperAPI

First, create a free account on ScraperAPI and get your API key from the dashboard.

This key is required to access the API.

Now, let's see how to use ScraperAPI to scrape Google search results.

// Function to scrape Google search results
function scrapeGoogleSearch($query) {
    // Your ScraperAPI API key
    $apiKey = "your_scraperapi_api_key";
    
    // Craft the Google search URL
    $url = "https://www.google.com/search?q=" . urlencode($query);

    // Craft the request URL with ScraperAPI (the target URL must itself be URL-encoded)
    $requestUrl = "https://api.scraperapi.com?api_key={$apiKey}&url=" . urlencode($url);

    // Initialize cURL session
    $curl = curl_init();

    // Set cURL options
    curl_setopt($curl, CURLOPT_URL, $requestUrl);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    // Execute the request
    $response = curl_exec($curl);

    // Check if the request was successful
    if ($response === false) {
        echo "Failed to fetch data from Google.";
    } else {
        // Parse the HTML response to extract search results
        $dom = new DOMDocument();
        libxml_use_internal_errors(true); // Disable libxml errors
        $dom->loadHTML($response);
        libxml_clear_errors(); // Clear libxml errors

        // Collect the <h3> elements, which typically hold the search result titles
        $searchResults = $dom->getElementsByTagName('h3');
        $results = [];
        $count = 0;
        foreach ($searchResults as $result) {
            $results[] = $result->textContent;
            $count++;
            if ($count >= 5) {
                break; // Stop after collecting 5 results
            }
        }
        
        // Output the search results (you may want to process this data further)
        print_r($results);
    }

    // Close cURL session
    curl_close($curl);
}

// Example usage
$query = "scraping data"; // Example search query
scrapeGoogleSearch($query);
/*
Output:

Array
(
    [0] => What is data scraping?
    [1] => What Is Data Scraping And How Can You Use It?
    [2] => Images
    [3] => Description
    [4] => What Is Data Scraping | Techniques, Tools & Mitigation
)
*/

As you can see, ScraperAPI exposes an API URL to which we pass the URL of the site we want to scrape, together with the API key that is required to access the API.

In this example, we used the URL "https://www.google.com/search?q=searchterm" to fetch search results from Google.

We call the ScraperAPI endpoint with that URL via cURL and parse the HTML response to extract the search results.
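One detail worth isolating is how the request URL is assembled, because the target URL is itself passed as a query parameter. The sketch below uses PHP's http_build_query() to handle the encoding. The helper name buildScraperApiUrl() is made up for this example; the api_key and url parameters are the ones used in the code above.

```php
<?php
// Build a ScraperAPI request URL with every parameter URL-encoded.
// buildScraperApiUrl() is a hypothetical helper for illustration only.
function buildScraperApiUrl(string $apiKey, string $targetUrl): string
{
    $params = http_build_query([
        'api_key' => $apiKey,
        'url'     => $targetUrl, // encoded as a whole, including its own "?" and "="
    ]);
    return 'https://api.scraperapi.com/?' . $params;
}

// The Google search URL is built the same way, so the query is encoded too.
$googleUrl  = 'https://www.google.com/search?' . http_build_query(['q' => 'scraping data']);
$requestUrl = buildScraperApiUrl('your_scraperapi_api_key', $googleUrl);
echo $requestUrl, PHP_EOL;
```

Without this encoding, any "&" inside the target URL would be read as the start of a new parameter of the ScraperAPI request rather than as part of the Google URL.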

Using ScraperAPI's Structured Data Endpoint

However, ScraperAPI offers a neater way, called Structured Data.

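In essence, ScraperAPI provides dedicated endpoints that return structured data in JSON format, so there is no more walking the DOM tree to extract data! Here is how we can rewrite the previous example using ScraperAPI's structured data endpoint.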

// Function to scrape Google search results using ScraperAPI's Structured Data Solution
function scrapeGoogleSearchStructuredData($query) {
    // Your ScraperAPI API key
    $apiKey = "your_scraperapi_api_key";

    // Craft the request URL with ScraperAPI's Structured Data endpoint
    $requestUrl = "https://api.scraperapi.com/structured/google/search?api_key={$apiKey}&country=US&query=" . urlencode($query);

    // Initialize cURL session
    $curl = curl_init();

    // Set cURL options
    curl_setopt($curl, CURLOPT_URL, $requestUrl);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    // Execute the request
    $response = curl_exec($curl);

    // Check if the request was successful
    if ($response === false) {
        echo "Failed to fetch data from Google.";
    } else {
        // Parse the JSON response to extract search results
        $data = json_decode($response, true);
        
        // Extract the search results
        $results = [];
        if (isset($data['organic_results'])) {
            $count = 0;
            foreach ($data['organic_results'] as $result) {
                if (isset($result['title'])) {
                    $results[] = $result['title'];
                    $count++;
                }
                if ($count >= 5) {
                    break; // Stop after collecting 5 results
                }
            }
        }
        
        // Output the search results (you may want to process this data further)
        print_r($results);
    }

    // Close cURL session
    curl_close($curl);
}

// Example usage
$query = "OpenAI GPT-3"; // Example search query
scrapeGoogleSearchStructuredData($query);

/*
Output:

Array
(
    [0] => GPT-3 powers the next generation of apps
    [1] => Product
    [2] => OpenAI
    [3] => Images
    [4] => Description
)
*/

As you can see, we now call a new structured data endpoint, whose URL looks like this.

$requestUrl = "https://api.scraperapi.com/structured/google/search?api_key={$apiKey}&country=US&query=" . urlencode($query);

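This endpoint is specific to retrieving Google search results, and we pass the API key, the country, and the search term as query parameters. The country parameter tells the endpoint to return results relevant to the specified country/region.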

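After that, we can reach the endpoint with cURL and parse the JSON response to extract the search results. The search results sit under the organic_results key of the JSON response.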
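The extraction step can be tried on its own, without calling the API. The sketch below decodes a JSON string and collects the titles under organic_results. Note the assumptions: extractTitles() is a hypothetical helper, and the sample payload is invented for this example and is far smaller than a real response; only the organic_results and title keys are taken from the code above.

```php
<?php
// Decode a structured-data response and collect up to $limit result titles.
// extractTitles() is a hypothetical helper for illustration only.
function extractTitles(string $json, int $limit = 5): array
{
    $data = json_decode($json, true);

    // json_decode() returns null on invalid JSON; treat that as "no results".
    if (!is_array($data) || !isset($data['organic_results']) || !is_array($data['organic_results'])) {
        return [];
    }

    $titles = [];
    foreach ($data['organic_results'] as $result) {
        if (isset($result['title'])) {
            $titles[] = $result['title'];
        }
        if (count($titles) >= $limit) {
            break; // stop once enough titles are collected
        }
    }
    return $titles;
}

// Invented sample payload; a real response contains many more fields.
$sampleJson = '{"organic_results":[{"title":"Result A"},{"position":2},{"title":"Result B"}]}';
$titles = extractTitles($sampleJson);
print_r($titles);
```

Returning an empty array for a malformed or unexpected response keeps the caller simple: it can loop over the result without first checking for errors.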

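The response also returns plenty of other data, such as related questions, related searches, and pagination. You can see all the fields by printing the whole JSON response.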

Conclusion

That's it! This was a brief introduction to scraping data from the web with ScraperAPI. I think it is a great tool for anyone who wants to scrape data from the web.

I found it convenient to use, especially the structured data endpoint. So if you are looking for a tool to scrape data from the web, I highly recommend ScraperAPI.
