Top php Questions

List of Tags
1213
Andrew G. Johnson

If user input is inserted into an SQL query directly, the application becomes vulnerable to SQL injection, like in the following example:

$unsafe_variable = $_POST['user_input'];

mysql_query("INSERT INTO table (column) VALUES ('" . $unsafe_variable . "')");

That's because the user can input something like value'); DROP TABLE table;--, making the query:

INSERT INTO table (column) VALUES('value'); DROP TABLE table;--')

What should one do to prevent this?

Answered By: Theo ( 1282)

Use prepared statements and parameterized queries. These are SQL statements that are sent to and parsed by the database server separately from any parameters. This way it is impossible for an attacker to inject malicious SQL.

You basically have two options to achieve this:

  1. Using PDO:

    $stmt = $pdo->prepare('SELECT * FROM employees WHERE name = :name');
    
    $stmt->execute(array(':name' => $name));
    
    foreach ($stmt as $row) {
        // do something with $row
    }
    
  2. Using mysqli:

    $stmt = $dbConnection->prepare('SELECT * FROM employees WHERE name = ?');
    $stmt->bind_param('s', $name);
    
    $stmt->execute();
    
    $result = $stmt->get_result();
    while ($row = $result->fetch_assoc()) {
        // do something with $row
    }
    

PDO

Note that when using PDO to access a MySQL database real prepared statements are not used by default. To fix this you have to disable the emulation of prepared statements. An example of creating a connection using PDO is:

$dbConnection = new PDO('mysql:dbname=dbtest;host=127.0.0.1;charset=utf8', 'user', 'pass');

$dbConnection->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$dbConnection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

In the above example the error mode isn't strictly necessary, but it is advised to add it. This way the script will not stop with a Fatal Error when something goes wrong. And gives the developer the chance to catch any error(s) which are thrown as PDOExceptions.

What is mandatory however is the setAttribute() line, which tells PDO to disable emulated prepared statements and use real prepared statements. This makes sure the statement and the values aren't parsed by PHP before sending it the the MySQL server (giving a possible attacker no chance to inject malicious SQL).

Although you can set the charset in the options of the constructor it's important to note that 'older' versions of PHP (< 5.3.6) silently ignored the charset parameter in the DSN.

Explanation

What happens is that the SQL statement you pass to prepare is parsed and compiled by the database server. By specifying parameters (either a ? or a named parameter like :name in the example above) you tell the database engine where you want to filter on. Then when you call execute the prepared statement is combined with the parameter values you specify.

The important thing here is that the parameter values are combined with the compiled statement, not a SQL string. SQL injection works by tricking the script into including malicious strings when it creates SQL to send to the database. So by sending the actual SQL separately from the parameters you limit the risk of ending up with something you didn't intend. Any parameters you send when using a prepared statement will just be treated as strings (although the database engine may do some optimization so parameters may end up as numbers too, of course). In the example above, if the $name variable contains 'Sarah'; DELETE * FROM employees the result would simply be a search for the string "'Sarah'; DELETE * FROM employees", and you will not end up with an empty table.

Another benefit with using prepared statements is that if you execute the same statement many times in the same session it will only be parsed and compiled once, giving you some speed gains.

Oh, and since you asked about how to do it for an insert, here's an example (using PDO):

$preparedStatement = $db->prepare('INSERT INTO table (column) VALUES (:column)');

$preparedStatement->execute(array(':column' => $unsafeValue));
594
anon

What is this?

This is a collection of questions that come up every now and then about syntax in PHP. This is also a Community Wiki, so everyone is invited to participate in maintaining this list.

Why is this?

Stack Overflow does not allow searching for particular characters. As a consequence, many questions about operators and other syntax tokens are not found easily when searching for them. This also makes closing duplicates more difficult. The list below is to help with this issue.

The main idea is to have links to existing questions on Stack Overflow, so it's easier for us to reference them, not to copy over content from the PHP Manual.

What should I do here?

If you have been pointed here by someone because you have asked such a question, please find the particular syntax below. The linked pages to the PHP manual along with the linked questions will likely answer your question then. If so, you are encouraged to upvote the answer. This list is not meant as a substitute to the help others provided.

The List

If your particular token is not listed below, you might find it in the List of Parser Tokens.


& Bitwise Operators or References


=& References


&= Bitwise Operators


&& Logical Operators


% Arithmetic Operators


!! Logical Operators


@ Error Control Operators


?: Ternary Operator


: Alternative syntax for control structures, Ternary Operator


:: Scope Resolution Operator


\ Namespaces


-> Classes And Objects


=> Arrays


^ Bitwise Operators


>> Bitwise Operators


<< Bitwise Operators


<<< Heredoc or Nowdoc


= Assignment Operators


== Comparison Operators


=== Comparison Operators


!== Comparison Operators


!= Comparison Operators


<> Comparison Operators


| Bitwise Operators


|| Logical Operators


~ Bitwise Operators


+ Arithmetic Operators, Array Operators


+= Assignment Operators


++ Incrementing/Decrementing Operators


.= Assignment Operators


. String Operators


, Function Arguments


$$ Variable Variables


` Execution Operator


<?= Short Open Tags


[] Arrays


<? Opening and Closing tags


Answered By: Peter Ajtai ( 131)

Incrementing / Decrementing Operators

++ increment operator

-- decrement operator

Example    Name              Effect
---------------------------------------------------------------------
++$a       Pre-increment     Increments $a by one, then returns $a.
$a++       Post-increment    Returns $a, then increments $a by one.
--$a       Pre-decrement     Decrements $a by one, then returns $a.
$a--       Post-decrement    Returns $a, then decrements $a by one.

These can go before or after the variable. Putting this operator before the variable is slightly faster.

If put before the variable, the increment / decrement operation is done to the variable first then the result is returned. If put after the variable, the variable is first returned, then the increment / decrement operation is done.

For example:

$apples = 10;
for ($i = 0; $i < 10; ++$i)
{
    echo 'I have ' . $apples-- . " apples. I just ate one.\n";
}

Live example

In the case above ++$i is used, since it is faster. $i++ would have the same results.

However, you must use $apples--, since first you want to display the current number of apples, and then you want to subtract one from it.

You can also increment letters in PHP:

$i = "a";
while ($i < "c")
{
    echo $i++;
}

Once z is reached aa is next, and so on.

Note that character variables can be incremented but not decremented and even so only plain ASCII characters (a-z and A-Z) are supported.


Stack Overflow Posts:

511
RobertPitt

How can one parse HTML/XML and extract information from it? What libraries exist for that purpose? What are their strengths and drawbacks?

This is a General Reference question for the tag

Answered By: Gordon ( 553)

Native XML Extensions

I prefer using one of the native XML extensions since they come bundled with PHP, are usually faster than all the 3rd party libs and give me all the control I need over the markup.

DOM

The DOM extension allows you to operate on XML documents through the DOM API with PHP 5. It is an implementation of the W3C's Document Object Model Core Level 3, a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents.

DOM is capable of parsing and modifying real world (broken) HTML and it can do XPath queries. It is based on libxml.

It takes some time to get productive with DOM, but that time is well worth IMO. Since DOM is a language agnostic interface, you'll find implementations in many languages, so if you need to change your programming language, chances are you will already know how to use that language's DOM API then.

A basic usage example can be found in Grabbing the href attribute of an A element and a general conceptual overview can be found at Noob question about DOMDocument in php

How to use the DOM extension has been covered extensively on StackOverflow, so if you choose to use it, you can be sure most of the issues you run into can be solved by searching/browsing Stack Overflow.

XMLReader

The XMLReader extension is an XML Pull parser. The reader acts as a cursor going forward on the document stream and stopping at each node on the way.

XMLReader, like DOM, is based on libxml. I am not aware on how to trigger the HTML Parser Module, so chances are using XMLReader for parsing broken HTML might be less robust than using DOM where you can explicitly tell it to use libxml's HTML Parser Module.

A basic usage example can be found at getting all values from h1 tags using php

SimpleXml

The SimpleXML extension provides a very simple and easily usable toolset to convert XML to an object that can be processed with normal property selectors and array iterators.

SimpleXML is an option when you know the HTML is valid XHTML. If you need to parse broken HTML, don't even consider SimpleXml because it will choke.

A basic usage example can be found at A simple program to CRUD node and node values of xml file and there is lots of additional examples in the PHP Manual.


3rd Party Libraries (libxml based)

If you prefer to use a 3rd party lib, I'd suggest to use a lib that actually uses DOM/libxml underneath instead of String Parsing.

phpQuery

phpQuery is a server-side, chainable, CSS3 selector driven Document Object Model (DOM) API based on jQuery JavaScript Library written in PHP5 and provides additional Command Line Interface (CLI).

Zend_Dom

Zend_Dom provides tools for working with DOM documents and structures. Currently, we offer Zend_Dom_Query, which provides a unified interface for querying DOM documents utilizing both XPath and CSS selectors.

QueryPath

QueryPath is a PHP library for manipulating XML and HTML. It is designed to work not only with local files, but also with web services and database resources. It implements much of the jQuery interface, but it is heavily tuned for server-side use.

FluentDom

FluentDOM ist a jQuery like fluent XML interface for the DOMDocument in PHP.

fDOMDocument

fDOMDocument extends the standard DOM to use exceptions at all occasions of errors instead of PHP warnings or notices. They also add various custom methods and shortcuts for convinience and to simplify the usage of DOM.


3rd Party (not libxml based)

The benefit of building upon DOM/libxml is that you get good performance out of the box because you are based on a native extension. However, not all 3rd party libs go down this route, some of them listed below

SimpleHtmlDom

  • A HTML DOM parser written in PHP5+ let you manipulate HTML in a very easy way!
  • Require PHP 5+.
  • Supports invalid HTML.
  • Find tags on an HTML page with selectors just like jQuery.
  • Extract contents from HTML in a single line.

I generally do not recommend this parser. The codebase is horrible and the parser itself is rather slow and memory hungry. Any of the libxml based libraries should outperform this easily.

Ganon

  • A universal tokenizer and HTML/XML/RSS DOM Parser
    • Ability to manipulate elements and their attributes
    • Supports invalid HTML and UTF8
  • Can perform advanced CSS3-like queries on elements (like jQuery -- namespaces supported)
  • A HTML beautifier (like HTML Tidy)
    • Minify CSS and Javascript
    • Sort attributes, change character case, correct indentation, etc.
  • Extensible
    • Parsing documents using callbacks based on current character/token
    • Operations separated in smaller functions for easy overriding
  • Fast and Easy

Never used it. Can't tell if it's any good.


HTML 5

You can use the above for parsing HTML5, but there can be quirks due to the markup HTML5 allows. So for HTML5 you want to consider using a dedicated parser, like

html5lib

A Python and PHP implementations of a HTML parser based on the WHATWG HTML5 specification for maximum compatibility with major desktop web browsers.

We might see more dedicated parsers once HTML5 is finalized. There is also a blogpost by the W3's titled How-To for html 5 parsing that is worth checking out.


WebServices

If you don't feel like programming PHP, you can also utilizes Web Services. In general, I found very little utility for these, but that's just me and my Use Cases.

YQL

The YQL Web Service enables applications to query, filter, and combine data from different sources across the Internet. YQL statements have a SQL-like syntax, familiar to any developer with database experience.

ScraperWiki.

ScraperWiki's external interface allows you to extract data in the form you want for use on the web or in your own applications. You can also extract information about the state of any scraper.


Regular Expressions

Last and least recommended, you can extract data from HTML with Regular Expressions. In general using Regular Expressions on HTML is discouraged.

Most of the snippets you will find on the web to match markup are brittle. In most cases they are only working for a very particular piece of HTML. Tiny markup changes, like adding a space somewhere, can make the Regex fails when it's not properly written. You should know what you are doing before using Regex on HTML.

HTML parsers already know the syntactical rules of HTML. Regular expression have to be taught them with each new Regex you write. Regex are fine in some cases, but it really depends on your UseCase.

You can write more reliable parsers, but writing a complete and reliable custom parser with Regular Expressions is a waste of time when the aforementioned libraries already exist and do a much better job on this.

Also see Parsing Html The Cthulhu Way


Books

If you want to spend some money, have a look at

I am not affiliated with PHP Architects or the authors.

459
php
Casey Watson

In PHP 5, what is the difference between using self and $this?

When is each appropriate?

Answered By: John Millikin ( 400)

From http://www.phpbuilder.com/board/showthread.php?t=10354489:

Use $this to refer to the current object. Use self to refer to the current class. In other words, use $this->member for non-static members, use self::$member for static members.

As I continue to build more and more websites and web applications I am often asked to store user's passwords in a way that they can be retrieved if/when the user has an issue (either to email a forgotten password link, walk them through over the phone, etc.) When I can I fight bitterly against this practice and I do a lot of ‘extra’ programming to make password resets and administrative assistance possible without storing their actual password.

When I can’t fight it (or can’t win) then I always encode the password in some way so that it at least isn’t stored as plaintext in the database—though I am aware that if my DB gets hacked that it won’t take much for the culprit to crack the passwords as well—so that makes me uncomfortable.

In a perfect world folks would update passwords frequently and not duplicate them across many different sites—unfortunately I know MANY people that have the same work/home/email/bank password, and have even freely given it to me when they need assistance. I don’t want to be the one responsible for their financial demise if my DB security procedures fail for some reason.

Morally and ethically I feel responsible for protecting what can be, for some users, their livelihood even if they are treating it with much less respect. I am certain that there are many avenues to approach and arguments to be made for salting hashes and different encoding options, but is there a single ‘best practice’ when you have to store them? In almost all cases I am using PHP and MySQL if that makes any difference in the way I should handle the specifics.

Additional Information for Bounty

I want to clarify that I know this is not something you want to have to do and that in most cases refusal to do so is best. I am, however, not looking for a lecture on the merits of taking this approach I am looking for the best steps to take if you do take this approach.

In a note below I made the point that websites geared largely toward the elderly, mentally challenged, or very young can become confusing for people when they are asked to perform a secure password recovery routine. Though we may find it simple and mundane in those cases some users need the extra assistance of either having a service tech help them into the system or having it emailed/displayed directly to them.

In such systems the attrition rate from these demographics could hobble the application if users were not given this level of access assistance, so please answer with such a setup in mind.

Thanks to Everyone

This has been a fun questions with lots of debate and I have enjoyed it. In the end I selected an answer that both retains password security (I will not have to keep plain text or recoverable passwords), but also makes it possible for the user base I specified to log into a system without the major drawbacks I have found from normal password recovery.

As always there were about 5 answers that I would like to have marked correct for different reasons, but I had to choose the best one--all the rest got a +1. Thanks everyone!

Also, thanks to everyone in the Stack community who voted for this question and/or marked it as a favorite. I take hitting 100 up votes as a compliment and hope that this discussion has helped someone else with the same concern that I had.

Answered By: Michael Burr ( 316)

How about taking another approach or angle at this problem. Ask why the password is required to be in plaintext - if it's so that the user can retrieve the password, then strictly speaking you don't really need to retrieve the password they set (they don't remember what it is anyway), you need to be able to give them a password they can use.

Think about it - if the user needs to retrieve the password, it's because they've forgotten it. In which case a new password is just as good as the old one. But, one of the drawbacks of common password reset mechanisms used today is that the generated passwords produced in a reset operation are generally a bunch of random characters, so they're difficult for the user to simply type in correctly unless they copy-n-paste. That can be a problem for less savvy computer users.

One way around that problem is to provide auto-generated passwords that are more or less natural language text. While natural language strings might not have the entropy that a string of random characters of the same length has, there's nothing that says that your auto-generated password needs to have only 8 (or 10 or 12) characters. Get a high-entropy auto-generated passphrase by stringing together several random words (leave a space between them, so they're still recognizable and typeable by anyone who can read). Six words random words of varying length are probably easier to type correctly and with confidence than 10 random characters, and they can have a higher entropy as well. For example, the entropy of a 10 character password that drew randomly from uppercase, lowercase, digits and 10 punctuation symbols (for a total of 72 valid symbols) would have an entropy of 61.7 bits. Using a dictionary of 7776 words (as Diceware uses) that could be randomly selected for a six word passphrase, the passphrase would have an entropy of 77.4 bits. See the Diceware FAQ for more info.

  • a passphrase with about 77 bits of entropy: "admit prose flare table acute flair"

  • a password with about 74 bits of entropy: "K:&$R^tt~qkD"

I know I'd prefer typing the phrase, and with copy-n-paste, the phrase is no less easy to use that the password either, so no loss there. Of course if your website (or whatever the protected asset is) doesn't need 77 bits of entropy for an auto-generated passphrase, generate fewer words (which I'm sure your users would appreciate).

I understand the arguments that there are password protected assets that really don't have a high level of value, so the breach of a password might not be the end of the world. For example, I probably wouldn't care if 80% of the passwords I use on various websites was breached - all that could happen is a someone spamming or posting under my name for a while. That wouldn't be great, but it's not like they'd be breaking into my bank account. However, given the fact that many people use the same password for their web forum sites as they do for their bank accounts (and probably national security databases), I think it would be best to handle even those 'low-value' passwords as non-recoverable.

Every now and then I hear the advice "Use bcrypt for storing passwords in PHP, bcrypt rules".

But what is bcrypt? PHP doesn't offer any such functions, Wikipedia babbles about a file-encryption utility and Web searches just reveal a few implementations of Blowfish in different languages. Now Blowfish is also available in PHP via mcrypt, but how does that help with storing passwords? Blowfish is a general purpose cipher, it works two ways. If it could be encrypted, it can be decrypted. Passwords need a one-way hashing function.

Could anyone explain?

Answered By: Andrew Moore ( 324)

You do:

bcrypt is an hashing algorithm which is scalable with hardware (via a configurable number of rounds). Its slowness and multiple rounds ensures that an attacker must deploy massive funds and hardware to be able to crack your passwords. Add to that per-password salts (bcrypt REQUIRES salts) and you can be sure that an attack is virtually unfeasible without either ludicrous amount of funds or hardware.

bcrypt uses the Eksblowfish algorithm to hash passwords. While the encryption phase of Eksblowfish and Blowfish are exactly the same, the key schedule phase of Eksblowfish ensures that any subsequent state depends on both salt and key (user password), and no state can be precomputed without the knowledge of both. Because of this key difference, bcrypt is a one-way hashing algorithm. You cannot retrieve the plain text password without already knowing the salt, rounds and key (password). [Source]

You can use crypt() function to generate bcrypt hashes of input strings. This class can automatically generate salts and verify existing hashes against an input.

class Bcrypt {
  private $rounds;
  public function __construct($rounds = 12) {
    if(CRYPT_BLOWFISH != 1) {
      throw new Exception("bcrypt not supported in this installation. See http://php.net/crypt");
    }

    $this->rounds = $rounds;
  }

  public function hash($input) {
    $hash = crypt($input, $this->getSalt());

    if(strlen($hash) > 13)
      return $hash;

    return false;
  }

  public function verify($input, $existingHash) {
    $hash = crypt($input, $existingHash);

    return $hash === $existingHash;
  }

  private function getSalt() {
    $salt = sprintf('$2a$%02d$', $this->rounds);

    $bytes = $this->getRandomBytes(16);

    $salt .= $this->encodeBytes($bytes);

    return $salt;
  }

  private $randomState;
  private function getRandomBytes($count) {
    $bytes = '';

    if(function_exists('openssl_random_pseudo_bytes') &&
        (strtoupper(substr(PHP_OS, 0, 3)) !== 'WIN')) { // OpenSSL slow on Win
      $bytes = openssl_random_pseudo_bytes($count);
    }

    if($bytes === '' && is_readable('/dev/urandom') &&
       ($hRand = @fopen('/dev/urandom', 'rb')) !== FALSE) {
      $bytes = fread($hRand, $count);
      fclose($hRand);
    }

    if(strlen($bytes) < $count) {
      $bytes = '';

      if($this->randomState === null) {
        $this->randomState = microtime();
        if(function_exists('getmypid')) {
          $this->randomState .= getmypid();
        }
      }

      for($i = 0; $i < $count; $i += 16) {
        $this->randomState = md5(microtime() . $this->randomState);

        if (PHP_VERSION >= '5') {
          $bytes .= md5($this->randomState, true);
        } else {
          $bytes .= pack('H*', md5($this->randomState));
        }
      }

      $bytes = substr($bytes, 0, $count);
    }

    return $bytes;
  }

  private function encodeBytes($input) {
    // The following is code from the PHP Password Hashing Framework
    $itoa64 = './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

    $output = '';
    $i = 0;
    do {
      $c1 = ord($input[$i++]);
      $output .= $itoa64[$c1 >> 2];
      $c1 = ($c1 & 0x03) << 4;
      if ($i >= 16) {
        $output .= $itoa64[$c1];
        break;
      }

      $c2 = ord($input[$i++]);
      $c1 |= $c2 >> 4;
      $output .= $itoa64[$c1];
      $c1 = ($c2 & 0x0f) << 2;

      $c2 = ord($input[$i++]);
      $c1 |= $c2 >> 6;
      $output .= $itoa64[$c1];
      $output .= $itoa64[$c2 & 0x3f];
    } while (1);

    return $output;
  }
}

You may use this code as such:

$bcrypt = new Bcrypt(15);

$hash = $bcrypt->hash('password');
$isGood = $bcrypt->verify('password', $hash);

Alternatively, you may also use the Portable PHP Hashing Framework.


NOTE for PHP5.5: (Still in alpha)

This answer is now relatively old and even though it is still valid, password hashing functions have now been built directly into PHP >= 5.5. Whenever possible, it is recommended that you use built-in functions rather than trying to create your own. They are tested by a team of security experts and if a vulnerability is found (either now or in the future) it will be patched for future versions and backported to previous ones if possible.

<?php
// Usage 1:
echo password_hash("rasmuslerdorf", PASSWORD_DEFAULT)."\n";
// $2y$10$.vGA1O9wmRjrwAVXD98HNOgsNpDczlqm3Jq7KnEd1rVAGv3Fykk1a

// Usage 2:
$options = [
    'cost' => 7,
    'salt' => 'BCryptRequires22Chrcts',
];
echo password_hash("rasmuslerdorf", PASSWORD_BCRYPT, $options)."\n";
// $2y$07$BCryptRequires22Chrcte/VlQH0piJtjXl.0t1XkA8pw9dMXTpOq
?>

If you are using PHP < 5.5 but >= 5.3.7, there is also a Github compatibility library created based on the source code of the above functions originally written in C, which provides the same functionality. The code is actually very similar to the original answer below.

332
DaveRandom

Let me prefix this by saying that I know what foreach is, does and how to use it. This question concerns how it works under the bonnet, and I don't want any answers along the lines of "this is how you loop an array with foreach".


For a long time I assumed that foreach worked with the array itself. Then I found many references to the fact that it works with a copy of the array, and I have since assumed this to be the end of the story. But I recently got into a discussion on the matter, and after a little experimentation found that this was not in fact 100% true.

Let me show what I mean. For the following test cases, we will be working with the following array:

$array = array(1, 2, 3, 4, 5);

Test case 1:

foreach ($array as $item) {
  echo "$item\n";
  $array[] = $item;
}
print_r($array);

/* Output in loop:    1 2 3 4 5
   $array after loop: 1 2 3 4 5 1 2 3 4 5 */

This clearly shows that we are not working directly with the source array - otherwise the loop would continue forever, since we are constantly pushing items onto the array during the loop. But just to be sure this is the case:

Test case 2:

foreach ($array as $key => $item) {
  $array[$key + 1] = $item + 2;
  echo "$item\n";
}

print_r($array);

/* Output in loop:    1 2 3 4 5
   $array after loop: 1 3 4 5 6 7 */

This backs up our initial conclusion, we are working with a copy of the source array during the loop, otherwise we would see the modified values during the loop. But...

If we look in the manual, we find this statement:

When foreach first starts executing, the internal array pointer is automatically reset to the first element of the array.

Right... this seems to suggest that foreach relies on the array pointer of the source array. But we've just proved that we're not working with the source array, right? Well, not entirely.

Test case 3:

// Move the array pointer on one to make sure it doesn't affect the loop
var_dump(each($array));

foreach ($array as $item) {
  echo "$item\n";
}

var_dump(each($array));

/* Output
  array(4) {
    [1]=>
    int(1)
    ["value"]=>
    int(1)
    [0]=>
    int(0)
    ["key"]=>
    int(0)
  }
  1
  2
  3
  4
  5
  bool(false)
*/

So, despite the fact that we are not working directly with the source array, we are working directly with the source array pointer - the fact that the pointer is at the end of the array at the end of the loop shows this. Except this can't be true - if it was, then test case 1 would loop forever.

The PHP manual also states:

As foreach relies on the internal array pointer changing it within the loop may lead to unexpected behavior.

Well, lets find out what that "unexpected behavior" is (technically, any behavior is unexpected since I no longer know what to expect).

Test case 4:

foreach ($array as $key => $item) {
  echo "$item\n";
  each($array);
}

/* Output: 1 2 3 4 5 */

Test case 5:

foreach ($array as $key => $item) {
  echo "$item\n";
  reset($array);
}

/* Output: 1 2 3 4 5 */

...nothing that unexpected there, in fact it seems to support the "copy of source" theory.


The Question

Please can someone explain what is going on here? My C-fu is not good enough for me to able to extract a proper conclusion simply by looking at the PHP source code, I would appreciate it if someone could translate it into english for me.

It seems to me that foreach works with a copy of the array, but sets the array pointer of the source array to the end of the array after the loop.

  • Is this correct and the whole story?
  • If not, what is it really doing?
  • Is there any situation where using functions that adjust the array pointer (each(), reset() et al.) during a foreach could affect the outcome of the loop?
Answered By: NikiC ( 388)

Note: This answer assumes that you have some basic knowledge about how zvals work in PHP, in particular you should know what a refcount is and what is_ref means.

foreach works with all kinds of traversables, i.e. with arrays, with plain objects (where the accessible properties are traversed) and Traversable objects (or rather objects that define the internal get_iterator handler). This answer will mainly focus on arrays, I'll just mention the others at the end.

But before getting into that some background on arrays and their iteration that is important in this context:

Background on array iteration

Arrays in PHP are ordered hashtables (i.e. the hash buckets are part of a doubly linked list) and foreach will traverse the array according to that order.

PHP internally has two mechanisms to traverse an array: The first one is the internal array pointer. This pointer is part of the HashTable structure and is basically just a pointer to the current hashtable Bucket. The internal array pointer is safe against modification, i.e. if the current Bucket is removed, then the internal array pointer will be updated to point to the next bucket.

The second iteration mechanism is an external array pointer, called HashPosition. This is basically the same as the internal array pointer, but not stored in the HashTable itself. This external iteration mechanism is not safe against modification. If you remove the bucket that the HashPosition currently points to, then you'll leave behind a dangling pointer, leading to a segmentation fault.

As such external array pointers can only be used when you are absolutely, positively sure that during the iteration no user code will be executed. And user code can be run through a lot of ways, e.g. in an error handler, a cast or a zval destruction. That's why in most cases PHP has to use the internal array pointer instead of an external one. If it would not PHP could segfault when the user started doing weird things.

The issue with the internal array pointer is that it is part of the HashTable. So when you modify it, you are modifying the HashTable and as such the array. And as PHP arrays have by-value (rather than by-reference) semantics that means you need to copy the array whenever you want to iterate an array.

A simple example of why the copying really is necessary (and not just some puristic concern) is a nested iteration:

foreach ($array as $a) {
    foreach ($array as $b) {
        // ...
    }
}

Here you want both loops to be independent and not share the same array pointer in some weird way.

And this leads us to foreach:

Array iteration in foreach

Now you know why foreach has to perform an array copy before traversing the array. But that's obviously not the whole story. Whether or not PHP will actually do the copy depends on a few factors:

  • If the iterated array is a reference, then the copy will not happen, instead only an addref will be done:

    $ref =& $array; // $array has is_ref=1 now
    foreach ($array as $val) {
        // ...
    }
    

    Why? Because any change of the array should also propagate to the reference, including the internal array pointer. If foreach did a copy in this case it would break reference semantics.

  • If the array has a refcount of 1, the copy will not be done either. refcount=1 means that the array isn't used anywhere else so foreach can use it directly. If the refcount is higher than 1, it means that the array is shared with other variables and in order to avoid modification foreach has to copy it (apart from the reference case mentioned above).

  • If the array is iterated by-reference (foreach ($array as &$ref)), then - apart from the copy/no-copy behavior from above - the array will be made a reference afterwards.

So this is the first part of the mystery: the copying behavior. The second part is how the actual iteration is done, which is also a bit odd. The "usual" iteration pattern that you know (and that is also usually used in PHP - apart from foreach) is something like this (pseudocode):

reset();
while (get_current_data(&data) == SUCCESS) {
    code();
    move_forward();
}

foreach iteration looks a tiny bit different:

reset();
while (get_current_data(&data) == SUCCESS) {
    move_forward();
    code();
}

The difference is that move_forward() is not called at the end of one cycle, but at the beginning. So when your userland code is working on the element $i, then the internal array pointer is already at element $i+1.

This behavior of foreach is also the reason why the internal array pointer is set to the next bucket if the current one is removed, rather than the previous one (as you would expect). It's done so it works nicely with foreach (but it obviously won't work nicely with everything else and will skip array elements in that case).

Implications for code

The first implication of the behavior described above is that foreach has to copy the array it iterates in many cases (slow). But fear not: I actually tried out removing the requirement to copy it and I couldn't really see a performance difference in anything but artificial benchmarks (in those iteration got two times faster). Seems like people aren't iterating enough :P

The second implication is that usually there shouldn't be any implications. The foreach behavior is usually rather transparent to the user and just works as it should do. You don't need to worry about if and how a copy has been made and just when exactly a pointer is advanced.

The third implication is - and now we're getting to your problems - that sometimes you do get to see some very weird and hard to understand behavior. This happens in particular when you try to modify the array within the foreach.

A large collection of various edge-case behaviors that occur when you modify an array during iteration can be found in the PHP testsuite. Starting with this test and then incrementing the number 012 to 013 etc you can see how foreach will behave in various situations (different combinations of references etc).

But now to your actual examples:

  • Test case 1: Here $array has refcount=1 before the loop, so it won't be copied, but it'll get an addref. Once you do the $array[] the zval will be separated, so the array that you are pushing the elements to and the one in the iteration will be different ones.

  • Test case 2: Here the same applies as in test case 1.

  • Test case 3: Again, same story. At the foreach loop you have refcount=1, so you only get an addref and as such the internal pointer of $array will be modified. So at the end of the loop the pointer is NULL (meaning iteration done). each indicated this by returning false.

  • Test cases 4 and 5: Both each and reset are by-reference functions. The $array has a refcount=2 when it is passed to them, so it has to be separated. So here again foreach will be working on a separate array.

But those test cases are lame. The behavior only starts to get really unintuitive when you use a function like current in the loop:

foreach ($array as $val) {
    var_dump(current($array));
}
/* Output: 2 2 2 2 2 */

Here you should know that current is also a by-ref function, even though it does not modify the array. It has to be in order to play nice with all the other functions like next which are all by-ref. (current is actually a prefer-ref function. It can also take a value, but will use a ref if it can.) Reference means that the array has to be separated and $array and the foreach-array will be different. The reason you get 2 instead of 1 is also mentioned above: foreach advances the array pointer before running the user code, not after. So even though the code is at the first element, foreach already advanced the pointer to the second.

Now lets try a small modification:

$ref = &$array;
foreach ($array as $val) {
    var_dump(current($array));
}
/* Output: 2 3 4 5 false */

Here we have the is_ref=1 case, so the array is not copied (just like above). But now that it is_ref the array has no longer to be separated when passing to the by-ref current function. So now current and foreach work on the same array. You still see the off-by-one behavior though, due to the way foreach advances the pointer.

You get the same behavior when doing by-ref iteration:

foreach ($array as &$val) {
    var_dump(current($array));
}
/* Output: 2 3 4 5 false */

Here the important part is that foreach will make $array an is_ref=1 when it is iterated by reference, so basically you have the same situation as above.

Another small variation, this time we'll assign the array to another variable:

$foo = $array;
foreach ($array as $val) {
    var_dump(current($array));
}
/* Output: 1 1 1 1 1 */

Here the refcount of the $array is 2 when the loop is started, so for once we actually have to do the copy upfront. Thus $array and the array used by foreach will be completely separate from the starts. That's why you get the position of the internal array pointer wherever it was before the loop (in this case it was at the first position).

Iteration of objects

When an object is iterated there are two cases:

a) The object is not Traversable (or rather: Does not specify the internal get_iterator handler). In this case the iteration happens very similar to arrays. The same copying semantics apply. The only real difference is that foreach will run some additional code to skip properties that are not visible from the current scope. A few more random facts:

  • For declared properties PHP optimizes the property hashtable away. If you are iterating over an object though it has to reconstruct this hashtable (increasing memory usage). [Not that this should bother you, just a bit of trivia]

  • The hashtable for the properties is refetched on every iteration, i.e. PHP will call the get_properties handler again and again and again. For "normal" properties this makes little difference, but if the properties are dynamically created in the handler (this is something internal classes quite commonly do) then it means that the properties table will be recomputed every time.

b) The object is Traversable. In this case pretty much all what has been said above does not apply in any way. In this case PHP will not do copying and it will also not do any "advance pointer before loop already" tricks. I think that the iteration behavior on Traversables is a lot more predictable and doesn't really require explanation :)

Substituting the iterated entity during the loop

Another odd case that I didn't mention yet is that PHP allows you to substitute the iterated entity during the loop. So you can start iterating on one array and then replace it with another array halfway through. Or start iterating on an array and then replace it with an object:

$arr = [1, 2, 3, 4, 5];
$obj = (object) [6, 7, 8, 9, 10];

$ref =& $arr;
foreach ($ref as $val) {
    echo "$val\n";
    if ($val == 3) {
        $ref = $obj;
    }
}
/* Output: 1 2 3 6 7 8 9 10 */

As you can see in this case PHP will just start iterating the other entity from the start once the substitution has happened.

Changing the internal array pointer during iteration

One last detail of the foreach behavior that I did not yet mention (because it can be used to get really weird behavior) is what happens when you try to change the internal array pointer during iteration.

It may not do what you expect: When you call next or prev in the loop body (in the by-ref case) you can see that the internal array pointer is moved, but it will still not change the iteration behavior. The reason is that foreach will back up the current position and the hash of the current element into a HashPointer after every iteration. On the next iteration foreach will check whether the internal position changed and try to restore it back to the old element (based on the hash).

Let's see what "try" means with a few examples. First, here is an example that shows how a change of the internal pointer does not change the foreach behavior:

$array = [1, 2, 3, 4, 5];
$ref =& $array;
foreach ($array as $value) {
    var_dump($value);
    reset($array);
}
// output: 1, 2, 3, 4, 5

Now lets unset the element that foreach will be at on the first iteration (key 1):

$array = [1, 2, 3, 4, 5];
$ref =& $array;
foreach ($array as $value) {
    var_dump($value);
    unset($array[1]);
    reset($array);
}
// output: 1, 1, 3, 4, 5

You can see that the reset happened this time (double 1) because the element with the backed up hash was removed.

Now remember that a hash is just that: A hash. I.e. it has collisions. So, let's first try the following snippet:

$array = ['EzEz' => 1, 'EzFY' => 2, 'FYEz' => 3];
$ref =& $array;
foreach ($array as $value) {
    unset($array['EzFY']);
    $array['FYFZ'] = 4;
    reset($array);
    var_dump($value);
}
// output: 1 1 3 4

This behaves as expected. We removed the EzFY key (where foreach currently was), so the reset happens. Also we set an additional key, so the 4 is added at the end of the iteration.

But now comes the odd part. What happens if we set the FYFY key instead of FYFZ? Lets try:

$array = ['EzEz' => 1, 'EzFY' => 2, 'FYEz' => 3];
$ref =& $array;
foreach ($array as $value) {
    unset($array['EzFY']);
    $array['FYFY'] = 4;
    reset($array);
    var_dump($value);
}
// output: 1 4

Now the loop went directly to the new element, skipping everything else. The reason is that the FYFY key collides with EzFY (actually all keys in that array collide). Furthermore the Bucket for FYFY happens to be at the same memory address as the Bucket of EzFY that was just dropped. So for PHP it will look like it is still the same position, with the same hash. So the position is "restored" to it, thus jumping to the end of the array.

326
dbr

I can find lots of information on how Long Polling works (For example, this, and this), but no simple examples of how to implement this in code.

All I can find is cometd, which relies on the Dojo JS framework, and a fairly complex server system..

Basically, how would I use Apache to serve the requests, and how would I write a simple script (say, in PHP) which would "long-poll" the server for new messages?

The example doesn't have to be scaleable, secure or complete, it just needs to work!

Answered By: dbr ( 266)

It's simpler than I initially thought.. Basically you have a page that does nothing, until the data you want to send is available (say, a new message arrives).

Here is a really basic example, which sends a simple string after 2-10 seconds. 1 in 3 chance of returning an error 404 (to show error handling in the coming Javascript example)

msgsrv.php

<?php
if(rand(1,3) == 1){
    /* Fake an error */
    header("HTTP/1.0 404 Not Found");
    die();
}

/* Send a string after a random number of seconds (2-10) */
sleep(rand(2,10));
echo("Hi! Have a random number: " . rand(1,10));
?>

Note: With a real site, running this on a regular web-server like Apache will quickly tie up all the "worker threads" and leave it unable to respond to other requests.. There are ways around this, but it is recommended to write a "long-poll server" in something like Python's twisted, which does not rely on one thread per request. cometD is an popular one (which is available in several languages), and Tornado is a new framework made specifically for such tasks (it was built for FriendFeed's long-polling code)... but as a simple example, Apache is more than adequate! This script could easily be written in any language (I chose Apache/PHP as they are very common, and I happened to be running them locally)

Then, in Javascript, you request the above file (msg_srv.php), and wait for a response. When you get one, you act upon the data. Then you request the file and wait again, act upon the data (and repeat)

What follows is an example of such a page.. When the page is loaded, it sends the initial request for the msgsrv.php file.. If it succeeds, we append the message to the #messages div, then after 1 second we call the waitForMsg function again, which triggers the wait.

The 1 second setTimeout() is a really basic rate-limiter, it works fine without this, but if msgsrv.php always returns instantly (with a syntax error, for example) - you flood the browser and it can quickly freeze up. This would better be done checking if the file contains a valid JSON response, and/or keeping a running total of requests-per-minute/second, and pausing appropriately.

If the page errors, it appends the error to the #messages div, waits 15 seconds and then tries again (identical to how we wait 1 second after each message)

The nice thing about this approach is it is very resilient. If the clients internet connection dies, it will timeout, then try and reconnect - this is inherent in how long polling works, no complicated error-handling is required

Anyway, the long_poller.htm code, using the jQuery framework:

<html>
<head>
    <title>BargePoller</title>
    <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.2.6/jquery.min.js" type="text/javascript" charset="utf-8"></script>

    <style type="text/css" media="screen">
      body{ background:#000;color:#fff;font-size:.9em; }
      .msg{ background:#aaa;padding:.2em; border-bottom:1px #000 solid}
      .old{ background-color:#246499;}
      .new{ background-color:#3B9957;}
    .error{ background-color:#992E36;}
    </style>

    <script type="text/javascript" charset="utf-8">
    function addmsg(type, msg){
        /* Simple helper to add a div.
        type is the name of a CSS class (old/new/error).
        msg is the contents of the div */
        $("#messages").append(
            "<div class='msg "+ type +"'>"+ msg +"</div>"
        );
    }

    function waitForMsg(){
        /* This requests the url "msgsrv.php"
        When it complete (or errors)*/
        $.ajax({
            type: "GET",
            url: "msgsrv.php",

            async: true, /* If set to non-async, browser shows page as "Loading.."*/
            cache: false,
            timeout:50000, /* Timeout in ms */

            success: function(data){ /* called when request to barge.php completes */
                addmsg("new", data); /* Add response to a .msg div (with the "new" class)*/
                setTimeout(
                    waitForMsg, /* Request next message */
                    1000 /* ..after 1 seconds */
                );
            },
            error: function(XMLHttpRequest, textStatus, errorThrown){
                addmsg("error", textStatus + " (" + errorThrown + ")");
                setTimeout(
                    waitForMsg, /* Try again after.. */
                    15000); /* milliseconds (15seconds) */
            }
        });
    };

    $(document).ready(function(){
        waitForMsg(); /* Start the inital request */
    });
    </script>
</head>
<body>
    <div id="messages">
        <div class="msg old">
            BargePoll message requester!
        </div>
    </div>
</body>
</html>
312
luiscubal

It is currently said that MD5 is partially unsafe. Taking this into consideration, I'd like to know which mechanism to use for password protection.

Is “double hashing” a password less secure than just hashing it once? Suggests that hashing multiple times may be a good idea. How to implement password protection for individual files? Suggests using salt.

I'm using PHP. I want a safe and fast password encryption system. Hashing a password a million times may be safer, but also slower. How to achieve a good balance between speed and safety? Also, I'd prefer the result to have a constant number of characters.

  1. The hashing mechanism must be available in PHP
  2. It must be safe
  3. It can use salt (in this case, are all salts equally good? Is there any way to generate good salts?)

Also, should I store two fields in the database(one using MD5 and another one using SHA, for example)? Would it make it safer or unsafer?

In case I wasn't clear enough, I want to know which hashing function(s) to use and how to pick a good salt in order to have a safe and fast password protection mechanism.

EDIT: The website shouldn't contain anything too sensitive, but still I want it to be secure.

EDIT2: Thank you all for your replies, I'm using hash("sha256",$salt.":".$password.":".$id) (EDIT 3: I haven't used sha256 in some time now, see the accepted answer for a better mechanism)

Questions that didn't help: What's the difference between SHA and MD5 in PHP
Simple Password Encryption
Secure methods of storing keys, passwords for asp.net
How would you implement salted passwords in Tomcat 5.5

Answered By: Robert K ( 301)

TL;DR

Don'ts

  • Don't limit what characters users can enter for passwords. Only idiots do this.
  • Don't limit the length of a password. If your users want a sentence with supercalifragilisticexpialidocious in it, don't prevent them from using it.
  • Never store your user's password in plain-text.
  • Never email a password to your user except when they have lost theirs, and you sent a temporary one.
  • Never, ever log passwords in any manner.
  • Never hash passwords with SHA1 or MD5! Modern crackers can exceed 60 and 180 billion hashes/second (respectively).

Do's

  • Use scrypt when you can; bcrypt if you cannot.
  • Use PBKDF2 if you cannot use either bcrypt or scrypt, with SHA2 hashes.
  • Reset everyone's passwords when the database is compromised.
  • Implement a reasonable 8-10 character minimum length, plus require at least 1 upper case letter, 1 lower case letter, a number, and a symbol. This will improve the entropy of the password, in turn making it harder to crack. (See the "What makes a good password?" section for some debate.)

Why hash passwords anyway?

The objective behind hashing passwords is simple: preventing malicious access to user accounts by compromising the database. So the goal of password hashing is to deter a hacker or cracker by costing them too much time or money to calculate the plain-text passwords. And time/cost are the best deterrents in your arsenal.

Another reason that you want a good, robust hash on a user accounts is to give you enough time to change all the passwords in the system. If your database is compromised you will need enough time to at least lock the system down, if not change every password in the database.

Jeremiah Grossman, CTO of Whitehat Security, stated on his blog after a recent password recovery that required brute-force breaking of his password protection:

Interestingly, in living out this nightmare, I learned A LOT I didn’t know about password cracking, storage, and complexity. I’ve come to appreciate why password storage is ever so much more important than password complexity. If you don’t know how your password is stored, then all you really can depend upon is complexity. This might be common knowledge to password and crypto pros, but for the average InfoSec or Web Security expert, I highly doubt it.

(Emphasis mine.)

What makes a good password anyway?

Entropy. (Not that I fully subscribe to Randall's viewpoint.)

In short, entropy is how much variation is within the password. When a password is only lowercase roman letters, that's only 26 characters. That isn't much variation. Alpha-numeric passwords are better, with 36 characters. But allowing upper and lower case, with symbols, is roughly 96 characters. That's a lot better than just letters. One problem is, to make our passwords memorable we insert patterns—which reduces entropy. Oops!

Password entropy is approximated easily. Using the full range of ascii characters (roughly 96 typeable characters) yields an entropy of 6.6 per character, which at 8 characters for a password is still too low (52.679 bits of entropy) for future security. But the good news is: longer passwords, and passwords with unicode characters, really increase the entropy of a password and make it harder to crack.

There's a longer discussion of password entropy on the Crypto StackExchange site. A good Google search will also turn up a lot of results.

In the comments I talked with @popnoodles, who pointed out that enforcing a password policy of X length with X many letters, numbers, symbols, etc, can actually reduce entropy by making the password scheme more predictable. I do agree. Randomess, as truly random as possible, is always the safest but least memorable solution.

So far as I've been able to tell, making the world's best password is a Catch-22. Either its not memorable, too predictable, too short, too many unicode characters (hard to type on a Windows/Mobile device), too long, etc. No password is truly good enough for our purposes, so we must protect them as though they were in Fort Knox.

Best practices

Bcrypt and scrypt are the current best practices. Scrypt will be better than bcrypt in time, but it hasn't seen adoption as a standard by Linux/Unix or by webservers, and hasn't had in-depth reviews of its algorithm posted yet. But still, the future of the algorithm does look promising. If you are working with Ruby there is an scrypt gem that will help you out, and Node.js now has its own scrypt package.

I highly suggest reading the documentation for the crypt function if you want to roll your own use of bcrypt, or finding yourself a good wrapper or use something like PHPASS for a more legacy implementation. I recommend a minimum of 12 rounds of bcrypt, if not 15 to 18.

I changed my mind about using bcrypt when I learned that bcrypt only uses blowfish's key schedule, with a variable cost mechanism. The latter lets you increase the cost to brute-force a password by increasing blowfish's already expensive key schedule.

Average practices

I almost can't imagine this situation anymore. PHPASS supports PHP 3.0.18 through 5.3, so it is usable on almost every installation imaginable—and should be used if you don't know for certain that your environment supports bcrypt.

But suppose that you cannot use bcrypt or PHPASS at all. What then?

Try an implementation of PDKBF2 with the minimum number of rounds that your environment/application/user-perception can tolerate. The lowest number I'd recommend is 2500 rounds. Also, make sure to use hash_hmac() if it is available to make the operation harder to reproduce.

Future Practices

Coming in PHP 5.5 is a full password protection library that abstracts away any pains of working with bcrypt. While most of us are stuck with PHP 5.2 and 5.3 in most common environments, especially shared hosts, @ircmaxell has built a compatibility layer for the coming API that is backward compatible to PHP 5.3.7.

Cryptography Recap & Disclaimer

The computational power required to actually crack a hashed password doesn't exist. The only way for computers to "crack" a password is to recreate it and simulate the hashing algorithm used to secure it. The speed of the hash is linearly related to its ability to be brute-forced. Worse still, most hash algorithms can be easily parallelized to perform even faster. This is why costly schemes like bcrypt and scrypt are so important.

You cannot possibly foresee all threats or avenues of attack, and so you must make your best effort to protect your users up front. If you do not, then you might even miss the fact that you were attacked until it's too late... and you're liable. To avoid that situation, act paranoid to begin with. Attack your own software (internally) and attempt to steal user credentials, or modify other user's accounts or access their data. If you don't test the security of your system, then you cannot blame anyone but yourself.

Lastly: I am not a cryptographer. Whatever I've said is my opinion, but I happen to think it's based on good ol' common sense ... and lots of reading. Remember, be as paranoid as possible, make things as hard to intrude as possible, and then, if you are still worried, contact a white-hat hacker or cryptographer to see what they say about your code/system.

298
Henrik Paul

I know that PHP doesn't have native Enumerations. But I have become accustomed to them from the Java world. I would love to use enums as a way to give predefined values which IDEs' auto completion features could understand.

Constants do the trick, but there's the namespace collision problem and (or actually because) they're global. Arrays don't have the namespace problem, but they're too vague, they can be overwritten at runtime and IDEs rarely (never?) know how to autofill their keys.

Are there any solutions/workarounds you commonly use? Does anyone recall whether the PHP guys have had any thoughts or decisions around enums?

Answered By: Brian Cline ( 452)

I use something like the following:

class DaysOfWeek
{
    const Sunday = 0;
    const Monday = 1;
    // etc.
}

var $today = DaysOfWeek::Sunday;